The Polaris Architecture: In Brief

For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.

In their announcement of the architecture this year, AMD laid out a basic overview of what components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs, but AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result Polaris is a mix of old and new, and a lot more efficient in the process.

At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to that of GCN 1.2. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.

Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to pre-fetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.
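
To put the buffer change in concrete terms, here is a minimal sketch of how many instructions fit in the old and new per-wave buffers. It assumes GCN's instruction encodings of one or two DWORDs (32-bit or 64-bit) and ignores any trailing literal constants; the function name is ours for illustration, not AMD's.

```python
OLD_BUFFER_DWORDS = 12  # GCN 1.x per-wave instruction buffer
NEW_BUFFER_DWORDS = 16  # GCN 4 per-wave instruction buffer


def buffered_instructions(buffer_dwords, dwords_per_instruction):
    """Whole instructions that fit in the buffer at a fixed encoding size."""
    return buffer_dwords // dwords_per_instruction


for size in (1, 2):  # assumed encoding sizes, in DWORDs
    old = buffered_instructions(OLD_BUFFER_DWORDS, size)
    new = buffered_instructions(NEW_BUFFER_DWORDS, size)
    print(f"{size}-DWORD instructions: {old} -> {new}")
```

Under either assumed encoding size the buffer holds a third more instructions than before.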

Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
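
AMD has not published the exact tests the discard accelerator runs, so the following is only a sketch of the two kinds of culling described above, using common textbook checks. It assumes one sample per pixel located at the pixel center; both function names are hypothetical.

```python
import math


def is_degenerate(i0, i1, i2):
    """A triangle that repeats an index has zero area whatever its vertex
    positions are, so it can be rejected before any vertex is shaded."""
    return i0 == i1 or i1 == i2 or i0 == i2


def covers_no_sample(tri, sample_offset=0.5):
    """Conservative small-triangle test on screen-space (x, y) vertices.

    If the bounding box contains no sample position on either axis, the
    triangle cannot touch a sample and is safe to cull. Assumes a single
    sample per pixel at (k + sample_offset) for integer k.
    """
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]

    def spans_sample(lo, hi):
        # Is there an integer k with lo <= k + sample_offset <= hi?
        return math.ceil(lo - sample_offset) <= math.floor(hi - sample_offset)

    return not (spans_sample(min(xs), max(xs)) and spans_sample(min(ys), max(ys)))


tiny = [(10.1, 10.1), (10.3, 10.1), (10.2, 10.3)]  # sits between pixel centers
large = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(is_degenerate(3, 3, 7), covers_no_sample(tiny), covers_no_sample(large))
```

The bounding-box test only rejects triangles that provably miss every sample, which matches the point that culling has no effect on the rendered image.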

Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.
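
As a rough illustration of why keeping a small index buffer resident helps instancing, here is a toy model that counts index reads that miss the cache. The capacity figure and the LRU behavior are assumptions made for the example; AMD has not disclosed the index cache's size or replacement policy.

```python
def index_fetches_from_memory(index_count, instances, cache_capacity):
    """Toy model: index reads that miss an LRU cache when the same index
    buffer is walked once per instance.

    If the buffer fits, only the first instance misses. If it does not fit,
    walking it in order evicts each entry before it is reused, so every
    read misses.
    """
    if index_count <= cache_capacity:
        return index_count
    return index_count * instances


# Hypothetical numbers: a 300-index mesh drawn 1,000 times.
print(index_fetches_from_memory(300, 1000, cache_capacity=512))
print(index_fetches_from_memory(300, 1000, cache_capacity=256))
```

In this model a mesh that fits in the cache is fetched once, while one that does not is fetched again for every instance.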

Finally, at the back-end of the GPU, the ROP/L2/Memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size and resulting memory bandwidth needs of frame buffers and render targets. As a result, color compression delivers a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
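
AMD's actual pattern library is not public, so the sketch below only shows the general idea behind delta compression: store one base value per block plus small differences, and fall back to raw storage when the differences are too large to help. The block size, bit widths, and function name are all assumptions for the example.

```python
def compressed_bits(block, value_bits=8):
    """Bits needed to store a block of channel values as base + deltas.

    Every delta uses the same width, sized for the largest one plus a sign
    bit. Returns the raw size when delta encoding would not be smaller.
    """
    base = block[0]
    deltas = [v - base for v in block[1:]]
    span = max((abs(d) for d in deltas), default=0)
    delta_bits = 0 if span == 0 else span.bit_length() + 1  # +1 for sign
    encoded = value_bits + delta_bits * len(deltas)
    raw = value_bits * len(block)
    return min(encoded, raw)


flat = [100] * 8                                    # solid color
gradient = [100, 101, 102, 103, 101, 100, 99, 102]  # smooth shading
noisy = [0, 255, 3, 200, 17, 90, 250, 1]            # incompressible
for name, block in (("flat", flat), ("gradient", gradient), ("noisy", noisy)):
    print(name, compressed_bits(block), "of", 8 * len(block), "bits")
```

Flat and smoothly shaded blocks shrink substantially while the noisy block stays at its raw size, which is why the bandwidth benefit depends on the buffer being compressible.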

Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially go to 8Gbps, and even a bit more than that when overclocking.
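
For reference, the peak bandwidth implied by a given data rate is simple arithmetic. The sketch below assumes the RX 480's 256-bit memory bus, a figure not restated on this page; the function name is ours.

```python
def memory_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin rate times pins, over 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# Assumption: 256-bit bus (RX 480), at the official 8Gbps data rate.
print(memory_bandwidth_gbs(8, 256))  # 256.0
```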

449 Comments

  • Meteor2 - Thursday, June 30, 2016 - link

    Well, if you only want to spend $100 on the CPU and $199 on the GPU, I can...
  • Hrel - Thursday, June 30, 2016 - link

    Yeah, I can afford better. Sorry you can't yet, keep working at it!
  • fanofanand - Thursday, June 30, 2016 - link

    Arrogance doesn't play well here Captain GiantWallet. Price is a consideration for 99.9% of consumers, STFU with your one-person use case.
  • praeses - Wednesday, June 29, 2016 - link

    Seems like the RX480 should have only come with 8ghz 4GB of ram which would have yielded a slight power efficiency increase and cost reduction to move from 6pin/6phase to 8 and a better cooler. 6 should have been left for the RX470. I think marketing must have got in the way again.
  • fanofanand - Thursday, June 30, 2016 - link

    GDDR5X is still expensive. At $200 some concessions had to be made, nothing to do with marketing.
  • tipoo - Wednesday, June 29, 2016 - link

    Unfortunately for AMD, they're on completely different fabs this time than Nvidia, Glofo vs tsmc. I've wondered if that's part of their efficiency disadvantage. We've seen this with the 6S load testing. That's the thing now, with different fabs, the playing field is not even, and not only does the architecture matter, but the fab process does too when comparing them to Nvidia. Which kind of sucks for AMD.

    With the iPhone it mattered less because it's mostly idle, even with the screen on, but for a high performance GPU it's the full throttle aspect that matters.

    Interesting though that TSMC will still make their high end parts (I don't know if that means just Vega, or the 300 dollar Polaris too), so maybe it's not all lost on the efficiency side if the fabs are to blame.

    I think this is to fulfill the WSA, makes sense, higher end part gets the higher end fab, the 200 dollar part isn't particularly efficient but they hit this performance and price.

    They even switched Zen to TSMC after Glofo efficiency concerns.

    So I do have hope that the more expensive TSMC parts will provide them much needed efficiency to go up against Nvidia's higher end, and hopefully it doesn't mean Polaris as a whole is just inefficient.
  • T1beriu - Wednesday, June 29, 2016 - link

    1. The $300 Polaris is AIB RX480. There are no faster Polaris chips coming confirmed by Raja.

    2. Zen will not be built by TSMC. This was a fake rumor. GloFlo announced they're working on Zen. Source: http://www.extremetech.com/computing/217664-global...

    3. A couple of months back TSMC released the list of partners building chips on 16nm. AMD wasn't on that list.
  • vladpetric - Wednesday, June 29, 2016 - link

    It seems to me that the leadership of AMD still doesn't get it that "drivers matter" ... While NVidia does not generally make more computationally powerful cards, they spend a lot of resources on good drivers.

    AMD as we know it today is the marriage of two hardware-first companies (old AMD and ATI). The sad part is that after losing a lot of marketshare, market cap, etc over the last decade, good software is still a second class concern for them.
  • K_Space - Saturday, July 9, 2016 - link

    I'm not sure what world you've been living on but RTG drivers have been head & shoulder above anything ATI or even 'old AMD Radeon' delivered. Even old GCN cards continue to benefit from these long after their sell by date.
  • tynopik - Wednesday, June 29, 2016 - link

    pg1: comfortable reach it > comfortably
