The Polaris Architecture: In Brief

For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.

In their announcement of the architecture this year, AMD laid out a basic overview of what components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs, but AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result Polaris is a mix of old and new, and a lot more efficient in the process.

At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to that of GCN 1.2. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.

Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to prefetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.

Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
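To make the two culling stages concrete, here is a minimal sketch in Python. It is not AMD’s hardware logic, which AMD has not detailed; the function names, the single sample at each pixel center, and the bounding-box test are all illustrative assumptions.

```python
import math

def is_degenerate(indices):
    """Stage 1 (assumed): a triangle that repeats a vertex index has
    zero area, so it can be dropped before any vertex shading."""
    a, b, c = indices
    return a == b or b == c or a == c

def _spans_pixel_center(lo, hi):
    # True if some pixel center (i + 0.5, i an integer) lies in [lo, hi].
    return math.ceil(lo - 0.5) <= math.floor(hi - 0.5)

def covers_no_sample(tri):
    """Stage 2 (assumed): after vertex shading, a triangle whose
    screen-space bounding box contains no pixel center cannot produce
    any fragment, so it can be dropped before rasterization.

    tri is three (x, y) screen-space positions. The test is
    conservative: it only culls triangles that certainly hit nothing.
    """
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    hits_x = _spans_pixel_center(min(xs), max(xs))
    hits_y = _spans_pixel_center(min(ys), max(ys))
    return not (hits_x and hits_y)
```

A triangle tucked into the corner of one pixel, such as `[(0.1, 0.1), (0.3, 0.1), (0.1, 0.3)]`, is culled by the second test, while a triangle spanning two pixels in each direction is kept. With MSAA there are more sample positions per pixel, so a real implementation would have to test against all of them.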

Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.

Finally, at the back-end of the GPU, the ROP/L2/Memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size and resulting memory bandwidth needs of frame buffers and render targets. As a result, color compression yields a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
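The general idea behind delta compression can be shown with a toy example. AMD’s actual patterns and bit layouts are not public, so everything below (the single-anchor scheme, the function names, the bit accounting) is an assumption chosen only to illustrate why smooth tiles compress well and noisy ones do not.

```python
def delta_encode(tile):
    """Store the first pixel as an anchor and every other pixel as a
    signed difference from that anchor."""
    anchor = tile[0]
    deltas = [p - anchor for p in tile[1:]]
    return anchor, deltas

def delta_decode(anchor, deltas):
    """Exact inverse of delta_encode, so the scheme is lossless."""
    return [anchor] + [anchor + d for d in deltas]

def bits_needed(anchor_bits, deltas):
    """Bits to store the anchor plus fixed-width signed deltas.

    The delta width is sized for the largest magnitude in the tile;
    a tile of identical pixels needs no delta bits at all.
    """
    max_mag = max((abs(d) for d in deltas), default=0)
    delta_bits = 0 if max_mag == 0 else max_mag.bit_length() + 1
    return anchor_bits + delta_bits * len(deltas)
```

For an eight-pixel, 8-bit tile such as `[100, 101, 100, 102, 101, 100, 100, 101]`, the raw cost is 64 bits, while this toy scheme needs 8 bits for the anchor plus seven 3-bit deltas, or 29 bits. A tile with large pixel-to-pixel swings would need wide deltas and gain nothing, which is the "so long as the buffer is compressible" caveat in practice.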

Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially go to 8Gbps, and even a bit more than that when overclocking.
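To put the 8Gbps figure in context, peak memory bandwidth follows directly from the per-pin data rate and the width of the memory bus. The short calculation below assumes a 256-bit bus, a figure not stated on this page; the function name is our own.

```python
def peak_bandwidth_gb_per_s(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s.

    data_rate_gbps: per-pin data rate in gigabits per second.
    bus_width_bits: total memory bus width in bits (assumed 256 here).
    Dividing by 8 converts bits to bytes.
    """
    return data_rate_gbps * bus_width_bits / 8

# Assumed 256-bit bus at the official 8Gbps ceiling.
RX480_PEAK = peak_bandwidth_gb_per_s(8.0, 256)
```

Under that assumption, 8Gbps works out to 256GB/sec of raw bandwidth, before any effective gain from color compression.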

449 Comments

  • TheinsanegamerN - Thursday, June 30, 2016 - link

    So it can't do 60FPS constant at 1080p, but it CAN do 90FPS constant at 2160x1200? Did you fail math?
  • Sushisamurai - Friday, July 1, 2016 - link

    I think the point of yojimbo's post is that it should be able to hit 90fps @2160x1200 at medium to low settings. It can't hit 60FPS at ultra high settings at 1080p
  • Yojimbo - Friday, July 1, 2016 - link

    Yes exactly. Ironically, math is my area of expertise.
  • cocochanel - Thursday, June 30, 2016 - link

    You must be smarter than the engineers at AMD. They said this card was designed for VR, they would not make such a claim if the card did not deliver. 3-4x higher performance ? Where do you live ?
  • CiccioB - Friday, July 1, 2016 - link

    In a world where marketing claims turn out to be false most of the time.
    Wasn't Polaris 10 going to have 2.5x efficiency gain vs GCN? An AMD engineer told that as well. And has even put it on a slide.
    I just saw 40% gain. While Pascal gained more than 60% over Maxwell. Which was still 40% better than GCN.
    If an AMD engineer tells you that this card can fly, would you accelerate the fan at the level to try that claim? You know, an engineer has told you that it can! And it was an AMD engineer, nothing less!
  • FriendlyUser - Wednesday, June 29, 2016 - link

    Perf/W would be much better if they had used GDDR5X, which they did not, for cost reasons. HBM is even more power efficient. Then you have the board itself, which probably is not as electrically sophisticated as the much more expensive nVidia 1080 board. Finally, you don't know which of the two process technologies is better for perf/W (two different foundries). In the end, I don't think the chip design is the main difference.
  • Yojimbo - Thursday, June 30, 2016 - link

    The RX 480's perf/W is really no better than the GTX 970, which uses GDDR 5 RAM like the RX 480 as well as a 28nm process compared with the 14nm process of the RX 480. I do think the architecture is the main difference. Polaris 10's architecture seems to be significantly less efficient than Maxwell's, after accounting for the advantage of the 14nm process of the RX 480. Pascal is even more efficient architecturally than Maxwell.
  • Meteor2 - Wednesday, June 29, 2016 - link

    The 1080/1070 take the performance/power crown. But the 480 comfortably takes the performance/price crown. What's interesting is that the 1080 isn't quite fast enough for AAA titles at 4K and the 1070 sits in no man's land, while the 480 runs AAA and 1080p and does VR. It's clear which option is the solid buy.
  • Yojimbo - Thursday, June 30, 2016 - link

    Yes the RX 480 will take the performance/price crown assuming supply can keep up with demand, but for how long? The GTX 1060 will be out in a month or two and be very competitive in price/performance.

    The 1080 is fast enough for AAA titles at 4K if one doesn't max out the settings. A similar thing can be said for the 1070. Also similar is RX 480's VR claim. It can only manage VR gaming when settings are not maxed out. Are you a console gamer or you just have selective memory? This paragraph should be redundant for a PC gamer.
  • Demibolt - Friday, July 1, 2016 - link

    Not here to argue, just fact checking.

    GTX 970 can be purchased for ~$240 from several online retailers (less if you get a used one from ebay). Given the close performance figures between the 2 cards and the inevitable price-drop that will happen with the GTX 970, it is objectively too soon to say the price/performance benefit of one card beats out the other.
