The Polaris Architecture: In Brief

For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.

In their announcement of the architecture this year, AMD laid out a basic overview of what components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs, but AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result Polaris is a mix of old and new, and a lot more efficient in the process.

At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to GCN 1.2’s. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.

Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to prefetch future instructions, something GCN 1.x could not do. When done correctly, this reduces/eliminates the need for a wave to stall to wait on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.
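As a toy illustration of why both changes matter (a hypothetical model, not AMD's actual hardware: the fetch latency, the one-instruction-per-cycle rate, and the all-or-nothing stall are all assumptions for the sketch), consider a wave that can only execute instructions already sitting in its buffer:

```python
FETCH_LATENCY = 4  # hypothetical cycles to fetch one group of instructions

def cycles_to_execute(num_instructions, buffer_size, prefetch):
    """Cycles to run a wave at 1 instruction/cycle while the buffer is fed.

    Without prefetch, every refill of an empty buffer stalls the wave for
    FETCH_LATENCY cycles. With prefetch, only the very first fetch stalls;
    later fetches overlap with execution and are hidden.
    """
    cycles = 0
    executed = 0
    buffered = 0          # instructions currently sitting in the buffer
    first_fetch = True
    while executed < num_instructions:
        if buffered == 0:
            if first_fetch or not prefetch:
                cycles += FETCH_LATENCY   # wave stalls on the fetch
            first_fetch = False
            buffered = min(buffer_size, num_instructions - executed)
        cycles += 1       # execute one buffered instruction
        executed += 1
        buffered -= 1
    return cycles
```

In this model a 16-instruction stream with a 4-entry buffer takes 32 cycles without prefetch but only 20 with it, and enlarging the buffer so the whole stream fits also cuts the stalls; real hardware is far messier, but it shows why both knobs reduce fetch stalls.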

Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
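The basic test involved is easy to sketch in software (an illustrative sketch only: real hardware culls based on sample coverage against the rasterization grid, and the `min_area` threshold here is a hypothetical stand-in for that logic):

```python
def signed_area_x2(a, b, c):
    """Twice the signed screen-space area of triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def should_cull(a, b, c, min_area=0.5):
    """Cull degenerate triangles (zero area, e.g. collinear vertices) and
    triangles too small to cover a pixel sample; neither can ever shade."""
    area = abs(signed_area_x2(a, b, c)) / 2.0
    return area < min_area
```

A triangle with three collinear vertices has zero area and is discarded outright, while a sub-pixel sliver is discarded by the area threshold; anything large enough to produce visible fragments passes through untouched, which is why there is no image quality impact.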

Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.
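A back-of-envelope model shows why keeping instanced index data resident pays off (the mesh size, instance count, and 16-bit indices below are hypothetical numbers, and real traffic depends on the full cache hierarchy):

```python
def index_traffic_bytes(indices_per_mesh, instances,
                        index_size=2, index_cached=False):
    """Bytes of index data fetched for an instanced draw call.

    Without a resident copy, the indices are re-fetched for every
    instance; with a dedicated index cache, a small instanced mesh's
    indices are fetched once and reused across all instances.
    """
    fetches = 1 if index_cached else instances
    return indices_per_mesh * index_size * fetches
```

For a 300-index mesh drawn 1,000 times, that is 600 KB of index traffic re-fetched versus 600 bytes fetched once and reused; the absolute numbers are made up, but the ratio is exactly the instance count, which is where the bandwidth and power savings come from.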

Finally, at the back-end of the GPU, the ROP/L2/memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size, and the resulting memory bandwidth needs, of frame buffers and render targets. This compression delivers a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
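The general delta idea can be sketched as follows (a conceptual sketch only: AMD's actual pattern library is undisclosed, and the single-channel pixels, tile layout, and 4-bit delta width here are assumptions):

```python
def compress_tile(pixels, delta_bits=4):
    """Return (anchor, deltas) if every pixel in the tile sits within a
    small signed delta of the anchor pixel, else None (store the tile raw).

    Frame buffer tiles are often smooth, so most per-pixel deltas are tiny
    and fit in far fewer bits than the full pixel value.
    """
    anchor = pixels[0]
    limit = 1 << (delta_bits - 1)          # signed delta range: [-limit, limit)
    deltas = [p - anchor for p in pixels]
    if all(-limit <= d < limit for d in deltas):
        return anchor, deltas              # compressed representation
    return None                            # incompressible: fall back to raw
```

A smooth tile like `[100, 101, 99, 102]` compresses to one anchor plus tiny deltas, while a high-contrast tile falls back to raw storage; a larger pattern library, in these terms, means more tile shapes that avoid the raw fallback, which is how Polaris compresses more buffers more often.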

Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially go to 8Gbps, and even a bit more than that when overclocking.

Comments

  • ffleader1 - Thursday, June 30, 2016 - link

    Can you maybe redo the test with an updated driver?
    It seems that the newer driver really makes a HUGE difference.
    https://www.reddit.com/r/Amd/comments/4qiffg/rx_48...
  • Ryan Smith - Friday, July 1, 2016 - link

    This was done with the latest driver to begin with: 16.6.2. The press was not distributed any other driver AFAIK.
  • Falko83 - Thursday, June 30, 2016 - link

    I hope that in the review you can also address 1080p 144hz gaming.
  • davide445 - Thursday, June 30, 2016 - link

    "Relative to last-generation mainstream cards like the GTX 960 or the Radeon R9 380, with the Radeon RX 480 we’re looking at performance gains anywhere between 45% and 70%, depending on the card, the games, and the memory configuration. As the mainstream market was last refreshed less than 18 months ago, the RX 480 generally isn’t enough to justify an upgrade"
    This is something I didn't understand. A 45-70% performance increase isn't enough for a card that requires less power and costs less?
    Also, where is the problem with the power consumption being above the Nvidia equivalent? At 150W I doubt anyone upgrading needs a new PSU. I suppose that below a certain threshold, power draw isn't part of the game anymore; otherwise the next demand will be that a GPU produce power rather than need some....
  • dragonsqrrl - Thursday, June 30, 2016 - link

    Ryan addressed some of this in response to an earlier comment:

    "Although it's not an objective metric, generally I'm looking for an average 65%+ performance increase to justify replacing a video card. Against 380 in particular, 480 doesn't quite reach that mark."

    150W TDP wouldn't be a problem in a vacuum, but unfortunately AMD is competing against Nvidia, not just at the midrange, but the whole product stack including the high-end. That means AMD is going to have more trouble scaling performance this coming generation, not unlike the last. In modern microarchitectures performance and efficiency are basically the same thing because we've hit the TDP ceiling for these form factors. I think Ryan actually explained it quite elegantly:

    "power efficiency and overall performance are two sides of the same coin. There are practical limits for how much power can be dissipated in different card form factors, so the greater the efficiency, the greater the performance at a specific form factor. This aspect is even more important in the notebook space, where GPUs are at the mercy of limited cooling and there is a hard ceiling on heat dissipation."
  • davide445 - Thursday, June 30, 2016 - link

    Maybe I'm asking too much objectivity. Asking for a +65% performance increase is nothing without a specific reason. The 960 vs the 760 shows a 6% increase in 3DMark performance. The 760 vs the 660 shows a 20% increase. The 660 vs the 560 shows a 55% increase (data from GPU Boss). So why is 40-70% (average 55%) not enough? It's the same you can achieve with two generations of the Nvidia equivalent.
    About TDP, OK for competing, but what I'm saying is that it's no longer relevant. OK when an AMD GPU was 200-250W hungry and the Nvidia equivalent was 100W less, but at 150W does a difference of maybe 50W really matter to anyone? Also, I didn't understand the point about whole-stack competition; we need to compare at the same price level: 480 vs 380 or 390 or 960 or 970. The 1070 is available on Newegg for $430 minimum, and the 1060 is not available at all.
  • dragonsqrrl - Thursday, June 30, 2016 - link

    It sounds like you're under the impression he recommended upgrading from a 760 to a 960; he did not.

    "Also didn't understand about the whole stack competition"

    Then am I correct in assuming you don't understand the significance of TDP ceilings for a given form factor? If you have a less efficient architecture, you're going to have a less performant GPU once you approach that ceiling. It's not as concrete for desktop discrete GPUs as it is for notebooks, but there are still upper limits, which tend to be around 250-300W. AMD can always price competitively, but the last thing they need right now is a repeat of last gen. They need competitive 'products' in their lineup, and for AMD, in terms of financial and competitive viability, that means a lot more than just price/performance ratios. This has less to do with the RX480 right now and more to do with prospects for the rest of the generation.
  • Ananke - Thursday, June 30, 2016 - link

    Let me simplify for you: The question is "What can I get for $200?". RX480, RX380, GT960, GT950.
    Very simple choice at the moment.
  • dragonsqrrl - Thursday, June 30, 2016 - link

    @Ananke

    Yes, yes it is. But if you read the last sentence in my previous comment, and the rest of my responses to the thread, you'd hopefully realize that's beside the point.
  • davide445 - Thursday, June 30, 2016 - link

    I suppose it depends on what these reviews are meant for. IMHO (and of course this is my personal opinion) they need to be guides useful for an informed purchase, similar to what you'd expect from stock investment advice. So you need to be specific about your assumptions and audience.
    These cards are not for mobile, so mobile is not in the equation. We are discussing a specific card, not a corporate or tech strategy. We are discussing mostly gaming or DX performance, so the audience is using a PC and doesn't consider an integrated GPU enough. So they mostly need to decide whether to upgrade, not whether to buy a first card, considering how the PC market is going. So the decision this review needs to address is which GPU on the market these people should purchase, considering a target price and average PC specs.
