The Polaris Architecture: In Brief

For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.

In their announcement of the architecture this year, AMD laid out a basic overview of what components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs, but AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result Polaris is a mix of old and new, and a lot more efficient in the process.

At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to that of GCN 1.2. So everything we see here today comes not from broad architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.

Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to pre-fetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.

Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
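To make the two culling cases concrete, here is a minimal Python sketch of the idea. It is not AMD's hardware algorithm, which has not been published in this detail. The function names, the use of 2D screen-space coordinates, and the single sample at each pixel center (integer + 0.5) are all assumptions made for illustration. The small-triangle check is a conservative bounding-box test: if the box contains no sample point, the triangle cannot cover one either.

```python
import math

def signed_area2(a, b, c):
    """Twice the signed area of triangle abc, given 2D screen-space points."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def is_degenerate(a, b, c):
    """A zero-area triangle (collinear or repeated vertices) can never be drawn."""
    return signed_area2(a, b, c) == 0

def _range_has_pixel_center(lo, hi):
    """True if some pixel center (integer + 0.5) lies in the closed range [lo, hi]."""
    first_center = math.ceil(lo - 0.5) + 0.5
    return first_center <= hi

def misses_all_samples(a, b, c):
    """Conservative test: the triangle's bounding box contains no pixel center.
    Assumes one sample per pixel, located at the pixel center (hypothetical)."""
    xs = (a[0], b[0], c[0])
    ys = (a[1], b[1], c[1])
    return not (_range_has_pixel_center(min(xs), max(xs))
                and _range_has_pixel_center(min(ys), max(ys)))

def should_discard(a, b, c):
    """Discard triangles that cannot contribute any visible sample."""
    return is_degenerate(a, b, c) or misses_all_samples(a, b, c)

if __name__ == "__main__":
    print(should_discard((0, 0), (1, 1), (2, 2)))              # collinear vertices
    print(should_discard((0.6, 0.6), (0.9, 0.6), (0.6, 0.9)))  # sits between pixel centers
    print(should_discard((0, 0), (4, 0), (0, 4)))              # covers many pixels
```

Because both tests only reject triangles that could not have produced a visible sample, discarding them leaves the rendered image unchanged, which matches the no-quality-impact claim above. The sketch does not model MSAA sample positions.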

Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.

Finally, at the back-end of the GPU, the ROP/L2/Memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size and resulting memory bandwidth needs of frame buffers and render targets. As a result, color compression delivers a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
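The general idea behind delta compression can be sketched in a few lines of Python. This is a deliberately simplified base-plus-delta scheme, not AMD's actual pattern library, whose details are not public. The block size, the 4-bit delta width, and all function names are assumptions chosen for illustration. A block of similar values is stored as one full value plus small differences, and a block that does not fit falls back to uncompressed storage.

```python
def compress_block(pixels, delta_bits=4):
    """Try to encode a block of 8-bit values as a base plus small signed deltas.
    Returns (base, deltas), or None if any delta does not fit in delta_bits."""
    base = pixels[0]
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    deltas = [p - base for p in pixels[1:]]
    if all(lo <= d <= hi for d in deltas):
        return base, deltas
    return None  # block is not compressible under this pattern

def decompress_block(base, deltas):
    """Losslessly rebuild the original block."""
    return [base] + [base + d for d in deltas]

def stored_bits(pixels, delta_bits=4, value_bits=8):
    """Bits needed to store the block, compressed when possible."""
    if compress_block(pixels, delta_bits) is None:
        return len(pixels) * value_bits
    return value_bits + (len(pixels) - 1) * delta_bits

if __name__ == "__main__":
    smooth = [100, 101, 102, 103, 101, 100, 99, 98]  # gradient-like block
    noisy = [0, 255, 3, 200, 17, 90, 250, 1]         # high-contrast block
    print(stored_bits(smooth), "bits vs", len(smooth) * 8, "uncompressed")
    print(stored_bits(noisy), "bits vs", len(noisy) * 8, "uncompressed")
```

The smooth block needs fewer bits to move across the memory bus, while the noisy block gains nothing, which is why the benefit only holds when the buffer is compressible. Supporting more patterns, as Polaris is said to do, amounts to giving more blocks a chance to take the compressed path.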

Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially go to 8Gbps, and even a bit more than that when overclocking.
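For a sense of what an 8Gbps per-pin data rate means in practice, peak memory bandwidth is the per-pin rate multiplied by the bus width, divided by 8 to convert bits to bytes. The short Python sketch below works through that arithmetic. The 256-bit bus width is an assumption (it is not stated in this section), and the function name is hypothetical.

```python
def peak_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s.
    data_rate_gbps: per-pin data rate in Gbit/s.
    bus_width_bits: total memory bus width in bits (assumed, not from the text)."""
    return data_rate_gbps * bus_width_bits / 8

if __name__ == "__main__":
    # Assumed 256-bit bus at the official 8Gbps rate.
    print(peak_bandwidth_gbs(8, 256), "GB/s")
    # The same assumed bus at 7Gbps, for comparison.
    print(peak_bandwidth_gbs(7, 256), "GB/s")
```

This is a theoretical ceiling; achieved bandwidth depends on access patterns and on how much of the traffic benefits from color compression.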


449 Comments


  • FriendlyUser - Wednesday, June 29, 2016 - link

    Warframe is very, very light on the GPU. I get ~100fps at 1440p with a much older card and almost everything maxed. Try Witcher 3 for a challenge at 4k.
  • Murloc - Tuesday, July 5, 2016 - link

    I can play age of empires 2 @4K on a gtx 275 get on my level
  • Questor - Wednesday, June 29, 2016 - link

    Bandwagon much? One picture and you are already condemning a product that hasn't had a fair chance. You hurt yourself in the long run when you subscribe to bandwagon jumping by spreading fanboy-ship: opinions not based on clear, complete facts, but rather on a possible detractor that is as yet unproven across the entirety of the products. Competition serves all of us. It brings prices more under control and forces innovation.
  • mikato - Friday, July 29, 2016 - link

    "It's a terrible product. Look at the temps."

    I don't know about everyone else, but I don't buy my GPUs based on thermal images and point temps. Amirite?
  • poohbear - Wednesday, June 29, 2016 - link

    How is it a bit disappointing??? do you really think most of us are running GTX970s?? The vast majority of people have gtx950 class cards, and this would be a nice step up considering the price.
  • sharath.naik - Wednesday, June 29, 2016 - link

    It's disappointing because the 970 can overclock 10-15%, sometimes more. You need to look at the thermals to understand that these are like already overclocked from the factory and cannot do more.
  • smartthanyou - Wednesday, June 29, 2016 - link

    In no situation will a 10-15% overclock ever produce a performance difference that an end user would notice. In a benchmark application? Sure, numbers will increase but frames in a game will not increase to a point to make a difference.

    Overclocking 10-15% in almost all cases is pointless.
  • FriendlyUser - Wednesday, June 29, 2016 - link

    All these are reference and have zero electrical margin for overclock. Reviews have shown that the board uses all juice it can and is almost constantly at the limit (or over) of the PCIe slot power delivery! You will only be able to judge overclock in cards with more complicated designs. The chip itself is probably quite variable, being the first of the 14nm AMD generation. Some will overclock well, others won't.
  • wumpus - Wednesday, June 29, 2016 - link

    Last I heard, 970 was at the top of the steam surveys (I won't enable whatever kludge they wanted to find out). It isn't a bad goal, but my confidence in AMD shipping it to newegg faster than nvidia can ship an as of yet hypothetical 1060 isn't all that great. Assuming they do, it doesn't really mean they have a long window of "the card to buy at ~$200".

    A bigger worry is how many of those 970s are going to be hitting the market. Until AMD can claw back some marketshare, there could easily be a used 970 for every new 480 buyer out there. And this is coming from someone who had been assuming that I would get a 390 (or two) and DIY some watercooling for an ideal VR rig (before prices skyrocketed. I'm guessing lose the watercooling and go with nvidia once both VR and nvidia 14nm prices come back to Earth). This card isn't helping AMD all that much.
  • lunarmit - Wednesday, June 29, 2016 - link

    It is, but it's just one card. Add in 1% for the 980, and 1% for the 980 ti and you have ~90% of the cards are powered below that once you factor in the AMD comparables.
