Polaris Refined: Better Fab Yields & a New Memory State

One of the components of AMD’s marketing angle for the Radeon RX 500 series is that it’s Polaris Refined. These are still the same Polaris 10 GPUs as in the Radeon RX 400 series, but AMD wants to highlight the small improvements it has made in the past year. For RX 400 owners this doesn’t amount to much – your cards are still shiny and chrome – but it serves to differentiate the new cards from the old. And for the owners of cards like the R9 280 and 380 series, whom AMD is trying to convince to upgrade, it’s the justification for why AMD thinks they should upgrade now after having passed on the RX 400 series.

There are two elements to Polaris Refined: silicon improvements, and a new memory clock state. The former in turn is comprised of both the benefits in improving fab yields and quality that AMD has enjoyed over the past year, and a new revision of Polaris 10.

In the case of fab yields, all of the revised Polaris chips are being manufactured on what AMD is calling the “Latest Generation FinFET 14” process. This is a bit of a mouthful, but in short it’s AMD calling attention to the improvements partners GlobalFoundries and Samsung have made to their 14nm LPP processes in the last year. Yields are up and overall chip quality is better, which improves the average performance (clockspeed & power) characteristics of the chips. Both foundries have also been making other undisclosed, small tweaks to their lines to further boost chip quality. It’s not a new fab process (it’s still 14nm LPP) but it’s an improvement over where Polaris 10 production started nearly a year ago.

Typically these kinds of yearly gains would simply be rolled into a product line without any fanfare – these improvements are gradual over time anyhow, not a binary event – but for the RX 500 series AMD wants to call attention to them to explain why clockspeeds are improved versus the RX 400 series cards released last year. Though to be clear here, the difference isn’t dramatic; the gains from a year’s optimization to a manufacturing line are a fraction of a full node improvement.

Meanwhile AMD is also releasing a new revision of Polaris 10, which is being used in the RX 580/570 launch. These revised chips have received further tweaking to reach higher clockspeeds, allowing AMD to reliably clock up a bit higher and/or reduce power consumption a bit. The new revision also fixes a couple of minor issues with the GPUs. Specifically, AMD is adding a new mid-power memory clock state so that applications that require memory clocks faster than idle – primarily mixed-resolution multi-monitor and video decoding – no longer cause the memory to clock up to its most power-demanding speeds, keeping overall power consumption down.

One thing to note is that while AMD’s chip quality has improved through the combination of manufacturing improvements and revised silicon, for the desktop AMD is investing all of those gains into improving clockspeeds rather than reducing power consumption. This is why the TBPs have gone up by 30-35W over the RX 480 and RX 470.

Power Consumption: By the Numbers

With AMD’s silicon optimizations covered, let’s take a look at power consumption. There are a few different things we can look at here, and I’ll start with what’s probably the most burning question: just how much better is the new revision of Polaris 10 than the old revision?

To test this, I’ve taken the Radeon RX 580 sample AMD sent over – PowerColor’s Red Devil RX 580 – and underclocked it to the same clockspeeds as the RX 480. It should be noted however that this process is a bit more complex than just underclocking to the RX 480’s official boost clock of 1266MHz. Because the RX 480 power throttles under both FurMark and Crysis 3, it’s necessary to match the RX 480’s specific clockspeeds in those scenarios.

After doing so, what we find are mixed results.

Load Power Testing, Normalized GPU Clockspeeds (Power Draw at Wall)
                       FurMark   Crysis 3
Radeon RX 480          231W      301W
Radeon RX 580          205W      314W

Even after dialing the RX 580 down to 1230MHz for Crysis 3 to match the reference RX 480, power consumption at the wall is still 13W higher than the RX 480. Performance is the same, so the RX 580 isn’t doing more work, but nonetheless power consumption at a system level is still a bit higher.

On the other hand, turning the RX 580 down to 740MHz to match the RX 480 on FurMark (power viruses cause significant throttling), we find the RX 580 ahead by a rather shocking 26W. Power consumption at the wall is 205W, versus 231W for the RX 480.

Broadly speaking, although FurMark isn’t always the best tool for load power measurement across vendors, it has proven to be very reliable when comparing cards based on the same architecture. It does have one specific limitation: it will push a card to its TDP limit, and this limit can vary among board designs. But even so, it typically gives you a consistent and sane metric for comparing like cards.

Consequently I tend to favor the FurMark numbers here. However that doesn’t change the fact that the power consumption numbers under Crysis 3 are wildly different, and paint the RX 580 as being worse. So they can’t both be right, can they?

As it stands, I suspect we’re getting into the territory of random variation – with a sample size of 1 for each Radeon card, the random variations in quality from GPU to GPU are drowning out the actual data. It’s entirely possible we’re looking at a worse-than-average RX 480 and a better-than-average RX 580, especially as the latter has been binned for factory overclocking. However I’m not ready to rule out that something more complex may be going on here: that the improvements to Polaris 10’s power curve aren’t linear/consistent. It may be that AMD’s greatest gains are at lower clockspeeds and voltages, and that those improvements taper off at higher clockspeeds and voltages.
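If that hypothesis is right, basic CMOS power arithmetic shows why the gains could taper off. Here’s a toy sketch in Python – the voltage/frequency points are made up for illustration, not AMD’s actual data – using the standard dynamic switching power relation P = C·V²·f, under which a small voltage reduction at low clocks is worth far more than the same-sized reduction near the top of the curve:

```python
# Illustrative sketch: dynamic CMOS switching power scales as P = C * V^2 * f.
# If the revised silicon needs noticeably less voltage at low clocks but only
# marginally less at high clocks, the efficiency gains taper off with frequency.

def dynamic_power(c_eff, voltage, freq_mhz):
    """Dynamic switching power in arbitrary units: P = C * V^2 * f."""
    return c_eff * voltage**2 * freq_mhz

# Hypothetical voltage/frequency points for the old and new chip revisions
# (these voltages are assumptions for the sake of the example).
old_vf = {740: 0.90, 1230: 1.15}
new_vf = {740: 0.82, 1230: 1.13}

for f in (740, 1230):
    p_old = dynamic_power(1.0, old_vf[f], f)
    p_new = dynamic_power(1.0, new_vf[f], f)
    saving = 100 * (1 - p_new / p_old)
    print(f"{f} MHz: power saving {saving:.0f}%")
```

With these made-up numbers the new revision saves around 17% at 740MHz but only about 3% at 1230MHz – which would be exactly the FurMark-improves, Crysis-3-doesn’t pattern seen above.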

But for the moment, I’m ruling it a push. The FurMark data is interesting, but without Crysis 3 being in agreement it’s not enough to say anything definitive.

That New Memory State

Finally, let’s take a look at the specific benefits AMD is touting for the new memory state included with the new Polaris 10 revision. The new mid-power state allows the RX 580’s GDDR5 to be clocked at 4Gbps. The other memory states on the RX 580 (and the RX 480) are 1.2Gbps (idle) and 8Gbps (full load), so on the RX 480, if AMD ever needed to raise the memory clock above idle, the only option was to jump to full clocks, which on GDDR5 is relatively expensive.
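As a rough illustration of why the third state matters, here’s a hypothetical sketch of memory state selection. The state lists match the data rates above, but the selection logic and the demand figure are illustrative assumptions, not AMD’s actual power management code:

```python
# Hypothetical sketch of memory P-state selection. The RX 480 exposes only
# idle and full-speed memory states; the RX 580 revision adds a mid-power
# state, so moderate-bandwidth workloads no longer force the jump to 8Gbps.

RX_480_STATES = [1.2, 8.0]        # GDDR5 data rates in Gbps
RX_580_STATES = [1.2, 4.0, 8.0]

def select_memory_state(states, required_gbps):
    """Pick the lowest-power state that still meets the bandwidth demand."""
    for rate in sorted(states):
        if rate >= required_gbps:
            return rate
    return max(states)  # demand exceeds all states: clamp to the fastest

# A workload that needs more than idle but far less than full bandwidth
# (the 3.0 Gbps figure is made up for illustration):
demand = 3.0
print(select_memory_state(RX_480_STATES, demand))  # 8.0 - full clocks
print(select_memory_state(RX_580_STATES, demand))  # 4.0 - mid-power state
```

The same demand that forces the RX 480 all the way to 8Gbps can be satisfied by the RX 580’s 4Gbps state, which is where the power savings below come from.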

The two scenarios AMD is looking to address with this new memory clock state are multi-monitor configurations and video playback. In the case of the former, mismatched monitors require the RX 480 to go to its full memory clocks even when otherwise idling; due to the timing differences, the higher memory clock is needed to avoid flickering. Matched monitors avoid this problem, as they have identical timings. Meanwhile in the case of video playback, while AMD has a fixed-function decoder to offload most of the work, decoding still generates a lot of video data, which can require the memory to jump to a higher clock state to keep up. The video playback scenario is particularly complex, though, as the GPU clock itself can also jump up if the video decoder needs a higher performance state of its own.
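To make the multi-monitor case concrete, here’s a minimal hypothetical sketch of the matched-timings check. The types and field names are assumptions for illustration, not AMD’s driver API:

```python
# Illustrative sketch: with mismatched display timings there is no shared
# blanking interval in which to retrain the memory clock without visible
# flicker, so the driver must hold the memory at a higher clock state.

from dataclasses import dataclass

@dataclass(frozen=True)
class DisplayTiming:
    h_active: int
    v_active: int
    refresh_hz: int

def can_idle_memory(displays):
    """Identical timings across all displays allow the deep idle memory clock."""
    return len(set(displays)) <= 1

matched = [DisplayTiming(1920, 1080, 60), DisplayTiming(1920, 1080, 60)]
mixed = [DisplayTiming(1920, 1080, 60), DisplayTiming(2560, 1440, 60)]
print(can_idle_memory(matched))  # True  - memory can drop to 1.2Gbps
print(can_idle_memory(mixed))    # False - memory must clock up
```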

Putting this to the test, I ran both the RX 480 and RX 580 through a mix of multi-monitor and video playback scenarios.

Multi-Monitor Power Testing (Power Draw at Wall)
                       Single Monitor   Dual (Matched)   Dual (1080p + 1440p)
Radeon RX 480          76W              76W              100W
Radeon RX 580          74W              74W              100W
GeForce GTX 1060 6GB   73W              73W              73W

Starting with the multi-monitor testing, the results were not what I was expecting. While AMD tells me that this should trigger the new mid-power state, I haven’t been able to successfully trigger it. With matched monitors the RX 580 can go to full idle, just like the RX 480. Otherwise with mismatched monitors, it always goes to 8Gbps, skipping past 4Gbps and never returning. Even with a few different monitors, the results were always the same. Due to the quick launch I haven’t had time to further debug the issue, so I’m not sure if it’s related to the monitors or if it’s something specific to the Red Devil RX 580.

Video Playback Power Testing (Power Draw at Wall)
                       Idle   High Bitrate H.264   High Bitrate HEVC
Radeon RX 480          76W    125W                 125W
Radeon RX 580          74W    90W                  93W
GeForce GTX 1060 6GB   73W    96W                  96W

On the plus side however, AMD’s new memory state worked as expected with video playback. Whereas the RX 480 would have to settle for an 8Gbps memory clock when playing back high-bitrate H.264 and HEVC video in Media Player Classic – Home Cinema, the RX 580 would settle at 4Gbps. In fact the RX 580 actually performed a bit better than expected; the RX 480 would typically have to go to higher core clockspeeds as well, compounding the power cost. As a result, power consumption at the wall was notably lower on the RX 580 than the RX 480.
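Reading the table as the incremental cost of playback over idle makes the difference starker. A quick calculation from the figures above:

```python
# Wall-power figures from the video playback table above, in watts.
playback_w = {
    "RX 480": {"idle": 76, "h264": 125, "hevc": 125},
    "RX 580": {"idle": 74, "h264": 90, "hevc": 93},
}

# The incremental cost of H.264 playback over idle for each card:
for card, w in playback_w.items():
    print(f"{card}: +{w['h264'] - w['idle']}W for H.264 playback")
# RX 480: +49W, RX 580: +16W - roughly a two-thirds reduction
```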

And just for reference, this is actually a bit better than the GeForce GTX 1060 6GB. NVIDIA’s midrange card goes to its maximum memory clock in the same tests, and as a result power consumption at the wall was a few watts higher than the RX 580.



Comments

  • CiccioB - Thursday, April 20, 2017 - link

    Yes, better in the few DX12 games optimized for AMD architecture. Where it gains at most 10%... yes, a really selling point up to now, until real DX12 games with no ad-hoc AMD optimization will be released making many user wake up from their wet dreams.
  • Outlander_04 - Thursday, April 20, 2017 - link

    Its not optimization its asynchronous compute . The nVidia architecture cant do it and will never be able to keep up in DX12
  • tipoo - Thursday, April 20, 2017 - link

    Define "can't do it". Pascal does async, just not with per-clock interleaving like AMD
  • Outlander_04 - Thursday, April 20, 2017 - link

    Then it is not asynchronous which quite literally means "at the same time".
    AMD's compute strength is well established by the legions of people who wisely use their cards for bitcoin mining .
  • CiccioB - Friday, April 21, 2017 - link

    Async doesn't really mean "at the same time" at all.
    Possibly, the opposite.
  • CiccioB - Thursday, April 20, 2017 - link

    No optimizations?
    Tell me why DICE's engine runs better on AMD GPUs even in DX11 while all other engines do not.
    Async in DX11? A miracle that suddenly allowed AMD drivers to pass nvidia one in draw calls? Better geometry handling? Better memory and bandwith handling?
    Come on. You AMD fanboy are all looking to the first games in (pseudo) DX12 sponsored by AMD. The future ones will be different (maybe also using nvidia functionalities that AMD does not support and not biased on AMD HW.. AMD can't surely support all AAA developer for working more to use Async, which is not a free functionality, did you know? and tune it for all cards) and for the time DX12 will become mainstream Volta will be old.
    But it's nice that you all go and suggest to buy AMD HW. It should make nvidia one cheaper... should in theory,... probably you do not advertise too much as the prices keeps on staying at the high level. Please suggest to buy CrossFire solutions, so that AMD will sell double the HW and all those new AMD customers can enjoy double performance in..ermm... welll... yes, you know, DX12 does not support CF/SLI natively, so they'll happily play DX11 games at nvidia levels with their CF configurations.

    I bet the Async thing you just said was heard from an AMD friend... wasn't it?
  • Outlander_04 - Thursday, April 20, 2017 - link

    Why is game optimization in DX11 in various game engines [ which could favor either AMD or nvidia] of any relevance to me pointing out the strengths of AMD's architecture in DX12?

    Please try and address what is said, not what you want to think is said . Thanks
  • CiccioB - Friday, April 21, 2017 - link

    It's you that is looking at what you want.
    There are 2 scenarios to analyze:
    DX11 and DX12
    You just pick DX12 ignoring DX11 because it is what you want to advertise and to make you own consideration based only on what you want to see.
    I just made you notice that in DX11 the game is well optimized for AMD architecture seen the performances it obtains, performances that with respect to nvidia no other games have ever reached in DX11.
    So you can't dismiss the simple and clear assertion thati it is an AMD optimized game (engine).
    It is and DX11 demonstrates it. What you see in DX12 is what will be if ALL future games will be optimized for AMD architecture this way. Which won't happen. Other games (always supporting DX12) just shows that they can run better on nvidia HW. Both because they do not have all those work payed by AM to make the game run better on AMD HW and because not all games take advantage of the Async compute (which costs in terms of development, did you understand this or you are living in your own world of bunnies and rainbows?)

    So extrapolating that AMD work well in DX12 just by looking at one engine that is created for running better on their HW (and as I said it is a fact seen also in DX11) it is stupid and just demonstrates a pure lie.
  • Mugur - Thursday, April 20, 2017 - link

    I'm sorry to be another one that points out that the testbed is obsolete (the best approach should be 2 testbeds with i7 7700k and R7 1800X or R5 1600X) and it's missing a few new games (Doom, Battlefield 1, etc.).

    About the cards: they are ok-ish, in my opinion. Nothing spectacular, but it's still a refresh, same price or a bit lower than last year, both cool and quiet even factory overclocked. Nobody should care for a few Watts more than 1060 (which was actually warmer and noisier in the tests), as long as they have a decent PSU.

    As an owner of 2 Freesync monitors, I may go for a 580 8 GB to replace my 470 that would go into the kid's PC. After I see Vega, of course. :-)
  • CiccioB - Thursday, April 20, 2017 - link

    "Few watts"
    It uses double the power for the same work!
    And yes, a bit warmer and noisier.. it was the FE with the blower solution. Take a custom card, it will be still faster than this OC over OC sh*t and with use half the power and be much more cool with less than half the noise.

    It is fascinating to try to understand how people can justify certain incomprehensible choices.
