The New PowerTune: Adding Further States

In 2010 AMD introduced their PowerTune technology alongside their Cayman GPU. PowerTune was a new, advanced method of managing GPU voltages and clockspeeds, with the goal of offering better control over power consumption at all times so that AMD could be more aggressive with their clockspeeds. PowerTune’s primary task was to rein in programs like FurMark – power viruses, as AMD calls them – so that these programs would not push a card past its thermal/electrical limits. Consequently, with PowerTune in place AMD would not need to set their maximum GPU clocks as conservatively merely to handle the power virus scenario.

This technology was brought forward for the entire Southern Islands family of GPUs, and remained virtually unchanged. PowerTune as implemented on SI cards without Boost had 3 states – idle, intermediate (low-3D), and high (full-3D). When for whatever reason PowerTune needed to clamp down on power usage to stay within the designated limits, it could either jump states or merely turn down the clockspeed, depending on how far over the limit the card was trying to go. In practice state jumps were rare – it’s a big gap between high and intermediate – so for non-boost cards it would merely turn down the GPU clockspeed until power consumption was where it needed to be.

Modulating clockspeeds in such a manner is relatively easy to implement, but it’s not without a drawback: semiconductor power consumption scales at a far greater rate with voltage than it does with clockspeed. So although turning down clockspeeds does reduce power consumption, it doesn’t do so by a large degree. If you want big power savings, you need to turn down the voltage too.
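To put rough numbers on that, here is a minimal sketch using the standard first-order approximation for CMOS dynamic power, P = C × V² × f. The capacitance, voltage, and clockspeed values are made up for illustration and are not measured 7790 figures; leakage power is ignored entirely.

```python
def dynamic_power(capacitance, voltage, frequency_hz):
    """First-order CMOS dynamic power approximation: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency_hz

# Hypothetical baseline values, chosen only to illustrate the scaling.
C_EFF = 1.0e-7   # effective switched capacitance in farads (assumption)
BASE_V = 1.2     # core voltage in volts (assumption)
BASE_F = 1.0e9   # core clock in Hz (assumption)

base = dynamic_power(C_EFF, BASE_V, BASE_F)

# Case 1: drop the clock by 10% and leave the voltage alone.
clock_only = dynamic_power(C_EFF, BASE_V, BASE_F * 0.9)

# Case 2: drop both the clock and the voltage by 10%.
clock_and_voltage = dynamic_power(C_EFF, BASE_V * 0.9, BASE_F * 0.9)

clock_only_saving = 1 - clock_only / base
combined_saving = 1 - clock_and_voltage / base

print(f"clock only:      {clock_only_saving:.1%} saved")
print(f"clock + voltage: {combined_saving:.1%} saved")
```

Under this simplified model a 10% clockspeed cut saves about 10% of dynamic power, while cutting clockspeed and voltage together by 10% saves about 27%, since voltage enters the formula squared.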

Starting with the 7790 and Bonaire, this is exactly what AMD is doing. Gone is pure clockspeed modulation – inferred states in AMD’s nomenclature – and instead AMD is moving to a larger number of full states. GCN 1.1 has 8 states altogether, with no inferred states between them. With this change, when PowerTune needs to reduce clockspeeds it can drop to a nearby state, reducing power consumption through both clockspeed and voltage reductions at the same time.

With this change state jumping will also be a far more frequent occurrence. The lack of inferred states and the coarse granularity of the full states (8 states over 700MHz is not fine-grained) effectively make fast state jumping a requirement, as there’s a very good chance that simply dropping down a state and staying there will leave some power/performance on the table. So when it’s throttling, the 7790 will be able to state jump as quickly as every 10ms (that’s up to 100 jumps a second), typically bouncing between two or more states in order to keep the card within its limits.
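The bouncing behavior can be illustrated with a toy governor that is evaluated once every 10ms. This is only a sketch: the 8-entry state table, the power model, and the step-up/step-down rule are all hypothetical, since AMD has not published the actual states or the selection algorithm.

```python
# Hypothetical 8-entry state table of (clock in MHz, volts). Not AMD's values.
STATES = [(300, 0.85), (400, 0.90), (500, 0.95), (600, 1.00),
          (700, 1.05), (800, 1.10), (900, 1.15), (1000, 1.20)]

def estimated_power(state_index, load):
    """Toy power model: load * k * V^2 * f, with k = 0.1 picked arbitrarily."""
    mhz, volts = STATES[state_index]
    return load * 0.1 * volts ** 2 * mhz

def next_state(state_index, load, power_limit):
    """Step down one state when over the limit, otherwise step up one state."""
    if estimated_power(state_index, load) > power_limit:
        return max(state_index - 1, 0)
    return min(state_index + 1, len(STATES) - 1)

def simulate(load, power_limit, ticks=100, tick_ms=10):
    """Evaluate the governor every tick_ms and record (time_ms, state) pairs."""
    state = len(STATES) - 1  # start in the highest state
    history = []
    for tick in range(ticks):
        state = next_state(state, load, power_limit)
        history.append((tick * tick_ms, state))
    return history

# One simulated second of a heavy load with a limit that falls between
# the power drawn in state 5 and the power drawn in state 6.
history = simulate(load=1.0, power_limit=110.0)
states_seen = sorted({state for _, state in history[-10:]})
print(states_seen)
```

With these made-up numbers the limit sits between two adjacent states, so the governor settles into alternating between them every 10ms rather than parking in the lower one, which is how the average power can sit closer to the limit than either state alone would allow.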

At the same time, AMD’s formula for picking states on non-boost cards has changed. In a move similar to what AMD has done with Richland, AMD’s temperature-agnostic state selection system has been ditched in favor of one that factors temperature into the calculation, making it a system that is now based on power, temperature, and load. There are some minor benefits to being temperature-agnostic that AMD is giving up – mainly that performance is going to vary a bit with temperature now – but at the end of the day this allows AMD to better min-max their GPUs to hit higher frequencies more often. This also brings them to parity with Intel and NVIDIA, who have long taken temperature into account.

The fact that this is a very boost-like system is not lost on us, and with these changes the line between PowerTune with and without boost starts to become foggy. Both are ultimately going to be doing the same thing – switching states based on power and temperature considerations – the only difference being whether a card only adjusts down or adjusts both up and down. In practice we rarely see cards adjust down outside of FurMark, so while PowerTune doesn’t dictate a clockspeed floor, base clocks are still base clocks. The practical difference between an AMD card with boost and one without, then, is whether it can access some higher voltage, higher clockspeed states that it may not be able to maintain for long periods of time across all workloads. The 7790 isn’t a boost part of course, but AMD’s own presentation neatly lays out where boost would fit in, so if we do see future GCN 1.1 products with boost we have a good idea of what to expect.

Moving on, with the changes to PowerTune will also come changes to AMD’s API for 3rd party utilities, and what information is reported. First and foremost, due to the frequency of state changes with the new PowerTune, AMD will no longer be reporting the instantaneous state. Instead they will be reporting an average of the states used. We don’t know how big the averaging window is – we suspect it’s no more than 2 seconds – but the end result will be that MSI Afterburner, GPU-Z, and other utilities will now see those averages reported as the clockspeed. This will give most users a better idea of what the effective clockspeed (and thereby effective performance) is, but it does mean that it’s going to be virtually impossible to infer the clockspeeds/voltages of AMD’s new states.
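To see why averaged reporting hides the underlying states, consider a minimal sliding-window sketch. The 1 second window, the 10ms sample interval, and the two clockspeeds are assumptions for illustration; AMD has not disclosed the real window size, and `AveragedClockReporter` is a hypothetical name, not part of any AMD API.

```python
from collections import deque

class AveragedClockReporter:
    """Reports the mean clockspeed over a sliding window of recent samples."""

    def __init__(self, window_ms=1000, sample_ms=10):
        # Window and sample interval are assumptions; the deque discards
        # the oldest sample automatically once the window is full.
        self.samples = deque(maxlen=window_ms // sample_ms)

    def record(self, clock_mhz):
        self.samples.append(clock_mhz)

    def reported_clock(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

# A card bouncing between two hypothetical states every 10ms.
reporter = AveragedClockReporter()
for tick in range(100):
    reporter.record(900 if tick % 2 == 0 else 800)

print(reporter.reported_clock())
```

In this example a utility would see 850MHz, a clockspeed the card never actually ran at, and nothing in that single number reveals which states (or how many) contributed to it.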

The other change is that with the new PowerTune AMD will be exposing new tweaking options to 3rd parties. The current PowerTune (TDP) setting is going to be joined by a separate setting for adjusting a limit called Total Design Current (TDC), which as the name implies is how much current is allowed to be passed into the GPU. AMD limits cards by both TDP and TDC to keep total power, temperatures, and total currents in check, so this will open up the latter to tweakers. Unfortunately utilities with TDC controls were not ready in time for our 7790 review, so we can’t really comment on TDC at this time. With AMD’s changes to PowerTune however (and their insistence on calling TDP thermal management), TDP may be turning into a temperature control while TDC becomes the new power control.
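As a rough illustration of why two separate limits matter, the sketch below treats TDP as a cap on power (voltage × current) and TDC as a cap on current alone. This is a simplification with made-up numbers, and `within_limits` is a hypothetical helper; how AMD actually evaluates the two limits is not something we can confirm yet.

```python
def within_limits(voltage, current_amps, tdp_watts, tdc_amps):
    """Return True only if both the power cap and the current cap are met."""
    power_watts = voltage * current_amps
    return power_watts <= tdp_watts and current_amps <= tdc_amps

# Hypothetical limits: 100W TDP, 90A TDC.
TDP_WATTS = 100
TDC_AMPS = 90

# High voltage, moderate current: 96W and 80A, so both limits are satisfied.
ok_case = within_limits(1.2, 80, TDP_WATTS, TDC_AMPS)

# Low voltage, high current: only 90W, but 100A exceeds the current cap.
current_bound_case = within_limits(0.9, 100, TDP_WATTS, TDC_AMPS)

# High voltage, high current: 88A is within TDC, but 105.6W exceeds TDP.
power_bound_case = within_limits(1.2, 88, TDP_WATTS, TDC_AMPS)

print(ok_case, current_bound_case, power_bound_case)
```

The second case is the interesting one: at a low enough voltage a card can be comfortably under its power limit and still be held back by the current limit, which is why exposing TDC separately gives tweakers a genuinely new knob.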

Finally, since these controls are going to be user-accessible, this will spill over to AMD’s partners. Partners will be able to set their own TDP and TDC limits if they wish, which will help them fine-tune their factory overclocked cards. This will give partners more headroom for such cards as opposed to being stuck shipping cards at AMD’s reference limits, but it means that different cards from different vendors may have different base TDP and TDC limits, along with different clockspeeds. This also means that in the future equalizing clockspeeds may not be enough to equalize two cards.

107 Comments


  • Spunjji - Friday, March 22, 2013 - link

    ...forgive my stupidity. Actual figures of the 7790 here:
    http://www.techpowerup.com/reviews/Sapphire/HD_779...

    Depends on whether we focus on Peak / Max figures to decide whether you or I am closer to the truth. :)
  • Ryan Smith - Friday, March 22, 2013 - link

    Typical Board Power, not Total. TBP is an average rather than a peak like TDP, which is why it's a lower number than TDP.
  • dbcoopernz - Friday, March 22, 2013 - link

    Any details on UVD module? Any changes?

    The Asus Direct Cu-II might make an interesting high power but quiet HTPC card. Any chance of a review?
  • Ryan Smith - Friday, March 22, 2013 - link

    There are no changes that we have been made aware of.
  • haplo602 - Friday, March 22, 2013 - link

    somebody please make this a single slot card and I am sold ... otherwise I'll wait for the 8k radeons ...
  • Shut up and drink - Friday, March 22, 2013 - link

    Has it occurred to anyone else that this is in all probability an OEM release of the "semi-custom" silicon that will find its way into Sony's Playstation 4 in the fall?

    Word has it that Sony has some form of GPU switching tech integrated into the PS4.

    - apologies for the link to something other than Anand but I don't think they ran anything on the story http://www.tomshardware.com/news/sony-ps4-patent-p...

    Initially I presumed this to be some "Optimus"-esque dynamic context switching power saving routine. However, the patent explicitly states, "This architecture lets a user run one or more GPUs in parallel, but only for the purpose of increasing performance, not to reduce power consumption."
    Which struck me as some kind of expansion on the nebulous "hybrid crossfire" tech that AMD has been playing w/since they birthed the 3000 series 780G igpu

    Based off of AMD's previous endeavors in this area on the PC side I would be skeptical of the benefits/merit of pairing the comparatively anemic iGPU's of Kabini w/a presumably Bonaire derived GPU.
    As an aside; since SLI/CFX work by issuing frames to the next GPU available, if one GPU is substantially faster than the other(s), frames get finished out-of-order and the IGP/slower-GPU's tardy frames simply get dropped which may make the final rendered video stuttery/choppy.

    Pairing an IGP with a disproportionately powerful discrete GPU simply does not work for realtime rendering.

    It is certainly possible that with the static nature of the console and perhaps especially the unified nature of the GDDR5 memory pool/bank that performance gains could be had

    However, my digression on the merits of the tech thus far is
    128 + 128 = 256 + 896 = Anand's own deduction of 1152sp's)
  • Shut up and drink - Friday, March 22, 2013 - link

    I pushed submit by mistake...damn...

    oh well...my last point of arithmetic was simply that 1 fully enabled 4 core Kabini's I'm suspecting would have a 128 shader count igpu. Factor in the much ballyhooed 8-core Cpu in the PS4 we would have two Kabini's (128+128=256) + a Bonaire derived 896sp GPU all on some kind of custom MCM style packaging "semi-custom APU" (rumor had it that the majority of Sony's R&D contributions were in the stacking/packaging dept.)

    Anyone concur?
  • Shut up and drink - Friday, March 22, 2013 - link

    ...which jives w/Anand's own piece that ran on the console's unveiling, "Sony claims the GPU features 18 compute units, which if this is GCN based we'd be looking at 1152 SPs and 72 texture units"

    http://www.anandtech.com/show/6770/sony-announces-...
  • A5 - Friday, March 22, 2013 - link

    Yeah, once this came in at 14 CUs with minor architecture changes, it seemed like a likely scenario to me.

    Obviously it isn't going to give you PS4 performance on ports with only 1GB of memory, though.
  • crimson117 - Friday, March 22, 2013 - link

    Good thought, but I sure hope Sony doesn't hamstring its PS4 with a 128-bit memory bus!
