PowerTune, Cont

PowerTune’s functionality is accomplished in a two-step process. The first step is defining the desired TDP of a product. Notably (and unlike NVIDIA), AMD is not using power monitoring hardware here, citing the cost of such chips and the additional design complexity they create. Instead AMD profiles their GPUs to determine the power consumption behavior of each functional block. This behavior is used to assign a weighted score to each functional block, which in turn is used to establish a rough equation for estimating the GPU’s power consumption based on each block’s usage.

AMD doesn’t provide the precise equation used, but you can envision it looking something like this:

Power Consumption = ((shaderUsage * shaderWeight) + (ropUsage * ropWeight) + (memoryUsage * memoryWeight)) * clockspeed

In the case of the Radeon HD 6970, the TDP is 250W, while the default clockspeed is 880MHz.
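To make the relationship concrete, here is a minimal sketch of such a weighted power model in Python. The block weights and utilization figures are invented purely for illustration; only the 250W TDP and 880MHz default clock come from the 6970’s specifications, and AMD’s actual equation is undoubtedly more sophisticated.

# Illustrative sketch of a weighted power model in the spirit of the
# equation above. The weights and usage figures are hypothetical; only
# the 250W TDP and 880MHz default clock are taken from the article.

TDP_WATTS = 250.0
DEFAULT_CLOCK_MHZ = 880.0

# Hypothetical per-block weights (watts contributed at full utilization
# and the default clock), standing in for AMD's profiling data.
BLOCK_WEIGHTS = {
    "shader": 180.0,
    "rop":     60.0,
    "memory":  50.0,
}

def estimate_power(block_usage, clock_mhz=DEFAULT_CLOCK_MHZ):
    """Estimate GPU power draw from per-block utilization (0.0-1.0)."""
    weighted_sum = sum(block_usage.get(block, 0.0) * weight
                       for block, weight in BLOCK_WEIGHTS.items())
    # Scale by the core clock relative to the default clock.
    return weighted_sum * (clock_mhz / DEFAULT_CLOCK_MHZ)

# Example: a hypothetical shader-heavy game at the stock 880MHz.
usage = {"shader": 0.95, "rop": 0.60, "memory": 0.70}
print(f"Estimated power: {estimate_power(usage):.0f}W (TDP: {TDP_WATTS:.0f}W)")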

With a power equation established, AMD can then adjust GPU performance on the fly to keep power consumption under the TDP. This is accomplished by dynamically adjusting just the core clock based on GPU usage a few times a second. So long as power consumption stays under 250W the 6970 stays at 880MHz, and if power consumption exceeds 250W then the core clock will be brought down to keep power usage in check.

It’s worth noting that in practice the core clock and power usage do not have a linear relationship, so PowerTune may have to drop the core clock by quite a bit in order to maintain its power target. The memory clock and even the core voltage remain unchanged (these are only set with PowerPlay states), so PowerTune only has the core clock to work with.
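Continuing the sketch above (and reusing its hypothetical estimate_power() model), the throttling described here amounts to a simple control loop: sample block utilization a few times a second, and step only the core clock down, or back up toward but never past the default clock, to keep the power estimate under the TDP. The step size and clock floor below are assumptions, not AMD-specified values.

import time

CLOCK_STEP_MHZ = 10.0   # hypothetical adjustment granularity
MIN_CLOCK_MHZ = 500.0   # hypothetical floor, not an AMD-specified value

def powertune_tick(current_clock_mhz, block_usage):
    """One iteration of the throttling loop: return the new core clock."""
    power = estimate_power(block_usage, current_clock_mhz)
    if power > TDP_WATTS:
        # Over budget: step the core clock down. The memory clock and
        # core voltage are left alone; only the core clock is adjusted.
        return max(current_clock_mhz - CLOCK_STEP_MHZ, MIN_CLOCK_MHZ)
    # Under budget: recover toward, but never beyond, the default clock,
    # which is what makes this a cap rather than a turbo.
    return min(current_clock_mhz + CLOCK_STEP_MHZ, DEFAULT_CLOCK_MHZ)

# Simulated FurMark-like load, re-evaluated "a few times a second".
heavy_usage = {"shader": 1.0, "rop": 1.0, "memory": 1.0}
clock = DEFAULT_CLOCK_MHZ
while estimate_power(heavy_usage, clock) > TDP_WATTS and clock > MIN_CLOCK_MHZ:
    clock = powertune_tick(clock, heavy_usage)
    time.sleep(0.25)
print(f"Core clock throttled to {clock:.0f}MHz to stay under {TDP_WATTS:.0f}W")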

Ultimately PowerTune is going to fundamentally change how we measure and classify AMD’s GPUs. With PowerTune the TDP really is the TDP; as a completely game/application-agnostic way of measuring and containing power consumption, it’s simply not possible to exceed the TDP. The power consumption of the average game is still below the TDP – sometimes well below – so there’s still an average case and a worst case scenario to discuss, but the range between them just got much smaller.

Furthermore, as a result, real world performance is going to diverge from theoretical performance that much more. Just as with CPUs, the performance you get is the performance you get: teraFLOPs, cache bandwidth, and clockspeeds alone won’t tell you everything about a product. The TDP, and whether the card regularly runs up against it, will factor into performance, just as cooling factors into CPU performance by allowing or prohibiting higher turbo modes. At least for AMD’s GPUs, we’re now going to be talking about how much performance you can get at a given TDP rather than at a specific clockspeed, bringing performance per watt to the forefront.

So by now you’re no doubt wondering what the impact of PowerTune is, and the short answer is that there’s virtually none. We’ve gone ahead and compiled a list of all the games and applications in our test suite, and whether they triggered PowerTune throttling. Of the fifteen tests, only three triggered PowerTune: FurMark as expected, Metro 2033, and 3DMark Vantage. Furthermore, as you can see, there was a significant difference in how far the average clockspeed of our 6970 dropped between Metro and FurMark.

AMD Radeon HD 6970 PowerTune Throttling
Game/Application          Throttled?
Crysis: Warhead           No
BattleForge               No
Metro 2033                Yes (850MHz)
HAWX                      No
Civilization V            No
Bad Company 2             No
STALKER                   No
DiRT 2                    No
Mass Effect 2             No
Wolfenstein               No
3DMark Vantage            Yes
MediaEspresso 6           No
Unigine Heaven            No
FurMark                   Yes (600MHz)
Distributed.net Client    No

In the case of Metro the average clockspeed was 850MHz; Metro spent 95% of the time running at 880MHz, and only at a couple of points did the core clock drop to around 700MHz. Conversely FurMark, a known outlier, drove the average core clock down to 600MHz, a roughly 30% reduction. So while PowerTune definitely had an impact on FurMark performance, it did almost nothing to Metro, never mind the games and applications that never throttled at all. To illustrate the point, here are our Metro numbers at the default 250W PowerTune limit and with the limit raised to 300W, which effectively takes PowerTune out of the picture for this test.

Radeon HD 6970: Metro 2033 Performance (fps)
Resolution     PowerTune 250W    PowerTune 300W
2560x1600      25.5              26.0
1920x1200      39.0              39.5
1680x1050      64.5              65.0

The difference is no more than 0.5fps on average, which may as well be within our experimental error range for this benchmark. Across everything we’ve tested on the 6970 and the 6950, the default PowerTune settings do not have a meaningful performance impact on any game or application. Thus at this point we’re confident that there are no immediate drawbacks to PowerTune for desktop use.

Ultimately this is a negative feedback mechanism, unlike Turbo, which is a positive feedback mechanism. Without overclocking the best a 6970 will run at is 880MHz, whereas Turbo would increase clockspeeds when conditions allow. Neither one is absolutely the right way to do things, but there’s a very different perception when performance is taken away versus when performance is “added” for free. I absolutely like where this is going – both as a hardware reviewer and as a gamer – but I’d be surprised if this didn’t generate at least some level of controversy.

Finally, while we’ve looked at PowerTune in the scope of desktop usage, we’ve largely ignored other cases so far. AMD will be the first to tell you that PowerTune is more important for mobile use than it is for desktop use, and mobile use is all the more important as the balance between desktops and laptops sold continues to slide towards laptops. In the mobile space not only does PowerTune mean that AMD will absolutely hit their TDPs, but it should allow them to produce mobile GPUs that ship with higher stock core clocks, comfortable in the knowledge that PowerTune will keep power usage in check for the heaviest games and applications. The real story for PowerTune doesn’t even begin until 2011 – as far as the 6900 series is concerned, this may as well be a sneak peek.

Even then there’s one possible exception we’re waiting to see: the 6990 (Antilles). The Radeon HD 5970 put us in an interesting spot: it was and still is the fastest card around, but unless you can take advantage of CrossFire it’s slower than a single 5870, a byproduct of the fact that AMD had to use lower core and memory clocks to make their 300W TDP. This is in stark contrast to the 4870X2, which really was two 4870s glued together, delivering the same single-GPU performance. With PowerTune AMD doesn’t necessarily need to repeat the 5970’s castrated clocks; they could make a 6970X2 and let PowerTune clip performance as necessary to keep it under 300W. If a game can’t take advantage of CrossFire, for example, there’s no reason not to run the one active GPU at full speed. It would be the best of both worlds.
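A toy sketch of the budget-sharing idea this paragraph speculates about, again reusing the hypothetical estimate_power() model from earlier; AMD has not described how an Antilles card would actually divide its power budget, so the scheme below is purely an assumption for illustration.

BOARD_TDP_WATTS = 300.0   # the 300W board limit discussed above

def per_gpu_budget(gpu_usages):
    # Split the board budget across GPUs that are actually doing work;
    # an idle GPU (all blocks near zero utilization) gets no share.
    active = [u for u in gpu_usages if any(v > 0.05 for v in u.values())]
    if not active:
        return BOARD_TDP_WATTS
    return BOARD_TDP_WATTS / len(active)

# Non-CrossFire scenario: one GPU loaded, the other idle.
busy = {"shader": 0.95, "rop": 0.60, "memory": 0.70}
idle = {"shader": 0.0, "rop": 0.0, "memory": 0.0}
budget = per_gpu_budget([busy, idle])
draw = estimate_power(busy)   # at the default 880MHz
print(f"Per-GPU budget: {budget:.0f}W, busy GPU draw: {draw:.0f}W -> "
      f"{'full clocks' if draw <= budget else 'throttle'}")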

In the meantime we’re not done with PowerTune quite yet. PowerTune isn’t just something AMD can set – it’s adjustable in the Overdrive control panel too.

Comments

  • AnnonymousCoward - Wednesday, December 15, 2010 - link

    First of all, 30fps is choppy as hell in a non-RTS game. ~40fps is a bare minimum, and >60fps all the time is hugely preferred since then you can also use vsync to eliminate tearing.

    Now back to my point. Your counter was "you know that non-AA will be higher than AA, so why measure it?" Is that a point? Different cards will scale differently, and seeing 2560+AA doesn't tell us the performance landscape at real-world usage which is 2560 no-AA.
  • Dug - Wednesday, December 15, 2010 - link

    Is it me, or are the graphs confusing?
    Some leave out cards on certain resolutions, but add some in others.

    It would be nice to have a dynamic graph link so we can make our own comparisons.
    Or a drop down to limit just ati, single card, etc.

    Either that or make a graph that has the cards tested at all the resolutions so there is the same number of cards in each graph.
  • benjwp - Wednesday, December 15, 2010 - link

    Hi,

    You keep using Wolfenstein as an OpenGL benchmark. But it is not. The single player portion uses Direct3D9. You can check this by checking which DLLs it loads or which functions it imports or many other ways (for example most of the idTech4 renderer debug commands no longer work).

    The multiplayer component does use OpenGL though.

    Your best bet for an OpenGL gaming benchmark is probably Enemy Territory Quake Wars.
  • Ryan Smith - Wednesday, December 15, 2010 - link

    We use WolfMP, not WolfSP (you can't record or playback timedemos in SP).
  • 7Enigma - Wednesday, December 15, 2010 - link

    Hi Ryan,

    What benchmark do you use for the noise testing? Is it Crysis or Furmark? Along the same line of questioning I do not think you can use Furmark in the way you have the graph setup because it looks like you have left Powertune on (which will throttle the power consumption) while using numbers from NVIDIA's cards where you have faked the drivers into not throttling. I understand one is a program cheat and another a TDP limitation, but it seems a bit wrong to not compare them in the unmodified position (or VERBALLY mention this had no bearing on the test and they should not be compared).

    Overall nice review, but the new cards are pretty underwhelming IMO.
  • Ryan Smith - Thursday, December 16, 2010 - link

    Hi 7Enigma;

    For noise testing it's FurMark. As is the case with the rest of our power/temp/noise benchmarks, we want to establish the worst case scenario for these products and compare them along those lines. So the noise results you see are derived from the same tests we do for temperatures and power draw.

    And yes, we did leave PowerTune at its default settings. How we test power/temp/noise is one of the things PowerTune made us reevaluate. Our decision is that we'll continue to use whatever method generates the worst case scenario for that card at default settings. For NVIDIA's GTX 500 series, this means disabling OCP because NVIDIA only clamps FurMark/OCCT, and to a level below most games at that. Other games like Program X that we used in the initial GTX 580 article clearly establish that power/temp/noise can and do get much worse than what Crysis or clamped FurMark will show you.

    As for the AMD cards the situation is much more straightforward: PowerTune clamps everything blindly. We still use FurMark because it generates the highest load we can find (even with it being reduced by over 200MHz), however because PowerTune clamps everything, our FurMark results are the worst case scenario for that card. Absolutely nothing will generate a significantly higher load - PowerTune won't allow it. So we consider it accurate for the purposes of establishing the worst case scenario for noise.

    In the long run this means that results will come down as newer cards implement this kind of technology, but then that's the advantage of such technology: there's no way to make the card louder without playing with the card's settings. For the next iteration of the benchmark suite we will likely implement a game-based noise test, even though technologies like PowerTune are reducing the dynamic range.

    In conclusion: we use FurMark, we will disable any TDP limiting technology that discriminates based on the program type or is based on a known program list, and we will allow any TDP limiting technology that blindly establishes a firm TDP cap for all programs and games.

    -Thanks
    Ryan Smith
  • 7Enigma - Friday, December 17, 2010 - link

    Thanks for the response Ryan! I expected it to be lost in the slew of other posts. I highly recommend (as you mentioned in your second to last paragraph) that a game-based benchmark is used along with the Furmark for power/noise. Until both adopt the same TDP limitation it's going to put the NVIDIA cards in a bad light when comparisons are made. This could be seen as a legitimate beef for the fanboys/trolls, and we all know the less ammunition they have the better. :)

    Also to prevent future confusion it would be nice to have what program you are using for the power draw/noise/heat IN the graph title itself. Just something as simple as "GPU Temperature (Furmark-Load)" would make it instantly understandable.

    Thanks again for the very detailed review (on 1 week nonetheless!)
  • Hrel - Wednesday, December 15, 2010 - link

    I really hope these architecture changes lead to better minimum FPS results. AMD is ALWAYS behind Nvidia on minimum FPS and in many ways that's the most important measurement since min FPS determines if the game is playable or not. I don't care if it maxes out at 122 FPS; if when the shit hits the fan I get 15 FPS, I won't be able to accurately hit anything.
  • Soldier1969 - Wednesday, December 15, 2010 - link

    I'm disappointed in the 6970, it's not what I was expecting over my 5870. I will wait to see what the 6990 brings to the table next month. I'm looking for a 30-40% boost from my 5870 at the 2560x1600 res I game at.
  • stangflyer - Wednesday, December 15, 2010 - link

    Now that we see the power requirements for the 6970 and that it needs more power than the 5870 how would they make a 6990 without really cutting off the performance like the 5970?

    I had a 5970 for a year b4 selling it 3 weeks ago in preparation of getting 570 in sli or 6990.
    It would obviously have to be 2x8 pin power! Or they would have to really use that powertune feature.

    I liked my 5970 as I didn't have the stuttering issues (or i don't notice them) And actually have no issues with eyefinity as i have matching dell monitors with native dp inputs.

    If I was only on one screen I would not even be thinking upgrade but the vram runs out when using aa or keeping settings high as I play at 5040x1050. That is the only reason I am a little shy of getting the 570 in sli.

    Don't see how they can make a 6990 without really killing the performance of it.

    I used my 5970 at 5870 and beyond speeds on games all the time though.
