GPU Boost: Turbo For GPUs

Now that we’ve had a chance to take a look at the Kepler architecture, let’s jump into features. We’ll start with the feature that’s going to have the biggest impact on performance: GPU Boost.

Much like we’ve seen with CPUs in previous years, GPUs are reaching a point where performance is being limited by overall power consumption. Until the last couple of years GPU power consumption has been allowed to slowly drift up with each generation, allowing for performance to scale to an incredible degree. However for many of the same reasons NVIDIA has been focusing on efficiency in general, GPUs are being pressured to do more without consuming more.

The problem of course is compounded by the fact that there are a wide range of possible workloads for a GPU, much like there is for a CPU. With the need to design video cards around specific TDPs for both power supply and heat dissipation reasons, the goal becomes one of maximizing your performance inside of your assigned TDP.

The answer to that problem in the CPU space is turbo boosting – that is increasing the clockspeed of one or more CPU cores so long as the chip as a whole remains at or under its TDP. By using turbo, Intel and AMD have been able to both maximize the performance of lightly threaded applications by boosting a handful of cores to high speeds, and at the same time maximize heavily threaded performance by boosting a large number of cores by little to none. For virtually any CPU-bound workload the CPU can put itself into a state where the appropriate execution units are making the most of their TDP allocation.

Of course in the GPU world things aren’t that simple – for starters we don’t have a good analogy for a lightly threaded workload – but the concept is similar. GPUs need to be able to run demanding tasks such as Metro 2033 or even pathological applications like FurMark while staying within their designated TDPs, and at the same time they need to be sure to deliver good performance for compute applications and games that aren’t quite so demanding. Or put another way, tasks that are GPU limited but aren’t maxing out every aspect of the GPU need to be able to get good performance without being held back by the need to keep heavy workloads in check.

In 2010 AMD took a stab at this scenario with PowerTune, which was first introduced on the Radeon HD 6900 series. With PowerTune AMD could set their clockspeeds relatively high, and should any application demand too much of the GPU, PowerTune would throttle down the GPU in order to avoid going over its TDP. In essence, with PowerTune the GPU could be clocked too high, and simply throttled down if it tried to draw too much power. This allowed lighter workloads to operate at higher clockspeeds, while keeping power consumption in check for heavy workloads.
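
For illustration, the basic PowerTune idea can be sketched in a few lines of Python. This is a conceptual simplification based only on the description above, not AMD’s actual algorithm, and the TDP figure, clock levels, and power readings are hypothetical.

    # Conceptual sketch of PowerTune-style throttling (not AMD's actual algorithm).
    # The GPU is clocked high by default and only throttled down when the estimated
    # board power would exceed the TDP. All figures here are hypothetical.

    TDP_WATTS = 250
    CLOCK_LEVELS_MHZ = [880, 850, 820, 790]   # highest clock first

    def powertune_clock(power_at_full_clock_watts):
        """Return the highest clock level whose estimated power stays at or under the TDP."""
        for clock in CLOCK_LEVELS_MHZ:
            # Crude model: assume power scales linearly with clockspeed.
            estimated_power = power_at_full_clock_watts * clock / CLOCK_LEVELS_MHZ[0]
            if estimated_power <= TDP_WATTS:
                return clock
        return CLOCK_LEVELS_MHZ[-1]

    print(powertune_clock(230))   # light workload: stays at 880MHz
    print(powertune_clock(260))   # heavy workload: throttled down to 820MHz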

With the introduction of Kepler NVIDIA is going to be tackling this problem for their products, and their answer is GPU Boost.

In a nutshell, GPU Boost is turbo for the GPU. With GPU Boost NVIDIA is able to increase the core clock of the GTX 680 beyond its 1006MHz base clock, and like turbo on CPUs this is based on the power load, the GPU temperature, and the overall quality of the GPU. Given the right workload the GTX 680 can boost by 100MHz or more, while under a heavy workload the GTX 680 may not move past 1006MHz.

GPU Boost adds a new wrinkle to performance, of course, but ultimately there are two numbers to pay attention to. The first number is what NVIDIA calls the base clock: this is another name for the regular core clock, and it represents the minimum full-load clock for the GTX 680; when operating at its full 3D clocks, the GTX 680 will never drop below this number.

The second number is what NVIDIA calls the boost clock, and this one is far more nebulous, as it relates to the operation of GPU Boost itself. With GPU Boost NVIDIA does not have an explicit top clock; they’re letting chip quality play a significant role in GPU Boost. Because GPU Boost is based around power consumption and temperatures, higher quality GPUs that operate with lower power consumption can boost higher than lower quality GPUs with higher power consumption. In essence the quality of the chip determines its boost limit under normal circumstances.

Accordingly, the boost clock is intended to convey what kind of clockspeeds buyers can expect to see with the average GTX 680. Specifically, the boost clock is based on the average clockspeed of the average GTX 680 that NVIDIA has seen in their labs. This is what NVIDIA had to say about the boost clock in their reviewer’s guide:

The “Boost Clock” is the average clock frequency the GPU will run under load in many typical non-TDP apps that require less GPU power consumption. On average, the typical Boost Clock provided by GPU Boost in GeForce GTX 680 is 1058MHz, an improvement of just over 5%. The Boost Clock is a typical clock level achieved running a typical game in a typical environment.

In other words, when the average GTX 680 is boosting it reaches 1058MHz on average.

Ultimately NVIDIA and their customers are going to go through some teething issues on this, and there’s no way around it. Although the idea of variable performance isn’t a new one – we already see this to some degree with CPU turbo – this is the first time we’ve seen something like this in the GPU space, and it’s going to take some time to get used to.

In any case while we can’t relate to you what the average GTX 680 does with GPU Boost, we can tell you about GPU Boost based on what we’ve seen with our review sample.

First and foremost, GPU Boost operates on the concept of steps, analogous to multipliers on a CPU. Our card has 9 steps, each 13MHz apart, ranging from 1006MHz to 1110MHz. And while it’s not clear whether every GTX 680 steps up in 13MHz increments, based on NVIDIA’s boost clock of 1058MHz this would appear to be the case, as that would be 4 steps over the base clock.
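
To make the arithmetic concrete, here’s a minimal Python sketch of the step ladder as we observed it on our review sample; the 13MHz increment and nine-step range come from our own measurements, and the assumption that every GTX 680 behaves this way is just that, an assumption.

    # The GPU Boost step ladder as observed on our review sample.
    BASE_CLOCK_MHZ = 1006   # GTX 680 base clock
    STEP_MHZ = 13           # increment observed on our card
    NUM_STEPS = 9           # 1006MHz through 1110MHz

    steps = [BASE_CLOCK_MHZ + i * STEP_MHZ for i in range(NUM_STEPS)]
    print(steps)       # [1006, 1019, 1032, 1045, 1058, 1071, 1084, 1097, 1110]
    print(steps[4])    # 1058 -- four steps above the base clock, matching NVIDIA's boost clock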

At each step our card uses a different voltage, listed in the table below. We should note that we’ve seen different voltages reported for the same step in some cases, so it’s not entirely clear what’s going on. In any case we’re listing the most common voltage we’ve recorded for each step.

GeForce GTX 680 GPU Boost Step Table
Frequency    Voltage
1110MHz      1.175v
1097MHz      1.150v
1084MHz      1.137v
1071MHz      1.125v
1058MHz      1.125v
1045MHz      1.112v
1032MHz      1.100v
1019MHz      1.075v
1006MHz      1.062v

As for deciding what clockspeed to step up to, GPU Boost determines this based on power consumption and GPU temperature. NVIDIA has on-card sensors to measure power consumption at the rails leading into the GPU, and will only allow the video card to step up so long as it’s below the GPU Boost power target. This target isn’t published, but NVIDIA has told us that it’s 170W. Note that this is not the TDP of the card, which is 195W. Because NVIDIA doesn’t have a true throttling mechanism with Kepler, their TDP is higher than their boost target, as heavy workloads can push power consumption well over 170W even at 1006MHz.

Meanwhile GPU temperatures also play an important role in GPU Boost. Our sample could only hit the top step (1110MHz) if the GPU temperature was below 70C; as soon as the GPU reached 70C, it would be brought down to the next highest step of 1097MHz. This means the top step is effectively unsustainable on the stock GTX 680, as there are few if any applications that are both intensive enough to require high clockspeeds and light enough to keep the GPU below 70C.
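
To summarize how the two inputs interact, the behavior we measured can be approximated with a simple rule of thumb. The sketch below is our own model of what we observed on one card (the power headroom heuristic in particular is invented for illustration), not NVIDIA’s actual control algorithm.

    # Our approximation of GPU Boost behavior on our sample: step up only while the
    # card is under the 170W boost power target, and reserve the top 1110MHz step
    # for GPU temperatures below 70C. This models our observations, nothing more.

    BOOST_POWER_TARGET_W = 170
    TOP_STEP_TEMP_LIMIT_C = 70
    STEPS_MHZ = [1006, 1019, 1032, 1045, 1058, 1071, 1084, 1097, 1110]

    def boost_step(power_draw_w, gpu_temp_c):
        """Pick the highest step allowed by the power target and the temperature gate."""
        if power_draw_w >= BOOST_POWER_TARGET_W:
            return STEPS_MHZ[0]    # at or over the target: hold the base clock
        allowed = STEPS_MHZ[:-1] if gpu_temp_c >= TOP_STEP_TEMP_LIMIT_C else STEPS_MHZ
        # Invented heuristic: more power headroom unlocks more steps.
        headroom = (BOOST_POWER_TARGET_W - power_draw_w) / BOOST_POWER_TARGET_W
        index = min(int(headroom * 4 * len(allowed)), len(allowed) - 1)
        return allowed[index]

    print(boost_step(power_draw_w=130, gpu_temp_c=65))   # cool GPU, lots of headroom: 1110
    print(boost_step(power_draw_w=130, gpu_temp_c=72))   # same load at 70C+: capped at 1097
    print(boost_step(power_draw_w=175, gpu_temp_c=72))   # over the boost target: 1006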

Finally, with the introduction of GPU Boost, overclocking has been affected as well. Rather than directly controlling the core clock, overclocking is accomplished through the combined manipulation of the GPU Boost power target and a GPU clock offset. Power target manipulation works almost exactly as you’d expect: the GPU Boost power target can be adjusted anywhere from -30% to +32%, similar to how adjusting the PowerTune limit works on AMD cards. Increasing the power target allows the video card to pull more power, thereby allowing it to boost to higher steps than is normally possible (but no higher than the max step), while decreasing the power target restricts boosting and can keep the card from boosting at all.

The GPU offset meanwhile manipulates the steps themselves. By adjusting the GPU offset, all of the GPU Boost steps are shifted by roughly an equal amount, depending on what clocks the PLL driving the GPU can generate; e.g., a +100MHz offset would increase the first step to 1106MHz, and so on up to the top step, which would be increased to 1210MHz.
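
In terms of raw numbers, the offset simply shifts the whole ladder; here is a quick sketch using the step table from our card.

    # A GPU clock offset shifts every GPU Boost step by (roughly) the same amount;
    # the exact clocks ultimately depend on what the PLL driving the GPU can generate.
    STEPS_MHZ = [1006, 1019, 1032, 1045, 1058, 1071, 1084, 1097, 1110]
    OFFSET_MHZ = 100

    shifted = [clock + OFFSET_MHZ for clock in STEPS_MHZ]
    print(shifted[0])    # 1106 -- the new bottom step
    print(shifted[-1])   # 1210 -- the new top step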

While each factor can be adjusted separately, it’s adjusting both factors together that truly unlocks overclocking. Adjusting the GPU offset alone won’t achieve much if most workloads are limited by GPU Boost’s power target, and adjusting the power target alone won’t improve the performance of workloads that are already allowed to reach the highest step. By combining the two you can increase the GPU clock and at the same time increase the power target so that workloads are actually allowed to hit those new clocks.

On that note, overclocking utilities will be adding support for GPU Boost over the coming weeks. The first overclocking utility with support for GPU Boost is EVGA’s Precision X, the latest rendition of their Precision overclocking utility. NVIDIA supplied Precision X Beta 20 with our review samples, and as we understand it that will be made available shortly for GTX 680 buyers.

Finally, while we’ll go into full detail on overclocked performance in a bit, we wanted to quickly showcase the impact of GPU Boost, both on regular performance and on overclocking. First up, we ran all of our benchmarks at 2560 with the power target for GPU Boost set to -16%, which reduces the power target to roughly 142W. While GPU Boost cannot be disabled outright, this was enough to ensure that it almost never activated.
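
For reference, if we assume the percentage is applied directly to the 170W boost power target (an assumption on our part, but one that matches the roughly 142W figure above), the slider maps to watts as follows.

    # Mapping of the power target slider to watts, assuming the percentage is applied
    # directly to the 170W GPU Boost power target (our assumption).
    BOOST_POWER_TARGET_W = 170

    def adjusted_target_watts(percent):
        return BOOST_POWER_TARGET_W * (1 + percent / 100)

    print(round(adjusted_target_watts(-16), 1))   # 142.8 -- the setting used for this test
    print(round(adjusted_target_watts(-30), 1))   # 119.0 -- the minimum setting
    print(round(adjusted_target_watts(+32), 1))   # 224.4 -- the maximum setting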

As is to be expected, the impact of GPU Boost varies depending on the game, but overall we found that enabling GPU Boost on our card only improves performance by an average of 3%, and by no more than 5%. While this is effectively free performance, it’s also a stark reminder that GPU Boost isn’t nearly as potent as turbo on a CPU – at least not quite yet. As there’s no real equivalent to a lightly threaded workload for GPUs, the need for a wide range of potential GPU Boost clocks is not nearly as great as the need for high turbo clocks on a CPU. Even a light GPU workload is relatively heavy when graphics itself is an embarrassingly parallel task.

Our other quick look is at overclocking. The following is what our performance looked like at 2560 with stock GPU Boost settings, a power target of +16% (195W), and a GPU offset of +100MHz.

Overall, raising the GPU offset is much more effective than raising the power target at improving performance, reflecting the fact that, at least some of the time, most games in our case were limited by the GPU Boost clock rather than by the power target.

Comments

  • Slayer68 - Saturday, March 24, 2012

    Being able to run 3 screens off of one card is new for Nvidia. Barely even mentioned it in your review. It would be nice to see Nvidia surround / Eyefinity compared on these new cards. Especially interested in scaling at 5760 x 1080 between a 680 and 7970.....
  • ati666 - Saturday, March 24, 2012

    does the gtx680 still have the same anisotropic filtering pattern as the gtx470/480/570/580 (octagonal pattern) or is it like AMD's HD7970 with all angle-independent anisotropic filtering (circular pattern)?
  • Ryan Smith - Saturday, March 24, 2012

    It's not something we were planning on publishing, but it is something we checked. It's still the same octagon pattern as Fermi. It would be nice if NVIDIA did have angle-independent AF, but to be honest the difference between that and what NVIDIA does has been so minor that it's not something we've ever been able to create a noticeable issue with in the real world.

    Now Intel's AF on the other hand...
  • ati666 - Saturday, March 24, 2012

    thanks for the reply, now i can finally make a decision to buy the hd7970 or gtx680..
  • CeriseCogburn - Saturday, March 24, 2012

    Yes I thank him too for finally coming clean and noting the angle independent amd algorithm he's been fanboy over for a long time has absolutely no real world gaming advantage whatsoever.
    It's a big fat zero of nothing but FUD for fanboys.
    It would be nice if notional advantages actually showed up in games, and when they don't or for the life of the reviewer cannot be detected in games, that be clearly stated and the insane "advantage" declared be called what it really is, a useless talking point of deception that fools purchasers instead of enlightening them.
    The biased emphasis with zero advantage is as unscientific as it gets. Worse yet, within the same area, the "perfectly round algorithm" yielded in game transition lines with the amd cards, denied by the reviewer for what, a year? Then a race game finally convinced him, and in this 7000 series release we find another issue the "perfectly round algorithm" apparently was attached to flaw with, a "poor transition resolution" - rather crudely large instead of fine like Nvidia's which caused excessive amd shimmering in game, and we are treated to that information only now after the 7000 series "solved" the issue and brought it near or up to the GTX long time standard.
    So this whole "perfectly round algorithm" has been nothing but fanboy lies for amd all along, while ignoring at least 2 large IQ issues when it was "put to use" in game. (transition shading and shimmering)
    I'm certain an explanation could be given that there are other factors with differing descriptive explanation, like the fineness of textural changes as one goes toward center of the image not directly affecting roundness one way or another, used as an excuse, perhaps the self deceptive justification that allowed such misbehavior to go on for so long.
  • _vor_ - Saturday, March 24, 2012

    Will you seriously STFU already? It's hard to read this discussion with your blatant and belligerent jackassery all over it.

    You love NVIDIA. Great. Now STFU and stop posting.
  • CeriseCogburn - Saturday, March 24, 2012

    Great attack, did I get anything wrong at all? I guess not.
  • silverblue - Monday, March 26, 2012

    Could you provide a link to an article based on this subject, please? Not an attack; just curious.
  • CeriseCogburn - Tuesday, March 27, 2012

    http://www.anandtech.com/show/5261/amd-radeon-hd-7...

    http://forums.anandtech.com/showpost.php?p=3152067...

    " So what then is going on that made Civ V so much faster for NVIDIA? Admittedly I had to press NVIDIA for this - performance practically doubled on high-end GPUs, which is unheard of. Until they told me what exactly they did, I wasn't convinced it was real or if they had come up with a really sweet cheat. It definitely wasn't a cheat.

    If you recall from our articles, I keep pointing to how we seem to be CPU limited at the time. "

    (YES, SO THAT'S WHAT WE GOT, THEY'RE CHEATING IT'S FAKE WE'RE CPU LIMITED- ALL WRONG ALL LIES)

    Since AMD’s latest changes are focused on reducing shimmering in motion we’ve put together a short video of the 3D Center Filter Tester running the tunnel test with the 7970, the 6970, and GTX 580. The tunnel test makes the differences between the 7970 and 6970 readily apparent, and at this point both the 7970 and GTX 580 have similarly low levels of shimmering.

    with both implementing DX9 SSAA with the previous generation of GPUs, and AMD catching up to NVIDIA by implementing Enhanced Quality AA (their version of NVIDIA’s CSAA) with Cayman. Between Fermi and Cayman the only stark differences are that AMD offers their global faux-AA MLAA filter, while NVIDIA has support for true transparency and super sample anti-aliasing on DX10+ games.

    (AMD FINALLY CATCHES UP IN EQAA PART, NVIDIA TRUE STANS AND SUPER SAMPLE HIGH Q STUFF, AMD CHEAT AND BLUR AND BLUR TEXT)

    Thus I had expected AMD to close the gap from their end with Southern Islands by implementing DX10+ versions of Adaptive AA and SSAA, but this has not come to pass.

    ( AS I INTERPRETED AMD IS WAY BEHIND STILL A GAP TO CLOSE ! )

    AMD has not implemented any new AA modes compared to Cayman, and as a result AAA and SSAA continue to only be available in DX9 titles.

    Finally, while AMD may be taking a break when it comes to anti-aliasing they’re still hard at work on tessellation

    ( BECAUSE THEY'RE BEHIND IN TESSELLATION TOO.)

    Don't forget amd has a tessellation cheat in their 7000 series driver, so 3dmark 11 is cheated on as is unigine heaven, while Nvidia does no such thing.

    ---
    I do have more, like the race car game admission, but I think that's enough helping you do your homework.
  • CeriseCogburn - Tuesday, March 27, 2012

    So here's more, mr curious..
    " “There’s nowhere left to go for quality beyond angle-independent filtering at the moment.”

    With the launch of the 5800 series last year, I had high praise for AMD's anisotropic filtering. AMD brought truly angle-independent filtering to gaming (and are still the only game in town), putting an end to angle-dependent deficiencies and especially AMD's poor AF on the 4800 series. At both the 5800 series launch and the GTX 480 launch, I've said that I've been unable to find a meaningful difference or deficiency in AMD's filtering quality, and NVIDIA was only deficient by being not quite angle-independent. I have held – and continued to hold until last week – the opinion that there's no practical difference between the two.

    It turns out I was wrong. Whoops.

    The same week as when I went down to Los Angeles for AMD’s 6800 series press event, a reader sent me a link to a couple of forum topics discussing AF quality. While I still think most of the differences are superficial, there was one shot comparing AMD and NVIDIA that caught my attention: Trackmania."

    " The shot clearly shows a transition between mipmaps on the road, something filtering is supposed to resolve. In this case it’s not a superficial difference; it’s very noticeable and very annoying.

    AMD appears to agree with everyone else. As it turns out their texture mapping units on the 5000 series really do have an issue with texture filtering, specifically when it comes to “noisy” textures with complex regular patterns. AMD’s texture filtering algorithm was stumbling here and not properly blending the transitions between the mipmaps of these textures, resulting in the kind of visible transitions that we saw in the above Trackmania screenshot. "

    http://www.anandtech.com/show/3987/amds-radeon-687...

    WE GET THIS AFTER 6000 SERIES AMD IS RELEASED, AND DENIAL UNTIL, NOW WE GET THE SAME THING ONCE 7000 SERIES IS RELEASED, AND COMPLETE DENIAL BEFORE THAT...

    HERE'S THE 6000 SERIES COVERUP THAT COVERS UP 5000 SERIES AFTER ADMITTING THE PROBLEM A WHOLE GENERATION LATE
    " So for the 6800 series, AMD has refined their texture filtering algorithm to better handle this case. Highly regular textures are now filtered properly so that there’s no longer a visible transition between them. As was the case when AMD added angle-independent filtering we can’t test the performance impact of this since we don’t have the ability to enable/disable this new filtering algorithm, but it should be free or close to it. In any case it doesn’t compromise AMD’s existing filtering features, and goes hand-in-hand with their existing angle-independent filtering."

    NOW DON'T FORGET RYAN HAS JUST ADMITTED AMD ANGLE INDEPENDENT ALGORITHM IS WORTH NOTHING IN REAL GAME- ABSOLUTELY NOTHING.
