GPU Boost 2.0: Temperature Based Boosting

With the Kepler family NVIDIA introduced their GPU Boost functionality. Present on the desktop GTX 660 and above, boost allows NVIDIA’s GPUs to turbo up to frequencies above their base clock so long as there is sufficient power headroom to operate at those higher clockspeeds and the voltages they require. Boost, like turbo and other implementations, is essentially a form of performance min-maxing, allowing GPUs to offer higher clockspeeds for lighter workloads while still staying within their absolute TDP limits.

The first iteration of GPU Boost was based almost entirely around power considerations. With the exception of an automatic 1 bin (13MHz) step down at high temperatures to compensate for increased power consumption, whether GPU Boost could boost and by how much depended on how much power headroom was available. So long as there was headroom, GPU Boost could boost up to its maximum boost bin and voltage.

For Titan, GPU Boost has undergone a small but important change that has significant ramifications for how GPU Boost works and how much it boosts by. With GPU Boost 2, NVIDIA has essentially moved from a power-based boost system to a temperature-based boost system – or, more precisely, a system that is predominantly temperature based but is also capable of taking power into account.

When it came to GPU Boost 1, its greatest weakness, as explained by NVIDIA, was that it made conservative assumptions about temperatures and the interplay between high temperatures and high voltages in order to keep from seriously impacting silicon longevity. The end result was that NVIDIA was picking boost bin voltages based on worst case temperatures, which meant those conservative temperature assumptions translated into conservative voltages.

So how does a temperature based system fix this? By better mapping the relationship between voltage, temperature, and reliability, NVIDIA can allow for higher voltages – and hence higher clockspeeds – because it can finely control which boost bin is used based on the GPU's current temperature. As temperatures start ramping up, NVIDIA can ramp down the boost bins until an equilibrium is reached.

Of course total power consumption is still a concern here, though much less so. Technically NVIDIA is watching both temperature and power consumption and clamping down when either limit is hit. But since GPU Boost 2 does away with the concept of a separate power target – sticking solely with the TDP instead – Titan has quite a bit more room for boosting, as it can keep boosting right up until it hits the 250W TDP limit. Our Titan sample can boost its clockspeed by up to 19% (837MHz to 992MHz), whereas our GTX 680 sample could only boost by 10% (1006MHz to 1110MHz).
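To make the interaction between the two limits more concrete, below is a minimal C++ sketch of the behavior described above. This is purely illustrative and is not NVIDIA's actual boost algorithm; the temperature target, hysteresis band, and bin count are placeholder values chosen for the example.

#include <algorithm>
#include <cstdio>

// Illustrative only: these numbers approximate Titan's published figures.
struct BoostConfig {
    double base_mhz      = 837.0;  // base clock
    double bin_mhz       = 13.0;   // one boost bin is roughly 13MHz
    int    max_bins      = 12;     // 837MHz + 12 bins is roughly 992MHz
    double temp_target_c = 80.0;   // assumed default temperature target
    double tdp_watts     = 250.0;  // Titan's TDP limit
};

// Decide the next boost bin from current telemetry. The temperature governor
// proposes stepping up or down; the TDP acts as a hard ceiling on top of it.
int next_bin(const BoostConfig& c, int current_bin, double temp_c, double power_w) {
    int proposed = current_bin;
    if (temp_c > c.temp_target_c)
        proposed = current_bin - 1;                      // too hot: drop one bin
    else if (temp_c < c.temp_target_c - 2.0)
        proposed = current_bin + 1;                      // comfortably cool: add one bin
    if (power_w > c.tdp_watts)
        proposed = std::min(proposed, current_bin - 1);  // over TDP: force a step down
    return std::clamp(proposed, 0, c.max_bins);
}

int main() {
    BoostConfig c;
    int bin = 0;
    // Fake telemetry: the card heats up as it boosts, then settles near the target.
    double temps[]  = {62, 68, 74, 77, 79, 81, 80, 79};
    double powers[] = {180, 200, 215, 228, 238, 246, 243, 241};
    for (int i = 0; i < 8; ++i) {
        bin = next_bin(c, bin, temps[i], powers[i]);
        std::printf("tick %d: %2.0fC %3.0fW -> %4.0f MHz\n",
                    i, temps[i], powers[i], c.base_mhz + bin * c.bin_mhz);
    }
    return 0;
}

The key point the sketch tries to capture is that the temperature governor is what normally sets the boost bin, with the 250W TDP acting as a backstop rather than the primary limit.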

Ultimately, however, whether GPU Boost 2 is power sensitive is a control panel setting, meaning that power sensitivity can be disabled. By default GPU Boost will monitor both temperature and power, but 3rd party overclocking utilities such as EVGA Precision X can prioritize temperature over power, at which point GPU Boost 2 can actually ignore the TDP to a certain extent in order to focus on temperature. So if nothing else there's quite a bit more flexibility with GPU Boost 2 than there was with GPU Boost 1.

Unfortunately, because GPU Boost 2 is only implemented in Titan, it's hard to evaluate just how much “better” this is in any quantitative sense. We will be able to present specific Titan numbers on Thursday, but other than saying that our Titan maxed out at 992MHz at its highest boost bin of 1.162v, we can't directly compare it to how the GTX 680 handled things.

Comments

  • tipoo - Tuesday, February 19, 2013 - link

    It seems if you were targeting maximum performance, being able to decouple them would make sense, as the GPU would both have more thermal headroom and run cooler on average with the fan working harder, thus letting it hit higher boost clocks.
  • Ryan Smith - Tuesday, February 19, 2013 - link

    You can always manually adjust the fan curve. NVIDIA is simply moving it with the temperature target by default.
  • Golgatha - Tuesday, February 19, 2013 - link

    WTF nVidia!? Seriously, WTF!?

    $1000 for a video card. Are they out of their GD minds!?
  • imaheadcase - Tuesday, February 19, 2013 - link

    No, read the article you twat.
  • tipoo - Tuesday, February 19, 2013 - link

    If they released a ten thousand dollar card, what difference would it make to you? This isn't exactly their offering for mainstream gamers.
  • jackstar7 - Tuesday, February 19, 2013 - link

    I understand that my setup is a small minority, but I have to agree with the review about the port configuration. Not moving to multi-mDP on a card of this level just seems wasteful. As long as we're stuck with DVI, we're stuck with bandwidth limits that are going to stand in the way of 120Hz for higher resolutions (as seen on the Overlords and Catleap Extremes). Now I have to hope for some AIB to experiment with a $1000 card, or more likely wait for AMD to catch up to this.
  • akg102 - Tuesday, February 19, 2013 - link

    I'm glad Ryan got to experience this Nvidia circle jerk 'first-hand.'
  • Arakageeta - Tuesday, February 19, 2013 - link

    The Tesla- and Quadro-line GPUs have two DMA copy engines. This allows the GPU to simultaneously send and receive data on the full-duplex PCIe bus. However, the GeForce GPUs traditionally have only one DMA copy engine. Does the Titan have one or two copy engines? Since Titan has Tesla-class DP, I thought it might also have two copy engines.

    You can run the "deviceQuery" command that is a part of the CUDA SDK to find out.
  • Ryan Smith - Tuesday, February 19, 2013 - link

    1 copy engine. The full output of DeviceQuery is below.

    CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 1 CUDA Capable device(s)

    Device 0: "GeForce GTX TITAN"
    CUDA Driver Version / Runtime Version 5.0 / 5.0
    CUDA Capability Major/Minor version number: 3.5
    Total amount of global memory: 6144 MBytes (6442123264 bytes)
    (14) Multiprocessors x (192) CUDA Cores/MP: 2688 CUDA Cores
    GPU Clock rate: 876 MHz (0.88 GHz)
    Memory Clock rate: 3004 Mhz
    Memory Bus Width: 384-bit
    L2 Cache Size: 1572864 bytes
    Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
    Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 2048
    Maximum number of threads per block: 1024
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64
    Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 1 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
    Device supports Unified Addressing (UVA): Yes
    Device PCI Bus ID / PCI location ID: 3 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX TITAN
  • tjhb - Tuesday, February 19, 2013 - link

    Thank you!

    It seems to me NVIDIA are being incredibly generous to CUDA programmers with this card. I can hardly believe they've left FP64 capability at the full 1/3. (The ability to switch between 1/24 at a high clock and 1/3 at reduced clock seems ideal.) And we get 14/15 SMXs (a nice round number).

    Do you know whether the TCC driver can be installed for this card?
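Following up on the copy engine question above: rather than running the full deviceQuery sample, the same information can be pulled directly from the CUDA runtime API. The sketch below simply reads cudaDeviceProp::asyncEngineCount, the field deviceQuery reports as the number of copy engines; it should build with any CUDA 4.0 or later toolkit via nvcc.

#include <cstdio>
#include <cuda_runtime.h>

// Print the DMA copy engine count for every CUDA device in the system.
// GeForce parts typically report 1; Tesla/Quadro parts that can overlap
// uploads and downloads on the PCIe bus report 2.
int main() {
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0) {
        std::printf("No CUDA capable devices found\n");
        return 1;
    }
    for (int dev = 0; dev < device_count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        std::printf("Device %d: %s (SM %d.%d), %d copy engine(s)\n",
                    dev, prop.name, prop.major, prop.minor, prop.asyncEngineCount);
    }
    return 0;
}

On the Titan sample above this would print 1 copy engine, matching the deviceQuery output Ryan posted.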
