GPU Boost 2.0: Temperature Based Boosting

With the Kepler family NVIDIA introduced their GPU Boost functionality. Present on the desktop GTX 660 and above, boost allows NVIDIA’s GPUs to turbo up to frequencies above their base clock so long as there is sufficient power headroom to operate at those higher clockspeeds and the voltages they require. Boost, like turbo and other implementations, is essentially a form of performance min-maxing, allowing GPUs to offer higher clockspeeds for lighter workloads while still staying within their absolute TDP limits.

The first iteration of GPU Boost was based almost entirely around power considerations. With the exception of an automatic 1 bin (13MHz) step down at high temperatures to compensate for increased power consumption, whether GPU Boost could boost and by how much depended on how much power headroom was available. So long as there was headroom, GPU Boost could boost up to its maximum boost bin and voltage.

For Titan, GPU Boost has undergone a small but important change with significant ramifications for how it works and how much it boosts by. With GPU Boost 2.0, NVIDIA has essentially moved on from a power-based boost system to a temperature-based boost system. Or perhaps more precisely, a system that is predominantly temperature based but is also capable of taking power into account.

The greatest weakness of GPU Boost 1.0, as explained by NVIDIA, is that it essentially made conservative assumptions about temperatures and the interplay between high temperatures and high voltages in order to keep from seriously impacting silicon longevity. The end result was that NVIDIA was picking boost bin voltages based on worst case temperatures, which meant those conservative assumptions about temperatures translated into conservative voltages.

So how does a temperature-based system fix this? By better mapping the relationship between voltage, temperature, and reliability, NVIDIA can allow for higher voltages – and hence higher clockspeeds – by finely controlling which boost bin is hit based on temperature. As temperatures start ramping up, NVIDIA can ramp down the boost bins until an equilibrium is reached.
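
To make the temperature-to-boost-bin relationship concrete, below is a minimal host-side sketch of an equilibrium-seeking controller of the general kind described above. It is purely illustrative: only the 13MHz bin size and the 837MHz/992MHz Titan clocks (quoted in the next paragraph) come from this article, while the 80C target and the control rule itself are assumptions rather than NVIDIA's actual algorithm.

```cpp
// Illustrative temperature-driven boost controller (not NVIDIA's algorithm).
// Known figures: 13MHz boost bins, 837MHz base, 992MHz max boost on our Titan.
// Assumed figure: an 80C temperature target.
#include <algorithm>
#include <cstdio>

const double kBaseClockMHz = 837.0;  // Titan base clock
const double kMaxBoostMHz  = 992.0;  // highest boost bin observed on our sample
const double kBinSizeMHz   = 13.0;   // size of one boost bin
const double kTempTargetC  = 80.0;   // assumed temperature target

// Step down one bin per degree over the target, up one bin per degree of
// headroom, clamped to [base, max] -- a simple equilibrium-seeking rule.
double next_clock(double current_mhz, double gpu_temp_c)
{
    int bins = static_cast<int>(kTempTargetC - gpu_temp_c);
    double proposed = current_mhz + bins * kBinSizeMHz;
    return std::min(kMaxBoostMHz, std::max(kBaseClockMHz, proposed));
}

int main()
{
    double clock_mhz = kBaseClockMHz;
    const double temps_c[] = { 65, 70, 76, 81, 84, 82, 80 };
    for (double t : temps_c) {
        clock_mhz = next_clock(clock_mhz, t);
        printf("temp %.0fC -> clock %.0f MHz\n", t, clock_mhz);
    }
    return 0;
}
```

As the simulated temperature climbs past the target, the sketch sheds bins one degree at a time, which is the same general behavior NVIDIA describes for Titan.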

Of course total power consumption is still a technical concern here, though much less so. NVIDIA is watching both temperature and power consumption and clamping down when either limit is hit. But since GPU Boost 2.0 does away with the concept of a separate power target – sticking solely with the TDP instead – there is quite a bit more room for boosting in the design of Titan, since it can keep on boosting right up until the point it hits the 250W TDP limit. Our Titan sample can boost its clockspeed by up to 19% (837MHz to 992MHz), whereas our GTX 680 sample could only boost by 10% (1006MHz to 1110MHz).
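
As a quick check on those numbers, and to show what clamping on whichever limit is hit first looks like in code, here is another small hypothetical sketch; only the clockspeed figures above are real, and the limiter function is an assumption about the general shape of the logic, not NVIDIA's implementation.

```cpp
// Hypothetical limiter: run at the highest clock that neither the temperature
// model nor the power model rejects. Clock figures are from the article; the
// function itself is illustrative only.
#include <algorithm>
#include <cstdio>

double boosted_clock(double base_mhz, double max_boost_mhz,
                     double temp_limited_mhz, double power_limited_mhz)
{
    // Clamp to whichever limit bites first, never dropping below base clock.
    double limited = std::min(temp_limited_mhz, power_limited_mhz);
    return std::max(base_mhz, std::min(max_boost_mhz, limited));
}

int main()
{
    // Suppose the temperature model would allow 1006MHz but the power model
    // caps us at 992MHz: power is the limit that bites first in this case.
    printf("effective clock: %.0f MHz\n",
           boosted_clock(837.0, 992.0, 1006.0, 992.0));

    // Boost headroom, matching the 19% and 10% figures quoted above.
    printf("Titan:   %.0f%%\n", (992.0 / 837.0 - 1.0) * 100.0);
    printf("GTX 680: %.0f%%\n", (1110.0 / 1006.0 - 1.0) * 100.0);
    return 0;
}
```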

Ultimately, however, whether GPU Boost 2.0 is power sensitive is actually a control panel setting, meaning that power sensitivity can be dialed back. By default GPU Boost will monitor both temperature and power, but 3rd party overclocking utilities such as EVGA Precision X can prioritize temperature over power, at which point GPU Boost 2.0 can actually ignore TDP to a certain extent in order to focus on temperature. So if nothing else there is quite a bit more flexibility with GPU Boost 2.0 than there was with GPU Boost 1.0.
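
For completeness, the priority setting can be pictured along these lines; this sketch is hypothetical from top to bottom, and in particular the 6% power overage is an invented number, not something NVIDIA or EVGA document.

```cpp
// Hypothetical sketch of the temperature-vs-power priority toggle. Neither
// the structure nor the 6% overage reflect the real driver, firmware, or
// Precision X interfaces.
#include <cstdio>

struct BoostLimits {
    double tdp_watts;        // 250W for Titan
    double temp_target_c;    // assumed default target of roughly 80C
    bool   prioritize_temp;  // the priority a tool like Precision X lets you flip
};

// Effective power ceiling the (hypothetical) boost logic respects: when
// temperature is prioritized, TDP may be exceeded to a limited extent.
double effective_power_limit(const BoostLimits& limits)
{
    return limits.prioritize_temp ? limits.tdp_watts * 1.06 : limits.tdp_watts;
}

int main()
{
    BoostLimits titan = { 250.0, 80.0, false };
    printf("power priority:       %.0fW ceiling\n", effective_power_limit(titan));
    titan.prioritize_temp = true;
    printf("temperature priority: %.0fW ceiling\n", effective_power_limit(titan));
    return 0;
}
```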

Unfortunately, because GPU Boost 2.0 is only implemented in Titan, it's hard to evaluate just how much “better” this is in any quantitative sense. We will be able to present specific Titan numbers on Thursday, but other than saying that our Titan maxed out at 992MHz at its highest boost bin of 1.162v, we can't directly compare it to how the GTX 680 handled things.

157 Comments

  • hammer256 - Tuesday, February 19, 2013 - link

    Ryan's analysis of the target market for this card is spot on: this card is for small scale HPC type workloads, where the researcher just wants to build a desktop-like machine with a few of those cards. I know that's what I use for my research. To me, this is the real replacement of the GTX 580 for our purposes. The price hike is not great, but when put in the context of the K20X, it's a bargain. I'm lusting to get 8 of these cards and get a Tyan GPU server.
  • Gadgety - Tuesday, February 19, 2013 - link

    While gamers see little benefit, it looks like this is the card for GPU rendering, provided the software developers at VRay, Octane and others find a way to tap into this. So one of these can replace the 3xGTX580 3GBs.
  • chizow - Tuesday, February 19, 2013 - link

    Nvidia has completely lost their minds. Throwing in a minor bone with the non-neutered DP performance does not give them license to charge $1K for this part, especially when DP on previous flagship parts carried similar performance relative to Tesla.

    First the $500 for a mid-range ASIC in GTX 680, then $1200 GTX 690 and now a $1000 GeForce Titan. Unbelievable. Best of luck Nvidia, good luck competing with the next-gen consoles at these price points, or even with yourselves next generation.

    While AMD is still at fault in all of this for their ridiculous launch pricing for the 7970, these recent price missteps from Nvidia make that seem like a distant memory.
  • ronin22 - Wednesday, February 20, 2013 - link

    Bullshit of a typical NV hater.

    The compute-side of the card isn't a minor bone, it's its prime feature, along with the single-chip GTX690-like performance.

    "especially when DP on previous flagship parts carried similar performance relative to Tesla"

    Bullshit again.
    Give me a single card that is anywhere near the K20 in DP performance and we'll talk.

    You don't understand the philosophy of this card, as many around here.
    Thankfully, the real intended audience is already recognizing the awesomeness of this card (read previous comments).

    You can go back to playing BF3 on your 79xx, but please close the door behind you on your way out ;)
  • chizow - Wednesday, February 20, 2013 - link

    Heh, your ignorant comments couldn't be further from the truth about being an "NV hater". I haven't bought an ATI/AMD card since the 9700pro (my gf made the mistake of buying a 5850 though, despite my input) and previously, I solely purchased *multiple* Nvidia cards in this flagship market for the last 3 generations.

    I have a vested interest in Nvidia in this respect as I enjoy their products, so I've never rooted for them to fail, until now. It's obvious to me now that between AMD's lackluster offerings and ridiculous launch prices along with Nvidia's greed with their last two high-end product launches (690 and Titan), that they've completely lost touch with their core customer base.

    Also, before you comment ignorantly again, please look up the DP performance of GTX 280 and GTX 480/580 relative to their Tesla counterparts. You will see they are still respectable, ~1/8th of SP performance, which was still excellent compared to the completely neutered 1/32 DP of GK104 Kepler. That's why there is still a high demand for flagship Fermi parts and even GT200 despite their overall reputation as a less desirable part due to their thermal characteristics.

    Lastly, I won't be playing BF3 on a 7970, try a pair of GTX 670s in SLI. There's a difference between supporting a company through sound purchasing decisions and stupidly pissing away $1K for something that cost $500-$650 in the past.

    The philosophy of this card is simple: Rob stupid people of their money. I've seen enough of this in the past from the same target audience and generally that feeling of "awesomeness" is quickly replaced by buyer's remorse as they realize that slightly higher FPS number in the upper left of their screen isn't worth the massive number on their credit card statement.
  • CeriseCogburn - Sunday, February 24, 2013 - link

    That one's been pissing acid since the 680 launch, failed and fails to recognize the superior leap of the GTX580 over the prior gen, which gave him his mental handicap believing he can get something for nothing, along with sucking down the master amd fanboy Charlie D's rumor about the "$350" flagship nVidia card blah blah blah 680 blah blah second tier blah blah blah.

    So instead the rager now claims he wasted near a grand on two 670's - R O F L - the lunatics never end here man.
  • bamboo69 - Tuesday, February 19, 2013 - link

    Origin is using EK Waterblocks? I hope they aren't nickel plated, their nickel blocks flake
  • Knock24 - Wednesday, February 20, 2013 - link

    I've seen it mentioned in the article that Titan has HyperQ support, but I've also read the opposite elsewhere.
    Can anyone confirm that HyperQ is supported? I'm guessing the simpleHyperQ CUDA SDK example might reveal if it's supported or not.
  • torchedguitar - Wednesday, February 20, 2013 - link

    HyperQ actually means two separate things... One part is the ability to have a process act as a server, providing access to the GPU for other MPI processes. This is supported on Linux using Tesla cards (e.g. K20X) only, so it won't work on GTX Titan (it does work on Titan the supercomputer, though). The other part of HyperQ is that there are multiple hardware queues available for managing the work on multiple CUDA streams. GTX Titan DOES support this part, although I'm not sure just how many of these will be enabled (it's a tradeoff, having more hardware streams allows more flexibility in launching concurrent kernels, but also takes more memory and takes more time to initialize).

    The simpleHyperQ sample is a variation of the concurrentKernels sample (just look at the code), and it shows how having more hardware channels cuts down on false dependencies between kernels in different streams. You put things in different streams because they have no dependencies on each other, so in theory nothing in stream X should ever get stuck waiting for something in stream Y. When that does happen due to hitting limits of the hardware, it's a false dependency. An example would be when you try to time a kernel launch by wrapping it with CUDA event records (this is the simpleHyperQ sample). GPUs before GK110 only have one hardware stream, and if you take a program that launches kernels concurrently in separate streams, and wrap all the kernels with CUDA event records, you'll see that suddenly the kernels run one-at-a-time instead of all together. This is because in order to do the timing for the event, the single hardware channel queues up the other launches while waiting for each kernel to finish, then it records the end time in the event, then goes on to the next kernel. With HyperQ's addition of more hardware streams, you get around this problem.

    Run the simpleHyperQ sample on a 580 or a 680 through a tool like Nsight and look at the timeline... You'll see all the work in the streams show up like stair steps -- even though they're in different streams, they happen one at a time. Now run it on a GTX Titan or a K20 and you'll see many of the kernels are able to completely overlap. If 8 hardware streams are enabled, the app will finish 8x faster, or if 32 are enabled, 32x faster.

    Now, this sample is extremely contrived, just to illustrate the feature. In reality, overlapping kernels won't buy you much speedup if you're already launching big enough kernels to use the GPU effectively. In that case, there shouldn't be much room left for overlapping kernels, except when you have unbalanced workloads where many threads in a kernel finish quickly but a few stragglers run way longer. With HyperQ, you greatly increase your chances that kernels in other streams can immediately start using the resources freed up when some of the threads in a kernel finish early, instead of waiting for all threads in the kernel to finish before starting the next kernel. (A minimal sketch of this event-wrapped launch pattern is included after the comments below.)
  • vacaloca - Monday, March 4, 2013 - link

    I wanted to say that you hit the nail on the head... I just tested the simpleHyperQ example, and indeed, the Titan has 8 hardware streams enabled. For every multiple higher than 8, the "Measured time for sample" goes up.
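
As a footnote to the HyperQ discussion above, here is a minimal sketch of the event-wrapped launch pattern torchedguitar describes, written in the spirit of the concurrentKernels/simpleHyperQ samples but not copied from them. The kernel, the spin length, and the choice of 8 streams are arbitrary assumptions; profiled with a tool like Nsight, the launches serialize on a single-queue GPU (GTX 580/680) and can overlap on GK110.

```cpp
// CUDA C++ sketch (compile with nvcc). Each kernel launch is wrapped in event
// records, the pattern that exposes false dependencies on single-queue GPUs.
#include <cstdio>
#include <cuda_runtime.h>

// Busy-wait kernel: spins long enough that overlap, or the lack of it, is
// visible on a profiler timeline.
__global__ void spin_kernel(clock_t cycles)
{
    clock_t start = clock();
    while (clock() - start < cycles) { }
}

int main()
{
    const int kStreams = 8;  // assumption: matches the queues enabled on Titan
    cudaStream_t streams[kStreams];
    cudaEvent_t  start[kStreams], stop[kStreams];

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamCreate(&streams[i]);
        cudaEventCreate(&start[i]);
        cudaEventCreate(&stop[i]);
    }

    // With one hardware work queue the event records force the kernels to run
    // one at a time; with HyperQ's multiple queues the streams can overlap.
    for (int i = 0; i < kStreams; ++i) {
        cudaEventRecord(start[i], streams[i]);
        spin_kernel<<<1, 1, 0, streams[i]>>>((clock_t)100000000);
        cudaEventRecord(stop[i], streams[i]);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < kStreams; ++i) {
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start[i], stop[i]);
        printf("stream %d: %.2f ms\n", i, ms);
        cudaEventDestroy(start[i]);
        cudaEventDestroy(stop[i]);
        cudaStreamDestroy(streams[i]);
    }
    return 0;
}
```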
