Titan For Compute

Titan, as we briefly mentioned before, is not just a consumer graphics card. It is also a compute card and will essentially serve as NVIDIA’s entry-level compute product for both the consumer and pro-sumer markets.

The key enabler for this is that Titan, unlike any consumer GeForce card before it, will feature full FP64 performance, allowing GK110’s FP64 potency to shine through. Previous NVIDIA cards either had very few FP64 CUDA cores (GTX 680) or artificial FP64 performance restrictions (GTX 580), in order to maintain the market segmentation between cheap GeForce cards and more expensive Quadro and Tesla cards. NVIDIA will still be maintaining this segmentation, but in new ways.

NVIDIA GPU Comparison

|                                   | Fermi GF100 | Fermi GF104 | Kepler GK104 | Kepler GK110 |
|-----------------------------------|-------------|-------------|--------------|--------------|
| Compute Capability                | 2.0         | 2.1         | 3.0          | 3.5          |
| Threads/Warp                      | 32          | 32          | 32           | 32           |
| Max Warps/SM(X)                   | 48          | 48          | 64           | 64           |
| Max Threads/SM(X)                 | 1536        | 1536        | 2048         | 2048         |
| Register File (32-bit registers)  | 32,768      | 32,768      | 65,536       | 65,536       |
| Max Registers/Thread              | 63          | 63          | 63           | 255          |
| Shared Mem Config                 | 16K/48K     | 16K/48K     | 16K/32K/48K  | 16K/32K/48K  |
| Hyper-Q                           | No          | No          | No           | Yes          |
| Dynamic Parallelism               | No          | No          | No           | Yes          |

We’ve covered GK110’s compute features in depth in our look at Tesla K20, so we won’t go into great detail here. But as a reminder, along with beefing up its functional unit counts relative to GF100, GK110 contains several feature improvements that further increase compute efficiency and the resulting performance. Relative to the GK104-based GTX 680, Titan brings with it a much greater number of registers per thread (255 versus 63), along with Kepler’s shuffle instructions for intra-warp data sharing. Most of all, Titan brings with it NVIDIA’s marquee Kepler compute features: Hyper-Q, which allows for a greater number of hardware work queues, and Dynamic Parallelism, which allows kernels to dispatch other kernels.
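For the curious, here’s what those two features look like in practice. This is a minimal CUDA sketch of our own, not NVIDIA’s code; the kernel names and launch dimensions are arbitrary. The first function sums a value across a warp with the Kepler-era __shfl_down intrinsic, passing data between threads without touching shared memory; the second shows Dynamic Parallelism, with a parent kernel dispatching a child kernel entirely on the GPU (this requires compute capability 3.5 and relocatable device code, i.e. nvcc -arch=sm_35 -rdc=true).

```cuda
// Intra-warp data sharing via Kepler's shuffle instructions: sum a value
// across the 32 threads of a warp with no shared memory round-trip.
__device__ float warpReduceSum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down(val, offset);  // read val from the lane 'offset' above
    return val;  // lane 0 ends up holding the warp-wide sum
}

// Dynamic Parallelism: a kernel dispatched by another kernel (CC 3.5+).
__global__ void childKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

__global__ void parentKernel(float *data, int n) {
    // A single thread launches the child grid; no host round-trip needed.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
}
```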

With that said, there is a catch. NVIDIA has stripped GK110 of some of its reliability and scalability features in order to maintain the Tesla/GeForce market segmentation, which leaves Titan for compute suited to small-scale workloads that don’t require Tesla’s greater reliability. ECC memory protection is of course gone, but so are Hyper-Q’s MPI functionality and GPU Direct’s RDMA functionality (DMA between the GPU and third-party PCIe devices). Other than ECC these are much more market-specific features, so while Titan is effectively locked out of highly distributed scenarios, it should be fine for smaller workloads.

There is one other quirk to Titan’s FP64 implementation, however: it needs to be enabled (or rather, uncapped). By default Titan is restricted to 1/24 FP32 performance, like the GTX 680 before it. Capping FP64 by default allows NVIDIA to keep clockspeeds higher and power consumption lower, since the apparently power-hungry FP64 CUDA cores can’t run at full load on top of all of the other functional units that can be active at the same time. Consequently NVIDIA exposes FP64 as an enable/disable option in their control panel, controlling whether FP64 operates at full speed (1/3 FP32) or reduced speed (1/24 FP32).

The penalty for enabling full-speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests as GPU Boost being disabled, forcing the card to run at 837MHz (or lower) at all times. And while we haven't seen it first-hand, NVIDIA tells us that in particularly TDP-constrained situations Titan can drop below the base clock, to as low as 725MHz. This is why NVIDIA’s official compute performance figures are 4.5 TFLOPS for FP32 but only 1.3 TFLOPS for FP64: the former is calculated at the 837MHz base clock, while the latter is calculated at the worst-case clockspeed of 725MHz. The actual FP64 execution rate is still 1/3.
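The arithmetic behind those official figures is simple: FLOPS = SMXes × CUDA cores per SMX × 2 FLOPs per clock (an FMA counts as two operations) × clockspeed, so 14 × 192 × 2 × 837MHz ≈ 4.5 TFLOPS for FP32, and 14 × 64 × 2 × 725MHz ≈ 1.3 TFLOPS for FP64. As an illustration (our own sketch, with GK110’s per-SMX core counts hard-coded as assumptions, and using whatever clock the driver reports rather than the 725MHz worst case), the same math can be driven from CUDA’s device properties:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    // GK110 (compute capability 3.5): 192 FP32 and 64 FP64 cores per SMX,
    // hence the 1/3 FP64 rate. These counts are hard-coded assumptions.
    const int fp32PerSMX = 192, fp64PerSMX = 64;
    double ghz = prop.clockRate / 1e6;  // clockRate is reported in kHz

    // 2 FLOPs per core per clock, counting a fused multiply-add as two ops.
    double fp32 = prop.multiProcessorCount * fp32PerSMX * 2 * ghz / 1000.0;
    double fp64 = prop.multiProcessorCount * fp64PerSMX * 2 * ghz / 1000.0;

    printf("%s @ %.0f MHz: %.2f TFLOPS FP32, %.2f TFLOPS FP64\n",
           prop.name, ghz * 1000.0, fp32, fp64);
    return 0;
}
```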

Unfortunately there’s not much else we can say about compute performance at this time, as to go much farther than this requires being able to reference specific performance figures. So we’ll follow this up on Thursday with those figures and a performance analysis.

157 Comments

  • mrdude - Tuesday, February 19, 2013

    I doubt it, given the transistor count and die size. This thing isn't exactly svelte at 7.1 billion transistors. The yield of viable chips per wafer must be quite low, hence the price tag.

    What I don't understand is why people would buy a $1000 GPU for compute. I can understand why somebody buys a ~$300 GPU to add a little extra horsepower to their small selection of applications, but if you're paying $1000 for a GPU then you're also expecting a decent set of drivers as well. But both AMD and nVidia have purposely neutered their consumer cards' performance for most professional tasks and applications. As a result, you can buy a cheaper FirePro or Quadro with professional drivers, based on a smaller die/GPU (like a 7850 or 660 Ti), that will outperform this $1000 single-GPU card in a variety of software.

    If I'm paying upwards of $1000 for a GPU, it sure as hell has to work. Buying a consumer grade GPU and relying on consumer (gaming) drivers just means that you'll almost never hit anywhere near the max theoretical throughput of the card. In essence, you're paying for performance which you'll never get anywhere close to.

    This is a perfect card for the fools who overspend on their gaming GPUs. For everyone else it's just a high-priced bore.
  • CeriseCogburn - Sunday, February 24, 2013

    All those fools, we have been told over and over, and in fact very recently by the site's own, are here!

    That's what this is for, dimwit. Not for crybaby losers who can barely scrape up an HD 5750.

    Let's face it, every one of you whining jerks is drooling uncontrollably for this flagship, and if you're just a loser with a 450W power supply, no worries, they're being sold in high-priced systems with that.

    You'd take one in a minute, happily, and max out your games on your 1920x1080 monitor in MOST games.

    I mean I have no idea what kind of poor all you crybabies are. I guess you're all living in some 3rd world mudhole.
  • madmilk - Thursday, February 21, 2013

    They're clearly not in any kind of hurry, given how well Tesla is selling at 3 times the price. These are probably just the rejects, set to a higher voltage and TDP and sold to the consumer market.
  • mrdude - Thursday, February 21, 2013

    Oh yea, nVidia is never going to jeopardize the cash cow that is the Tesla for the HPC crowd, or Quadro for the professional market. The margins there aren't worth giving up in order to bring GPU compute (and its drivers) to the mass market.

    This notion that this is a GPGPU card is silly, frankly. We can throw around the max theoretical GFLOPS/TFLOPS figures all we please, but the reality is that you'll never see anywhere close to those in professional applications. There are two reasons for that: Tesla and Quadro.
  • chizow - Tuesday, February 19, 2013

    Yeah, totally agree with the post title, Nvidia has lost their fking minds.

    And PS: The X-Men *STILL* want their logo back.
  • CeriseCogburn - Sunday, February 24, 2013

    This isn't 19G80 Kansas anymore, Dorothy.

    Do any of you people live in the USA?

    I mean really, how frikkin poor are all you crybabies, and how do you even afford any gaming system or any games?

    Are you all running low-end C2D still, no SSDs, and 1280x1024? Do you live in a box?

    How can you be in the USA and whine about this price on the very top-end product for your Lifetime Hobby?

    What is wrong with you, is the question.
  • Pariah - Tuesday, February 19, 2013

    In most cases, this card won't make sense. There are at least a couple of scenarios where it might. One, in an ultra high-end gaming system. That means multiple Titan cards: because these are single-GPU cards, an SLI Titan setup should scale much better than an SLI 690 setup with its 4 GPUs would, and triple SLI Titans further that point.

    Secondly, this card is smaller and uses less power than a 690, which means you can use it in much smaller cases, even some mini-ITX cases. That would be one helluva nice portable LAN box.
  • CeriseCogburn - Sunday, February 24, 2013

    This card makes sense for anyone running a mid-range Sandy Bridge and a 1920x1080 monitor.
    After I complained about the 1920x1200 reviews here, pointing out nVidia is 12% BETTER compared to AMD in the former resolution, 50 raging AMD fanboys screeched that they have a 1920x1200 monitor they run all the time and that they were more than willing to pop the extra $150 for it over the 1920x1080...

    So we can safely assume MOST of the people here have a 1920x1080, for Pete's sake.
    A low-end Sandy is $50 to $80, same for a board, and DDR3 is the cheapest RAM.
    So for less than $200 max to prepare (use your old case+PS), nearly everyone here is ready to run this card, and would benefit from doing so.

    Now lying about that just because they don't plan on buying one is what most here seem to want to do.

  • Deo Domuique - Friday, March 8, 2013

    This card should cost ~$600-650, not a single cent more. The rest is à la Apple markup for the mindless consumer. Unfortunately, there are a lot of them.
  • trajan2448 - Tuesday, February 19, 2013

    Obviously a great piece of technology. Interested to see what overclockers can achieve.
    If it were $700 it would make a lot more sense. Nonetheless, it'll be fun to see some fanatics run an overclocked tri-SLI setup and blow up their monitors.
