Titan For Compute

Titan, as we briefly mentioned before, is not just a consumer graphics card. It is also a compute card and will essentially serve as NVIDIA’s entry-level compute product for both the consumer and pro-sumer markets.

The key enabler for this is that Titan, unlike any consumer GeForce card before it, will feature full FP64 performance, allowing GK110’s FP64 potency to shine through. Previous NVIDIA cards either had very few FP64 CUDA cores (GTX 680) or artificial FP64 performance restrictions (GTX 580), in order to maintain the market segmentation between cheap GeForce cards and more expensive Quadro and Tesla cards. NVIDIA will still be maintaining this segmentation, but in new ways.

NVIDIA GPU Comparison
                       Fermi GF100   Fermi GF104   Kepler GK104   Kepler GK110
Compute Capability     2.0           2.1           3.0            3.5
Threads/Warp           32            32            32             32
Max Warps/SM(X)        48            48            64             64
Max Threads/SM(X)      1536          1536          2048           2048
Register File          32,768        32,768        65,536         65,536
Max Registers/Thread   63            63            63             255
Shared Mem Config      16K/48K       16K/48K       16K/32K/48K    16K/32K/48K
Hyper-Q                No            No            No             Yes
Dynamic Parallelism    No            No            No             Yes

We’ve covered GK110’s compute features in-depth in our look at Tesla K20, so we won’t go into great detail here. But as a reminder, along with beefing up its functional unit counts relative to GF100, GK110 includes several feature improvements that further increase compute efficiency and the resulting performance. Relative to the GK104-based GTX 680, Titan brings with it a much greater number of registers per thread (255), not to mention new instructions such as the shuffle instructions for intra-warp data sharing. But most of all, Titan brings with it NVIDIA’s marquee Kepler compute features: Hyper-Q and Dynamic Parallelism, which allow for a greater number of hardware work queues and for kernels to dispatch other kernels, respectively.
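
To make the latter concrete, here is a minimal, hypothetical sketch of Dynamic Parallelism: a parent kernel launching a child kernel entirely from the GPU, something GK104-class hardware cannot do. The kernel names and launch dimensions are our own illustration; building it requires compute capability 3.5 and relocatable device code (e.g. nvcc -arch=sm_35 -rdc=true -lcudadevrt).

    // Hypothetical illustration of Dynamic Parallelism (compute capability 3.5+).
    // Build (sketch): nvcc -arch=sm_35 -rdc=true dynpar.cu -lcudadevrt
    #include <cstdio>

    __global__ void childKernel(int parentBlock)
    {
        // Work spawned from the GPU itself; no CPU round trip required.
        printf("child of parent block %d, thread %d\n", parentBlock, threadIdx.x);
    }

    __global__ void parentKernel()
    {
        // On GK110, device code can launch further kernels directly.
        // The parent grid is not considered complete until its child grids finish,
        // so no explicit device-side synchronization is needed here.
        if (threadIdx.x == 0)
            childKernel<<<1, 4>>>(blockIdx.x);
    }

    int main()
    {
        parentKernel<<<2, 32>>>();
        cudaDeviceSynchronize();
        return 0;
    }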

With that said, there is a catch. NVIDIA has stripped GK110 of some of its reliability and scalability features in order to maintain the Tesla/GeForce market segmentation, which means Titan for compute is left for small-scale workloads that don’t require Tesla’s greater reliability. ECC memory protection is gone, of course, but so are Hyper-Q’s MPI functionality and GPUDirect’s RDMA functionality (DMA between the GPU and 3rd party PCIe devices). Other than ECC these are fairly market-specific features, so while Titan is effectively locked out of highly distributed scenarios, it should be fine for smaller workloads.

There is one other quirk to Titan’s FP64 implementation, however, and that is that it needs to be enabled (or rather, uncapped). By default Titan is actually restricted to 1/24 FP64 performance, like the GTX 680 before it. Capping FP64 by default allows NVIDIA to keep clockspeeds higher and power consumption lower, since the apparently power-hungry FP64 CUDA cores can’t run at full load on top of all of the other functional units that can be active at the same time. Consequently NVIDIA exposes FP64 as an enable/disable option in their control panel, controlling whether FP64 operates at full speed (1/3 FP32) or reduced speed (1/24 FP32).
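
While the toggle lives in the control panel rather than in CUDA itself, it’s easy enough to see which mode is active from software. Below is a rough, hypothetical sketch of a microbenchmark that times a kernel of dependent double-precision FMAs; in uncapped mode the measured throughput should land in the TFLOPS range, while in the default 1/24 mode it will be a small fraction of that. Kernel names, launch dimensions, and iteration counts are our own illustration.

    // Rough sketch: estimate sustained FP64 throughput to see which mode is active.
    #include <cstdio>

    __global__ void fp64Burn(double *out, int iters)
    {
        double a = 1.0 + threadIdx.x, b = 0.5, c = 0.25;
        for (int i = 0; i < iters; ++i)
            a = fma(a, b, c);                      // dependent double-precision FMAs
        out[blockIdx.x * blockDim.x + threadIdx.x] = a;
    }

    int main()
    {
        const int blocks = 1024, threads = 256, iters = 100000;
        double *out;
        cudaMalloc(&out, blocks * threads * sizeof(double));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        fp64Burn<<<blocks, threads>>>(out, iters);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);

        // Each FMA counts as 2 FLOPs.
        double flops = 2.0 * iters * (double)blocks * threads;
        printf("~%.2f FP64 GFLOPS\n", flops / (ms * 1e-3) / 1e9);

        cudaFree(out);
        return 0;
    }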

The penalty for enabling full speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests itself as GPU Boost being disabled, forcing our card to run at 837MHz (or lower) at all times. And while we haven't seen it first-hand, NVIDIA tells us that in particularly TDP-constrained situations Titan can drop below the base clock to as low as 725MHz. This is why NVIDIA’s official compute performance figures are 4.5 TFLOPS for FP32, but only 1.3 TFLOPS for FP64. The former is calculated from the base clock speed, while the latter is calculated from the worst-case clockspeed of 725MHz. The actual FP64 execution rate is still 1/3 FP32.
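
For those keeping score, both figures follow directly from the shader counts and clocks. A back-of-the-envelope sketch of the arithmetic, assuming Titan’s 2688 FP32 CUDA cores and 896 FP64 cores (14 SMX) and counting each FMA as two floating point operations:

    // Back-of-the-envelope peak throughput, assuming Titan's unit counts
    // (14 SMX: 2688 FP32 cores, 896 FP64 cores) and the clocks quoted above.
    #include <cstdio>

    int main()
    {
        const double fp32Cores = 2688, fp64Cores = 896;
        const double baseClockHz = 837e6, worstCaseClockHz = 725e6;
        const double flopsPerCorePerClock = 2;   // one FMA = 2 FLOPs

        printf("FP32 peak: %.2f TFLOPS\n",
               fp32Cores * flopsPerCorePerClock * baseClockHz / 1e12);       // ~4.50
        printf("FP64 peak: %.2f TFLOPS\n",
               fp64Cores * flopsPerCorePerClock * worstCaseClockHz / 1e12);  // ~1.30
        return 0;
    }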

Unfortunately there’s not much else we can say about compute performance at this time, as going much further requires referencing specific performance figures. So we’ll follow this up on Thursday with those figures and a performance analysis.

157 Comments

  • vacaloca - Tuesday, February 19, 2013 - link

    A while ago when K20 released and my advisor didn't want to foot the bill, I ended up doing it myself. Looks like the K20 might be going to eBay since I don't need HyperQ MPI and GPU Direct RDMA or ECC for that matter. I do suspect that it might be possible to crossflash this card with a K20 or K20X BIOS and mod the softstraps to enable the missing features... but probably the video outputs would be useless (and warranty void, and etc) so it's not really an exercise worth doing.

    Props to NVIDIA for releasing this for us compute-focused people and thanks to AnandTech for the disclosure on FP64 enabling. :)
  • extide - Tuesday, February 19, 2013 - link

    Can you please run some F@H benchmarks on this card? I would be very very interested to see how well it folds. Also if you could provide some power consumption numbers (watts @ system idle and watts when gpu only is folding).

    That would be great :)
    Thanks!
  • Ryan Smith - Tuesday, February 19, 2013 - link

    OpenCL is broken with the current press drivers. So I won't have any more information until NVIDIA issues new drivers.
  • jimhans1 - Tuesday, February 19, 2013 - link

    Alright, the whining about this being a $1000 card is just stupid; nVidia has priced this right in my eyes on the performance/noise/temperature front, they have never billed this as being anything other than an Extreme style GPU, just like the 690, yes the 690 will outperform this in raw usage, but not by much I'm guessing, and it will run hotter, louder and use more power than the Titan, not to mention possible SLI issues that have plagued ALL SLI/CF on one PCB cards to date. If you want THE high end MAINSTREAM card, you get the 680, if you want the EXTREME card(s), you get the Titan or 690.

    Folks, we don't yell at Ferrari or Bugatti for pricing their vehicles to their performance capabilities; nobody yelled at Powercolor for pricing the Devil 13 at $1000 even though the 690 spanks it on ALMOST all fronts for $100 LESS.

    Yes, I wish I could afford 1 or 3 of the Titans; but I am not going to yell and whine about the $1000 price because I CAN'T afford them, it gives me a goal to try and save my shekels to get at least 2 of them before year's end, hopefully the price may (but probably won't) have dropped by then.
  • chizow - Tuesday, February 19, 2013 - link

    The problem with your car analogy is that Nvidia is now charging you Bugatti prices for the same BMW series you bought 2 years ago. Maybe an M3 level of trim this time around, but it's the same class of car, just 2x the price.
  • Sandcat - Wednesday, February 20, 2013 - link

    The high end 28nm cards have all been exercises in gouging. At least they're being consistent with the 'f*ck the customer because we have a duopoly' theme.
  • Kevin G - Tuesday, February 19, 2013 - link

    The card is indeed a luxury product. Like all consumer cards, this is crippled in some way compared to the Quadro and Tesla lines. Not castrating FP64 performance is big. I guess nVidia finally realized that the HPC market values reliability more than raw compute and hence why EDC/ECC is disabled. Ditto for RDMA, though I strongly suspect that RDMA is still used for SLI between GeForce cards - just a lock out to another vendor's hardware.

    The disabling of GPU Boost for FP64 workloads is odd. Naturally it should consume a bit more energy to do FP64 workloads, which would result in either higher temps at the same frequency as FP32 or lower clocks at the same temperatures. The surprise is that users don't have the flexibility to choose or adjust those settings.

    Display overclocking has me wondering exactly what is being altered. DVI and DP operate at distinct frequencies and moving to a higher refresh rate at higher resolutions should also increase this. Cable quality would potentially have an impact here as well. Though for lower resolutions, driving them at a higher refresh rate should still be within the cabling spec.
  • Kepe - Tuesday, February 19, 2013 - link

    The comment section is filled with NVIDIA hate, on how they dropped the ball, lost their heads, smoked too much and so on. What you don't seem to understand is that this is not a mainstream product. It's not meant for those who look at performance/$ charts when buying their graphics cards. This thing is meant for those who have too much money on their hands. Not the average Joe building his next gaming rig. And as such, this is a valid product at a valid price point. A bit like the X-series Intel processors. If you look at the performance compared to their more regular products the 1000+ dollar price is completely ridiculous.

    You could also compare the GTX Titan to a luxury phone. They use extravagant building materials, charge a lot of extra for the design and "bling", but raw performance isn't on the level of what you'd expect by just looking at the price tag.
  • jimhans1 - Tuesday, February 19, 2013 - link

    I agree, the pricing is in line with the EXPECTED user base for the card; it is NOT a mainstream card.
  • Sandcat - Tuesday, February 19, 2013 - link

    The disconnect regards the Gx110 chip. Sure, it's a non-mainstream card, however people do have the impression that it is the lock-step successor to the 580, and as such should be priced similarly.

    Nvidia does need to be careful here, they enjoy a duopoly in the market but goodwill is hard to create and maintain. I've been waiting for the 'real' successor to the 580 to replace my xfire 5850's and wasn't impressed with the performance increase of the 680. Looks like it'll be another year....at least.

    :(
