Titan For Compute

Titan, as we briefly mentioned before, is not just a consumer graphics card. It is also a compute card, and will essentially serve as NVIDIA’s entry-level compute product for both the consumer and prosumer markets.

The key enabler for this is that Titan, unlike any consumer GeForce card before it, will feature full FP64 performance, allowing GK110’s FP64 potency to shine through. Previous NVIDIA cards either had very few FP64 CUDA cores (GTX 680) or artificial FP64 performance restrictions (GTX 580), in order to maintain the market segmentation between cheap GeForce cards and more expensive Quadro and Tesla cards. NVIDIA will still be maintaining this segmentation, but in new ways.

NVIDIA GPU Comparison
  Fermi GF100 Fermi GF104 Kepler GK104 Kepler GK110
Compute Capability 2.0 2.1 3.0 3.5
Threads/Warp 32 32 32 32
Max Warps/SM(X) 48 48 64 64
Max Threads/SM(X) 1536 1536 2048 2048
Register File (per SM(X)) 32,768 32,768 65,536 65,536
Max Registers/Thread 63 63 63 255
Shared Mem Config 16K
48K
16K
48K
16K
32K
48K
16K
32K
48K
Hyper-Q No No No Yes
Dynamic Parallelism No No No Yes
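
The compute capability row is what CUDA software keys off of at runtime. As a minimal sketch of how an application would check it (our own example using the standard CUDA runtime API, not NVIDIA sample code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0

    // GK110 reports compute capability 3.5; Hyper-Q and dynamic
    // parallelism both require 3.5 or better.
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("registers/SM: %d, max threads/SM: %d\n",
           prop.regsPerMultiprocessor, prop.maxThreadsPerMultiProcessor);
    return 0;
}
```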

We’ve covered GK110’s compute features in-depth in our look at Tesla K20, so we won’t go into great detail here. But as a reminder, along with beefing up their functional unit counts relative to GF100, GK110 brings several feature improvements to further improve compute efficiency and the resulting performance. Relative to the GK104-based GTX 680, Titan offers a much greater number of registers per thread (255), not to mention a number of new instructions such as the shuffle instructions that allow intra-warp data sharing. But most of all, Titan brings with it NVIDIA’s marquee Kepler compute features: Hyper-Q and Dynamic Parallelism, which respectively allow for a greater number of hardware work queues and for kernels to dispatch other kernels.
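
To make those last two features a bit more concrete, here is a minimal CUDA sketch of both, with hypothetical kernel names of our own. The warp reduction uses the shuffle instructions (spelled __shfl_down_sync since CUDA 9; Kepler-era code used __shfl_down), and the device-side launch requires compute capability 3.5 and compiling with -rdc=true:

```cuda
#include <cstdio>

// Shuffle-based intra-warp data sharing: each warp sums its 32 values
// without staging anything through shared memory. Assumes blockDim.x
// is a multiple of 32 and out has one slot per warp.
__global__ void warpSum(const float *in, float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (idx < n) ? in[idx] : 0.0f;

    // Each step pulls a value from the lane 'offset' positions down;
    // after five steps, lane 0 holds the warp's total.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffff, v, offset);

    if ((threadIdx.x & 31) == 0)   // one result per warp
        out[idx >> 5] = v;
}

// Dynamic parallelism: a kernel dispatching another kernel directly
// from the GPU, with no round trip through the host (GK110/CC 3.5+).
__global__ void childKernel(int item)
{
    printf("child processing item %d\n", item);
}

__global__ void parentKernel(int items)
{
    if (threadIdx.x == 0)
        childKernel<<<1, 1>>>(items);   // device-side kernel launch
}
```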

With that said, there is a catch. NVIDIA has stripped GK110 of some of its reliability and scalability features in order to maintain the Tesla/GeForce market segmentation, which means Titan for compute is best left for small-scale workloads that don’t require Tesla’s greater reliability. ECC memory protection is of course gone, but also gone are Hyper-Q’s MPI functionality and GPU Direct’s RDMA functionality (DMA between the GPU and 3rd party PCIe devices). Other than ECC these are much more market-specific features, so while their removal effectively locks Titan out of highly distributed scenarios, it should be fine for smaller workloads.

There is one other quirk to Titan’s FP64 implementation, however, and that is that it needs to be enabled (or rather, uncapped). By default Titan is actually restricted to 1/24 FP32 performance, like the GTX 680 before it. Capping FP64 in this manner allows NVIDIA to keep clockspeeds higher and power consumption lower, knowing the apparently power-hungry FP64 CUDA cores can’t run at full load on top of all of the other functional units that can be active at the same time. Consequently NVIDIA makes FP64 an enable/disable option in their control panel, controlling whether FP64 operates at full speed (1/3 FP32) or reduced speed (1/24 FP32).

The penalty for enabling full-speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests itself as GPU Boost being disabled, forcing our card to run at 837MHz (or lower) at all times. And while we haven’t seen it first-hand, NVIDIA tells us that in particularly TDP-constrained situations Titan can drop below the base clock to as low as 725MHz. This is why NVIDIA’s official compute performance figures are 4.5 TFLOPS for FP32 but only 1.3 TFLOPS for FP64: the former is calculated around the base clockspeed, while the latter is calculated around the worst-case clockspeed of 725MHz. The actual execution rate is still 1/3.
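
Running the numbers as a quick sanity check (Titan has 2688 FP32 CUDA cores and 896 FP64 CUDA cores, exactly 1/3 as many, with each core able to execute one FMA, i.e. two floating point operations, per clock):

2688 cores × 2 FLOPs × 0.837GHz ≈ 4.5 TFLOPS (FP32, base clock)
896 cores × 2 FLOPs × 0.725GHz ≈ 1.3 TFLOPS (FP64, worst-case clock)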

Unfortunately there’s not much else we can say about compute performance at this time, as to go much farther than this requires being able to reference specific performance figures. So we’ll follow this up on Thursday with those figures and a performance analysis.

Comments

  • TheJian - Wednesday, February 20, 2013 - link

    http://www.guru3d.com/articles-pages/geforce_gtx_t...
    1176MHz from 876MHz (boost). Not bad: basically a $2500 K20 for $1000. I've never done homework on it, but I don't think K20s overclock, but I could be wrong.

    Can't wait to see the review tomorrow. Clearly he'll bench it there and he has 3 :) You should get your answers then :)

    I'm wondering if some hacker will enable the K20 drivers, or if that's possible. It seems a lot of reviewers got 3, so you should have lots of data by the weekend.
  • Bill Brasky - Tuesday, February 19, 2013 - link

    There were rumors this card would launch at $799-899, which made more sense. But for a grand, this thing better be pretty darn close to a 690.
  • wand3r3r - Tuesday, February 19, 2013 - link

    The price tag just makes this card a failure. It's a 580 replacement no matter how they label it, so they can shove it. They lost a potential customer...
  • karasaj - Tuesday, February 19, 2013 - link

    So what is the 680?
  • Sandcat - Tuesday, February 19, 2013 - link

    A GK104, which replaced the GF104.

    The GK110 is the replacement for the GF110, which was the GTX 580.
  • Ananke - Tuesday, February 19, 2013 - link

    The 680 was meant as a 560 Ti replacement... however, NVIDIA decided it turned out too good to be sold too cheap, and changed the model numbering... I have several close friends in marketing at NV :)
    However, NV has used this GK110 core for HPC from the very beginning in the Quadro cards, since there they really cannot skimp on the double precision.
  • CeriseCogburn - Sunday, February 24, 2013 - link

    BS.
    The 680 core is entirely different, the rollout time is over half a year off, and that just doesn't happen on a whim in Jan 2012 with the post-mental-breakdown purported 7970 epic failure by AMD...

    So after the scat lickers spew the 7970 AMD failure, they claim it's the best card ever, even now.
    R O F L

    Have it both ways rumor mongering retreads. No one will notice... ( certainly none of you do).
  • rolodomo - Tuesday, February 19, 2013 - link

    Their business model has become separating money from the wallets of the well-to-do who have no sense of value or technology (NVIDIA's PR admits this in the article). It is a business model, but a boutique one. Doesn't do much for their brand name in the view of the technorati either (NVIDIA: We Market to Suckers).
  • Wreckage - Tuesday, February 19, 2013 - link

    It's almost as fast as a pair of 7970s that cost $1100 at launch.

    AMD set the bar on high prices. Now that they are out of the GPU race, don't expect much to change.

    At least NVIDIA was able to bring a major performance increase this year, while AMD has become the new Matrox.
  • Stuka87 - Tuesday, February 19, 2013 - link

    AMD is out of the GPU race? What are you smoking? A $1000 card does not put AMD out of the GPU race. The 7970GE competes well with the 680 for less money (they go back and forth depending on the game).

    Now if this card was priced at $500, then that would hurt AMD, as the prices on the 660/670/680 would all drop. But it's not the case, so your point is moot. Not to mention this card was due out a year ago and got delayed, which is why the GK104 was bumped up to the 680 slot.
