Titan For Compute

Titan, as we briefly mentioned before, is not just a consumer graphics card. It is also a compute card and will essentially serve as NVIDIA’s entry-level compute product for both the consumer and pro-sumer markets.

The key enabler for this is that Titan, unlike any consumer GeForce card before it, will feature full FP64 performance, allowing GK110’s FP64 potency to shine through. Previous NVIDIA cards either had very few FP64 CUDA cores (GTX 680) or artificial FP64 performance restrictions (GTX 580), in order to maintain the market segmentation between cheap GeForce cards and more expensive Quadro and Tesla cards. NVIDIA will still be maintaining this segmentation, but in new ways.

NVIDIA GPU Comparison
  Fermi GF100 Fermi GF104 Kepler GK104 Kepler GK110
Compute Capability 2.0 2.1 3.0 3.5
Threads/Warp 32 32 32 32
Max Warps/SM(X) 48 48 64 64
Max Threads/SM(X) 1536 1536 2048 2048
Register File/SM(X) 32,768 32,768 65,536 65,536
Max Registers/Thread 63 63 63 255
Shared Mem Config 16K/48K 16K/48K 16K/32K/48K 16K/32K/48K
Hyper-Q No No No Yes
Dynamic Parallelism No No No Yes

We’ve covered GK110’s compute features in-depth in our look at Tesla K20, so we won’t go into great detail here. But as a reminder, along with beefing up its functional unit counts relative to GF100, GK110 brings several feature improvements that further boost compute efficiency and the resulting performance. Relative to the GK104-based GTX 680, Titan offers a much greater number of registers per thread (255), not to mention a number of new instructions such as the shuffle instructions that allow intra-warp data sharing. But most of all, Titan brings with it NVIDIA’s marquee Kepler compute features: Hyper-Q and Dynamic Parallelism, which allow for a greater number of hardware work queues and for kernels to dispatch other kernels, respectively.
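
To make the shuffle instructions concrete, here is a minimal sketch (our own illustration, not NVIDIA code) of a per-warp sum that keeps all of the data movement in registers instead of bouncing through shared memory. It is written against the modern __shfl_down_sync intrinsic; Kepler-era toolkits exposed the same operation as the unsynchronized __shfl_down.

```cuda
// warp_sum.cu -- illustrative sketch of intra-warp data sharing via shuffle.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void warp_sum(const float *in, float *out)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;          // lane index within the warp
    float val = in[tid];

    // Tree reduction: after 5 shuffle steps lane 0 holds the warp's sum.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);

    if (lane == 0)
        out[tid / 32] = val;              // one result per warp
}

int main()
{
    const int n = 256;                    // 8 warps' worth of data
    float h_in[n], h_out[n / 32];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;   // every warp should sum to 32

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, (n / 32) * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    warp_sum<<<1, n>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, (n / 32) * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n / 32; ++i) printf("warp %d sum: %.0f\n", i, h_out[i]);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

On pre-Kepler hardware the same reduction would have required a shared memory staging buffer and a synchronization step, which is exactly the overhead the shuffle instructions remove.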

With that said, there is a catch. NVIDIA has stripped GK110 of some of its reliability and scalability features in order to maintain the Tesla/GeForce market segmentation, which means Titan as a compute card is left to small-scale workloads that don’t require Tesla’s greater reliability. ECC memory protection is of course gone, but so are Hyper-Q’s MPI functionality and GPUDirect’s RDMA functionality (DMA between the GPU and third-party PCIe devices). Other than ECC these are much more market-specific features, so while Titan is effectively locked out of highly distributed scenarios, it should be fine for smaller workloads.

There is one other quirk to Titan’s FP64 implementation, however, and that is that it needs to be enabled (or rather, uncapped). By default Titan is restricted to 1/24 FP64 performance, like the GTX 680 before it. Capping FP64 by default allows NVIDIA to keep clockspeeds higher and power consumption lower, knowing the apparently power-hungry FP64 CUDA cores can’t run at full load on top of all of the other functional units that can be active at the same time. Consequently NVIDIA exposes FP64 as an enable/disable option in their control panel, controlling whether FP64 operates at full speed (1/3 FP32) or reduced speed (1/24 FP32).
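
For anyone curious which mode their card is actually running in, a rough way to check is to compare FP32 and FP64 FMA throughput directly. The sketch below is our own illustration (kernel and variable names are hypothetical, and it makes no attempt at rigorous benchmarking): at the full 1/3 rate the double-precision pass should take roughly three times as long as the single-precision pass, while at the default 1/24 cap the gap is far larger.

```cuda
// fp64_mode_check.cu -- rough sketch for inferring the active FP64 rate.
#include <cstdio>
#include <cuda_runtime.h>

template <typename T>
__global__ void fma_loop(T *out, T seed, int iters)
{
    T a = seed + threadIdx.x;   // per-thread value so the loop can't be folded away
    T b = (T)1.000001;
    T c = (T)0.000001;
    for (int i = 0; i < iters; ++i)
        a = a * b + c;          // one fused multiply-add per iteration
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;  // keep the result live
}

template <typename T>
static float time_kernel(int iters)
{
    const int blocks = 1024, threads = 256;
    T *out;
    cudaMalloc(&out, blocks * threads * sizeof(T));
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    fma_loop<T><<<blocks, threads>>>(out, (T)1.0, iters);  // warm-up launch
    cudaEventRecord(start);
    fma_loop<T><<<blocks, threads>>>(out, (T)1.0, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(out);
    return ms;
}

int main()
{
    const int iters = 1 << 20;
    float ms32 = time_kernel<float>(iters);
    float ms64 = time_kernel<double>(iters);
    // ~3x slower FP64 suggests the full-speed 1/3 mode; a far larger gap
    // suggests the default 1/24 cap is still in place.
    printf("FP32: %.1f ms  FP64: %.1f ms  ratio: %.1fx\n", ms32, ms64, ms64 / ms32);
    return 0;
}
```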

The penalty for enabling full-speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests itself as GPU Boost being disabled, forcing our card to run at 837MHz (or lower) at all times. And while we haven't seen it first-hand, NVIDIA tells us that in particularly TDP-constrained situations Titan can drop below the base clock, to as low as 725MHz. This is why NVIDIA’s official compute performance figures are 4.5 TFLOPS for FP32 but only 1.3 TFLOPS for FP64: the former is calculated around the base clock, while the latter is calculated around the worst-case clockspeed of 725MHz. The actual execution rate is still 1/3.
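
Those official figures are easy to reproduce with back-of-the-envelope math, assuming Titan’s 2688 CUDA cores and 2 FLOPs per core per clock from fused multiply-adds. The snippet below is just that arithmetic codified, using the clockspeeds quoted above.

```cuda
// peak_flops.cu -- back-of-the-envelope check of NVIDIA's quoted numbers
// (figures taken from this article, not queried from the driver).
#include <cstdio>

int main()
{
    const double cuda_cores = 2688.0;     // GK110 as shipped on Titan: 14 SMX x 192
    const double base_mhz   = 837.0;      // base clock (GPU Boost disabled in FP64 mode)
    const double fp64_mhz   = 725.0;      // worst-case clock NVIDIA quotes for FP64
    const double fp64_rate  = 1.0 / 3.0;  // FP64 throughput relative to FP32

    // 2 FLOPs per core per clock (one fused multiply-add)
    double fp32_tflops = cuda_cores * 2.0 * base_mhz * 1e6 / 1e12;
    double fp64_tflops = cuda_cores * fp64_rate * 2.0 * fp64_mhz * 1e6 / 1e12;

    printf("FP32 peak: %.1f TFLOPS\n", fp32_tflops);  // ~4.5 TFLOPS
    printf("FP64 peak: %.1f TFLOPS\n", fp64_tflops);  // ~1.3 TFLOPS
    return 0;
}
```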

Unfortunately there’s not much else we can say about compute performance at this time, as going much further would require referencing specific performance figures. So we’ll follow this up on Thursday with those figures and a performance analysis.

157 Comments

  • CeriseCogburn - Monday, March 4, 2013 - link

    lol - DREAM ON about goodwill and maintaining it.

    nVidia is attacked just like Intel, only worse. They have the least amount of "goodwill" any company could possibly have, as characterized by the dunderheads all over the boards and the also whining reviewers who cannot stand the "arrogant know it all confident winners who make so much more money playing games as an nVidia rep"...

    Your theory is total crap.

    What completely overrides it is the simple IT JUST WORKS nVidia tech and end user experience.
    Add in the multiplied and many extra features and benefits, and that equals the money in the bank that lets the end user rest easy that new games won't become an abandoned black holed screen.

    Reputation ? The REAL reputation is what counts, not some smarmy internet crybaby loser with lower self esteem than a confident winner with SOLID products, the BEST of the industry.
    That's arrogance, that's a winner, that's a know it all, that's Mr. Confidence, that's the ca$h and carry ladies magnet, and that's what someone for the crybaby underdog loser crash crapster company cannot stand.
  • Galvin - Tuesday, February 19, 2013 - link

    Can this card do 10bit video or still limited to 8bit?
  • alpha754293 - Tuesday, February 19, 2013 - link

    Does this mean that Tesla-enabled applications will be able to make use of Titan?
  • Ryan Smith - Tuesday, February 19, 2013 - link

    It depends on what features you're trying to use. From a fundamental standpoint even the lowly GT 640 supports the baseline Kepler family features, including FP64.
  • Ankarah - Tuesday, February 19, 2013 - link

    Highly unusual for a company to have two of their products at the exact same price point, catering to pretty much the same target audience.

    I guess it could be viewed as a poor-man's-Tesla but as far as the gaming side goes, it's quite pointless next to the 690, not to mention very confusing to anyone other than those who are completely up-to-date on the latest news stories.
  • CeriseCogburn - Monday, March 4, 2013 - link

    Let's see, single GPU core fastest in the gaming world, much lower wattage, no need for profiles, constant FPS improvement - never the same or no scaling issues across all games, and you find it strange ?

    I find your complete lack of understanding inexcusable since you opened the piehole and removed all doubt.
  • Voidman - Tuesday, February 19, 2013 - link

    Finally something I could be excited about. I have a hard time caring much about the latest smart phone or tablet. A new high end video card though is something different altogether. And then it turns out to be a "luxury product" and priced at 1k. Cancel excitement. Oh well, I'm happy with my 680 still, and I'm pretty sure I've still got overclocking room on it to boot. But for all those that love to hate on either AMD or Nvidia, this is what happens when one is not pushing the other. I have no doubt whatsoever that AMD would do the same if they were on top at the moment.

  • HanakoIkezawa - Tuesday, February 19, 2013 - link

    The price is a bit disappointing but not unexpected. I was hoping this would be 750-850, not so I could buy one but so that I could get a second 670 for a bit cheaper :D

    But in all seriousness, this coming out does not make the 680 or 670 any slower or less impressive. In the same way the 3970x's price tag doesn't make the 3930k any less of a compelling option.
  • johnsmith9875 - Tuesday, February 19, 2013 - link

    Why not just make the video card the computer and let the intel chip handle graphics???
  • Breit - Tuesday, February 19, 2013 - link

    Thanks Ryan, this made my day! :)

    Looking forward to part 2...
