Titan For Compute

Titan, as we briefly mentioned before, is not just a consumer graphics card. It is also a compute card and will essentially serve as NVIDIA’s entry-level compute product for both the consumer and pro-sumer markets.

The key enabler for this is that Titan, unlike any consumer GeForce card before it, will feature full FP64 performance, allowing GK110’s FP64 potency to shine through. Previous NVIDIA cards either had very few FP64 CUDA cores (GTX 680) or artificial FP64 performance restrictions (GTX 580), in order to maintain the market segmentation between cheap GeForce cards and more expensive Quadro and Tesla cards. NVIDIA will still be maintaining this segmentation, but in new ways.

NVIDIA GPU Comparison
  Fermi GF100 Fermi GF104 Kepler GK104 Kepler GK110
Compute Capability 2.0 2.1 3.0 3.5
Threads/Warp 32 32 32 32
Max Warps/SM(X) 48 48 64 64
Max Threads/SM(X) 1536 1536 2048 2048
Register File 32,768 32,768 65,536 65,536
Max Registers/Thread 63 63 63 255
Shared Mem Config 16K
48K
16K
48K
16K
32K
48K
16K
32K
48K
Hyper-Q No No No Yes
Dynamic Parallelism No No No Yes

We’ve covered GK110’s compute features in-depth in our look at Tesla K20, so we won’t go into great detail here. As a reminder, along with beefing up its functional unit counts relative to GF100, GK110 includes several feature improvements that further improve compute efficiency and the resulting performance. Relative to the GK104-based GTX 680, Titan brings with it a much greater number of registers per thread (255), not to mention new instructions such as the shuffle instructions that allow intra-warp data sharing. But most of all, Titan brings with it NVIDIA’s marquee Kepler compute features: Hyper-Q and Dynamic Parallelism, which allow for a greater number of hardware work queues and for kernels to dispatch other kernels, respectively.
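
To make those features concrete, here’s a minimal CUDA sketch of our own (not NVIDIA sample code) exercising the two additions called out above: a warp shuffle reduction for intra-warp data sharing, and a parent kernel dispatching a child kernel via Dynamic Parallelism. It assumes a Titan-era toolkit (CUDA 5.x), hence the original __shfl_down intrinsic rather than the later _sync variants, and it has to be built with nvcc -arch=sm_35 -rdc=true -lcudadevrt:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel, launched from the GPU itself: Dynamic Parallelism (CC 3.5+).
__global__ void child(int parentBlock) {
    printf("child grid spawned by parent block %d\n", parentBlock);
}

// Parent kernel: dispatches child grids without a round trip to the CPU.
__global__ void parent() {
    if (threadIdx.x == 0)
        child<<<1, 1>>>(blockIdx.x);
}

// Warp-level sum using Kepler's shuffle instructions: lanes read each
// other's registers directly, with no shared-memory staging.
__global__ void warpSum(float *out) {
    float v = (float)threadIdx.x;        // stand-in for real per-thread data
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down(v, offset);     // tree reduction across the 32 lanes
    if (threadIdx.x == 0) *out = v;      // lane 0 now holds 0+1+...+31 = 496
}

int main() {
    float *out;
    cudaMalloc(&out, sizeof(float));
    parent<<<2, 32>>>();
    warpSum<<<1, 32>>>(out);
    cudaDeviceSynchronize();             // flush device-side printf output
    float sum = 0.0f;
    cudaMemcpy(&sum, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp sum = %.0f\n", sum);
    cudaFree(out);
    return 0;
}
```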

With that said, there is a catch. NVIDIA has stripped GK110 of some of its reliability and scalability features in order to maintain the Tesla/GeForce market segmentation, which means Titan as a compute card is left to small-scale workloads that don’t require Tesla’s greater reliability. ECC memory protection is of course gone, but so are Hyper-Q’s MPI functionality and GPU Direct’s RDMA functionality (DMA between the GPU and 3rd party PCIe devices). ECC aside, these are market-specific features, so while Titan is effectively locked out of highly distributed scenarios, smaller workloads should be unaffected.
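
For everything that remains, the CUDA runtime’s standard device query is a quick way to confirm what a given board exposes. A minimal sketch of our own, using only the stock cudaGetDeviceProperties call; on Titan the ECC field should come back disabled, where a Tesla K20 can report it enabled:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0
    printf("%s: compute capability %d.%d, ECC %s\n",
           prop.name, prop.major, prop.minor,
           prop.ECCEnabled ? "enabled" : "disabled");
    return 0;
}
```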

There is one other quirk to Titan’s FP64 implementation, however, and that is that it needs to be enabled (or rather, uncapped). By default Titan is restricted to 1/24 FP64 performance, like the GTX 680 before it. Capping FP64 in this fashion allows NVIDIA to keep clockspeeds higher and power consumption lower, since the apparently power-hungry FP64 CUDA cores can’t run at full load on top of all of the other functional units that can be active at the same time. Consequently NVIDIA makes FP64 an enable/disable option in their control panel, controlling whether FP64 operates at full speed (1/3 FP32) or reduced speed (1/24 FP32).

The penalty for enabling full speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests itself as GPU Boost being disabled, forcing our card to run at 837MHz (or lower) at all times. And while we haven't seen it first-hand, NVIDIA tells us that in particularly TDP constrained situations Titan can drop below the base clock to as low as 725MHz. This is why NVIDIA’s official compute performance figures are 4.5 TFLOPS for FP32, but only 1.3 TFLOPS for FP64. The former is calculated around the base clock speed, while the latter is calculated around the worst case clockspeed of 725MHz. The actual execution rate is still 1/3.
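
Those official numbers also check out against the published specs. Assuming Titan’s 2688 FP32 CUDA cores, the 1/3 FP64 ratio (896 FP64 cores), and 2 FLOPs per core per clock for an FMA, a few lines of host code reproduce NVIDIA’s figures:

```cuda
#include <cstdio>

int main() {
    const double fp32Cores   = 2688.0;          // 14 SMXes x 192 cores
    const double fp64Cores   = fp32Cores / 3.0; // 1/3 rate = 896 FP64 cores
    const double baseClock   = 837e6;           // base clock, Hz
    const double worstClock  = 725e6;           // worst-case FP64-mode clock, Hz
    const double flopsPerFMA = 2.0;             // multiply + add per clock

    printf("FP32: %.2f TFLOPS\n", fp32Cores * flopsPerFMA * baseClock  / 1e12); // ~4.50
    printf("FP64: %.2f TFLOPS\n", fp64Cores * flopsPerFMA * worstClock / 1e12); // ~1.30
    return 0;
}
```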

Unfortunately there’s not much else we can say about compute performance at this time, as to go much farther than this requires being able to reference specific performance figures. So we’ll follow this up on Thursday with those figures and a performance analysis.

Comments

  • AeroJoe - Wednesday, February 20, 2013 - link

    Very good article - but now I'm confused. If I'm building an Adobe workstation to handle video and graphics, do I want a TITAN for $999 or the Quadro K5000 for $1700? Both are Kepler, but TITAN looks like more bang for the buck. What am I missing?
  • Rayb - Wednesday, February 20, 2013 - link

    The extra money you are paying is for certified driver support in commercial applications like Adobe CS6 with a Quadro card, vs. a non-certified card.
  • mdrejhon - Wednesday, February 20, 2013 - link

    Excellent! GeForce Titan will make it much easier to overclock an HDTV set to 120 Hz
    ( http://www.blurbusters.com/zero-motion-blur/hdtv-r... )

    Some HDTVs, such as the Vizio e3d420vx, can be successfully “overclocked” to a 120 Hz native PC signal from a computer. This was difficult because an EDID override was necessary. However, the GeForce Titan should make this a piece of cake!
  • Blazorthon - Wednesday, February 20, 2013 - link

    Purely as a gaming card, Titan is obviously way too overpriced to be worth considering. However, its compute performance is intriguing. It can't totally replace a Quadro or Tesla, but there are still many compute workloads that don't need those extremely expensive extras such as ECC and Quadro/Tesla drivers. Many of them may be better suited to a Tahiti card's far better value, but CUDA workloads may find Titan to be the first card to truly succeed the GF100/GF110 based cards as a gaming and compute-oriented card. Although like I said, I think the price could still be at least somewhat lower. I understand it not being around $500 like GF100/110 launched at for various reasons, but come on, at most give us an around $700-750 price...
  • just4U - Thursday, February 21, 2013 - link

    Someone here stated that AMD is at fault for pricing their 7x series so high last year. Perhaps many were disappointed with the $550 price range, but that's still somewhat lower than previously released Nvidia products through the years. Several of those cards (at various price points) handily beat the 580 (which btw never did get much of a price drop), and at the time that's what it was competing against.

    So I can't quite connect the dots on why they are saying it's AMD's fault for originally pricing the 7x series so high when in reality it was still lower than newly released Nvidia product over the past several years.
  • CeriseCogburn - Monday, March 4, 2013 - link

    For the most part, correct.
    The 7970 came out at $579 though, not $550. And it was nearly absent from the market for many months, until just the day before the 680's $499 launch.

    In any case, ALL these cards drop in price over the first six months or so, EXCEPT sometimes, if they are especially fast, like the 580, they hold at the launch price, which it did, until the 7970 was launched - the 580 was $499 till the day the 7970 launched.

    So what we have here is the tampon express. The tampon express has not paid attention to anything but fps/price vs their revised and memory-holed history, so it will continue forever.

    They have completely ignored capital factors like the extreme lack of production space on the node, ongoing prior to the 7970 release and at emergency-low levels prior to the months-later 680 release, with the emergency board meeting and the multi-billion dollar borrowing and buildout for die space production expansion, not to mention the huge change in wafer payment terms, which went from paying per good die to paying per wafer, thus placing the burden of failures on the GPU company's side.

    It's not like they could have missed that; it was all over the place for months on end. The AMD fanboys were bragging that AMD got die space early, constantly hammering away at nVidia and calling them stupid for not having reserved space, and screaming they would go bankrupt from the low yields they had to pay for from the “housefires” dies.

    So what we have now is well trained (not potty trained) crybabies pooping their diapers over and over again, and let's face it, they do believe they have the power to lower the prices if they just whine loudly enough.

    AMD has been losing billions, and nVidia's profit ratio is 10%, but the crying babies' screams are meant to serve their own pocketbooks at any expense, including the demise of AMD, even though they all preach competition and personal CEO capitalist understanding, after they spew out 6th-grader information or even make MASSIVE market lies and mistakes with illiterate interpretations of standard articles, or completely blissful denial of things like die space (mentioned above) or long-standing standard industry tapeout times for producing the GPUs in question.

    They want to be “critical reporters” but they fail miserably at it, and merely show crybaby ignorance and therefore false outrage. At least they consider themselves “the good hipster!”
  • clickonflick - Thursday, March 7, 2013 - link

    I agree that the price of this GPU is really high; one could easily assemble a fully mainstream laptop or desktop online with Dell at this price tag. But for gamers, to whom performance is above price, it is a boon.
    for more pics check this out
    clickonflick/nvidia-geforce-gtx-titan
