Titan For Compute

Titan, as we briefly mentioned before, is not just a consumer graphics card. It is also a compute card and will essentially serve as NVIDIA’s entry-level compute product for both the consumer and pro-sumer markets.

The key enabler for this is that Titan, unlike any consumer GeForce card before it, will feature full FP64 performance, allowing GK110’s FP64 potency to shine through. Previous NVIDIA cards either had very few FP64 CUDA cores (GTX 680) or artificial FP64 performance restrictions (GTX 580), in order to maintain the market segmentation between cheap GeForce cards and more expensive Quadro and Tesla cards. NVIDIA will still be maintaining this segmentation, but in new ways.

NVIDIA GPU Comparison
                       Fermi GF100   Fermi GF104   Kepler GK104   Kepler GK110
Compute Capability     2.0           2.1           3.0            3.5
Threads/Warp           32            32            32             32
Max Warps/SM(X)        48            48            64             64
Max Threads/SM(X)      1536          1536          2048           2048
Register File          32,768        32,768        65,536         65,536
Max Registers/Thread   63            63            63             255
Shared Mem Config      16K/48K       16K/48K       16K/32K/48K    16K/32K/48K
Hyper-Q                No            No            No             Yes
Dynamic Parallelism    No            No            No             Yes

We’ve covered GK110’s compute features in depth in our look at Tesla K20, so we won’t go into great detail here, but as a reminder, along with beefing up its functional unit counts relative to GF100, GK110 adds several features that further improve compute efficiency and the resulting performance. Relative to the GK104-based GTX 680, Titan brings with it a much greater number of registers per thread (255), not to mention new instructions such as the shuffle instructions that allow intra-warp data sharing. But most of all, Titan brings with it NVIDIA’s marquee Kepler compute features: Hyper-Q and Dynamic Parallelism, which respectively allow for a greater number of hardware work queues and for kernels to dispatch other kernels.
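
To make this a bit more concrete, below is a minimal CUDA sketch (ours, not NVIDIA’s; the kernel and variable names are hypothetical) showing both ideas: a warp-level sum built on the shuffle instructions, and a parent kernel that uses dynamic parallelism to launch a child kernel without a round trip to the CPU. The dynamic parallelism half assumes a compute capability 3.5 part like GK110 and compilation with relocatable device code (nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt).

// Illustrative sketch only: warp shuffle reduction + a device-side kernel launch.
#include <cstdio>

// Child kernel: each thread squares one element in place.
__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= data[i];
}

// Parent kernel: sums its warp's values with shuffle instructions (no shared
// memory round trip), then thread 0 launches the child kernel from the device.
__global__ void parent(float *data, int n, float *warpSum)
{
    float v = (threadIdx.x < n) ? data[threadIdx.x] : 0.0f;

    // Intra-warp reduction: each step pulls a value from a lane 'offset' higher.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down(v, offset);          // __shfl_down_sync on CUDA 9+

    if (threadIdx.x == 0) {
        *warpSum = v;
        // Dynamic parallelism: dispatch another kernel without CPU involvement.
        square<<<(n + 31) / 32, 32>>>(data, n);
    }
}

int main()
{
    const int n = 32;
    float h[n], *d, *dSum, hSum = 0.0f;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaMalloc(&d, n * sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    parent<<<1, 32>>>(d, n, dSum);
    cudaDeviceSynchronize();

    cudaMemcpy(&hSum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp sum = %f\n", hSum);          // expect 32.0

    cudaFree(d);
    cudaFree(dSum);
    return 0;
}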

With that said, there is a catch. NVIDIA has stripped GK110 of some of its reliability and scalability features in order to maintain the Tesla/GeForce market segmentation, which means Titan for compute is left for small-scale workloads that don’t require Tesla’s greater reliability. ECC memory protection is of course gone, but so are Hyper-Q’s MPI functionality and GPU Direct’s RDMA functionality (DMA between the GPU and 3rd party PCIe devices). Other than ECC these are much more market-specific features, and as such, while Titan is effectively locked out of highly distributed scenarios, it should be fine for smaller workloads.

There is one other quirk to Titan’s FP64 implementation, however, and that is that it needs to be enabled (or rather, uncapped). By default Titan is actually restricted to 1/24 FP32 performance, like the GTX 680 before it. Capping FP64 this way allows NVIDIA to keep clockspeeds higher and power consumption lower, knowing that the apparently power-hungry FP64 CUDA cores can’t run at full load on top of all of the other functional units that can be active at the same time. Consequently NVIDIA makes FP64 an enable/disable option in their control panel, controlling whether FP64 operates at full speed (1/3 FP32) or reduced speed (1/24 FP32).
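
Since the switch lives in the driver control panel rather than in CUDA itself, about all a program can do is sanity-check the board it lands on. The short sketch below is our own illustration, and it assumes (something we have not verified) that the clock reported by cudaGetDeviceProperties reflects the lowered, boost-less clock once full-speed FP64 is enabled:

// Illustrative sketch only: report the device's compute capability and clock.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    printf("Device:             %s\n", prop.name);
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    printf("Core clock:         %.0f MHz\n", prop.clockRate / 1000.0);  // kHz -> MHz
    return 0;
}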

The penalty for enabling full speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests itself as GPU Boost being disabled, forcing our card to run at 837MHz (or lower) at all times. And while we haven't seen it first-hand, NVIDIA tells us that in particularly TDP constrained situations Titan can drop below the base clock to as low as 725MHz. This is why NVIDIA’s official compute performance figures are 4.5 TFLOPS for FP32, but only 1.3 TFLOPS for FP64. The former is calculated around the base clock speed, while the latter is calculated around the worst case clockspeed of 725MHz. The actual execution rate is still 1/3.
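
For reference, those official figures fall straight out of the unit counts and clocks. Assuming Titan’s 2688 FP32 CUDA cores and 896 FP64 units (the 1/3 ratio) across its 14 SMXes, and counting a fused multiply-add as 2 FLOPs per unit per clock, a quick back-of-the-envelope check reproduces NVIDIA’s numbers:

// Back-of-the-envelope sketch; unit counts and clocks as assumed above.
#include <cstdio>

int main()
{
    const double fp32Cores = 2688.0;     // 14 SMX x 192
    const double fp64Cores = 896.0;      // 14 SMX x 64 (1/3 of FP32)
    const double baseClockGHz  = 0.837;  // base clock, GPU Boost disabled
    const double worstClockGHz = 0.725;  // TDP-constrained floor in FP64 mode

    // 2 FLOPs per core per clock (fused multiply-add).
    double fp32Tflops = fp32Cores * 2.0 * baseClockGHz  / 1000.0;
    double fp64Tflops = fp64Cores * 2.0 * worstClockGHz / 1000.0;

    printf("FP32: %.2f TFLOPS\n", fp32Tflops);  // ~4.50
    printf("FP64: %.2f TFLOPS\n", fp64Tflops);  // ~1.30
    return 0;
}

Run the FP64 calculation at the 837MHz base clock instead and the result is roughly 1.5 TFLOPS, which is why the quoted figure is built around the 725MHz worst case.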

Unfortunately there’s not much else we can say about compute performance at this time, as to go much farther than this requires being able to reference specific performance figures. So we’ll follow this up on Thursday with those figures and a performance analysis.

Comments

  • chizow - Friday, February 22, 2013 - link

    Um, GF100/110 are absolutely in the same league as this card. In the semiconductor industry, size = classification. This is not the first 500+mm^2 ASIC Nvidia has produced; the lineage is long and distinguished:

    G80, GT200, GT200b, GF100, GF110.

    *NONE* of these GPUs cost $1K; only the 8800 Ultra came anywhere close to it at $850. All of these GPUs offered similar features and performance relative to the competition and prevailing landscape. Hell, GT200 was even more impressive as it offered a 512-bit memory interface.

    The increase in the number of transistors is just Moore's law; that's just expected progress. If you don't know the material you're discussing, please refrain from commenting, thank you.
  • CeriseCogburn - Sunday, February 24, 2013 - link

    Wait a minute doofus, you said the memory cost the same, and it's cheap.
    You entirely disregarded the more-than-doubled core transistor footprint, the R&D for it, the yield factor, the high build quality, and the new and extra top tier it resides in, not to mention its awesome features the competition did not develop and does not have, AT ALL.
    4 monitors out of the box, single-card 3D and surround, extra monitor for surfing, target frame rate, TXAA, no tessellation lag, and on and on.

    Once a product breaks out far from the competition's underdeveloped and undeveloped failures, it EARNS a price tier.

    You're living in the past, you're living with the fantasy of zero worldwide inflation, you're living the lies you've told yourself and all of us about the last 3 top tier releases, all your arguments exposed in prior threads for the exaggerated lies they were and are, and the Charlie D RUMORS all of you of this same ilk repeat, even as you ignore the years-long dev time and entire lack of production capability with your tinfoil hat whine.

    The market has changed you fool. There was a SEVERE SHORTAGE in the manufacturing space (negating your conspiracy theory entirely) and still there's pressure, and nVidia has developed a large range of added features the competition is entirely absent upon.

    You didn't get the 680 for $350 (even though you still 100% believe Charlie D's lie filled rumor) and you're not getting this for your fantasy lie price either.
  • CeriseCogburn - Sunday, February 24, 2013 - link

    NONE had the same or much bigger die sizes.
    NONE had 7.1 BILLION engineer traced research die points.
    NONE had the potential downside low yield.
    NONE had the twice plus expensive ram in multiples more attached.

    NONE is the amount of truth you told.
  • Stuka87 - Tuesday, February 19, 2013 - link

    Common sense would say nVidia is charging double what they should be.

    384-bit memory is certainly not a reason for high cost, as AMD uses it in the 79x0 series chips. A large die adds to cost, but the 580 had a big die as well (520mm^2), so that can't be the whole reason for the high cost (the GK110 does have more transistors).

    So it comes down to nVidia wanted to scalp customers.

    As for your comments on AMD, what proof do you have that AMD has nothing else in the works? Not sure what crap you are referring to. I have had no issues with my AMD cards or their drivers (or my nVidias for that matter). Just keep on hating for no reason.
  • AssBall - Tuesday, February 19, 2013 - link

    You speak of common sense, but miss the point. When have you ever bought a consumer card for the pre-listed MSRP? These cards will sell to OEMs for compute and to enthusiasts via Nvidia's partners for much less.

    So it comes down to "derp Nvidia is a company that wants to make money derp".

    Calling someone a hater for unrealistic reasons is much less of an offense than being generally an idiot.
  • TheJian - Wednesday, February 20, 2013 - link

    A chip with 7.1B transistors is tougher to make correctly than 3B. Which card has 6GB of 6GHz memory from AMD that's $500 with this performance? The 7990 is $900-1000 with 6GB and is poorly engineered compared to this (nearly double the watts, two slots, more heat, etc.).

    This is why K20 costs $2500. They get far fewer of these perfect than much simpler chips. Also, as said before, engineering these is not free. AMD charges less, you say? Their bottom line for last year shows it too...1.18B loss. That's why AMD will have no answer until the end of the year. They can't afford to engineer an answer now. They just laid off 30% of their workforce because they couldn't afford them. NV hired 800 people last year for new projects. You do that with profits, not losses. You quit giving away free games or go out of business.

    Let me know when AMD turns a profit for a year. I guess you won't be happy until AMD is out of business. I think you're hating on NV for no reason. If they were anywhere near scalping customers they should have record PROFITS but they don't. Without Intel's lawsuit money (300mil a year) they'd be making ~1/2 of what they did in 2007. You do understand a company has to make money to stay in business correct?

    If NV charged 1/2 the price for this they would be losing probably a few hundred on each one rather than probably a $200 profit or so.

    K20 is basically the same card for $2500. You're lucky they're pricing it at $1000 for what you're getting. Amazon paid $2000 each for 10,000 of these as K20's. You think they feel robbed? So by your logic, they got scalped 20,000 times since they paid double the asking here with 10,000 of them?...ROFL. OK.

    What it comes down to is NV knows how to run a business, while AMD knows how to run one into the ground. AMD needs to stop listening to people like you and start acting like NV or they will die.

    AMD killed themselves the day they paid 3x the price they should have for ATI. Thank Hector Ruiz for that. He helped to ruin Motorola too if memory serves...LOL. I love AMD, love their stuff, but they run their business like idiots. Kind of like Obama runs the country. AMD is running a welfare business (should charge more, and overpays for stuff they shouldn't even buy), Obama runs a welfare country, and pays for crap like Solyndra etc. he shouldn't (with our money!). Both lead to credit downgrades and bankruptcy. You can't spend your way out of a Visa bill. But both AMD and Obama think you can. You have to PAY IT OFF. Like NV, no debt. Spend what you HAVE, not what you have to charge.

    Another example. IMG.L just paid triple what they should have for the scrap of MIPS. I think this will be their downfall. They borrowed 22million to pay a 100mil bid for MIPS. It was worth 30mil. This will prove to be Imagination's downfall. That, along with having chip sales up 90% but not charging Apple enough for them. They only made 30mil for 6 MONTHS! Their chip powers all of Apple's phone and tablet graphics! They have a Hector Ruiz type running their company too I guess. Hope they fire him before he turns them into AMD. Until Tegra4 they have the best GPU on an SoC in the market. But they make 1/10 of what NV does. Hmmm...Wrong pricing? Apple pockets 140Bil over the life of iPad/iPhone...But IMG.L had to borrow 22mil just to buy a 100mil company? They need to pull a Samsung and raise prices 20% on Apple. NV bought Icera with 325mil cash...Still has 3.74B in the bank (which btw is really only up from 2007 because of Intel's 300mil/yr, not overcharging you).
  • CeriseCogburn - Sunday, February 24, 2013 - link

    Appreciate it. Keep up the good work, as in telling the basic facts called the truth to the dysfunctional drones.

    no physx
    no cuda
    no frame rate target (this is freaking AWESOME, thanks nVidia)
    no "cool n quiet" on the fly GPU heat n power optimizing max fps
    no TXAA
    no same game day release drivers

    EPIC FAIL on dual cards, yes even today for amd

    " While it suffers from the requirement to have proper game-specific SLI profiles for optimum scaling, NVIDIA has done a very good job here in the past, and out of the 19 games in our test suite, SLI only fails in F1 2012. Compare that to 6 out of 19 failed titles with AMD CrossFire."

    http://www.techpowerup.com/reviews/NVIDIA/GeForce_...

    nVidia 18 of 19, 90%+ GRADE AAAAAAAAAA

    amd 13 of 19 < 70% grade DDDDDDDDDD
  • Iketh - Tuesday, February 19, 2013 - link

    please drag yourself into the street and stone yourself
  • CeriseCogburn - Sunday, February 24, 2013 - link

    LOL awww, now that wasn't very nice... May I assume you aren't in the USA and instead in some 3rd world hole with some 3rd world currency and economy where you can't pitch up a few bucks because there's no welfare available ? Thus your angry hate filled death wish ?
  • MrSpadge - Tuesday, February 19, 2013 - link

    Don't worry.. price will drop if they're actually in a hurry to sell them.
