GM200 - All Graphics, Hold The Double Precision

Before diving into our look at the GTX Titan X itself, I want to spend a bit of time talking about the GM200 GPU. GM200 is a very interesting GPU, and not for the usual reasons. In fact you could say that GM200 is remarkable for just how unremarkable it is.

From a semiconductor manufacturing standpoint we’re still at a standstill on 28nm for at least a little while longer, pushing 28nm into its 4th year and causing all sorts of knock-on effects. We’ve droned on about this for some time now, so we won’t repeat ourselves, but ultimately what it means for consumers is that AMD and NVIDIA have needed to make do with the tools they have, and in lieu of generational jumps in manufacturing have focused on architectural efficiency and wringing everything they can out of 28nm.

For NVIDIA those improvements came in the form of the company’s Maxwell architecture, where the company made a concentrated effort to focus on energy and architectural efficiency to get the most out of their technology. In assembling GM204 NVIDIA built the true successor to GK104, putting together a pure graphics chip. From a design standpoint NVIDIA spent their energy efficiency gains on growing out GM204’s die size without increasing power, allowing them to go from 294mm2 and 3.5B transistors to 398mm2 and 5.2B transistors. With a larger die and larger transistor budget, NVIDIA was able to greatly increase performance by laying down a larger number of high performance (and individually larger) Maxwell SMMs.
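For perspective, a quick back-of-the-envelope sketch of that growth, using only the figures quoted above (the percentages and densities are derived here, not NVIDIA’s numbers):

```python
# Back-of-the-envelope: GK104 -> GM204 growth on the same 28nm process.
# Die sizes (mm^2) and transistor counts (billions) are from the text above.
gk104 = {"die_mm2": 294, "transistors_b": 3.5}
gm204 = {"die_mm2": 398, "transistors_b": 5.2}

die_growth = gm204["die_mm2"] / gk104["die_mm2"] - 1               # ~35% larger die
xtor_growth = gm204["transistors_b"] / gk104["transistors_b"] - 1  # ~49% more transistors

# Density barely moves, as expected when the process node doesn't change.
density_gk104 = gk104["transistors_b"] * 1e3 / gk104["die_mm2"]    # ~11.9 MTr/mm^2
density_gm204 = gm204["transistors_b"] * 1e3 / gm204["die_mm2"]    # ~13.1 MTr/mm^2

print(f"Die growth: {die_growth:.0%}, transistor growth: {xtor_growth:.0%}")
print(f"Density: {density_gk104:.1f} vs {density_gm204:.1f} MTr/mm^2")
```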

On the other hand for GM206 and the GTX 960, NVIDIA banked the bulk of their energy savings, building what’s best described as half of a GM204. The result was a GPU that didn’t offer as large a jump in performance over its predecessor (GK106), but one that brought power usage down and kept costs in check.


Not Pictured: The 96 FP64 ALUs

But for Big Maxwell, neither option was open to NVIDIA. At 551mm2 GK110 was already a big GPU, so a large (~35%) increase in die size like GM204’s was not practical. Neither was holding the die at roughly the same area and building a Maxwell version of GK110, gaining only limited performance in the process. Instead NVIDIA has taken a third option, and this is what makes GM200 so interesting.

For GM200 NVIDIA’s path of choice has been to divorce graphics from high performance FP64 compute. Big Kepler was a graphics powerhouse in its own right, but it also spent quite a bit of die area on FP64 CUDA cores and some other compute-centric functionality. This allowed NVIDIA to use a single GPU across the entire spectrum – GeForce, Quadro, and Tesla – but it also meant that GK110 was a bit of a jack-of-all-trades. Consequently, when faced with another round of 28nm chips and intent on spending their Maxwell power savings on more graphics resources (à la GM204), NVIDIA built a big graphics GPU. Big Maxwell is not the successor to Big Kepler; rather, it’s a really (really) big version of GM204.

GM200 is 601mm2 of graphics, and this is what makes it remarkable. There are no special compute features here that only Tesla and Quadro users will tap into (save perhaps ECC); rather it really is GM204 with 50% more GPU. This means we’re looking at the same SMMs as on GM204, featuring 128 FP32 CUDA cores per SMM, a 256KB register file (4 x 64KB), and just 4 FP64 ALUs per SMM, leading to a puny native FP64 rate of just 1/32. As a result, all of the space in GK110 occupied by FP64 ALUs and other compute hardware – and NVIDIA won’t reveal quite how much space that was – has been reinvested in FP32 ALUs and other graphics-centric hardware.
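To make those ratios concrete, here’s a minimal sketch of the math, assuming GM200’s full complement of 24 SMMs (3072 CUDA cores ÷ 128 per SMM, per the table below) and an illustrative ~1GHz clock rather than an official specification:

```python
# A minimal sketch of where GM200's 1/32 native FP64 rate comes from,
# using the per-SMM figures quoted above.
FP32_PER_SMM = 128  # FP32 CUDA cores per SMM
FP64_PER_SMM = 4    # dedicated FP64 ALUs per SMM
SMM_COUNT    = 24   # 3072 total CUDA cores / 128 per SMM

fp32_lanes = FP32_PER_SMM * SMM_COUNT  # 3072
fp64_lanes = FP64_PER_SMM * SMM_COUNT  # 96 -- the "not pictured" ALUs
print(f"Native FP64 rate: 1/{fp32_lanes // fp64_lanes}")  # 1/32

# Theoretical throughput at an illustrative ~1.0GHz clock (not an official spec):
CLOCK_GHZ = 1.0
FLOPS_PER_LANE = 2  # a fused multiply-add counts as two FLOPs
print(f"FP32: ~{fp32_lanes * FLOPS_PER_LANE * CLOCK_GHZ / 1e3:.1f} TFLOPS")
print(f"FP64: ~{fp64_lanes * FLOPS_PER_LANE * CLOCK_GHZ:.0f} GFLOPS")
```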

NVIDIA Big GPUs
                     Die Size  Native FP64 Rate
GM200 (Big Maxwell)  601mm2    1/32
GK110 (Big Kepler)   551mm2    1/3
GF110 (Big Fermi)    520mm2    1/2
GT200 (Big Tesla)    576mm2    1/8
G80                  484mm2    N/A

It’s this graphics “purification” that has enabled NVIDIA to improve their performance over GK110 by 50% without increasing power consumption, and with only a moderate 50mm2 (9%) increase in die size. In fact in putting together GM200, NVIDIA has done something they haven’t done for years. The last flagship GPU from the company to dedicate this little space to FP64 was G80 – heart of the GeForce 8800 GTX – which in fact didn’t have any FP64 hardware at all. In other words, this is the “purest” flagship graphics GPU in 9 years.
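One more quick derivation sums up the trade (the 50% performance figure is the claim quoted above; the perf-per-mm2 gain follows from it):

```python
# The GK110 -> GM200 trade-off: +50% performance for +9% die area at the
# same power. Perf/mm^2 here is derived from those two figures, not measured.
gk110_mm2, gm200_mm2 = 551, 601
perf_gain = 1.50                      # the ~50% figure quoted above
area_gain = gm200_mm2 / gk110_mm2     # ~1.09x
perf_per_mm2 = perf_gain / area_gain  # ~1.38x
print(f"Area: +{area_gain - 1:.0%}, perf/mm^2: +{perf_per_mm2 - 1:.0%}")
```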

Now to be clear here, when we say GM200 favors graphics we don’t mean exclusively, but rather that it favors graphics and its associated FP32 math over FP64 math. GM200 is still an FP32 compute powerhouse unlike anything else in NVIDIA’s lineup, and we don’t expect it will be matched by anything else from NVIDIA for quite some time. For that reason I wouldn’t be too surprised if we see a Tesla card based on it aimed at FP32 users such as the oil & gas industry – something NVIDIA has done once before with the Tesla K10 – but you won’t be seeing GM200 in the successor to Tesla K40.

This is also why the GTX Titan X is arguably not a prosumer-level card like the original GTX Titan. NVIDIA shipped the original GTX Titan with its full 1/3 FP64 rate enabled, having it pull double duty as the company’s consumer graphics flagship while also serving as their entry-level FP64 card. For GTX Titan X this is not an option, since GM200 is not a high performance FP64 GPU; the card is riding only on its graphics and FP32 compute capabilities. That doesn’t mean NVIDIA won’t also try to pitch it as a high-performance FP32 card for users who don’t need Tesla, but it won’t be the same kind of entry-level compute card that the original GTX Titan was. In other words, GTX Titan X is much more consumer focused than the original GTX Titan.
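To illustrate just how wide that gap is, a rough sketch follows; the core counts and FP64 rates are the cards’ published specs, while the clocks are approximate base clocks, so treat the totals as ballpark figures:

```python
# Rough FP64 comparison: original GTX Titan (GK110) vs. GTX Titan X (GM200).
# Core counts and FP64 rates are published specs; the clocks below are
# approximate base clocks, so the outputs are ballpark figures only.
def fp64_tflops(fp32_cores, fp64_rate, clock_ghz):
    """Theoretical FP64 throughput, counting a fused multiply-add as 2 FLOPs."""
    return fp32_cores * fp64_rate * 2 * clock_ghz / 1e3

titan   = fp64_tflops(2688, 1 / 3,  0.837)  # ~1.5 TFLOPS, full-rate FP64 enabled
titan_x = fp64_tflops(3072, 1 / 32, 1.0)    # ~0.2 TFLOPS

print(f"GTX Titan:   ~{titan:.1f} TFLOPS FP64")
print(f"GTX Titan X: ~{titan_x:.1f} TFLOPS FP64")
```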


Tesla K80: The Only GK210 Card

Looking at the broader picture, I’m left to wonder if this is the start of a permanent divorce between graphics/FP32 compute and FP64 compute in the NVIDIA ecosystem. Until recently, NVIDIA has always piggybacked compute on their flagship GPUs as a means of bootstrapping the launch of the Tesla division. By putting compute in their flagship GPU, even if NVIDIA couldn’t sell those GPUs to compute customers they could sell them to GeForce/Quadro graphics customers. This limited the amount of total risk the company faced, as they’d never end up with a bunch of compute GPUs they could never sell.

However in the last 6 months we’ve seen a shift from NVIDIA at both ends of the spectrum. In November we saw the launch of the Tesla K80, a dual-GPU card featuring the GK210 GPU, a reworked version of GK110 that doubled the register file and shared memory sizes for better performance. GK210 would not come to GeForce or Quadro (though in theory it could have), making it the first compute-centric GPU from NVIDIA. And now with the launch of GM200 we have distinct graphics and compute GPUs from NVIDIA.

NVIDIA GPUs By Compute
                             GM200        GK210      GK110B
Stream Processors            3072         2880       2880
Memory Bus Width             384-bit      384-bit    384-bit
Register File Size (Per SM)  4 x 64KB     512KB      256KB
Shared Memory/L1 (Per SM)    96KB + 24KB  128KB      64KB
Transistor Count             8B           7.1B(?)    7.1B
Manufacturing Process        TSMC 28nm    TSMC 28nm  TSMC 28nm
Architecture                 Maxwell      Kepler     Kepler
Tesla Products               None         K80        K40
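The register file doubling is easiest to appreciate on a per-thread basis. A quick sketch, assuming Kepler’s published CUDA occupancy ceiling of 2048 resident threads per SM (a figure not stated in the table above):

```python
# What GK210's doubled register file buys per thread at full occupancy.
# The 2048 resident-threads-per-SM ceiling is Kepler's published CUDA
# occupancy limit -- an assumption, as it isn't stated in the table above.
MAX_THREADS_PER_SM = 2048
REG_BYTES = 4  # each register is 32 bits wide

for gpu, regfile_kb in [("GK110B", 256), ("GK210", 512)]:
    regs_per_thread = regfile_kb * 1024 // (MAX_THREADS_PER_SM * REG_BYTES)
    print(f"{gpu}: {regs_per_thread} registers/thread at full occupancy")

# GK110B allows 32 registers/thread before occupancy drops; GK210 allows 64,
# letting register-hungry HPC kernels stay fully occupied without spilling.
```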

The remaining question at this point is what happens from here. Was this divorce of compute and graphics a temporary action, the result of being stuck on the 28nm process for another generation? Or was it the first generation in a permanent divorce between graphics and compute, and consequently a divorce between GeForce/Quadro and Tesla? Is NVIDIA finally ready to let Tesla stand on its own?

With Pascal NVIDIA could very well build a jack-of-all-trades style GPU once more. However having already divorced graphics and compute for a generation, merging them again would eat up some of the power and die space benefits from going to 16nm FinFET, power and space that NVIDIA would likely want to invest in greater separate improvements in graphics and compute performance. We’ll see what Pascal brings, but I suspect GM200 is the shape of things to come for GeForce and the GTX Titan lineup.

Comments (276)

  • Refuge - Thursday, March 19, 2015

    Honestly this looks more like a Ti than a Titan.
  • D. Lister - Tuesday, March 17, 2015

    Nice performance/watt, but at $1000, I find the performance/dollar to be unacceptable. Without a double-precision edge, this GPU is essentially a 980Ti, and Nvidia seems to want to get away with slapping on a Titan decal (and the consequent $1K price tag) by just adding a useless amount of graphics memory.

    Take out about 4 gigs of VRAM, hold the "Titan" brand, add maybe 5-10% core clock, with an MSRP of at least $300 less, and I'll be interested. But I guess, for Nvidia to feel the need to do something like that, we'll have to wait for the next Radeon launch.
  • chizow - Tuesday, March 17, 2015

    It's a 980Ti with double the VRAM, a year earlier, if you are going off previous timelines. Don't undervalue the fact this is the first big Maxwell only 6 months after #2 Maxwell.

    I agree the pricing has gotten ridiculous on these graphics cards, but this is the market we live and play in now. I typically spent $800-$1000 every 2 years on graphics cards, but I would get 2 flagship cards. After the whole 7970/680 debacle where mid-range became flagship, I can now get 2 high-end midrange cards for that much, or 1 super premium flagship. Going with the flagship, and I'm happy! :D
  • D. Lister - Tuesday, March 17, 2015

    @chizow
    "It's a 980Ti with double the VRAM"
    Yes, pretty much - a Ti GPU, with more VRAM than necessary, with the price tag of a Titan.
    "I agree the pricing has gotten ridiculous on these graphics cards, but this is the market we live and play in now."
    The market is the way it is because we, consumers, let it be that way, through our choices. For us to obediently accept, at any time, overpricing as an acceptable trend of the market, is basically like agreeing with the fox who wants to be a guard for our henhouse.
  • chizow - Wednesday, March 18, 2015

    Except the 780Ti came much later, it was the 3rd GK110 chip to be released, so there is a premium on that time and money. While this is the 1st GM200 based chip, no need to look any further beyond it. Also, how many 780Ti owners complained about not enough VRAM? Looks like Nvidia addressed that. There's just no compromises with this card, it's Nvidia's best foot forward for this chip and only 6 months after GTX 980. No complaints here and I had plenty when Titan launched.

    Sure the market is this way partially because we allow it, but the reality is, the demand is overwhelmingly there. I was thoroughly against paying $1000 for what I used to get for $500-$650 for Nvidia's big chip flagship card with the original Titan, but the reality is, Nvidia has raised the bar on all fronts (and AMD has done well also) and they are looking to be rewarded for doing so. I used to buy 2x cards before because 1 just wasn't good enough. Now, 1 is good enough, so I don't mind paying the same amount for that relative level of performance and enjoyment.
  • D. Lister - Wednesday, March 18, 2015

    @chizow
    "Except the 780Ti came much later, ...... plenty when Titan launched."
    Both the 780Ti and the Titan X were released exactly when Nvidia needed them in the market. For the 780Ti, the reason was to challenge the 290X for the top spot. The Titan X was made available sooner because a) Nvidia needed the positive press after the 970 VRAM fiasco, and b) Nvidia wanted to take some attention away from the recent 3xx announcements by AMD.

    Hence I really can't find any logical reason to agree with your spin that the Nvidia staff was doing overtime as some sort of a public service, and so deserve some reward for their noble sacrifices.

    "Sure the market is this way partially because we allow it, but the reality is, the demand is overwhelmingly there. I was thoroughly against paying $1000 for what I used to get for $500-$650 for Nvidia's big chip flagship card with the original Titan, but the reality is, Nvidia has raised the bar on all fronts (and AMD has done well also) and they are looking to be rewarded for doing so. I used to buy 2x cards before because 1 just wasn't good enough. Now, 1 is good enough, so I don't mind paying the same amount for that relative level of performance and enjoyment."
    http://media2.giphy.com/media/13ayyyRnHJKrug/giphy...
  • chizow - Monday, March 23, 2015

    Uh, you make a lot of assumptions while trying to dismiss the fact there is a huge difference in time to market and relative position in Nvidia's release timeline for Titan X, and that difference carries a premium to anyone who observed or felt burned by how the Titan and Kepler launches played out over 2012, 2013, and 2014.

    Fact remains, Titan X is the full chip very close to the front of Maxwell's line-up release, while the 780Ti came near the end of Kepler's life cycle. The correct comparison is if Nvidia launched Titan Black in 2013 instead of the original Titan, because that's what Titan X is.

    The bolded portion should be pretty easy to digest, not sure why you are having trouble with it. Nvidia's advancement on the 28nm node has been so good (someone showed a 4x increase from the 40nm GTX 480 to the Titan X, which is damn amazing on the same node) and the relatively slow advancement in game requirements mean I no longer need 2 GPUs to push the game resolutions and settings I need. A single, super flagship card is all I need, and Nvidia has provided just that with the Titan X.

    For those who don't think it is worth it, you can always wait for something cheaper and faster to come along, but for me, I'm good until Pascal in 2016 (maybe? Oh wait, don't need to worry about that).
  • chizow - Tuesday, March 17, 2015

    Bit of a sidenote, but wow looks like 980 SLI scaling has REALLY improved in the last few months. I don't recall it being that good at launch, but that's not a huge surprise given Maxwell was a new architecture and has gone through a number of big (on paper) driver improvements. Looks really good though, made it harder to go with the Titan X over a 2nd 980 for SLI, but I think I'll be happier this way for now.
  • mdriftmeyer - Tuesday, March 17, 2015

    Buy these like hotcakes. And when the R9 390/390X arrives in June I'll pick either up and laugh at all that used hardware being dumped on eBay.
  • TheJian - Tuesday, March 17, 2015

    You're assuming they'll beat this card, and I doubt you'll see them in June as the channel is stuffed with AMD's current stuff. I say Q3, and it won't be as good as you think. HBM will cause pricing issues and won't net any perf (it isn't needed; bandwidth isn't a problem, so it's wasted extra cost here), so the gpu will have to win on its own vs. NV. You'd better hope AMD's is good enough to sell like hotcakes, as they really need the profits finally. This Q is already wasted and will result in a loss most likely, and NV is good for the next 3 months at least until something competitive arrives, at which point NV just drops pricing, eating any chance of AMD profits anyway. AMD has a very tough road ahead, and console sales drop due to mobile closing the gap at 16/14nm for xmas (good enough, that is, to have some say screw a console this gen, and screw $60 game pricing - go android instead).
