GM200 - All Graphics, Hold The Double Precision

Before diving into our look at the GTX Titan X itself, I want to spend a bit of time talking about the GM200 GPU. GM200 is a very interesting GPU, and not for the usual reasons. In fact you could say that GM200 is remarkable for just how unremarkable it is.

From a semiconductor manufacturing standpoint we're stuck on 28nm for at least a little while longer, pushing the process into its 4th year and bringing all sorts of knock-on effects. We've droned on about this for some time now, so we won't repeat ourselves, but ultimately what it means for consumers is that AMD and NVIDIA have needed to make do with the tools they have, and in lieu of generational jumps in manufacturing have focused on architectural efficiency, wringing everything they can out of 28nm.

For NVIDIA those improvements came in the form of the company's Maxwell architecture, a concentrated effort to improve energy and architectural efficiency and get the most out of the technology on hand. In assembling GM204, NVIDIA built the true successor to GK104, putting together a pure graphics chip. From a design standpoint NVIDIA spent their energy efficiency gains on growing GM204's die size without increasing power, going from GK104's 294mm2 and 3.5B transistors to 398mm2 and 5.2B transistors. With a larger die and larger transistor budget, NVIDIA was able to greatly increase performance by laying down a larger number of high performance (and individually larger) Maxwell SMMs.

On the other hand, for GM206 and the GTX 960 NVIDIA banked the bulk of their energy savings, building what's best described as half of a GM204. The result was a GPU that didn't offer as large a jump in performance over its predecessor (GK106), but also brought power usage down and kept costs in check.


Not Pictured: The 96 FP64 ALUs

But for Big Maxwell, neither option was open to NVIDIA. At 551mm2, GK110 was already a big GPU, so a large (33%) increase in die size like GM204 received was not practical. Neither was holding the die size roughly constant and building a Maxwell version of GK110, gaining only limited performance in the process. Instead NVIDIA has taken a third option, and this is what makes GM200 so interesting.

For GM200, NVIDIA's path of choice has been to divorce graphics from high performance FP64 compute. Big Kepler was a graphics powerhouse in its own right, but it also spent quite a bit of die area on FP64 CUDA cores and other compute-centric functionality. This allowed NVIDIA to use a single GPU across the entire spectrum – GeForce, Quadro, and Tesla – but it also meant that GK110 was a bit of a jack-of-all-trades. Consequently, when faced with another round of 28nm chips and intent on spending their Maxwell power savings on more graphics resources (à la GM204), NVIDIA built a big graphics GPU. Big Maxwell is not the successor to Big Kepler; rather it's a really (really) big version of GM204.

GM200 is 601mm2 of graphics, and this is what makes it remarkable. There are no special compute features here that only Tesla and Quadro users will tap into (save perhaps ECC); rather it really is GM204 with 50% more GPU. This means we're looking at the same SMMs as on GM204, featuring 128 FP32 CUDA cores per SMM, a 256KB (4 x 64KB) register file, and just 4 FP64 ALUs per SMM, leading to a puny native FP64 rate of just 1/32. As a result, all of the space in GK110 occupied by FP64 ALUs and other compute hardware – and NVIDIA won't reveal quite how much space that was – has been reinvested in FP32 ALUs and other graphics-centric hardware.
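
For the curious, that 1/32 figure falls straight out of the per-SMM ALU mix. The quick sketch below is purely illustrative – the 24 SMM count is simply 3072 CUDA cores divided by 128, and the 1GHz clock is an assumption plugged in to keep the math round – but it shows how the ratio works out:

```cpp
// Back-of-the-envelope throughput estimate for a full GM200 part.
// The 24 SMM count and the 1GHz clock are illustrative assumptions;
// the per-SMM ALU counts are the ones described above
// (128 FP32 CUDA cores and 4 FP64 ALUs per SMM).
#include <cstdio>

int main() {
    const int smms          = 24;   // assumed full GM200 SMM count (3072 / 128)
    const int fp32_per_smm  = 128;  // FP32 CUDA cores per SMM
    const int fp64_per_smm  = 4;    // FP64 ALUs per SMM
    const double clock_ghz  = 1.0;  // illustrative 1GHz clock
    const int flops_per_fma = 2;    // one FMA counts as 2 FLOPs

    double fp32_tflops = smms * fp32_per_smm * flops_per_fma * clock_ghz / 1000.0;
    double fp64_tflops = smms * fp64_per_smm * flops_per_fma * clock_ghz / 1000.0;

    printf("FP32: ~%.2f TFLOPS\n", fp32_tflops);              // ~6.14 TFLOPS
    printf("FP64: ~%.2f TFLOPS (1/%d rate)\n", fp64_tflops,
           fp32_per_smm / fp64_per_smm);                      // ~0.19 TFLOPS, 1/32
    return 0;
}
```

Plug in the card's real clocks and the absolute numbers shift a bit, but the 1/32 ratio itself is nothing more than the 128:4 FP32-to-FP64 ALU ratio within each SMM.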

NVIDIA Big GPUs
                    | Die Size | Native FP64 Rate
GM200 (Big Maxwell) | 601mm2   | 1/32
GK110 (Big Kepler)  | 551mm2   | 1/3
GF110 (Big Fermi)   | 520mm2   | 1/2
GT200 (Big Tesla)   | 576mm2   | 1/8
G80                 | 484mm2   | N/A

It's this graphics "purification" that has enabled NVIDIA to improve performance over GK110 by 50% without increasing power consumption, and with only a moderate 50mm2 (9%) increase in die size. In fact, in putting together GM200, NVIDIA has done something they haven't done for years. The last flagship GPU from the company to dedicate so little space to FP64 was G80 – heart of the GeForce 8800 GTX – which in fact didn't have any FP64 hardware at all. In other words, this is the "purest" flagship graphics GPU in 9 years.

Now to be clear, when we say GM200 favors graphics we don't mean exclusively; rather it favors graphics and its associated FP32 math over FP64 math. GM200 is still an FP32 compute powerhouse unlike anything else in NVIDIA's lineup, and we don't expect it to be matched by anything else from NVIDIA for quite some time. For that reason I wouldn't be too surprised if we see a Tesla card based on it aimed at FP32 users such as the oil & gas industry – something NVIDIA has done once before with the Tesla K10 – but you won't be seeing GM200 in the successor to Tesla K40.

This is also why the GTX Titan X is arguably not a prosumer-level card like the original GTX Titan. NVIDIA shipped the original GTX Titan with its full 1/3 rate FP64 performance enabled, having it pull double duty as the company's consumer graphics flagship while also serving as their entry-level FP64 card. For GTX Titan X, however, this is not an option since GM200 is not a high performance FP64 GPU, and as a result the card is riding only on its graphics and FP32 compute capabilities. That doesn't mean NVIDIA won't also try to pitch it as a high-performance FP32 card for users who don't need Tesla, but it won't be the same kind of entry-level compute card that the original GTX Titan was. In other words, GTX Titan X is much more consumer focused than the original GTX Titan.


Tesla K80: The Only GK210 Card

Looking at the broader picture, I'm left to wonder if this is the start of a permanent divorce between graphics/FP32 compute and FP64 compute in the NVIDIA ecosystem. Until recently, NVIDIA had always piggybacked compute on their flagship GPUs as a means of bootstrapping the Tesla division. By putting compute in their flagship GPU, even if NVIDIA couldn't sell those GPUs to compute customers they could sell them to GeForce/Quadro graphics customers. This limited the total risk the company faced, as they'd never end up with a pile of compute GPUs they couldn't sell.

However in the last 6 months we've seen a shift from NVIDIA at both ends of the spectrum. In November we saw the launch of the Tesla K80, a dual-GPU card featuring the GK210 GPU, a reworked version of GK110 that doubled the register file and shared memory sizes for better performance. GK210 would not come to GeForce or Quadro (though in theory it could have), making it the first compute-centric GPU from NVIDIA. And now with the launch of GM200 we have distinct graphics and compute GPUs from NVIDIA.

NVIDIA GPUs By Compute
                                  | GM200       | GK210     | GK110B
Stream Processors                 | 3072        | 2880      | 2880
Memory Bus Width                  | 384-bit     | 384-bit   | 384-bit
Register File Size (Per SM)       | 4 x 64KB    | 512KB     | 256KB
Shared Memory / L1 Cache (Per SM) | 96KB + 24KB | 128KB     | 64KB
Transistor Count                  | 8B          | 7.1B(?)   | 7.1B
Manufacturing Process             | TSMC 28nm   | TSMC 28nm | TSMC 28nm
Architecture                      | Maxwell     | Kepler    | Kepler
Tesla Products                    | None        | K80       | K40
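
As an aside, the per-SM resources in the table above are exposed through CUDA's standard device query interface, so they are easy to sanity check on whatever card happens to be installed. The following is a minimal host-side sketch; nothing in it is specific to GM200, it simply prints the relevant cudaDeviceProp fields:

```cpp
// Query the per-SM resources that distinguish GM200, GK210, and GK110B.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No CUDA device found\n");
        return 1;
    }
    printf("%s (SM %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("SM count:             %d\n", prop.multiProcessorCount);
    printf("Registers per SM:     %d KB\n",
           prop.regsPerMultiprocessor * 4 / 1024);   // 32-bit registers -> KB
    printf("Shared memory per SM: %zu KB\n",
           prop.sharedMemPerMultiprocessor / 1024);
    printf("Memory bus width:     %d-bit\n", prop.memoryBusWidth);
    return 0;
}
```

On a GM200 card this should report 24 SMs, 256KB of registers per SM (the 4 x 64KB above), and 96KB of shared memory per SM, while a GK210-based Tesla K80 should report the doubled register file and shared memory figures from the table.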

The remaining question at this point is what happens from here. Was this divorce of compute and graphics a temporary measure, the result of being stuck on the 28nm process for another generation? Or was it the first step in a permanent divorce between graphics and compute, and consequently between GeForce/Quadro and Tesla? Is NVIDIA finally ready to let Tesla stand on its own?

With Pascal, NVIDIA could very well build a jack-of-all-trades GPU once more. However, having already divorced graphics and compute for a generation, merging them again would eat up some of the power and die space benefits of going to 16nm FinFET, power and space that NVIDIA would likely prefer to invest in separate improvements to graphics and compute performance. We'll see what Pascal brings, but I suspect GM200 is the shape of things to come for GeForce and the GTX Titan lineup.

Comments

  • Kevin G - Wednesday, March 18, 2015 - link

    There was indeed a bigger chip due closer to the GK104/GTX 680's launch: the GK100. However it was cancelled due to bugs in the design. A fixed revision eventually became the GK110 which was ultimately released as the Titan/GTX 780.

    After that there have been two more revisions. The GK110B is a quick respin, and it's the die from which all fully enabled parts stem (Titan Black/GTX 780 Ti). Then late last year nVidia surprised everyone with the GK210, which has a handful of minor architectural improvements (larger register files etc.).

    The moral of the story is that building large dies is hard and takes lots of time to get right.
  • chizow - Monday, March 23, 2015 - link

    We don't know what happened to GK100. It is certainly possible, as I've guessed aloud numerous times, that AMD's 7970 and its overall lackluster pricing/performance afforded Nvidia the opportunity to scrap GK100 and respin it as GK110 while trotting GK104 out as its flagship, because GK104 was close enough to AMD's best and GK100 may have had problems as you described. All of that led to considerable doubt whether or not we would see a big Kepler, a sentiment that was even dishonestly echoed by some Nvidia employees I got into it with on their forums.

    Only in October 2012 did we see signs of Big Kepler in the Titan supercomputer with K20X, but still no sign of a GeForce card. There's no doubt that a big die takes time, but Nvidia had always led with their big chip first since G80, and this was the first time they deviated from that strategy while parading what was clearly their 2nd best, mid-range performance ASIC as the flagship.

    Titan X sheds all that nonsense and goes back to their gaming roots. It is their best effort, up front, no BS. 8Bn transistors Inspired by Gamers and Made by Nvidia. So as someone who buys GeForce for gaming first and foremost, I'm going to reward them for those efforts so they keep rewarding me with future cards of this kind. :)
  • Railgun - Wednesday, March 18, 2015 - link

    With regards to the price, 12GB of RAM isn't justification enough for it. Memory isn't THAT expensive in the grand scheme of things. What the Titan was originally isn't what the Titan X is now. They can't be seen as the same lineage. If you want to say memory is the key, then the original Titan with its 6GB should still be more than relevant today. Crysis is 45% faster in 4K with the X than the original. Is that the chip itself or the memory helping? I vote the former, given the 690, with only 4GB of total memory, is 30% faster in 4K in the same game than the original Titan. VRAM isn't really going to be relevant for a while, other than for those running stupidly large spans. It's a shame as Ryan touches on VRAM usage in Middle Earth, but doesn't actually indicate what's being used. There too, the 780Ti beats the original Titan sans huge VRAM reserves. Granted, barely, but the point is that VRAM isn't the reason. This won't be relevant for a bit I think.

    You can't compare an aftermarket price to how an OEM prices their products. The top tier card other than the TiX is the 980, and it has been mentioned ad nauseam that the TiX is NOT worth 80% more given its performance. If EVGA wants to OC a card out of their shop and charge 45% more than a stock clock card, then buyer beware if it's not a 45% gain in performance. I for one don't see the benefit of a card like that. The convenience isn't there given the tools and community support for OCing something one's self.

    I too game on 25x14 and there've been zero issues regarding VRAM, or the lack thereof.
  • chizow - Monday, March 23, 2015 - link

    I didn't say VRAM was the only reason, I said it was one of the reasons. The bigger reason for me is that it is the FULL BOAT GM200, front and center. No waiting. No cut cores. No cut SMs for compute. No cut down part because of TDP. It's 100% of it up front, 100% of it for gaming. I'm sold and onboard until Pascal. That really is the key factor: who wants to wait for unknown commodities and timelines when you know this will put you within +/-10% of the next fastest part's performance, and you can guarantee you get it today for maybe a 25-30% premium? I guess it really depends on how much you value your current and near-future gaming experience. I knew from the day I got my ROG Swift (with 2x670 SLI) that I would need more to drive it. The 980 was a bit of a sidegrade in absolute performance and I still knew I needed more perf, and now I have it with Titan X.

    As for VRAM, 12GB is certainly overkill today, but I'd say 6GB isn't going to be enough for much longer. Games are already pushing 4GB (SoM, FC4, AC:U) and that's still with last-gen type textures. Once you start getting console ports with PC texture packs I could see 6 and 8GB being pushed quite easily, as that is the target framebuffer for consoles (2+6). So yes, while 12GB may be too much, 6GB probably isn't enough, especially once you start looking at 4K and Surround.

    Again, if you don't think the price is worth it over a 980 that's fine and fair, but the reality of it is, if you want better single-GPU performance there is no alternative. A 2nd 980 for SLI is certainly an option, but for my purposes and my resolution, I would prefer to stick to a single-card solution if possible, which is why I went with a Titan X and will be selling my 980 instead of picking up a 2nd one as I originally intended.

    Best part about Titan X is it gives another choice and a target level of performance for everyone else!
  • Frenetic Pony - Tuesday, March 17, 2015 - link

    They could've halved the RAM, dropped the price by $200, and done a lot better without much if any performance hit.
  • Denithor - Wednesday, March 18, 2015 - link

    LOL.

    You just described the GTX 980 Ti, which will likely launch within a few months to answer the 390X.
  • chizow - Wednesday, March 18, 2015 - link

    @Frenetic Pony, maybe now, but what about once DX12 drops and games are pushing over 6GB? We already see games saturating 4GB, and we still haven't seen next-gen engine games like UE4. Why compromise for a few hundred less? Haven't you seen all the complaints from 780Ti users about how 3GB isn't enough anymore? Shouldn't be a problem for this card, which is just 1 less thing to worry about.
  • LukaP - Thursday, March 19, 2015 - link

    Games don't push 4GB... Check the LTT ultrawide video, where he barely got Shadow of Mordor on ultra to go past 4GB on 3 ultrawide 1440p screens.

    And as a game dev I can tell you that with proper optimisations more than 4GB on a GPU is insane, unless you just load stuff in with a predictive algorithm to avoid PCIe bottlenecks.

    And please do show me where a 780Ti user isn't happy with his card's performance at 1080-1600p. The card does, and will continue to, perform great at those resolutions, since games won't really advance, due to consoles limiting things again.
  • LukaP - Thursday, March 19, 2015 - link

    Also, DX12 won't make games magically use more VRAM. All it really does is make the CPU and GPU communicate better. It won't magically make games run or look better; both of those are up to the devs, and the "look better" part is certainly not about textures or polycounts. It's merely the number of draw calls per frame going up, meaning more UNIQUE objects (as opposed to simply more objects, which can be achieved through instancing easily in any modern engine, though Ubisoft haven't learned that yet).
  • chizow - Monday, March 23, 2015 - link

    DX12 raises the bar for all games by enabling better visuals; you're going to get better top-end visuals across the board. Certainly you don't think UE4, when it debuts, will have the same requirements as DX11-based games on UE3?

    Even if you keep the same size textures as before (2K or 4K assets, as is common now), the fact that you are drawing more polygons, enabled by DX12's lower overhead and higher draw call/poly capabilities, means those polygons need to be textured, meaning a higher VRAM requirement unless you are using the same textures over and over again.

    Also, since you are a game dev, you would also know devs are moving more and more towards bindless textures or megatextures that specifically make great use of textures staying resident in local VRAM for faster accesses, rather than having to optimize and cache/load/discard them.
