“How do you follow up on Fermi?” That’s the question we had going into NVIDIA’s press briefing for the GeForce GTX 680 and the Kepler architecture earlier this month. With Fermi NVIDIA not only captured the performance crown for gaming, but they managed to further build on their success in the professional markets with Tesla and Quadro. Though it was very clearly a rough start for NVIDIA, Fermi ended up doing quite well in the end.

So how do you follow up on Fermi? As it turns out, you follow it up with something that is in many ways more of the same. With a focus on efficiency, NVIDIA has stripped Fermi down to the core and then built it back up again; reducing power consumption and die size alike, all while maintaining most of the aspects we’ve come to know with Fermi. The end result of which is NVIDIA’s next generation GPU architecture: Kepler.

Launching today is the GeForce GTX 680, at the heart of which is NVIDIA’s new GK104 GPU, based on their equally new Kepler architecture. As we’ll see, not only has NVIDIA retaken the performance crown with the GeForce GTX 680, but they have done so in a manner truly befitting their drive for efficiency.

| Specification | GTX 680 | GTX 580 | GTX 560 Ti | GTX 480 |
|---|---|---|---|---|
| Stream Processors | 1536 | 512 | 384 | 480 |
| Texture Units | 128 | 64 | 64 | 60 |
| ROPs | 32 | 48 | 32 | 48 |
| Core Clock | 1006MHz | 772MHz | 822MHz | 700MHz |
| Shader Clock | N/A | 1544MHz | 1644MHz | 1401MHz |
| Boost Clock | 1058MHz | N/A | N/A | N/A |
| Memory Clock | 6.008GHz GDDR5 | 4.008GHz GDDR5 | 4.008GHz GDDR5 | 3.696GHz GDDR5 |
| Memory Bus Width | 256-bit | 384-bit | 256-bit | 384-bit |
| Frame Buffer | 2GB | 1.5GB | 1GB | 1.5GB |
| FP64 | 1/24 FP32 | 1/8 FP32 | 1/12 FP32 | 1/12 FP32 |
| TDP | 195W | 244W | 170W | 250W |
| Transistor Count | 3.5B | 3B | 1.95B | 3B |
| Manufacturing Process | TSMC 28nm | TSMC 40nm | TSMC 40nm | TSMC 40nm |
| Launch Price | $499 | $499 | $249 | $499 |

Technically speaking Kepler’s launch today is a double launch. On the desktop we have the GTX 680, based on the GK104 GPU. Meanwhile in the mobile space we have the GT640M, which is based on the GK107 GPU. While NVIDIA is not like AMD in that they don’t announce products ahead of time, it’s a sure bet that we’ll eventually see GK107 move up to the desktop and GK104 move down to laptops in the future.

What you won’t find today however – and in a significant departure from NVIDIA’s previous launches – is Big Kepler. Since the days of the G80, NVIDIA has always produced a large 500mm²+ GPU to serve both as a flagship GPU for their consumer lines and the fundamental GPU for their Quadro and Tesla lines, and has always launched with that big GPU first. At 294mm² GK104 is not Big Kepler, and while NVIDIA doesn’t comment on unannounced products, somewhere in the bowels of NVIDIA Big Kepler certainly lives, waiting for its day in the sun. As such this is the first NVIDIA launch where we’re not in a position to talk about the ramifications for Tesla or Quadro, or really for that matter what NVIDIA’s peak performance for this generation might be.

Anyhow, we’ll jump into the full architectural details of GK104 in a bit, but let’s quickly talk about the specs first. Unlike Fermi or AMD’s GCN, Kepler is not a brand new architecture. To be sure there are some very important changes, but at a high level the workings of Kepler have not significantly changed compared to Fermi. With Kepler what we’re ultimately looking at is a die shrunk distillation of Fermi, and in the case of GK104 that’s specifically a distillation of GF114 rather than GF110.

Starting from the top, GTX 680 features a fully enabled GK104 GPU – unlike the first generation of Fermi products there are no shenanigans with disabled units here. This means GTX 680 has 1536 CUDA cores, a massive increase from GTX 580 (512) and GTX 560 Ti (384). Note however that NVIDIA has dropped the shader clock with Kepler, opting instead to double the number of CUDA cores to achieve the same effect, so while 1536 CUDA cores is a big number it’s really only twice the number of cores of GF114 as far as performance is concerned. Joining those 1536 CUDA cores are 32 ROPs and 128 texture units; the number of ROPs is effectively unchanged from GF114, while the number of texture units has been doubled. Meanwhile on the memory and cache side of things GTX 680 features a 256-bit memory bus coupled with 512KB of L2 cache.

As for clockspeeds, GTX 680 will introduce a few wrinkles courtesy of Kepler. As we mentioned before, the shader clock is gone in Kepler, with everything now running off of the core clock (or as NVIDIA likes to put it, the graphics clock). At the same time Kepler introduces the Boost Clock – effectively a turbo clock for the GPU – so we still have a third clock to pay attention to. With that said, GTX 680 ships at a base clock of 1006MHz and a boost clock of 1058MHz. On the memory side of things NVIDIA has finally managed to fully hammer out their memory controller, allowing NVIDIA to ship with a memory clock of 6.008GHz.

Taken altogether, on paper GTX 680 has roughly 195% the shader performance, 260% the texture performance, 87% of the ROP performance, and 100% of the memory bandwidth of GTX 580. Or as compared to its more direct ancestor the GTX 560 Ti, GTX 680 has 244% of the shader performance, 244% of the texture performance, 122% of the ROP performance, and 150% of the memory bandwidth of GTX 560 Ti. Compared to GTX 560 Ti NVIDIA has effectively doubled every aspect of their GPU except for ROP performance, which is the one area where NVIDIA believes they already have enough performance.
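The on-paper comparison above can be reproduced from the spec table. What follows is a minimal sketch rather than NVIDIA's or our own benchmarking method: it assumes each figure is simply unit count multiplied by the relevant clock (shader clock for CUDA cores, core clock for texture units and ROPs) and bus width multiplied by memory data rate for bandwidth. The `Card` class and `relative_percent` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Spec-table figures for one card (hypothetical helper type)."""
    name: str
    shaders: int       # stream processors / CUDA cores
    tex_units: int
    rops: int
    core_mhz: float    # clock driving texture units and ROPs
    shader_mhz: float  # clock driving the CUDA cores (equals core clock on Kepler)
    mem_ghz: float     # effective memory data rate
    bus_bits: int

    def throughput(self) -> dict:
        # Assumption: theoretical throughput scales as units x clock.
        return {
            "shader": self.shaders * self.shader_mhz,
            "texture": self.tex_units * self.core_mhz,
            "rop": self.rops * self.core_mhz,
            "bandwidth": self.bus_bits * self.mem_ghz,
        }


def relative_percent(new: Card, old: Card) -> dict:
    """Return each throughput figure of `new` as a percentage of `old`."""
    new_t, old_t = new.throughput(), old.throughput()
    return {key: 100.0 * new_t[key] / old_t[key] for key in new_t}


# Values copied from the spec table; base clock is used for GTX 680.
GTX_680 = Card("GTX 680", 1536, 128, 32, 1006, 1006, 6.008, 256)
GTX_580 = Card("GTX 580", 512, 64, 48, 772, 1544, 4.008, 384)
GTX_560_TI = Card("GTX 560 Ti", 384, 64, 32, 822, 1644, 4.008, 256)

for baseline in (GTX_580, GTX_560_TI):
    ratios = relative_percent(GTX_680, baseline)
    summary = ", ".join(f"{key} {value:.1f}%" for key, value in ratios.items())
    print(f"GTX 680 vs {baseline.name}: {summary}")
```

Under these assumptions every result lands within a percentage point of the figures quoted above; the small differences come down to rounding, and using the boost clock instead of the base clock would shift the GTX 680 numbers upward.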

On the power front, GTX 680 has a few different numbers to contend with. NVIDIA’s official TDP is 195W, though as with the GTX 500 series they still consider this an average number rather than a true maximum. The second number is the boost target, which is the highest power level that GPU Boost will turbo to; that number is 170W. Finally, while NVIDIA doesn’t publish an official idle TDP, the GTX 680 should have an idle TDP of around 15W. Overall GTX 680 is targeted at a power envelope somewhere between GTX 560 Ti and GTX 580, though it’s closer to the former than the latter.

As for GK104 itself, as we’ve already mentioned GK104 is a smaller than average GPU for NVIDIA, with a die size of 294mm². This is roughly 89% the size of GF114, or compared to GF110 a mere 56% of the size. Inside that 294mm² NVIDIA packs 3.5B transistors thanks to TSMC’s 28nm process, only 500M more than GF110 and largely explaining why GK104 is so small compared to GF110. Or to once again make a comparison to GF114, this is 1.55B (79%) more than GF114, which makes the fact that GK104 doubles most of GF114’s functional units all the more surprising. With Kepler NVIDIA is going to be heavily focusing on efficiency, and this is one such example of Kepler’s efficiency in action.
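For readers who want to check the transistor arithmetic against the spec table, here is a small sketch. It assumes only the transistor counts listed in the table (3.5B for GK104, 3B for GF110, 1.95B for GF114); the `transistor_delta` helper is a hypothetical name used for illustration.

```python
# Transistor counts in billions, taken from the spec table.
TRANSISTORS_B = {"GK104": 3.5, "GF110": 3.0, "GF114": 1.95}


def transistor_delta(new: str, old: str) -> tuple:
    """Return (extra transistors in billions, percent increase) of new over old."""
    delta = TRANSISTORS_B[new] - TRANSISTORS_B[old]
    return delta, 100.0 * delta / TRANSISTORS_B[old]


for older in ("GF110", "GF114"):
    extra, pct = transistor_delta("GK104", older)
    print(f"GK104 vs {older}: +{extra:.2f}B transistors ({pct:.0f}% more)")
```

By these table figures GK104 carries about 0.5B more transistors than GF110 and about 1.55B more than GF114.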

Last but not least, let’s talk about pricing and availability. GTX 680 is the successor to GTX 580 and NVIDIA will be pricing it accordingly, with an MSRP of $499. This is the same price that the GTX 580 and GTX 480 launched at back in 2010, and while it’s consistent for an x80 video card it’s effectively a conservative price given GK104’s die size. NVIDIA does need to bring their pricing in at the right point to combat AMD, but they’re in no more of a hurry than AMD to start any price wars, so it’s conservative pricing all around for the time being.

AMD’s competition of course is the recently launched Radeon HD 7970 and 7950. With those cards priced at $550 and $450 respectively, the GTX 680 sits right in between them in terms of pricing. However with regard to gaming performance the GTX 680 is generally more than a match for the 7970, which is going to leave AMD in a tough spot. AMD’s partners do have factory overclocked cards, but those only close the performance gap at the cost of an even wider price gap. NVIDIA has priced the GTX 680 to undercut the 7970, and that’s exactly what will be happening today.

As for availability, we’re told that it should be similar to past high end video card launches, which is to say it will be touch and go. As with any launch NVIDIA has been stockpiling cards but it’s still a safe bet that GTX 680 will sell out in the first day. Beyond the initial launch it’s not clear whether NVIDIA will be able to keep up with demand over the next month or so. NVIDIA has been fairly forthcoming to their investors about how 28nm production is going, and while yields have been acceptable TSMC doesn’t have enough wafers to satisfy all of their customers at once, so NVIDIA is still getting fewer wafers than they’d like. Until very recently AMD’s partners have had a difficult time keeping the 7970 in stock, and it’s likely it will be the same story for NVIDIA’s partners.

404 Comments

  • Slayer68 - Saturday, March 24, 2012 - link

    Being able to run 3 screens off of one card is new for Nvidia. Barely even mentioned it in your review. It would be nice to see Nvidia surround / Eyefinity compared on these new cards. Especially interested in scaling at 5760 x 1080 between a 680 and 7970.....
  • ati666 - Saturday, March 24, 2012 - link

    does the gtx680 still have the same anisotropic filtering pattern like the gtx470/480/570/580 (octagonal pattern) or is it like AMDs HD7970 all angle-independent anisotropic filtering (circular pattern)?
  • Ryan Smith - Saturday, March 24, 2012 - link

    It's not something we were planning on publishing, but it is something we checked. It's still the same octagon pattern as Fermi. It would be nice if NVIDIA did have angle-independent AF, but to be honest the difference between that and what NVIDIA does has been so minor that it's not something we've ever been able to create a noticeable issue with in the real world.

    Now Intel's AF on the other hand...
  • ati666 - Saturday, March 24, 2012 - link

    thanks for the reply, now i can finally make a decision to buy hd7970 or gtx680..
  • CeriseCogburn - Saturday, March 24, 2012 - link

    Yes I thank him too for finally coming clean and noting the angle independent amd algorithm he's been fanboy over for a long time has absolutely no real world gaming advantage whatsoever.
    It's a big fat zero of nothing but FUD for fanboys.
    It would be nice if notional advantages actually showed up in games, and when they don't or for the life of the reviewer cannot be detected in games, that be clearly stated and the insane "advantage" declared be called what it really is, a useless talking point of deception that fools purchasers instead of enlightening them.
    The biased emphasis with zero advantage is as unscientific as it gets. Worse yet, within the same area, the "perfectly round algorithm" yielded in game transition lines with the amd cards, denied by the reviewer for what, a year ? Then a race game finally convinced him, and in this 7000 series release we find another issue the "perfectly round algorithm" apparently was attached to flaw with, a "poor transition resolution" - rather crudely large instead of fine like Nvidia's which caused excessive amd shimmering in game, and we are treated to that information only now after the 7000 series "solved" the issue and brought it near or up to the GTX long time standard.
    So this whole "perfectly round algorithm" has been nothing but fanboy lies for amd all along, while ignoring at least 2 large IQ issues when it was "put to use" in game. (transition shading and shimmering)
    I'm certain an explanation could be given that there are other factors with differing descriptive explanation, like the fineness of textural changes as one goes toward center of the image not directly affecting roundness one way or another, used as an excuse, perhaps the self deceptive justification that allowed such misbehavior to go on for so long.
  • _vor_ - Saturday, March 24, 2012 - link

    Will you seriously STFU already? It's hard to read this discussion with your blatant and belligerent jackassery all over it.

    You love NVIDIA. Great. Now STFU and stop posting.
  • CeriseCogburn - Saturday, March 24, 2012 - link

    Great attack, did I get anything wrong at all ? I guess not.
  • silverblue - Monday, March 26, 2012 - link

    Could you provide a link to an article based on this subject, please? Not an attack; just curious.
  • CeriseCogburn - Tuesday, March 27, 2012 - link

    http://www.anandtech.com/show/5261/amd-radeon-hd-7...

    http://forums.anandtech.com/showpost.php?p=3152067...

    " So what then is going on that made Civ V so much faster for NVIDIA? Admittedly I had to press NVIDIA for this - performance practically doubled on high-end GPUs, which is unheard of. Until they told me what exactly they did, I wasn't convinced it was real or if they had come up with a really sweet cheat. It definitely wasn't a cheat.

    If you recall from our articles, I keep pointing to how we seem to be CPU limited at the time. "

    (YES, SO THAT'S WHAT WE GOT, THEY'RE CHEATING IT'S FAKE WE'RE CPU LIMITED- ALL WRONG ALL LIES)

    Since AMD’s latest changes are focused on reducing shimmering in motion we’ve put together a short video of the 3D Center Filter Tester running the tunnel test with the 7970, the 6970, and GTX 580. The tunnel test makes the differences between the 7970 and 6970 readily apparent, and at this point both the 7970 and GTX 580 have similarly low levels of shimmering.

    with both implementing DX9 SSAA with the previous generation of GPUs, and AMD catching up to NVIDIA by implementing Enhanced Quality AA (their version of NVIDIA’s CSAA) with Cayman. Between Fermi and Cayman the only stark differences are that AMD offers their global faux-AA MLAA filter, while NVIDIA has support for true transparency and super sample anti-aliasing on DX10+ games.

    (AMD FINALLY CATCHES UP IN EQAA PART, NVIDIA TRUE STANS AND SUPER SAMPLE HIGH Q STUFF, AMD CHEAT AND BLUR AND BLUR TEXT)

    Thus I had expected AMD to close the gap from their end with Southern Islands by implementing DX10+ versions of Adaptive AA and SSAA, but this has not come to pass.

    ( AS I INTERPRETED AMD IS WAY BEHIND STILL A GAP TO CLOSE ! )

    AMD has not implemented any new AA modes compared to Cayman, and as a result AAA and SSAA continue to only be available in DX9 titles.

    Finally, while AMD may be taking a break when it comes to anti-aliasing they’re still hard at work on tessellation

    ( BECAUSE THEY'RE BEHIND IN TESSELLATION TOO.)

    Don't forget amd has a tessellation cheat in their 7000 series driver, so 3dmark 11 is cheated on as is unigine heaven, while Nvidia does no such thing.

    ---
    I do have more like the race car game admission, but I think that's enough helping you doing homework .
  • CeriseCogburn - Tuesday, March 27, 2012 - link

    So here's more mr curious ..
    " “There’s nowhere left to go for quality beyond angle-independent filtering at the moment.”

    With the launch of the 5800 series last year, I had high praise for AMD’s anisotropic filtering. AMD brought truly angle-independent filtering to gaming (and are still the only game in town), putting an end to angle-dependent deficiencies and especially AMD’s poor AF on the 4800 series. At both the 5800 series launch and the GTX 480 launch, I’ve said that I’ve been unable to find a meaningful difference or deficiency in AMD’s filtering quality, and NVIDIA was only deficient by being not quite angle-independent. I have held – and continued to hold until last week – the opinion that there’s no practical difference between the two.

    It turns out I was wrong. Whoops.

    The same week as when I went down to Los Angeles for AMD’s 6800 series press event, a reader sent me a link to a couple of forum topics discussing AF quality. While I still think most of the differences are superficial, there was one shot comparing AMD and NVIDIA that caught my attention: Trackmania."

    " The shot clearly shows a transition between mipmaps on the road, something filtering is supposed to resolve. In this case it’s not a superficial difference; it’s very noticeable and very annoying.

    AMD appears to agree with everyone else. As it turns out their texture mapping units on the 5000 series really do have an issue with texture filtering, specifically when it comes to “noisy” textures with complex regular patterns. AMD’s texture filtering algorithm was stumbling here and not properly blending the transitions between the mipmaps of these textures, resulting in the kind of visible transitions that we saw in the above Trackmania screenshot. "

    http://www.anandtech.com/show/3987/amds-radeon-687...

    WE GET THIS AFTER 6000 SERIES AMD IS RELEASED, AND DENIAL UNTIL, NOW WE GET THE SAME THING ONCE 7000 SERIES IS RELEASED, AND COMPLETE DENIAL BEFORE THAT...

    HERE'S THE 600 SERIES COVERUP THAT COVERS UP 5000 SERIES AFTER ADMITTING THE PROBLEM A WHOLE GENERATION LATE
    " So for the 6800 series, AMD has refined their texture filtering algorithm to better handle this case. Highly regular textures are now filtered properly so that there’s no longer a visible transition between them. As was the case when AMD added angle-independent filtering we can’t test the performance impact of this since we don’t have the ability to enable/disable this new filtering algorithm, but it should be free or close to it. In any case it doesn’t compromise AMD’s existing filtering features, and goes hand-in-hand with their existing angle-independent filtering."

    NOW DON'T FORGET RYAN HAS JUST ADMITTED AMD ANGLE INDEPENDENT ALGORITHM IS WORTH NOTHING IN REAL GAME- ABSOLUTELY NOTHING.
