“How do you follow up on Fermi?” That’s the question we had going into NVIDIA’s press briefing for the GeForce GTX 680 and the Kepler architecture earlier this month. With Fermi NVIDIA not only captured the performance crown for gaming, but they managed to further build on their success in the professional markets with Tesla and Quadro. Though it was very clearly a rough start for NVIDIA, Fermi ended up doing quite well in the end.

So how do you follow up on Fermi? As it turns out, you follow it up with something that is in many ways more of the same. With a focus on efficiency, NVIDIA has stripped Fermi down to the core and then built it back up again, reducing power consumption and die size alike, all while maintaining most of the aspects we’ve come to know with Fermi. The end result is NVIDIA’s next generation GPU architecture: Kepler.

Launching today is the GeForce GTX 680, at the heart of which is NVIDIA’s new GK104 GPU, based on their equally new Kepler architecture. As we’ll see, not only has NVIDIA retaken the performance crown with the GeForce GTX 680, but they have done so in a manner truly befitting their drive for efficiency.

                       GTX 680         GTX 580         GTX 560 Ti      GTX 480
Stream Processors      1536            512             384             480
Texture Units          128             64              64              60
ROPs                   32              48              32              48
Core Clock             1006MHz         772MHz          822MHz          700MHz
Shader Clock           N/A             1544MHz         1644MHz         1401MHz
Boost Clock            1058MHz         N/A             N/A             N/A
Memory Clock           6.008GHz GDDR5  4.008GHz GDDR5  4.008GHz GDDR5  3.696GHz GDDR5
Memory Bus Width       256-bit         384-bit         256-bit         384-bit
Frame Buffer           2GB             1.5GB           1GB             1.5GB
FP64                   1/24 FP32       1/8 FP32        1/12 FP32       1/12 FP32
TDP                    195W            244W            170W            250W
Transistor Count       3.5B            3B              1.95B           3B
Manufacturing Process  TSMC 28nm       TSMC 40nm       TSMC 40nm       TSMC 40nm
Launch Price           $499            $499            $249            $499

Technically speaking Kepler’s launch today is a double launch. On the desktop we have the GTX 680, based on the GK104 GPU. Meanwhile in the mobile space we have the GT640M, which is based on the GK107 GPU. Unlike AMD, NVIDIA doesn’t announce products ahead of time, but it’s a sure bet that we’ll eventually see GK107 move up to the desktop and GK104 move down to laptops in the future.

What you won’t find today however – and in a significant departure from NVIDIA’s previous launches – is Big Kepler. Since the days of the G80, NVIDIA has always produced a large 500mm²+ GPU to serve both as a flagship GPU for their consumer lines and the fundamental GPU for their Quadro and Tesla lines, and has always launched with that big GPU first. At 294mm² GK104 is not Big Kepler, and while NVIDIA doesn’t comment on unannounced products, somewhere in the bowels of NVIDIA Big Kepler certainly lives, waiting for its day in the sun. As such this is the first NVIDIA launch where we’re not in a position to talk about the ramifications for Tesla or Quadro, or really for that matter what NVIDIA’s peak performance for this generation might be.

Anyhow, we’ll jump into the full architectural details of GK104 in a bit, but let’s quickly talk about the specs first. Unlike Fermi or AMD’s GCN, Kepler is not a brand new architecture. To be sure there are some very important changes, but at a high level the workings of Kepler have not significantly changed compared to Fermi. With Kepler what we’re ultimately looking at is a die shrunk distillation of Fermi, and in the case of GK104 that’s specifically a distillation of GF114 rather than GF110.

Starting from the top, GTX 680 features a fully enabled GK104 GPU – unlike the first generation of Fermi products there are no shenanigans with disabled units here. This means GTX 680 has 1536 CUDA cores, a massive increase from GTX 580 (512) and GTX 560 Ti (384). Note however that NVIDIA has dropped the shader clock with Kepler, opting instead to double the number of CUDA cores to achieve the same effect, so while 1536 CUDA cores is a big number it’s really only twice the number of cores of GF114 as far as performance is concerned. Joining those 1536 CUDA cores are 32 ROPs and 128 texture units; the number of ROPs is effectively unchanged from GF114, while the number of texture units has been doubled. Meanwhile on the memory and cache side of things GTX 680 features a 256-bit memory bus coupled with 512KB of L2 cache.

As for clockspeeds, GTX 680 will introduce a few wrinkles courtesy of Kepler. As we mentioned before, the shader clock is gone in Kepler, with everything now running off of the core clock (or as NVIDIA likes to put it, the graphics clock). At the same time Kepler introduces the Boost Clock – effectively a turbo clock for the GPU – so we still have a third clock to pay attention to. With that said, GTX 680 ships at a base clock of 1006MHz and a boost clock of 1058MHz. On the memory side of things NVIDIA has finally managed to fully hammer out their memory controller, allowing NVIDIA to ship with a memory clock of 6.008GHz.

Taken altogether, on paper GTX 680 has roughly 195% the shader performance, 260% the texture performance, 87% of the ROP performance, and 100% of the memory bandwidth of GTX 580. Or as compared to its more direct ancestor the GTX 560 Ti, GTX 680 has 244% of the shader performance, 244% of the texture performance, 122% of the ROP performance, and 150% of the memory bandwidth of GTX 560 Ti. Compared to GTX 560 Ti NVIDIA has effectively doubled every aspect of their GPU except for ROP performance, which is the one area where NVIDIA believes they already have enough performance.
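The on-paper ratios above follow directly from the spec table, and a short script makes the arithmetic explicit. This is only a sketch under a simple assumption: theoretical throughput is modeled as unit count multiplied by the clock those units run at, and memory bandwidth as bus width in bytes multiplied by the effective data rate (the table's "Memory Clock" row). The `Card` class and the function names are illustrative and not anything NVIDIA publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    cores: int          # stream processors (CUDA cores)
    texture_units: int
    rops: int
    core_mhz: float     # core (graphics) clock
    shader_mhz: float   # clock the CUDA cores run at; equals core clock on Kepler
    mem_ghz: float      # effective memory data rate from the "Memory Clock" row
    bus_bits: int       # memory bus width


# Figures copied from the spec table.
GTX_680 = Card("GTX 680", 1536, 128, 32, 1006, 1006, 6.008, 256)
GTX_580 = Card("GTX 580", 512, 64, 48, 772, 1544, 4.008, 384)
GTX_560_TI = Card("GTX 560 Ti", 384, 64, 32, 822, 1644, 4.008, 256)


def throughput(card: Card) -> dict:
    """Paper throughput: unit count times clock; bandwidth in GB/s."""
    return {
        "shader": card.cores * card.shader_mhz,
        "texture": card.texture_units * card.core_mhz,
        "rop": card.rops * card.core_mhz,
        "memory": card.bus_bits / 8 * card.mem_ghz,
    }


def relative(new: Card, old: Card) -> dict:
    """Throughput of `new` as a percentage of `old`, one decimal place."""
    n, o = throughput(new), throughput(old)
    return {key: round(100 * n[key] / o[key], 1) for key in n}


for baseline in (GTX_580, GTX_560_TI):
    print(f"GTX 680 vs {baseline.name}: {relative(GTX_680, baseline)}")
```

Under these assumptions the results land within a percentage point of the figures quoted above; the last digit depends on whether you round or truncate.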

On the power front, GTX 680 has a few different numbers to contend with. NVIDIA’s official TDP is 195W, though as with the GTX 500 series they still consider this an average number rather than a true maximum. The second number is the boost target, which is the highest power level that GPU Boost will turbo to; that number is 170W. Finally, while NVIDIA doesn’t publish an official idle TDP, the GTX 680 should have an idle TDP of around 15W. Overall GTX 680 is targeted at a power envelope somewhere between GTX 560 Ti and GTX 580, though it’s closer to the former than the latter.

As for GK104 itself, as we’ve already mentioned GK104 is a smaller than average GPU for NVIDIA, with a die size of 294mm². This is roughly 89% the size of GF114, or compared to GF110 a mere 56% of the size. Inside that 294mm² NVIDIA packs 3.5B transistors thanks to TSMC’s 28nm process, only 500M more than GF110 and largely explaining why GK104 is so small compared to GF110. Or to once again make a comparison to GF114, this is 1550M (79%) more than GF114, which makes the fact that GK104 doubles most of GF114’s functional units all the more surprising. With Kepler NVIDIA is going to be heavily focusing on efficiency, and this is one such example of Kepler’s efficiency in action.
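The transistor comparisons can be checked the same way, using only the counts in the spec table and GK104's 294mm² die size. This is a minimal sketch; the helper names are made up for illustration, and the density figure is a simple division, not an official NVIDIA or TSMC number.

```python
# Transistor counts in billions, taken from the spec table
# (GTX 680 = GK104, GTX 580 = GF110, GTX 560 Ti = GF114).
TRANSISTORS_B = {"GK104": 3.5, "GF110": 3.0, "GF114": 1.95}
GK104_DIE_MM2 = 294  # die size quoted in the text


def extra_transistors(new: str, old: str) -> tuple:
    """Return (difference in millions, percentage increase), both rounded."""
    delta = TRANSISTORS_B[new] - TRANSISTORS_B[old]
    return round(delta * 1000), round(100 * delta / TRANSISTORS_B[old])


def gk104_density() -> float:
    """Millions of transistors per square millimeter for GK104."""
    return TRANSISTORS_B["GK104"] * 1000 / GK104_DIE_MM2


for older in ("GF110", "GF114"):
    millions, percent = extra_transistors("GK104", older)
    print(f"GK104 vs {older}: +{millions}M transistors (+{percent}%)")
print(f"GK104 density: {gk104_density():.1f}M transistors per mm^2")
```

The same density calculation for GF110 or GF114 would need their absolute die sizes, which the text gives only as percentages of GK104.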

Last but not least, let’s talk about pricing and availability. GTX 680 is the successor to GTX 580 and NVIDIA will be pricing it accordingly, with an MSRP of $500. This is the same price that the GTX 580 and GTX 480 launched at back in 2010, and while it’s consistent for an x80 video card it’s effectively a conservative price given GK104’s die size. NVIDIA does need to bring their pricing in at the right point to combat AMD, but they’re in no more of a hurry than AMD to start any price wars, so it’s conservative pricing all around for the time being.

The competition from AMD is of course the recently launched Radeon HD 7970 and 7950. With those cards priced at $550 and $450 respectively, the GTX 680 sits right in between them in terms of pricing. However with regard to gaming performance the GTX 680 is generally more than a match for the 7970, which is going to leave AMD in a tough spot. AMD’s partners do have factory overclocked cards, but those only close the performance gap at the cost of an even wider price gap. NVIDIA has priced the GTX 680 to undercut the 7970, and that’s exactly what will be happening today.

As for availability, we’re told that it should be similar to past high end video card launches, which is to say it will be touch and go. As with any launch NVIDIA has been stockpiling cards but it’s still a safe bet that GTX 680 will sell out in the first day. Beyond the initial launch it’s not clear whether NVIDIA will be able to keep up with demand over the next month or so. NVIDIA has been fairly forthcoming to their investors about how 28nm production is going, and while yields have been acceptable TSMC doesn’t have enough wafers to satisfy all of their customers at once, so NVIDIA is still getting fewer wafers than they’d like. Until very recently AMD’s partners have had a difficult time keeping the 7970 in stock, and it’s likely it will be the same story for NVIDIA’s partners.

405 Comments

  • Targon - Thursday, March 22, 2012

    Many people have been blasting AMD for price vs performance in the GPU arena in the current round of fighting. The thing is, until now, AMD had no competition, so it was expected that the price would remain high until NVIDIA released their new generation. So, expect lower prices from AMD to be released in the next week.

    You also fail to realize that with a 3 month lead, AMD is that much closer to the refresh parts being released that will beat NVIDIA for price vs. performance. Power draw may still be higher from the refresh parts, but that will be addressed for the next generation.

    Now, you and others have been claiming that NVIDIA is somehow blowing AMD out of the water in terms of performance, and that is NOT the case. Yes, the 680 is faster, but isn't so much faster that AMD couldn't EASILY counter with a refresh part that catches up or beats the 680 NEXT WEEK. The 7000 series has a LOT of overclocking room there.

    So, keep things in perspective. A 3 percent performance difference isn't enough to say that one is so much better than the other. It also remains to be seen how quickly the new NVIDIA parts will be made available.
  • SlyNine - Thursday, March 22, 2012

    I still blast them, I'm not happy with the price/performance increase of this generation at all.
  • Unspoken Thought - Thursday, March 22, 2012

    Finally! Logic! But it still falls on deaf ears. We finally see both sides getting their act together to get minimum feature sets in, and we can't see past our own biased squabbles.

    How about we continue to push these manufacturers on what we want and need most: more features, better algorithms, and last and most important, revolutionize and find new ways to render, aside from vector based rendering.

    Let's start incorporating high level mathematics for fluid dynamics and the such. They have already absorbed PhysX and moved to Direct Compute. Let's see more realism in games!

    Where is the Technological Singularity when you need one.
  • CeriseCogburn - Thursday, March 22, 2012

    Well, the perspective I have is amd had a really lousy (without drivers) and short 2.5 months when the GTX580 wasn't single core king w GTX590 dual core king and the latter still is and the former has been replaced by the GTX680.
    So right now Nvidia is the absolute king, and before now save that very small time period Nvidia was core king for what with the 580 .. 1.5 years ?
    That perspective is plain fact.
    FACTS- just stating those facts makes things very clear.
    We already have heard the Nvidia monster die is afoot - that came out with "all the other lies" that turned out to be true...
    I don't fail to realize anything - I just have a clear mind about what has happened.
    I certainly hope AMD has a new better core slapped down very soon, a month would be great.
    Until AMD is really winning, it's LOSING man, it's LOSING!
  • CeriseCogburn - Thursday, March 22, 2012

    Since amd had no competition for 2.5 months and that excuses its $569.99 price, then certainly the $500 price on the GTX580 that had no competition for a full year and a half was not an Nvidia fault, right ? Because you're a fair person and "finally logic!" is what another poster supported you with...
    So thanks for saying the GTX580 was never priced too high because it has no competition for 1.5 years.

  • Unspoken Thought - Saturday, March 24, 2012

    Honestly the only thing I was supporting was the fact he is showing that perspective changes everything. a fact exacerbated when bickering over marginal differences that are driven by the economy when dealing with price vs performance.

    Both of you have valid arguments, but it sounds like you just want to feel better about supporting nVidia.

    You should be able to see how AMD achieved its goals with nVidia following along playing leap frog. Looking at benchmarks, no it doesn't beat it in everything and both are very closely matched in power consumption, heat, and noise. Features are where nVidia shine and get my praise. but I would not fault you if you had either card.
  • CeriseCogburn - Friday, April 06, 2012

    Ok Targon, now we know TiN put the 1.3V core on the 680 and it OC'ed on air to 1,420 core, surpassing every 7970 1.3V core overclock out there.
    Furthermore, Zotac has put out the 2,000Ghz 680 edition...
    So it appears the truth comes down to the GTX680 has more left in the core than the 7970.
    Nice try but no cigar !
    Nice spin but amd does not win !
    Nice prediction, but it was wrong.
  • SlyNine - Thursday, March 22, 2012

    Go back and look at the benchmarks idiot. 7970 wins in some situations.
  • SlyNine - Thursday, March 22, 2012

    In Crysis max, 7970 gets 36 FPS while the 680 only gets 30 FPS.

    Yes, some how the 7970 is losing. LOOK AT THE NUMBERS, HELLO!!???

    Metro 2033 the 7970 gets 38 and the 680 gets 37. But yet in your eyes another loss for the 7970...

    7970 kills it in certain GPU Compute, and has hardware H.264 encoding.

    In a couple of games, which you already get massive FPS with both, the 680 boasts much much higher FPS. But then in games where you need the FPS the 7970 wins. Hmmm

    But no no, you're right, the 680 is total elite top shit.
  • eddieroolz - Friday, March 23, 2012

    You pretty much admitted that 7970 loses in a lot of other cases by stating that:

    "7970 kills it in certain GPU compute..."

    Adding the word modifier "certain" to describe a win is like admitting defeat in every other compute situation.

    Even for the games, you can only mention 2 out of what, 6 games where 7970 wins by a <10% margin. Meanwhile, GTX 680 proceeds to maul the 7970 by >15% in at least 2 of the games.

    Yes, 7970 is full of win, indeed! /s