NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260

Name: NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260
Item: NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260
Author: Anand Lal Shimpi & Derek Wilson

by Anand Lal Shimpi & Derek Wilson on June 16, 2008 9:00 AM EST

Posted in
GPUs

108 Comments | Add A Comment

108 Comments

Tweaks and Enahancements in GT200

NVIDIA provided us with a list, other than the obvious addition of units and major enhancements in features and technology, of adjustments made from G80 to GT200. These less obvious changes are part of what makes this second generation Tesla architecture a well evolved G80. First up, here's a quick look at percent increases from G80 to GT200.

NVIDIA Architecture Comparison	8800 GTX	GTX 280	% Increase
Cores	128	240	87.5%
Texture	64t/clk	80t/clk	25%
ROP Blend	12p / clk	32p / clk	167%
Max Precision	fp32	fp64
GFLOPs	518	933	80%
FB Bandwidth	86 GB/s	142 GB/s	65%
Texture Fill Rate	37 GT/s	48 GT/s	29.7%
ROP Blend Rate	7 GBL/s	19 GBLs	171%
PCI Express Bandwidth	6.4 GB/s	12.8GB/s	100%
Video Decode	VP1	VP2

Communication between the driver and the front-end hardware has been enhanced through changes to the communications protocol. These changes were designed to help facilitate more efficient data movement between the driver and the hardware. On G80/G92, the front-end could end up in contention with the "data assembler" (input assembler) when performing indexed primitive fetches and forced the hardware to run at less than full speed. This has been fixed with GT200 through some optimizations to the memory crossbar between the assembler and the frame buffer.

The post-transform cache size has been increased. This cache is used to hold transformed vertex and geometry data that is ready for the viewport clip/cull stage, and increasing the size of it has resulted in faster communication and fewer pipeline stalls. Apparently setup rates are similar to G80 at up to one primative per clock, but feeding the setup engine is more efficient with a larger cache.

Z-Cull performance has been improved, while Early-Z rejection rates have increased due to the addition of more ROPs. Per ROP, GT200 can eliminate 32 pixles (or up to 256 samples with 8xAA) per clock.

The most vague improvement we have on the list is this one: "significant micro-architectural improvements in register allocation, instruction scheduling, and instruction issue." These are apparently the improvements that have enabled better "dual-issue" on GT200, but that's still rather vague as to what is actually different. It is mentioned that scheduling between the texture units and SMs within a TPC has also been improved. Again, more detail would be appreciated, but it is at least worth noting that some work went into that area.

Register Files? Double Em!

Each of those itty-bitty SPs is a single-core microprocessor, and as such it has its own register file. As you may remember from our CPU architecture articles, registers are storage areas used to directly feed execution units in a CPU core. A processor's register file is its collection of registers and although we don't know the exact number that were in G80's SPs, we do know that the number has been doubled for GT200.

NVIDIA's own data shows a greater than 10% increase in performance due to the larger register file size (source: NVIDIA)

If NVIDIA is betting that games are going to continue to get more compute intensive, then register file usage should increase as well. More computations means more registers in use, which in turn means that there's a greater likelihood of running out of registers. If a processor runs out of registers, it needs to start swapping data out to much slower memory and performance suffers tremendously.

If you haven't gotten the impression that NVIDIA's GT200 is a compute workhorse, doubling the size of the register file per SP (multiply that by 240 SPs in the chip) should help drive the idea home.

Double the Precision, 1/8th the Performance

Another major feature of the GT200 GPU and cards based on it is support for hardware double precision floating point operations. Double precision FP operations are 64-bits wide vs. 32-bit for single precision FP operations.

Now the 240 SPs in GT200 are single-precision only, they simply can't accept 64-bit operations at all. In order to add hardware level double precision NVIDIA actually includes one double precision unit per shading multiprocessor, for a total of 30 double precision units across the entire chip.

The ratio of double precision to single precision hardware in GT200 is ridiculously low, to the point that it's mostly useless for graphics rasterization. It is however, useful for scientific computing and other GPGPU applications.

It's unlikely that 3D games will make use of double precision FP extensively, especially given that 8-bit integer and 16-bit floating point are still used in many shader programs today. If anything, we'll see the use of DP FP operations in geometry and vertex operations first, before we ever need that sort of precision for color - much like how the transition to single precision FP started first in vertex shaders before eventually gaining support throughout the 3D pipeline.

Geometry Wars

ATI's R600 is alright at geometry shading. So is RV670. G80 didn't really keep up in this area. Of course, games haven't really made extensive use of geometry shaders because neither AMD nor NVIDIA offered compelling performance and other techniques made more efficient use of the hardware. This has worked out well for NVIDIA so far, but they couldn't ignore the issue forever.

GT200 has enhanced geometry shading support over G80 and is now on par with what we wish we had seen last year. We can't fault NVIDIA too much as with such divergent new features they had to try and predict the usage models that developers might be interested in years in advance. Now that we are here and can see what developers want to do with geometry shading, it makes sense to enhance the hardware in ways that support these efforts.

GT200 has significantly improved geometry shader performance compared to G80 (source: NVIDIA)

Generation of vertex data is a particularly weak part of NVIDIA's G80, so GT200 is capable of streaming out 6x the data of G80. Of course there are the scheduling enhancements that affect everything, but it is unclear as to whether NVIDIA did anything beyond increasing the size of their internal output buffers by 6x in order to enhance their geometry shading capability. Certainly this was lacking previously, but hopefully this will make heavy use of the geometry shader something developers are both interested in and can take advantage of.

Derek Gets Technical: 15th Century Loom Technology Makes a Comeback Derek's Conjecture Regarding SP Pipelining and TMT

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

108 Comments

View All Comments

Anand Lal Shimpi - Monday, June 16, 2008 - link
Thanks for the heads up, you're right about G92 only having 4 ROPs, I've corrected the image and references in the article. I also clarified the GeForce FX statement, it definitely fell behind for more reasons than just memory bandwidth, but the point was that NVIDIA has been trying to go down this path for a while now.

Take care,
Anand
mczak - Monday, June 16, 2008 - link
Thanks for correcting. Still, the paragraph about the FX is a bit odd imho. Lack of bandwidth really was the least of its problem, it was a too complicated core with actually lots of texturing power, and sacrificed raw compute power for more programmability in the compute core (which was its biggest problem).
Arbie - Monday, June 16, 2008 - link
I appreciate the in-depth look at the architecture, but what really matters to me are graphics performance, heat, and noise. You addressed the card's idle power dissipation but only in full-system terms, which masks a lot. Will it really draw 25W in idle under WinXP?

And this highly detailed review does not even mention noise! That's very disappointing. I'm ready to buy this card, but Tom's finds their samples terribly noisy. I was hoping and expecting Anandtech to talk about this.

Arbie
Anand Lal Shimpi - Monday, June 16, 2008 - link
I've updated the article with some thoughts on noise. It's definitely loud under load, not GeForce FX loud but the fan does move a lot of air. It's the loudest thing in my office by far once you get the GPU temps high enough.

From the updated article:

"Cooling NVIDIA's hottest card isn't easy and you can definitely hear the beast moving air. At idle, the GPU is as quiet as any other high-end NVIDIA GPU. Under load, as the GTX 280 heats up the fan spins faster and moves much more air, which quickly becomes audible. It's not GeForce FX annoying, but it's not as quiet as other high-end NVIDIA GPUs; then again, there are 1.4 billion transistors switching in there. If you have a silent PC, the GTX 280 will definitely un-silence it and put out enough heat to make the rest of your fans work harder. If you're used to a GeForce 8800 GTX, GTS or GT, the noise will bother you. The problem is that returning to idle from gaming for a couple of hours results in a fan that doesn't want to spin down as low as when you first turned your machine on.

While it's impressive that NVIDIA built this chip on a 65nm process, it desperately needs to move to 55nm."
Mr Roboto - Monday, June 16, 2008 - link
I agree with what Darkryft said about wanting a card that absolutely without a doubt, stomps the 8800GTX. So far that hasn't happened as the GX2 and GT200 hardly do either. The only thing they proved with the G90 and G92 is that they know how to cut costs.

Well thanks for making me feel like such a smart consumer as it's going on 2 years with my 8800GTX and it still owns 90% of the games I play.

P.S. It looks like Nvidia has quietly discontinued the 8800GTX as it's no longer on major retail sites.
Rev1 - Monday, June 16, 2008 - link
Ya the 640 8800 gts also. No Sli for me lol.
wiper - Monday, June 16, 2008 - link
What about noise ? Other reviews show mixed data. One says it's another dustblower, others says the noise level is ok.
Zak - Monday, June 16, 2008 - link
First thing though, don't rely entirely on spell checker:)) Page 4 "Derek Gets Technical": "borrowing terminology from weaving was cleaver" I believe you meant "clever"?

As darkryft pointed out:

"In my opinion, for $650, I want to see some f-ing God-like performance."

Why would anyone pay $650 for this? Ugh? This is probably THE disappointment of the year:(((

Z.
js01 - Monday, June 16, 2008 - link
On techpowerups review it seemed to pull much bigger numbers but they were using xp sp2.
http://www.techpowerup.com/reviews/Point_Of_View/G...">http://www.techpowerup.com/reviews/Point_Of_View/G...
NickelPlate - Monday, June 16, 2008 - link
Pfft, title says it all. Let's hope that driver updates widen the gap between previous high end products. Otherwise, I'll pass on this one.

NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260

Tweaks and Enahancements in GT200

Post Your Comment

108 Comments

View All Comments

Anand Lal Shimpi - Monday, June 16, 2008 - link

mczak - Monday, June 16, 2008 - link

Arbie - Monday, June 16, 2008 - link

Anand Lal Shimpi - Monday, June 16, 2008 - link

Mr Roboto - Monday, June 16, 2008 - link

Rev1 - Monday, June 16, 2008 - link

wiper - Monday, June 16, 2008 - link

Zak - Monday, June 16, 2008 - link

js01 - Monday, June 16, 2008 - link

NickelPlate - Monday, June 16, 2008 - link

Log in

Don't have an account? Sign up now