NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10

Name: NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
Item: NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10
Author: Anand Lal Shimpi & Derek Wilson

by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST

Posted in
GPUs

111 Comments | Add A Comment

111 Comments

G80: A Mile High Overview

Now that we know a little more about the requirements and direction of DX10, we can take a deeper look at where NVIDIA has decided to go with the architecture of G80. We will be seeing a completely new design based around a unified shader architecture. While DX10 doesn't require a unified architecture, it certainly does make a lot of sense to move in that direction.

Inside G80, vertex, geometry, pixel shaders and more (more on this later) are all able to run on the same set of execution resources. In order to make this happen, the shader core needed to be made more general purpose and suitable for multiple usage scenarios. This is much like what we are used to seeing on a CPU, and as time moves on we expect these similarities to increase from both the CPU and GPU side. The design NVIDIA has come up, while very complex and powerful, is quite elegant. Here's a look at the block diagram for G80:

The architecture is able to use thread management hardware to dispatch different types of instructions on to the shader core. As vertices complete, their output can be used as input to geometry shaders back at the "top" of the shader core. Geometry shader output is then used as input to pixel shaders. Here's a quick conceptual representation of what we are talking about:

The sheer size of G80 is absolutely amazing; while NVIDIA wouldn't disclose exact die sizes let's look at the facts. The G80 chip is made up of 681 million transistors, more than a single core Itanium 2 or the recently launched Kentsfield, but manufactured on an almost old 90nm process. As a reference point, ATI's Radeon X1900 XTX based on the R580 GPU was built on a 90nm process yet it featured only 384 million transistors. NVIDIA's previous high-end GPU, the G71 based GeForce 7900 GTX was also built on a 90nm process but used only 278 million transistors. Any way you slice it, this is one huge chip. Architecting such a massive GPU has taken NVIDIA a great deal of time and money, four years and $475M to be exact. The previous record for time was almost 3 years at a lesser amount, but NVIDIA wouldn't tell us which GPU that was.

Intel's Quad Core Kentsfield on top, G80 on bottom

Despite very high clock speeds on the die and a ridiculous 681 million transistor count, power consumption of NVIDIA's G80 is quite reasonable given its target; on average, a G80 system uses about 8% more power than one outfitted with ATI's Radeon X1950 XTX.

Click to Enlarge

You really start to get a sense of how much of a departure G80 is from previous architectures when you look at the shader core. Composed of 128 simple processors, called Stream Processors (SPs), the G80 shader core runs at a very high 1.35GHz on the highest end G80 SKU. We'll get into exactly what these stream processors are on the coming pages, but NVIDIA basically put together a wide array of very fast, specialized, but simple processors. In a sense, G80's shader core looks much like Cell's array of SPEs, but the SPs here are not nearly as independent as the SPEs in Cell.

Running at up to 1.35GHz, NVIDIA had to borrow a few pages from the books of Intel in order to get this done. The SPs are fairly deeply pipelined and as you'll soon see, are only able to operate on scalar values, thus through simplifying the processors and lengthening their pipelines NVIDIA was able to hit the G80's aggressive clock targets. There was one other CPU-like trick employed to make sure that G80 could have such a shader core, and that is the use of custom logic and layout.

The reason new CPU architectures take years to design while new GPU architectures can be cranked out in a matter of 12 months is because of how they're designed. GPUs are generally designed using a hardware description language (HDL), which is sort of a high level programming language that is used to translate code into a transistor layout that you can use to build your chip. At the other end of the spectrum are CPU designs which are largely done by hand, where design is handled at the transistor level rather than at a higher level like a HDL would.

Elements of GPUs have been designed at the transistor level in the past; things like memory interfaces, analog circuits, memories, register files and TMDS drivers were done by hand using custom transistor level design. But shaders and the rest of the pipeline was designed by writing high level HDL code and relying on automated layout.

You can probably guess where we're headed with this; the major difference between G80 and NVIDIA's previous GPUs is that NVIDIA designed the shader core at the transistor level. If you've heard the rumors of NVIDIA building more than just GPUs in the future, this is the first step, although NVIDIA was quick to point out that G80 won't be the norm. NVIDIA will continue to design using HDLs where it makes sense, and in critical areas where additional performance or power sensitive circuitry is needed, we'll see transistor level layout work done by NVIDIA's engineering. It's simply not feasible for NVIDIA's current engineering staff and product cycles to work with a GPU designed completely at the transistor level. That's not to say it won't happen in the future, and if NVIDIA does eventually get into the system on a chip business with its own general purpose CPU core, it will have to happen; but it's not happening anytime soon.

The additional custom logic and layout present in G80 helped extend the design cycle to a full four years and brought costs for the chip up to $475M. Prior to G80 the previous longest design cycle was approximately 2.5 - 3 years. Although G80 did take four years to design, much of that was due to the fact that G80 was a radical re-architecting of the graphics pipeline and that future GPUs derived from G80 will have an obviously shorter design cycle.

Shader Model 4.0 Enhancements Digging deeper into the shader core

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

111 Comments

View All Comments

Sunrise089 - Thursday, November 9, 2006 - link
Then I suppose he's in the market to part with an ugly old high-end CRT. I'd love to buy it from him. Seriously.
JarredWalton - Thursday, November 9, 2006 - link
You want an older 21" Cornerstone CRT? It's a beast, but you can have it for the cost of shipping (which unfortunately would probably be ~$50). I'd also sell my Samsung 997DF 19" CRT for about $50, and maybe an NEC FE991-SB for $50 (which unfortunately has a scratch from my daughter in the anti-glare coating). If anyone lives in the Olympia, WA area, you know how to contact me (I hope). I'd rather someone come by to pick up any of these CRTs rather than shipping, as I don't think I have the original boxes.
DerekWilson - Thursday, November 9, 2006 - link
lol next thing you know links to ebay auctions are gonna start showing up in our articles :-)
yyrkoon - Thursday, November 9, 2006 - link
lol, I've got a 21" techtronics I'll sell for $200 usd, plus shipping ;) Hasnt been used since I purchased my Viewsonic VA1912wb (well, been used very little ).
imaheadcase - Wednesday, November 8, 2006 - link
can't stand AA benchmarks myself :)

Question: Do you have any info on what kinda card nvidia releasing this feb? Is it something in between these 2 cards or something even lower?

Im looking for a $300ish g80! :D
flexy - Wednesday, November 8, 2006 - link
if ANYTHING counts then how those high-end cards perform WITH their various AA settings.... the power of those cards (and the money spent on :) RIGHT translated into ---> IMAGE QUALITY/PERFORMANCE.

Please dont tell you you would get an G80 but do NOT care about AA, this does NOT make any sense...sorry...

I am especially impressed reading that transparency AA has such a LITTLE performance impact. What game engine did you test this on ?

On the older ATI cards (and am i right that T.A. is the same as "adaptive antialiasing" ? )...this feature (depending on game engine) is the FPS killer....eg. w/ games like oblivion (WHERE ARE THE GOTHIC 3 BENCHEIS BTW ? :)...much vegetation etc. game-engines.

Enable transparency AA and see all those trees, grass etc. without jaggies.
imaheadcase - Thursday, November 9, 2006 - link
Well lots of people don't are for AA. Even if i had this card I would not use it. I visually see NO difference with it on or off. Its personal test. I don't even see "jaggies" on my older 9700 PRO card.
flexy - Thursday, November 9, 2006 - link
you sure are talking about ANTIALIASING ???

What resoltions do you run ? Not that my CRT can even handle more than 1600x1200..but even w/ 1600 i get VERY prominent jaggies if i dont run AA.

I made it a habit to run at least 4xAA in ANY game, and some engines (hl2:source engine) etc. run extremely well with 4xAA, even 6xAA is very playable at elast with HL2.

The very recent games, namely NWN2 and G3 now dont support AA, playing at 1280x1024 and it looks utterly horrible ! If you say you dont see jags in say ANY resolution under 1600..very hard to believe
imaheadcase - Thursday, November 9, 2006 - link
Yes im talking about antialiasing. I normally play BF2 and oblivion at 1024x768 (9700 pro remember).

Fact is most people won't see them unless someone points them out. The brain is still better at rendering stuff the way you want to see it vs hardware :)
flexy - Thursday, November 9, 2006 - link
ok..but then it's also a performance problem. If it doesn't bother you, well ok.
I also have to settle w/ the fact that many RECENT games are even unable to do AA..however i wish they would.

But once i get a 8800 i will do &&&& to get the most out of IQ, AA, AF, transparency/texture AF, you name it. ALONE also for the reason that i would need a super-high end monitor first to even run resolutions like 2000xsomething...and as long as i have a lame 19" CRT and CANNOT even go over 1600 (99,99% of games even running everything on 1280x or 1360x) i will use all the power to get out best possible IQ in those low resolutions.

Also..looking at the benchmarks..its NOT that you lose any real time gaming-experiencee since THOSE monster cards are made for exactly this...eg. running oblivion with all those settings at MAX AND AA on and HDR...and you are still in VERY reasobale FPS ranges.

NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10

G80: A Mile High Overview

Post Your Comment

111 Comments

View All Comments

Sunrise089 - Thursday, November 9, 2006 - link

JarredWalton - Thursday, November 9, 2006 - link

DerekWilson - Thursday, November 9, 2006 - link

yyrkoon - Thursday, November 9, 2006 - link

imaheadcase - Wednesday, November 8, 2006 - link

flexy - Wednesday, November 8, 2006 - link

imaheadcase - Thursday, November 9, 2006 - link

flexy - Thursday, November 9, 2006 - link

imaheadcase - Thursday, November 9, 2006 - link

flexy - Thursday, November 9, 2006 - link

Log in

Don't have an account? Sign up now