NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10


by Anand Lal Shimpi & Derek Wilson on November 8, 2006 6:01 PM EST

Digging deeper into the shader core

Many of the same patterns that led designers of current hardware to their conclusions still hold today. For instance, pixels next to each other on the screen still tend to follow a very similar path through the hardware, so it still makes sense to process pixels in quads. As for changes, as hardware becomes more programmable, we are seeing a higher percentage of scalar data being used. Even though much of the work done by graphics hardware is vector based, code becomes easier to schedule when we are working with a bunch of parallel, independent, scalar processors. It is also more efficient to build separate units for texture addressing and filtering, and ATI has done this for quite some time now.

NVIDIA has finally decoupled the texture units from their shader hardware, enabling math and texturing to happen at the same time with no scheduling issues. They have also decided to implement their math hardware as a collection of scalar processors that can be used together to perform vector operations. NVIDIA calls the scalar processors Stream Processors (SPs), and they handle all the math performed in the shader core of G80.

It isn't surprising to see that NVIDIA's implementation of a unified shader is based on taking a pixel shader quad pipeline and breaking up each of its vector units into 4 scalar units. Now, rather than a quad of 4 pixel pipelines, we see 16 SPs per "quad" or block of stream processors. Each block of 16 SPs shares 4 texture address units, 8 texture filter units, and an L1 cache.

G70 Pixel Shader Quad

G80 Stream Processor Block

The fact that these SPs are now independent and scalar gives NVIDIA the ability to keep more of them busy more of the time. This is very important as programmers start to write longer, more complex shaders. Even while working with vectors, programmers need to use scalar values all the time to manipulate and evaluate data.
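To make the utilization argument concrete, here is a toy model (not a description of NVIDIA's actual scheduler) that compares 16 ALU lanes organized as four vec4 units against the same 16 lanes run as independent scalar processors. The function names, the example shader, and the ideal-scheduling assumption are all hypothetical, and the vector model ignores the co-issue tricks that real vector hardware uses to recover some of the idle lanes.

```python
from math import ceil

LANES = 16  # 16 scalar SPs, or equivalently 4 pipelines x one 4-wide vector ALU


def vector_quad(op_widths, pixels):
    """Return (cycles, lane utilization) for 4 pipelines with one vec4 ALU each.

    Every instruction occupies a whole vec4 issue slot for its pixel,
    no matter how many components it actually uses.
    """
    cycles = len(op_widths) * ceil(pixels / 4)
    useful = sum(op_widths) * pixels
    return cycles, useful / (cycles * LANES)


def scalar_block(op_widths, pixels):
    """Return (cycles, lane utilization) for 16 independent scalar processors.

    Assumes an ideal scheduler that can hand any component operation to
    any free SP (no dependencies, no stalls).
    """
    useful = sum(op_widths) * pixels
    cycles = ceil(useful / LANES)
    return cycles, useful / (cycles * LANES)


# Hypothetical shader: a vec4 op, two scalar ops, a vec3 op and one more scalar op.
shader = [4, 1, 1, 3, 1]
print("vector:", vector_quad(shader, pixels=16))
print("scalar:", scalar_block(shader, pixels=16))
```

In this model the more scalar work a shader contains, the more lanes the vector arrangement leaves idle, while the scalar arrangement stays busy as long as there is enough independent work to hand out.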

Each Stream Processor is able to complete one MAD (multiply-add) and one MUL (multiply) per clock cycle. While this is a maximum throughput figure, we can reasonably expect to achieve it even though the hardware is pipelined. For comparison, in spite of the 4 or 5 cycle latency (depending on precision) of a MUL in Conroe, SSE is now capable of one MUL per cycle of throughput (as long as there are no stalls in the pipeline). Operations in G80 could have even longer latency and still sustain high throughput, as most of the time we are working with code that isn't riddled with dependencies.
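The difference between latency and throughput can be sketched with a simple cycle count. This is an idealized model of a fully pipelined unit that issues one operation per cycle; the function names and the 20 cycle latency are illustrative assumptions, not measured G80 figures.

```python
def cycles_independent(n_ops, latency):
    """Fully pipelined unit, one new op issued per cycle, no dependencies.

    The first result appears after `latency` cycles, then one result
    completes every cycle after that.
    """
    return latency + n_ops - 1 if n_ops else 0


def cycles_dependent(n_ops, latency):
    """Each op needs the previous result, so no two ops overlap."""
    return n_ops * latency


n = 1000
for latency in (4, 5, 20):
    indep = cycles_independent(n, latency)
    dep = cycles_dependent(n, latency)
    print(f"latency={latency:2d}  independent: {n / indep:.3f} ops/cycle  "
          f"dependent chain: {n / dep:.3f} ops/cycle")
```

With plenty of independent work, sustained throughput stays close to one operation per cycle even when latency grows, which is why a long pipeline does not by itself prevent the hardware from approaching its peak rate. A chain of dependent operations, on the other hand, pays the full latency every time.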

The fact that each SP is capable of IEEE 754 single precision and can sustain high throughput for MAD and MUL operations while running any type of shader code makes this hardware very powerful and more general purpose than ever.
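As a rough back-of-the-envelope check, the per-clock figure above translates into a peak floating-point rate once an SP count and a shader clock are plugged in. Neither number is stated in this section; the 128 SPs and 1.35 GHz used below are the commonly quoted GeForce 8800 GTX figures and should be treated as example inputs. The function name is hypothetical.

```python
def peak_gflops(num_sps, shader_clock_ghz, mad_per_clock=1, mul_per_clock=1):
    """Theoretical peak single-precision GFLOPS.

    A MAD counts as two floating-point operations (a multiply and an add),
    a MUL counts as one.
    """
    flops_per_clock = 2 * mad_per_clock + 1 * mul_per_clock
    return num_sps * shader_clock_ghz * flops_per_clock


# Assumed example inputs (not taken from this section): 128 SPs at 1.35 GHz.
print(f"MAD + MUL: {peak_gflops(128, 1.35):.1f} GFLOPS")
print(f"MAD only:  {peak_gflops(128, 1.35, mul_per_clock=0):.1f} GFLOPS")
```

This is a ceiling rather than a prediction: it assumes every SP issues both a MAD and a MUL on every clock, which real shader code will not always allow.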

As a thread exits the SP, G80 is capable of writing the output of the shader to memory. The fact that SPs can do this at any time (except after pixel shaders) goes beyond the DX10 spec of just allowing for stream output after the Geometry Shader. On previous hardware, data would have to go through every stage of the pipeline until a value was finally written out to the frame buffer. Now, we can write data out at the end of anything but a pixel shader (as pixel shaders must send their output straight over to the ROPs for processing). This will be a great benefit to GPGPU (general purpose computing on graphics processing units).

111 Comments

JarredWalton - Wednesday, November 8, 2006
They did the same thing with the original Halo, porting it (and slowing it down) to DX9. MS seems to think making Halo 2 Vista-only will get people to upgrade to the new OS. [:rolls eyes:]
stmok - Wednesday, November 8, 2006
How else are they gonna get gamers to upgrade to Vista? :)
(by cornering them into adopting Vista, using DirectX 10.0)

It's sad and pathetic at the same time.

DirectX 10.0 should be a "transitional" solution...That is, it covers both XP and Vista. This allows people to gradually upgrade their hardware, and if they wish, to Vista. What MS is doing now, is throwing everyone (developers and consumers) into the deep end, and expecting them to pay for the changes. (I suspect some would be put off by this, while the majority will continue to accept it...Which is unfortunate).

Great article BTW. Interesting to see the high-end stuff...But I doubt I can afford it in this lifetime!

I have two questions!

(1) Any chance of looking at a triple video card setup?
(I saw a presentation slide which had 2 video cards in SLI, while a third showed something else on screen).

(2) Any idea when the GF8600-series comes?
(mainstream market solution).
yyrkoon - Thursday, November 9, 2006
Great, links aren't working?

http://www.gamedev.net/reference/programming/featu...
yyrkoon - Thursday, November 9, 2006
http://www.gamedev.net/reference/programming/featu...

This article was written by a friend of mine back in April after an interview with ATI. Perhaps this will clear some things up.
yyrkoon - Thursday, November 9, 2006
When you break all hardware/software ties to something that has been around for 4-5 years? It's not that easy making it "transitional". From a software perspective, D3D10 is not compatible with XP in the least.

I for one, think this is a step in the right direction.
JarredWalton - Thursday, November 9, 2006
Supposedly all of the changes to the WDDM make porting DX10 back to Windows XP "impossible", although I'm more inclined to think the correct term would be "difficult" and you also have to add in "it doesn't fit with MS marketing protocol". WDDM is quite different in Vista however, so maybe there's some substance to the claims.
cosmotic - Wednesday, November 8, 2006
On page 9:

--Briefly explain what a sub-pixel is in the sentence before--
JarredWalton - Wednesday, November 8, 2006
Due to the size of this article and the amount of time it took to get ready, let me preempt any comments about the spelling and grammar. I am in the process of editing the final document as I read through it, and there are spelling/grammar errors. If they bother you too much, check back in an hour. If you read this an hour from now and you still find errors, then you can respond, though it would be useful to keep all responses in a single thread like this one.

Thanks in advance,
Jarred Walton
Editor
AnandTech.com
xtknight - Thursday, November 16, 2006
On p 12 (gamma corrected AA):

"This causes problems for thing like thin lines."
acejj26 - Wednesday, November 8, 2006
"If DirectX 10 sounds like a great boon to software developers, the fact that DX10 will only be supported in Windows XP is certain to curb enthusiasm. "

I believe this should say "DX10 will only be supported in Windows Vista..."

Not to be rude, but shouldn't the article be edited BEFORE being published??
