Digging deeper into the shader core

Many of the same patterns that led designers of current hardware to their conclusions still hold today. For instance, pixels next to each other on the screen still tend to follow a very similar path through the hardware, so it still makes sense to process pixels in quads. As for changes, as hardware becomes more programmable, we are seeing a higher percentage of scalar data being used. Even though much of the work done by graphics hardware is vector based, code becomes easier to schedule when it runs on a collection of parallel, independent, scalar processors. It is also more efficient to build separate units for texture addressing and filtering, and ATI has done this for quite some time now.

NVIDIA has finally decoupled the texture units from their shader hardware, enabling math and texturing to happen at the same time with no scheduling issues. They have also decided to implement their math hardware as a collection of scalar processors that can be used together to perform vector operations. NVIDIA calls the scalar processors Stream Processors (SPs), and they handle all the math performed in the shader core of G80.

It isn't surprising to see that NVIDIA's implementation of a unified shader is based on taking a pixel shader quad pipeline and breaking up each of its vector units into 4 scalar units. Now, rather than a quad of 4 pixel pipelines, we see 16 SPs per "quad", or block of stream processors. Each block of 16 SPs shares 4 texture address units, 8 texture filter units, and an L1 cache.

[Figure: G70 Pixel Shader Quad]

[Figure: G80 Stream Processor Block]

The fact that these SPs are now independent and scalar gives NVIDIA the ability to keep more of them busy more of the time. This is very important as programmers start to write longer, more complex shaders. Even while working with vectors, programmers need to use scalar values all the time to manipulate and evaluate data.
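To make the utilization argument concrete, here is a minimal sketch in Python. It is a toy model built on stated assumptions, not a description of NVIDIA's actual scheduler: every operation is reduced to the number of components it touches, the 4-wide vector ALU issues one operation per cycle and leaves unused lanes idle, and the scalar units can freely pack components from independent operations. Real vector hardware can sometimes co-issue narrower operations, which this model ignores. The function names and the instruction mix are hypothetical.

```python
# Toy utilization model (illustrative assumptions only, not NVIDIA's scheduler).
# Each op is described by how many components it touches: 1 = scalar, 4 = full vector.

def vector_alu_utilization(op_widths, lanes=4):
    """One op issues per cycle on a `lanes`-wide vector ALU; unused lanes sit idle."""
    busy = sum(op_widths)
    total = lanes * len(op_widths)
    return busy / total


def scalar_units_utilization(op_widths, units=4):
    """Components of independent ops are packed onto `units` scalar processors."""
    busy = sum(op_widths)
    cycles = -(-busy // units)  # ceiling division
    return busy / (units * cycles)


# Hypothetical shader mix: two vec4 ops, one vec3 op, three scalar ops.
mix = [4, 4, 3, 1, 1, 1]
print(f"vector ALU lanes busy:  {vector_alu_utilization(mix):.1%}")
print(f"scalar units busy:      {scalar_units_utilization(mix):.1%}")
```

Under these assumptions, the more scalar work a shader contains, the more lanes a fixed-width vector unit wastes, while the scalar arrangement stays close to fully occupied as long as the operations are independent.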

Each Stream Processor is able to complete one MAD and one MUL per clock cycle. This is a maximum-throughput figure, but we can reasonably expect to achieve it even though each operation takes several cycles to pass through the pipeline. As a point of comparison, in spite of the 4- or 5-cycle latency (depending on precision) of a MUL in Conroe, SSE is now capable of one MUL per cycle of throughput (as long as there are no stalls in the pipeline). Latency of operations in G80 could be even longer and the hardware could still sustain high throughput, as most of the time we are working with code that isn't riddled with dependencies.
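The latency versus throughput point can be sketched with a little arithmetic. The Python below is an idealized model, assuming a fully pipelined unit that accepts one new operation per cycle and never stalls; the function names and the example latency of 5 cycles are illustrative assumptions, not measured G80 figures. It also counts a MAD as two floating-point operations (a multiply plus an add), which is the usual convention for peak-rate figures.

```python
# Idealized pipeline model (assumes one issue per cycle and no stalls).

def cycles_independent(n_ops, latency):
    """No dependencies: first result after `latency` cycles, then one per cycle."""
    if n_ops == 0:
        return 0
    return latency + n_ops - 1


def cycles_dependent_chain(n_ops, latency):
    """Each op needs the previous result, so nothing overlaps."""
    return n_ops * latency


def peak_flops_per_clock(mad_per_clock=1, mul_per_clock=1):
    """A MAD counts as two floating-point operations, a MUL as one."""
    return 2 * mad_per_clock + 1 * mul_per_clock


n, latency = 1000, 5  # hypothetical op count and pipeline depth
print("independent ops per cycle:", n / cycles_independent(n, latency))
print("dependent ops per cycle:  ", n / cycles_dependent_chain(n, latency))
print("peak flops per SP per clock:", peak_flops_per_clock())
```

With independent work, throughput approaches one operation per cycle regardless of latency; only a chain of dependent operations exposes the full latency on every step.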

The fact that each SP is capable of IEEE 754 single precision and can sustain high throughput for MAD and MUL operations while running any type of shader code makes this hardware very powerful and more general purpose than ever.

As a thread exits the SP, G80 is capable of writing the output of the shader to memory. The fact that SPs can do this at any time (except after pixel shaders) goes beyond the DX10 spec of just allowing for stream output after the Geometry Shader. On previous hardware, data would have to go through every stage of the pipeline until a value was finally written out to the frame buffer. Now, we can write data out at the end of anything but a pixel shader (as pixel shaders must send their output straight over to the ROPs for processing). This will be a great benefit to GPGPU (general purpose computing on graphics processing units).

111 Comments

  • JarredWalton - Wednesday, November 8, 2006 - link

    They did the same thing with the original Halo, porting it (and slowing it down) to DX9. MS seems to think making Halo 2 Vista-only will get people to upgrade to the new OS. [:rolls eyes:]
  • stmok - Wednesday, November 8, 2006 - link

    How else are they gonna get gamers to upgrade to Vista? :)
    (by cornering them into adopting Vista, using DirectX 10.0)

    It's sad and pathetic at the same time.

    DirectX 10.0 should be a "transitional" solution...That is, it covers both XP and Vista. This allows people to gradually upgrade their hardware, and if they wish, to Vista. What MS is doing now, is throwing everyone (developers and consumers) into the deep end, and expecting them to pay for the changes. (I suspect some would be put off by this, while the majority will continue to accept it...Which is unfortunate).


    Great article BTW. Interesting to see the high-end stuff...But I doubt I can afford it in this lifetime!

    I have two questions!

    (1) Any chance of looking at a triple video card setup?
    (I saw a presentation slide which had 2 video cards in SLI, while a third showed something else on screen).

    (2) Any idea when the GF8600-series comes?
    (mainstream market solution).
  • yyrkoon - Thursday, November 9, 2006 - link

    Great, links aren't working?

    http://www.gamedev.net/reference/programming/featu...
  • yyrkoon - Thursday, November 9, 2006 - link

    http://www.gamedev.net/reference/programming/featu...

    This article was written by a friend of mine back in April after an interview with ATI. Perhaps this will clear some things up.
  • yyrkoon - Thursday, November 9, 2006 - link

    When you break all hardware/software ties to something that has been around for 4-5 years? It's not that easy making it "transitional". From a software perspective, D3D10 is not compatible with XP in the least.

    I for one, think this is a step in the right direction.
  • JarredWalton - Thursday, November 9, 2006 - link

    Supposedly all of the changes to the WDDM make porting DX10 back to Windows XP "impossible", although I'm more inclined to think the correct term would be "difficult" and you also have to add in "it doesn't fit with MS marketing protocol". WDDM is quite different in Vista however, so maybe there's some substance to the claims.
  • cosmotic - Wednesday, November 8, 2006 - link

    On page 9:

    --Briefly explain what a sub-pixel is in the sentence before--
  • JarredWalton - Wednesday, November 8, 2006 - link

    Due to the size of this article and the amount of time it took to get ready, let me preempt any comments about the spelling and grammar. I am in the process of editing the final document as I read through it, and there are spelling/grammar errors. If they bother you too much, check back in an hour. If you read this an hour from now and you still find errors, then you can respond, though it would be useful to keep all responses in a single thread like this one.

    Thanks in advance,
    Jarred Walton
    Editor
    AnandTech.com
  • xtknight - Thursday, November 16, 2006 - link

    On p 12 (gamma corrected AA):

    "This causes problems for thing like thin lines."
  • acejj26 - Wednesday, November 8, 2006 - link

    "If DirectX 10 sounds like a great boon to software developers, the fact that DX10 will only be supported in Windows XP is certain to curb enthusiasm. "

    I believe this should say "DX10 will only be supported in Windows Vista..."

    Not to be rude, but shouldn't the article be edited BEFORE being published??
