Branching

In order to talk generally about SPs and their capabilities, all the vertices, primitives, pixel components, etc. to be processed are referred to as threads. This way we can look at each SP as handling its own thread no matter what type of data is being processed. G80 is able to sustain "thousands" of threads at a time, but the actual number of threads that can be active at any given time is not disclosed. While all SPs can handle any type of thread, SPs that share resources must be running the same type of thread at any given time. In this way, each block of 16 SPs can be running one type of shader program on 16 threads. This indicates something about branch granularity as well. For vertex shaders, branch granularity is 16 vertices. For pixel shaders, branch granularity is 32 pixels (arranged in pairs of blocks of 4x4 pixels).

Branch granularity defines how many threads must follow the same path through data. When a group of 32 pixel threads all take the same branch, we don't have a problem. If even one thread must take a path that is different from the others, all 32 threads must be evaluated with both paths following the branch. The branch then defines what result each individual thread will keep and which it will discard. It's easy to see that optimum granularity is 1 thread, as no unnecessary work would be done. The way resources are allocated and the way instructions are run on SPs grouped together currently doesn't allow any more fine-grained branching. Here's a chart that address branch granularity:

GPU Branch Granularity
NVIDIA NV4x ~1K pixels
NVIDIA G70 ~256 pixels
ATI R580 48 pixels
NVIDIA G80 16 vertex
32 pixels

Clearly G80 has the advantage here, as it's less likely that smaller groups of pixels will take different directions through a branch. This gives programmers the ability to more easily integrate branching into their code without getting a massive performance hit. If programmers are able to incorporate more branches, shader code can become more general purpose and we will see many more effects make their way into games. Now that G80 has caught up to ATI in terms of potential branch performance, we hope developers will take the reality of more complex code seriously.

Early-Z, Memory Interface

NVIDIA has added hardware for Early-Z to G80, after their current Z-Cull hardware which removes regions of pixels completely occluded by other geometry. Early-Z is a more fine-grained occlusion culling method that looks at a calculated Z value of a fragment before it hits the pixel pipeline. Z-Cull doesn't look at per fragment Z values, but uses a Z value based on geometry. While Z-Cull can get rid of large blocks of data it has issues handling surfaces that are only partially occluded or intersecting surfaces. Looking at individual depth values per pixel can help remove unnecessary fragments from heading down the pipeline only to be thrown out when the ROPs get to them.

The memory interface has been dramatically redesigned to support the access patterns of all of G80's independent stream processors. Given the theme of increasing granularity within G80 it's no surprise that we are now seeing 5 and 6 channels of GDDR rather than the 2 or 4 channels we have been used to for the past few years. 8800 GTX will have a 384 bit bus (6 x 64-bit channels), while the 8800 GTS will have a 320 bit wide connection to DRAM (5 x 64-bit channels). We would love to delve further into the details of G80's new memory interface, but NVIDIA isn't discussing the details of this aspect of their hardware.

Digging deeper into the shader core General Purpose Processing with G80
Comments Locked

111 Comments

View All Comments

  • JarredWalton - Wednesday, November 8, 2006 - link

    The text is basically complete, and minor spelling issues aren't going to change the results. Obviously, proofing 29 pages of article content is going to take some time. We felt our readers would be a lot more interested in getting the content now rather than waiting even longer for me to proof everything. I know the vast majority of readers don't bother to comment on spelling and grammar issues, but my post was to avoid the comments section turning into a bunch of short posts complaining about errors that will be corrected shortly. :)
  • Iger - Wednesday, November 8, 2006 - link

    Pff, of course we would! If I would like to read a novel I would find a book! Results first - proofing later... if ever :) Thanks for the article!
  • JarredWalton - Wednesday, November 8, 2006 - link

    Did I say an hour? Okay, how about I just post here when I'm done reading/editing? :)
  • JarredWalton - Wednesday, November 8, 2006 - link

    Okay, I'm done proofing/editing. If you still see errors, feel free to complain. Like I said, though, try to keep them in this thread.

    --Jarred
  • LuxFestinus - Thursday, November 9, 2006 - link

    Pg. 3 under <b>Unified Shaders</b>

    quote:

    <i>Until now, building a GPU with unified shaders would not have desirable, let alone practical, but Shader Model 4.0 lends itself well to this approach.</i>

    Should read as follows:
    <i>Until now, building a GPU with unified shaders would not have <b>been</b> desirable, let alone practical, but Shader Model 4.0 lends itself well to this approach.</i>

    Good try though.;)
  • shabby - Wednesday, November 8, 2006 - link

    $600 for the gtx and $450 for the gts is pretty good seeing how much they crammed into the gpu, makes you wonder why the previous gen topped 650 bucks at times.
  • dcalfine - Wednesday, November 8, 2006 - link

    How does the 8800GTX compare to the 7950GX2? Not just in FPS, but also in performance/watt?
  • dcalfine - Wednesday, November 8, 2006 - link

    Ignore ^^^
    sorry


    Hot card by the way!
  • neogodless - Wednesday, November 8, 2006 - link

    I know you touched on this, but I assume that DirectX 10 is still not available for your testing platform, Windows XP Professional SP2, and additionally no games have been released for that platform. Is this correct? If so...

    Will DirectX 10 be made available for Windows XP?
    Will you publish a new review once Vista, DirectX 10 and the new games are available?

    Can we peak into the future at all now?
  • JarredWalton - Wednesday, November 8, 2006 - link

    DX10 will be Vista only according to Microsoft. What that means according to some game developers is that DX10 support is going to be somewhat slow, and it's also going to be a major headache because for the next 3-4 years they will pretty much be required to have a DX9 rendering solution along with DX10.

Log in

Don't have an account? Sign up now