Branching

In order to talk generally about the SPs and their capabilities, all of the vertices, primitives, pixel components, etc. to be processed are referred to as threads. This way we can look at each SP as handling its own thread, no matter what type of data is being processed. G80 is able to sustain "thousands" of threads at a time, though NVIDIA hasn't disclosed the actual number of threads that can be active at any given moment. While every SP can handle any type of thread, SPs that share resources must all be running the same type of thread at any given time. In this way, each block of 16 SPs can run one type of shader program on 16 threads. This tells us something about branch granularity as well: for vertex shaders, branch granularity is 16 vertices; for pixel shaders, it is 32 pixels (arranged as a pair of 4x4 pixel blocks).

Branch granularity defines how many threads must follow the same path through the code. When a group of 32 pixel threads all take the same branch, there's no problem. If even one thread must take a path different from the others, however, all 32 threads must evaluate both sides of the branch; the branch then determines which result each individual thread keeps and which it discards. It's easy to see that the optimum granularity is one thread, as no unnecessary work would ever be done, but the way resources are allocated and instructions are issued to groups of SPs doesn't currently allow any finer-grained branching. Here's a chart that addresses branch granularity:

GPU           Branch Granularity
NVIDIA NV4x   ~1K pixels
NVIDIA G70    ~256 pixels
ATI R580      48 pixels
NVIDIA G80    16 vertices / 32 pixels

Clearly G80 has the advantage here, as it's less likely that smaller groups of pixels will take different directions through a branch. This gives programmers the ability to more easily integrate branching into their code without getting a massive performance hit. If programmers are able to incorporate more branches, shader code can become more general purpose and we will see many more effects make their way into games. Now that G80 has caught up to ATI in terms of potential branch performance, we hope developers will take the reality of more complex code seriously.
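To make the cost of coarse branch granularity concrete, here is a minimal CUDA-style sketch. CUDA arrived alongside G80, but this kernel is purely illustrative and not taken from NVIDIA's materials; the threshold and the two code paths are invented for the example. If even one thread in a 32-wide group takes the expensive side of the branch, the whole group ends up walking through it, with the other threads simply masking off and discarding those results:

```
// Illustrative only: shows why divergence within a 32-thread group is costly.
__global__ void shadePixels(const float *input, float *output, int n, float threshold)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    float v = input[i];

    if (v < threshold) {
        // Cheap path: if all 32 threads in a group land here, only this side
        // is executed and the branch costs essentially nothing.
        output[i] = v * 0.5f;
    } else {
        // Expensive path: if even one thread in the group ends up here, every
        // thread in the group steps through these instructions; the threads
        // that took the other path just throw the results away.
        float acc = 0.0f;
        for (int k = 0; k < 64; ++k)
            acc += sinf(v + (float)k);
        output[i] = acc;
    }
}
```

The smaller the group that has to agree, the less often one stray thread drags roughly 1K (NV4x) or 256 (G70) of its neighbors through both paths with it, which is exactly why the move to 32-pixel granularity matters.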

Early-Z, Memory Interface

NVIDIA has added hardware for Early-Z to G80, sitting after its existing Z-Cull hardware, which removes regions of pixels completely occluded by other geometry. Early-Z is a finer-grained occlusion culling method that looks at a fragment's calculated Z value before it enters the pixel pipeline. Z-Cull doesn't look at per-fragment Z values; it uses Z values based on geometry. While Z-Cull can get rid of large blocks of data, it has trouble with surfaces that are only partially occluded and with intersecting surfaces. Looking at individual depth values per pixel helps keep unnecessary fragments from heading down the pipeline only to be thrown out when the ROPs get to them.
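As a rough illustration of the idea only (G80's Early-Z is fixed-function hardware whose internals NVIDIA hasn't published), here is a conceptual sketch written as a CUDA-style kernel with invented names. The point is simply the ordering: check each fragment's depth against the Z-buffer first, and only run the expensive shading for fragments that would survive, instead of shading everything and letting the ROPs discard the occluded results at the end:

```
// Conceptual sketch only; real Early-Z happens in fixed-function hardware
// before the pixel shader, not in a kernel like this.
__global__ void shadeWithEarlyZ(const float *fragDepth, const float *zBuffer,
                                float *color, int numFragments)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numFragments)
        return;

    // Early depth test: this fragment is behind what's already in the
    // Z-buffer, so skip the shading work entirely.
    if (fragDepth[i] >= zBuffer[i])
        return;

    // Expensive per-fragment shading only runs for potentially visible fragments.
    float c = 0.0f;
    for (int k = 0; k < 32; ++k)
        c += cosf(fragDepth[i] * (float)k);
    color[i] = c;
}
```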

The memory interface has been dramatically redesigned to support the access patterns of all of G80's independent stream processors. Given the theme of increasing granularity within G80, it's no surprise that we are now seeing 5 and 6 channels of GDDR rather than the 2 or 4 channels we have been used to for the past few years. The 8800 GTX will have a 384-bit bus (6 x 64-bit channels), while the 8800 GTS will have a 320-bit wide connection to DRAM (5 x 64-bit channels). We would love to delve further into the details of G80's new memory interface, but NVIDIA isn't discussing the details of this aspect of its hardware.
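For what it's worth, peak memory bandwidth follows directly from the bus width and the effective data rate. The 900 MHz (GTX) and 800 MHz (GTS) GDDR3 clocks below are the commonly quoted launch figures rather than something NVIDIA detailed to us here, so treat this as a back-of-the-envelope sketch:

```
#include <cstdio>

// Peak bandwidth = (bus width in bytes) x (effective transfer rate).
// GDDR3 is double data rate, so the effective rate is twice the memory clock.
static double peakBandwidthGBs(int busWidthBits, double memClockMHz)
{
    double bytesPerTransfer = busWidthBits / 8.0;   // 384 bits -> 48 bytes
    double transfersPerSec  = memClockMHz * 2.0e6;  // DDR, MHz -> transfers/s
    return bytesPerTransfer * transfersPerSec / 1e9;
}

int main()
{
    printf("8800 GTX (384-bit @ 900 MHz): %.1f GB/s\n", peakBandwidthGBs(384, 900.0)); // ~86.4
    printf("8800 GTS (320-bit @ 800 MHz): %.1f GB/s\n", peakBandwidthGBs(320, 800.0)); // ~64.0
    return 0;
}
```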

Comments

  • dwalton - Thursday, November 9, 2006 - link

    When using older cards sacrificing IQ for performance is typically acceptable. Who needs AA when running F.E.A.R on a 9700 Pro.

    However, on a just-launched high-end card, why would anyone feel the need to sacrifice IQ for performance? Some may say resolution over AA, but I find it hard to believe that there are a lot of gaming enthusiasts with deep pockets who play at insane resolutions yet no AA.
  • JarredWalton - Thursday, November 9, 2006 - link

    If I look for jaggies, I see them. On most games, however, they don't bother me much at all. Running at native resolution on LCDs or at a really high resolution on CRTs, I'd take that over a lower res with 4xAA. If you have the power to enable 4xAA, great, but I'm certainly not one to suggest it's required. I'd rather be able to enable vsync without a massive performance hit (i.e. stay above 60 FPS) than worry about jaggies. Personal preference.
  • munim - Wednesday, November 8, 2006 - link

    "With the latest 1.09 patch, F.E.A.R. has gained multi-core support,"

    Where is this?
  • JarredWalton - Wednesday, November 8, 2006 - link

    I wrote that, but it may be incorrect. I'm trying to get in contact with Gary to find out if I'm just being delusional about Quad Core support. Maybe it's NDA still? Hmmm.... nothing to see here!
  • JarredWalton - Wednesday, November 8, 2006 - link

    Okay, it's the 1.08 patch, and that is what was tested. Since we didn't use a quad core CPU I don't know if it will actually help or not -- something to look at in the future.
  • Nelsieus - Wednesday, November 8, 2006 - link

    I haven't even finished reading it yet, but so far, this is the most comprehensive, in-depth review I've seen on G80 and I just wanted to mention that beforehand.

    :)
  • GhandiInstinct - Wednesday, November 8, 2006 - link

    What upcoming games will be the first to be fully made on DX10 structure? And does the G80 have full support of DX10?
  • timmiser - Thursday, November 9, 2006 - link

    Microsoft Flight Simulator X will be DX10 compliant via a planned patch once Vista comes out.
  • JarredWalton - Wednesday, November 8, 2006 - link

    All DX10 hardware will be full DX10 (see pages 2-4). As for games that will be DX10 ready, Halo 2 for Vista will be for sure. Beyond that... I don't know for sure. As we've explained a bit, DX10 will require Vista, so anything launching before Vista will likely not be DX10 compliant.
  • shabby - Wednesday, November 8, 2006 - link

    They're re-doing a dx8 game in dx10? You gotta be kidding me, what's the point? You can't polish a turd.
