Branching

In order to talk generally about SPs and their capabilities, all the vertices, primitives, pixel components, etc. to be processed are referred to as threads. This way we can look at each SP as handling its own thread no matter what type of data is being processed. G80 is able to sustain "thousands" of threads at a time, but the actual number of threads that can be active at any given time is not disclosed. While all SPs can handle any type of thread, SPs that share resources must be running the same type of thread at any given time. In this way, each block of 16 SPs can be running one type of shader program on 16 threads. This indicates something about branch granularity as well. For vertex shaders, branch granularity is 16 vertices. For pixel shaders, branch granularity is 32 pixels (arranged in pairs of blocks of 4x4 pixels).

Branch granularity defines how many threads must follow the same path through data. When a group of 32 pixel threads all take the same branch, we don't have a problem. If even one thread must take a path that is different from the others, all 32 threads must be evaluated with both paths following the branch. The branch then defines what result each individual thread will keep and which it will discard. It's easy to see that optimum granularity is 1 thread, as no unnecessary work would be done. The way resources are allocated and the way instructions are run on SPs grouped together currently doesn't allow any more fine-grained branching. Here's a chart that address branch granularity:

GPU Branch Granularity
NVIDIA NV4x ~1K pixels
NVIDIA G70 ~256 pixels
ATI R580 48 pixels
NVIDIA G80 16 vertex
32 pixels

Clearly G80 has the advantage here, as it's less likely that smaller groups of pixels will take different directions through a branch. This gives programmers the ability to more easily integrate branching into their code without getting a massive performance hit. If programmers are able to incorporate more branches, shader code can become more general purpose and we will see many more effects make their way into games. Now that G80 has caught up to ATI in terms of potential branch performance, we hope developers will take the reality of more complex code seriously.

Early-Z, Memory Interface

NVIDIA has added hardware for Early-Z to G80, after their current Z-Cull hardware which removes regions of pixels completely occluded by other geometry. Early-Z is a more fine-grained occlusion culling method that looks at a calculated Z value of a fragment before it hits the pixel pipeline. Z-Cull doesn't look at per fragment Z values, but uses a Z value based on geometry. While Z-Cull can get rid of large blocks of data it has issues handling surfaces that are only partially occluded or intersecting surfaces. Looking at individual depth values per pixel can help remove unnecessary fragments from heading down the pipeline only to be thrown out when the ROPs get to them.

The memory interface has been dramatically redesigned to support the access patterns of all of G80's independent stream processors. Given the theme of increasing granularity within G80 it's no surprise that we are now seeing 5 and 6 channels of GDDR rather than the 2 or 4 channels we have been used to for the past few years. 8800 GTX will have a 384 bit bus (6 x 64-bit channels), while the 8800 GTS will have a 320 bit wide connection to DRAM (5 x 64-bit channels). We would love to delve further into the details of G80's new memory interface, but NVIDIA isn't discussing the details of this aspect of their hardware.

Digging deeper into the shader core General Purpose Processing with G80
Comments Locked

111 Comments

View All Comments

  • JarredWalton - Wednesday, November 8, 2006 - link

    Page 17:

    "The dual SLI connectors are for future applications, such as daisy chaining three G80 based GPUs, much like ATI's latest CrossFire offerings."

    Using a third GPU for physics processing is another possibility, once NVIDIA begins accelerating physics on their GPUs (something that has apparently been in the works for a year or so now).
  • Missing Ghost - Wednesday, November 8, 2006 - link

    So it seems like by substracting the highest 8800gtx sli power usage result with the one for the 8800gtx single card we can conclude that the card can use as much as 205W. Does anybody knows if this number could increase when the card is used in DX10 mode?
  • JarredWalton - Wednesday, November 8, 2006 - link

    Without DX10 games and an OS, we can't test it yet. Sorry.
  • JarredWalton - Wednesday, November 8, 2006 - link

    Incidentally, I would expect the added power draw in SLI comes from more than just the GPU. The CPU, RAM, and other components are likely pushed to a higher demand with SLI/CF than when running a single card. Look at FEAR as an example, and here's the power differences for the various cards. (Oblivion doesn't have X1950 CF numbers, unfortunately.)

    X1950 XTX: 91.3W
    7900 GTX: 102.7W
    7950 GX2: 121.0W
    8800 GTX: 164.8W

    Notice how in this case, X1950 XTX appears to use less power than the other cards, but that's clearly not the case in single GPU configurations, as it requires more than everything besides the 8800 GTX. Here's the Prey results as well:

    X1950 XTX: 111.4W
    7900 GTX: 115.6W
    7950 GX2: 70.9W
    8800 GTX: 192.4W

    So there, GX2 looks like it is more power efficient, mostly because QSLI isn't doing any good. Anyway, simple subtraction relative to dual GPUs isn't enough to determine the actual power draw of any card. That's why we presented the power data without a lot of commentary - we need to do further research before we come to any final conclusions.
  • IntelUser2000 - Wednesday, November 8, 2006 - link

    It looks like putting SLI uses +170W more power. You can see how significant video card is in terms of power consumption. It blows the Pentium D away by couple of times.
  • JoKeRr - Wednesday, November 8, 2006 - link

    well, keep in mind the inefficiency of PSU, generally around 80%, so as overall power draw increases, the marginal loss of power increases a lot as well. If u actually multiply by 0.8, it gives about 136W. I suppose the power draw is from the wall.
  • DerekWilson - Thursday, November 9, 2006 - link

    max TDP of G80 is at most 185W -- NVIDIA revised this to something in the 170W range, but we know it won't get over 185 in any case.

    But games generally don't enable a card to draw max power ... 3dmark on the other hand ...
  • photoguy99 - Wednesday, November 8, 2006 - link

    Isn't 1920x1440 a resolution that almost no one uses in real life?

    Wouldn't 1920x1200 apply many more people?

    It seems almost all 23", 24", and many high end laptops have 1900x1200.

    Yes we could interpolate benchmarks, but why when no one uses 1440 vertical?

  • Frallan - Saturday, November 11, 2006 - link

    Well i have one more suggestion for a resolution. Full HD is 1920*1080 - that is sure to be found in a lot of homes in the future (after X-mas any1 ;0) ) on large LCDs - I believe it would be a good idea to throw that in there as well. Especially right now since loads of people will have to decide how to spend their money. The 37" Full HD is a given but on what system will I be gaming PS-3/X-Box/PC... Pls advice.
  • JarredWalton - Wednesday, November 8, 2006 - link

    This should be the last time we use that resolution. We're moving to LCD resolutions, but Derek still did a lot of testing (all the lower resolutions) on his trusty old CRT. LOL

Log in

Don't have an account? Sign up now