Branching

In order to talk generally about SPs and their capabilities, all the vertices, primitives, pixel components, etc. to be processed are referred to as threads. This way we can look at each SP as handling its own thread no matter what type of data is being processed. G80 is able to sustain "thousands" of threads at a time, but the actual number of threads that can be active at any given time is not disclosed. While all SPs can handle any type of thread, SPs that share resources must be running the same type of thread at any given time. In this way, each block of 16 SPs can be running one type of shader program on 16 threads. This indicates something about branch granularity as well. For vertex shaders, branch granularity is 16 vertices. For pixel shaders, branch granularity is 32 pixels (arranged in pairs of blocks of 4x4 pixels).

Branch granularity defines how many threads must follow the same path through the code. When a group of 32 pixel threads all take the same branch, we don't have a problem. If even one thread must take a path that is different from the others, all 32 threads must evaluate both paths of the branch. The branch condition then determines which result each individual thread keeps and which it discards. It's easy to see that the optimum granularity is 1 thread, as no unnecessary work would be done. The way resources are allocated and the way instructions are run on SPs grouped together currently doesn't allow any more fine-grained branching. Here's a chart that addresses branch granularity:

GPU            Branch Granularity
NVIDIA NV4x    ~1K pixels
NVIDIA G70     ~256 pixels
ATI R580       48 pixels
NVIDIA G80     16 vertices
               32 pixels

Clearly G80 has the advantage here, as it's less likely that smaller groups of pixels will take different directions through a branch. This gives programmers the ability to more easily integrate branching into their code without getting a massive performance hit. If programmers are able to incorporate more branches, shader code can become more general purpose and we will see many more effects make their way into games. Now that G80 has caught up to ATI in terms of potential branch performance, we hope developers will take the reality of more complex code seriously.
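
To make the cost of coarse granularity concrete, here is a minimal Python sketch of the rule described above: a group whose threads all agree runs one path per thread, while a group with even one dissenting thread runs both paths for every thread in it. This is a toy model, not a description of how any of these GPUs actually schedule work. The function names, the 1D pixel layout (real hardware groups pixels in 2D blocks), the equal cost of both paths, the sample workload, and the use of 1024 and 256 for the approximate NV4x and G70 figures are all our assumptions.

```python
def path_evaluations(takes_branch, granularity):
    """Count path evaluations when threads run in lockstep groups.

    takes_branch: list of booleans, one per thread (True = takes the branch).
    granularity:  number of threads that must follow the same path.
    A uniform group costs one evaluation per thread; a divergent group
    costs two (both paths) per thread.
    """
    total = 0
    for start in range(0, len(takes_branch), granularity):
        group = takes_branch[start:start + granularity]
        diverged = any(group) and not all(group)
        total += len(group) * (2 if diverged else 1)
    return total


def overhead(takes_branch, granularity):
    """Extra work relative to the ideal granularity of 1 thread."""
    ideal = len(takes_branch)
    return path_evaluations(takes_branch, granularity) / ideal - 1.0


if __name__ == "__main__":
    # Hypothetical workload: 3072 pixels, one contiguous run of 100
    # pixels takes the branch and the rest do not.
    pixels = [False] * 3072
    for i in range(1000, 1100):
        pixels[i] = True

    # Granularities taken from the chart (approximate where marked ~).
    for name, size in [("NV4x", 1024), ("G70", 256), ("R580", 48), ("G80", 32)]:
        print(f"{name:5s} granularity {size:4d}: "
              f"{overhead(pixels, size):.1%} extra work")
```

In this model only the groups that straddle the edge of the branching region pay the double cost, so the smaller the group, the fewer threads get dragged through both paths.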

Early-Z, Memory Interface

NVIDIA has added hardware for Early-Z to G80, after its existing Z-Cull hardware, which removes regions of pixels completely occluded by other geometry. Early-Z is a more fine-grained occlusion culling method that looks at the calculated Z value of a fragment before it hits the pixel pipeline. Z-Cull doesn't look at per-fragment Z values, but uses a Z value based on geometry. While Z-Cull can get rid of large blocks of data, it has issues handling surfaces that are only partially occluded or that intersect. Looking at individual depth values per pixel can help keep unnecessary fragments from heading down the pipeline only to be thrown out when the ROPs get to them.
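
The saving from testing depth before shading can be sketched in a few lines of Python. This is only an illustration of the general idea, not NVIDIA's implementation: the function names, the dictionary depth buffer, the "smaller z is closer" convention, and the assumption that the shader never modifies depth are all ours.

```python
def shade_early_z(fragments, shader):
    """Depth-test each fragment first; shade only the ones that pass."""
    depth, colors, invocations = {}, {}, 0
    for x, y, z in fragments:
        if z >= depth.get((x, y), float("inf")):
            continue  # occluded: rejected before any shading work is done
        depth[(x, y)] = z
        colors[(x, y)] = shader(x, y, z)
        invocations += 1
    return colors, invocations


def shade_late_z(fragments, shader):
    """Shade every fragment, then depth-test at the end of the pipeline."""
    depth, colors, invocations = {}, {}, 0
    for x, y, z in fragments:
        color = shader(x, y, z)  # work is done even if the result is discarded
        invocations += 1
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            colors[(x, y)] = color
    return colors, invocations


if __name__ == "__main__":
    # Hypothetical fragments as (x, y, z); the second one is hidden
    # behind the first at the same pixel.
    frags = [(0, 0, 0.5), (0, 0, 0.9), (1, 0, 0.3), (0, 0, 0.2)]
    flat = lambda x, y, z: z  # stand-in for an expensive pixel shader
    print(shade_early_z(frags, flat))
    print(shade_late_z(frags, flat))
```

Both functions produce the same final image; the difference is how many times the shader runs, which is where the per-fragment test pays off when the shader is expensive.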

The memory interface has been dramatically redesigned to support the access patterns of all of G80's independent stream processors. Given the theme of increasing granularity within G80, it's no surprise that we are now seeing 5 and 6 channels of GDDR rather than the 2 or 4 channels we have been used to for the past few years. The 8800 GTX will have a 384-bit bus (6 x 64-bit channels), while the 8800 GTS will have a 320-bit wide connection to DRAM (5 x 64-bit channels). We would love to delve further into the details of G80's new memory interface, but NVIDIA isn't discussing the details of this aspect of their hardware.
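
The bus widths follow directly from the channel counts, and the same arithmetic gives a theoretical peak bandwidth once a memory data rate is known. In the short Python sketch below, the channel counts and the 64-bit channel width come from the text above; the function names and the 1.8 billion transfers per second used in the example are illustrative assumptions, not figures from this page.

```python
CHANNEL_WIDTH_BITS = 64  # each memory channel is 64 bits wide


def bus_width_bits(channels):
    """Total memory bus width for a given number of 64-bit channels."""
    return channels * CHANNEL_WIDTH_BITS


def peak_bandwidth_gb_s(channels, transfers_per_second):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_second is the effective data rate per pin and is an
    input to this sketch, not a value taken from the article.
    """
    bytes_per_transfer = bus_width_bits(channels) / 8
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    print("8800 GTX bus width:", bus_width_bits(6), "bits")
    print("8800 GTS bus width:", bus_width_bits(5), "bits")
    # Assumed data rate, for illustration only.
    print("6 channels at 1.8 GT/s:", peak_bandwidth_gb_s(6, 1.8e9), "GB/s")
```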

111 Comments

  • DerekWilson - Thursday, November 09, 2006 - link

    i'm sure there was a lot buried in there ... sorry if it wasn't easy to find.

    8800 gtx and gts are both no louder than 7900 gtx. 1950 xtx still takes the cake for loudest graphics card around by a long shot -- especially after it heats up in a game.
  • crystal clear - Thursday, November 09, 2006 - link

    My comments in Daily Tech on this subject-

    More "G80" Derivatives in February
    RE: More info would be nice
    By crystal clear on 11/8/2006 8:03:43 AM, Rating: 2

    If you link VISTA - SANTA ROSA platform - Core2DUO (Merom) CPU line up (T7300, 7500, 7700 models), then you need a matching graphics card to complete the link.

    So a G80 for laptops/notebooks?

    The pairing of Intel's Santa Rosa platform with Vista in 2Q 07 is the next big thing for the first tier notebook manufacturers & all they need is a matching G80 for this setup.

    Unquote-
    Nvidia currently caters to desktop requirements/needs with the new G80 releases, wonder how the notebook/server versions will be - with Vista of course.



  • yyrkoon - Thursday, November 09, 2006 - link

    Virtual memory is probably a good thing for most cases, but in the graphics arena, this *could* potentially make for sloppy/bad coding practises. Knowing a lot of game devs (some of whom actually work for well known companies), I've heard them from time to time complain about maxing a 16x PCI-E pipe. What I'm trying to say here is that while it would be a good thing to never run out of texture memory, system memory, and definitely the swap disk, cannot hold a candle to the memory bandwidth that most video cards are capable of. End result is that you definitely *will* get a performance hit. All this, and we already know the memory bandwidth capabilities of modern PCs; suffice it to say, the most we'll see from current systems is what? 12-13K GB/s? Even a 7800GS can do roughly 35 GB/s on card. A 7600GT? 22GB/s?

    Still, I think DirectX10 is a very good thing, and as I didn't read the whole article, perhaps I missed a little? Reason being, I've been reading about DirectX10 since April, and a friend of mine was privy to some of this information after an interview with ATI.

    http://www.gamedev.net/reference/programming/featu...
  • saratoga - Thursday, November 09, 2006 - link

    I don't know how they threading really works, but it's quite possible VM support is required in order to allow multiple threads to run without stepping all over each other.
  • saratoga - Thursday, November 09, 2006 - link

    Sorry, should read "I don't know how THEIR threading works"
  • falc0ne - Thursday, November 09, 2006 - link

    I don't know what is the problem but I'm really unable to see the images within the latest articles from Anand...Can anyone give me a suggestion? What might be the cause of that?
    The thing is I'm really, really interested in these articles and I need to see those images. Thanks
  • yyrkoon - Thursday, November 09, 2006 - link

    Oh, er, then in the options tab of Firefox, (tools->options->content) check the "load images" check box ;)
  • falc0ne - Thursday, November 09, 2006 - link

    well...it would've been simple but I'm afraid it's not that...It might be the AdBlock extension for Firefox, other than that I have nooo ideeea...Well I will use the IE tab option instead and load the pages using IE 7. Thanks anyway:)
  • yyrkoon - Thursday, November 09, 2006 - link

    Checked the exceptions list ? I know that firefox makes it really simple to block images from a site (to a point of being too easy).
  • JarredWalton - Thursday, November 09, 2006 - link

    If you've got AdBlock on Firefox, press Ctrl+Shift+A and you can see what it's blocking. If it blocks the images.anandtech.com stuff, you can then see which RegEx isn't working right and edit that.
