Digging deeper into the shader core

Many of the same patterns that lead designers of current hardware to their conclusions are still true today. For instance, pixels next to each other on the screen still tend to follow a very similar path through the hardware. This means that it still makes sense to process pixels in quads. As for changes, as hardware becomes more programmable, we are seeing a higher percentage of scalar data being used. In spite of the fact that much of the work done by graphics hardware is vector based, it becomes easier to schedule code if we are working with a bunch of parallel, independent, scalar processors. It is also more efficient to build separate units for texture addressing and filtering, and ATI has done this for quite some time now.

NVIDIA has finally decoupled the texture units from their shader hardware, enabling math and texturing to happen at the same time with no scheduling issues. They have also decided to implement their math hardware as a collection of scalar processors that can be used together to perform vector operations. NVIDIA calls the scalar processors Stream Processors (SPs), and they handle all the math performed in the shader core of G80.

It isn't surprising to see that NVIDIA's implementation of a unified shader is based on taking a pixel shader quad pipeline, and breaking up the vector units into 4 scalar units. Now, rather than 4 pixel quads, we see 16 SPs per "quad" or block of stream processors. Each block of 16 SPs shares 4 texture address units, 8 texture filter units, and an L1 cache.

G70 Pixel Shader Quad


G80 Stream Processor Block


The fact that these SPs are now independent and scalar gives NVIDIA the ability to keep more of them busy more of the time. This is very important as programmers start to write longer more complex shaders. Even while working with vectors, programmers need to use scalar values all the time to manipulate and evaluate data.

Each Stream Processor is able to complete one MAD and one MUL per clock cycle. While this is based on maximum throughput, we can reasonably expect to achieve this even though the hardware is pipelined. In spite of the 4 or 5 cycles (depending on precision) latency of a MUL in Conroe, SSE is now capable of one MUL per cycle throughput (as long as there are no stalls in the pipeline). Latency of operations in G80 could be even longer and sustain high throughput, as most of the time we are working with code that isn't riddled with dependencies.

The fact that each SP is capable of IEEE 754 single precision and can sustain high throughput for MAD and MUL operations while running any type of shader code makes this hardware very powerful and more general purpose than ever.

As a thread exits the SP, G80 is capable of writing the output of the shader to memory. The fact that SPs can do this at any time (except after pixel shaders) goes beyond the DX10 spec of just allowing for stream output after the Geometry Shader. On previous hardware, data would have to go through every stage of the pipeline until a value was finally written out to the frame buffer. Now, we can write data out at the end of anything but a pixel shader (as pixel shaders must send their output straight over to the ROPs for processing). This will be a great benefit to GPGPU (general purpose computing on graphics processing units).

G80: A Mile High Overview Branching, Early Z and Memory Interface
Comments Locked

111 Comments

View All Comments

  • Sunrise089 - Thursday, November 9, 2006 - link

    Then I suppose he's in the market to part with an ugly old high-end CRT. I'd love to buy it from him. Seriously.
  • JarredWalton - Thursday, November 9, 2006 - link

    You want an older 21" Cornerstone CRT? It's a beast, but you can have it for the cost of shipping (which unfortunately would probably be ~$50). I'd also sell my Samsung 997DF 19" CRT for about $50, and maybe an NEC FE991-SB for $50 (which unfortunately has a scratch from my daughter in the anti-glare coating). If anyone lives in the Olympia, WA area, you know how to contact me (I hope). I'd rather someone come by to pick up any of these CRTs rather than shipping, as I don't think I have the original boxes.
  • DerekWilson - Thursday, November 9, 2006 - link

    lol next thing you know links to ebay auctions are gonna start showing up in our articles :-)
  • yyrkoon - Thursday, November 9, 2006 - link

    lol, I've got a 21" techtronics I'll sell for $200 usd, plus shipping ;) Hasnt been used since I purchased my Viewsonic VA1912wb (well, been used very little ).
  • imaheadcase - Wednesday, November 8, 2006 - link

    can't stand AA benchmarks myself :)

    Question: Do you have any info on what kinda card nvidia releasing this feb? Is it something in between these 2 cards or something even lower?

    Im looking for a $300ish g80! :D
  • flexy - Wednesday, November 8, 2006 - link

    if ANYTHING counts then how those high-end cards perform WITH their various AA settings.... the power of those cards (and the money spent on :) RIGHT translated into ---> IMAGE QUALITY/PERFORMANCE.

    Please dont tell you you would get an G80 but do NOT care about AA, this does NOT make any sense...sorry...

    I am especially impressed reading that transparency AA has such a LITTLE performance impact. What game engine did you test this on ?

    On the older ATI cards (and am i right that T.A. is the same as "adaptive antialiasing" ? )...this feature (depending on game engine) is the FPS killer....eg. w/ games like oblivion (WHERE ARE THE GOTHIC 3 BENCHEIS BTW ? :)...much vegetation etc. game-engines.

    Enable transparency AA and see all those trees, grass etc. without jaggies.

  • imaheadcase - Thursday, November 9, 2006 - link

    Well lots of people don't are for AA. Even if i had this card I would not use it. I visually see NO difference with it on or off. Its personal test. I don't even see "jaggies" on my older 9700 PRO card.
  • flexy - Thursday, November 9, 2006 - link

    you sure are talking about ANTIALIASING ???

    What resoltions do you run ? Not that my CRT can even handle more than 1600x1200..but even w/ 1600 i get VERY prominent jaggies if i dont run AA.

    I made it a habit to run at least 4xAA in ANY game, and some engines (hl2:source engine) etc. run extremely well with 4xAA, even 6xAA is very playable at elast with HL2.

    The very recent games, namely NWN2 and G3 now dont support AA, playing at 1280x1024 and it looks utterly horrible ! If you say you dont see jags in say ANY resolution under 1600..very hard to believe
  • imaheadcase - Thursday, November 9, 2006 - link

    Yes im talking about antialiasing. I normally play BF2 and oblivion at 1024x768 (9700 pro remember).

    Fact is most people won't see them unless someone points them out. The brain is still better at rendering stuff the way you want to see it vs hardware :)
  • flexy - Thursday, November 9, 2006 - link

    ok..but then it's also a performance problem. If it doesn't bother you, well ok.
    I also have to settle w/ the fact that many RECENT games are even unable to do AA..however i wish they would.

    But once i get a 8800 i will do &&&& to get the most out of IQ, AA, AF, transparency/texture AF, you name it. ALONE also for the reason that i would need a super-high end monitor first to even run resolutions like 2000xsomething...and as long as i have a lame 19" CRT and CANNOT even go over 1600 (99,99% of games even running everything on 1280x or 1360x) i will use all the power to get out best possible IQ in those low resolutions.

    Also..looking at the benchmarks..its NOT that you lose any real time gaming-experiencee since THOSE monster cards are made for exactly this...eg. running oblivion with all those settings at MAX AND AA on and HDR...and you are still in VERY reasobale FPS ranges.

Log in

Don't have an account? Sign up now