Digging deeper into the shader core

Many of the same patterns that lead designers of current hardware to their conclusions are still true today. For instance, pixels next to each other on the screen still tend to follow a very similar path through the hardware. This means that it still makes sense to process pixels in quads. As for changes, as hardware becomes more programmable, we are seeing a higher percentage of scalar data being used. In spite of the fact that much of the work done by graphics hardware is vector based, it becomes easier to schedule code if we are working with a bunch of parallel, independent, scalar processors. It is also more efficient to build separate units for texture addressing and filtering, and ATI has done this for quite some time now.

NVIDIA has finally decoupled the texture units from their shader hardware, enabling math and texturing to happen at the same time with no scheduling issues. They have also decided to implement their math hardware as a collection of scalar processors that can be used together to perform vector operations. NVIDIA calls the scalar processors Stream Processors (SPs), and they handle all the math performed in the shader core of G80.

It isn't surprising to see that NVIDIA's implementation of a unified shader is based on taking a pixel shader quad pipeline, and breaking up the vector units into 4 scalar units. Now, rather than 4 pixel quads, we see 16 SPs per "quad" or block of stream processors. Each block of 16 SPs shares 4 texture address units, 8 texture filter units, and an L1 cache.

G70 Pixel Shader Quad


G80 Stream Processor Block


The fact that these SPs are now independent and scalar gives NVIDIA the ability to keep more of them busy more of the time. This is very important as programmers start to write longer more complex shaders. Even while working with vectors, programmers need to use scalar values all the time to manipulate and evaluate data.

Each Stream Processor is able to complete one MAD and one MUL per clock cycle. While this is based on maximum throughput, we can reasonably expect to achieve this even though the hardware is pipelined. In spite of the 4 or 5 cycles (depending on precision) latency of a MUL in Conroe, SSE is now capable of one MUL per cycle throughput (as long as there are no stalls in the pipeline). Latency of operations in G80 could be even longer and sustain high throughput, as most of the time we are working with code that isn't riddled with dependencies.

The fact that each SP is capable of IEEE 754 single precision and can sustain high throughput for MAD and MUL operations while running any type of shader code makes this hardware very powerful and more general purpose than ever.

As a thread exits the SP, G80 is capable of writing the output of the shader to memory. The fact that SPs can do this at any time (except after pixel shaders) goes beyond the DX10 spec of just allowing for stream output after the Geometry Shader. On previous hardware, data would have to go through every stage of the pipeline until a value was finally written out to the frame buffer. Now, we can write data out at the end of anything but a pixel shader (as pixel shaders must send their output straight over to the ROPs for processing). This will be a great benefit to GPGPU (general purpose computing on graphics processing units).

G80: A Mile High Overview Branching, Early Z and Memory Interface
POST A COMMENT

111 Comments

View All Comments

  • yyrkoon - Thursday, November 09, 2006 - link

    If you're using Firefox, get, and install the extension "flashblock". Just did this myself today, tired of all the *animated* adds bothering me while reading articles.

    Sorry AT guys, but we've had this discussion before, and its realy annoying.
    Reply
  • JarredWalton - Thursday, November 09, 2006 - link

    Do you want to be able for us to continue as a site? Because ads support us. Anyway, his problem is related to not seeing images, so your comment about blocking ads via flashblock is completely off topic. Reply
  • yyrkoon - Thursday, November 09, 2006 - link

    Of course I want you guys to continue on as a site, just wish it were possible without annoying flashing adds in a section where I'm trying to concentrate on the article.

    As for the off topic part, yeah, my bad, I mis-read the full post (bad habit). Feel free to edit or remove that post of mine :)
    Reply
  • archcommus - Thursday, November 09, 2006 - link

    What browser are you using? Reply
  • falc0ne - Thursday, November 09, 2006 - link

    firefox 2.0 Reply
  • JarredWalton - Thursday, November 09, 2006 - link

    If Firefox, I know there's an option to block images not on the originating website. In this case, images come from image.anandtech.com while the article is on www.anandtech.com, so that my be the cause of your problems. IE7 and other browsers might have something similar, though I haven't ever looked. Other than that, perhaps some firewall or ad blocking software is to blame - it might be getting false positives? Reply
  • archcommus - Thursday, November 09, 2006 - link

    Wow to Anandtech - another amazing, incredibly in-depth article. It is so obvious this site is run by dedicated professionals who have degrees in these fields versus most other review sites where the authors just take pictures of the product and run some benches. Articles like this keep the AT reader base very very strong.

    Also wow to the G80, obviously an amazing card. My question, is 450W the PSU requirement for the GTX only or for both the GTX and GTS? I ask because I currently have a 400W PSU and am wondering if it will be sufficient for next-gen DX10 class hardware, and I know I would not be buying the highest model card. I also only have one HDD and one optical drive in my system.

    Yet another wow goes out to the R&D monetary investment - $475 million! It's amazing that that amount is even acceptable to nVidia, I can't believe the sales of such a high end, enthusiast-targeted card are great enough to warrant that.
    Reply
  • JarredWalton - Thursday, November 09, 2006 - link

    Sales of the lower end parts which will be based off G80 are what make it worthwhile, I would guess. As for PSU, I think that 450W is for the GTX, and more is probably a safe bet (550W would be in line with a high-end system these days, although 400W ought to suffice if it's a good quality 400W). You can see that the GTX tops out at just under 300W average system power draw with an X6800, so if you use an E6600 and don't overclock, a decent 400W ought to work. The GTX tops out around 260W average with the X6800, so theoretically even a decent 350W will work fine. Just remember to upgrade the PSU if you ever add other components. Reply
  • photoguy99 - Thursday, November 09, 2006 - link

    I just wanted to second that thought -

    AT articles have incredible quality and depth at this point - you guys are doing great work.

    It's actually getting embarrasing for some of your competing sites, I browsed the Tom's article and it had so much fluff and retread I had to stop.

    Please don't forget the effort is noticed and appreciated.
    Reply
  • shabby - Wednesday, November 08, 2006 - link

    It wasnt mentioned in the review, but whats the purpose of the 2nd sli connector? Reply

Log in

Don't have an account? Sign up now