Different Types of Stream Processors

The first thing we need to do when looking at the R600 shader core is to define our terms. AMD and NVIDIA build and refer to their Stream Processors (SPs) differently, and that makes counting them a little more difficult. Throughout our explanation, it will help to remember from our G80 coverage that a thread refers to a vertex, primitive or pixel, not a stream of instructions as it would on a CPU.

Stream Processors: The NVIDIA Way

G80 has 128 SPs (for the 8800 GTX; there are 96 SPs on the 8800 GTS models) that are capable of doing a very small number of things at the same time. They can do either standard FP operations (like a MADD), a special function operation (like sine), or an integer operation. There are some cases where they can squeeze out an extra MUL, but more often than not this MUL isn't accessible. Each of these SPs operates on an individual thread (be it a vertex, primitive or pixel).

This gives us a total of up to 128 threads being processed per clock. It is important to realize that the 128 SPs aren't entirely independent. That is, we can't run 128 different instructions in one clock, even though we can run instructions on 128 different threads. We'll delve a little deeper into this shortly, but depending on the type of shader running, the same instruction must run on a minimum number of threads at once.

For NVIDIA hardware, the minimum number of threads that must be processed using the same instruction is 16 (for vertex threads). NVIDIA's block diagrams show that each group of 16 SPs shares texture, register, and cache resources, so this makes sense. Pixel shaders, which are more important from a performance perspective, must run one instruction on 32 pixels at a time. What we can extrapolate from this is that NVIDIA can issue up to eight separate instructions across all of its 128 SPs (only four if working on pixels) per clock.

128 SPs / 16 Threads per Instruction per Clock = 8 Vertex Instructions per Clock

128 SPs / 32 Threads per Instruction per Clock = 4 Pixel Instructions per Clock
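The two divisions above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted in the text; the function and constant names are made up for illustration and do not correspond to anything in NVIDIA's hardware or drivers.

```python
# Figures quoted in the text for the 8800 GTX (G80).
TOTAL_SPS = 128
VERTEX_BATCH = 16  # vertex threads that must share one instruction
PIXEL_BATCH = 32   # pixel threads that must share one instruction


def max_distinct_instructions(total_sps: int, batch_size: int) -> int:
    """Upper bound on different instructions issued per clock.

    Hypothetical helper: it simply divides the SP count by the number
    of threads that are forced to execute the same instruction.
    """
    return total_sps // batch_size


vertex_instr = max_distinct_instructions(TOTAL_SPS, VERTEX_BATCH)
pixel_instr = max_distinct_instructions(TOTAL_SPS, PIXEL_BATCH)

print(f"Vertex instructions per clock: {vertex_instr}")
print(f"Pixel instructions per clock:  {pixel_instr}")
```

The larger the batch of threads tied to one instruction, the fewer distinct instructions the chip can issue per clock, which is why pixel work tops out at half the vertex figure.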

Stream Processors: AMD's R600

Things are a little different on R600. AMD tells us that there are 320 SPs, but these aren't directly comparable to G80's 128. First of all, most of the SPs are simpler and aren't capable of special function operations. For every block of five SPs, only one can handle special function operations in addition to regular floating point operations. The special function SP is also the only one able to handle integer multiply, while the other SPs can perform simpler integer operations.

This isn't a huge deal because straight floating point MAD and MUL performance is by far the limiting factor in shader performance today. The big difference is that AMD only executes one thread (vertex, primitive or pixel) across a group of five SPs.

What this means is that each of the five SPs in a block must run instructions from one thread. While AMD can run up to five scalar instructions from that thread in parallel, these instructions must be completely independent from one another. This can place a heavy burden on AMD's compiler to extract parallel operations from shader code. While AMD has gone to great lengths to make sure every block of five SPs is always busy, it's much harder to ensure that every SP within each block is always busy.
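To see why independence matters, here is a toy sketch of packing a thread's scalar operations into five-wide groups. It is not AMD's compiler and makes no claim about how the real scheduler works; the greedy strategy, the `pack_bundles` and `utilization` names, and the example operation names are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def pack_bundles(deps: Dict[str, Set[str]], width: int = 5) -> List[List[str]]:
    """Greedily group scalar ops into bundles of at most `width` ops.

    `deps` maps each op name to the set of ops whose results it needs.
    An op can only be placed once every input was produced by an
    earlier bundle, so ops in the same bundle are independent.
    """
    done: Set[str] = set()
    remaining = list(deps)  # preserves program order
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if deps[op] <= done][:width]
        if not ready:
            raise ValueError("cyclic or missing dependency")
        bundles.append(ready)
        done.update(ready)
        remaining = [op for op in remaining if op not in ready]
    return bundles


def utilization(bundles: List[List[str]], width: int = 5) -> float:
    """Fraction of SP slots that received an op."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Five ops with no dependencies fill all five SPs in one step.
independent = {"x": set(), "y": set(), "z": set(), "w": set(), "s": set()}
# Five ops that each need the previous result use one SP per step.
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d"}}

for name, ops in (("independent", independent), ("chain", chain)):
    bundles = pack_bundles(ops)
    print(name, len(bundles), "bundles,", f"{utilization(bundles):.0%} of slots used")
```

In this toy model the fully independent code keeps every SP in the block busy, while the dependent chain leaves four of the five idle on every step. Real shaders fall somewhere in between, which is the burden the text describes.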

If we take a step back, we can determine how many threads AMD is able to work on per clock. With 320 total SPs grouped into blocks of five SPs per thread, we get 64 threads per clock. And here's where it starts to get complicated. Before we go back and compare this to NVIDIA's architecture, let's go a little deeper into the implementation.
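As a quick check of the thread arithmetic above, the sketch below works out the per-clock thread counts from the SP figures quoted in the text. The variable names are illustrative only, and the numbers are raw per-clock counts that say nothing about real-world performance.

```python
# SP counts quoted in the text.
R600_SPS = 320
SPS_PER_THREAD = 5  # R600 runs one thread across a block of five SPs
G80_SPS = 128       # G80 runs one thread per SP

r600_threads = R600_SPS // SPS_PER_THREAD
g80_threads = G80_SPS

print(f"R600: {r600_threads} threads per clock, up to {SPS_PER_THREAD} scalar ops each")
print(f"G80:  {g80_threads} threads per clock, one op each (ignoring the occasional extra MUL)")
```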


86 Comments


  • GoatMonkey - Monday, May 14, 2007

    That's obviously BS. This IS their high end part, it just doesn't perform as well as nVidia's high end part, so it is priced accordingly.
  • poohbear - Monday, May 14, 2007

    sweet review though! thanks for including all the important and pertinent cards in your roundup (the 8800gts 320mb in particular). also love how neutral Anand is in their reviews, unlike some other sites. :p
  • Creig - Monday, May 14, 2007

    The R600 is finally here. I'm sure the overall performance is not what AMD was hoping for. Nobody ever shoots to have their newest product be the 2nd best. But pricing it at $399 and including a very nice game bundle will make the HD 2900 XT a VERY worthwhile purchase. I also have the feeling that there is a significant amount of performance increase to be realized through future driver releases ala X1800XT.
  • shady28 - Tuesday, May 15, 2007


    Nvidia has gone over the cliff on pricing.

    I know of no one personally who has an 88xx series card. I know one who recently picked up an 8600 of some kind, that's it. I have the best GPU of anyone I know.

    It's a real shame that there is so much focus on graphics cards that virtually no one buys. These are niche products folks - yet 'who is best' seems to be totally dependent on these niche products. That's patently ridiculous.

    It's like saying, since IBM makes the fastest computers in the world (they do), they're the best and you should be buying IBM (or now, lenovo) laptops and desktops.

    No one ever said that sort of thing because it's patently ridiculous. Why do people say it now for graphics cards? The fact that they do says a lot about the mentality of sites like AT.
  • DerekWilson - Tuesday, May 15, 2007

    We don't say what you are implying, and we are also very upset with some of NVIDIA's pricing (specifically the 8800 ultra)

    the 8800 gts 320mb is one of the best values for your money anywhere and isn't crazy expensive -- it's actually the card I'd recommend to anyone who cares about graphics in games and wants good quality and performance at 1600x1200.

    I would never tell anyone to buy an 8600 gts because nvidia has the fastest high end card. In fact, in this article, I hope I made it clear that AMD has the opportunity to capitalize on the huge performance gap nvidia left between the 8600 and 8800 series ... If AMD builds a part that performs in this range and is priced competitively, they'll have our recommendation in a flash.

    Recommending parts based on value at each price or performance segment is something we take pride in and will always do, no matter who has the absolute fastest hardware out there.

    The reason our focus was on AMD's fastest part is because they haven't given us any other hardware to test. We will absolutely be talking a lot and in much depth about midrange and budget hardware when AMD makes these parts available to us.
  • yacoub - Monday, May 14, 2007

    $400 is a lot of money. Not terribly long ago the highest end GPU available didn't cost more than $400. Now they hit $750 so you start to think $400 sounds cheap. It's really not. It's a heck of a lot of money for one piece of hardware. You can put together a 650i SLI rig with 2GB of DDR2 6400 and an E4400 for that much money. I know because I just did that. I kept my 7900GT from my old rig because I wanted to see how R600 did before purchasing an 8800GTS 640MB. Now that we've seen initial results I will wait to see how R600 does with more mature drivers and also wait to see the 640MB GTS price come down even more in the meantime.
  • vijay333 - Monday, May 14, 2007

    http://www.randomhouse.com/wotd/index.pperl?date=1...

    "the expression to call a spade a spade is thousands of years old and etymologically has nothing whatsoever to do with any racial sentiment."

  • yacoub - Monday, May 14, 2007

    Yes, a spade was a shovel long before muslims enslaved europeans to do hard labor in north africa and europeans enslaved africans to do hard labor in the 'new world'.
  • vijay333 - Monday, May 14, 2007

    whoops...replied to the wrong one.
  • rADo2 - Monday, May 14, 2007

    It is not 2nd best (after 8800ULTRA), not 3rd best (after 8800GTX), not 4th best (after 8800GTS-640), but 5th best (after 8800GTS-320), or even worse ;)

    Bad performance with AA turned on (everybody turns on AA), huge power consumption, late to the market.

    A definitive failure.
