Different Types of Stream Processors

The first thing we need to do when looking at the R600 shader core is to define our terms. AMD and NVIDIA build and refer to their Stream Processors (SPs) differently, and that makes counting them a little more difficult. Throughout our explanation, it will help to remember from our G80 coverage that a thread refers to a vertex, primitive or pixel, and not to a stream of instructions as it would on a CPU.

Stream Processors: The NVIDIA Way

G80 has 128 SPs (for the 8800 GTX; there are 96 SPs on the 8800 GTS models) that are capable of doing a very small number of things at the same time. They can do either standard FP operations (like a MADD), a special function operation (like sine), or an integer operation. There are some cases where they can squeeze out an extra MUL, but more often than not this MUL isn't accessible. Each of these SPs operates on an individual thread (be it a vertex, primitive or pixel).

This gives us a total of up to 128 threads being processed per clock. It is important to realize that each of the 128 SPs isn't entirely independent. That is, we can't run 128 different instructions in one clock, in spite of the fact that we can run a number of instructions on 128 different threads. We'll delve a little deeper into this shortly, but depending on the type of shader running, the same instruction must be running on multiple threads.

For NVIDIA hardware, the minimum number of threads that must be processed using the same instruction is 16 (for vertex threads). NVIDIA's block diagrams show that each group of 16 SPs shares texture, register, and cache resources, so this makes sense. Pixel shaders, which are more important from a performance perspective, must run one instruction on 32 pixels at a time. What we can extrapolate from this is that NVIDIA can issue up to eight separate instructions across all of its 128 SPs (only four if working on pixels) per clock.

128 SPs / 16 Threads per Instruction per Clock = 8 Vertex Instructions per Clock

128 SPs / 32 Threads per Instruction per Clock = 4 Pixel Instructions per Clock
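
The two divisions above can be written out as a short Python sketch. It is only the back-of-the-envelope arithmetic from the text, not a model of the hardware; the function and variable names are ours, chosen for illustration.

```python
# Figures quoted in the text for the 8800 GTX; the names are illustrative only.
G80_SPS = 128

# Minimum number of threads that must share one instruction, per thread type.
THREADS_PER_INSTRUCTION = {
    "vertex": 16,
    "pixel": 32,
}


def instructions_per_clock(total_sps, threads_per_instruction):
    """Upper bound on distinct instructions issued per clock.

    Assumes every SP is busy and each group of `threads_per_instruction`
    SPs runs exactly one instruction, as extrapolated in the text.
    """
    return total_sps // threads_per_instruction


for kind, width in THREADS_PER_INSTRUCTION.items():
    print(kind, instructions_per_clock(G80_SPS, width))
```

For the 8800 GTX this prints 8 for vertex threads and 4 for pixel threads, matching the figures above.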

Stream Processors: AMD's R600

Things are a little different on R600. AMD tells us that there are 320 SPs, but these aren't directly comparable to G80's 128. First of all, most of the SPs are simpler and aren't capable of special function operations. In every block of five SPs, only one can handle special function operations; that SP can execute either a special function operation or a regular floating point operation. The special function SP is also the only one able to handle integer multiply, while the other SPs can perform simpler integer operations.

This isn't a huge deal because straight floating point MAD and MUL performance is by far the limiting factor in shader performance today. The big difference comes from the fact that AMD only executes one thread (vertex, primitive or pixel) across a group of five SPs.

What this means is that each of the five SPs in a block must run instructions from one thread. While AMD can run up to five scalar instructions from that thread in parallel, these instructions must be completely independent from one another. This can place a heavy burden on AMD's compiler to extract parallel operations from shader code. While AMD has gone to great lengths to make sure every block of five SPs is always busy, it's much harder to ensure that every SP within each block is always busy.
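
To make the packing problem concrete, here is a toy Python sketch of filling five-wide blocks with independent scalar operations from a single thread. This is not AMD's compiler or anything AMD has described to us: it is a simple in-order greedy packer, and the operation format (a destination register plus a tuple of source registers) and the helper names are assumptions made for illustration.

```python
def pack_bundles(ops, width=5):
    """Greedily pack scalar ops, in program order, into bundles of <= width.

    `ops` is a list of (dest, sources) tuples. An op may join the current
    bundle only if it neither reads nor rewrites a register that an op
    already in that bundle writes. Toy model: no reordering is attempted.
    """
    bundles, current, written = [], [], set()
    for dest, sources in ops:
        depends = dest in written or any(s in written for s in sources)
        if depends or len(current) == width:
            # Close the current bundle and start a new one.
            bundles.append(current)
            current, written = [], set()
        current.append(dest)
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


def utilization(bundles, width=5):
    """Fraction of SP slots that hold an op across all bundles."""
    return sum(len(b) for b in bundles) / (width * len(bundles))


# Five mutually independent ops: one full bundle, every SP busy.
independent = [
    ("r0", ("a", "b")), ("r1", ("a", "c")), ("r2", ("b", "c")),
    ("r3", ("a", "d")), ("r4", ("c", "d")),
]

# A dependent chain: each op needs the previous result, one op per bundle.
chain = [("t0", ("a", "b")), ("t1", ("t0", "c")), ("t2", ("t1", "d"))]

print(utilization(pack_bundles(independent)))
print(utilization(pack_bundles(chain)))
```

In this toy model the independent sequence fills all five slots of a single bundle, while the dependent chain uses only one slot in five. A real compiler would reorder instructions to do better than this in-order sketch, but the underlying constraint is the same.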

If we take a step back, we can determine how many threads AMD is able to work on per clock. With 320 total SPs grouped into blocks of five, and one thread per block, we get 64 threads per clock. And here's where it starts to get complicated. Before we go back and compare this to NVIDIA's architecture, let's go a little deeper into the implementation.
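
The thread count works out as follows in a short Python sketch. The `busy_sps` helper is our own illustration of why the number of filled slots per block matters; it assumes every block has a thread to run and is not a measured figure.

```python
# Figures quoted in the text; the names are illustrative only.
R600_SPS = 320
SPS_PER_THREAD = 5  # one thread runs across a block of five SPs

threads_per_clock = R600_SPS // SPS_PER_THREAD


def busy_sps(filled_slots_per_block):
    """SPs doing useful work per clock if each block fills this many slots.

    Assumes all blocks are occupied and fill the same number of slots.
    """
    return threads_per_clock * filled_slots_per_block


print(threads_per_clock)
print(busy_sps(5), busy_sps(1))
```

This prints 64 threads per clock. In the best case all 320 SPs are busy, and in the worst case (one usable instruction per thread) only 64 of them are.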

86 Comments

  • johnsonx - Monday, May 14, 2007 - link

    and to which are you going to admit to?

    What was that old saying about glass houses and throwing stones? Shouldn't throw them in one? Definitely shouldn't throw them if you ARE one!
  • Puddleglum - Monday, May 14, 2007 - link

    quote:

    ATI's latest and greatest doesn't exactly deliver the best performance per watt, so while it doesn't compete performance-wise with the GeForce 8800 GTX it requires more power.
    You mean, while it does compete performance-wise?
  • johnsonx - Monday, May 14, 2007 - link

    No, I'm pretty sure they mean DOESN'T. That is, the card can't compete with a GTX, yet still uses more power.
  • INTC - Monday, May 14, 2007 - link

    quote:

    We certainly hope we won't see a repeat of the R600 launch when Barcelona and Agena take on Core 2 Duo/Quad in a few months....
  • Chadder007 - Monday, May 14, 2007 - link

    When will we have the 2600's out in review?? Thats the card im waiting for.
  • TA152H - Monday, May 14, 2007 - link

    Derek,

    I like the fact you weren't mincing your words, except for a little on the last page, but I'll give you a perspective of why it might be a little better than some people will think.

    There are some of us, and I am one, that will never buy NVIDIA. I bought one, had nothing but trouble with it, and have been buying ATI for 20 years. ATI has been around for so long, there is brand loyalty, and as long as they come out with something that is competent, we'll consider it against their other products without respect to NVIDIA. I'd rather give up the performance to work with something I'm a lot more comfortable with.

    The power though is damning, I agree with you 100% on this. Any idea if these beasts are being made by AMD now, or still whoever ATI contracted out? AMD is typically really poor in their first iteration of a product on a process technology, but tend to improve quite a bit in succeeding ones. I wonder how much they'll push this product initially. It might be they just get it out to have it out, and the next one will be what is really a worthwhile product. That only makes sense, of course, if AMD is now manufacturing this product. I hope they are, they surely don't need to make anymore of their processors that aren't selling well.

    One last thing I noticed is the 2400 Pro had no fan! It had a heatsink from Hell, but that will still make this a really attractive product for a growing market segment. Any chance of you guys doing a review on the best fanless cards?
  • DerekWilson - Wednesday, May 16, 2007 - link

    TSMC is manufacturing the R600 GPUs, not AMD.
  • AnnonymousCoward - Tuesday, May 15, 2007 - link

    "I bought one, had nothing but trouble with it, and have been buying ATI for 20 years."

    That made me laugh. If one bad experience was all it took to stop you from using a computer component, you'd be left with a PS/2 keyboard at best.

    "...to work with something I'm a lot more comfortable with."

    Are you more comfortable having 4:3 resolutions stretched on a widescreen? Maybe you're also more comfortable with having crappier performance than nvidia has offered for the last 6 months and counting? This kind of brand loyalty is silly.
  • MadBoris - Monday, May 14, 2007 - link

    As far as your brand loyalty, ATI doesn't exist anymore. Furthermore AMD executives will got the staff so you can't call it the same.
    Secondly, Nvidia has been a stellar company providing stellar products. Everyone has some ups and downs. Unfortunately with the hardware and drivers this is ATI's (er AMD's) downs.

    This card should do ok in comparison to the GTS, especially as drivers mature. Some reviews show it doing better than GTS640 in most tests, so I am not sure where or how discrepancies are coming about. Maybe hardware compatibility, maybe settings.
  • rADo2 - Monday, May 14, 2007 - link

    Many NVIDIA 8600GT/GTS cards do not have a fan, are available on the market now, and are (probably; different league) much more powerful than 2400 ;) But as you are a fanboy, you are not interested, right?
