Different Types of Stream Processors

The first thing we need to do when looking at the R600 shader core is to define our terms. AMD and NVIDIA build and refer to their Stream Processors (SPs) differently, and that makes counting them a little more difficult. Throughout our explanation, it will help to remember from our G80 coverage that threads refer to a vertex, primitive or pixel and not a stream of instructions as it would on a CPU.

Stream Processors: The NVIDIA Way

G80 has 128 SPs (for the 8800 GTX; there are 96 SPs on the 8800 GTS models) that are capable of doing a very small number of things at the same time. They can do either standard FP operations (like a MADD), a special function operation (like sine), or an integer operation. There are some cases where they can squeeze out an extra MUL, but more often than not this MUL isn't accessible. Each of these SPs operates on an individual thread (be it a vertex, primitive or pixel).

This gives us a total of up to 128 threads being processed per clock. It is important to realize that each of the 128 SPs isn't entirely independent. That is, we can't run 128 different instructions in one clock, in spite of the fact that we can run a number of instructions on 128 different threads. We'll delve a little deeper into this shortly, but depending on the type of shader running, the same instruction must be running on multiple threads.

For NVIDIA hardware, the minimum number of threads that must be processed using the same instruction is 16 (for vertex threads). NVIDIA's block diagrams show that each group of 16 SPs shares texture, register, and cache resources, so this makes sense. Pixel shaders, which are more important from a performance perspective, must run one instruction on 32 pixels at a time. What we can extrapolate from this is that NVIDIA can issue up to eight separate instructions across all of its 128 SPs (only four if working on pixels) per clock.

128 SPs / 16 Threads per Instruction per Clock = 8 Vertex Instructions per Clock

128 SPs / 32 Threads per Instruction per Clock = 4 Pixel Instructions per Clock

Stream Processors: AMD's R600

Things are a little different on R600. AMD tells us that there are 320 SPs, but these aren't directly comparable to G80's 128. First of all, most of the SPs are simpler and aren't capable of special function operations. For every block of five SPs, only one can handle either a special function operation or a regular floating point operation. The special function SP is also the only one able to handle integer multiply, while other SPs can perform simpler integer operations.

This isn't a huge deal because straight floating point MAD and MUL performance is by far the limiting factors in shader performance today. The big difference comes in the fact that AMD only executes one thread (vertex, primitive or pixel) across a group of five SPs.

What this means is that each of the five SPs in a block must run instructions from one thread. While AMD can run up to five scalar instructions from that thread in parallel, these instructions must be completely independent from one another. This can place a heavy burden on AMD's compiler to extract parallel operations from shader code. While AMD has gone to great lengths to make sure every block of five SPs is always busy, it's much harder to ensure that every SP within each block is always busy.

If we take a step back, we can determine how many threads AMD is able to work on per clock. With 320 total SPs, each grouped into blocks of five-to-a-thread, we get 64 threads per clock. And here's where it starts to get complicated. Before we go back and compare this to NVIDIA's architecture, let's go a little deeper into the implementation.

R600 Overview Stream Processor Implementation
Comments Locked


View All Comments

  • wjmbsd - Monday, July 2, 2007 - link

    What is the latest on the so-called Dragonhead 2 project (aka, HD 2900 XTX)? I heard it was just for OEMs at first...anyone know if the project is still going and how the part is benchmarking with newest drivers?
  • teainthesahara - Monday, May 21, 2007 - link

    After this failure of the R600 and likely overrated(and probably late) Barcelona/Agena processors I think that Intel will finally bury AMD. Paul Ottelini is rubbing his hands with glee at the moment and rightfully so. AMD now stands for mediocrity.Oh dear what a fall from grace.... To be honest Nvidia don't have any real competition on the DX10 front at any price points.I cannot see AMD processors besting Intel's Core 2 Quad lineup in the future especially when 45nm and 32 nm become the norm and they don't have a chance in hell of beating Nvidia. Intel and Nvidia are turning the screws on Hector Ruiz.Shame AMD brought down such a great company like ATI.
  • DerekWilson - Thursday, May 24, 2007 - link

    To be fair, we really don't have any clue how these cards compete on the DX10 front as there are no final, real DX10 games on the market to test.

    We will try really hard to get a good idea of what DX10 will look like on the HD 2000 series and the GeForce 8 Series using game demos, pre-release code, and SDK samples. It won't be a real reflection of what users will experience, but we will certainly hope to get a glimpse at performance.

    It is fair to say that NVIDIA bests AMD in current game performance. But really there are so many possibilities with DX10 that we can't call it yet.
  • spinportal - Friday, May 18, 2007 - link

    From the last posting of results for the GTS 320MB round-up
    http://www.anandtech.com/video/showdoc.aspx?i=2953...">Prey @ AnandTech - 8800GTS320
    we see that the 2900XT review chart pushes the nVidia cards down about 15% across the board.
    http://www.anandtech.com/video/showdoc.aspx?i=2988...">Prey @ AnandTech - ATI2900XT
    The only difference in systems is software drivers as the cpu / mobo / mem are the same.

    Does this mean ATI should be getting a BIGGER THRASHING BEAT-DOWN than the reviewer is stating?
    400$ ATI 2900XT performing as good as a 300$ nVidia 8800 GTS 320MB?

    Its 100$ short and 6 months late along with 100W of extra fuel.

    This is not your uncle's 9700 Pro...
  • DerekWilson - Sunday, May 20, 2007 - link

    We switched Prey demos -- I updated our benchmark.

    Both numbers are accurate for the tests I ran at the time.

    Our current timedemo is more stressful and thus we see lower scores with this test.
  • Yawgm0th - Wednesday, May 16, 2007 - link

    The prices listed in this article are way off.

    Currently, 8800GTS 640MB retails for $350-380, $400+ for OC or special versions. 2900XT retails for $430+. In the article, both are listed as $400, and as such the card is given a decent review in the conclusion.

    Realistically, this card provides slightly inferior performance to the 8800GTS 640MB at a considerably higher price point -- $80-$100 more than the 8800GTS. I mean, it's not like the 8800Ultra, but for the most part this card has little use outside of AMD and/or ATI fanboys. I'd love for this card to do better as AMD needs to be competing with Nvidia and Intel right now, but I just can't see how this is even worth looking at, given current prices.
  • DerekWilson - Thursday, May 17, 2007 - link

    really, this article focuses on architechture more than product, and we went with MSRP prices...

    we will absolutly look closer at price and price/performance when we review retail products.
  • quanta - Tuesday, May 15, 2007 - link

    As I recalled, the Radeon HD 2900 only has DVI ports, but nowhere in DVI documentation specifies it can carry audio signals. Unless the card comes with adapter that accepts audio input, it seems the audio portion of R600 is rendered useless.
  • DerekWilson - Wednesday, May 16, 2007 - link

    the card does come with an adapter of sorts, but the audio input is from the dvi port.

    you can't use a standard DVI to HDMI converter for this task.

    when using AMD's HDMI converter the data sent out over the DVI port does not follow the DVI specification.

    the bottom line is that the DVI port is just a physical connector carrying data. i could take a DVI port and solder it to a stereo and use it to carry 5.1 audio if I wanted to ... wouldn't be very useful, but I could do it :-)

    While connected to a DVI device, the card operates the port according to the DVI specification. When connected to an HDMI device through the special converter (which is not technically "dvi to hdmi" -- it's amd proprietry to hdmi), the card sends out data that follows the HDMI spec.

    you can look at it another way -- when the HDMI converter is connected, just think of the dvi port as an internal connector between an I/O port and the TMDS + audio device.
  • ShaunO - Tuesday, May 15, 2007 - link

    I was at an AMD movie night last night where they discussed the technical details of the HD 2900 XT and also showed the Ruby Whiteout DX10 Demo rendered using the card. It looked amazing and I had high hopes until I checked out the benchmark scores. They're going to need more than free food and popcorn to convince me to buy an obsolete card.

    However there is room for improvement of course. Driver updates, DX10 and whatnot. The main thing for me personally will be driver updates, I will be interested to see how well the card improves over time while I save my pennies for my next new machine.

    Everyone keeps saying "DX10 performance will be better, yadda yadda" but I also want to be able to play the games I have now and older games without having to rely on DX10 games to give me better performance. Nothing like totally underperforming in DX9 games and then only being equal or slightly better in DX10 games compared to the competition. I would rather have a decent performer all-round. Even saying that we don't even know for sure if DX10 games are even going to bring any performance increases of the competition, it's all speculation right now and that's all we can do, speculate.


Log in

Don't have an account? Sign up now