Different Types of Stream Processors

The first thing we need to do when looking at the R600 shader core is to define our terms. AMD and NVIDIA build and refer to their Stream Processors (SPs) differently, and that makes counting them a little more difficult. Throughout our explanation, it will help to remember from our G80 coverage that threads refer to a vertex, primitive or pixel and not a stream of instructions as it would on a CPU.

Stream Processors: The NVIDIA Way

G80 has 128 SPs (for the 8800 GTX; there are 96 SPs on the 8800 GTS models) that are capable of doing a very small number of things at the same time. They can do either standard FP operations (like a MADD), a special function operation (like sine), or an integer operation. There are some cases where they can squeeze out an extra MUL, but more often than not this MUL isn't accessible. Each of these SPs operates on an individual thread (be it a vertex, primitive or pixel).

This gives us a total of up to 128 threads being processed per clock. It is important to realize that each of the 128 SPs isn't entirely independent. That is, we can't run 128 different instructions in one clock, in spite of the fact that we can run a number of instructions on 128 different threads. We'll delve a little deeper into this shortly, but depending on the type of shader running, the same instruction must be running on multiple threads.

For NVIDIA hardware, the minimum number of threads that must be processed using the same instruction is 16 (for vertex threads). NVIDIA's block diagrams show that each group of 16 SPs shares texture, register, and cache resources, so this makes sense. Pixel shaders, which are more important from a performance perspective, must run one instruction on 32 pixels at a time. What we can extrapolate from this is that NVIDIA can issue up to eight separate instructions across all of its 128 SPs (only four if working on pixels) per clock.

128 SPs / 16 Threads per Instruction per Clock = 8 Vertex Instructions per Clock

128 SPs / 32 Threads per Instruction per Clock = 4 Pixel Instructions per Clock

Stream Processors: AMD's R600

Things are a little different on R600. AMD tells us that there are 320 SPs, but these aren't directly comparable to G80's 128. First of all, most of the SPs are simpler and aren't capable of special function operations. For every block of five SPs, only one can handle either a special function operation or a regular floating point operation. The special function SP is also the only one able to handle integer multiply, while other SPs can perform simpler integer operations.

This isn't a huge deal because straight floating point MAD and MUL performance is by far the limiting factors in shader performance today. The big difference comes in the fact that AMD only executes one thread (vertex, primitive or pixel) across a group of five SPs.

What this means is that each of the five SPs in a block must run instructions from one thread. While AMD can run up to five scalar instructions from that thread in parallel, these instructions must be completely independent from one another. This can place a heavy burden on AMD's compiler to extract parallel operations from shader code. While AMD has gone to great lengths to make sure every block of five SPs is always busy, it's much harder to ensure that every SP within each block is always busy.

If we take a step back, we can determine how many threads AMD is able to work on per clock. With 320 total SPs, each grouped into blocks of five-to-a-thread, we get 64 threads per clock. And here's where it starts to get complicated. Before we go back and compare this to NVIDIA's architecture, let's go a little deeper into the implementation.

R600 Overview Stream Processor Implementation
Comments Locked

86 Comments

View All Comments

  • mostlyprudent - Monday, May 14, 2007 - link

    Frankly, neither the NVIDIA nor the AMD part at this price point is all that impressive an upgrade from the prior generations. We keep hearing that we will have to wait for DX10 titles to know the real performance of these cards, but I suspect that by the time DX10 titles are on the shelves we will have at least product line refreshes by both companies. Does anyone else feel like the graphics card industry is jerking our chains?
  • johnsonx - Monday, May 14, 2007 - link

    It seems pretty obvious that AMD needs a Radeon HD2900Pro to fill in the gap between the 2900XT and 2600XT. Use R600 silicon, give it 256Mb RAM with a 256-bit memory bus. Lower the clocks 15% so that power consumption will be lower, and so that chips that don't bin at full XT speeds can be used. Price at $250-$300. It would own the upper-midrange segment over the 8600GTS, and eat into the 8800GTS 320's lunch as well.
  • GlassHouse69 - Monday, May 14, 2007 - link

    If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.

  • GlassHouse69 - Monday, May 14, 2007 - link

    wrong section. dt is doing that today it seems to a few people
  • xfiver - Monday, May 14, 2007 - link

    Hi, thank you for a really in depth review. While reading other 'earlier' reviews I remember a site using Catalyst 8.38 and reported performance improvements upto 14% from 8.37. Look forward to Anandtech's view on this.
  • xfiver - Monday, May 14, 2007 - link

    My apologies it was VR zone and 8.36 to 8.37 (not 8.38)
  • GlassHouse69 - Monday, May 14, 2007 - link

    If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.
  • Gary Key - Tuesday, May 15, 2007 - link

    quote:

    If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.


    I have worked extensively with four 8.37 releases and now the 8.38 release for the upcoming P35 release article. The 8.37.4.2 alpha driver had the top performance in SM3.0 heavy apps but was not very stable with numerous games, especially under Vista. The released 8.37.4.3 driver on AMD's website is the most stable driver to date and has decent performance but nothing near the alpha 8.37 or beta 8.38. The 8.38s offer great benchmark performance in the 3DMarks, several games, and a couple of DX10 benchmarks from AMD.

    However, the 8.38s more or less broke CrossFire, OpenGL, and video acceleration in Vista depending upon the app and IQ is not always perfect. While there is a great deal of promise in their performance and we see the potential, they are still Beta drivers that have a long ways to go in certain areas before their final release date of 5/23 (internal target).

    That said, would you rather see impressive results in 3DMarks or have someone tell you the truth about the development progress or lack of it with the drivers. As much as I would like to see this card's performance improve immediately, it is what it is at this time with the released drivers. AMD/ATI will improve the performance of the card with better drivers but until they are released our only choice is to go with what they sent. We said the same thing about NVIDIA's early driver issues with the G80 so there are not any fanboys or people taking money under the table around here. You can put all the lipstick on a pig you want, but in the end, you still have a pig. ;-)
  • Anand Lal Shimpi - Monday, May 14, 2007 - link

    There's nothing sinister going on, ATI gave us 8.37 to test with and told us to use it. We got 8.38 today and are currently testing it for a follow-up.

    Take care,
    Anand
  • GlassHouse69 - Monday, May 14, 2007 - link

    wow dood. you replied!

    Yes, I have been wondering about the ethics of your group here for about a year now. I felt this sorta slick leaning towards and masking thing goign on. Nice to see there is not.

    Thanks for the 1000's of articles and tests!
    -Mr. Glass

Log in

Don't have an account? Sign up now