Stream Processor Implementation

Going Deeper: Single Instruction, Multiple Data

SIMD (single instruction, multiple data) is the concept of running one instruction across many pieces of data. This is fundamental to the implementation of graphics hardware: multiple vertices, primitives, or pixels will need to have the same shader program run on them. Building hardware that performs one operation at a time across massive amounts of data makes processing each piece of data very efficient.

In SIMD hardware, multiple processing units are tied together. The hardware issues one instruction to the SIMD hardware and all the processing units perform that operation on unique data. All graphics hardware is built on this concept at some level. Implementing hardware this way avoids the complexity of requiring each SP to manage not only the data coming through it, but the instructions it will be running as well.
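As a rough illustration (plain Python, not real shader hardware), the SIMD model above can be sketched as one instruction broadcast to many lanes, each holding its own data:

```python
# Toy model of SIMD execution: one instruction is issued, and every
# processing unit ("lane") applies it to its own private data.

def simd_issue(instruction, lanes):
    """Apply a single operation across all lanes' private data."""
    return [instruction(data) for data in lanes]

# Example: scale four vertices' x-coordinates with one issued "instruction".
lanes = [1.0, 2.0, 3.0, 4.0]
result = simd_issue(lambda x: x * 2.0, lanes)
print(result)  # [2.0, 4.0, 6.0, 8.0]
```

The lanes never fetch or decode instructions themselves, which is the complexity savings the paragraph above describes.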

Going Deeper: Very Long Instruction Word

Normally when we think about instructions on a processor, we think about a single operation, like Add or Multiply. But imagine if you wanted to run multiple instructions at once on a parallel array of hardware. You might come up with a technique similar to VLIW (Very Long Instruction Word), which allows you to take simple operations and, if they are not dependent on each other, stick them together as one instruction.

Imagine we have five processing units that operate in parallel. Utilizing this hardware fully would require us to issue independent instructions on each of the five units, which is hard to determine while code is running. VLIW allows us to take the determination of instruction dependence out of the hardware and put it in the compiler. The compiler can then build a single instruction that consists of as much independent processing work as possible.
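A toy sketch of that compiler step (hypothetical instruction format and a greedy packer of our own invention; a real VLIW compiler is far more sophisticated):

```python
# Greedy VLIW packing sketch: group operations with no data dependencies
# into one "wide" instruction word of up to `width` slots.

def pack_vliw(ops, width=5):
    """Pack independent (dest, op, srcs) tuples into VLIW bundles."""
    bundles = []
    current, written = [], set()
    for dest, op, srcs in ops:
        # Start a new bundle if this one is full, or if the op reads a
        # register that an earlier op in the same bundle writes.
        if len(current) == width or any(s in written for s in srcs):
            bundles.append(current)
            current, written = [], set()
        current.append((dest, op, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

ops = [
    ("r0", "mul", ("a", "b")),    # independent
    ("r1", "add", ("c", "d")),    # independent: packs with the mul
    ("r2", "add", ("r0", "r1")),  # reads r0/r1: forced into a new bundle
]
print(len(pack_vliw(ops)))  # 2 bundles: [mul, add] then [dependent add]
```

The dependent add shows the headache mentioned below: it leaves three of the five slots in its bundle empty.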

VLIW is a good way of exploiting parallelism without adding hardware complexity, but it can create a huge headache for compiler designers when dealing with dependencies. Luckily, graphics hardware lends itself well to this type of processing, but as shaders get more complex and interesting we might see more dependent instructions in practice.

Bringing it Back to the Hardware: AMD's R600

AMD implements their R600 shader core using four SIMD arrays. These SIMD arrays are issued 5-wide (6 with a branch) VLIW instructions, and each VLIW instruction operates on 16 threads (vertices, primitives, or pixels) at a time. In addition to all this, AMD interleaves two different VLIW instructions from different shaders in order to maximize pipeline utilization on the SIMD units. Our understanding is that this ensures that all the data from one VLIW instruction is available to a following dependent VLIW instruction in the same shader.

Based on this hardware, we can do a little math and see that R600 is capable of issuing up to four different VLIW instructions (up to 20 distinct shader operations), working on a total of 64 different threads. Each thread can have up to five different operations working on it as defined by the VLIW instruction running on the SIMD unit that is processing that specific thread.
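The arithmetic behind those numbers, using the figures stated above (4 SIMD arrays, 16 threads per VLIW instruction, up to 5 operations per VLIW word):

```python
# Back-of-the-envelope math from the article's stated R600 figures.
simd_arrays = 4        # four SIMD arrays
threads_per_vliw = 16  # each VLIW instruction covers 16 threads
ops_per_vliw = 5       # up to five shader operations per VLIW word

threads_in_flight = simd_arrays * threads_per_vliw
peak_ops_per_clock = threads_in_flight * ops_per_vliw
print(threads_in_flight)   # 64 threads across the four SIMD arrays
print(peak_ops_per_clock)  # 320 shader operations per clock at peak
```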

For pixel processing, AMD assigns threads to SIMD units in 8x8 blocks (64 pixels) processed over multiple clocks. This is to enable a small branch granularity (each group of 64 pixels must follow the same code path), and it's large enough to exploit locality of reference in tightly packed pixels (in other words, pixels that are close together often need to load similar data/textures). There are apparently cases where branch granularity jumps to 128 pixels, but we don't have the data on when or why this happens yet.
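The branch-granularity cost can be sketched as follows (an illustrative model, not AMD's actual mechanism): when every pixel in a block agrees on a branch, only one path runs; when they diverge, the block effectively pays for both paths, with per-pixel masks selecting the results.

```python
# Toy model of branch granularity over a block of pixels.

def branched_block(pixels, cond, then_op, else_op):
    """Divergent blocks execute both paths; uniform blocks take one."""
    mask = [cond(p) for p in pixels]
    if all(mask):       # whole block takes the taken path: no penalty
        return [then_op(p) for p in pixels]
    if not any(mask):   # whole block takes the not-taken path
        return [else_op(p) for p in pixels]
    # Divergent block: compute both paths, select per pixel via the mask.
    return [then_op(p) if m else else_op(p) for p, m in zip(pixels, mask)]

out = branched_block([1, 2, 3, 4], lambda p: p > 2,
                     lambda p: p * 10, lambda p: -p)
print(out)  # [-1, -2, 30, 40]
```

With 64-pixel blocks, a single divergent pixel forces all 64 down both paths, which is why small granularity matters.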

If all of this reads as very complicated, don't worry: it is complex. While AMD has gone to great lengths to build hardware that can efficiently handle parallel data, dependencies pose a problem to realizing peak performance. The compiler might not be able to extract five operations for every VLIW instruction. In the worst case scenario, we could effectively see only one SP per block operating, with only four VLIW instructions being issued. This drops our potential operations per clock from a peak of 320 down to only 64.
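A hypothetical utilization model makes the range concrete: throughput scales directly with how many of the five VLIW slots the compiler manages to fill (the function name and defaults are ours, taken from the figures above).

```python
# Ops per clock as a function of VLIW slot utilization on R600's
# four SIMD arrays, each running one VLIW word across 16 threads.
def r600_ops_per_clock(filled_slots, simd_arrays=4, threads_per_vliw=16):
    """Shader operations per clock given filled_slots useful ops per word."""
    return simd_arrays * threads_per_vliw * filled_slots

print(r600_ops_per_clock(5))  # 320: peak, all five slots filled
print(r600_ops_per_clock(1))  # 64: worst case, one useful op per word
```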

On the bright side, we will probably not see a shader program that causes R600 to run at its worst case performance. Because vertices and colors are still four components each, we will likely see utilization closer to peak in many common cases.


86 Comments


  • Roy2001 - Tuesday, May 15, 2007 - link

    The reason is, you have to pay extra $ for a power supply. No, most probably your old PSU won't have enough milk for this baby. I will stick with nVidia in future. My 2 cents.
  • Chaser - Tuesday, May 15, 2007 - link

    quote:

    While AMD will tell us that R600 is not late and hasn't been delayed, this is simply because they never actually set a public date from which to be delayed. We all know that AMD would rather have seen their hardware hit the streets at or around the time Vista launched, or better yet, alongside G80.

    First, they refuse to call a spade a spade: this part was absolutely delayed, and it works better to admit this rather than making excuses.

    Such a revealing tech article. Thanks for other sources, Tom.
  • archcommus - Tuesday, May 15, 2007 - link

    $300 is the exact price point I shoot for when buying a video card, so that pretty much eliminates AMD right off the bat for me right now. I want to spend more than $200 but $400 is too much. I'm sure they'll fill this void eventually, and how that card will stack up against an 8800 GTS 320 MB is what I'm interested in.
  • H4n53n - Tuesday, May 15, 2007 - link

    Interestingly enough, on some other websites it wins against the 8800 GTX in most games, especially the newer ones, and comparing the price I would say it's a good deal. I think it's just driver problems; ATI has been known for not having very good drivers compared to NVIDIA, but once they fix them it'll win.
  • dragonsqrrl - Thursday, August 25, 2011 - link

    lol...fail. In retrospect it's really easy to pick out the EPIC ATI fanboys now.
  • Affectionate-Bed-980 - Tuesday, May 15, 2007 - link

    I skimmed this article because I have a final. ATI can't hold a candle to NV at the moment, it seems. Now while the 2900XT might have good value, am I correct in saying that ATI has lost the performance crown by a buttload (not even close, like X1800 vs 7800), and that they're totally slaughtered?

    Now I won't go and comment about how the 2900 stacks up against competition in the same price range, but it seems that GTSes can be acquired for cheap.

    Did ATI flop big here?
  • vailr - Monday, May 14, 2007 - link

    I'd rather use a mid-range older card that "only" uses ~100 Watts (or less) than pay ~$400 for a card that requires 300 Watts to run. Doesn't AMD care about "Global Warming"?
    Al Gore would be amazed, alarmed, and astounded !!
  • Deusfaux - Monday, May 14, 2007 - link

    No, they don't, and that's why the 2600 and 2400 don't exist.
  • ochentay4 - Monday, May 14, 2007 - link

    Let me start with this: I have always had an nvidia card. ALWAYS.

    Faster is NOT ALWAYS better. For the most part this is true; for me, it was. One year ago I bought an MSI 7600GT. It seemed the best bang for the buck. Since I bought it, I have had problems with TV-out detection, TV-out wrong aspect ratios, broken LCD scaling, lots of game problems, nonexistent support (the nv forum is a joke) and UNIFIED DRIVER ARCHITECTURE. What a terrible lie! The latest official driver is from 6 months ago!!!

    I'm really demanding, but I paid enough to demand a 100% working product. Now ATi's latest offering has: AVIVO, FULL VIDEO ACC, MONTHLY DRIVER UPDATES, ALL BUGS I NOTICED WITH THE NVIDIA CARD FIXED, HDMI AND PRICE. I prefer that to a simpler product, especially for the money they cost!

    I will never buy an nvidia card again. I'm definitely looking forward to ATi's offering (after the joke that is/was the 8600GT/GTS).

    Enough rant.
    Am I wrong?
  • Roy2001 - Tuesday, May 15, 2007 - link

    Yeah, you are wrong. Spend $400 on a 2900XT and then $150 on a PSU.
