Stream Processor Implementation

Going Deeper: Single Instruction, Multiple Data

SIMD (single instruction, multiple data) is the concept of running one instruction across lots of data. This is fundamental in the implementation of graphics hardware: multiple vertices, primitives, or pixels will need to have the same shader program run on them. Building hardware to do one operation at a time on massive amounts of data makes processing each piece of data very efficient.

In SIMD hardware, multiple processing units are tied together. The hardware issues one instruction to the SIMD hardware and all the processing units perform that operation on unique data. All graphics hardware is built on this concept at some level. Implementing hardware this way avoids the complexity of requiring each SP to manage not only the data coming through it, but the instructions it will be running as well.

Going Deeper: Very Long Instruction Word

Normally when we think about instructions on a processor, we think about a single operation, like Add or Multiply. But imagine if you wanted to run multiple instructions at once on a parallel array of hardware. You might come up with a technique similar to VLIW (Very Long Instruction Word), which allows you to take simple operations and, if they are not dependent on each other, stick them together as one instruction.

Imagine we have five processing units that operate in parallel. Utilizing this hardware would require us to issue independent instructions on each of the five units. This is hard to determine while code is running. VLIW allows us to take the determination of instruction dependence out of the hardware and put it in the complier. The compiler can then build a single instruction that consists of as much independent processing work as possible.

VLIW is a good way of exploiting parallelism without adding hardware complexity, but it can create a huge headache for compiler designers when dealing with dependencies. Luckily, graphics hardware lends itself well to this type of processing, but as shaders get more complex and interesting we might see more dependent instructions in practice.

Bringing it Back to the Hardware: AMD's R600

AMD implements their R600 shader core using four SIMD arrays. These SIMD arrays are issued 5-wide (6 with a branch) VLIW instructions. These VLIW instructions operate on 16 threads (vertices, primitives or pixels) at a time. In addition to all this, AMD interleaves two different VLIW instructions from different shaders in order to maximize pipeline utilization on the SIMD units. Our understanding is that this is in order to ensure that all the data from one VLIW instruction is available to a following dependent VLIW instruction in the same shader.

Based on this hardware, we can do a little math and see that R600 is capable of issuing up to four different VLIW instructions (up to 20 distinct shader operations), working on a total of 64 different threads. Each thread can have up to five different operations working on it as defined by the VLIW instruction running on the SIMD unit that is processing that specific thread.

For pixel processing, AMD assigns threads to SIMD units in 8x8 blocks (64 pixels) processed over multiple clocks. This is to enable a small branch granularity (each group of 64 pixels must follow the same code path), and it's large enough to exploit locality of reference in tightly packed pixels (in other words, pixels that are close together often need to load similar data/textures). There are apparently cases where branch granularity jumps to 128 pixels, but we don't have the data on when or why this happens yet.

If it seems like all this reads in a very complicated way, don't worry: it is complex. While AMD has gone to great lengths to build hardware that can efficiently handle parallel data, dependencies pose a problem to realizing peak performance. The compiler might not be able to extract five operations for every VLIW instruction. In the worst case scenario, we could effectively see only one SP per block operating with only four VLIW instructions being issued. This drops our potential operations per clock rate down from 320 at peak to only 64.

On the bright side, we will probably not see a shader program that causes R600 to run at its worst case performance. Because vertices and colors are still four components each, we will likely see utilization closer to peak in many common cases.

Different Types of Stream Processors Next Up: NVIDIA's G80
Comments Locked

86 Comments

View All Comments

  • TA152H - Monday, May 14, 2007 - link

    Fanboy? What a dork.

    I've had success with ATI, not with NVIDIA, and I know ATI stuff a lot better so it's just easier for me to work with. It's not an irrational like or dislike. I bought one NVIDIA and it was a nightmare. Plus, I'm not as sure they'll be around for very long as I am ATI/AMD, although they had a good quarter, and AMD surely had a dreadful one.

    Selling discrete video cards alone might get a lot more difficult with the integration of CPUs, and GPUs.
  • yyrkoon - Monday, May 14, 2007 - link

    You are a fanboy, face it. 'I tried a nVidia card once . . .' How long ago was that ? Who made the card ? Did you have it configured properly? Once?! Details like this are important, and seemily/conviently left out. Anyhow, anyone claiming that nVIdia cards are 'junk' has definate issues with assembling/configuring hardware. I say this because my current system uses a nVidia based card, and is 100% rock solid. 'Person between the chair and keyboard' rings a bell.

    Ask any Linux user why they refuse to use ATI cards in their system . . . You are also one of these people out there that claims ATI driver support is superior to nVIdias driver support I suppose ? If you have truely been using ATI products for 20 years, then you know ATI has one of the worst reputations on the planet for driver support(and while it may have improved, it is not as good as nVidias still).

    Yeah, anyhow, ATI, and nVidia both can have problems with their hardware, it is not based 100% on their architecture, but the OEM releasing the products have a lot of effect here also. There are bad OEMs to buy from here on both sides of the fence, knowing who to stay away from, is half the work when building a PC, and probably had a lot more to do with your alleged 'bad nVIdia card', assuming you actually configured the card properly.

    I also had a problem with an nVIdia card once, I bought a brand new GF3 card about 7 years ago, and a few of the older games I had, would not display properly with it. What did I do ? I waited about a month, for a new driver, and the problem was solved. I have also had issues with ATI cards, one of which drew too much power from the AGP slot, and would cause the given system to crash 1-2 times a day. This was a design issue/oversight on ATI's behalf(the card was made by Saphire, who also makes ATIs cards). What did I do ? I replaced the card with an nVIdia card, and the system has been stable since.

    So you see, I too can skew things to make anyone look bad also, and in the end, it would only serve to make me look like the dork. But if you want to pay more, for less, that is perfectly fine by me.
  • Pirks - Monday, May 14, 2007 - link

    I've got all problems and crappy drivers (especially Linux ones) only from ATI while nVidia software was always much better in my experience. power hungry noisy monsters made by whom? by ATI! as always :) same shit as with their x1800/x1900 miserable power guzzling series

    discrete video cards are not going away any time soon. ever heard of integrated video used in games, besides ones from 2000, like old Quake 2? no? then please continue your lovefest with ATI, but for me - it looks like I'll pass on them this time again - since Radeon 9800Pro they went downhill and continue in that direction. they MAY make a decent integrated CPU/GPU budget-oriented vendor in a future, for all those office folks playing simple 2D office games, but real stuff? nope, ATI is still out of the game for me. let's see if they manage to come back with reincarnation of R300 in future.

    ironically, AMD CPUs on the other hand have best price/performance ratio, so intel won't see me as their customer. I wish ATI 3D chips were as good as AMD CPUs in that regard (and overclockers please shut up, I'm not bothering to OC my rig because I don't enjoy benchmark numbers, I enjoy REAL stuff like games, and Intel is out of the game for me as well, at least until their budget single core Conroes are out)
  • utube545 - Tuesday, May 22, 2007 - link

    Get a clue, you fucking cretin.
  • dragonsqrrl - Thursday, August 25, 2011 - link

    haha... lol, wow. facepalm.
  • dragonsqrrl - Thursday, August 25, 2011 - link

    Damn you're a fail noob of an ATI fanboy. Time has not been kind to the HD2900XT, and now you sound more ridiculous then ever... lol.
  • yzkbug - Monday, May 14, 2007 - link

    Not a word about new AVIVO HD and digital sound features?
  • DerekWilson - Wednesday, May 16, 2007 - link

    we mentioned this ...

    on the r600 overview page ...
  • photoguy99 - Monday, May 14, 2007 - link

    First to be clear and I do not condone the title of this article, there's no need to bring racism into this.

    But my point is NVidia can and will react by making the performance per dollar competitive for the R600 vs 8800GTS.

    Once the prices are comparable, why buy a more power hungry part (the ATI)?

    This is one disadvantage they can't correct until the next respin.

  • DrMrLordX - Monday, May 14, 2007 - link

    Based on the benchmarks results, the only reason I can see for getting 2900XTs is if a). you don't care about power consumption and b). want to run a Crossfire rig at a lower cost of entry than dual-8800 GTXs or 8800 Ultras.

    As others have said, some more benchmarks in mature DX10 titles might show who the real winner here is performance-wise, and that holds true for multi-GPU scenarios as well.

Log in

Don't have an account? Sign up now