Next Up: NVIDIA's G80

NVIDIA has been more tight-lipped about their underlying architecture, but we will infer as much as possible from the block diagrams we've seen and conversations we've had.

The G80 shader core is a little different from the R600. It is built on eight SIMD units each containing 16 SPs. The SIMD instructions are not VLIW, but single scalar instructions, and each SP within a SIMD unit executes that instruction on a different thread. While groups of 16 SPs share resources, NVIDIA's compiler doesn't need to build VLIW instructions to schedule out any of these SPs and it would be quite difficult to create dependencies between SPs because they are running different threads.

The bottom line here is that up to eight distinct shader operations are running across 128 threads at one time. This means we could have 128 threads all complete a scalar operation every clock, or we could have 128 threads all complete a 4-wide vector operation one component at a time over four clocks.

On NVIDIA hardware, vertex threads are assigned to SIMD units in blocks of 16, while geometry and pixel threads are assigned in blocks of 32 (16 threads over two clocks). With smaller blocks, we see better branch performance but worse cache or prefetch utilization than we would with a more coarsely grained approach.

This implementation also means that we don't have to worry about dependencies in the shader code. Of course, it is also the case that we can't extract parallelism from the shader code itself. But the advantage gives us a steady rate of 128 operations per clock. This can actually go up in some special cases, but it shouldn't go lower under normal circumstances.

Comparing Shader Architectures: R600 vs. G80

The key to the architecture comparison is to realize that nothing is straight up apples to apples here. We need to look at how much work can be done per clock, how much work is likely to be done per clock, and how much work we can get done per unit time.

First, G80 can process more threads in parallel: 128 as opposed to R600's 64. Performing work on more threads at a time is one very good way of extracting overall parallelism from the problem of graphics. There are millions of pixels in every frame that need to be processed, and if we had hardware large enough we could process them all at once.

However, more work (up to 5x) is potentially getting done on each of those 64 threads than on NVIDIA's 128 threads. This is because R600 can execute up to five parallel operations per thread while NVIDIA hardware is only able to handle one operation at a time per SP (in most cases). But maximizing throughput on the AMD hardware will be much more difficult, and we won't always see peak performance from real code. On the best case level, R600 is able to do 2.5x the work of G80 per clock (320 operations on R600 and 128 on G80). Worst case for code dependency on both architectures gives the G80 a 2x advantage over R600 per clock (64 operations on R600 with 128 on G80).

The real difference is in where parallelism is extracted. Both architectures make use of the fact that threads are independent of each other by using multiple SIMD units. While NVIDIA focused on maximizing parallelism in this area of graphics, AMD decided to try to extract parallelism inside the instruction stream by using a VLIW approach. AMD's average case will be different depending on the code running, though so many operations are vector based, high utilization can generally be expected.

However, even if we expect high utilization on AMD hardware, the fact remains that G80 has a large clock speed advantage. With the shader core on G80 pushed up to 1.5 GHz, we could still see some cases where R600 is faster, but the majority of the time G80 should be able to best R600 on a pure compute basis.

This overview still isn't the bottom line in performance. Efficient latency hiding, good scheduling, high cache utilization, high availability of texture data, good branching, and fast and efficient Z/stencil and color processing all contribute as well.  Where possible, let's explore those areas a bit more.

Stream Processor Implementation Texturing, Caches and Memory
Comments Locked

86 Comments

View All Comments

  • TA152H - Monday, May 14, 2007 - link

    Fanboy? What a dork.

    I've had success with ATI, not with NVIDIA, and I know ATI stuff a lot better so it's just easier for me to work with. It's not an irrational like or dislike. I bought one NVIDIA and it was a nightmare. Plus, I'm not as sure they'll be around for very long as I am ATI/AMD, although they had a good quarter, and AMD surely had a dreadful one.

    Selling discrete video cards alone might get a lot more difficult with the integration of CPUs, and GPUs.
  • yyrkoon - Monday, May 14, 2007 - link

    You are a fanboy, face it. 'I tried a nVidia card once . . .' How long ago was that ? Who made the card ? Did you have it configured properly? Once?! Details like this are important, and seemily/conviently left out. Anyhow, anyone claiming that nVIdia cards are 'junk' has definate issues with assembling/configuring hardware. I say this because my current system uses a nVidia based card, and is 100% rock solid. 'Person between the chair and keyboard' rings a bell.

    Ask any Linux user why they refuse to use ATI cards in their system . . . You are also one of these people out there that claims ATI driver support is superior to nVIdias driver support I suppose ? If you have truely been using ATI products for 20 years, then you know ATI has one of the worst reputations on the planet for driver support(and while it may have improved, it is not as good as nVidias still).

    Yeah, anyhow, ATI, and nVidia both can have problems with their hardware, it is not based 100% on their architecture, but the OEM releasing the products have a lot of effect here also. There are bad OEMs to buy from here on both sides of the fence, knowing who to stay away from, is half the work when building a PC, and probably had a lot more to do with your alleged 'bad nVIdia card', assuming you actually configured the card properly.

    I also had a problem with an nVIdia card once, I bought a brand new GF3 card about 7 years ago, and a few of the older games I had, would not display properly with it. What did I do ? I waited about a month, for a new driver, and the problem was solved. I have also had issues with ATI cards, one of which drew too much power from the AGP slot, and would cause the given system to crash 1-2 times a day. This was a design issue/oversight on ATI's behalf(the card was made by Saphire, who also makes ATIs cards). What did I do ? I replaced the card with an nVIdia card, and the system has been stable since.

    So you see, I too can skew things to make anyone look bad also, and in the end, it would only serve to make me look like the dork. But if you want to pay more, for less, that is perfectly fine by me.
  • Pirks - Monday, May 14, 2007 - link

    I've got all problems and crappy drivers (especially Linux ones) only from ATI while nVidia software was always much better in my experience. power hungry noisy monsters made by whom? by ATI! as always :) same shit as with their x1800/x1900 miserable power guzzling series

    discrete video cards are not going away any time soon. ever heard of integrated video used in games, besides ones from 2000, like old Quake 2? no? then please continue your lovefest with ATI, but for me - it looks like I'll pass on them this time again - since Radeon 9800Pro they went downhill and continue in that direction. they MAY make a decent integrated CPU/GPU budget-oriented vendor in a future, for all those office folks playing simple 2D office games, but real stuff? nope, ATI is still out of the game for me. let's see if they manage to come back with reincarnation of R300 in future.

    ironically, AMD CPUs on the other hand have best price/performance ratio, so intel won't see me as their customer. I wish ATI 3D chips were as good as AMD CPUs in that regard (and overclockers please shut up, I'm not bothering to OC my rig because I don't enjoy benchmark numbers, I enjoy REAL stuff like games, and Intel is out of the game for me as well, at least until their budget single core Conroes are out)
  • utube545 - Tuesday, May 22, 2007 - link

    Get a clue, you fucking cretin.
  • dragonsqrrl - Thursday, August 25, 2011 - link

    haha... lol, wow. facepalm.
  • dragonsqrrl - Thursday, August 25, 2011 - link

    Damn you're a fail noob of an ATI fanboy. Time has not been kind to the HD2900XT, and now you sound more ridiculous then ever... lol.
  • yzkbug - Monday, May 14, 2007 - link

    Not a word about new AVIVO HD and digital sound features?
  • DerekWilson - Wednesday, May 16, 2007 - link

    we mentioned this ...

    on the r600 overview page ...
  • photoguy99 - Monday, May 14, 2007 - link

    First to be clear and I do not condone the title of this article, there's no need to bring racism into this.

    But my point is NVidia can and will react by making the performance per dollar competitive for the R600 vs 8800GTS.

    Once the prices are comparable, why buy a more power hungry part (the ATI)?

    This is one disadvantage they can't correct until the next respin.

  • DrMrLordX - Monday, May 14, 2007 - link

    Based on the benchmarks results, the only reason I can see for getting 2900XTs is if a). you don't care about power consumption and b). want to run a Crossfire rig at a lower cost of entry than dual-8800 GTXs or 8800 Ultras.

    As others have said, some more benchmarks in mature DX10 titles might show who the real winner here is performance-wise, and that holds true for multi-GPU scenarios as well.

Log in

Don't have an account? Sign up now