Next Up: NVIDIA's G80

NVIDIA has been more tight-lipped about their underlying architecture, but we will infer as much as possible from the block diagrams we've seen and conversations we've had.

The G80 shader core is a little different from the R600. It is built on eight SIMD units each containing 16 SPs. The SIMD instructions are not VLIW, but single scalar instructions, and each SP within a SIMD unit executes that instruction on a different thread. While groups of 16 SPs share resources, NVIDIA's compiler doesn't need to build VLIW instructions to schedule out any of these SPs and it would be quite difficult to create dependencies between SPs because they are running different threads.

The bottom line here is that up to eight distinct shader operations are running across 128 threads at one time. This means we could have 128 threads all complete a scalar operation every clock, or we could have 128 threads all complete a 4-wide vector operation one component at a time over four clocks.

On NVIDIA hardware, vertex threads are assigned to SIMD units in blocks of 16, while geometry and pixel threads are assigned in blocks of 32 (16 threads over two clocks). With smaller blocks, we see better branch performance but worse cache or prefetch utilization than we would with a more coarsely grained approach.

This implementation also means that we don't have to worry about dependencies in the shader code. Of course, it is also the case that we can't extract parallelism from the shader code itself. But the advantage gives us a steady rate of 128 operations per clock. This can actually go up in some special cases, but it shouldn't go lower under normal circumstances.

Comparing Shader Architectures: R600 vs. G80

The key to the architecture comparison is to realize that nothing is straight up apples to apples here. We need to look at how much work can be done per clock, how much work is likely to be done per clock, and how much work we can get done per unit time.

First, G80 can process more threads in parallel: 128 as opposed to R600's 64. Performing work on more threads at a time is one very good way of extracting overall parallelism from the problem of graphics. There are millions of pixels in every frame that need to be processed, and if we had hardware large enough we could process them all at once.

However, more work (up to 5x) is potentially getting done on each of those 64 threads than on NVIDIA's 128 threads. This is because R600 can execute up to five parallel operations per thread while NVIDIA hardware is only able to handle one operation at a time per SP (in most cases). But maximizing throughput on the AMD hardware will be much more difficult, and we won't always see peak performance from real code. On the best case level, R600 is able to do 2.5x the work of G80 per clock (320 operations on R600 and 128 on G80). Worst case for code dependency on both architectures gives the G80 a 2x advantage over R600 per clock (64 operations on R600 with 128 on G80).

The real difference is in where parallelism is extracted. Both architectures make use of the fact that threads are independent of each other by using multiple SIMD units. While NVIDIA focused on maximizing parallelism in this area of graphics, AMD decided to try to extract parallelism inside the instruction stream by using a VLIW approach. AMD's average case will be different depending on the code running, though so many operations are vector based, high utilization can generally be expected.

However, even if we expect high utilization on AMD hardware, the fact remains that G80 has a large clock speed advantage. With the shader core on G80 pushed up to 1.5 GHz, we could still see some cases where R600 is faster, but the majority of the time G80 should be able to best R600 on a pure compute basis.

This overview still isn't the bottom line in performance. Efficient latency hiding, good scheduling, high cache utilization, high availability of texture data, good branching, and fast and efficient Z/stencil and color processing all contribute as well.  Where possible, let's explore those areas a bit more.

Stream Processor Implementation Texturing, Caches and Memory
Comments Locked

86 Comments

View All Comments

  • Roy2001 - Tuesday, May 15, 2007 - link

    The reason is, you have to pay extra $ for a power supply. No, most probably your old PSU won't have enough milk for this baby. I will stick with nVidia in future. My 2 cents.
  • Chaser - Tuesday, May 15, 2007 - link

    quote:


    While AMD will tell us that R600 is not late and hasn't been delayed, this is simply because they never actually set a public date from which to be delayed. We all know that AMD would rather have seen their hardware hit the streets at or around the time Vista launched, or better yet, alongside G80.

    First, they refuse to call a spade a spade: this part was absolutely delayed, and it works better to admit this rather than making excuses.



    Such a revealing tech article. Thanks for other sources Tom.
  • archcommus - Tuesday, May 15, 2007 - link

    $300 is the exact price point I shoot for when buying a video card, so that pretty much eliminates AMD right off the bat for me right now. I want to spend more than $200 but $400 is too much. I'm sure they'll fill this void eventually, and how that card will stack up against an 8800 GTS 320 MB is what I'm interested in.
  • H4n53n - Tuesday, May 15, 2007 - link

    Interesting enough in some other websites it wins from 8800 gtx in most games,especially the newer ones and comparing the price i would say it's a good deal?I think it's just driver problems,ati has been known for not having a very good driver compared to nvidia but when they fixed it then it'll win
  • dragonsqrrl - Thursday, August 25, 2011 - link

    lol...fail. In retrospect it's really easy to pick out the EPIC ATI fanboys now.
  • Affectionate-Bed-980 - Tuesday, May 15, 2007 - link

    I skimmed this article because I have a final. ATI can't hold a candle to NV at the moment it seems. Now while the 2900XT might have good value, I am correct in saying that ATI has lost the performance crown by a buttload (not even like X1800 vs 7800) but like they're totally slaughtered right?

    Now I won't go and comment about how the 2900 stacks up against competition in the same price range, but it seems that GTSes can be acquired for cheap.

    Did ATI flop big here?
  • vailr - Monday, May 14, 2007 - link

    I'd rather use a mid-range older card that "only" uses ~100 Watts (or less) than pay ~$400 for a card that requires 300 Watts to run. Doesn't AMD care about "Global Warming"?
    Al Gore would be amazed, alarmed, and astounded !!
  • Deusfaux - Monday, May 14, 2007 - link

    No they dont and that's why the 2600 and 2400 don't exist
  • ochentay4 - Monday, May 14, 2007 - link

    Let me start with this: i always had a nvidia card. ALWAYS.

    Faster is NOT ALWAYS better. For the most part this is true, for me, it was. One year ago I boght a MSI7600GT. Seemed the best bang for the buck. Since I bought it, I had problems with TVout detection, TVout wrong aspect ratios, broken LCD scaling, lot of game problems, inexistent support (nv forum is a joke) and UNIFIED DRIVER ARQUITECTURE. What a terrible lie! The latest official drivers is 6 months ago!!!

    Im really demanding, but i payed enough to demand a 100% working product. Now ATi latest offering has: AVIVO, FULL VIDEO ACC, MONTHLY DRIVER UPDATES, ALL BUGS I NOTICED WITH NVIDIA CARD FIXED, HDMI AND PRICE. I prefer that than a simple product, specially for the money they cost!

    I will never buy a nvidia card again. I'm definitely looking forward ATis offering (after the joke that is/was 8600GT/GTS).

    Enough rant.
    Am I wrong?
  • Roy2001 - Tuesday, May 15, 2007 - link

    Yeah, you are wrong. Spend $400 on a 2900XT and then $150 on a PSU.

Log in

Don't have an account? Sign up now