Next Up: NVIDIA's G80

NVIDIA has been more tight-lipped about their underlying architecture, but we will infer as much as possible from the block diagrams we've seen and conversations we've had.

The G80 shader core is a little different from the R600. It is built on eight SIMD units, each containing 16 SPs, for 128 SPs in total. The SIMD instructions are not VLIW, but single scalar instructions, and each SP within a SIMD unit executes that instruction on a different thread. While groups of 16 SPs share resources, NVIDIA's compiler doesn't need to build VLIW instructions to keep any of these SPs busy, and it would be quite difficult to create dependencies between SPs because they are running different threads.

The bottom line here is that up to eight distinct shader operations are running across 128 threads at one time. This means we could have 128 threads all complete a scalar operation every clock, or we could have 128 threads all complete a 4-wide vector operation one component at a time over four clocks.
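To make that arithmetic concrete, here is a minimal sketch. The unit counts come from the text above; the function names are our own, and the model assumes each SP retires exactly one scalar component per clock, which ignores the special cases mentioned later.

```python
# Figures taken from the article; the function names are illustrative only.
SIMD_UNITS = 8
SPS_PER_SIMD = 16
TOTAL_SPS = SIMD_UNITS * SPS_PER_SIMD  # 128 SPs, one thread each


def clocks_per_operation(components: int) -> int:
    """Clocks an SP needs for one operation of the given vector width.

    Assumes each SP retires one scalar component per clock.
    """
    if components < 1:
        raise ValueError("an operation has at least one component")
    return components


def operations_per_clock(components: int) -> float:
    """Average whole operations completed per clock across all SPs."""
    return TOTAL_SPS / clocks_per_operation(components)


scalar_ops = operations_per_clock(1)  # every thread finishes on each clock
vec4_ops = operations_per_clock(4)    # every thread finishes on every 4th clock
print(TOTAL_SPS, scalar_ops, vec4_ops)
```

Under this model the scalar component rate stays at 128 per clock either way; only the number of whole operations completed per clock changes with the vector width.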

On NVIDIA hardware, vertex threads are assigned to SIMD units in blocks of 16, while geometry and pixel threads are assigned in blocks of 32 (16 threads over two clocks). With smaller blocks, we see better branch performance but worse cache or prefetch utilization than we would with a more coarsely grained approach.
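The branch side of that trade-off can be illustrated with a toy model. This is a sketch under a stated assumption, not a description of NVIDIA's scheduler: we assume threads in a block run in lockstep, so a block whose threads disagree on a branch has to execute both paths. The function name, the path costs, and the branch pattern are all hypothetical.

```python
def lane_clocks(takes_branch, block_size, cost_taken=10, cost_not_taken=10):
    """Total lane-clocks spent on a branch under a lockstep toy model.

    ASSUMPTION: every block executes each path that at least one of its
    threads needs, and all lanes in the block are occupied while it does.
    """
    total = 0
    for start in range(0, len(takes_branch), block_size):
        block = takes_branch[start:start + block_size]
        clocks = 0
        if any(block):
            clocks += cost_taken      # someone in the block takes the branch
        if not all(block):
            clocks += cost_not_taken  # someone in the block skips it
        total += clocks * block_size
    return total


# Hypothetical pattern: 64 threads, only the first 16 take the branch.
pattern = [True] * 16 + [False] * 48

cost_16 = lane_clocks(pattern, 16)  # no block is divergent
cost_32 = lane_clocks(pattern, 32)  # the first block runs both paths
cost_64 = lane_clocks(pattern, 64)  # the only block runs both paths
print(cost_16, cost_32, cost_64)
```

In this model the finer granularity wastes fewer lane-clocks on paths that most threads in a block don't need. The cache and prefetch side of the trade-off is not modeled here.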

This implementation also means that we don't have to worry about dependencies in the shader code. The flip side is that the hardware can't extract parallelism from within the shader code itself. What we get in exchange is a steady rate of 128 operations per clock. This can actually go up in some special cases, but it shouldn't go lower under normal circumstances.

Comparing Shader Architectures: R600 vs. G80

The key to the architecture comparison is to realize that nothing is straight up apples to apples here. We need to look at how much work can be done per clock, how much work is likely to be done per clock, and how much work we can get done per unit time.

First, G80 can process more threads in parallel: 128 as opposed to R600's 64. Performing work on more threads at a time is one very good way of extracting overall parallelism from the problem of graphics. There are millions of pixels in every frame that need to be processed, and if we had hardware large enough we could process them all at once.

However, more work (up to 5x) is potentially getting done on each of R600's 64 threads than on each of NVIDIA's 128 threads. This is because R600 can execute up to five parallel operations per thread, while NVIDIA hardware is only able to handle one operation at a time per SP (in most cases). But maximizing throughput on the AMD hardware will be much more difficult, and we won't always see peak performance from real code. In the best case, R600 is able to do 2.5x the work of G80 per clock (320 operations on R600 versus 128 on G80). In the worst case for code dependencies, G80 has a 2x advantage over R600 per clock (128 operations on G80 versus 64 on R600).
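The per-clock bounds are simple enough to check directly. The sketch below uses only the thread and operation counts quoted above; the function name is ours, and the number of operations packed per thread is treated as a free parameter.

```python
# Counts quoted in the article.
R600_THREADS = 64
R600_MAX_OPS_PER_THREAD = 5
G80_THREADS = 128
G80_OPS_PER_THREAD = 1


def r600_ops_per_clock(ops_packed_per_thread: float) -> float:
    """R600 operations per clock for a given average packing level."""
    if not 1 <= ops_packed_per_thread <= R600_MAX_OPS_PER_THREAD:
        raise ValueError("R600 issues between 1 and 5 operations per thread")
    return R600_THREADS * ops_packed_per_thread


g80_ops_per_clock = G80_THREADS * G80_OPS_PER_THREAD

best_case_ratio = r600_ops_per_clock(5) / g80_ops_per_clock   # R600 fully packed
worst_case_ratio = r600_ops_per_clock(1) / g80_ops_per_clock  # fully dependent code
break_even_packing = g80_ops_per_clock / R600_THREADS         # ops per thread to tie
print(best_case_ratio, worst_case_ratio, break_even_packing)
```

On a per-clock basis, then, R600 needs to average two packed operations per thread just to tie G80. Clock speed is not part of this calculation.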

The real difference is in where parallelism is extracted. Both architectures make use of the fact that threads are independent of each other by using multiple SIMD units. While NVIDIA focused on maximizing parallelism in this area of graphics, AMD decided to try to extract parallelism inside the instruction stream by using a VLIW approach. AMD's average case will vary with the code being run, though because so many operations are vector based, high utilization can generally be expected.
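To show why utilization depends on the code, here is a toy greedy packer. It is purely illustrative and is not AMD's compiler: it treats all five slots as interchangeable, which is a simplification, and the shader fragments are hypothetical.

```python
def pack_bundles(deps, width=5):
    """Greedily pack operations into bundles of at most `width` slots.

    `deps` maps each operation name to the set of operations it depends on.
    An operation may issue only after everything it depends on has issued
    in an earlier bundle.
    """
    issued = set()
    pending = list(deps)  # dicts keep insertion order, so this is program order
    bundles = []
    while pending:
        ready = [op for op in pending if deps[op] <= issued][:width]
        if not ready:
            raise ValueError("dependencies can never be satisfied")
        bundles.append(ready)
        issued.update(ready)
        pending = [op for op in pending if op not in issued]
    return bundles


def utilization(bundles, width=5):
    """Fraction of available slots that hold real work."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Hypothetical shader fragments, not taken from any real program.
vector_code = {"mul.x": set(), "mul.y": set(), "mul.z": set(),
               "mul.w": set(), "rcp": set()}
chain_code = {"a": set(), "b": {"a"}, "c": {"b"}}

vector_bundles = pack_bundles(vector_code)
chain_bundles = pack_bundles(chain_code)
print(utilization(vector_bundles), utilization(chain_bundles))
```

Five independent operations fill a single bundle, while a three-step dependent chain needs three bundles with one occupied slot each. Real shader code falls somewhere between these extremes.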

However, even if we expect high utilization on AMD hardware, the fact remains that G80 has a large clock speed advantage. With the shader core on G80 pushed up to 1.5 GHz, we could still see some cases where R600 is faster, but the majority of the time G80 should be able to best R600 on a pure compute basis.
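We can put rough numbers on this. The 1.5 GHz G80 shader clock and the unit counts come from the article. The R600 clock is not stated in this section, so the 742 MHz used below is an assumption, and the result should be read as a back-of-the-envelope estimate only.

```python
G80_SPS = 128
G80_SHADER_CLOCK_GHZ = 1.5   # from the article
R600_THREADS = 64
R600_MAX_OPS_PER_THREAD = 5
R600_CLOCK_GHZ = 0.742       # ASSUMPTION: not stated in this section


def g80_gops() -> float:
    """G80 billions of operations per second at one op per SP per clock."""
    return G80_SPS * G80_SHADER_CLOCK_GHZ


def r600_gops(ops_per_thread: float) -> float:
    """R600 billions of operations per second at a given packing level."""
    return R600_THREADS * ops_per_thread * R600_CLOCK_GHZ


# Average operations per thread R600 must sustain to match G80.
break_even = g80_gops() / (R600_THREADS * R600_CLOCK_GHZ)
print(g80_gops(), r600_gops(R600_MAX_OPS_PER_THREAD), break_even)
```

With these inputs R600 has to keep slightly more than four of its five slots busy on average to pull ahead, which is consistent with the expectation that it wins only in some cases.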

This overview still isn't the bottom line in performance. Efficient latency hiding, good scheduling, high cache utilization, high availability of texture data, good branching, and fast and efficient Z/stencil and color processing all contribute as well. Where possible, let's explore those areas a bit more.


86 Comments


  • wjmbsd - Monday, July 2, 2007

    What is the latest on the so-called Dragonhead 2 project (aka, HD 2900 XTX)? I heard it was just for OEMs at first...anyone know if the project is still going and how the part is benchmarking with newest drivers?
  • teainthesahara - Monday, May 21, 2007

    After this failure of the R600 and likely overrated (and probably late) Barcelona/Agena processors I think that Intel will finally bury AMD. Paul Otellini is rubbing his hands with glee at the moment and rightfully so. AMD now stands for mediocrity. Oh dear what a fall from grace.... To be honest Nvidia don't have any real competition on the DX10 front at any price points. I cannot see AMD processors besting Intel's Core 2 Quad lineup in the future especially when 45nm and 32nm become the norm and they don't have a chance in hell of beating Nvidia. Intel and Nvidia are turning the screws on Hector Ruiz. Shame AMD brought down such a great company like ATI.
  • DerekWilson - Thursday, May 24, 2007

    To be fair, we really don't have any clue how these cards compete on the DX10 front as there are no final, real DX10 games on the market to test.

    We will try really hard to get a good idea of what DX10 will look like on the HD 2000 series and the GeForce 8 Series using game demos, pre-release code, and SDK samples. It won't be a real reflection of what users will experience, but we will certainly hope to get a glimpse at performance.

    It is fair to say that NVIDIA bests AMD in current game performance. But really there are so many possibilities with DX10 that we can't call it yet.
  • spinportal - Friday, May 18, 2007

    From the last posting of results for the GTS 320MB round-up
    Prey @ AnandTech - 8800GTS320: http://www.anandtech.com/video/showdoc.aspx?i=2953...
    we see that the 2900XT review chart pushes the nVidia cards down about 15% across the board.
    Prey @ AnandTech - ATI2900XT: http://www.anandtech.com/video/showdoc.aspx?i=2988...
    The only difference in systems is software drivers as the cpu / mobo / mem are the same.

    Does this mean ATI should be getting a BIGGER THRASHING BEAT-DOWN than the reviewer is stating?
    400$ ATI 2900XT performing as good as a 300$ nVidia 8800 GTS 320MB?

    It's 100$ short and 6 months late along with 100W of extra fuel.

    This is not your uncle's 9700 Pro...
  • DerekWilson - Sunday, May 20, 2007

    We switched Prey demos -- I updated our benchmark.

    Both numbers are accurate for the tests I ran at the time.

    Our current timedemo is more stressful and thus we see lower scores with this test.
  • Yawgm0th - Wednesday, May 16, 2007

    The prices listed in this article are way off.

    Currently, 8800GTS 640MB retails for $350-380, $400+ for OC or special versions. 2900XT retails for $430+. In the article, both are listed as $400, and as such the card is given a decent review in the conclusion.

    Realistically, this card provides slightly inferior performance to the 8800GTS 640MB at a considerably higher price point -- $80-$100 more than the 8800GTS. I mean, it's not like the 8800Ultra, but for the most part this card has little use outside of AMD and/or ATI fanboys. I'd love for this card to do better as AMD needs to be competing with Nvidia and Intel right now, but I just can't see how this is even worth looking at, given current prices.
  • DerekWilson - Thursday, May 17, 2007

    really, this article focuses on architecture more than product, and we went with MSRP prices...

    we will absolutely look closer at price and price/performance when we review retail products.
  • quanta - Tuesday, May 15, 2007

    As I recall, the Radeon HD 2900 only has DVI ports, but nowhere does the DVI documentation specify that it can carry audio signals. Unless the card comes with an adapter that accepts audio input, it seems the audio portion of R600 is rendered useless.
  • DerekWilson - Wednesday, May 16, 2007

    the card does come with an adapter of sorts, but the audio input is from the dvi port.

    you can't use a standard DVI to HDMI converter for this task.

    when using AMD's HDMI converter the data sent out over the DVI port does not follow the DVI specification.

    the bottom line is that the DVI port is just a physical connector carrying data. i could take a DVI port and solder it to a stereo and use it to carry 5.1 audio if I wanted to ... wouldn't be very useful, but I could do it :-)

    While connected to a DVI device, the card operates the port according to the DVI specification. When connected to an HDMI device through the special converter (which is not technically "dvi to hdmi" -- it's amd proprietary to hdmi), the card sends out data that follows the HDMI spec.

    you can look at it another way -- when the HDMI converter is connected, just think of the dvi port as an internal connector between an I/O port and the TMDS + audio device.
  • ShaunO - Tuesday, May 15, 2007

    I was at an AMD movie night last night where they discussed the technical details of the HD 2900 XT and also showed the Ruby Whiteout DX10 Demo rendered using the card. It looked amazing and I had high hopes until I checked out the benchmark scores. They're going to need more than free food and popcorn to convince me to buy an obsolete card.

    However there is room for improvement of course. Driver updates, DX10 and whatnot. The main thing for me personally will be driver updates, I will be interested to see how well the card improves over time while I save my pennies for my next new machine.

    Everyone keeps saying "DX10 performance will be better, yadda yadda" but I also want to be able to play the games I have now and older games without having to rely on DX10 games to give me better performance. Nothing like totally underperforming in DX9 games and then only being equal or slightly better in DX10 games compared to the competition. I would rather have a decent performer all-round. Even saying that we don't even know for sure if DX10 games are even going to bring any performance increases of the competition, it's all speculation right now and that's all we can do, speculate.

    Shaun.
