ATI Radeon HD 2900 XT: Calling a Spade a Spade

Author: Derek Wilson

by Derek Wilson on May 14, 2007 12:04 PM EST

Next Up: NVIDIA's G80

NVIDIA has been more tight-lipped about their underlying architecture, but we will infer as much as possible from the block diagrams we've seen and conversations we've had.

The G80 shader core is a little different from the R600. It is built on eight SIMD units each containing 16 SPs. The SIMD instructions are not VLIW, but single scalar instructions, and each SP within a SIMD unit executes that instruction on a different thread. While groups of 16 SPs share resources, NVIDIA's compiler doesn't need to build VLIW instructions to schedule out any of these SPs and it would be quite difficult to create dependencies between SPs because they are running different threads.

The bottom line here is that up to eight distinct shader operations are running across 128 threads at one time. This means we could have 128 threads all complete a scalar operation every clock, or we could have 128 threads all complete a 4-wide vector operation one component at a time over four clocks.
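As a rough illustration of that arithmetic, here is a minimal sketch. The unit counts come from the description above; the function names are ours, and the one-component-per-SP-per-clock model is a simplification rather than anything NVIDIA has documented.

```python
# Simplified throughput model of the G80 shader core (illustrative only).
SIMD_UNITS = 8       # SIMD units, per the description above
SPS_PER_SIMD = 16    # scalar processors (SPs) per SIMD unit
THREADS_IN_FLIGHT = SIMD_UNITS * SPS_PER_SIMD  # 128 threads, one per SP


def clocks_per_operation(components):
    """Clocks one thread needs for an operation of the given vector width.

    Assumption: each SP retires exactly one scalar component per clock.
    """
    if components < 1:
        raise ValueError("an operation has at least one component")
    return components


def thread_ops_per_clock(components):
    """Average whole operations completed per clock across all threads."""
    return THREADS_IN_FLIGHT / clocks_per_operation(components)


def component_ops_per_clock(components):
    """Scalar component operations per clock; constant in this model."""
    return thread_ops_per_clock(components) * components


if __name__ == "__main__":
    for width in (1, 4):
        print(width, thread_ops_per_clock(width), component_ops_per_clock(width))
```

In this model a 4-wide vector operation completes on all 128 threads every four clocks, an average of 32 whole operations per clock, while the scalar rate stays at 128 per clock regardless of vector width.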

On NVIDIA hardware, vertex threads are assigned to SIMD units in blocks of 16, while geometry and pixel threads are assigned in blocks of 32 (16 threads over two clocks). With smaller blocks, we see better branch performance but worse cache or prefetch utilization than we would with a more coarsely grained approach.
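To see why block size matters for branching, consider a toy model in which any block whose threads disagree on a branch has to run both sides of it. This is a common simplification of SIMD branch handling, not a description of NVIDIA's actual scheduler, and the function name and cost units are our own.

```python
def branch_cost(takes_branch, block_size, cost_if=1, cost_else=1):
    """Lane-slots spent executing a branch under a simple divergence model.

    takes_branch: one boolean per thread, True if the thread takes the
    'if' side. Assumption: a block executes a side of the branch whenever
    at least one of its threads needs it, and every lane in the block is
    occupied while that side runs.
    """
    total = 0
    for start in range(0, len(takes_branch), block_size):
        block = takes_branch[start:start + block_size]
        if any(block):        # at least one thread needs the 'if' side
            total += cost_if * len(block)
        if not all(block):    # at least one thread needs the 'else' side
            total += cost_else * len(block)
    return total


if __name__ == "__main__":
    # Hypothetical workload: 64 threads, only the first 16 take the branch.
    pattern = [True] * 16 + [False] * 48
    print(branch_cost(pattern, 16))  # 64: no block diverges
    print(branch_cost(pattern, 32))  # 96: the first block of 32 runs both sides
```

With blocks of 16 the branch boundary falls between blocks and nothing is wasted. With blocks of 32 the first block contains both kinds of thread and pays for both paths.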

This implementation also means that we don't have to worry about dependencies in the shader code. Of course, it is also the case that we can't extract parallelism from the shader code itself. The payoff is a steady rate of 128 operations per clock. This can actually go up in some special cases, but it shouldn't go lower under normal circumstances.

Comparing Shader Architectures: R600 vs. G80

The key to the architecture comparison is to realize that nothing is straight up apples to apples here. We need to look at how much work can be done per clock, how much work is likely to be done per clock, and how much work we can get done per unit time.

First, G80 can process more threads in parallel: 128 as opposed to R600's 64. Performing work on more threads at a time is one very good way of extracting overall parallelism from the problem of graphics. There are millions of pixels in every frame that need to be processed, and if we had hardware large enough we could process them all at once.

However, more work (up to 5x) is potentially getting done on each of those 64 threads than on NVIDIA's 128 threads. This is because R600 can execute up to five parallel operations per thread while NVIDIA hardware is only able to handle one operation at a time per SP (in most cases). But maximizing throughput on the AMD hardware will be much more difficult, and we won't always see peak performance from real code. In the best case, R600 is able to do 2.5x the work of G80 per clock (320 operations on R600 versus 128 on G80). In the worst case, where dependencies leave R600 issuing only one operation per thread, G80 has a 2x advantage over R600 per clock (128 operations on G80 versus 64 on R600).
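The per-clock bounds are simple enough to check directly. The sketch below uses only the thread counts and issue widths quoted above; the function name and the idea of an average "packing" figure are ours.

```python
# Per-clock operation counts, using the figures quoted in the text.
R600_THREADS = 64
R600_MAX_OPS_PER_THREAD = 5   # up to five parallel operations per thread
G80_OPS_PER_CLOCK = 128       # one scalar operation per SP per clock


def r600_ops_per_clock(avg_ops_packed):
    """R600 operations per clock for a given average VLIW packing (1 to 5)."""
    if not 1 <= avg_ops_packed <= R600_MAX_OPS_PER_THREAD:
        raise ValueError("packing must be between 1 and 5 operations")
    return R600_THREADS * avg_ops_packed


best = r600_ops_per_clock(5)    # 320
worst = r600_ops_per_clock(1)   # 64

if __name__ == "__main__":
    print(best / G80_OPS_PER_CLOCK)    # 2.5: R600 ahead in the best case
    print(G80_OPS_PER_CLOCK / worst)   # 2.0: G80 ahead in the worst case
    # Average packing at which the two chips tie per clock:
    print(G80_OPS_PER_CLOCK / R600_THREADS)  # 2.0
```

Put another way, on a per-clock basis R600 needs to keep an average of two of its five slots per thread busy just to match G80.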

The real difference is in where parallelism is extracted. Both architectures make use of the fact that threads are independent of each other by using multiple SIMD units. While NVIDIA focused on maximizing parallelism in this area of graphics, AMD decided to try to extract parallelism inside the instruction stream by using a VLIW approach. AMD's average case will vary with the code being run, but because so many operations are vector based, high utilization can generally be expected.

However, even if we expect high utilization on AMD hardware, the fact remains that G80 has a large clock speed advantage. With the shader core on G80 pushed up to 1.5 GHz, we could still see some cases where R600 is faster, but the majority of the time G80 should be able to best R600 on a pure compute basis.
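Folding clock speed into the per-clock figures gives a rough sense of where the crossover lies. The 1.5 GHz G80 shader clock is the figure used above; the R600 clock of 0.742 GHz is an assumption we supply for illustration (it is not stated in this section), and the function names are hypothetical.

```python
# Rough operations-per-second comparison (illustrative sketch only).
G80_OPS_PER_CLOCK = 128
G80_SHADER_GHZ = 1.5      # shader clock figure used in the text
R600_THREADS = 64
R600_CLOCK_GHZ = 0.742    # ASSUMPTION: not given in this section


def g80_gops():
    """G80 billions of operations per second at a steady 128 per clock."""
    return G80_OPS_PER_CLOCK * G80_SHADER_GHZ


def r600_gops(avg_ops_packed, clock_ghz=R600_CLOCK_GHZ):
    """R600 billions of operations per second for an average VLIW packing."""
    return R600_THREADS * avg_ops_packed * clock_ghz


def break_even_packing(clock_ghz=R600_CLOCK_GHZ):
    """Average operations per thread R600 needs to match G80 over time."""
    return g80_gops() / (R600_THREADS * clock_ghz)


if __name__ == "__main__":
    print(g80_gops())
    print(r600_gops(5), r600_gops(1))
    print(break_even_packing())
```

Under these assumed clocks, R600 would need to fill a little over four of its five slots per thread on average to pull ahead, which is consistent with expecting G80 to win on pure compute most of the time while still losing some cases.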

This overview still isn't the bottom line in performance. Efficient latency hiding, good scheduling, high cache utilization, high availability of texture data, good branching, and fast and efficient Z/stencil and color processing all contribute as well. Where possible, let's explore those areas a bit more.

86 Comments

mostlyprudent - Monday, May 14, 2007 - link
Frankly, neither the NVIDIA nor the AMD part at this price point is all that impressive an upgrade from the prior generations. We keep hearing that we will have to wait for DX10 titles to know the real performance of these cards, but I suspect that by the time DX10 titles are on the shelves we will have at least product line refreshes by both companies. Does anyone else feel like the graphics card industry is jerking our chains?
johnsonx - Monday, May 14, 2007 - link
It seems pretty obvious that AMD needs a Radeon HD2900Pro to fill in the gap between the 2900XT and 2600XT. Use R600 silicon, give it 256MB RAM with a 256-bit memory bus. Lower the clocks 15% so that power consumption will be lower, and so that chips that don't bin at full XT speeds can be used. Price at $250-$300. It would own the upper-midrange segment over the 8600GTS, and eat into the 8800GTS 320's lunch as well.
GlassHouse69 - Monday, May 14, 2007 - link
If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.
GlassHouse69 - Monday, May 14, 2007 - link
wrong section. dt is doing that today it seems to a few people
xfiver - Monday, May 14, 2007 - link
Hi, thank you for a really in depth review. While reading other 'earlier' reviews I remember a site using Catalyst 8.38 and reported performance improvements up to 14% from 8.37. Look forward to Anandtech's view on this.
xfiver - Monday, May 14, 2007 - link
My apologies it was VR zone and 8.36 to 8.37 (not 8.38)
Gary Key - Tuesday, May 15, 2007 - link

quote:
If I know this, and YOU know this.... wouldnt anandtech? I see money under the table or utter stupidity at work at anand. I mean, I know that the .01+ version does a lot better in benches as well as the higher res with aa/af on sometimes get BETTER framerates than lower res, no aa/af settings. This is a driver thing. If I know this, you know this, anand must. I would rather admit to being corrupt rather than that stupid.

I have worked extensively with four 8.37 releases and now the 8.38 release for the upcoming P35 release article. The 8.37.4.2 alpha driver had the top performance in SM3.0 heavy apps but was not very stable with numerous games, especially under Vista. The released 8.37.4.3 driver on AMD's website is the most stable driver to date and has decent performance but nothing near the alpha 8.37 or beta 8.38. The 8.38s offer great benchmark performance in the 3DMarks, several games, and a couple of DX10 benchmarks from AMD.

However, the 8.38s more or less broke CrossFire, OpenGL, and video acceleration in Vista depending upon the app and IQ is not always perfect. While there is a great deal of promise in their performance and we see the potential, they are still Beta drivers that have a long ways to go in certain areas before their final release date of 5/23 (internal target).

That said, would you rather see impressive results in 3DMarks or have someone tell you the truth about the development progress or lack of it with the drivers? As much as I would like to see this card's performance improve immediately, it is what it is at this time with the released drivers. AMD/ATI will improve the performance of the card with better drivers but until they are released our only choice is to go with what they sent. We said the same thing about NVIDIA's early driver issues with the G80 so there are not any fanboys or people taking money under the table around here. You can put all the lipstick on a pig you want, but in the end, you still have a pig. ;-)
Anand Lal Shimpi - Monday, May 14, 2007 - link
There's nothing sinister going on, ATI gave us 8.37 to test with and told us to use it. We got 8.38 today and are currently testing it for a follow-up.

Take care,
Anand
GlassHouse69 - Monday, May 14, 2007 - link
wow dood. you replied!

Yes, I have been wondering about the ethics of your group here for about a year now. I felt this sorta slick leaning towards and masking thing going on. Nice to see there is not.

Thanks for the 1000's of articles and tests!
-Mr. Glass
