ATI Radeon HD 2900 XT: Calling a Spade a Spade

Name: ATI Radeon HD 2900 XT: Calling a Spade a Spade
Item: ATI Radeon HD 2900 XT: Calling a Spade a Spade
Author: Derek Wilson

by Derek Wilson on May 14, 2007 12:04 PM EST

Posted in
GPUs

86 Comments | Add A Comment

86 Comments

Next Up: NVIDIA's G80

NVIDIA has been more tight-lipped about their underlying architecture, but we will infer as much as possible from the block diagrams we've seen and conversations we've had.

The G80 shader core is a little different from the R600. It is built on eight SIMD units each containing 16 SPs. The SIMD instructions are not VLIW, but single scalar instructions, and each SP within a SIMD unit executes that instruction on a different thread. While groups of 16 SPs share resources, NVIDIA's compiler doesn't need to build VLIW instructions to schedule out any of these SPs and it would be quite difficult to create dependencies between SPs because they are running different threads.

The bottom line here is that up to eight distinct shader operations are running across 128 threads at one time. This means we could have 128 threads all complete a scalar operation every clock, or we could have 128 threads all complete a 4-wide vector operation one component at a time over four clocks.

On NVIDIA hardware, vertex threads are assigned to SIMD units in blocks of 16, while geometry and pixel threads are assigned in blocks of 32 (16 threads over two clocks). With smaller blocks, we see better branch performance but worse cache or prefetch utilization than we would with a more coarsely grained approach.

This implementation also means that we don't have to worry about dependencies in the shader code. Of course, it is also the case that we can't extract parallelism from the shader code itself. But the advantage gives us a steady rate of 128 operations per clock. This can actually go up in some special cases, but it shouldn't go lower under normal circumstances.

Comparing Shader Architectures: R600 vs. G80

The key to the architecture comparison is to realize that nothing is straight up apples to apples here. We need to look at how much work can be done per clock, how much work is likely to be done per clock, and how much work we can get done per unit time.

First, G80 can process more threads in parallel: 128 as opposed to R600's 64. Performing work on more threads at a time is one very good way of extracting overall parallelism from the problem of graphics. There are millions of pixels in every frame that need to be processed, and if we had hardware large enough we could process them all at once.

However, more work (up to 5x) is potentially getting done on each of those 64 threads than on NVIDIA's 128 threads. This is because R600 can execute up to five parallel operations per thread while NVIDIA hardware is only able to handle one operation at a time per SP (in most cases). But maximizing throughput on the AMD hardware will be much more difficult, and we won't always see peak performance from real code. On the best case level, R600 is able to do 2.5x the work of G80 per clock (320 operations on R600 and 128 on G80). Worst case for code dependency on both architectures gives the G80 a 2x advantage over R600 per clock (64 operations on R600 with 128 on G80).

The real difference is in where parallelism is extracted. Both architectures make use of the fact that threads are independent of each other by using multiple SIMD units. While NVIDIA focused on maximizing parallelism in this area of graphics, AMD decided to try to extract parallelism inside the instruction stream by using a VLIW approach. AMD's average case will be different depending on the code running, though so many operations are vector based, high utilization can generally be expected.

However, even if we expect high utilization on AMD hardware, the fact remains that G80 has a large clock speed advantage. With the shader core on G80 pushed up to 1.5 GHz, we could still see some cases where R600 is faster, but the majority of the time G80 should be able to best R600 on a pure compute basis.

This overview still isn't the bottom line in performance. Efficient latency hiding, good scheduling, high cache utilization, high availability of texture data, good branching, and fast and efficient Z/stencil and color processing all contribute as well. Where possible, let's explore those areas a bit more.

Stream Processor Implementation Texturing, Caches and Memory

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

86 Comments

View All Comments

GoatMonkey - Monday, May 14, 2007 - link
That's obviously BS. This IS their high end part, it just doesn't perform as well as nVidia's high end part, so it is priced accordingly.
poohbear - Monday, May 14, 2007 - link
sweet review though! thanks for including all the important and pertinent cards in your roundup (the 8800gts 320mb inparticular). also love how neutral Anand is in their reviews, unlike some other sites.:p
Creig - Monday, May 14, 2007 - link
The R600 is finally here. I'm sure the overall performance is not what AMD was hoping for. Nobody ever shoots to have their newest product be the 2nd best. But pricing it at $399 and including a very nice game bundle will make the HD 2900 XT a VERY worthwhile purchase. I also have the feeling that there is a significant amount of performance increase to be realized through future driver releases ala X1800XT.
shady28 - Tuesday, May 15, 2007 - link

Nvidia has gone over the cliff on pricing.

I know of no one personally who has an 88xx series card. I know one who recently picked up an 8600 of some kind, that's it. I have the best GPU of anyone I know.

It's a real shame that there is so much focus on graphics cards that virtually no one buys. These are niche products folks - yet 'who is best' seems to be totally dependent on these niche products. That's patently ridiculous.

It's like saying, since IBM makes the fastest computers in the world (they do), they're the best and you should be buying IBM (or now, lenovo) laptops and desktops.

No one ever said that sort of thing because it's patently ridiculous. Why do people say it now for graphics cards? The fact that they do says a lot about the mentality of sites like AT.
DerekWilson - Tuesday, May 15, 2007 - link
We don't say what you are implying, and we are also very upset with some of NVIDIA's pricing (specifically the 8800 ultra)

the 8800 gts 320mb is one of the best values for your money anywhere and isn't crazy expensive -- it's actually the card I'd recommend to anyone who cares about graphics in games and wants good quality and performance at 1600x1200.

I would never tell anyone to buy an 8600 gts because nvidia has the fastest high end card. In fact, in this article, I hope I made it clear that AMD has the opportunity to capitalize on the huge performance gap nvidia left between the 8600 and 8800 series ... If AMD builds a part that performs in this range is priced competitively, they'll have our recommendation in a flash.

Recommending parts based on value at each price or performance segment is something we take pride in and will always do, no matter who has the absolute fastest hardware out there.

The reason our focus was on AMD's fastest part is because they haven't given us any other hardware to test. We will absolutely be talking a lot and in much depth about midrange and budget hardware when AMD makes these parts available to us.
yacoub - Monday, May 14, 2007 - link
$400 is a lot of money. Not terribly long ago the highest end GPU available didn't cost more than $400. Now they hit $750 so you start to think $400 sounds cheap. It's really not. It's a heck of a lot of money for one piece of hardware. You can put together a 650i SLI rig with 2GB of DDR2 6400 and an E4400 for that much money. I know because I just did that. I kept my 7900GT from my old rig because I wanted to see how R600 did before purchasing an 8800GTS 640MB. Now that we've seen initial results I will wait to see how R600 does with more mature drivers and also wait to see the 640MB GTS price come down even more in the meantime.
vijay333 - Monday, May 14, 2007 - link
http://www.randomhouse.com/wotd/index.pperl?date=1...">http://www.randomhouse.com/wotd/index.pperl?date=1...

"the expression to call a spade a spade is thousands of years old and etymologically has nothing whatsoever to do with any racial sentiment."
yacoub - Monday, May 14, 2007 - link
Yes, a spade was a shovel long before muslims enslaved europeans to do hard labor in north africa and europeans enslaved africans to do hard labor in the 'new world'.
vijay333 - Monday, May 14, 2007 - link
whoops...replied to the wrong one.
rADo2 - Monday, May 14, 2007 - link
It is not 2nd best (after 8800ULTRA), not 3rd best (after 8800GTX), not 4th best (after 8800GTX-640), but 5th best (after 8800GTS-320), or even worse ;)

Bad performance with AA turned on (everybody turns on AA), huge power consumption, late to the market.

A definitive failure.

ATI Radeon HD 2900 XT: Calling a Spade a Spade

Post Your Comment

86 Comments

View All Comments

GoatMonkey - Monday, May 14, 2007 - link

poohbear - Monday, May 14, 2007 - link

Creig - Monday, May 14, 2007 - link

shady28 - Tuesday, May 15, 2007 - link

DerekWilson - Tuesday, May 15, 2007 - link

yacoub - Monday, May 14, 2007 - link

vijay333 - Monday, May 14, 2007 - link

yacoub - Monday, May 14, 2007 - link

vijay333 - Monday, May 14, 2007 - link

rADo2 - Monday, May 14, 2007 - link

Log in

Don't have an account? Sign up now