ATI Radeon HD 2900 XT: Calling a Spade a Spade

Name: ATI Radeon HD 2900 XT: Calling a Spade a Spade
Item: ATI Radeon HD 2900 XT: Calling a Spade a Spade
Author: Derek Wilson

by Derek Wilson on May 14, 2007 12:04 PM EST

Posted in
GPUs

86 Comments | Add A Comment

86 Comments

Next Up: NVIDIA's G80

NVIDIA has been more tight-lipped about their underlying architecture, but we will infer as much as possible from the block diagrams we've seen and conversations we've had.

The G80 shader core is a little different from the R600. It is built on eight SIMD units each containing 16 SPs. The SIMD instructions are not VLIW, but single scalar instructions, and each SP within a SIMD unit executes that instruction on a different thread. While groups of 16 SPs share resources, NVIDIA's compiler doesn't need to build VLIW instructions to schedule out any of these SPs and it would be quite difficult to create dependencies between SPs because they are running different threads.

The bottom line here is that up to eight distinct shader operations are running across 128 threads at one time. This means we could have 128 threads all complete a scalar operation every clock, or we could have 128 threads all complete a 4-wide vector operation one component at a time over four clocks.

On NVIDIA hardware, vertex threads are assigned to SIMD units in blocks of 16, while geometry and pixel threads are assigned in blocks of 32 (16 threads over two clocks). With smaller blocks, we see better branch performance but worse cache or prefetch utilization than we would with a more coarsely grained approach.

This implementation also means that we don't have to worry about dependencies in the shader code. Of course, it is also the case that we can't extract parallelism from the shader code itself. But the advantage gives us a steady rate of 128 operations per clock. This can actually go up in some special cases, but it shouldn't go lower under normal circumstances.

Comparing Shader Architectures: R600 vs. G80

The key to the architecture comparison is to realize that nothing is straight up apples to apples here. We need to look at how much work can be done per clock, how much work is likely to be done per clock, and how much work we can get done per unit time.

First, G80 can process more threads in parallel: 128 as opposed to R600's 64. Performing work on more threads at a time is one very good way of extracting overall parallelism from the problem of graphics. There are millions of pixels in every frame that need to be processed, and if we had hardware large enough we could process them all at once.

However, more work (up to 5x) is potentially getting done on each of those 64 threads than on NVIDIA's 128 threads. This is because R600 can execute up to five parallel operations per thread while NVIDIA hardware is only able to handle one operation at a time per SP (in most cases). But maximizing throughput on the AMD hardware will be much more difficult, and we won't always see peak performance from real code. On the best case level, R600 is able to do 2.5x the work of G80 per clock (320 operations on R600 and 128 on G80). Worst case for code dependency on both architectures gives the G80 a 2x advantage over R600 per clock (64 operations on R600 with 128 on G80).

The real difference is in where parallelism is extracted. Both architectures make use of the fact that threads are independent of each other by using multiple SIMD units. While NVIDIA focused on maximizing parallelism in this area of graphics, AMD decided to try to extract parallelism inside the instruction stream by using a VLIW approach. AMD's average case will be different depending on the code running, though so many operations are vector based, high utilization can generally be expected.

However, even if we expect high utilization on AMD hardware, the fact remains that G80 has a large clock speed advantage. With the shader core on G80 pushed up to 1.5 GHz, we could still see some cases where R600 is faster, but the majority of the time G80 should be able to best R600 on a pure compute basis.

This overview still isn't the bottom line in performance. Efficient latency hiding, good scheduling, high cache utilization, high availability of texture data, good branching, and fast and efficient Z/stencil and color processing all contribute as well. Where possible, let's explore those areas a bit more.

Stream Processor Implementation Texturing, Caches and Memory

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

86 Comments

View All Comments

yyrkoon - Tuesday, May 15, 2007 - link
See, the problem here is: guys like you are so bent on saving that little bit of money, by buying a lesser brand name, that you do not even take the time to research your hardware. USe newegg , and read the user reviews, and if that is not enough for you, go to the countless other resources all over the internet.
yyrkoon - Tuesday, May 15, 2007 - link
Blame the crappy OEM you bought the card from, not nVIdia. Get an EVGA card, and embrace a completely different aspect on video card life.

MSI may make some decent motherboards, but their other components have serious issues.
LoneWolf15 - Thursday, May 17, 2007 - link
Um, since 95% of nvidia-GPU cards on the market are the reference design, I'd say your argument here is shaky at best. EVGA and MSI both use the reference design, and it's even possible that cards with the same GPU came off the same production line at the same plant.
DerekWilson - Thursday, May 17, 2007 - link
it is true that the majority of parts are based on reference designs, but that doesn't mean they all come from the same place. I'm sure some of them do, but to say that all of these guys just buy completed boards and put their name on them all the time is selling them a little short.

at the same time, the whole argument of which manufacturer builds the better board on a board component level isn't something we can really answer.

what we would suggest is that its better to buy from OEMs who have good customer service and long extensive warranties. this way, even if things do go wrong, there is some recourse for customers who get bad boards or have bad experiences with drivers and software.
cmdrdredd - Monday, May 14, 2007 - link
you're wrong. 99% of people buying these high end cards are gaming. Those gamers demand and deserve the best possible performance. If a card that uses MORE power and costs MORE (x2900xt vs 8800gts) and performs generally the same or slower what is the point? Fact is...ATI's high end is in fact slower than mid range offerings from Nvidia and consumes alot more power. Regardless of what you think, people are buying these based on performance benchmarks in 99% of all cases.
AnnonymousCoward - Tuesday, May 15, 2007 - link
No, you're wrong. Did you overlook the emphasis he put on "NOT ALWAYS"?

You said 99% use for gaming--so there's 1%. Out of the gamers, many really want LCD scaling to work, so that games aren't stretched horribly on widescreen monitors. Some gamers would also like TVout to work.

So he was right: faster is NOT ALWAYS better.
erwos - Monday, May 14, 2007 - link
It'd be nice to get the scoop on the video decode acceleration present on these boards, and how it stocks up to the (excellent) PureVideo HD found in the 8600 series.
imaheadcase - Tuesday, May 15, 2007 - link
I agree! They need to do a whole article on video acceleration on a range of cards and show the pluses and cons of each card in respective areas. A lot of people like myself like to watch videos and game on cards, but like the option open to use the advanced video features.
Turnip - Monday, May 14, 2007 - link
"We certainly hope we won't see a repeat of the R600 launch when Barcelona and Agena take on Core 2 Duo/Quad in a few months...."

Why, that's exactly what I had been thinking :)

Phew! I made it through the whole thing though, I even read all of those awfully big words and everything! :)

Thanks guys, another top review :)
Kougar - Monday, May 14, 2007 - link
First, great article! I will be going back to reread the very indepth analysis of the hardware and features, something that keeps me a avid Anandtech reader. :)

Since it was mentioned that overclocking will be included in a future article, I would like to suggest that if possible watercooling be factored into it. So far one review site has already done a watercooled test with a low-end watercooling setup, and without mods acheived 930MHz on the Core, which indirectly means 930MHz shaders if I understand the hardware.

I'm sure I am not the only reader extremely interested to see if all R600 needs is a ~900-950MHz overclock to offer some solid GTX level performance... or if it would even help at all. Again thanks for the consideration, and the great article! Now off to find some Folding@Home numbers...

ATI Radeon HD 2900 XT: Calling a Spade a Spade

Post Your Comment

86 Comments

View All Comments

yyrkoon - Tuesday, May 15, 2007 - link

yyrkoon - Tuesday, May 15, 2007 - link

LoneWolf15 - Thursday, May 17, 2007 - link

DerekWilson - Thursday, May 17, 2007 - link

cmdrdredd - Monday, May 14, 2007 - link

AnnonymousCoward - Tuesday, May 15, 2007 - link

erwos - Monday, May 14, 2007 - link

imaheadcase - Tuesday, May 15, 2007 - link

Turnip - Monday, May 14, 2007 - link

Kougar - Monday, May 14, 2007 - link

Log in

Don't have an account? Sign up now