Motherboards Memory Storage Cases/Cooling/PSUs IT Computing Displays Mobile Mac CPUs & Chipsets Video Digital Cameras Linux Gadgets Systems Trade Shows Guides Home Increase Font Size Decrease Font Size Change Page Size
ATI Radeon HD 2900 XT: Calling a Spade a Spade
ATI Radeon HD 2900 XT: Calling a Spade a Spade
Date: May 14th, 2007
Topic: Video Card
Manufacturer: ATI
Author: Derek Wilson
Buy the XFX HD-575X-ZNF XFX ATI Radeon HD 5750
Blank
 BestBuy $159.99
 
 

Stream Processor Implementation

Going Deeper: Single Instruction, Multiple Data

SIMD (single instruction, multiple data) is the concept of running one instruction across lots of data. This is fundamental in the implementation of graphics hardware: multiple vertices, primitives, or pixels will need to have the same shader program run on them. Building hardware to do one operation at a time on massive amounts of data makes processing each piece of data very efficient.

In SIMD hardware, multiple processing units are tied together. The hardware issues one instruction to the SIMD hardware and all the processing units perform that operation on unique data. All graphics hardware is built on this concept at some level. Implementing hardware this way avoids the complexity of requiring each SP to manage not only the data coming through it, but the instructions it will be running as well.

Going Deeper: Very Long Instruction Word

Normally when we think about instructions on a processor, we think about a single operation, like Add or Multiply. But imagine if you wanted to run multiple instructions at once on a parallel array of hardware. You might come up with a technique similar to VLIW (Very Long Instruction Word), which allows you to take simple operations and, if they are not dependent on each other, stick them together as one instruction.

Imagine we have five processing units that operate in parallel. Utilizing this hardware would require us to issue independent instructions on each of the five units. This is hard to determine while code is running. VLIW allows us to take the determination of instruction dependence out of the hardware and put it in the complier. The compiler can then build a single instruction that consists of as much independent processing work as possible.

VLIW is a good way of exploiting parallelism without adding hardware complexity, but it can create a huge headache for compiler designers when dealing with dependencies. Luckily, graphics hardware lends itself well to this type of processing, but as shaders get more complex and interesting we might see more dependent instructions in practice.

Bringing it Back to the Hardware: AMD's R600

AMD implements their R600 shader core using four SIMD arrays. These SIMD arrays are issued 5-wide (6 with a branch) VLIW instructions. These VLIW instructions operate on 16 threads (vertices, primitives or pixels) at a time. In addition to all this, AMD interleaves two different VLIW instructions from different shaders in order to maximize pipeline utilization on the SIMD units. Our understanding is that this is in order to ensure that all the data from one VLIW instruction is available to a following dependent VLIW instruction in the same shader.

Based on this hardware, we can do a little math and see that R600 is capable of issuing up to four different VLIW instructions (up to 20 distinct shader operations), working on a total of 64 different threads. Each thread can have up to five different operations working on it as defined by the VLIW instruction running on the SIMD unit that is processing that specific thread.

For pixel processing, AMD assigns threads to SIMD units in 8x8 blocks (64 pixels) processed over multiple clocks. This is to enable a small branch granularity (each group of 64 pixels must follow the same code path), and it's large enough to exploit locality of reference in tightly packed pixels (in other words, pixels that are close together often need to load similar data/textures). There are apparently cases where branch granularity jumps to 128 pixels, but we don't have the data on when or why this happens yet.

If it seems like all this reads in a very complicated way, don't worry: it is complex. While AMD has gone to great lengths to build hardware that can efficiently handle parallel data, dependencies pose a problem to realizing peak performance. The compiler might not be able to extract five operations for every VLIW instruction. In the worst case scenario, we could effectively see only one SP per block operating with only four VLIW instructions being issued. This drops our potential operations per clock rate down from 320 at peak to only 64.

On the bright side, we will probably not see a shader program that causes R600 to run at its worst case performance. Because vertices and colors are still four components each, we will likely see utilization closer to peak in many common cases.

Next Up: NVIDIA's G80   Next Page

 
  Index

Tools Share
Digg   del.icio.us   E-mail  
Print This Article Print this article  

82 Comments - Last by SiliconDoc, 216 days ago
Username:
Password:
Calling a Spade a Spade by Creig, 1002 days ago
The R600 is finally here. I'm sure the overall performance is not what AMD was hoping for. Nobody ever shoots to have their newest product be the 2nd best. But pricing it at $399 and including a very nice game bundle will make the HD 2900 XT a VERY worthwhile purchase. I also have the feeling that there is a significant amount of performance increase to be realized through future driver releases ala X1800XT.

Reply
RE: Calling a Spade a Spade by rADo2, 1002 days ago
It is not 2nd best (after 8800ULTRA), not 3rd best (after 8800GTX), not 4th best (after 8800GTX-640), but 5th best (after 8800GTS-320), or even worse ;)

Bad performance with AA turned on (everybody turns on AA), huge power consumption, late to the market.

A definitive failure.

Reply
RE: Calling a Spade a Spade by imaheadcase, 1001 days ago
quote:

Bad performance with AA turned on (everybody turns on AA), huge power consumption, late to the market.


Says who? Most people I know don't care to turn on AA since they visually can't see a difference. Only people who are picky about everything they see do normally, the majority of people don't notice "jaggies" since the brain fixes it for you when you play.

Reply
RE: Calling a Spade a Spade by Amuro, 1001 days ago
quote:

the majority of people don't notice "jaggies" since the brain fixes it for you when you play.

Says who? No one spent $400 on a video card would turn off AA.

Reply
RE: Calling a Spade a Spade by motiv8, 1001 days ago
Depends on the game or player tbh.

I play within ladders without AA turned on, but for games like oblivion I would use AA. Depends on your needs at the time.

Reply
RE: Calling a Spade a Spade by imaheadcase, 1001 days ago
quote:

Says who? No one spent $400 on a video card would turn off AA.


Sure they do, because its a small "tweak" with a performance hit. I say who spends $400 on a video card to remove "jaggies" when they are not noticeable in the first place to most people. Same reason most people don't go for SLI or Crossfire, because it really in the end offers nothing substantial for most people who play games.

Some might like it, but they would not miss it if they stopped using it for some time. Its not like its make or break feature of a video card.

Reply
RE: Calling a Spade a Spade by SiliconDoc, 216 days ago
Boy we'd sure love to hear those red fans claiming they turn off AA nowadays and it doesn't matter.
LOL
It's just amazing how thick it gets.

Reply
RE: Calling a Spade a Spade by Roy2001, 1001 days ago
Says who? Most people I know don't care to turn on AA since they visually can't see a difference.
------------------------------------------
Wow, I never turn it of once I am used to have AA. I cannot play games anymore without AA.

Reply
RE: Calling a Spade a Spade by yacoub, 1002 days ago
$400 is a lot of money. Not terribly long ago the highest end GPU available didn't cost more than $400. Now they hit $750 so you start to think $400 sounds cheap. It's really not. It's a heck of a lot of money for one piece of hardware. You can put together a 650i SLI rig with 2GB of DDR2 6400 and an E4400 for that much money. I know because I just did that. I kept my 7900GT from my old rig because I wanted to see how R600 did before purchasing an 8800GTS 640MB. Now that we've seen initial results I will wait to see how R600 does with more mature drivers and also wait to see the 640MB GTS price come down even more in the meantime.

Reply
RE: Calling a Spade a Spade by vijay333, 1002 days ago
http://www.randomhouse.com/wotd/index.pperl?date=19970115

"the expression to call a spade a spade is thousands of years old and etymologically has nothing whatsoever to do with any racial sentiment."



Reply
Comments Page 1 of 9

Download Microsoft Visual Studio ® Team System
Streamline Dev processes, Reduce time to market. Try Microsoft Visual Studio Team System, FREE!
Unlicensed Software at Your Last Company
Anonymously Report Unlicensed Software with Our Form Now. Get Up to $1 Million.
Special Offer from The Economist
Get 12 issues of The Economist for $12. US subscribers only.
Free Forrester Risk Management Report
Demystifying Enterprise Risk Management. Download Free With Registration.
Report Unlicensed Business Software Use
Earn Up to $1 Million by Reporting Unlicensed Software Use. Fill Out Our Form!




Latest news by
DailyTech

 February 9, 2010

Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank

 February 8, 2010

Blank




pipeboost
Copyright © 1997-2010 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.
Click Here for Advertising Information