ATI Radeon HD 2900 XT: Calling a Spade a Spadeby Derek Wilson on May 14, 2007 12:04 PM EST
- Posted in
From a very high level, we have the same capabilities we saw in the G80, where each step in the pipeline runs on the same hardware. There are a lot of similarities when stepping way back, as the same goals need to be accomplished: data comes into the GPU, gets setup for processing, shader code runs on the data, and the result either heads back up for another pass through the shaders or moves on to be rendered out to the framebuffer.
The obvious points are that R600 is a unified architecture that supports DX10. The set of requirements for DX10 are very firm this time around, so we won't see any variations in feature support on a basic level. AMD and NVIDIA are free to go beyond the DX10 spec, but these features might not be exposed through the Microsoft API without a little tweaking. AMD includes one such feature, a tessellator unit, which we'll talk about more later. For now, let's take a look at the overall layout of R600.
Our first look shows a huge amount of stream processing power: 320 SPs all told. These are a little different than NVIDIA's SPs, and over the next few pages we'll talk about why. Rather than a small number of SPs spread across eight groups, our block diagram shows R600 has a high number of SPs in each of four groups. Each of these four groups is connected to its own texture unit, while they share a connection to shader export hardware and a local read/write cache.
All of this is built on an 80nm TSMC process and uses in the neighborhood of 720 Million transistors. All other R6xx parts will be built on a 65nm processes with many fewer transistors, making them much smaller and more power efficient. Core clock speed is on the order of 740MHz for R600 with memory running at 825MHz.
Memory is slower this time around with higher bandwidth, as R600 implements a 512-bit memory bus. While we're speaking about memory, AMD has revised their Ring Bus architecture for this round, which we'll delve into later. Unfortunately we won't be able to really compare it to NVIDIA's implementation, as they won't go into any detail with us on internal memory buses.
And speaking of things NVIDIA won't go into detail on, AMD was good enough to share very low level details, including information on cache sizes and shader hardware implementation. We will be very happy to spend time talking about this, and hopefully AMD will inspire NVIDIA to start opening up a little more and going deeper into their underlying architecture.
To hit the other hot points, R600 does have some rather interesting unique features to back it up. Aside from including a tessellation unit, they have also included an audio processor on their hardware. This will accept audio streams and send them out over their DVI port through a special converter to integrate audio with a video stream over HDMI. This is unique, as current HDMI converters only work with video. AMD also included a programmable AA resolve feature that allows their driver team to create new ways of filtering subsample data.
R600 also features an independent DMA engine that can handle moving and managing all memory to and from the GPU, whether it's over the PCIe bus or local memory channels. This combined with huge amounts of memory bandwidth should really assist applications that require large amounts of data. With DX10 supporting up to 8k x 8k textures, we are very interested in seeing these limits pushed in future games.
That's enough of a general description to whet your appetite: let's dig down under the surface and find out what makes this thing tick.