ATI Radeon HD 2900 XT: Calling a Spade a Spade
by Derek Wilson on May 14, 2007 12:04 PM EST
Memory and Data Movement
Internal cache bandwidth on the R600 is 180GB/sec, while the internal memory bus, a second generation Ring Bus that builds on the X1k series idea, is able to deliver 100GB/sec of throughput in either read or write capacity. Memory offers nearly 110GB/sec, and AMD has stated that the internal bus is well matched to this because some external bandwidth is wasted on overhead. The bottom line here is that a whole lot of data can move very quickly into and out of this hardware.
As we mentioned, R600 sees a reincarnation of the Ring Bus, which can now handle both read and write data (X1k could only handle reads on the Ring Bus, while writes were run through a crossbar). An independent DMA controller manages a bus made up of multiple ring stops. There is one ring stop per pair of memory channels, and each ring stop is connected to two others via a 256-bit wide connection. The ring bus is 1024 wires total and can move read and write data in either direction, following the shortest path around the ring to or from the memory client or memory.
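To make the "shortest path around the ring" idea concrete, here is a minimal sketch of how a bidirectional ring might pick a direction. It illustrates the general technique only and is not AMD's actual arbitration logic: the function name `ring_route`, the integer numbering of ring stops, the tie-breaking rule, and the stop count used in the example are all assumptions made for illustration.

```python
def ring_route(src: int, dst: int, num_stops: int) -> tuple[str, int]:
    """Pick the shorter direction around a bidirectional ring.

    Stops are numbered 0..num_stops-1 (a hypothetical numbering scheme).
    Returns the chosen direction and the number of hops it takes.
    """
    if not (0 <= src < num_stops and 0 <= dst < num_stops):
        raise ValueError("stop index out of range")

    # Hops needed when travelling in each direction.
    clockwise = (dst - src) % num_stops
    counterclockwise = (src - dst) % num_stops

    # Ties go clockwise here; real hardware could break ties differently.
    if clockwise <= counterclockwise:
        return ("clockwise", clockwise)
    return ("counterclockwise", counterclockwise)


# Example with an assumed four-stop ring (the stop count is illustrative).
if __name__ == "__main__":
    for dst in range(4):
        print(0, "->", dst, ring_route(0, dst, 4))
```

On a ring that can only move data one way, the worst case is nearly a full lap. Allowing traffic in both directions caps the trip at half the ring, which is the benefit the shortest-path routing described above is after.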
The Ring Bus allows the PCI Express bus to be treated like just another memory device by the rest of the hardware. The DMA hardware is able to manage all the traffic to and from onboard and system memory in the same manner, and the memory clients on the GPU don't need to know what device they're talking to. The Ring Bus services 84 read clients and 70 write clients.
The external memory interface is 512-bit, doubling the X1k maximum of 256-bit and surpassing G80's 384-bit memory bus. Memory speeds are lower than on previous generation high end AMD hardware, but total bandwidth is higher. The net result is that AMD only slightly edges out G80 for memory bandwidth.
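The trade-off between bus width and memory speed comes down to simple arithmetic: peak bandwidth is the bus width in bytes multiplied by the number of transfers per second. The sketch below works through that formula. The bus widths come from the text above, but the effective data rates are assumptions based on commonly quoted figures for these cards, not numbers from this page, and the helper name `peak_bandwidth_gb_s` is ours.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak memory bandwidth in GB/sec (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


if __name__ == "__main__":
    # Assumed effective data rates (transfers per second), for illustration only.
    r600_rate = 1.65e9   # assumption: roughly 825MHz memory, double data rate
    g80_rate = 1.80e9    # assumption: roughly 900MHz memory, double data rate

    print("R600, 512-bit:", peak_bandwidth_gb_s(512, r600_rate), "GB/sec")
    print("G80,  384-bit:", peak_bandwidth_gb_s(384, g80_rate), "GB/sec")
```

The point of the exercise is that a wider bus can more than make up for a lower memory clock, which is how total bandwidth goes up even as memory speeds come down.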
In implementing the 512-bit memory interface, AMD didn't want to add any more I/O pads to its package. They accomplished this by making use of a stacked I/O pad design. Unfortunately, details were vague on the implementation and methods used to keep clock speed high in spite of the proximity of other high frequency I/O.