ATI Radeon Xpress 200: Performance, PCI Express & DX9 for Athlon 64
by Wesley Fink on November 8, 2004 6:00 AM EST
SidePort: On-Board GPU Memory
Just before the launch of the Athlon 64, we found that many chipset manufacturers were a bit worried about the performance of integrated graphics solutions and AMD's new CPU. The worries stemmed from the fact that in previous CPU/chipset architectures, the integrated graphics cores resided on the North Bridge and shared access to the system memory controller - also located on the North Bridge. With the Athlon 64, however, the memory controller resides in the CPU, increasing memory access latencies from the perspective of the integrated graphics core.
The Radeon Xpress 200 supports a local frame buffer attached to what ATI refers to as their "SidePort". The SidePort is a 32-bit DDR memory interface that the integrated graphics can use either instead of or alongside the Athlon 64's memory controller.
We initially assumed that SidePort was included to hide some of the latency of going through the Athlon 64's memory controller, but that turned out not to be the case: performance in UMA mode (using the Athlon 64's memory controller) was quite respectable. It turns out that most games don't benefit much from the lower latency memory accesses that SidePort provides. So, why would ATI include support for a local frame buffer with the Radeon Xpress 200? Although performance does improve with SidePort enabled, the biggest reason for supporting the feature is to reduce power consumption in mobile environments. Without SidePort enabled, the CPU needs to be awake to fetch data for refreshing the display; with SidePort enabled, all memory accesses can occur via the Radeon Xpress 200 and the CPU can remain asleep in power saving modes.
Because of the added cost of supporting SidePort, it isn't a requirement - the Radeon Xpress 200 has four memory operating modes:
- SidePort only - In this mode, the integrated graphics core treats the SidePort memory as its local memory. If more memory is needed, it is allocated dynamically through system memory by the driver, which is significantly higher latency than the local SidePort memory.
- UMA only - In UMA mode, the only memory to which the integrated graphics has access is a dynamically allocated partition of system memory. The size of the partition is selectable from within the BIOS (ATI's reference board allows sizes from 16MB to 128MB). If more memory is needed, it is allocated dynamically through system memory by the driver.
- UMA + SidePort (Interleaving Disabled) - In this mode, the total amount of "local" graphics memory is the size of the UMA partition plus the amount of memory connected to the Radeon Xpress' SidePort. The integrated graphics core will first use SidePort memory until it runs out, and then use system memory. If more memory is needed, it is allocated dynamically through system memory by the driver.
- UMA + SidePort (Interleaving Enabled) - By enabling Interleaving and setting the UMA frame buffer size to the same size as the memory connected to the Radeon Xpress' SidePort, a special Interleaving mode is enabled. In this mode, the integrated graphics core will request data from both the UMA space and SidePort memory. The benefit of Interleaving is that two reads or writes can now occur at the same time, whereas with SidePort alone only a single 32-bit read/write can happen at any given time. Although UMA accesses are higher latency, the dual-ported nature of this setup improves overall performance. There are still situations where a SidePort-only configuration will offer greater performance, namely when the application depends on lower latency memory accesses. If more memory is needed, it is dynamically allocated through system memory by the driver.
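To make the four modes easier to compare, here is a minimal Python sketch of how much "local" graphics memory each one exposes. The enum, the function name and its parameters are our own illustration, not anything from ATI's driver or BIOS. The total for the Interleaving mode is assumed to be the sum of both pools, which the description above implies but does not state outright.

```python
from enum import Enum


class MemoryMode(Enum):
    """The four operating modes listed above (identifier names are illustrative)."""
    SIDEPORT_ONLY = "SidePort only"
    UMA_ONLY = "UMA only"
    UMA_PLUS_SIDEPORT = "UMA + SidePort (Interleaving Disabled)"
    UMA_PLUS_SIDEPORT_INTERLEAVED = "UMA + SidePort (Interleaving Enabled)"


def local_graphics_memory_mb(mode, sideport_mb, uma_mb):
    """Return the size in MB of the memory the graphics core treats as local.

    Anything needed beyond this amount is allocated dynamically from system
    memory by the driver in every mode, so it is not counted here.
    """
    if mode is MemoryMode.SIDEPORT_ONLY:
        return sideport_mb
    if mode is MemoryMode.UMA_ONLY:
        return uma_mb
    if mode is MemoryMode.UMA_PLUS_SIDEPORT_INTERLEAVED and uma_mb != sideport_mb:
        # Interleaving requires the UMA frame buffer to match the SidePort size.
        raise ValueError("Interleaving needs equal UMA and SidePort sizes")
    # Assumption: the interleaved total is the sum of both pools, as in the
    # non-interleaved UMA + SidePort mode.
    return sideport_mb + uma_mb


# Reference board values: 16MB of SidePort memory, smallest UMA setting (16MB).
for mode in MemoryMode:
    print(f"{mode.value}: {local_graphics_memory_mb(mode, 16, 16)}MB")
```

With a larger UMA partition, say 64MB, the non-interleaved combined mode would expose 80MB, while the Interleaving mode would be rejected because the two sizes no longer match.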
ATI's reference board features 16MB of DDR memory attached to the Radeon Xpress' SidePort. The memory can either run synchronously with the system memory clock (200MHz for DDR400) or asynchronously, where the speed is bound by the type of memory used. In our case, the 2.5ns Samsung DDR located on the board was capable of running at the maximum frequency the BIOS allowed - 350MHz.
As we mentioned before, the SidePort memory interface is a single 32-bit channel, which at 350MHz provides 1.4GB/s of bandwidth to the integrated graphics core. At 200MHz SidePort can only provide 800MB/s of bandwidth, so the additional latency incurred by running the SidePort asynchronously with main memory is well worth the additional bandwidth.
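The bandwidth figures above can be reproduced with simple arithmetic. The Python sketch below is only an illustration: the function name and the `transfers_per_clock` parameter are our own. The quoted numbers match a calculation of bus width times clock with one transfer per clock. Counting both edges of a DDR clock (two transfers per clock) would double each figure, a distinction the text does not spell out.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth of a memory interface, in bytes per second.

    transfers_per_clock=1 reproduces the figures quoted in the text;
    transfers_per_clock=2 models a double data rate interface.
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * 1_000_000 * transfers_per_clock


# SidePort is a single 32-bit channel; compare synchronous (200MHz) operation
# with the 350MHz maximum the reference board's BIOS allowed.
for clock_mhz in (200, 350):
    gb_per_s = peak_bandwidth_bytes_per_s(32, clock_mhz) / 1e9
    print(f"{clock_mhz}MHz: {gb_per_s:.1f}GB/s")
```

Whichever convention is used, the ratio between the two clocks is the same: running at 350MHz offers 75% more peak bandwidth than running at 200MHz.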
What's truly interesting is how well SidePort-only mode performs. Granted, you are limited to low resolutions, but as you will soon see, the integrated graphics core isn't really designed to run at very high resolutions. In fact, running in SidePort-only mode is faster than running in UMA-only mode with a single-channel Socket-754 Athlon 64.
The charts below do a good job of showing off the performance advantages to the various operating modes of the Radeon Xpress 200.
The first thing we see is that there's a huge performance advantage to the dual channel memory controller of the Socket-939 Athlon 64: 33% in Doom 3 and 27% in UT2004. This is far from unexpected, given that the more system memory bandwidth you have, the more graphics memory bandwidth you have.
The performance advantage of the SidePort + UMA configuration isn't insignificant either: 8.5% in Doom 3 and 7.7% in UT2004. However, given the added cost, we would say that SidePort isn't absolutely necessary for desktops (but we understand its usefulness in notebooks).
We compared the graphics performance of the Radeon Xpress 200 to ATI's lowest end discrete PCI Express graphics card: the Radeon X300 SE. The X300 SE is a four-pipe version of the Radeon Xpress 200 but with only a 64-bit DDR memory interface, so the Radeon Xpress 200 actually holds a memory bandwidth advantage over the X300 SE even though it is at a fill rate deficit.
We also compared the Radeon Xpress 200 to Intel's Graphics Media Accelerator 900. While the GMA 900 is obviously only available on the Intel-only 915G and the Radeon Xpress 200 is an AMD-only solution, the two offerings are slow enough that most games end up being completely GPU limited and thus the CPU differences become negligible.
You'll notice that not all of the benchmarks have scores for Intel's GMA 900; those that don't have GMA 900 scores are ones where the GMA 900 was not able to either run the game or complete the benchmark without crashing.