Budget Battle: HyperMemory vs. TurboCache
by Derek Wilson on May 12, 2005 9:00 AM EST
Round 1: Architecture
The technology in these products is all about making games think that they have more graphics memory than the cards physically have on board. ATI and NVIDIA have taken different approaches to solving the problem.
NVIDIA has a solution that goes way down to the inner workings of the GPU. They haven't released specifics on what has been changed in their TurboCache parts, but they state that everything they've done has been to hide the latency of system memory accesses in their pixel and ROP pipelines. Likely, this includes adding larger local caches and doing other things to increase the number of pixels that can be in flight at any given time. A very important factor of NVIDIA's architecture is that it is designed to operate on system memory as if it were local - the only thing that NVIDIA doesn't allow to operate directly in system RAM is the front buffer.
The ATI approach is distinctly more software based, though they do state that the memory controller on their GPU is what makes HyperMemory possible. The extent of these changes is significantly less than in the NVIDIA solution. The ATI approach creates more of a virtualized memory system for the graphics card, allowing the driver to allocate system memory as needed and page data in and out of graphics RAM at will. The system memory is Windows-managed, and so can be paged out to the hard disk if necessary (which could really kill performance). Of course, if the system is short enough on RAM that graphics data is being paged to disk, there are likely other issues at hand that are also causing performance problems.
We haven't talked about GART memory very much since the decline of AGP, but the brief explanation is that GART memory is linearly addressable non-paged memory allocated to the graphics subsystem for external storage. With PCI Express based systems, it seems that the graphics driver manages GART memory completely rather than allowing the system BIOS to set a default size. We haven't been able to get solid details on how this memory is managed from either ATI or NVIDIA.
To take a further step back, the organization of ATI's graphics memory is set up in stages. First, the driver determines which surfaces have the highest priority and loads those into local memory. Whenever anything new comes along after local memory gets crowded, ATI demotes lower priority surfaces to GART memory. When GART memory gets too full, surfaces can be demoted further to pageable, Windows-managed system memory. This system memory is requested by the driver as necessary and freed when memory pressure decreases again.
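The staged demotion described above can be illustrated with a small sketch. This is not ATI's driver logic, which has not been published; it is only a toy model of the idea. The class and field names, the tier capacities, and the "demote the lowest priority surface first" rule are all assumptions made for illustration, and the sketch leaves out promotion back to a faster tier when memory pressure drops.

```python
from dataclasses import dataclass, field

# Tier order follows the article: local RAM, then GART, then pageable system RAM.
# All names and sizes below are hypothetical.
TIERS = ["local", "gart", "system"]


@dataclass
class Surface:
    name: str
    size_mb: int
    priority: int  # higher value means more important (assumed convention)


@dataclass
class TieredMemory:
    capacity_mb: dict  # capacities for "local" and "gart"; "system" is unbounded here
    tiers: dict = field(default_factory=lambda: {t: [] for t in TIERS})

    def used(self, tier):
        return sum(s.size_mb for s in self.tiers[tier])

    def place(self, surface, level=0):
        """Put a surface in a tier, demoting lower priority surfaces if it overflows."""
        tier = TIERS[level]
        self.tiers[tier].append(surface)
        # The last tier (pageable system memory) is treated as having no limit.
        while level < len(TIERS) - 1 and self.used(tier) > self.capacity_mb[tier]:
            victim = min(self.tiers[tier], key=lambda s: s.priority)
            self.tiers[tier].remove(victim)
            self.place(victim, level + 1)  # demote one stage down

    def where(self, name):
        for tier in TIERS:
            if any(s.name == name for s in self.tiers[tier]):
                return tier
        return None


# Example: 32MB of local RAM and 16MB of GART, four 16MB surfaces.
mem = TieredMemory({"local": 32, "gart": 16})
mem.place(Surface("framebuffer", 16, priority=10))
mem.place(Surface("texA", 16, priority=5))
mem.place(Surface("texB", 16, priority=7))
mem.place(Surface("texC", 16, priority=8))
```

In this example, the two highest priority surfaces stay in local memory, the next one lands in GART, and the lowest priority surface ends up in pageable system memory.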
Microsoft's next Windows OS will require graphics drivers to support fully virtualized and Windows-managed graphics memory. Along with their VPU Recover (graphics hardware reset), HyperMemory may be a product of ATI's preliminary Longhorn work. To be sure, including the ability to incorporate Windows-managed memory with driver-managed local RAM is a drop in the bucket compared to handing over all local and system graphics memory management to the OS.
The inclusion of virtualized graphics memory is actually something that workstation users have been requesting for quite some time. It's very interesting to see the technology end up in a value product first. Hopefully, ATI will follow 3Dlabs' lead and bring their virtualization technology to the workstation space as well.
The major difference between TurboCache and HyperMemory is that the latter must first load a required surface into local memory before operating on it - possibly requiring the driver to kick something else out of local memory into system RAM. The separation of upstream and downstream bandwidth in PCI Express makes this relatively painless. TurboCache, on the other hand, sees all graphics memory as local and does not need to load a surface or texture to local RAM before operating on it. Shaders are able to read and write directly over the PCI Express bus into system RAM. Under the NVIDIA solution, the driver carries the burden of keeping the most used and most important bits of data in local memory.
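To make the contrast concrete, here is a rough sketch of the two access models as described above. Neither company has published its implementation, so this should be read as a toy model only: the class names, the slot-based capacity, and the "evict the oldest surface" rule are assumptions, and the counters track bus events rather than real performance.

```python
class CopyInModel:
    """HyperMemory-like: a surface must be resident in local RAM before use."""

    def __init__(self, local_slots):
        self.local_slots = local_slots  # hypothetical capacity, in whole surfaces
        self.local = []                 # resident surface names, oldest first
        self.uploads = 0                # copies from system RAM into local RAM
        self.evictions = 0              # copies from local RAM out to system RAM

    def access(self, surface):
        if surface not in self.local:
            if len(self.local) >= self.local_slots:
                self.local.pop(0)       # evict the oldest (assumed policy)
                self.evictions += 1
            self.local.append(surface)
            self.uploads += 1
        return "local"                  # work always happens in local RAM


class DirectAccessModel:
    """TurboCache-like: operate on a surface wherever it currently lives."""

    def __init__(self, local_surfaces):
        self.local = set(local_surfaces)  # placement chosen up front by the driver
        self.remote_accesses = 0          # operations that cross the bus

    def access(self, surface):
        if surface in self.local:
            return "local"
        self.remote_accesses += 1
        return "system"


# The same access pattern run through both models.
pattern = ["a", "b", "c", "a"]

copy_in = CopyInModel(local_slots=2)
copy_in_results = [copy_in.access(s) for s in pattern]

direct = DirectAccessModel(local_surfaces=["a", "b"])
direct_results = [direct.access(s) for s in pattern]
```

With this pattern, the copy-in model performs four uploads and two evictions but always works out of local RAM, while the direct model moves nothing and instead performs one operation across the bus. Which of those costs less in practice depends on how well each design hides latency, and that is exactly what the sketch cannot show.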
The underlying architectures of these cards dictate the comparison points that we will choose. The ATI card needs more local RAM than the NVIDIA card because it isn't rearchitected to support operating on the majority of its data across the high latency of a system bus. More fast local RAM is good, but with more RAM comes more cost. The balance will be found in who can afford to charge the least - ATI with a pretty much stock R42x and more RAM, or NVIDIA with less RAM and a rearchitected GPU. Price is a huge factor in determining the better solution here, and performance often comes as an afterthought.
Happily, we embrace the new move to eliminate graphics API features as a distinguishing factor in graphics hardware decisions.