Round 1: Architecture

The technology in these products is about making games believe that more graphics memory is available than the cards physically carry on board. ATI and NVIDIA have taken different approaches to solving the problem.

NVIDIA's solution reaches down into the inner workings of the GPU. They haven't released details about what specifically has changed in their TurboCache parts, but they state that everything they've done is aimed at hiding the latency of system memory accesses in their pixel and ROP pipelines. This likely includes adding larger local caches and other measures to increase the number of pixels that can be in flight at any given time. A very important aspect of NVIDIA's architecture is that it is designed to operate on system memory as if it were local; the only thing that NVIDIA doesn't allow to reside in system RAM is the front buffer.

The ATI approach is distinctly more software based, though they do state that the memory controller on their GPU is what makes HyperMemory possible. The extent of these hardware changes is significantly smaller than in the NVIDIA solution. The ATI approach creates more of a virtualized memory system for the graphics card, allowing the driver to allocate system memory as needed and page data in and out of graphics RAM at will. The system memory is Windows-managed and so can be virtualized out to the hard disk if necessary (which could really kill performance). Of course, if enough RAM is in use that graphics data is being paged to disk, there are bigger issues at hand that are likely also causing performance problems.

We haven't talked about GART memory much since the decline of AGP, but the brief explanation is that GART memory is linearly addressable, non-paged system memory allocated to the graphics subsystem for external storage. On PCI Express based systems, it seems that the graphics driver manages GART memory entirely, rather than the system BIOS setting a default size. We haven't been able to get solid details on how this memory is managed from either ATI or NVIDIA.
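To illustrate the idea of linearly addressable GART memory, here is a minimal sketch (the `Gart` class, page numbers, and addresses are all invented for illustration) of how a remapping table can present scattered, pinned physical pages to the GPU as one contiguous linear aperture:

```python
# Sketch of a GART-style remapping table: scattered (non-contiguous)
# physical page numbers are presented to the GPU as one linear range.
PAGE_SIZE = 4096

class Gart:
    def __init__(self):
        self.table = []  # index = linear page number, value = physical page number

    def map_pages(self, physical_pages):
        """Pin a set of scattered physical pages into the linear aperture.
        Returns the linear base address the GPU will use."""
        base = len(self.table) * PAGE_SIZE
        self.table.extend(physical_pages)
        return base

    def translate(self, linear_addr):
        """Translate a GPU-visible linear address to a physical address."""
        page, offset = divmod(linear_addr, PAGE_SIZE)
        return self.table[page] * PAGE_SIZE + offset

gart = Gart()
base = gart.map_pages([911, 7, 4242])  # pages scattered across system RAM
# The second linear page (offset PAGE_SIZE) resolves into physical page 7.
print(gart.translate(base + PAGE_SIZE + 16))
```

Because the pages are pinned (non-paged), the GPU can issue bus-master reads against the linear range without worrying about the OS swapping anything out from under it.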

Stepping back further, ATI organizes graphics memory in stages. First, the driver determines which surfaces have the highest priority and loads those into local memory. When local memory gets crowded and something new comes along, ATI demotes lower-priority surfaces to GART memory. When GART memory gets too full, surfaces can be demoted further to pageable, Windows-managed system memory. This system memory is requested by the driver as necessary and freed when memory pressure decreases again.
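The staged demotion described above can be modeled as a set of priority-ordered tiers. This is a toy sketch, not ATI's actual driver logic; the capacities, surface names, and priorities are invented:

```python
# Toy model of staged surface demotion: surfaces live in local VRAM until it
# fills, then the lowest-priority surface cascades down to GART memory, and
# from GART down to pageable system memory.
import heapq

class TieredMemory:
    def __init__(self, local_cap, gart_cap):
        # Each tier holds (capacity, min-heap of (priority, surface name)).
        self.tiers = {"local": (local_cap, []),
                      "gart": (gart_cap, []),
                      "system": (float("inf"), [])}
        self.order = ["local", "gart", "system"]

    def allocate(self, name, priority, tier=0):
        cap, heap = self.tiers[self.order[tier]]
        if len(heap) < cap:
            heapq.heappush(heap, (priority, name))
        else:
            # Tier is full: whichever surface now has the lowest priority
            # (possibly the new one) is demoted one tier down.
            victim = heapq.heappushpop(heap, (priority, name))
            self.allocate(victim[1], victim[0], tier + 1)

    def location(self, name):
        for level in self.order:
            if any(n == name for _, n in self.tiers[level][1]):
                return level

mem = TieredMemory(local_cap=2, gart_cap=1)
mem.allocate("framebuffer", priority=10)
mem.allocate("hud_texture", priority=3)
mem.allocate("sky_texture", priority=1)   # local full: lowest priority demoted
mem.allocate("shadow_map", priority=5)    # cascades hud->GART, sky->system
```

After the last call, the framebuffer and shadow map sit in local VRAM, the HUD texture in GART memory, and the sky texture in pageable system memory, mirroring the priority ordering the driver established.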

Microsoft's next Windows OS will require graphics drivers to support fully virtualized, Windows-managed graphics memory. Along with VPU Recover (graphics hardware reset), HyperMemory may be a product of ATI's preliminary Longhorn work. To be sure, incorporating Windows-managed memory alongside driver-managed local RAM is a drop in the bucket compared to handing all local and system graphics memory management over to the OS.

Virtualized graphics memory is actually something that workstation users have been calling for for quite some time, so it's very interesting to see the technology arrive in a value product first. Hopefully, ATI will follow 3Dlabs' lead and bring their virtualization technology to the workstation space as well.

The major difference between TurboCache and HyperMemory is that the latter must first load a required surface into local memory before operating on it, possibly requiring the driver to kick something else out of local memory into system RAM. The separation of upstream and downstream bandwidth in PCI Express makes this relatively painless. TurboCache, on the other hand, sees all graphics memory as local and does not need to load a surface or texture into local RAM before operating on it: shaders are able to read and write directly over the PCI Express bus into system RAM. Under the NVIDIA solution, the driver carries the burden of keeping the most used and most important bits of data in local memory.
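The contrast between the two access models can be sketched as follows. This is hypothetical driver-side logic, not real ATI or NVIDIA code; the class, surface names, and counters are invented to make the staging cost visible:

```python
# HyperMemory-style "load before use": a surface must be copied into local
# VRAM (evicting a lower-priority resident on a full cache) before the GPU
# touches it. TurboCache-style "use in place" needs no staging copy at all.

class LocalVram:
    def __init__(self, capacity):
        self.capacity = capacity
        self.surfaces = {}   # surface name -> priority
        self.copies = 0      # staging copies performed over PCI Express

    def stage(self, name, priority):
        """Load-before-use: make the surface resident, evicting if needed."""
        if name not in self.surfaces:
            if len(self.surfaces) >= self.capacity:
                victim = min(self.surfaces, key=self.surfaces.get)
                del self.surfaces[victim]   # demoted back to system RAM
            self.surfaces[name] = priority
            self.copies += 1                # one PCIe copy per miss
        return name                         # now readable from local VRAM

def direct_access(name):
    """Use-in-place: shaders address system RAM directly over PCIe, paying
    bus latency per access but never a staging copy."""
    return name

vram = LocalVram(capacity=1)
vram.stage("skybox", priority=1)
vram.stage("character", priority=5)   # evicts skybox; second staging copy
direct_access("skybox")               # no residency requirement at all
```

The trade-off the article describes falls out of this shape: the load-before-use model pays copy bandwidth up front but then enjoys local-RAM latency, while the use-in-place model avoids copies but leans on the GPU's latency hiding for every remote access.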

The underlying architectures of these cards dictate the comparison points that we will choose. The ATI card needs more local RAM than the NVIDIA card because it isn't rearchitected to operate on the majority of its data across the high-latency system bus. More fast local RAM is good, but with more RAM comes more cost. The balance will be found in who can afford to charge the least: ATI with a pretty much stock R42x and more RAM, or NVIDIA with less RAM and a rearchitected GPU. Price is a huge factor in determining the better solution here, and performance often comes as an afterthought.

Happily, we embrace the move away from graphics API features as a distinguishing factor in graphics hardware decisions.


  • OrSin - Thursday, May 12, 2005 - link

Why test $60 video cards on systems with high-end chips and memory? No one tests Goodyear tires on a Ferrari. I want to see these tests run on a 2800+ CPU and budget memory. As it stands, this is a waste to me.

Also, why not test some onboard graphics against these, to see if it's even worth upgrading from the Intel, ATI, or NV integrated solutions.

  • stevty2889 - Thursday, May 12, 2005 - link

    What the heck? The system it was tested on was:
    Microsoft Windows XP SP2
    ASUS A8N-SLI Deluxe
    AMD Athlon FX-53
    1GB OCZ PC3200 @ 2:2:2:9
    Seagate 7200.7 HD
    OCZ Powerstream 600W PS

Nobody that buys these cards is going to be running a system like that... it should have been tested with a Sempron or a 2.8GHz P4 / A64 2800+ type setup instead.
  • DerekWilson - Thursday, May 12, 2005 - link

    Sorry about leaving off the machine specs -- I've updated the article.

    Actually, that was quite an oversight as the system these cards are run in is very important to note when looking at the numbers.

Marlin1975, InuYasha is correct -- the 32MB card outperformed the 64MB part in almost every test. It wasn't until we upped the resolution to unplayable degrees that the 64MB part was able to make up the difference.
  • bob661 - Thursday, May 12, 2005 - link

    Does anyone know what motherboard and how much ram was used in their test system?
  • CrystalBay - Thursday, May 12, 2005 - link

    #3 I was wondering the same thing myself, hmmm.
  • InuYasha - Thursday, May 12, 2005 - link

    #4 the 32MB and 64MB results are not backward.

If I remember correctly from HOCP, the 32MB card has faster memory, and that makes a bigger difference than the amount of memory.
  • Icehawk - Thursday, May 12, 2005 - link

    I don't see the machine specs anywhere either? I'm curious if these were tested on the "standard" uber-machine or tested on what kind of PC someone buying these would actually have. Somehow, by the #s generated I think this was on the uber-machine. While interesting to see ultimate performance I think end-users would also be served by showing more realistic performance #s.

    A couple of minor typos ;)
  • Marlin1975 - Thursday, May 12, 2005 - link

OK, I think it is a typo, not the graphs, that's wrong on page 7...

    "Unreal Tournament 2004 shows our 32MB HyperMemory performing on par with the 64MB TurboCache part in the middle of the pack. The 16 and 32 MB TC cards round out the bottom and top of the pack respectively."
  • gimpsoft - Thursday, May 12, 2005 - link

It'd be a lot better if it worked the other way around: taking video card memory and using it for Windows. Then I'd really pay for it, lol. Letting the video card borrow system memory is a bad idea for the future.

Anyways, that's just me.
  • Marlin1975 - Thursday, May 12, 2005 - link

    Do you have the 32mb and 64mb cards backwards?
