Round 1: Architecture

The technology in these products revolves around making games believe that the card has more graphics memory than it physically carries on board. ATI and NVIDIA have taken different approaches to solving the problem.

NVIDIA's solution reaches down into the inner workings of the GPU. The company hasn't released specifics on what has changed in its TurboCache parts, but it states that everything it has done is aimed at hiding the latency of system memory accesses in the pixel and ROP pipelines. This likely includes larger local caches and other measures to increase the number of pixels that can be in flight at any given time. A very important aspect of NVIDIA's architecture is that it is designed to operate on system memory as if it were local - the only thing that NVIDIA doesn't allow to reside directly in system RAM is the front buffer.
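
To get a feel for why deeper buffering matters, here is a quick Little's law estimate. The clock speed, pixel rate, and latency figures below are illustrative assumptions for the sketch, not published TurboCache specifications.

```python
# Rough Little's law estimate: to keep the pipelines busy while waiting on
# memory, the GPU must track roughly (latency x throughput) pixels in flight.
# All figures are illustrative assumptions, not NVIDIA's numbers.

pixel_rate = 4 * 300e6          # assumed: 4 pixels/clock at 300 MHz (pixels/sec)
local_latency_s = 50e-9         # assumed on-board memory latency (~50 ns)
system_latency_s = 500e-9       # assumed system RAM latency over PCI Express (~500 ns)

pixels_in_flight_local = pixel_rate * local_latency_s
pixels_in_flight_system = pixel_rate * system_latency_s

print(f"Pixels in flight needed (local RAM):  {pixels_in_flight_local:.0f}")   # 60
print(f"Pixels in flight needed (system RAM): {pixels_in_flight_system:.0f}")  # 600
# With ~10x the latency, the GPU needs ~10x as many pixels in flight --
# hence the larger internal buffering NVIDIA alludes to.
```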

The ATI approach is distinctly more software based, though ATI does state that the memory controller on its GPU is what makes HyperMemory possible; the extent of the hardware changes is significantly less than in the NVIDIA solution. HyperMemory creates more of a virtualized memory system for the graphics card, allowing the driver to allocate system memory as needed and page data in and out of graphics RAM at will. This system memory is Windows-managed and so can be virtualized out to the hard disk if necessary (which could really kill performance). Of course, if enough RAM is in use that graphics data gets paged to disk, there are larger issues at hand that are likely also causing performance problems.

We haven't talked about GART memory much since the decline of AGP, but the brief explanation is that GART memory is linearly addressable, non-paged system memory allocated to the graphics subsystem for external storage. On PCI Express based systems, it appears that the graphics driver manages GART memory entirely rather than letting the system BIOS set a default size. We haven't been able to get solid details on how this memory is managed from either ATI or NVIDIA.
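
As a point of reference, the sketch below shows the basic idea behind a graphics address remapping table: scattered, pinned physical pages of system memory are presented to the GPU as one linear range. The page addresses and 4 KB page size are illustrative assumptions, not either vendor's implementation.

```python
# Minimal sketch of a GART (Graphics Address Remapping Table): the GPU sees a
# single linear aperture, while the table redirects each 4 KB page to a
# scattered, pinned (non-pageable) physical page. Addresses are invented.

PAGE_SIZE = 4096

# Table index -> physical base address of a pinned system memory page
gart_table = {
    0: 0x1A3000,
    1: 0x0F7000,
    2: 0x2C1000,
}

def gart_translate(linear_offset: int) -> int:
    """Translate a linear graphics-aperture offset into a physical address."""
    page_index = linear_offset // PAGE_SIZE
    page_offset = linear_offset % PAGE_SIZE
    return gart_table[page_index] + page_offset

# The GPU addresses offset 0x1008 linearly; the GART redirects it into page 1.
print(hex(gart_translate(0x1008)))   # 0xf7008
```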

To take a step back, ATI organizes graphics memory in stages. First, the driver determines which surfaces have the highest priorities and loads those into local memory. When local memory becomes crowded and something new comes along, the driver demotes lower priority surfaces to GART memory. When GART memory fills up as well, surfaces can be demoted further to pageable, Windows-managed system memory. This system memory is requested by the driver as needed and freed when memory pressure eases.
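
The sketch below walks through this staged placement with a toy priority scheme; the tier capacities, the Surface type, and the eviction order are invented for illustration and are not ATI's actual driver logic.

```python
# Sketch of staged surface placement: surfaces live in local VRAM while it
# fits, get demoted to GART memory under pressure, and finally to pageable,
# Windows-managed system memory. All capacities and surfaces are invented.

from dataclasses import dataclass

TIERS = ["local", "gart", "pageable"]                 # fastest -> slowest
CAPACITY_MB = {"local": 32, "gart": 64, "pageable": 256}

@dataclass
class Surface:
    name: str
    size_mb: int
    priority: int        # higher = more important, keep closer to the GPU
    tier: str = "local"

def rebalance(surfaces):
    """Place surfaces tier by tier, highest priority first, demoting the rest."""
    ordered = sorted(surfaces, key=lambda s: s.priority, reverse=True)
    used = {t: 0 for t in TIERS}
    for surf in ordered:
        for tier in TIERS:
            if used[tier] + surf.size_mb <= CAPACITY_MB[tier]:
                surf.tier = tier
                used[tier] += surf.size_mb
                break
    return surfaces

surfaces = rebalance([
    Surface("framebuffer", 16, priority=10),
    Surface("texture_A", 24, priority=5),
    Surface("texture_B", 24, priority=3),
    Surface("vertex_cache", 8, priority=7),
])
for s in surfaces:
    print(f"{s.name:13s} -> {s.tier}")
# framebuffer and vertex_cache stay local; both textures are demoted to GART.
```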

Microsoft's next Windows OS will require graphics drivers to support fully virtualized, Windows-managed graphics memory. Along with VPU Recover (a graphics hardware reset feature), HyperMemory may be a product of ATI's preliminary Longhorn work. To be sure, incorporating Windows-managed memory alongside driver-managed local RAM is a drop in the bucket compared to handing over all local and system graphics memory management to the OS.

Virtualized graphics memory is actually something that workstation users have been requesting for quite some time, so it's very interesting to see the technology arrive in a value product first. Hopefully, ATI will follow 3Dlabs' lead and bring its virtualization technology to the workstation space as well.

The major difference between TurboCache and HyperMemory is that the latter must first load a required surface into local memory before operating on it - possibly requiring the driver to kick something else out of local memory into system RAM. The separation of upstream and downstream bandwidth in PCI Express makes this relatively painless. TurboCache, on the other hand, sees all graphics memory as local and does not need to pull a surface or texture into local RAM before operating on it: shaders are able to read and write directly over the PCI Express bus into system RAM. Under the NVIDIA solution, the driver carries the burden of keeping the most used and most important data in local memory.
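
To make the contrast concrete, here is a rough sketch of the two access patterns. The MemoryPool class, the eviction policy, and the function names are invented for the illustration and do not represent either vendor's driver code.

```python
# Illustrative contrast: HyperMemory-style paging into local RAM before use,
# versus TurboCache-style in-place access to system RAM. All names invented.

class MemoryPool:
    def __init__(self, name, capacity_mb):
        self.name = name
        self.capacity_mb = capacity_mb
        self.surfaces = {}                       # surface name -> size in MB

    def used(self):
        return sum(self.surfaces.values())

    def fits(self, size_mb):
        return self.used() + size_mb <= self.capacity_mb

def hypermemory_touch(name, size_mb, local, system):
    """ATI-style: the surface must be resident in local RAM before the GPU uses it."""
    if name in local.surfaces:
        return "read from local RAM"
    while not local.fits(size_mb):
        # Kick an (arbitrarily chosen) resident surface out to system RAM.
        victim, victim_size = next(iter(local.surfaces.items()))
        del local.surfaces[victim]
        system.surfaces[victim] = victim_size
    local.surfaces[name] = size_mb
    system.surfaces.pop(name, None)
    return "paged in over PCI Express, then read from local RAM"

def turbocache_touch(name, size_mb, local, system):
    """NVIDIA-style: shaders read/write system RAM in place; no paging step."""
    if name in local.surfaces:
        return "read from local RAM"
    system.surfaces.setdefault(name, size_mb)
    return "read directly from system RAM over PCI Express"

local, system = MemoryPool("local", 32), MemoryPool("system", 128)
print(hypermemory_touch("texture_A", 24, local, system))
print(turbocache_touch("texture_B", 24, local, system))
```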

The underlying architectures of these cards dictate the comparison points that we will choose. The ATI card needs more local RAM than the NVIDIA card because it hasn't been rearchitected to operate on the majority of its data across the high latency of the system bus. More fast local RAM is good, but with more RAM comes more cost. The balance will be found in who can afford to charge the least: ATI with an essentially stock R42x and more RAM, or NVIDIA with less RAM and a rearchitected GPU. Price is a huge factor in determining the better solution here, and performance often comes as an afterthought.

Happily, we welcome the move away from graphics API features as a distinguishing factor in graphics hardware decisions.

Comments

  • JarredWalton - Thursday, May 12, 2005 - link

    Despite the fact that the system was high-end, the performance of the cards is the limiting factor. A 3.0 GHz or 3000+ CPU with the same amount of RAM would likely post similar scores. However, 512MB of RAM would have something of an impact on performance. Anyway, $100 gets you 1GB of RAM these days, so we're more or less through with testing anything with 512MB of RAM or less.

    Consider these tests as a "best case" scenario for budget graphics performance. It will work, but it won't be too impressive. Dropping detail levels will also help, of course. However, reduced detail levels don't really help out performance in some games as much as you might want. Doom 3, for instance, doesn't get a whole lot faster going from High to Medium to Low detail (at least in my experience).
  • patrick0 - Thursday, May 12, 2005 - link

    A budget-card comparison with 1024MB of system memory? I think it would have been better to use 512MB of system memory.
  • DerekWilson - Thursday, May 12, 2005 - link

    Wellsoul2,

    The ATI solution does not have DVI -- just one HD15 analog port.

    These cards are fine for running native resolution desktops, but you are not going to want to push them over 1024x768 even without AA/AF ... as you can see the numbers dropped off at that resolution for most tests and 800x600 is really the sweet spot for gaming with these cards.

    If you really want to guess about the framerate, just remember that moving from 1024x768 to 1280x1024 increases the number of pixels per frame by a greater amount and percentage than moving from 800x600 to 1024x768 does (800x600 is 480,000 pixels, 1024x768 is 786,432 - a jump of roughly 306,000, or 64% - while 1280x1024 is 1,310,720, a jump of roughly 524,000, or 67%). Even though we can't effectively extrapolate performance from our (fairly linear) graph, we can bet that we'd be getting < 30fps, which is considered unplayable by our standards. As HL2 posts relatively high framerates (compared to other games of this generation), we made the call to limit our tests to 1024x768 and lower.

    If LCD users want to game on a 1280x1024 panel with these cards, they'll need to do it at a non-native resolution.
  • Wellsoul2 - Thursday, May 12, 2005 - link

    Again, anyone with an LCD monitor would prefer native resolution numbers. No amount of AA/AF seems to make up for the crappy interpolated video at lower resolutions. At least give the numbers for Doom3/HL2.
  • Wellsoul2 - Thursday, May 12, 2005 - link

    I'd like video card reviews at 1280x1024 resolution. This is the native resolution of my 19in LCD.

    It would seem more useful to have tests for the low budget cards at high resolution for LCD, using no AA/AF.

    People I know are buying the new cheap cards because they have DVI output to use with LCD. If you buy a cheap/mid-priced computer, it often has built-in analog video only.

    I'd like to know if you can run Half Life 2 at 1280x1024 with no AA/AF on these cards, and what the frame rate is.
  • cryptonomicon - Thursday, May 12, 2005 - link

    WTF?

    what is this thing? these puny performance cards with only 32mb memory can beat previous generations of retail mid-range cards??? how the heck...
  • KristopherKubicki - Thursday, May 12, 2005 - link

    stevty2889 and others:

    The tests are run on the highest end components to assure the bottlenecks are the video card and not the CPU. Furthermore, since we test every single other video card and motherboard on the same setup, it makes sense for us to use the same hardware this time around as well.

    Neither the X300 nor the 6200 will receive a magical advantage by using low end hardware instead of high end.

    Kristopher
  • Hikari - Thursday, May 12, 2005 - link

    The x300 is more like a neutered 9600 instead of a 9200 I thought. Given that the one tested here won in HL2, that would seem to be the case, no?
  • Marlin1975 - Thursday, May 12, 2005 - link

    ^

    Ignore, just re-read :)
  • Marlin1975 - Thursday, May 12, 2005 - link

    DerekWilson, see my other post at #5. I then saw that the graphs are not wrong, BUT you said "The 16 and 32 MB TC cards round out the bottom and top of the pack respectively," which is not true. The 64MB and 16MB round out the bottom, while the 32MB is at the top.
