Let's Talk Performance

This section is likely to generate a lot of flames if left unchecked. First, though, we want to make it abundantly clear that raw, theoretical performance numbers (which is what is listed here) rarely manage to match real world performance figures. There are numerous reasons for this discrepancy, for example the game or application in use may stress different parts of the architecture. A game that pushes a lot of polygons with low resolution textures is going to stress the geometry engine, while a game that uses high resolution textures with lower polygon counts is more likely to stress the memory bandwidth. Pixel and Vertex Shaders are even more difficult to judge, as both ATI and NVIDIA are relatively tight-lipped about the internal layout of their pipelines. These functions are the most like an actual CPU, but they're also highly proprietary and the companies feel a need to protect their technology (probably with good cause). So while we know that AMD Athlon 64 chips have a 12 stage Integer/ALU pipeline and 17 stage FPU/SSE pipeline, we really have no idea how many stages are in the pixel and vertex pipelines of ATI and NVIDIA cards. In fact, we really don't have much more than a simplistic functional overview.

So why even bother talking about performance without benchmarks? In part, by looking at the theoretical performance and comparing it to the real world performance (you'll have to find such real world figures in another article), we can get a better idea of what went wrong and what worked well. More importantly, though, most people referring to a GPU Guide are going to expect some sort of comparison and ranking of the parts. It is by no means definitive, and for some people choosing a graphics card is akin to joining a religion. So, take these numbers with a grain of salt and know that they are not intentionally meant to make one card look better than another. Where performance seriously fails to match expectations, it will be noted.

There are numerous factors that can affect performance, other than the application itself. Drivers are a major one, and it is not unheard of for the performance of a particular card to increase by as much as 50% over its lifetime due to driver enhancements. In light of such examples (i.e. both Radeon and GeForce cards in Quake 3 performance increased dramatically over time), it is somewhat difficult to say that theoretical performance numbers are really that much worse than changing real world numbers. With proper optimization, real world numbers can usually approach theoretical numbers, but this really only occurs for the most popular applications. Features also play a part, all other things being equal, so if two cards have the same theoretical performance but one card is DX9 based and the other is DX8 based, the DX9 card is should be faster.

Speaking of drivers, we would be remiss if we didn't at least mention OpenGL support. Brought into the consumer segment with GLQuake back in 1997, OpenGL is a different platform and requires different drivers. NVIDIA and ATI both have full OpenGL drivers, but all evidence indicates that NVIDIA's drivers are simply better at this point in time. Doom 3 is the latest example of this. However, OpenGL is also used in the professional world, and again NVIDIA tends to lead in performance, even with inferior hardware. Part of the problem is that very few games other than id Software titles and their licensees use OpenGL, so it often takes a back seat to DirectX. However, ATI has vowed to improve their OpenGL performance since the release of Doom 3, and hopefully they can close the gap between their DirectX and OpenGL drivers.

So, how is overall performance determined - in other words, how will the tables be sorted? The three main factors are fill rate, memory bandwidth, and processing power. Fill rate and bandwidth have been used for a long time, and they are well understood. Processing power, on the other hand, is somewhat more difficult to determine, especially with DX8 and later Pixel and Vertex Shaders. We will use the vertices/second rating as am estimate of processing power. For the charts, each section will be normalized relative to the theoretically fastest member of the group, and equal weight will be given to the fill rate, bandwidth, and vertex rate. That's not the best way of measuring performance, of course, but it's a start, and everything is theoretical at this point anyway. If you really want a suggestion on a specific card, the forums and past articles are a better place to search. Another option is to decide which games (or applications) you are most concerned about, and then go find an article that has benchmarks with that particular title.

To reiterate, this is more of a historical perspective on graphics chips and not a comparison of real world performance. And with that disclaimer, let's get on to the performance charts.

The Way It's Meant to be Played Number nine… Number nine…
POST A COMMENT

43 Comments

View All Comments

  • suryad - Monday, September 6, 2004 - link

    What about the mobility x800 graphics card? I didnt see that thrown into the mix? Reply
  • coldpower27 - Monday, September 6, 2004 - link

    Thank you Bloodshredder, yeh after reading a little about the Radeon LE, it's almost as good as a Radeon DDR, except with lower working frequencies.

    so if it's DDR then the correct no. are 148/296 and 32MB VRAM only.
    Reply
  • Bloodshedder - Monday, September 6, 2004 - link

    For the Radeon LE, I noticed a question mark next to the amount of RAM. I own one of these cards, and can confirm that 32MB DDR is the only configuration it comes in. Reply
  • Draven31 - Monday, September 6, 2004 - link

    You skipped which OpenGL version and features the various cards support... maybe add that when you add the various workstation cards to the listings... Reply
  • coldpower27 - Monday, September 6, 2004 - link


    Yeh, Nvidia learned it's lesson, last gen, with the 0.13 micron new at the time process delaying the introduction of the NV30, thy learned to play it safe using a tried and tested process is a good idea for such high complexity chips initially, though they of course plan to shift these chips to the 110nm process when the process matures enough, possibly on the NV48 and R480 hopefully allowing higher clocks in the process:D, maybe not for R480 unless low-k is ready for 110nm by that time.

    It does make more sense to use the newer manufacturing process to help save costs on the volume shipping GPU, as the cost savings will beaccumulated much better in the mainstream and value arena's thanks to sheer volume.

    We also see this with Intel, when Intel yields on the 90nm were only so so, they introduced Prescott up to 3.2GHZ in quanitity, but introduced their Pentium 4 3.4GHZ on the northwood core on 0.13 micron. Though over time Intel is making all efforts to transfer everything to 90nm, with Prescott and Prescott 2M w/1066FSB for EE Edition.
    Reply
  • JarredWalton - Monday, September 6, 2004 - link

    8 - Intel does this as well, testing a new process on their non-flagship parts. For example, after the launch of the P4, Intel piloted their 130 nm copper technology with the Tualatin CPU before releasing the Northwood. It probably has something to do with the amount of extra time a more complex design takes to test and verify. Reply
  • stephenbrooks - Monday, September 6, 2004 - link

    Interesting how on the die sizes chart, I notice they're phasing in the 110nm process only for their mid-range-ish cards and sticking to the tried and tested 130nm for the high-end one. I suppose you can't blame them for that really, given it's their flagship product and all, but it could contribute to the huge die sizes. Reply
  • JarredWalton - Monday, September 6, 2004 - link

    Thank, AtaStrumf - any errors in the numbers are ColdPower's fault. Heheheh. Really, he already caught a bunch of small mistakes, so hopefully the number of remaining errors is very small.

    For what it's worth, there are various versions of some of the chips that have different clock speeds and RAM speeds from what is listed. The models in the chart should reflect the most common configurations, though.

    BTW, the article text is now tweaked somewhat on the ATI and NVIDIA overview pages. Derek Wilson provided some additional insight on the subject of AA and AF that clarified things a little.
    Reply
  • JarredWalton - Monday, September 6, 2004 - link

    Argon was the name for the .25 micron K7, while Pluto and Orion were .18 micron.

    #2 and #4: I realize you're kidding, but in all seriousness we did think about including other architectures. With the broken features on some of the more recent cards and the lack of T&L on 3dfx and older cards, we just decided to stick with the two major players. And hey - it's all fair, as we didn't include Cyrix/Via or Transmeta processors in the CPU cheatsheet! ;)
    Reply
  • AtaStrumf - Monday, September 6, 2004 - link

    OMFG, this is awsome!!!! You really outdid youself this time! I have been collecting data on GPUs for quite a while and have been planing on making a spreadsheet just like the first two for my, so called, web site, but WAU, this rocks. Thanks for saving me a lot of work :)

    When I get the time, I'll check your munbers a bit, just to make sure there aren't any typos in there.
    Reply

Log in

Don't have an account? Sign up now