GPU Cheatsheet - A History of Modern Consumer Graphics Processors
by Jarred Walton on September 6, 2004 12:00 AM EST
Let's Talk Performance
This section is likely to generate a lot of flames if left unchecked. First, though, we want to make it abundantly clear that raw, theoretical performance numbers (which are what is listed here) rarely manage to match real world performance figures. There are numerous reasons for this discrepancy; for example, the game or application in use may stress different parts of the architecture. A game that pushes a lot of polygons with low resolution textures is going to stress the geometry engine, while a game that uses high resolution textures with lower polygon counts is more likely to stress the memory bandwidth. Pixel and Vertex Shaders are even more difficult to judge, as both ATI and NVIDIA are relatively tight-lipped about the internal layout of their pipelines. These functions are the most like an actual CPU, but they're also highly proprietary and the companies feel a need to protect their technology (probably with good cause). So while we know that AMD Athlon 64 chips have a 12 stage Integer/ALU pipeline and 17 stage FPU/SSE pipeline, we really have no idea how many stages are in the pixel and vertex pipelines of ATI and NVIDIA cards. In fact, we really don't have much more than a simplistic functional overview.
So why even bother talking about performance without benchmarks? In part, by looking at the theoretical performance and comparing it to the real world performance (you'll have to find such real world figures in another article), we can get a better idea of what went wrong and what worked well. More importantly, though, most people referring to a GPU Guide are going to expect some sort of comparison and ranking of the parts. Such a ranking is by no means definitive, and for some people choosing a graphics card is akin to joining a religion. So, take these numbers with a grain of salt and know that they are not intentionally meant to make one card look better than another. Where performance seriously fails to match expectations, it will be noted.
There are numerous factors that can affect performance, other than the application itself. Drivers are a major one, and it is not unheard of for the performance of a particular card to increase by as much as 50% over its lifetime due to driver enhancements. In light of such examples (e.g. the performance of both Radeon and GeForce cards in Quake 3 increased dramatically over time), it is somewhat difficult to say that theoretical performance numbers are really that much worse than changing real world numbers. With proper optimization, real world numbers can usually approach theoretical numbers, but this really only occurs for the most popular applications. Features also play a part, all other things being equal, so if two cards have the same theoretical performance but one card is DX9 based and the other is DX8 based, the DX9 card should be faster.
Speaking of drivers, we would be remiss if we didn't at least mention OpenGL support. Brought into the consumer segment with GLQuake back in 1997, OpenGL is a different platform and requires different drivers. NVIDIA and ATI both have full OpenGL drivers, but all evidence indicates that NVIDIA's drivers are simply better at this point in time. Doom 3 is the latest example of this. However, OpenGL is also used in the professional world, and again NVIDIA tends to lead in performance, even with inferior hardware. Part of the problem is that very few games other than id Software titles and their licensees use OpenGL, so it often takes a back seat to DirectX. However, ATI has vowed to improve their OpenGL performance since the release of Doom 3, and hopefully they can close the gap between their DirectX and OpenGL drivers.
So, how is overall performance determined - in other words, how will the tables be sorted? The three main factors are fill rate, memory bandwidth, and processing power. Fill rate and bandwidth have been used for a long time, and they are well understood. Processing power, on the other hand, is somewhat more difficult to determine, especially with DX8 and later Pixel and Vertex Shaders. We will use the vertices/second rating as an estimate of processing power. For the charts, each section will be normalized relative to the theoretically fastest member of the group, and equal weight will be given to the fill rate, bandwidth, and vertex rate. That's not the best way of measuring performance, of course, but it's a start, and everything is theoretical at this point anyway. If you really want a suggestion on a specific card, the forums and past articles are a better place to search. Another option is to decide which games (or applications) you are most concerned about, and then go find an article that has benchmarks with that particular title.
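For readers who want to see the arithmetic, here is a rough sketch of how such an equally weighted score could be computed. It is one reading of the description above rather than the exact method behind the charts: it assumes each metric is divided by the best value for that metric within the group, and that the three ratios are then averaged. The `Card` class, the function names, and the example cards and numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """Theoretical specs for one card; all values here are illustrative."""
    name: str
    fill_rate: float    # e.g. megapixels per second
    bandwidth: float    # e.g. gigabytes per second
    vertex_rate: float  # e.g. millions of vertices per second


def score_group(cards):
    """Return {name: score} with each metric scaled to the group's best.

    Each of the three metrics is divided by the highest value for that
    metric within the group, and the three ratios are averaged with
    equal weight, so a card that leads every metric scores 1.0.
    """
    best_fill = max(c.fill_rate for c in cards)
    best_bw = max(c.bandwidth for c in cards)
    best_vtx = max(c.vertex_rate for c in cards)
    return {
        c.name: (c.fill_rate / best_fill
                 + c.bandwidth / best_bw
                 + c.vertex_rate / best_vtx) / 3
        for c in cards
    }


def rank_group(cards):
    """Return (name, score) pairs sorted from highest score to lowest."""
    scores = score_group(cards)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up cards and numbers, purely to exercise the functions.
group = [
    Card("Card A", fill_rate=4000, bandwidth=20.0, vertex_rate=400),
    Card("Card B", fill_rate=2000, bandwidth=20.0, vertex_rate=200),
]
for name, score in rank_group(group):
    print(f"{name}: {score:.2f}")
```

With these invented figures, Card A leads or ties every metric and scores 1.00, while Card B has half the fill rate and vertex rate but the same bandwidth and lands at about 0.67. Note how the equal weighting lets matching bandwidth pull the slower card's score well above 0.50, which is one reason a ranking like this is only a starting point.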
To reiterate, this is more of a historical perspective on graphics chips and not a comparison of real world performance. And with that disclaimer, let's get on to the performance charts.