DirectX 9 Performance

Below is our chart of the DirectX 9 cards.

DirectX 9

Card | Core (MHz) | RAM (MHz)* | Pixel Pipes | Textures/Pipe** | Vertex Pipes*** | Memory Bus (bits) | Fill Rate (MTexels/s)+ | Vertex Rate (MVertices/s)++ | Bandwidth (MB/s)+++ | Rel. Fill Rate | Rel. Bandwidth | Rel. Vertex Rate | Weighted Overall++++
GF 6800UE | 450 | 1200 | 16 | 1 | 6 | 256 | 7200 | 675 | 36621 | 450.0% | 400.0% | 337.5% | 475.0%
X800 XT PE | 520 | 1120 | 16 | 1 | 6 | 256 | 8320 | 780 | 34180 | 520.0% | 373.3% | 390.0% | 470.6%
X800 XT | 500 | 1000 | 16 | 1 | 6 | 256 | 8000 | 750 | 30518 | 500.0% | 333.3% | 375.0% | 443.1%
GF 6800U | 400 | 1100 | 16 | 1 | 6 | 256 | 6400 | 600 | 33569 | 400.0% | 366.7% | 300.0% | 426.7%
X800 GT? | 425 | 900 | 16 | 1 | 6 | 256 | 6800 | 638 | 27466 | 425.0% | 300.0% | 318.8% | 382.7%
GF 6800GT | 350 | 1000 | 16 | 1 | 6 | 256 | 5600 | 525 | 30518 | 350.0% | 333.3% | 262.5% | 378.3%
X800 Pro | 475 | 900 | 12 | 1 | 6 | 256 | 5700 | 713 | 27466 | 356.3% | 300.0% | 356.3% | 371.3%
X800 SE? | 425 | 800 | 8 | 1 | 6 | 256 | 3400 | 638 | 24414 | 212.5% | 266.7% | 318.8% | 292.6%
X700 XT? | 500 | 1000 | 8 | 1 | 6 | 128 | 4000 | 750 | 15259 | 250.0% | 166.7% | 375.0% | 290.3%
GF 6800 | 325 | 700 | 12 | 1 | 5 | 256 | 3900 | 406 | 21362 | 243.8% | 233.3% | 203.1% | 272.1%
GF 6600GT | 500 | 1000 | 8 | 1 | 3 | 128 | 4000 | 375 | 15259 | 250.0% | 166.7% | 187.5% | 241.7%
GF 6800LE | 320 | 700 | 8 | 1 | 5 | 256 | 2560 | 400 | 21362 | 160.0% | 233.3% | 200.0% | 237.3%
9800 XT | 412 | 730 | 8 | 1 | 4 | 256 | 3296 | 412 | 22278 | 206.0% | 243.3% | 206.0% | 218.4%
GFFX 5950U | 475 | 950 | 4 | 2 | 3 | 256 | 3800 | 356 | 28992 | 237.5% | 316.7% | 178.1% | 207.5%
9800 Pro 256 | 380 | 700 | 8 | 1 | 4 | 256 | 3040 | 380 | 21362 | 190.0% | 233.3% | 190.0% | 204.4%
9800 Pro 128 | 380 | 680 | 8 | 1 | 4 | 256 | 3040 | 380 | 20752 | 190.0% | 226.7% | 190.0% | 202.2%
GFFX 5900U | 450 | 850 | 4 | 2 | 3 | 256 | 3600 | 338 | 25940 | 225.0% | 283.3% | 168.8% | 191.8%
GFFX 5900 | 400 | 850 | 4 | 2 | 3 | 256 | 3200 | 300 | 25940 | 200.0% | 283.3% | 150.0% | 179.4%
9700 Pro | 325 | 620 | 8 | 1 | 4 | 256 | 2600 | 325 | 18921 | 162.5% | 206.7% | 162.5% | 177.2%
9800 | 325 | 600 | 8 | 1 | 4 | 256 | 2600 | 325 | 18311 | 162.5% | 200.0% | 162.5% | 175.0%
9800 SE 256 | 380 | 680 | 4 | 1 | 4 | 256 | 1520 | 380 | 20752 | 95.0% | 226.7% | 190.0% | 170.6%
GFFX 5900XT/SE | 400 | 700 | 4 | 2 | 3 | 256 | 3200 | 300 | 21362 | 200.0% | 233.3% | 150.0% | 165.3%
9800 "Pro" | 380 | 680 | 8 | 1 | 4 | 128 | 3040 | 380 | 10376 | 190.0% | 113.3% | 190.0% | 164.4%
GFFX 5800U | 500 | 1000 | 4 | 2 | 2 | 128 | 4000 | 250 | 15259 | 250.0% | 166.7% | 125.0% | 153.5%
9700 | 275 | 540 | 8 | 1 | 4 | 256 | 2200 | 275 | 16479 | 137.5% | 180.0% | 137.5% | 151.7%
GF 6600 | 300 | 550 | 8 | 1 | 3 | 128 | 2400 | 225 | 8392 | 150.0% | 91.7% | 112.5% | 141.7%
9800 SE 128 | 325 | 580 | 8 | 1 | 4 | 128 | 2600 | 325 | 8850 | 162.5% | 96.7% | 162.5% | 140.6%
GFFX 5700U GDDR3 | 475 | 950 | 4 | 1 | 3 | 128 | 1900 | 356 | 14496 | 118.8% | 158.3% | 178.1% | 129.0%
GFFX 5700U | 475 | 900 | 4 | 1 | 3 | 128 | 1900 | 356 | 13733 | 118.8% | 150.0% | 178.1% | 126.6%
X600 XT | 500 | 740 | 4 | 1 | 2 | 128 | 2000 | 250 | 11292 | 125.0% | 123.3% | 125.0% | 124.4%
GFFX 5800 | 400 | 800 | 4 | 2 | 2 | 128 | 3200 | 200 | 12207 | 200.0% | 133.3% | 100.0% | 122.8%
9500 Pro | 275 | 540 | 8 | 1 | 4 | 128 | 2200 | 275 | 8240 | 137.5% | 90.0% | 137.5% | 121.7%
9600 XT | 500 | 600 | 4 | 1 | 2 | 128 | 2000 | 250 | 9155 | 125.0% | 100.0% | 125.0% | 116.7%
9600 Pro | 400 | 600 | 4 | 1 | 2 | 128 | 1600 | 200 | 9155 | 100.0% | 100.0% | 100.0% | 100.0%
X600 Pro | 400 | 600 | 4 | 1 | 2 | 128 | 1600 | 200 | 9155 | 100.0% | 100.0% | 100.0% | 100.0%
GFFX 5700 | 425 | 500 | 4 | 1 | 3 | 128 | 1700 | 319 | 7629 | 106.3% | 83.3% | 159.4% | 98.9%
9500 | 275 | 540 | 4 | 1 | 4 | 128 | 1100 | 275 | 8240 | 68.8% | 90.0% | 137.5% | 98.8%
GFFX 5600U FC | 400 | 800 | 4 | 1 | 1 | 128 | 1600 | 100 | 12207 | 100.0% | 133.3% | 50.0% | 80.3%
9600 | 325 | 400 | 4 | 1 | 2 | 128 | 1300 | 163 | 6104 | 81.3% | 66.7% | 81.3% | 76.4%
X300 | 325 | 400 | 4 | 1 | 2 | 128 | 1300 | 163 | 6104 | 81.3% | 66.7% | 81.3% | 76.4%
GFFX 5600U | 350 | 700 | 4 | 1 | 1 | 128 | 1400 | 88 | 10681 | 87.5% | 116.7% | 43.8% | 70.2%
9600 SE | 325 | 400 | 4 | 1 | 2 | 64 | 1300 | 163 | 3052 | 81.3% | 33.3% | 81.3% | 65.3%
X300 SE | 325 | 400 | 4 | 1 | 2 | 64 | 1300 | 163 | 3052 | 81.3% | 33.3% | 81.3% | 65.3%
GFFX 5200U | 325 | 650 | 4 | 1 | 1 | 128 | 1300 | 81 | 9918 | 81.3% | 108.3% | 40.6% | 65.2%
9550 | 250 | 400 | 4 | 1 | 2 | 128 | 1000 | 125 | 6104 | 62.5% | 66.7% | 62.5% | 63.9%
GFFX 5700LE | 250 | 400 | 4 | 1 | 3 | 128 | 1000 | 188 | 6104 | 62.5% | 66.7% | 93.8% | 63.2%
GFFX 5600 | 325 | 500 | 4 | 1 | 1 | 128 | 1300 | 81 | 7629 | 81.3% | 83.3% | 40.6% | 58.1%
9550 SE | 250 | 400 | 4 | 1 | 2 | 64 | 1000 | 125 | 3052 | 62.5% | 33.3% | 62.5% | 52.8%
GFFX 5500 | 270 | 400 | 4 | 1 | 1 | 128 | 1080 | 68 | 6104 | 67.5% | 66.7% | 33.8% | 47.6%
GFFX 5200 | 250 | 400 | 4 | 1 | 1 | 128 | 1000 | 63 | 6104 | 62.5% | 66.7% | 31.3% | 45.5%
GFFX 5600XT | 235 | 400 | 4 | 1 | 1 | 128 | 940 | 59 | 6104 | 58.8% | 66.7% | 29.4% | 43.9%
GFFX 5200LE | 250 | 400 | 4 | 1 | 1 | 64 | 1000 | 63 | 3052 | 62.5% | 33.3% | 31.3% | 36.0%
* RAM clock is the effective clock speed, so 250 MHz DDR is listed as 500 MHz.
** Textures/Pipeline is the maximum number of texture lookups per pipeline.
*** NVIDIA says its GFFX cards have a "vertex array", but in practice the array generally performs like the number of vertex pipelines indicated.
**** Single-texturing fill rate = core speed * pixel pipelines
+ Multi-texturing fill rate = core speed * maximum textures per pipe * pixel pipelines
++ Vertex rates can vary by implementation. The listed values reflect the manufacturers' advertised rates.
+++ Bandwidth is expressed in actual MB/s, where 1 MB = 1024 KB = 1048576 Bytes.
++++ Relative performance is normalized to the Radeon 9600 Pro, but these values are at best a rough estimate.
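The derived columns can be reproduced from each card's specifications. The sketch below covers the vertex rate and the normalization to the 9600 Pro. It rests on two assumptions that are not stated in the footnotes: that the listed vertex rates equal core clock × vertex pipelines ÷ 4 (this matches the chart's rows, but the footnote says the values are the manufacturers' advertised figures), and that the chart rounds .5 upward. The function names are hypothetical.

```python
import math

# Radeon 9600 Pro values from the chart, used as the 100% baseline.
BASELINE = {"fill": 1600, "vertex": 200, "bandwidth": 9155}


def round_half_up(x: float) -> int:
    """Round to the nearest integer with .5 going up (Python's round() rounds .5 to even)."""
    return math.floor(x + 0.5)


def vertex_rate(core_mhz: int, vertex_pipes: int) -> int:
    """Assumed model, inferred from the chart: one vertex per pipeline every 4 clocks (MVertices/s)."""
    return round_half_up(core_mhz * vertex_pipes / 4)


def relative(value: float, column: str) -> float:
    """Express a chart value as a percentage of the 9600 Pro baseline."""
    return 100.0 * value / BASELINE[column]


print(vertex_rate(325, 2))  # Radeon 9600: 162.5 rounds up to 163
print(vertex_rate(475, 6))  # X800 Pro: 712.5 rounds up to 713
print(round(relative(vertex_rate(450, 6), "vertex"), 1))  # GF 6800UE: 675 -> 337.5
```

The explicit half-up rounding matters because several rows (the 9600, X800 Pro, and GFFX 5200) land exactly on .5, where Python's built-in `round()` would round to the even neighbor and disagree with the chart.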

Several of the footnotes are worth pointing out, in case some readers missed them. For starters, many people may not like how memory bandwidth is listed. Companies normally quote MB/s and GB/s with MB calculated as one million bytes and GB as one billion bytes. That's incorrect, but since everyone does it, it hardly matters anymore. In this chart, however, real MB/s values (1 MB = 1,048,576 bytes) are listed, so they will all be lower than what the graphics card makers advertise.
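To make the difference concrete, here is a minimal sketch of both conventions. Bandwidth is the effective memory clock (in millions of transfers per second) times the bus width in bytes; the chart then divides by 1,048,576 instead of 1,000,000. The function names are illustrative only.

```python
def bandwidth_decimal_mb(ram_mhz_effective: int, bus_bits: int) -> float:
    """Bandwidth the way vendors quote it: 1 MB = 1,000,000 bytes."""
    # effective clock (million transfers/s) * bytes moved per transfer
    return ram_mhz_effective * bus_bits / 8


def bandwidth_binary_mb(ram_mhz_effective: int, bus_bits: int) -> float:
    """Bandwidth as listed in the chart: 1 MB = 1,048,576 bytes."""
    bytes_per_second = bandwidth_decimal_mb(ram_mhz_effective, bus_bits) * 1_000_000
    return bytes_per_second / 1_048_576


for name, ram, bus in [("9600 Pro", 600, 128), ("GF 6800UE", 1200, 256)]:
    advertised = bandwidth_decimal_mb(ram, bus)
    listed = bandwidth_binary_mb(ram, bus)
    print(f"{name}: advertised {advertised:.0f} MB/s, chart {listed:.0f} MB/s")
```

This prints 9600 vs. 9155 MB/s for the 9600 Pro and 38400 vs. 36621 MB/s for the 6800UE; since 1,000,000 / 1,048,576 ≈ 0.954, every chart value is about 4.6% below the advertised figure.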

Fill rate can also be calculated in various ways. ATI's older Radeon cards (the DX7 models), for example, could apply three textures per pipeline per pass, or so ATI claimed; however, two of those texture lookups had to use the same texture, which made the feature a little less useful. Anyway, these are all purely theoretical numbers, and it is almost impossible to say how accurate they are in the real world without specialized tools. To date, no one has created "real world" tools that measure these values, and probably no one ever will, so we are stuck with synthetic benchmarks at best. Basically, don't take the fill rate scores too seriously.
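The choice of formula can even flip a ranking. This sketch applies the two fill rate footnotes to the GFFX 5950U (4 pipelines × 2 textures) and the 9800 XT (8 pipelines × 1 texture), using the clock speeds from the chart; the function names are hypothetical.

```python
def single_texture_fill(core_mhz: int, pixel_pipes: int) -> int:
    """Footnote ****: core speed * pixel pipelines (millions of pixels/s)."""
    return core_mhz * pixel_pipes


def multi_texture_fill(core_mhz: int, pixel_pipes: int, textures_per_pipe: int) -> int:
    """Footnote +: core speed * maximum textures per pipe * pixel pipelines (millions of texels/s)."""
    return core_mhz * textures_per_pipe * pixel_pipes


# (core MHz, pixel pipelines, textures per pipeline) from the chart
cards = {"GFFX 5950U": (475, 4, 2), "9800 XT": (412, 8, 1)}

for name, (core, pipes, tex) in cards.items():
    print(name, single_texture_fill(core, pipes), multi_texture_fill(core, pipes, tex))
```

The 5950U comes out at 1900 single-textured vs. 3800 multi-textured, while the 9800 XT is 3296 either way, so the 5950U leads on one measure and trails badly on the other. The chart's fill rate column matches the multi-texturing figures.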

You can read the remaining footnotes above, and they should be self-explanatory. We just wanted to clarify those two points up front, and they apply to all of the performance charts. Now, on to the comments specifically related to DirectX 9.

The most important thing to point out first is that this chart has an additional weighting, due to the discrepancies in features and performance among the various models of DirectX 9 hardware. The biggest concern is the theoretical performance of the GeForce FX cards. Most people should know this by now, but simply put, the FX cards fall well short of expectations when running DirectX 9 code. In DirectX 8.1 and earlier, their theoretical performance is a relatively accurate reflection of the real world, but overall the cards are far from perfect. We felt that the initial sorting was so unrealistic that a further weighting of the scores was in order; however, you can view the unweighted chart if you wish. Newer features also improve performance at a given clock speed. For example, the memory controller optimizations in the GF6 line make the vanilla 6800 faster in almost all cases than the FX 5950U and 9800 Pro. In fact, the GF6 cards are generally beaten only by the X800 cards, and not always even then.

The weighting used was relatively simple (and arbitrary). After averaging the relative fill rate, bandwidth, and vertex rate scores, we multiply the result by a weighting factor for each architecture:

NV3x Series: 0.85
R3xx Series: 1.00
R4xx Series: 1.10
NV4x Series: 1.20
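The weighting can be expressed as a short function. This is a sketch, assuming the relative scores are each card's value divided by the Radeon 9600 Pro's (per the last footnote); the function and dictionary names are hypothetical. It reproduces the overall column for the example rows below.

```python
# Per-architecture weighting factors from the list above.
WEIGHTS = {"NV3x": 0.85, "R3xx": 1.00, "R4xx": 1.10, "NV4x": 1.20}

# Radeon 9600 Pro values: fill rate (MTexels/s), bandwidth (MB/s), vertex rate (MVertices/s).
BASELINE = (1600, 9155, 200)


def weighted_overall(fill: float, bandwidth: float, vertex: float, family: str) -> float:
    """Average the three scores relative to the 9600 Pro, then apply the family weighting (%)."""
    ratios = [value / base for value, base in zip((fill, bandwidth, vertex), BASELINE)]
    return 100.0 * sum(ratios) / len(ratios) * WEIGHTS[family]


print(round(weighted_overall(7200, 36621, 675, "NV4x"), 1))  # GF 6800UE: 475.0
print(round(weighted_overall(8320, 34180, 780, "R4xx"), 1))  # X800 XT PE: 470.6
print(round(weighted_overall(3800, 28992, 356, "NV3x"), 1))  # GFFX 5950U: 207.5
```

Because the factor multiplies the average, it scales all three components equally; an unweighted ranking is simply the same calculation with every factor set to 1.0.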

This gives a rough approximation of how the features and architectural differences play out. Also note that certain chips lack some of the more specialized hardware optimizations, so while the theoretical performance of the 5200U appears better than that of the 5600 and 5700LE, in most situations it ends up slower. Similarly, the X600 Pro and X300 should beat the 9600 Pro and 9600 in real performance, as the RV370 and RV380 probably contain a few optimizations and enhancements. They are also PCI Express parts, but that is not something to worry about. PCI Express, at least for the time being, seems to have little impact on actual performance: sometimes it's a little faster, sometimes a little slower. If you're buying a PCIe-based system for its other components, that's fine, but we recommend that you don't spend money on such an expensive system solely for PCIe. By the time PCIe really has a performance lead, today's systems will need upgrading anyway.

If you refer back to the earlier charts, you will notice that the X600 and X300 do not support any of the SM2.0b features. This is not a mistake: only the forthcoming X700 cards will bring the new features to ATI's mid-range lineup. This is in contrast to the 6600 cards, which are functionally identical to the 6800 cards, only with fewer pipelines. The X700 is likely to have a performance advantage over the 6600 in many situations, as it will have six vertex pipelines compared to the 6600's three. Should the 6800LE become widely available, however, it could end up the champion of the $200-and-under segment, as its 256-bit memory bus may matter more than clock speeds. Having more than 25 GB/s of memory bandwidth does not always help performance without an extremely fast graphics core, but having less than 16 GB/s can slow things down. We'll see how things play out in a few months.
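The 6800LE's case rests on bus width rather than clock speed: 700 MHz effective memory on a 256-bit bus delivers more than 1000 MHz on a 128-bit bus. The sketch below places a few cards from the chart against the 16 GB/s and 25 GB/s figures in the paragraph above. Treating those thresholds as vendor-style decimal GB/s is an assumption, and the helper names are hypothetical.

```python
def bandwidth_gb(ram_mhz_effective: int, bus_bits: int) -> float:
    """Peak bandwidth in decimal GB/s (the way vendors quote it)."""
    return ram_mhz_effective * bus_bits / 8 / 1000


def bandwidth_note(gb_per_s: float) -> str:
    # Thresholds taken from the paragraph above; treating them as decimal GB/s is an assumption.
    if gb_per_s < 16:
        return "below 16 GB/s: may slow things down"
    if gb_per_s > 25:
        return "above 25 GB/s: may not help without a very fast core"
    return "between the two thresholds"


for name, ram, bus in [("GF 6800LE", 700, 256), ("GF 6600GT", 1000, 128), ("GF 6600", 550, 128)]:
    gb = bandwidth_gb(ram, bus)
    print(f"{name}: {gb:.1f} GB/s ({bandwidth_note(gb)})")
```

The 6800LE comes out at 22.4 GB/s against 16.0 GB/s for the 6600GT, while the plain 6600 (8.8 GB/s) falls below the lower threshold.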


43 Comments


  • suryad - Monday, September 6, 2004

    What about the Mobility X800 graphics card? I didn't see that thrown into the mix.
  • coldpower27 - Monday, September 6, 2004

    Thank you, Bloodshedder. Yeah, after reading a little about the Radeon LE, it's almost as good as a Radeon DDR, except with lower operating frequencies.

    So if it's DDR, then the correct numbers are 148/296 and 32MB of VRAM only.
  • Bloodshedder - Monday, September 6, 2004

    For the Radeon LE, I noticed a question mark next to the amount of RAM. I own one of these cards, and can confirm that 32MB DDR is the only configuration it comes in.
  • Draven31 - Monday, September 6, 2004

    You skipped which OpenGL version and features the various cards support... maybe add that when you add the various workstation cards to the listings...
  • coldpower27 - Monday, September 6, 2004

    Yeah, Nvidia learned its lesson last generation, when the then-new 0.13 micron process delayed the introduction of the NV30. Playing it safe with a tried-and-tested process is a good idea for such complex chips initially, though of course they plan to shift these chips to the 110nm process once it matures enough, possibly with the NV48 and R480, hopefully allowing higher clocks in the process :D. Maybe not for the R480, though, unless low-k is ready for 110nm by then.

    It does make more sense to use the newer manufacturing process to save costs on the high-volume GPUs, as the cost savings accumulate much faster in the mainstream and value arenas thanks to sheer volume.

    We also see this with Intel: when Intel's 90nm yields were only so-so, they introduced Prescott at up to 3.2GHz in quantity, but introduced their Pentium 4 3.4GHz on the 0.13 micron Northwood core. Over time, though, Intel is making every effort to transfer everything to 90nm, with Prescott, and Prescott 2M with 1066 FSB for the Extreme Edition.
  • JarredWalton - Monday, September 6, 2004

    8 - Intel does this as well, testing a new process on their non-flagship parts. For example, after the launch of the P4, Intel piloted their 130 nm copper technology with the Tualatin CPU before releasing the Northwood. It probably has something to do with the amount of extra time a more complex design takes to test and verify.
  • stephenbrooks - Monday, September 6, 2004

    Interesting: on the die sizes chart, I notice they're phasing in the 110nm process only for their mid-range-ish cards and sticking to the tried-and-tested 130nm for the high-end one. I suppose you can't really blame them, given it's their flagship product and all, but it could contribute to the huge die sizes.
  • JarredWalton - Monday, September 6, 2004

    Thanks, AtaStrumf - any errors in the numbers are ColdPower's fault. Heheheh. Really, he already caught a bunch of small mistakes, so hopefully the number of remaining errors is very small.

    For what it's worth, there are various versions of some of the chips that have different clock speeds and RAM speeds from what is listed. The models in the chart should reflect the most common configurations, though.

    BTW, the article text is now tweaked somewhat on the ATI and NVIDIA overview pages. Derek Wilson provided some additional insight on the subject of AA and AF that clarified things a little.
  • JarredWalton - Monday, September 6, 2004

    Argon was the name for the .25 micron K7, while Pluto and Orion were .18 micron.

    #2 and #4: I realize you're kidding, but in all seriousness we did think about including other architectures. With the broken features on some of the more recent cards and the lack of T&L on 3dfx and older cards, we just decided to stick with the two major players. And hey - it's all fair, as we didn't include Cyrix/Via or Transmeta processors in the CPU cheatsheet! ;)
  • AtaStrumf - Monday, September 6, 2004

    OMFG, this is awesome!!!! You really outdid yourself this time! I have been collecting data on GPUs for quite a while and have been planning on making a spreadsheet just like the first two for my so-called website, but WOW, this rocks. Thanks for saving me a lot of work :)

    When I get the time, I'll check your numbers a bit, just to make sure there aren't any typos in there.
