Final Words

In terms of pure pixel-drawing power, NV35 and NV38 didn't fare too badly: their clock speeds pushed fill rate to theoretical peaks of 1800 and 1900 Mpixels/s, respectively. This number is simply the number of pixels that can be drawn at a time multiplied by clock speed. The NV3x architecture could also push twice as many textured pixels (if multitexturing was employed), or twice as many z/stencil operations, as plain pixels. The performance problems with NV3x didn't come from theoretical maximum limitations, but from the inability to come anywhere near those maximums in the real world, due to all the issues we have explored in addition to a couple of other caveats. Here's a brief rundown of the bottlenecks.
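As a quick sanity check on those peak numbers, the fill-rate formula can be worked through directly. The pipeline count (4) and core clocks (450 MHz for NV35, 475 MHz for NV38) below are the shipping specs of the GeForce FX 5900 Ultra and 5950 Ultra, filled in here as assumptions; the article itself only quotes the resulting peaks.

```python
def peak_fill_rate_mpixels(pipelines: int, clock_mhz: int) -> int:
    """Theoretical peak pixel fill rate in Mpixels/s:
    pixels drawn per clock multiplied by core clock."""
    return pipelines * clock_mhz

# NV35 (GeForce FX 5900 Ultra): 4 pixel pipes at 450 MHz
nv35_peak = peak_fill_rate_mpixels(4, 450)  # 1800 Mpixels/s
# NV38 (GeForce FX 5950 Ultra): 4 pixel pipes at 475 MHz
nv38_peak = peak_fill_rate_mpixels(4, 475)  # 1900 Mpixels/s
```

These match the 1800 and 1900 Mpixels/s figures quoted above.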

If a game uses single textures rather than multitextures, texture rate is automatically cut in half. If very complex vertex or pixel shaders are used, multiple clock cycles can be spent per pixel without drawing anything. This is heavily affected both by how many pixels can be in flight at one time and by how well the shaders handle common shader code. Enabling antialiasing incurs a performance hit, as do trilinear and anisotropic filtering. There will always be some overdraw (pixels being drawn on top of other pixels), which also wastes time. This all translates into a good amount of time spent not drawing pixels, on an architecture without much leeway for it.
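The compounding effect of these bottlenecks can be sketched with a toy model. The cycle count and overdraw factor below are hypothetical, chosen purely for illustration; real workloads vary widely.

```python
def effective_fill_rate(peak_mpixels: float,
                        cycles_per_pixel: float = 1.0,
                        overdraw: float = 1.0) -> float:
    """Toy model of real-world throughput: each shaded pixel costs
    cycles_per_pixel clocks, and overdraw means shading overdraw times
    as many pixels as end up visible on screen."""
    return peak_mpixels / (cycles_per_pixel * overdraw)

# Hypothetical case: a 4-cycle pixel shader combined with 2x overdraw
# drops NV38's 1900 Mpixels/s theoretical peak to 237.5 Mpixels/s --
# an eighth of the number on the spec sheet.
rate = effective_fill_rate(1900, cycles_per_pixel=4, overdraw=2)
```

The point of the sketch is how quickly these multiplicative penalties stack: each factor alone looks survivable, but together they leave the chip far from its theoretical maximum.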

In moving to NV40, there were lots of evolutionary fixes that really helped bring the performance of the architecture up. The most significant improvements were touched on earlier: the quadrupling of the pixel pipes while doubling the number of texture units (creating a 16x1 architecture), and increasing the number of vertex shader units while adding a second math unit and more registers to the pixel shaders to avoid scheduling issues. Further improvements in NV40 help eliminate hidden pixels earlier in the pipeline, at the vertex shaders (which avoids unnecessary work), and the anisotropic filtering engine was optimized to match ATI's method of using approximated (rather than actual) distances.
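To put the move to 16 pixel pipes in perspective, the same pipelines-times-clock arithmetic shows the scale of the jump. The 400 MHz figure is the GeForce 6800 Ultra's core clock, filled in here as an assumption; the article itself doesn't quote NV40's clock or peak fill rate.

```python
nv38_peak = 4 * 475    # 1900 Mpixels/s: 4 pipes at 475 MHz
nv40_peak = 16 * 400   # 6400 Mpixels/s: 16 pipes at an assumed 400 MHz

# Despite the lower clock, quadrupling the pipes more than triples
# the theoretical peak fill rate.
speedup = nv40_peak / nv38_peak
```

Of course, the entire lesson of NV3x is that theoretical peaks mean little by themselves; the scheduling and register fixes above are what let NV40 actually approach its number.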

In the end, it wasn't the architecture of the NV3x GPU that was flawed, but rather an accumulation of an unfortunate number of smaller issues that held the architecture back from its potential.

It is important to take away from this that NV40 is very evolutionary, and that NVIDIA was pushing very hard to make this next step an incredible leap in performance. In order to do so, they have had to squeeze 222 million transistors onto something in the neighborhood of a 300 mm^2 die. By its very nature, graphics rendering is massively parallelizable, and NVIDIA has taken clear advantage of this, but it has certainly come at a cost. They needed the performance leap, and now they will be in a tough position when it comes to making money on this chip. Yields will be lower than NV3x's, but retail prices are not going to move beyond the $500 mark.

On a side note, ATI will most likely not be this aggressive with their new chip. The R300's performance has been very good in the current games we have seen, and they just won't need to push as hard as NVIDIA did this time around. Their architecture was already well suited to the current climate, and we can expect small refinements as well as an increase in the width of their pixel pipe (which also looks like it will be 16x1). Of course, ATI's performance this time around will be increased, but just how much will have to remain a mystery for a couple more weeks.
Shedding Light On Shader Performance

  • PrinceGaz - Tuesday, April 20, 2004 - link

    Maybe they've been saving this article up until after the NV40 was launched? I must admit I subconsciously substituted postmortem myself when reading it.

    Well put together article Derek, it clearly explains some of the problems that led to the entire GeForce FX (NV3x) range of cards always being in second-place behind equivalent R3xx cards (the NV34 core FX5200 had no R3xx based competitor).

    I'm surprised no mention was made of the FP16/FP32 performance issue with the NV3x core and the consequences that is having on image quality, and also how its inferior ordered-grid anti-aliasing couldn't compare with the rotated-grid used in the R3xx.
  • Phiro - Tuesday, April 20, 2004 - link

    wahahahaa a "moratorium" hahaha

    You guys need to quit looking up big words that sound like the word you really mean to use, and actually grow some vocabulary.

    How about "postmortem"? Do you even know what the word "moratorium" means?
  • Cybercat - Tuesday, April 20, 2004 - link

    Nice work, you made it user friendly, so even I can understand it! lol, when it comes to explaining graphics architectures to me, that's no small feat.
  • newuser12 - Monday, April 19, 2004 - link

    this article is depressing, being as I have an mx-440....... :(
  • ZobarStyl - Monday, April 19, 2004 - link

    By this hardware I meant the x800, sorry.
  • ZobarStyl - Monday, April 19, 2004 - link

    Thank you, Derek, for politely avoiding going into the current speculation war about the new hardware...frankly no one but ATI knows how this hardware is going to hold up against the 6800, and it's sad that people on both sides have already pronounced winners and losers. A good article; it sheds light on these dark times for nVidia. I'm no fanboy...frankly I'm most swayed by their quality driver support rather than the sheer speed factor.
  • Modal - Monday, April 19, 2004 - link

    Thanks for this elucidating article; I find articles of this type (that is "this is what your hardware is doing and why") very interesting.
  • Regs - Monday, April 19, 2004 - link

    lol. I just find it funny you wait until now to write this article. But then again, you would likely have a better understanding when we could compare it to the NV40.
