Final Words

In terms of pure pixel drawing power, NV35 and NV38 didn't have it too bad: their clock speeds helped push fill rate up to 1800 and 1900 Mpixels/s, respectively, at their theoretical peaks. This number is simply the number of pixels that can be drawn per clock multiplied by the clock speed. The NV3x architecture could also push twice as many textured pixels (if multitexturing was employed) or twice as many z / stencil operations as pixels. The performance problems in NV3x didn't come from theoretical maximum limitations, but rather from not being able to come anywhere near those theoretical maximums in the real world, due to all the issues we have explored in addition to a couple of other caveats. Here's a brief rundown of the bottlenecks.
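As a rough illustration of the arithmetic behind those peak numbers, here is a minimal sketch. It assumes four pixels per clock and core clocks of 450 MHz (NV35) and 475 MHz (NV38); those inputs are not stated on this page and are chosen because they reproduce the quoted peaks. The function names are hypothetical.

```python
def peak_pixel_fill_rate(pixels_per_clock: int, core_clock_mhz: float) -> float:
    """Theoretical peak fill rate in Mpixels/s: pixels per clock times clock speed."""
    return pixels_per_clock * core_clock_mhz


def peak_doubled_rate(pixels_per_clock: int, core_clock_mhz: float) -> float:
    """Peak rate for multitextured pixels or z / stencil operations,
    which the article describes as twice the pixel rate on NV3x."""
    return 2 * peak_pixel_fill_rate(pixels_per_clock, core_clock_mhz)


# Assumed inputs (not from the article): 4 pixels per clock, 450 / 475 MHz.
nv35_peak = peak_pixel_fill_rate(4, 450)   # 1800 Mpixels/s
nv38_peak = peak_pixel_fill_rate(4, 475)   # 1900 Mpixels/s
nv35_textured = peak_doubled_rate(4, 450)  # 3600 Mtexels/s with multitexturing
```

These are upper bounds only; they say nothing about how often the hardware actually reaches them.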

If a game uses single textures rather than multitextures, texture rate is automatically cut in half. If very complex vertex or pixel shaders are used, multiple clock cycles can be spent per pixel without drawing anything. This is heavily affected both by how many pixels can be worked on at one time and by how well the shader units handle common shader code. Enabling antialiasing incurs a performance hit, as do trilinear and anisotropic filtering. There will always be some overdraw (pixels being drawn on top of other pixels), which also wastes time. This all translates into a good amount of time spent not drawing pixels, on an architecture without a lot of leeway for this.
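To get a feel for how quickly these factors eat into a peak number, here is a deliberately simplified toy model. It is not a description of how NV3x actually schedules work; the formula, the function name, and the sample inputs are all illustrative assumptions.

```python
def effective_fill_rate(peak_mpixels: float,
                        avg_cycles_per_pixel: float,
                        overdraw: float) -> float:
    """Toy estimate of visible Mpixels/s.

    peak_mpixels:         theoretical peak fill rate
    avg_cycles_per_pixel: average clocks spent per drawn pixel (>= 1),
                          e.g. due to long shaders or expensive filtering
    overdraw:             average number of times each screen pixel is drawn (>= 1)
    """
    if avg_cycles_per_pixel < 1 or overdraw < 1:
        raise ValueError("cycles per pixel and overdraw must both be >= 1")
    return peak_mpixels / (avg_cycles_per_pixel * overdraw)


# Hypothetical workload: 4 clocks per pixel on average and 2x overdraw
# turn an 1800 Mpixels/s peak into 225 Mpixels/s of visible pixels.
example = effective_fill_rate(1800, 4, 2.0)
```

Even with made-up inputs, the point stands: a part with only a few pixel pipes has very little headroom once multi-cycle shaders and overdraw are in play.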

In moving to NV40, there were lots of evolutionary fixes that really helped bring up the performance of the architecture. The most significant improvements were touched on earlier: quadrupling the pixel pipes while doubling the number of texture units (creating a 16x1 architecture), increasing the number of vertex shader units, and adding a second math unit and more registers to the pixel shaders to avoid scheduling issues. Further improvements in NV40 were made to help eliminate hidden pixels earlier in the pipeline at the vertex shaders (which avoids unnecessary work), and optimizations were made to the anisotropic filtering engine to match ATI's method of using approximated (rather than actual) distances.

In the end, it wasn't the architecture of the NV3x GPU that was flawed, but rather an accumulation of an unfortunate number of smaller issues that held the architecture back from its potential.

It is important to take away from this that NV40 is very evolutionary, and that NVIDIA was pushing very hard to make this next step an incredible leap in performance. In order to do so, they have had to squeeze 222 million transistors onto something in the neighborhood of a 300mm^2 die. By its very nature, graphics rendering is infinitely parallelizable, and NVIDIA has taken clear advantage of this, but it has certainly come at a cost. They needed the performance leap, and now they will be in a tough position when it comes to making money on this chip. Yields will be lower than they were for NV3x, but retail prices are not going to move beyond the $500 mark.

On a side note, ATI will most likely not be this aggressive with their new chip. The R300 performed very well in the current games we have seen, and ATI just won't need to push as hard as NVIDIA did this time around. Their architecture was already well suited to the current climate, and we can expect small refinements as well as an increase in the width of their pixel pipe (which also looks like it will be 16x1). Of course, ATI's performance will increase this time around, but just how much will have to remain a mystery for a couple more weeks.

18 Comments

  • WizzBall - Tuesday, May 4, 2004

    Nice article... sooo, when are we going to see the follow-up to this now that ATI came forward with their cards ? May I suggest 'What went wrong with NV4.x' ? :D
  • TrogdorJW - Tuesday, April 27, 2004

    Hey... anyone else having password issues, or is that something my company network admins f'ed up? I keep entering my password, but it doesn't get remembered. Ugh....
  • TrogdorJW - Tuesday, April 27, 2004

    Personally, I think it's all about the alliteration: "The Pixel Pipe Performance Picture!" :)

    Anyway, I imagine the moratorium will end once the R420 is released and we can talk about all four chips (R3xx, R4xx, NV3x, and NV4x), right? Yeah, that's it....

    On a side note, I wonder how much going from FP24 to FP32 would cost ATI in terms of transistors, not to mention the Shader Model 3.0 stuff. It's not that we really need it, but going from 24-bit to 32-bit color basically makes everything that operates on the data 25% larger in terms of transistor usage. Add in the other missing SM3.0 features, and I think a 160-180 million transistor R420 would suddenly become a 222 million transistor NV40. Basically, I think performance from the next generation cards will be about the same given the same GPU/VPU and RAM speeds. The only difference will be that NV4x has SM3.0 support, which looks to be a marketing point more than anything.
  • greendonuts3 - Thursday, April 22, 2004

    that's "Post-Mortem", as in "Post-Mortem Analysis" as in "Autopsy," not "Moratorium," as in "banzored."

    Thank you very much.
    And DON'T forget to hyphenate "Post-Mortem."
    "Post Mortem" means "dead letter" or some such.
  • ianmills - Thursday, April 22, 2004

    this article is crap. The real reason NV30 sucked is because nvidia slept with 3Dfx and got caught pixel herpes.
  • TauCeti - Wednesday, April 21, 2004

    Moratorium:
    > We have stopped -- we are done with NV3x analysis.

    Well, if you have _stopped_ writing NV3X-content, it is _not_ a moratorium.

    After a moratorium ends, you are obliged to continue with your _suspended_ activity.

    Besides that: good article ;)


  • DerekWilson - Wednesday, April 21, 2004

    #11:

    We have stopped -- we are done with NV3x analysis. I'll admit that the title could have been phrased a bit better, but we did mean moratorium... Of all the articles I have written I think I've gotten the highest volume of emails on this one -- to tell me that I don't know what moratorium means ;-)

    But on topic ... The big problem with an article like this (or any architectural or deeply technical article) is balancing depth, clarity, and length.

    If you guys have any suggestions on balancing these aspects in another way, please let us know. We want to write the articles that you want to read!
  • GomezAddams - Wednesday, April 21, 2004

    "Can it not be a moratorium on NV3x articles?"

    It will be when you stop writing them. ;)

    I thought it was a pretty decent article too. I am looking forward to one that compares ATI's next contestant on these issues.

    Personally, I can handle a lot more detail but I would prefer not to spend so much time reading articles. :)
  • Phiro - Wednesday, April 21, 2004

    My earlier outburst aside, it's a very good article.
  • DerekWilson - Wednesday, April 21, 2004

    Can it not be a moratorium on NV3x articles? I thought it was funny ;-)

    fp16 vs fp32 and image quality is a very tough nut to crack. there are a lot of things going on on the side of compiler optimizations that we really need to look into in order to understand what's going on.

    also, rotated vs. ordered grid has no performance difference. or it shouldn't anyway. we wanted to focus on performance in this article.
