Of Shader Details...


One of the complaints about the NV3x architecture was its less than desirable shader performance. Code had to be carefully optimized for the architecture, and even then, the improvements made to NVIDIA's shader compiler are the only reason NV3x can compete with ATI's offerings.

There were a handful of little things that added up to hurt shader performance on NV3x, and it seems that NVIDIA has learned a great deal from its past. One of the main things that hurt NVIDIA's performance was that the front end of the shader pipe had a texture unit and a math unit, and instruction order made a huge difference. To fix this problem, NVIDIA added an extra math unit to the front of the pixel pipelines so that math and texture instructions no longer need to be interleaved as precisely as they did on NV3x. The added benefit is that, with twice the math throughput, NV40's performance on math intensive shaders approaches a 2x gain per clock over NV3x (the ability to execute two instructions per clock per shader is called dual issue). The pipes can still issue a texture command alongside a math command rather than two math commands. This flexibility and added power make the architecture an even easier target for a compiler.
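To make that concrete, here is a deliberately simplified issue model, written in Python as our own illustration (it is not NVIDIA's scheduler, and the unit mixes are the only detail taken from the article):

```python
# Toy in-order issue model for a single shader pipe. Each "unit" is the
# set of instruction kinds it can accept, and each unit can issue at
# most one instruction per clock. Everything else is simplified.

NV3X_PIPE = [{"tex"}, {"math"}]           # one texture unit + one math unit
NV40_PIPE = [{"tex", "math"}, {"math"}]   # extra math unit; first slot is flexible

def clocks_to_issue(stream, pipe):
    pending = list(stream)
    clocks = 0
    while pending:
        clocks += 1
        for caps in pipe:                  # each unit may take the next in-order op
            if pending and pending[0] in caps:
                pending.pop(0)
    return clocks

interleaved = ["tex", "math"] * 4          # the ordering NV3x wants
math_heavy  = ["math"] * 8                 # common in longer shaders

print(clocks_to_issue(interleaved, NV3X_PIPE))  # 4 clocks
print(clocks_to_issue(math_heavy,  NV3X_PIPE))  # 8 clocks: one math unit bottlenecks
print(clocks_to_issue(math_heavy,  NV40_PIPE))  # 4 clocks: ~2x on math-bound shaders
```

The math-heavy stream is exactly where the second math unit approaches the 2x per-clock gain described above.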

And then there's always register pressure. As anyone who has ever programmed in x86 assembly will know, a shortage of usable registers (storage slots) makes it difficult to program efficiently. The Shader Model 3.0 specification bumps the number of temporary registers in the vertex shader up from 13 to 32 while still requiring at least 256 constant registers. In PS3.0, there are still 10 interpolated registers and 32 temp registers, but there are now 224 constant registers (up from 32). What this all means is that developers can work more efficiently and on larger sets of data, which is good for both the performance and the potential of shader programs.
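For reference, here are those register-file figures tabulated in a quick Python snippet (the numbers are the ones quoted above; the growth factors are just arithmetic):

```python
# Shader register limits as quoted in the article (SM2.0 -> SM3.0).
limits = {
    "VS temp registers":     (13, 32),
    "VS constant registers": (256, 256),  # 256 is still the required minimum
    "PS interpolated":       (10, 10),
    "PS temp registers":     (32, 32),
    "PS constant registers": (32, 224),   # the big jump for pixel shaders
}
for name, (sm2, sm3) in limits.items():
    print(f"{name:22s} {sm2:4d} -> {sm3:4d}  ({sm3 / sm2:.1f}x)")
```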

There are 50% more vertex shader units, bringing the total to six, and there are four times as many pixel pipelines (16) in NV40. The chip was already large, so it's not surprising that NVIDIA only doubled the number of texture units from 8 to 16, making this architecture 16x1 (whereas NV3x was 4x2). The architecture can handle 8x2 rendering for multitexture situations by using all 16 pixel shader units. In effect, pixel throughput for multitextured situations is doubled, while single textured pixel throughput is quadrupled. Of course, this doesn't mean performance is always doubled or quadrupled, just that those are the theoretical upper bounds on pixels per clock.
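A quick back-of-the-envelope model (our own, using only the pipe and texture-unit counts above) reproduces those upper bounds:

```python
# Theoretical pixels per clock: bounded by the number of pixel pipes
# and by how many texture fetches each pixel needs per pass. Upper
# bounds only; real throughput depends on the shader, bandwidth, etc.

def pixels_per_clock(pipes, tex_units, textures_per_pixel):
    return min(pipes, tex_units / textures_per_pixel)

# NV3x: 4 pipes x 2 texture units (4x2); NV40: 16 pipes x 1 (16x1)
for textures in (1, 2):
    nv3x = pixels_per_clock(4, 8, textures)
    nv40 = pixels_per_clock(16, 16, textures)
    print(f"{textures} texture(s)/pixel: NV3x {nv3x:.0f}, NV40 {nv40:.0f} -> {nv40 / nv3x:.0f}x")
```

Single texturing comes out at 4x (16 pixels per clock versus 4), and multitexturing at 2x (8 versus 4), matching the doubling and quadrupling described above.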

As if all this weren't enough, all the pixel pipes are dual issue (as are the vertex shader units) and co-issue capable. DirectX 9 co-issue is the ability to execute two operations on different components of the same pixel at the same time. This means that, under the right conditions, both math units in a pixel pipe can be active at once, with each running two instructions on different component data of a pixel. This gives a maximum of four instructions per clock per pixel pipe. Of course, how often this gets used remains to be seen.
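As an illustration of the pairing rule (our sketch of the DirectX co-issue concept, not NVIDIA's actual scheduler), co-issue hinges on the two operations touching disjoint components of a register:

```python
# DirectX-style co-issue: two ops may share a unit in one clock if they
# write disjoint components of the same register (classically .rgb + .a).
# Hedged sketch of the pairing rule only.

def can_coissue(mask_a, mask_b):
    return not (set(mask_a) & set(mask_b))

print(can_coissue("rgb", "a"))   # True: a color op pairs with an alpha op
print(can_coissue("rg", "gb"))   # False: both ops write the green channel

# With dual issue (2 units per pipe) and co-issue (2 ops per unit),
# the theoretical ceiling is 2 * 2 = 4 instructions/clock/pixel pipe.
```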

On the texturing side of the pixel pipelines, we can get up to 16x anisotropic filtering combined with trilinear filtering (128 taps). We will take a look at anisotropic filtering in more depth a little later.
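The "128 tap" figure is consistent with the usual bookkeeping (our derivation, not spelled out in the article): 16 anisotropic sample positions, each filtered trilinearly, meaning bilinear (4 taps) on two mip levels.

```python
# Where 128 taps plausibly comes from: our arithmetic, assuming the
# worst case of 16x aniso with full trilinear on every sample.
aniso_samples = 16   # up to 16x anisotropic
mip_levels    = 2    # trilinear blends two mip levels
bilinear_taps = 4    # each bilinear lookup reads 4 texels
print(aniso_samples * mip_levels * bilinear_taps)  # 128
```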

Theoretical maximums aside, all of this adds up to a lot of extra power beyond what NV3x offered. The design is cleaner and more refined, and it allows for much more flexibility and scalability. Since we "only" have 16 texture units coming out of the pipe, it will be hard to get more than a 2x per-clock gain out of NV40 in older games, but in newer games with single textured, pixel shaded rendering, we could see anywhere from a 4x to 8x gain per clock cycle over NV3x. Of course, NV38 is clocked about 18.8% faster than NV40. And performance isn't made by shaders alone: filtering, texturing, antialiasing, and plenty of other issues come into play. The only way we will be able to say how much faster NV40 is than NV38 is (you guessed it) game performance tests. Don't worry, we'll get there. But first, we need to check out the rest of the pipeline.
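To put the clock-speed caveat in rough numbers (our arithmetic, using only the figures above), a per-clock gain has to be discounted by NV38's clock advantage before it says anything about wall-clock speed:

```python
# Rough wall-clock discounting of per-clock gains; everything else
# (filtering, bandwidth, drivers) is deliberately ignored here.
NV38_CLOCK_ADVANTAGE = 1.188   # NV38 is clocked ~18.8% higher than NV40
for per_clock_gain in (2.0, 4.0, 8.0):
    print(f"{per_clock_gain:.0f}x per clock -> ~{per_clock_gain / NV38_CLOCK_ADVANTAGE:.1f}x overall")
```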
Comments

  • Pete - Monday, April 19, 2004 - link

    Shinei,

    I did not know that. </Johnny Carson>

    Derek,

    I think it'd be very helpful if you listed the game version (you know, which patches have been applied) and the map tested, for easier reference. I don't think you even mentioned the driver version used on each card, which is quite important given the constant updates and fixes.

    Something to think about ahead of the X800 deadline. :)
  • zakath - Friday, April 16, 2004 - link

    I've seen a lot of comments on the cost of these next-gen cards. This shouldn't surprise anyone...it has always been this way. The market for these new parts is small to begin with. The best thing the next gen does for the vast majority of us non-fanbois-who-have-to-have-the-bleeding-edge-part is that it brings *today's* cutting edge parts into the realm of affordability.
  • Serp86 - Friday, April 16, 2004 - link

    Bah! My almost 2 year old 9700pro is good enough for me now. I think I'll wait for NV50/R500....

    Also, a better investment for me is to get a new monitor, since the 17" one I have only supports 1280x1024, and I never run it that high since the 60Hz refresh rate makes me go crazy.
  • Wwhat - Friday, April 16, 2004 - link

    that was to brickster, neglected to mention that
  • Wwhat - Friday, April 16, 2004 - link

    Yes you are alone
  • ChronoReverse - Thursday, April 15, 2004 - link

    Ahem, this card has been tested by some people with a high-quality 350W power supply, and it was just fine.

    Considering that anyone who could afford a 6800U would have a good power supply (Thermaltake, Antec, or Enermax), it really doesn't matter.

    The 6800NU uses only one molex.
  • deathwalker - Thursday, April 15, 2004 - link

    Oh my god...$400, and you can't even put it in 75% of the systems on people's desks today without buying a new power supply at a cost of nearly another $100 for a quality PS...I think this just about has to push all the fanatics out there over the limit...no way in hell are you going to notice the performance improvement in a multiplayer game over a network...when does this madness stop?
  • Justsomeguy21 - Monday, November 29, 2021 - link

    LOL, this was too funny to read. Complaining about a bleeding edge graphics card costing $400 is utterly ridiculous in the year 2021 (almost 2022). You can barely get a midrange card for that price, and that's assuming you're paying MSRP and not scalper prices. 2004 was a great year for PC gaming; granted, today's smartphones can run circles around a GeForce 6800 Ultra, but for the time, PC hardware was being pushed to its limits, and games like Doom 3, Far Cry, and Half-Life 2 felt so next-gen that console games wouldn't catch up for a few years.
  • Shinei - Thursday, April 15, 2004 - link

    Pete, MP2 DOES use DX9 effects; mirrors are disabled unless you have a PS2.0-capable card. I'm not sure why, since AvP1 (a DX7 game) had mirrors, but it does nonetheless. I should know, since my Ti4200 (DX8.1 compatible) doesn't render mirrors as reflective even though I checked the box in the options menu to enable them...
    Besides, it does have some nice graphics that can bog a card down at higher resolutions/AA settings. I'd love to see what the game looks like at 2048x1536 with 4xAA and maxed AF with a triple buffer... or even a more comfortable 1600x1200 with the same graphical settings. :D
