Of Shader Details ...


One of the complaints about the NV3x architecture was its less than desirable shader performance. Code had to be well optimized for the architecture, and even then the improvements made to NVIDIA's shader compiler are the only reason NV3x can compete with ATI's offerings.

There were a handful of little things that added up to hurt shader performance on NV3x, and it seems that NVIDIA has learned a great deal from its past. One of the main things that hurt NVIDIA's performance was that the front end of the shader pipe had a texture unit and a math unit, and instruction order made a huge difference. To fix this problem, NVIDIA added an extra math unit to the front of the vertex pipelines so that math and texture instructions no longer need to be interleaved as precisely as they had to be in NV3x. The added benefit is that twice the math throughput in NV40 means the performance of math intensive shaders approaches a 2x gain per clock over NV3x (the ability to execute 2 instructions per clock per shader is called dual issue). Vertex units can still issue a texture command with a math command rather than two math commands. This flexibility and added power make the architecture even easier to target with a compiler.
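To make the instruction ordering point concrete, here is a toy model that counts clocks for an in-order stream of texture (T) and math (M) instructions. It is only a sketch under our own assumptions, not a description of NVIDIA's actual scheduler: we assume each clock can issue up to two adjacent instructions, that the older design can only pair one T with one M, and that the dual issue design can additionally pair two M instructions. Data dependencies are ignored entirely, and the function name is hypothetical.

```python
def clocks_needed(stream: str, dual_math: bool) -> int:
    """Count clocks for an in-order stream of 'T' (texture) and 'M' (math) ops.

    Toy assumption: each clock issues up to two adjacent ops.
    - dual_math=False: a pair must be one T plus one M (NV3x-like front end).
    - dual_math=True:  M+M pairs are allowed as well (NV40-like front end).
    """
    clocks = 0
    i = 0
    while i < len(stream):
        pair = stream[i:i + 2]
        can_pair = len(pair) == 2 and (
            set(pair) == {"T", "M"} or (dual_math and pair == "MM")
        )
        i += 2 if can_pair else 1  # issue two ops if they pair, else one
        clocks += 1
    return clocks


# A perfectly interleaved stream, a math-heavy stream, and pure math.
for stream in ("TMTMTMTM", "TMMMTMMM", "MMMMMMMM"):
    old = clocks_needed(stream, dual_math=False)
    new = clocks_needed(stream, dual_math=True)
    print(f"{stream}: {old} clocks without dual math, {new} with")
```

In this model the perfectly interleaved stream costs the same either way, while the pure math stream takes half as many clocks with the second math unit, which is the "approaches 2x" case described above.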

And then there's always register pressure. As anyone who has ever programmed in x86 assembly will know, having a shortage of usable registers (storage slots) makes it difficult to program efficiently. The specifications for shader model 3.0 bump the number of temporary registers up to 32 from 13 in the vertex shader while still requiring at least 256 constant registers. In PS3.0, there are still 10 interpolated registers and 32 temp registers, but now there are 224 constant registers (up from 32). What this all adds up to is that developers can work more efficiently and work on larger sets of data. This ends up being good for extending both the performance and the potential of shader programs.

There are 50% more vertex shader units, bringing the total to 6, and there are 4 times as many pixel pipelines (16 units) in NV40. The chip was already large, so it's not surprising that NVIDIA only doubled the number of texture units from 8 to 16, making this architecture 16x1 (whereas NV3x was 4x2). The architecture can handle 8x2 rendering for multitexture situations by using all 16 pixel shader units. In effect, the pixel shader throughput for multitextured situations is doubled, while single textured pixel throughput is quadrupled. Of course, this doesn't mean performance is always doubled or quadrupled, just that this is the upper bound on the theoretical maximum pixels per clock.
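The doubling and quadrupling figures fall out of simple arithmetic, sketched below. The model is our own simplification: a chip is described only by its pixel pipe count and total texture unit count (taken from the 4x2 and 16x1 figures above), and pixels per clock is capped by whichever runs out first. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeConfig:
    pixel_pipes: int    # pixels that can be worked on per clock
    texture_units: int  # total texture units across the chip


NV3X = PipeConfig(pixel_pipes=4, texture_units=8)    # 4x2
NV40 = PipeConfig(pixel_pipes=16, texture_units=16)  # 16x1


def peak_pixels_per_clock(cfg: PipeConfig, textures_per_pixel: int) -> int:
    """Theoretical upper bound: limited by pipes and by texture lookups."""
    return min(cfg.pixel_pipes, cfg.texture_units // textures_per_pixel)


for textures in (1, 2):
    old = peak_pixels_per_clock(NV3X, textures)
    new = peak_pixels_per_clock(NV40, textures)
    print(f"{textures} texture(s) per pixel: {old} -> {new} pixels/clock "
          f"({new // old}x)")
```

With one texture per pixel the bound goes from 4 to 16 pixels per clock, and with two textures per pixel it goes from 4 to 8, matching the 8x2 mode described above.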

As if all this weren't enough, all the pixel pipes are dual issue (as with the vertex shader units) and co-issue capable. DirectX 9 co-issue is the ability to execute two operations on different components of the same pixel at the same time. This means that (under the right conditions) both math units in a pixel pipe can be active at once, and two instructions can be run on different component data on a pixel in each unit. This gives a max of 4 instructions per clock per pixel pipe. Of course, how often this gets used remains to be seen.
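For a feel of what co-issue means, the sketch below applies two operations to disjoint components of a single RGBA pixel (here a 3+1 split between color and alpha) and works out the peak instruction count. It only illustrates the idea in software; the `coissue` helper and the constants are our own hypothetical names, and the 3+1 split is just one example of a valid pairing.

```python
MATH_UNITS_PER_PIPE = 2   # dual issue: two math units per pixel pipe
COISSUE_OPS_PER_UNIT = 2  # co-issue: two ops on different components
PIXEL_PIPES = 16


def peak_instructions_per_clock(pipes: int = 1) -> int:
    """Best-case instructions per clock for the given number of pipes."""
    return pipes * MATH_UNITS_PER_PIPE * COISSUE_OPS_PER_UNIT


def coissue(pixel, op_a, mask_a, op_b, mask_b):
    """Apply op_a and op_b to disjoint component sets of one pixel."""
    if set(mask_a) & set(mask_b):
        raise ValueError("co-issued ops must touch different components")
    out = list(pixel)
    for i in mask_a:
        out[i] = op_a(pixel[i])
    for i in mask_b:
        out[i] = op_b(pixel[i])
    return tuple(out)


# Halve the color (components 0-2) while inverting alpha (component 3).
result = coissue(
    (0.5, 0.25, 1.0, 1.0),
    lambda c: c * 0.5, (0, 1, 2),
    lambda a: 1.0 - a, (3,),
)
print(result)
print(peak_instructions_per_clock())             # per pixel pipe
print(peak_instructions_per_clock(PIXEL_PIPES))  # whole chip, best case
```

The per-pipe figure is the 4 instructions per clock quoted above; the whole-chip number is simply that multiplied by 16 pipes and is just as theoretical.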

On the texturing side of the pixel pipelines, we can get up to 16x anisotropic filtering with trilinear filtering (128 tap). We will take a look at anisotropic filtering in more depth a little later.
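The 128 tap figure is consistent with the usual way of counting texel reads, sketched below: a bilinear sample reads a 2x2 block of texels, trilinear filtering blends two mip levels, and 16x anisotropic filtering takes up to 16 such samples. This is our own back-of-the-envelope accounting rather than a statement about how the hardware actually fetches texels, and the names are hypothetical.

```python
BILINEAR_TAPS = 4         # 2x2 texel footprint on one mip level
TRILINEAR_MIP_LEVELS = 2  # trilinear blends two adjacent mip levels


def max_taps_per_pixel(aniso_degree: int, trilinear: bool = True) -> int:
    """Worst-case texel reads for one filtered texture lookup."""
    levels = TRILINEAR_MIP_LEVELS if trilinear else 1
    return aniso_degree * levels * BILINEAR_TAPS


print(max_taps_per_pixel(16))                   # 16x aniso + trilinear
print(max_taps_per_pixel(16, trilinear=False))  # 16x aniso + bilinear
print(max_taps_per_pixel(1))                    # plain trilinear
```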

Theoretical maximums aside, all this adds up to a lot of extra power beyond what NV3x offered. The design is cleaner and more refined, and allows for much more flexibility and scalability. Since we "only" have 16 texture units coming out of the pipe, on older games it will be hard to get more than 2x performance per clock with NV40, but for newer games with single textured and pixel shaded rendering, we could see anywhere from 4x to 8x performance gain per clock cycle when compared to NV3x. Of course, NV38 is clocked about 18.8% faster than NV40. And performance isn't made by shaders alone. Filtering, texturing, antialiasing, and lots of other issues come into play. The only way we will be able to say how much faster NV40 is than NV38 will be (you guessed it) game performance tests. Don't worry, we'll get there. But first we need to check out the rest of the pipeline.
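To see how much the clock speed difference eats into those per-clock numbers, the sketch below divides each theoretical per-clock gain by NV38's roughly 18.8% clock advantage. It assumes performance scales linearly with clock speed, which real games will not do, so treat the results as ceilings rather than predictions. The function name is hypothetical.

```python
NV38_CLOCK_ADVANTAGE = 1.188  # NV38 is clocked about 18.8% faster than NV40


def net_speedup(per_clock_gain: float,
                clock_ratio: float = NV38_CLOCK_ADVANTAGE) -> float:
    """Theoretical NV40 speedup over NV38 after accounting for clocks."""
    return per_clock_gain / clock_ratio


for gain in (2, 4, 8):
    print(f"{gain}x per clock -> about {net_speedup(gain):.2f}x overall")
```

Under that assumption, the 2x, 4x and 8x per-clock cases work out to roughly 1.68x, 3.37x and 6.73x.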

77 Comments

  • Reliant - Wednesday, April 14, 2004 - link

    Any ideas how the Non Ultra version will perform?
  • segagenesis - Wednesday, April 14, 2004 - link

    I can't agree with #45 more. People rush to judgement when it's no secret that ATI will be coming out with their goods very soon also. "Wow look this card is really fast!!! I can't believe it!" well this sounds like almost every other graphics card release from ATI or nVidia in the past. To me nVidia had better have come out with something good after their lackluster Geforce FX 5800 wasn't anything terribly special. I used to like nVidia a lot (heh my ti4600 still runs fine) but when it comes to looking for a new card, I'll pick whichever one is faster *and* has the features I want. If it wasn't for such turnoffs like the 2-slot design and now even 2 power connections required I'm not sure I am ready to spend $500 just yet...

    Sorry if I'm obtuse but if ATI comes out with a part that's either equal (note the key term there) in performance or maybe even slightly slower... I'd go for ATI and their better IQ that the Radeon 9700 series so impressed me on and made me wish for more out of my ti4600. That and a single slot/single power type design would probably put me in their boat.

    Fanboy ATI opinion? I've owned nVidia from the Riva TNT to the ti4600 and many in-between.
  • Lonyo - Wednesday, April 14, 2004 - link

    #42, the jump from the Ti4600 to the 9700Pro wasn't good for you? I would have thought finally playable AA/AF was quite a jump.
    Personally, it seems less of a jump than the 4600 -> 9700.


    And I will reserve judgement on how much of an accomplishment nVidia have made until I see what ATi release.
    If it's of similar power, but maybe has 1 molex, or is a single slot solution, they will have accomplished more.
    It's not just raw performance, we'll have to see how it all stacks up, and how long it takes to release the things!
  • ChronoReverse - Wednesday, April 14, 2004 - link

    Some site tested the 6800U on a 350W supply and it worked just fine.


    Myself, I think my Enermax 350W with its enhanced 12V rail will take it just fine as well.
  • Regs - Wednesday, April 14, 2004 - link

    Yeah, Nvidia did make one hell of an accomplishment. They just earned a lot of respect back from both fan clubs. You have to respect the development and research that went into this card and the end result turns out to be just as we anticipated if not more.

    I really don't know how anybody could pick a "club" when seeing hardware like this perform so well.

    Im hoping to see the same results from ATI.

    Just too bad they are some costly pieces of hardware ;)
  • araczynski - Wednesday, April 14, 2004 - link

    nice to FINALLY see a universally quantifiable performance increase from one generation to the next.

    but the important thing is how it competes with the x800 from ati, not against older cards.

    as for the power supply, i think the hardcore crowd that these are geared at already have more than enough power, and quite frankly i would be surprised if these wouldn't work fine on a solid 350W from a reputable source (i.e. not your 350W ps for $10 from some 'special' sale).

    They're being conservative knowing that many of the people have crappy power supplies and don't know better.
  • klah - Wednesday, April 14, 2004 - link

    "Anyone know when it ships to retail stores?"

    http://www.eetimes.com/semi/news/showArticle.jhtml...

    "GeForce 6800 Ultra and GeForce 6800 models, are currently shipping to add-in-card partners, OEMs, system builders and game developers. Retail graphics boards based on the GeForce 6800 models are scheduled for release in the next 45 days."
  • Jeff7181 - Wednesday, April 14, 2004 - link

    This has me a bit curious... maybe I didn't read close enough... but is this the 6800 or the 6800 Ultra?
  • saechaka - Wednesday, April 14, 2004 - link

    wow impressive. i really want one. wonder if it will run ok with my 380w powersupply
  • Cygni - Wednesday, April 14, 2004 - link

    Personally, im very impressed, and i havent had an Nvidia product in my main gaming rig since my Geforce256. The card may be huge, power hungry, hot, and loud (maybe), but that is some SERIOUS performance.

    How long has it been since Nvidia has had a top end card that DOUBLED the performance of the last top end card? Pretty awesome, I think. I dont have the money to pick one up, but hopefully the mid/low end gets some love from both ATI and Nvidia as well. The 9200/9600/5200/5600 dont really appeal to me... not enough of a performance leap over a $20 8500!
