Of Shader Details ...


One of the complaints about the NV3x architecture was its less than desirable shader performance. Code had to be well optimized for the architecture, and even then, the improvements made to NVIDIA's shader compiler are the only reason NV3x can compete with ATI's offerings.

There were a handful of little things that added up to hurt shader performance on NV3x, and it seems that NVIDIA has learned a great deal from its past. One of the main things that hurt NVIDIA's performance was that the front end of the shader pipe had a texture unit and a math unit, and instruction order made a huge difference. To fix this problem, NVIDIA added an extra math unit to the front of the vertex pipelines so that math and texture instructions no longer need to be interleaved as precisely as they had to be in NV3x. The added benefit is that twice the math throughput in NV40 means the performance of math-intensive shaders approaches a 2x gain per clock over NV3x (the ability to execute 2 instructions per clock per shader is called dual issue). Vertex units can still issue a texture command with a math command rather than two math commands. This flexibility and added power make NV40 even easier to target with a compiler.
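To make the dual issue argument concrete, here is a toy model of our own (not anything NVIDIA has published). It assumes an NV3x-like front end that can start one texture and one math instruction per clock, an NV40-like front end that can start either two math instructions or one texture plus one math instruction per clock, and a perfectly ordered instruction stream with no dependencies. The function names are hypothetical.

```python
import math


def clocks_nv3x_like(math_ops: int, tex_ops: int) -> int:
    """Assumed model: one texture slot and one math slot per clock."""
    return max(math_ops, tex_ops)


def clocks_nv40_like(math_ops: int, tex_ops: int) -> int:
    """Assumed model: two slots per clock, at most one of them a texture op."""
    total_ops = math_ops + tex_ops
    return max(tex_ops, math.ceil(total_ops / 2))


# A math-only shader is where dual issue pays off the most.
print(clocks_nv3x_like(8, 0), clocks_nv40_like(8, 0))  # 8 4
# A perfectly interleaved texture/math mix sees no gain in this model.
print(clocks_nv3x_like(4, 4), clocks_nv40_like(4, 4))  # 4 4
```

In this simplified picture the 2x figure is a ceiling that only math-heavy code approaches, which matches the per-clock claim above. Real shaders have dependencies and latencies that the model ignores.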

And then there's always register pressure. As anyone who has ever programmed in x86 assembly will know, having a shortage of usable registers (storage slots) makes it difficult to program efficiently. The specifications for shader model 3.0 bump the number of temporary registers up to 32 from 13 in the vertex shader while still requiring at least 256 constant registers. In PS3.0, there are still 10 interpolated registers and 32 temp registers, but now there are 224 constant registers (up from 32). What this all adds up to is that developers can work more efficiently and on larger sets of data. This ends up being good for extending both the performance and the potential of shader programs.

There are 50% more vertex shader units, bringing the total to 6, and there are 4 times as many pixel pipelines (16 units) in NV40. The chip was already large, so it's not surprising that NVIDIA only doubled the number of texture units from 8 to 16, making this architecture 16x1 (whereas NV3x was 4x2). The architecture can handle 8x2 rendering for multitexture situations by using all 16 pixel shader units. In effect, the pixel shader throughput for multitextured situations is doubled, while single textured pixel throughput is quadrupled. Of course, this doesn't mean performance is always doubled or quadrupled, just that this is the upper bound on the theoretical maximum pixels per clock.
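The doubled and quadrupled figures fall straight out of the pipeline configurations. The sketch below is our own back-of-the-envelope model, assuming that peak pixels per clock is limited either by the number of pixel pipes or by the number of texture units divided by the textures applied per pixel. The function name is hypothetical.

```python
def peak_pixels_per_clock(pipes: int, tex_per_pipe: int, textures_per_pixel: int) -> float:
    """Upper bound on pixels per clock for a pipes x tex_per_pipe design."""
    texture_units = pipes * tex_per_pipe
    # A pixel cannot finish faster than its textures can be applied,
    # and there can never be more pixels in flight than pipes.
    return min(pipes, texture_units / textures_per_pixel)


nv3x = (4, 2)    # 4x2, from the text above
nv40 = (16, 1)   # 16x1, from the text above

single = peak_pixels_per_clock(*nv40, 1) / peak_pixels_per_clock(*nv3x, 1)
dual = peak_pixels_per_clock(*nv40, 2) / peak_pixels_per_clock(*nv3x, 2)
print(single, dual)  # 4.0 2.0
```

With two textures per pixel the 16x1 design behaves like 8x2, which is where the 2x multitexture bound comes from.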

As if all this weren't enough, all the pixel pipes are dual issue (as with the vertex shader units) and co-issue capable. DirectX 9 co-issue is the ability to execute two operations on different components of the same pixel at the same time. This means that (under the right conditions) both math units in a pixel pipe can be active at once, and each unit can run two instructions on different component data of a pixel. This gives a max of 4 instructions per clock per pixel pipe. Of course, how often this gets used remains to be seen.
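The 4 instructions per clock figure is simply the product of the two mechanisms. As a rough illustration (the constant names are ours, and the chip-wide total is a theoretical ceiling that assumes every pipe hits the ideal case on every clock):

```python
MATH_UNITS_PER_PIPE = 2    # dual issue: two math units per pixel pipe
CO_ISSUE_OPS_PER_UNIT = 2  # co-issue: two ops on different components per unit
PIXEL_PIPES = 16           # NV40 pixel pipelines

per_pipe = MATH_UNITS_PER_PIPE * CO_ISSUE_OPS_PER_UNIT
chip_wide = per_pipe * PIXEL_PIPES
print(per_pipe, chip_wide)  # 4 64
```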

On the texturing side of the pixel pipelines, we can get up to 16x anisotropic filtering with trilinear filtering (128 tap). We will take a look at anisotropic filtering in more depth a little later.
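For readers wondering where 128 comes from, the usual accounting is sketched below. It assumes a bilinear sample reads 4 texels, trilinear blends bilinear samples from 2 mip levels, and 16x anisotropic filtering takes up to 16 trilinear probes along the axis of anisotropy. How NV40 actually schedules these fetches is not something this arithmetic tells us.

```python
TEXELS_PER_BILINEAR = 4   # 2x2 texel footprint
MIP_LEVELS_TRILINEAR = 2  # blend between two adjacent mip levels
MAX_ANISO_PROBES = 16     # 16x anisotropic filtering

taps_trilinear = TEXELS_PER_BILINEAR * MIP_LEVELS_TRILINEAR
taps_max = taps_trilinear * MAX_ANISO_PROBES
print(taps_trilinear, taps_max)  # 8 128
```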

Theoretical maximums aside, all this adds up to a lot of extra power beyond what NV3x offered. The design is cleaner and more refined, and allows for much more flexibility and scalability. Since we "only" have 16 texture units coming out of the pipe, on older games it will be hard to get more than 2x performance per clock with NV40, but for newer games with single textured and pixel shaded rendering, we could see anywhere from a 4x to 8x performance gain per clock cycle when compared to NV3x. Of course, NV38 is clocked about 18.8% faster than NV40. And performance isn't made by shaders alone. Filtering, texturing, antialiasing, and lots of other issues come into play. The only way we will be able to say how much faster NV40 is than NV38 will be (you guessed it) game performance tests. Don't worry, we'll get there. But first we need to check out the rest of the pipeline.
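To see what the clock speed deficit does to those per-clock numbers, here is a quick sketch. It assumes performance scales linearly with core clock and uses only the 18.8% figure quoted above, so treat the results as theoretical ceilings rather than predictions. The function name is ours.

```python
NV38_CLOCK_ADVANTAGE = 1.188  # NV38 is clocked about 18.8% faster than NV40


def effective_gain(per_clock_gain: float) -> float:
    """Scale a per-clock NV40 gain down by NV38's higher clock speed."""
    return per_clock_gain / NV38_CLOCK_ADVANTAGE


for gain in (2.0, 4.0, 8.0):
    print(f"{gain:.0f}x per clock -> about {effective_gain(gain):.2f}x overall")
```

Even the 2x per-clock case for older multitextured games would still leave NV40 comfortably ahead under this assumption, though only benchmarks can say how close real games get.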

77 Comments

  • Deanz79 - Wednesday, April 14, 2004 - link

    AWESOME!! I wouldnt mind borrowing the card for the night :P
  • TrogdorJW - Wednesday, April 14, 2004 - link

    There are a few things I take away from all the previews of the 6800 Ultra.

    One is that ATI is going to be hard pressed to actually top it. Both will have 16x1 designs, but I don't think ATI will have the 32x0 option, which might be important for games with lots of shadows. (I believe the ATI cards are going to be around 180 million transistors, which leads me to believe that they will not have quite as many features.) I also doubt that ATI will actually support fp32 this time around, which aces DX9.0c/PS3 support from them. That may or may not really matter.

    The next thing is sort of related to the first point: Nvidia now has more features than ATI, but there are still some bugs to work out. DX9 games that were optimized for NV3x seem to be dropping quality on the 6800U. Hopefully the fix to use fp32 instead of fp16 will be both easy and not result in a major performance drop. We'll have to wait and see, though. Other sites have shown quite a few areas that need driver revs, but that's nothing new. At least with NVidia, I feel confident their driver team will fix any major issues and probably increase performance a decent amount as well.

    I also agree with someone else that said the previews might be lower clocked than the final release. First, the RAM is spec'ed for 600 MHz, which makes it odd that they're running at 550 MHz. They may not hit 600, but 575 or maybe 585 seems likely (or at the very least that should be an OC'ing option). The core is currently at 400 MHz, and I think they might be able to bump that up a bit more, but 222 million transistors at .13 micron might not go much higher. We'll have to see what some of the shipping cards from GB, A-bit, Asus, etc. offer in terms of OC'ing headroom, as they might offer better cooling solutions.

    Related to the heat and clockspeed, I'm a little shocked at the heatsink/fan design. If they're going to all the trouble of having a huge HSF, I can't see any reason to not switch the direction it blows and have the Ultra version vent the hot air outside the case. Maybe noise was the reason, or component placement, but I would really prefer to have anything that size making use of external venting. It would be like having your power supply sucking air into the case instead of blowing out... Sure, it might cool the PS better, but the case temp would jump dramatically.

    My final thought is that it will be very interesting to see what sort of price and performance can be had from the regular 6800 cards, and even the 6800XT. I didn't think there would be a "soft-mod" option for Nvidia this round, but it appears I was wrong. Unless NVidia has some way of preventing this from being done. Regardless, if the 6800U is going to start at $500 and the 6800 will go for $300, we could be looking at a 6800XT for $200 or so. It should also have at least the performance of the 5950U, and most likely better.

    Incidentally, I'm betting the mid-range cards (i.e. 6500 or 6600 or whatever) will not really be that great, though, as they'll likely trim them down to 2 or 4 vertex pipelines and 4 or 8 pixel pipelines, so they'll end up looking like something inbetween the 5700U and the 5900XT. And don't look for help from ATI here, as the X300 and X600 look to be renamed 9600SE and 9600XT parts, respectively (a la the Radeon 9000 to 9200 line).
  • IamTHEsnake - Wednesday, April 14, 2004 - link

    Whoops! The Radeon 9800 xt only scored 6138 while NV40 scored 12350+ in 3DMark'03. That Ladies and Gentlemen is 2x as many points!
  • IamTHEsnake - Wednesday, April 14, 2004 - link

    Wow I read the review and all I can say is WoW. I read somewhere else that this card scored 12250+ in 3dMark'03 while the 9800 xt scored 8350 on the same system, same set-up. From one generation to the next 33% increase is not bad. not bad at all.


    Come on ATi! I'm rootin' for you!!!
  • Schadenfroh - Wednesday, April 14, 2004 - link

    what mobo and mobo drivers were used? i hear that the nforce2 provides an unfair performance advantage for nvidia, even tho the ati should run at the same speed as on a differant motherboard, nvidia just gets an extra boost
  • Warder45 - Wednesday, April 14, 2004 - link

    I want to see the multimedia bench's. Hopefully another article with AMD vs Intel.
  • AlexWade - Wednesday, April 14, 2004 - link

    The thing is freakin' huge! I'm willing to bet dollars-to-doughnuts that ATI's new card isn't the size of a football. Even if this huge beast tops in performance, the extra 20 pounds rules out LAN parties.

    I'll admit, the performance is great. But if ATI is smaller and performs near, or slightly below, then that is the one to buy.
  • Reflex - Wednesday, April 14, 2004 - link

    Personally I'll wait to see the budget line on these, I refuse to spend more than $200 on a video card. Chances are I'll end up going Ati however, the 2D video quality is just noticably better, and most of my time on my PC is spent reading, not gaming.

    Oh well, at least the gamers can be happy again. Too bad the AGP slot is not at the bottom of the motherboard, could build some interesting external vented cases if the card could stick that fan outside the case. ;)
  • Reliant - Wednesday, April 14, 2004 - link

    Any ideas how the Non Ultra version will perform?
