What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in a native DirectX 9 code path and do better, but not dramatically better, in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made to the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

The first change was to use partial-precision registers where appropriate. What does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can either hold one 32-bit floating point value or be used as two 16-bit floating point registers. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many usable registers as when you're running in 32-bit mode.
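As a rough illustration of the precision-for-capacity trade described above, the sketch below uses Python's `struct` module, with IEEE 754 half precision standing in for NV3x's 16-bit format (an assumption made for illustration; this is not a model of the actual hardware). It shows that two 16-bit values occupy the same space as one 32-bit value, and that the 16-bit copy of a number is less accurate.

```python
import struct

def round_trip(value, fmt):
    """Store a value in the given float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

# "<f" is a 32-bit float, "<e" is a 16-bit (half precision) float.
SLOT_BYTES = struct.calcsize("<f")   # size of one 32-bit register slot
HALF_BYTES = struct.calcsize("<e")   # size of one 16-bit value
halves_per_slot = SLOT_BYTES // HALF_BYTES

# Two 16-bit values packed together fill exactly one 32-bit slot.
packed_pair = struct.pack("<2e", 0.5, 0.25)

value = 0.1  # arbitrary sample value, chosen only for illustration
error_fp32 = abs(round_trip(value, "<f") - value)
error_fp16 = abs(round_trip(value, "<e") - value)

print(f"16-bit values per 32-bit slot: {halves_per_slot}")
print(f"fp32 rounding error: {error_fp32:.2e}")
print(f"fp16 rounding error: {error_fp16:.2e}")
```

The 16-bit error is several orders of magnitude larger, which is why partial precision is only used "where appropriate."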

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
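To make the register-pressure argument concrete, here is a toy model of what happens when a shader needs more temporaries than the register file can hold. The slot count and the number of live values are hypothetical numbers picked for illustration; they are not NV3x specifications.

```python
def spilled_values(live_values, physical_slots, bits):
    """Count live temporaries that do not fit in the register file.

    Each physical slot holds one 32-bit value or two 16-bit values.
    """
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    capacity = physical_slots * (32 // bits)
    return max(0, live_values - capacity)

PHYSICAL_SLOTS = 4   # hypothetical register file size
LIVE_VALUES = 6      # hypothetical number of temporaries a shader keeps live

spills_fp32 = spilled_values(LIVE_VALUES, PHYSICAL_SLOTS, bits=32)
spills_fp16 = spilled_values(LIVE_VALUES, PHYSICAL_SLOTS, bits=16)

print(f"32-bit mode: {spills_fp32} value(s) spill to slower storage")
print(f"16-bit mode: {spills_fp16} value(s) spill to slower storage")
```

In this made-up case the 32-bit shader has to push two values out to slower storage, while the same shader in 16-bit mode fits entirely in registers.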

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made to conserve die space. We're not here to criticize those engineers, but rather to explain NVIDIA's performance.

Next, Gabe listed the tradeoff between pixel shader instruction count and texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much closer to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means it has fewer functional units or is pickier about what it can execute and when is anyone's guess).
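The general idea of swapping arithmetic for a fetch can be sketched with a precomputed lookup table, which plays the role of a texture here. This is only an illustration of the technique; the specular exponent, the table size, and the function names are all hypothetical and are not taken from Valve's shaders.

```python
TABLE_SIZE = 256          # hypothetical "texture" width
SPECULAR_EXPONENT = 16    # hypothetical shading parameter

# Precompute x ** SPECULAR_EXPONENT for evenly spaced x in [0, 1].
LOOKUP = [(i / (TABLE_SIZE - 1)) ** SPECULAR_EXPONENT for i in range(TABLE_SIZE)]

def specular_computed(n_dot_h):
    """Arithmetic path: evaluate the power function directly."""
    return n_dot_h ** SPECULAR_EXPONENT

def specular_fetched(n_dot_h):
    """Fetch path: one nearest-entry table read instead of the arithmetic.

    Assumes n_dot_h is already clamped to [0, 1].
    """
    index = round(n_dot_h * (TABLE_SIZE - 1))
    return LOOKUP[index]

sample = 0.9
print(f"computed: {specular_computed(sample):.4f}")
print(f"fetched:  {specular_fetched(sample):.4f}")
```

The fetched result is only an approximation of the computed one, and every lookup costs memory bandwidth, which is exactly the resource the mixed mode path spends to save shader instructions.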

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.


111 Comments

  • Anonymous User - Friday, September 12, 2003 - link

    #61.. i take it YOU have the money to shell out for top of the line hardware ????????? i sure as hell don't, but like #42 said, " more widely used comp "

    i myself am running a 1700+ at 2400+ speeds, no way in hell am i gonna go spend the 930 bucks (in cdn funds) on a 3.2c P4, that's NOT inc the mobo and ram, and i'm also not gonna spend the 700 cdn on a barton 3200+ either, for the price of the above P4 chip i can get a whole decent comp, may not be able to run halflife at its fullest, but still, i'm not even interested in HL2, it's just not the kind of game i play, but if i was, what i typed above is still valid..


    anand... RUN THESE HL2 BENCHES ON HARDWARE THE AVERAGE PERSON CAN AFFORD !!!!!!!!!!!!!!!!!!!!!!!! not the spoiled rich kid crap .....
  • Anonymous User - Friday, September 12, 2003 - link

    #42 "...should have benchmarked on a more widely used computer like a 2400 or 2500+ AMD...":

    The use of 'outdated' hardware such as your 2400 AMD would have increased the possibility of CPU limitations taking over the benchmark. Historically all video card benchmarks have used the fastest (or near fastest) CPU available to ensure the GPU is able to operate in the best possible scenario. If you want to know how your 2400 will work with HL2, wait and buy it when it comes out.

    In reference to the 16/32 bit floating point shaders and how that applies to ATI's 24 bit shaders:

    It was my understanding that this quote was referencing the need for Nvidia to use its 32 bit shaders, as future support for its 16 bit shaders would not exist. I don't see this quote pertaining to ATI's 24 bit shaders as they meet the DX9 specs. The chance of future HL2 engine based games leaving ATI users out in the cold is somewhere between slim and none. For an example of how software vendors react to leaving out support for a particular line of video card, simply look at how much work Valve put into making Nvidia's cards work. If it was feasible for a software vendor to leave out support for an entire line like you are referring to (ATI in your inference) we would have had HL2 shipping by now (for ATI only though...).
  • Anonymous User - Friday, September 12, 2003 - link

    58, http://myweb.cableone.net/jrose/Jeremy/HL2.jpg
  • Anonymous User - Friday, September 12, 2003 - link

    Are pixel shader operations anti-aliased on current generation video cards? I ask because in the latest Half Life 2 technology demo movie, anti-aliasing is enabled. Everything looks smooth except for the specular highlights on the roof and other areas, which are still full of shimmering effects. Just seems a little sore on the eyes.
  • Anonymous User - Friday, September 12, 2003 - link

    An observation:

    Brian Burke = Iraqi Information Officer

    I mean this guy rode 3dfx into the dirt nap and he's providing the same great service to Nvidia.

    Note to self: Never buy anything from a company that has this guy spewing lies.
  • Anonymous User - Friday, September 12, 2003 - link

    OK, this article was great.

    For us freaks, can you do a supplement article. Do 1600x1200 benchmarks!!!

    Things will probably crawl, but it would be nice to know that this should be the worst case at this resolution when ATI and NVidia come out with next gen cards.

    Also, was any testing done to see if the benchmarks were CPU or GPU limited? Maybe use the CPU utilization monitor in Windows to see what the CPU thought. Maybe a 5.0 GHz processor down the road will solve some headaches. Doubtful, but maybe....
  • Anonymous User - Friday, September 12, 2003 - link

    What's really funny is that Maximum PC magazine built an $11000 "Dream Machine" using a GeForceFX 5900, and i can build a machine for less than $2000 and beat it using a 9800 pro.

    Long Live my 9500 pro!
  • Anonymous User - Friday, September 12, 2003 - link

    I can play Frozen Throne and I am doing so on a GeForce2MX LOL (on a P2@400mhz).
  • Anonymous User - Friday, September 12, 2003 - link

    look at my #46 posting - i know it's different engines, different API's, different driver revisions etc...
    but still it's interesting..

    enigma
  • Anonymous User - Friday, September 12, 2003 - link

    #52 different engines, different results. hl 2 is probably more shader limited than doom 3. The 9600pro has strong shader performance, which narrows the gap in shader limited situations such as hl 2.

    btw, where did you get those doom 3 results? Only doom 3 benches I know about are based off the old alpha or that invalid test from back when the nv35 was launched...
