What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in the native DirectX 9 code path, and only somewhat better in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made for the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

The first change is to use partial-precision registers where appropriate. What does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can hold either one 32-bit floating point value or two 16-bit floating point values. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many usable registers as when you're running in 32-bit mode.
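To make the register arithmetic concrete, here is a minimal Python sketch. The register count is a made-up number chosen for illustration, not a published NV3x figure, and the function names are our own. It also round-trips a value through 16-bit and 32-bit floats to show what "partial precision" gives up.

```python
import struct

# Hypothetical number of physical 32-bit registers; not an NV3x specification.
PHYSICAL_REGISTERS_32BIT = 4


def usable_registers(physical_32bit, precision_bits):
    """Each physical register holds one 32-bit value or two 16-bit values."""
    if precision_bits == 32:
        return physical_32bit
    if precision_bits == 16:
        return physical_32bit * 2
    raise ValueError("this sketch only models 16-bit and 32-bit modes")


def round_trip(value, fmt):
    """Store a Python float in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    for bits in (32, 16):
        count = usable_registers(PHYSICAL_REGISTERS_32BIT, bits)
        print(f"{bits}-bit mode: {count} usable registers")

    x = 0.1
    # "<f" is IEEE 754 single precision, "<e" is half precision.
    print("fp32 error:", abs(round_trip(x, "<f") - x))
    print("fp16 error:", abs(round_trip(x, "<e") - x))
```

The point of the sketch is the tradeoff: halving the precision doubles the number of values that fit in the register file, at the cost of a larger rounding error per value.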

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
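A toy cost model shows why running out of registers hurts. Everything here is an assumption for illustration: the shader length, the number of live values, the register counts and the spill penalty are invented, and real hardware schedules work in far more complicated ways.

```python
def spilled_values(live_values, registers):
    """Values that do not fit in registers and must live in slower memory."""
    return max(0, live_values - registers)


def estimated_cycles(instructions, live_values, registers, spill_penalty=20):
    """Very rough estimate: one cycle per instruction, plus one store and
    one load (each costing spill_penalty cycles) for every spilled value."""
    spills = spilled_values(live_values, registers)
    return instructions + spills * 2 * spill_penalty


if __name__ == "__main__":
    # Hypothetical shader: 30 instructions keeping 6 values live at once.
    print("32-bit mode, 4 registers:", estimated_cycles(30, 6, 4))
    print("16-bit mode, 8 registers:", estimated_cycles(30, 6, 8))
```

With these invented numbers, the same shader goes from 110 estimated cycles to 30 simply because all of its live values fit once the registers are split in two.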

The fact that performance increased when moving to partial-precision (16-bit) registers suggests that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made to conserve die space. We're not here to criticize those engineers, only to explain NVIDIA's performance.

Next, Gabe listed the tradeoff between pixel shader instruction count and texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much closer to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization suggests that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means it has fewer functional units or is pickier about what it can execute and when is anyone's guess).
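The general idea of swapping arithmetic for a memory fetch can be sketched in a few lines of Python. This is not Valve's shader code: the function being replaced (a power curve of the kind used for specular highlights), the table size and all names are our own assumptions. The "texture" is just a precomputed table that stands in for math the shader would otherwise do per pixel.

```python
TABLE_SIZE = 256   # hypothetical lookup "texture" resolution
EXPONENT = 16      # hypothetical specular exponent


def compute_directly(x):
    """Arithmetic path: evaluate the function with shader math."""
    return x ** EXPONENT


# Precompute once, the way a lookup texture would be baked ahead of time.
LOOKUP_TABLE = [(i / (TABLE_SIZE - 1)) ** EXPONENT for i in range(TABLE_SIZE)]


def fetch_from_table(x):
    """Texture path: one nearest-sample fetch instead of the arithmetic."""
    index = round(x * (TABLE_SIZE - 1))
    return LOOKUP_TABLE[index]


if __name__ == "__main__":
    for x in (0.25, 0.5, 0.9, 1.0):
        print(x, compute_directly(x), fetch_from_table(x))
```

The table version does less computation per lookup but pays for it in memory traffic and a small sampling error, which is the same kind of trade described above.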

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.

111 Comments


  • dvinnen - Friday, September 12, 2003 - link

    #31: I know what I said. DX9 doesn't require 32 bit. It's not in the spec, so you couldn't write a shader that uses more than 24-bit precision.
  • XPgeek - Friday, September 12, 2003 - link

    Well #26, if the next gen of games do need 32 bit precision, then the tides will once again be turned. and all these "my ATi is so faster than for nVidia" will have to just suck it up and buy another new card, whereas the GFFX's will still be plugging along. by then, who knows, maybe DX10 will support 32 bit precision on the nVidia cards better...
    btw, im still loading down my GF3 Ti500. so regardless, i will have crappy perf. but i also buy cards from the company i like, that being Gainward/Cardex nVidia based boards. no ATi for me, also no Intel for me. Why? bcuz its my choice. so it may be slower, whoopty-doo!

    for all i know, HL2 could run for crap on AMD CPUs as well. so i'll be in good shape then with my XP2400+ and GF3

    sorry, i know my opinions dont matter, but i put em here anyhow.

    buy what you like, dont just follow the herd... unless you like having your face in everyones ass.
  • Anonymous User - Friday, September 12, 2003 - link

    #28 Not 24bit, 32 bit.
  • Anonymous User - Friday, September 12, 2003 - link

    Yeah, like mentioned above, what about whether or not AA and AF were turned on in these tests? Do you talk about it somewhere in your article?

    I can't believe it's not mentioned since this site was the one that made a detailed (and excellent) presentation of the differences b/w ATI's and NVIDIA's AA and AF back in the day.

    Strange your benchmarks appear to be silent on the matter. I assume they were both turned off.
  • Anonymous User - Friday, September 12, 2003 - link

    >>thus need full 32-bit precision."<<

    Huh? Wha?

    This is an interesting can of worms. So in the future months time, if ATI stick to 24bit, or cannot develop 32 bit precision, the tables will have reversed on the current situation - but even moreso because there would not be a work around (Or optimization).

    Will ATI users in the future accuse Valve of sleeping with Nvidia because their cards cannot shade with 32-bit precision?

    Will Nvidia users claim that ATI users are "non-compliant with DirectX 9"? Will ATI users respond that 24-bit precision is the only acceptable DirectX 9 standard, and that Valve are traitors?

    Will Microsoft actually force manufacturers to bloody well wait and force them to follow the standard?

    And finally, who did shoot Colonel Mustard in the Dining Room?

    Questions, Questions.
  • dvinnen - Friday, September 12, 2003 - link

    #26: It means it can't cheat and use 16-bit registers to do it, and needs a full 24 bits. So it would waste the rest of the register.
  • Anonymous User - Friday, September 12, 2003 - link

    #26 That was in reference to the fx cards. They can do 16 or 32 bit precision. Ati cards do 24 bit precision, which is the dx 9 standard.

    24 bit is the dx 9 standard because it's "good enough." It's much faster than 32 bit, and much better looking than 16 bit. So 16 bit will wear out sooner. Of course, someday 24 bit won't be enough, either, but there's no way of knowing when that'll be.
  • Anonymous User - Friday, September 12, 2003 - link

    Valve says no benchmarks on Athlon 64! :-/
    Booo!

    Quote:
    http://www.tomshardware.com/business/20030911/inde...
    "Valve was able to heavily increase the performance of the NVIDIA cards with the optimized path but Valve warns that such optimizations won't be possible in future titles, because future shaders will be more complex and will thus need full 32-bit precision."

    The new ATI cards only have 24bit shaders!
    So would that make ALL current ATI cards without any way to run future Valve titles?

    Perhaps I do not understand the technology fully, can someone elaborate on this?
  • Anonymous User - Friday, September 12, 2003 - link

    I agree with #23 in terms of money making power the ATI/Valve combo is astounding. ATI's design is superior as we can see but the point is that ATI is going to get truckloads of money and recognition for this. Its a good day to have stock in ATI, lets all thank them for buying ArtX!
  • Anonymous User - Friday, September 12, 2003 - link

    I emailed gabe about my 9600 pro, but he didnt have to do all this just for me :D

    I love it.
