What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in a native DirectX 9 code path and do better, but not dramatically so, in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made to the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

So the first change was to use partial-precision registers where appropriate. What does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can either hold one 32-bit floating point value or be used as two 16-bit floating point registers. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many usable registers as when you're running in 32-bit mode.
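To make the register arithmetic concrete, here is a minimal Python sketch. The register count is a made-up number for illustration only (we don't know the real size of the NV3x register file), and the IEEE 754 half-precision layout that Python's struct module provides is used as a stand-in for NVIDIA's 16-bit format. It shows that two 16-bit floats fit in the space of one 32-bit float, and what that costs in precision.

```python
import struct

# Hypothetical register file size, chosen only for illustration;
# the real number of NV3x registers is not given in this article.
PHYSICAL_REGISTERS_32BIT = 4


def usable_registers(physical_32bit: int, bits: int) -> int:
    """Return how many values fit if each 32-bit slot holds 32 // bits values."""
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    return physical_32bit * (32 // bits)


def round_trip_half(x: float) -> float:
    """Store x as a 16-bit float and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_trip_single(x: float) -> float:
    """Store x as a 32-bit float and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# One 32-bit float and a pair of 16-bit floats occupy the same four bytes.
one_full = struct.pack("<f", 0.1)
two_halves = struct.pack("<2e", 0.1, 0.2)

if __name__ == "__main__":
    print("32-bit mode registers:", usable_registers(PHYSICAL_REGISTERS_32BIT, 32))
    print("16-bit mode registers:", usable_registers(PHYSICAL_REGISTERS_32BIT, 16))
    print("bytes used:", len(one_full), "vs", len(two_halves))
    print("0.1 as 16-bit:", round_trip_half(0.1))
    print("0.1 as 32-bit:", round_trip_single(0.1))
```

The 16-bit value lands noticeably further from 0.1 than the 32-bit one, which is the precision being given up in exchange for the extra registers.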

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
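A toy cost model can illustrate why running out of registers hurts. This is only a sketch of the effect described above, not a description of how NV3x actually schedules work: the function names, the instruction count, the number of live values and the per-spill penalty are all hypothetical.

```python
def spilled_values(live_values: int, registers: int) -> int:
    """Number of live values that do not fit in the register file."""
    return max(0, live_values - registers)


def estimated_cycles(instructions: int, live_values: int,
                     registers: int, spill_penalty: int) -> int:
    """Toy estimate: one cycle per instruction plus a fixed penalty per spill."""
    return instructions + spilled_values(live_values, registers) * spill_penalty


if __name__ == "__main__":
    # Same hypothetical shader, run with 4 registers (32-bit mode)
    # and with 8 registers (16-bit mode).
    print("32-bit mode:", estimated_cycles(20, 6, 4, 10))
    print("16-bit mode:", estimated_cycles(20, 6, 8, 10))
```

With these made-up numbers the shader needs six live values, so the four-register case pays for two spills while the eight-register case pays for none.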

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made to conserve die space. We're not here to criticize those engineers, only to explain NVIDIA's performance.

Next, Gabe listed the tradeoff in pixel shader instruction count for texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much more similar to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means that it has fewer functional units or it is more picky about what and when it can execute is anyone's guess).
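The general idea of swapping math for a memory lookup can be sketched in a few lines of Python. This is not Valve's shader code; the specular-style power function, the exponent and the table size are all assumptions picked for illustration. One function computes the result arithmetically, the other reads it from a precomputed table that plays the role of a small texture.

```python
TABLE_SIZE = 256   # hypothetical "texture" width
EXPONENT = 16      # hypothetical specular exponent

# Precomputed once, like a 1D texture holding x**EXPONENT for x in [0, 1].
POW_TABLE = [(i / (TABLE_SIZE - 1)) ** EXPONENT for i in range(TABLE_SIZE)]


def clamp01(x: float) -> float:
    """Clamp x to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def specular_by_math(n_dot_h: float) -> float:
    """Arithmetic path: costs instructions, needs no memory access."""
    return clamp01(n_dot_h) ** EXPONENT


def specular_by_lookup(n_dot_h: float) -> float:
    """Lookup path: one nearest-neighbour 'texture fetch' replaces the math."""
    index = round(clamp01(n_dot_h) * (TABLE_SIZE - 1))
    return POW_TABLE[index]


if __name__ == "__main__":
    for value in (0.25, 0.5, 0.9, 1.0):
        print(value, specular_by_math(value), specular_by_lookup(value))
```

The two paths give nearly the same answer; the lookup version simply moves the cost from the functional units to memory bandwidth, which is the direction of the tradeoff described above.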

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.

111 Comments

  • Anonymous User - Friday, September 12, 2003 - link

    I think the insinuation is clear from that nVidia email posted and Gabe's comments. Valve believed nVidia was trying to "cheat" with their D50s by intentionally having fog disabled etc. Rather than toss around accusations, it was simpler for them to just require that the benchmarks at this point be run with released drivers and avoid the issue of currently bugged drivers with non-working features, whether the reason was accidental or intentional.

    Considering that the FXes fared poorly with 3DMark and again with HL2 - both using DX9 implementations, I think it might be fair to say that the FXes aren't going to do too much better in the future. Especially considering the way they reacted to 3DMark 03 - fighting the benchmark rather than releasing drivers to remedy the performance issue.

    I'd like to see how the FXes do running HL2 with pure DX8 rather than DX9 or a hybrid, as I think most people owning current nVidia cards are going to have to go that route to achieve the framerates desired.
  • Anonymous User - Friday, September 12, 2003 - link

    I don't see how the minimum requirements set by Valve are going to play this game. 700MHz and a TNT2. The FX5200s could barely keep up.
  • Anonymous User - Friday, September 12, 2003 - link

    #68: 33 fps * 1.73 = 57.09 fps (add the one to account for the initial 33 score).

    This doesn't quite work out based on the 57.3 score of the 9800 Pro so corrected score on the Nvidia was probably closer to this:
    57.3 / 1.73 = 33.12 fps

    #69: I would definitely try to find a 9600 Pro before I bought a 9500 Pro. The 9600 fully supports DX9 whereas the 9500 does not.
  • Anonymous User - Friday, September 12, 2003 - link

    Guess it's time to upgrade...
    Now where's my &*&%%'n wallet!!


    Wonder where I'll be able to find a R9500Pro (Sapphire)
  • Anonymous User - Friday, September 12, 2003 - link

    The performance increase between the FX5900 and Rad9800Pro is not 73%. Do the math correctly and it turns into 36.5% lead. The article should be revised.
  • atlr - Friday, September 12, 2003 - link

    If anyone sees benchmarks for 1 GHz computers, please post a URL. Thanks.
  • WooDaddy - Friday, September 12, 2003 - link

    Hmmm... I understand that Nvidia would be upset. But it's not like ATI is using a special setting to run faster. They're using DX9.. Nvidia needs to get on the ball. I'm going to have to upgrade my video card since I have a now obsolete Ti4200 GF4.

    GET IT TOGETHER NVIDIA!!! DON'T MAKE ME BUY ATI!

    I might just sell my Nvidia stock while I'm at it. HL2 is a big mover and I believe can make or break the card on the consumer side.
  • Anonymous User - Friday, September 12, 2003 - link

    I had just ordered a 5600 Ultra thinking it would be a great card. It's going back.

    If I can get full DX 9 performance with a 9600 Pro for around $180, and that card's performance is better than the 5900 Ultra - then I'm game.

    I bought a TNT when Nvidia was making a name for itself. I bought a GF2 GTS when Nvidia was destroying 3dfx - now Nvidia seems to have dropped the ball on DX9. I want to play HL2 on whatever card I buy. A 5600 Ultra won't seem to cut it. I know the 50's are out there, but I've seen the Aquamark comparison with the 45's and 50's and I'm not impressed.

    I really wanted to buy Nvidia, but I cannot afford it.

  • Anonymous User - Friday, September 12, 2003 - link

    #62: I do have the money but I choose to spend it elsewhere. FYI: I spend $164 US on my 2.4C and I'm running speeds faster than the system used for this benchmark.

    "The Dell PCs we used were configured with Pentium 4 3.0C processors on 875P based motherboards with 1GB of memory. We were running Windows XP without any special modifications to the OS or other changes to the system."

    Anand was using a single system to show what HL2 performance would be on video cards available on the market today. If he were to run benchmarks on different CPUs he would have to spend a tremendous amount more time doing so. In the interest of getting the info out as soon as possible, he limited himself to a single system.

    I would deduce from the performance numbers of HL2 in Anand's benchmarks that unless you have a 9600 Pro/9800 Pro, your AMD will not be able to effectively run HL2.
  • Anonymous User - Friday, September 12, 2003 - link

    Woohoooo!!!
    My ATI 9500@9700 128MB with 8 pixel pipelines and 256bit access beats the crap out of any FX.
    And it only cost me 190euros/190dollars

    Back to the drawing board NVidia.
    Muahahahah!!!