What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA's hardware perform so poorly in the native DirectX 9 code path, yet do better, though not dramatically better, in its own special "mixed mode"? To understand why, we have to look at the modifications Valve made to the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards; read on to learn more)
- Case-by-case shader code restructuring

So the first change that was made is to use partial-precision registers where appropriate. Well, what does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs present in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can hold either one 32-bit floating point value or two 16-bit floating point values. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many usable registers as when you're running in 32-bit mode.
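To make the register-splitting idea concrete, here is a minimal sketch in modern CUDA (a stand-in chosen purely for illustration; the NV3x shader model predates CUDA entirely, and the kernel and buffer names below are hypothetical). One 32-bit register can hold either a single float or a packed pair of 16-bit values, and the packed pair is processed with a single instruction:

```
#include <cuda_fp16.h>

// Sketch only: the "one 32-bit value or two packed 16-bit values per register"
// idea, expressed with CUDA's __half2 type. Compile with nvcc -arch=sm_53 or
// newer; a host-side launcher is omitted for brevity.
__global__ void scaleBoth(const float* in32, float* out32,
                          const __half2* in16, __half2* out16,
                          float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Full precision: one 32-bit value occupies a whole register.
    out32[i] = in32[i] * k;

    // Partial precision: two 16-bit values packed into one 32-bit register,
    // scaled by a single packed multiply.
    out16[i] = __hmul2(in16[i], __float2half2_rn(k));
}
```

The point isn't the arithmetic itself but the packing: the partial-precision path pushes two values through the same register budget as one full-precision value.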

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? The functional units (FPUs in this case) must then swap data in and out of the graphics card's local memory (or caches), which takes significantly longer, causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
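There's no way to poke at NV3x register allocation directly, but the same stall mechanism is easy to observe with a modern NVIDIA toolchain. As a hedged sketch (assuming an installed CUDA compiler; the kernel below is purely hypothetical), capping the register budget can force the compiler to spill live values to off-chip local memory, and ptxas reports the spill traffic:

```
// Sketch: observing register pressure and spills with the CUDA toolchain.
//
//   nvcc -Xptxas -v spill_demo.cu                   (reports registers used)
//   nvcc -Xptxas -v -maxrregcount=16 spill_demo.cu  (caps the register budget;
//                                                    if the cap is below what
//                                                    the kernel needs, ptxas
//                                                    reports spill stores/loads)
//
// Spilled values live in local (off-chip) memory, the modern analogue of the
// stalls described above when a shader runs out of physical registers.
// Assumes `in` holds 16 * n floats, laid out plane by plane.
__global__ void blend16(const float* __restrict__ in, float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Keep sixteen values live at the same time so that many registers
    // are needed at once.
    float v[16];
    for (int j = 0; j < 16; ++j)
        v[j] = in[i + j * n] * (1.0f + 0.1f * j);

    float acc = 0.0f;
    for (int j = 0; j < 16; ++j)
        acc += v[j] * v[15 - j];

    out[i] = acc;
}
```

Whether the capped build actually spills depends on how many registers the compiler wants for this particular kernel, but the report makes the register-versus-memory tradeoff visible.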

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff NVIDIA's engineers made to conserve die space; we're not here to criticize them, but rather to explain NVIDIA's performance.

Next, Gabe listed the tradeoff of pixel shader instruction count for texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much closer to the pre-DX9 method of game development, where games were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means it has fewer functional units or is pickier about what it can execute and when is anyone's guess).
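The same trade shows up in any GPU code today: compute a value with arithmetic every time, or precompute it once and fetch it from memory. Here is a minimal CUDA sketch of the idea (the power-curve falloff, table size, and kernel names are all hypothetical illustrations, not anything from Valve's actual shaders):

```
#define TABLE_SIZE 256

// Option A: pure arithmetic. Burns ALU cycles per pixel, no extra memory traffic.
__global__ void falloffCompute(const float* dist, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Per-pixel pow: relatively expensive math.
    out[i] = powf(__saturatef(1.0f - dist[i]), 2.2f);
}

// Option B: precomputed lookup table. Trades the math for a memory fetch,
// the moral equivalent of swapping shader instructions for texture reads.
// Assumes the host filled table[k] with powf(1.0f - k / (TABLE_SIZE - 1.0f), 2.2f).
__global__ void falloffLookup(const float* dist, float* out,
                              const float* __restrict__ table, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int idx = (int)(__saturatef(dist[i]) * (TABLE_SIZE - 1));
    out[i] = table[idx];  // one read instead of the pow
}
```

Which side wins depends on the hardware: a chip with shader math to spare prefers Option A, while one that is computation-bound, as the NV3x appears to be here, benefits from Option B.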

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.

Comments

  • atlr - Friday, September 12, 2003

    Quote from http://www.anandtech.com/video/showdoc.html?i=1863...
    "The Radeon 9600 Pro manages to come within 4% of NVIDIA's flagship, not bad for a ~$100 card."

    Anyone know where a ~$100 9600 Pro is sold? I thought this was a ~$200 card.
  • Anonymous User - Friday, September 12, 2003

    Time to load up on ATI stock :)
  • Anonymous User - Friday, September 12, 2003

    Quoted from Nvidia.com:

    "Microsoft® DirectX® 9.0 Optimizations and Support
    Ensures the best performance and application compatibility for all DirectX 9 applications."

    Oops, not this time around...
  • Anonymous User - Friday, September 12, 2003

    #74 - No, D3 isn't a DX9 game, it's OpenGL. What it shows is that the FX series isn't bad - they just don't do so well under DX9. If you stick primarily to OpenGL games and run your DX games under the 8.1 spec, the FX should perform fine. It's the DX9 code that the FXes seem to really struggle with.
  • Anonymous User - Friday, September 12, 2003

    #74: I have commonly heard this blamed on a bug in an older release of the CATALYST drivers that was used in the Doom3 benchmark. It is my understanding that if the benchmark were repeated with the 3.7 (RELEASED) drivers, the ATI cards would perform much better.

    #75: I believe this goes back to prior instances where Nvidia has claimed that some new driver would increase performance dramatically in order to get it into a benchmark, and then never released the driver for public use. If this happened, the benchmark would be unreliable, as it could not be repeated by an end-user with similar results.

    Also, the Det50 drivers from Nvidia do not have a working fog system. It has been hinted that this could be intentional to improve performance. Either way, I saw a benchmark today (forgot where) that compared the Det45's to the beta Det50's. The 50's did improve performance in 3DMark03, but nowhere near the 73% gap in performance seen in HL2.
  • Anonymous User - Friday, September 12, 2003

    Because Gabe controls how representative the HL2 beta is of the final HL2 product, but he cannot control how representative the Nvidia Det50 beta is of the final Det50s.

    And besides that, there have been rumours of "optimisations" in the new Det50s.
  • Anonymous User - Friday, September 12, 2003

    How is it that Gabe can recommend not running benchmarks on a publicly unavailable driver or hardware, yet the game itself is unavailable? Seems a little hypocritical...
  • Anonymous User - Friday, September 12, 2003

    I didn't have time to look into this, but can someone enlighten me as to why the 5900 Ultra outperformed the 9800 PRO in the Doom 3 benchmarks we saw a while back... is that not using DX9 as well? If I am way off the mark here or am even wrong on which outperformed which, go easy on the flames!

    Thanks
  • Anonymous User - Friday, September 12, 2003

    "Not true, the 9500 is a true DX9 part. The 9600 does have faster shader units though."

    My bad, should have looked at ATI first. I guess I'm thinking about the 8500. Either way, I would still go 9600 Pro, especially given that it is cheaper than a 9500 non-pro.
  • Anonymous User - Friday, September 12, 2003

    "The 9600 fully supports DX9 whereas the 9500 does not."

    Not true, the 9500 is a true DX9 part. The 9600 does have faster shader units though.
