What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in the native DirectX 9 code path and do better, though not dramatically better, in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made to the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

So the first change that was made is to use partial-precision registers where appropriate. Well, what does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can either hold one 32-bit floating point value or be used as two 16-bit floating point registers. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many usable registers as when you're running in 32-bit mode.
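To make the register arithmetic concrete, here is a minimal Python sketch of the packing described above. It is a toy model, not a description of the actual NV3x register file: the slot count in the usage note is a made-up number, the function names are ours, and IEEE half precision (via the standard `struct` module's `e` format) is used only as a stand-in for NVIDIA's 16-bit format.

```python
import struct

SLOT_BITS = 32  # each physical register slot is 32 bits wide, per the text


def logical_registers(physical_slots: int, value_bits: int) -> int:
    """Number of values of the given width that fit in the register file."""
    if value_bits not in (16, 32):
        raise ValueError("this toy model only handles 16- or 32-bit floats")
    # One 32-bit value per slot, or two 16-bit values per slot.
    return physical_slots * (SLOT_BITS // value_bits)


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]
```

With a hypothetical file of 4 slots, `logical_registers(4, 32)` gives 4 registers and `logical_registers(4, 16)` gives 8. The cost of the extra registers is precision: `round_to_fp16(0.1)` lands noticeably further from 0.1 than `round_to_fp32(0.1)` does.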

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of registers to which your pixel shader FPUs have access. What happens if you run out of registers? Once the registers are exhausted, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes significantly longer and causes stalls in the graphics pipeline or underutilization of the chip's processing power.
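The spill scenario can be sketched the same way. The function below is purely illustrative: it assumes a hypothetical number of 32-bit register slots and a shader that keeps a given number of temporary values live at once, and it counts how many of those values would have to be held outside the register file. Real hardware and real shader compilers are considerably more complicated than this.

```python
def spilled_values(live_values: int, physical_slots: int, value_bits: int) -> int:
    """Values that do not fit in registers and must live in slower memory."""
    per_slot = 32 // value_bits           # 1 value per slot at 32-bit, 2 at 16-bit
    capacity = physical_slots * per_slot  # total values the register file can hold
    return max(0, live_values - capacity)


# Hypothetical shader with 6 live temporaries on a 4-slot register file.
print(spilled_values(6, 4, 32))  # 2 values spill at full precision
print(spilled_values(6, 4, 16))  # 0 values spill at partial precision
```

In this toy case, the same shader that spills two values at 32-bit precision fits entirely in registers at 16-bit precision, which is the kind of effect that would explain a speedup from partial precision.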

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made to conserve die space. We're not here to criticize NVIDIA's engineers, though, but rather to explain NVIDIA's performance.

Next, Gabe listed trading pixel shader instruction count for texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much more similar to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means that it has fewer functional units or it is pickier about what and when it can execute is anyone's guess).
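As an illustration of what trading instructions for fetches can look like, the Python sketch below replaces a per-pixel power computation with a read from a precomputed table, which is the role a lookup texture plays on a GPU. We don't know which shaders Valve actually changed or how; the specular-style power function, the exponent, the table size, and all names here are assumptions chosen only to show the general technique.

```python
TABLE_SIZE = 256   # assumed resolution of the lookup "texture"
EXPONENT = 16.0    # assumed specular exponent

# Built once up front, like baking a 1D texture.
POW_TABLE = [(i / (TABLE_SIZE - 1)) ** EXPONENT for i in range(TABLE_SIZE)]


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def specular_computed(n_dot_h: float) -> float:
    """Arithmetic path: evaluate the power function for every pixel."""
    return clamp01(n_dot_h) ** EXPONENT


def specular_fetched(n_dot_h: float) -> float:
    """Fetch path: one nearest-sample table read instead of the arithmetic."""
    index = round(clamp01(n_dot_h) * (TABLE_SIZE - 1))
    return POW_TABLE[index]
```

The fetched version costs a memory read and some accuracy (the result is quantized to the table's resolution) but removes the arithmetic from the per-pixel work, which is the direction of the tradeoff described above.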

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.


111 Comments


  • Anonymous User - Friday, September 12, 2003 - link

    another thing i just noticed looking at the doom 3 and hl2 benchies.

    take a look at the performance of 9800pro and 9600pro...

    in hl2, the 9800pro is about 27% ahead of the 9600pro, in doom 3 the 9800pro is near 50% faster than the 9600pro. the whole thing just feels weird.

    enigma
  • Anonymous User - Friday, September 12, 2003 - link

    I'm surprised that Anand mentioned nothing about the comparisons between 4x2 and 8x1 pipelines? Does he even know that MS is working to included paired textures with simutainious wait states for the nV arcitexture? You see the DX9 SDK was developed thinking only one path and since each texture has a defined FIFO during the pass the second pipe in the nV is dormant until the first pipe FIFO operation is complete, with paired textures in the pipe using syncronus wait states this 'problem' will be greatly relieved.
  • Anonymous User - Friday, September 12, 2003 - link

    its fake.... HL2 test are not ready today , great fake Anandtech :)
  • rogerw99 - Friday, September 12, 2003 - link

    #28
    Ooo Ooo Ooo... I know the answer to that one.
    It was Mrs. White, but it wasn't with the gun, it was the lead pipe.
  • Anonymous User - Friday, September 12, 2003 - link

    ATI The Way It Should Be Played
  • Anonymous User - Friday, September 12, 2003 - link

    Quote: 'So why is it that in the age of incredibly fast, absurdly powerful DirectX 9 hardware do we find it necessary to bicker about everything but the hardware? Because, for the most part, we've had absolutely nothing better to do with this hardware.'

    Don't we? Wrong!

    http://www.cs.virginia.edu/~gfx/pubs/multigridGPU/

    ;)
  • Anonymous User - Friday, September 12, 2003 - link

    one thing that i think is kinda interesting. check out this benchmark hardocp did - fx5900 ultra vs. radeon 9800 pro in doom 3 (with help from id software).

    http://www.hardocp.com/article.html?art=NDc0LDE=

    after reading this, read carmack's Jan 03 .plan, where he states that under the default openGL codepath, the fx architecture is about half as fast as the r300 - something that is pretty much resembled in the hl2 benchmarks. furthermore he states that using the default path the r300 is clearly superior (+100%), but when converting to vendor-specific codepaths, the fx series is the clear winner.

    conclusions? none, but some possibilities
    .) ati is better in directx, nvidia in opengl
    .) id can actually code, valve cannot
    .) and your usual conspiracy theories, feel free to use one you specifically like

    bottom line. neither ati nor nvidia cards are the "right ones" at the moment, wait for the next generation of video cards and upgrade THEN.

    enigma
  • Anonymous User - Friday, September 12, 2003 - link

    I'm so glad i converted to Ati, i have never regret it & now it feels even better. Ati rules
  • notoriousformula - Friday, September 12, 2003 - link

    i'm sure Nvidia will strike back.. prolly with DOOM III..well till then i'll enjoy my little army of ATI cards: ATI 9800NP>PRO, ATI 9700, ATI 9600PRO :P..long live ATI!!! :D
  • Anonymous User - Friday, September 12, 2003 - link

    Anand should have benchmarked on a more widely used computer like a 2400 or 2500+ AMD. Who here has the money to buy a p4 3Gb 8000mhz FSB cpu?
