What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in a native DirectX 9 code path and do better, but not dramatically better, in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made to the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

So the first change that was made is to use partial-precision registers where appropriate. Well, what does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the actual FPUs that are present in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can either hold one 32-bit floating point value or be used as two 16-bit floating point registers. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many usable registers as when you're running in 32-bit mode.
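To make the register arithmetic concrete, here is a small Python sketch. It is only an illustration under stated assumptions: `PHYSICAL_SLOTS` is a made-up number (we don't know the real size of the NV3x register file), and IEEE 754 half precision is used as a stand-in for the chip's 16-bit format.

```python
import struct

# Hypothetical register file size, chosen only for illustration.
PHYSICAL_SLOTS = 4


def round_to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def usable_registers(physical_slots: int, bits: int) -> int:
    """Each physical slot holds one 32-bit value or two 16-bit values."""
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    return physical_slots * (32 // bits)


value = 0.1
error_fp16 = abs(round_to_fp16(value) - value)
error_fp32 = abs(round_to_fp32(value) - value)

print("registers at 32-bit:", usable_registers(PHYSICAL_SLOTS, 32))
print("registers at 16-bit:", usable_registers(PHYSICAL_SLOTS, 16))
print("rounding error at 16-bit:", error_fp16)
print("rounding error at 32-bit:", error_fp32)
```

The point of the sketch is the tradeoff itself: halving the width doubles the number of values that fit, at the cost of a larger rounding error per value.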

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
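A toy cost model shows why running out of registers hurts so much. This is a sketch of the reasoning above, not a description of how NV3x actually schedules work: the function `shader_cycles`, the spill penalty and the register counts are all hypothetical values picked for illustration.

```python
def shader_cycles(live_values: int, registers: int, instructions: int,
                  spill_penalty: int = 20) -> int:
    """Estimate cycles for one shader invocation in a toy model.

    Every value that does not fit in a register is assumed to cost
    `spill_penalty` extra cycles for the round trip to slower storage.
    """
    spilled = max(0, live_values - registers)
    return instructions + spilled * spill_penalty


# Hypothetical shader: 6 live values, 30 arithmetic instructions.
cycles_with_16bit = shader_cycles(live_values=6, registers=8, instructions=30)
cycles_with_32bit = shader_cycles(live_values=6, registers=4, instructions=30)

print("16-bit mode (8 registers):", cycles_with_16bit)
print("32-bit mode (4 registers):", cycles_with_32bit)
```

In this model the shader does exactly the same amount of math in both modes; the only difference is how many values have to leave the register file.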

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made in order to conserve die space. We're not here to criticize those engineers, but rather to explain NVIDIA's performance.

Next, Gabe listed the tradeoff between pixel shader instruction count and texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much more similar to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means that it has fewer functional units or it is more picky about what and when it can execute is anyone's guess).
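The general idea of swapping arithmetic for a fetch can be sketched in a few lines of Python. This is not Valve's shader code, which we haven't seen; the specular-style power function, the table size and every name below are assumptions chosen to illustrate the technique. A precomputed table plays the role of a texture, and an index lookup plays the role of a texture fetch.

```python
def specular_computed(n_dot_h: float, exponent: float = 16.0) -> float:
    """Compute the term directly: costs arithmetic on every pixel."""
    return n_dot_h ** exponent


def build_lookup(size: int = 256, exponent: float = 16.0) -> list:
    """Precompute the same function once, like baking it into a 1D texture."""
    return [(i / (size - 1)) ** exponent for i in range(size)]


def specular_fetched(n_dot_h: float, table: list) -> float:
    """Replace the arithmetic with a single nearest-entry table lookup."""
    clamped = min(1.0, max(0.0, n_dot_h))
    index = round(clamped * (len(table) - 1))
    return table[index]


table = build_lookup()
direct = specular_computed(0.9)
fetched = specular_fetched(0.9, table)

print("computed:", direct)
print("fetched: ", fetched)
```

The fetched result is only as accurate as the table is fine-grained, and each lookup costs memory bandwidth rather than FPU time, which is exactly the exchange described above.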

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.


111 Comments


  • Anonymous User - Friday, September 12, 2003

    Anand: When you re-test with the Det 50's, make sure you rename the HL2 exe!!!

    Gotta make the comparison as fair as possible...
  • Anonymous User - Friday, September 12, 2003

    #69 How does the 9500 not fully support DX9? It's the same core EXACTLY as the 9700.
  • Anonymous User - Friday, September 12, 2003

    #53 - So YOU'RE that bastard who's been lagging us out!!! Get out of the dark ages!
  • Anonymous User - Friday, September 12, 2003

    What kind of conclusion was that ?

    In terms of the performance of the cards you've seen here today, the standings shouldn't change by the time Half-Life 2 ships - although NVIDIA will undoubtedly have newer drivers to improve performance. Over the coming weeks we'll be digging even further into the NVIDIA performance mystery to see if our theories are correct; if they are, we may have to wait until NV4x before these issues get sorted out.

    For now, Half-Life 2 """ SEEMS """ to be best paired with ATI hardware and as you've seen through our benchmarks, whether you have a Radeon 9600 Pro or a Radeon 9800 Pro you'll be running just fine. Things are finally """heating up""" and it's a good feeling to have back...

    HL2 ""seems"" better on ATI??? , should be, HL2 looks way better and faster on ATI.

    Things are finally """heating up""" ??? should have been , ATI's performance is killing Nvidia's FX.

    The conclusion should have been :
    Nvidia lied and sucks , Valve had to lower standards ( actually optimize (cheat) in favor of Nvidia) and make HL2 game look bad , just so you could play on your overpriced Nvidia Fx cards.

    How about a word of apology from Anand to have induced readers in errors , and have told them to buy Nvidia Fx cards in his last Fx5900 review. ???

    From a future ATI card owner, (bundled with HL2 of course)

    Boy I'm pissed off!
  • Anonymous User - Friday, September 12, 2003

    82, those are 9600 regulars (!), click the links. Pricewatch has been fooled. A Pro isn't much more, though, just about $136.

    I'd go with a 9500 over a 9600 any day. The 9500 can be softmodded to 9700 performance levels (about 50-70% of the time, IIRC, and it's actually a little cheaper than the 9600 Pro!). If the softmod doesn't work out, then you return it for a new one. Of course, not everyone wants to do this, and a 9600 Pro is a respectable and highly overclockable card.. but..

    I'd still love to see 9500 Pros at lower prices, like they would have been if ATi had kept it out.. but whatever. If you don't know, the 9500 Pro is/was considerably faster than the 9600 Pro. Valve said that HL2 isn't memory-limited, so the 128-bit memory interface on the 9500 Pro (which never made a big difference vs. the 9700 anyway) shouldn't even be noticeable, and the fact that the Sapphire-made ones were just as overclockable as the 9500 regulars and 9700s (think up to 340 core, 350 if you're lucky) is going to make it one HELL of an HL2 card for the $175 most people paid.
  • Anonymous User - Friday, September 12, 2003

    Nvidia got schooled, but not on hardware or drivers. ATI locked this up long ago with their deal with Gabe and buddies.
    Why is everyone just trying to keep a straight face about it? ATI paid handsomely for exactly what has happened to NVidia.
    But as always happens, watch out when the tables turn, as they ALWAYS do, and Valve could be on the OUTSIDE of lots of other deals.
  • Anonymous User - Friday, September 12, 2003

    I am just glad there is finally a damn game that can stress out these video cards. Wonder if Bitboys Oy or whatever their name is will come out saying they have a new video card out now that will run Half Life 2 at 100+ FPS :) What made me think of them I have no idea!
  • Anonymous User - Friday, September 12, 2003

    Not to detract from the main issue here, but #19 raises a good point. Why does the 9600Pro lose only <1% performance going from 1024 to 1280? The 9800P and 9700P lose between 10-15%. The 5900U loses 30%, sometimes more. I wonder if the gap between the 9800P and 9600P shrinks even more at higher resolutions.

    What aspect of the technology in the 9600 could possibly account for this?
  • Anonymous User - Friday, September 12, 2003

    #81 You can find 9600pro's for ~$160 from newegg.

    A couple of small webstores have a "Smart PC 9600" non-pro 128 meg for <$100. But the smart pc card is a cheap oem unit...I'm not sure if it's as good as the more expensive 9600's.
  • Anonymous User - Friday, September 12, 2003

    Pricewatch:
    $123 - RADEON 9600 Pro 256MB
    $124 - RADEON 9600 Pro 128MB
