What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in the native DirectX 9 code path and do better, though not dramatically so, in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made to the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

The first change was to use partial-precision registers where appropriate. What does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can hold either one 32-bit floating point value or two 16-bit floating point values. Thus, when operating in 16-bit (aka partial precision) mode, you effectively get twice as many registers as when you're running in 32-bit mode.
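To make the "one 32-bit value or two 16-bit values" idea concrete, here is a minimal Python sketch. It uses IEEE half precision (the `e` format in Python's `struct` module) as a stand-in for NV3x's 16-bit format; the function names are ours, made up for illustration, and nothing here reflects how the hardware actually lays out its registers.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision (16-bit) value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest single-precision (32-bit) value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def pack_two_halves(a: float, b: float) -> int:
    """Store two 16-bit floats in a single 32-bit word (hypothetical layout)."""
    return struct.unpack('<I', struct.pack('<ee', a, b))[0]

def unpack_two_halves(word: int) -> tuple:
    """Recover the two 16-bit floats from a 32-bit word."""
    return struct.unpack('<ee', struct.pack('<I', word))

if __name__ == "__main__":
    word = pack_two_halves(1.0, 2.0)
    print(hex(word), unpack_two_halves(word))
    # The price of the extra room is precision: compare the rounding error of 1/3.
    print(abs(to_fp16(1 / 3) - 1 / 3), abs(to_fp32(1 / 3) - 1 / 3))
```

The same 32 bits of storage carry two values instead of one, which is where the doubling comes from; the rounding error on 1/3 shows what is given up in exchange.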

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
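The register-pressure argument above can be sketched as a toy model. The register count and the number of shader temporaries below are invented purely for illustration (we don't know the real NV3x figures), and the model only counts how many values would not fit; it says nothing about how the hardware actually handles the overflow.

```python
import math

SLOT_BITS = 32  # each physical register slot holds 32 bits, per the description above

def slots_needed(num_temporaries: int, bits_per_value: int) -> int:
    """Physical 32-bit slots required to hold the shader's temporaries."""
    values_per_slot = SLOT_BITS // bits_per_value
    return math.ceil(num_temporaries / values_per_slot)

def values_that_do_not_fit(num_temporaries: int, bits_per_value: int,
                           physical_slots: int) -> int:
    """How many temporaries exceed the register file's capacity."""
    values_per_slot = SLOT_BITS // bits_per_value
    capacity = physical_slots * values_per_slot
    return max(0, num_temporaries - capacity)

if __name__ == "__main__":
    # Hypothetical shader with 6 temporaries on a hypothetical 4-slot register file.
    for bits in (32, 16):
        print(bits, slots_needed(6, bits), values_that_do_not_fit(6, bits, 4))
```

In this made-up case the shader overflows the register file at 32-bit precision but fits comfortably at 16-bit, which is the kind of effect that would explain a speedup from partial precision.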

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made to conserve die space. We're not here to criticize that decision, only to explain the performance we're seeing.

Next, Gabe listed the tradeoff between pixel shader instruction count and texture fetches. Despite the wording on the slide, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much closer to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means it has fewer functional units or is pickier about what it can execute and when is anyone's guess).
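As a rough illustration of trading computation for texture fetches, the sketch below replaces a specular power calculation with a lookup into a precomputed table, which is the role a lookup texture would play in a shader. This is our own example, not Valve's code: the function names, the table size and the exponent are all assumptions chosen for the demonstration.

```python
TABLE_SIZE = 256         # assumed lookup texture width
SPECULAR_EXPONENT = 16   # assumed shininess value

# Precomputed once, like baking a 1D lookup texture.
LOOKUP = [(i / (TABLE_SIZE - 1)) ** SPECULAR_EXPONENT for i in range(TABLE_SIZE)]

def clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)

def specular_by_math(n_dot_h: float) -> float:
    """Compute the term with arithmetic (more shader instructions)."""
    return clamp01(n_dot_h) ** SPECULAR_EXPONENT

def specular_by_fetch(n_dot_h: float) -> float:
    """Read the term from the table (one fetch, nearest-sample, no arithmetic)."""
    index = round(clamp01(n_dot_h) * (TABLE_SIZE - 1))
    return LOOKUP[index]

if __name__ == "__main__":
    for x in (0.0, 0.5, 0.9, 1.0):
        print(x, specular_by_math(x), specular_by_fetch(x))
```

The two paths give nearly the same answer; the fetch version costs memory bandwidth and a little accuracy, while the math version costs instructions. Which one is faster depends on where a given chip is short on resources.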

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.


111 Comments

  • Anonymous User - Sunday, September 14, 2003 - link

    Umm.. could you PLEASE not use shockwave for those
    tables? Our firewalls & browser configs also won't let it through, so these reviews become pretty much useless to read.
  • Anonymous User - Sunday, September 14, 2003 - link

    where are the benchmarks comparing HL2 at different CPUs? I mean, i obviously know I'm gonna have to upgrade my gf3 to a new card (first game to make me even think of that... didnt care for ut2k3), but what about my venerable athlon xp 1800+ ? :(
  • Anonymous User - Saturday, September 13, 2003 - link

    #98, look at the 9700 Pro numbers, subtract 4-5%.

    Still, if I were to see another set of benchmarks, I'd DEFINITELY want these:

    GeForce4 MX440 OR GeForce2 Ti - As an example of how well GF2/GF4MX cards perform on low detail settings, being DX7 parts.
    GeForce3 Ti200 OR GeForce3 Ti500 - It's a DX8 part, and still respectably fast; lots of people have Ti200s, anyway.
    GeForce4 Ti4200 - This is an incredibly common and respectably fast card, tons of people would be interested in seeing the numbers for these.
    GeForce FX 5600 Ultra - Obvious.
    GeForce FX 5900 Ultra - Obvious.
    Radeon 8500 - It's still a good card, you know.
    Radeon 9500 Pro - Admit it, you're all interested.
    Radeon 9600 Pro - Obvious.
    Radeon 9700 vanilla - Because it would show how clock speed scales, and besides these (and softmodded 9500s) are quite common.
    Radeon 9700 Pro - Obvious.
    Radeon 9800 Pro - Obvious.

    The GeForce FX 5200 and GeForce4 Ti4600 might be nice too, but the Radeons 9000 through 9200 would be irrelevant (R200-based).

    Also, obviously, I'd like to see them on two or three different detail levels (preferably three), to show how well some of the slower ones run at low detail and see how scalable Source really is. Speaking of scalability, a CPU scaling test would be extremely useful as well, like AnandTech's UT2003 CPU scaling test.

    This sort of thing would probably take a lot of time, but I'd love to see it, and I bet I'm not alone there. I think something like what AnandTech did with UT2003 would be great.

    Just my ~$0.11.
  • clarkmo - Saturday, September 13, 2003 - link

    I can't believe the Radeon 9500 hacked to a 9700 wasn't included in the benchmarks. What was he thinking? I guess Anand didn't have any luck scoring the right card. There are some still available, you know.
  • Anonymous User - Saturday, September 13, 2003 - link

    Quote from the Iraqi information minister:
    "Nvidia is kicking ATI's butt. Their hardware is producing vastly superior numbers."
  • Anonymous User - Saturday, September 13, 2003 - link

    Nvidia quote: "Part of this is understanding that in many cases promoting PS 1.4 (DirectX 8) to PS 2.0 (DirectX 9) provides no image quality benefit."

    3Dfx said some years ago that no one ever would use or notice the benefits of 32 bit textures. Nvidia did and 3Dfx is gone. Will Nvidia follow the 3Dfx path?
  • Anonymous User - Saturday, September 13, 2003 - link

    Anyone remember when ati.com sold rubber dog crap?
  • Pete - Saturday, September 13, 2003 - link

    #74, straight from the horse's mouth:

    http://www.nvnews.net/#1063313306
    "The GeForce FX is currently the fastest card we've benchmarked the Doom technology on and that's largely due to NVIDIA's close cooperation with us during the development of the algorithms that were used in Doom. They knew that the shadow rendering techniques we're using were going to be very important in a wide variety of games and they made some particular optimizations in their hardware strategy to take advantage of this and that served them well. --John Carmack"

    Of course those D3 numbers were early (as are these HL2 ones), so things can change with updated drivers.
  • Anonymous User - Saturday, September 13, 2003 - link

    I don't know if it has already been asked, but even if it has I ask again for emphasis.

    Anand, it would be nice if you could add a 9600 non-pro bench to the results. You mention raw GPU power being the determining factor now, and as the 9600 Pro's difference in memory clock is more significant than its engine clock, it would be interesting and informative to the budget/performance crowd to note the 9600 non-pro performance in HL2.

    Thanks for all your informative, insightful, accurate, in-depth articles.
  • Anonymous User - Friday, September 12, 2003 - link

    I always find it interesting how people say ATI is the "little guy" in this situation.

    ATI has been a major player in the video card market since the eighties (I had a 16-bit VGA Wonder stuck in an 8-bit ISA slot in an 8088/2 system) and that didn't change much even when 3dfx came onto the scene. A lot of those voodoo pass through cards got their video from an ATI 3d expression or some other cheap 2d card (Cirrus Logic or Trident anyone?).

    Nvidia and ATI have been at each others throats ever since Nvidia sold its first video card on the OEM market. 3dfx was just a little blip to ATI, Nvidia stealing away a bunch of its OEM sales with a bad 2d/good 3d video card on the other hand, well, that was personal.

    I imagine someone at ATi saying something like this:

    "All of you guys working on making faster DACs and better signal quality are being transferred to our new 3d department. Its sort of like 2d cept its got one more d, thats all we know for right now.".

    ATI knows how to engineer and build a video card, they have been doing it for long enough. Same with Matrox (Matrox builds the Rolls Royce's of video cards for broadcast and video editing use), Nvidia on the other hand knew how to build 3d accelerators, and not much else. The 2d on any early Rage card slaughtered the early Nvidia cards.

    Course, the 3d sucked balls, thats what a Canopus Pure 3d was for though.

    Now ATI has the whole "3d" part of the chip figured out. The driver guys have their heads wrapped around the things as well (before 3d cards came around ATI's drivers were the envy of the industry). Its had many years of experience dealing with games companies, OS companies, standards, and customers. And its maturity is really starting to show after a few minor bumps and bruises.

    ATI wants its market back, and after getting artx it has the means to do it. Of course, Nvidia is going to come out of this whole situation a lot more grown up as well. Both companies are going to have to fight blood tooth and nail to stay on top now. If they don't Matrox might just step up to the plate and bloody both of their noses. Or any of those "other" long forgotten video card companies that have some engineers stashed away working on DX 7 chips.

    God knows what next month is going to bring.

    Anyways, sorry for the rant..