What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in a native DirectX 9 code path and do better, but not dramatically better, in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made to the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

So the first change that was made is to use partial-precision registers where appropriate. What does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can either hold one 32-bit floating point value or be used as two 16-bit floating point registers. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many physical registers as when you're running in 32-bit mode.
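
To make the register arithmetic concrete, here is a minimal Python sketch. The register file size is a hypothetical number chosen for illustration (the real NV3x figure isn't given here), and the function name is our own. It shows that two 16-bit floats fit in the same four bytes as one 32-bit float, and that the 16-bit value is less precise.

```python
import struct

# Hypothetical register file size; the real NV3x figure is not stated in this article.
PHYSICAL_32BIT_SLOTS = 4


def usable_registers(slots, bits):
    """Registers available when every physical slot is 32 bits wide."""
    if bits not in (16, 32):
        raise ValueError("only 16-bit and 32-bit floats are modeled here")
    return slots * (32 // bits)


# One 32-bit float and two 16-bit floats occupy the same four bytes.
full = struct.pack("<f", 0.1)
halves = struct.pack("<ee", 0.1, 0.7)

# Round-tripping a value through each format shows the precision given up.
(full_value,) = struct.unpack("<f", full)
(half_value,) = struct.unpack("<e", struct.pack("<e", 0.1))
full_error = abs(full_value - 0.1)
half_error = abs(half_value - 0.1)

print(usable_registers(PHYSICAL_32BIT_SLOTS, 32), "registers at 32-bit")
print(usable_registers(PHYSICAL_32BIT_SLOTS, 16), "registers at 16-bit")
print("rounding error at 32-bit:", full_error)
print("rounding error at 16-bit:", half_error)
```

The doubling of the register count is the whole point of partial precision here; the larger rounding error is the price paid for it.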

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
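
As a rough illustration of why running out of registers hurts, the toy model below charges a fixed penalty for every value that doesn't fit in a register. All of the numbers (slot count, live values, instruction count, penalty) are hypothetical and picked only to show the shape of the effect; this is not a model of the actual NV3x pipeline.

```python
def spilled_values(live_values, slots, bits):
    """Values that don't fit when each 32-bit slot holds 32 // bits floats."""
    registers = slots * (32 // bits)
    return max(0, live_values - registers)


def shader_cost(instructions, live_values, slots, bits, spill_penalty):
    """Instruction count plus a penalty for each value pushed out to memory."""
    return instructions + spill_penalty * spilled_values(live_values, slots, bits)


# Hypothetical shader: 20 instructions, 6 values live at once, 4 physical slots,
# and an assumed cost of 10 extra cycles per spilled value.
cost_fp32 = shader_cost(20, 6, 4, 32, 10)
cost_fp16 = shader_cost(20, 6, 4, 16, 10)

print("32-bit cost:", cost_fp32)
print("16-bit cost:", cost_fp16)
```

Under these assumed numbers, the same shader fits entirely in registers at 16-bit precision but spills two values at 32-bit precision.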

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that the NVIDIA engineers made to conserve die space. We're not here to criticize NVIDIA's engineers, but rather to explain NVIDIA's performance.

Next, Gabe listed the tradeoff between pixel shader instruction count and texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much closer to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means it has fewer functional units or is pickier about what it can execute and when is anyone's guess).
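
The general idea of trading computation for texture fetches can be sketched in Python as a precomputed lookup table. This is not Valve's shader code, which isn't shown in the presentation; the specular exponent, table size, and function names are all assumptions made for illustration.

```python
TABLE_SIZE = 256  # hypothetical resolution of the 1D lookup "texture"
EXPONENT = 16     # hypothetical specular exponent

# The "texture": x ** EXPONENT precomputed for evenly spaced x in [0, 1].
SPECULAR_TABLE = [(i / (TABLE_SIZE - 1)) ** EXPONENT for i in range(TABLE_SIZE)]


def clamp01(x):
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def specular_computed(n_dot_h):
    """Compute-heavy path: evaluate the power function directly."""
    return clamp01(n_dot_h) ** EXPONENT


def specular_fetched(n_dot_h):
    """Fetch-heavy path: one nearest-sample table lookup instead of the math."""
    index = round(clamp01(n_dot_h) * (TABLE_SIZE - 1))
    return SPECULAR_TABLE[index]


sample = 0.9
print("computed:", specular_computed(sample))
print("fetched: ", specular_fetched(sample))
```

The fetched result is only an approximation of the computed one, since the table has a finite number of entries. That is the same kind of quality-for-speed trade that the mixed mode makes elsewhere.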

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.


111 Comments


  • Anonymous User - Friday, September 12, 2003 - link

    ==="full 32-bit would be required" not 24-bit. So that leaves all ATI cards out in the cold.===

    By the time full 32-bit becomes standard (probably with DX10 in 2-3 years) there will be NEW cards that make current cards look like sh!t. ATi will have DX10 cards for under $100, same as nVidia and their 5200. People have been upgrading their PC's for new games for YEARS! Only an [nv]IDIOT would attempt to use an old card for new games and software (TNT2 for Doom3? NOT!).
  • Anonymous User - Friday, September 12, 2003 - link

    Funny that you guys think nVidia will be still "plugging along" with the GFFX if the DX spec changes to 32bit... you _do_ know what happens to the GFFX when it's forced to run 32bit precision don't you? You'd get faster framerates by drawing each frame by hand on your monitor with a sharpie.
  • Pete - Friday, September 12, 2003 - link

    #23, the second quote in the first post here may be of interest: http://www.beyond3d.com/forum/viewtopic.php?t=7839... Note the last sentence, which I surrounded by ***'s.

    "nVidia has released the response as seen in the link. Particularly interesting, however, is this part of the e-mail sent to certain nVidia employees ( this was not posted at the given link ):

    'We have been working very closely with Valve on the development of Half Life 2 and tuning for NVIDIA GPU's. And until a week ago had been in close contact with their technical team. It appears that, in preparation for ATI's Shader Days conference, they have misinterpreted bugs associated with a beta version of our release 50 driver.

    You also may have heard that Valve has closed a multi-million dollar marketing deal with ATI. Valve invited us to bid on an exclusive marketing arrangement but we felt the price tag was far too high. We elected not to participate. ***We have no evidence or reason to believe that Valve's presentation yesterday was influenced by their marketing relationship with ATI.***'"

    If this document is indeed real, nV themselves told their own employees Gabe's presentation wasn't skewed by Valve's marketing relationship with ATi.
  • Anonymous User - Friday, September 12, 2003 - link

    Link please #38
  • Anonymous User - Friday, September 12, 2003 - link

    LOL! 19, I saw that too. Looks like I'll be replacing my nVidia 'the way it's meant to be played in DX8 because our DX9 runs like ass, and we still sell it for $500+ to uninformed customers' card with an ATi Radeon. Thanks for the review Anand; it will be interesting to see the AA/AF benchmarks, but I have a pretty good idea of who will win those as well.
  • Anonymous User - Friday, September 12, 2003 - link

    >>>>>>>ANYONE ELSE CATCH THE FOLLOWING IN THE ARTICLE<<<<<<<<<<<<<<<

    ""One thing that is also worth noting is that the shader-specific workarounds for NVIDIA that were implemented by Valve, will not immediately translate to all other games that are based off of Half-Life 2's Source engine. Remember that these restructured shaders are specific to the shaders used in Half-Life 2, which won't necessarily be the shaders used in a different game based off of the same engine.""

    So I guess the nvidia fan boys won't be able to run their $500 POS cards with Counterstrike 2 since it will probably be based on the HL2 engine.

    buhahahaha

    >>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
  • Anonymous User - Friday, September 12, 2003 - link

    Valve specifically said "full 32-bit would be required" not 24-bit. So that leaves all ATI cards out in the cold.
  • Pete - Friday, September 12, 2003 - link

    #23, I believe you're inferring far too much from ATi's HL2 bundling. Check TechReport's article on Gabe's presentation, in which Gabe is noted as saying Valve chose ATi (in the bidding war to bundle HL2) because their cards quite obviously performed so much better (and look better doing it--keep in mind, as Anand said, all those nVidia mixed modes look worse than pure DX9).

    In short, Valve doesn't need to do much to please others, as they're the one being chased for the potentially huge-selling Half-Life 2. Everyone will be sucking up to them, not the other way around. And it wouldn't do for Valve to offer nV the bundle exclusive, have consumers expect brilliant performance from the bundled FX cards, and get 12fps in DX9 on their DX9 FX card or 30fps on their $400+ 5900U. That would result in a lot of angry customers for Valve, which is a decidedly bad business move.

    People will buy HL2 regardless. Valve's bundling of HL2 with new cards is just an extra source of income for them, and not vital to the success of HL2 in any way. Bundling HL2 will be a big coup for an IHV like ATi, which requires boundary-pushing games like HL2 to drive hardware sales. Think of the relationship in this way: it's not that ATi won the bidding war to bundle HL2, but that Valve *allowed* ATi to win. Valve was going to get beaucoup bucks for marketing tie-ins with HL2 either way, so it's in their best interests to find sponsorships that present HL2 in the best light (thus apparently HL2 will be bundled with ATi DX9 cards, not their DX8 ones).

    You should read page 3 of Anand's article more closely, IMO. Valve coded not to a specific hardware standard, but to the DX9 standard. ATi cards run standard DX9 code much better than nV. Valve had to work extra hard to try to find custom paths to allow for the FX's weaknesses, but even that doesn't bring nV even with ATi in terms of performance. So ATi's current DX9 line-up is the poster-child for HL2 almost by default.

    We'll see what the Det50's do for nV's scores and IQ soon enough, and that should indicate whether Gabe was being mean or just frank.
  • Anonymous User - Friday, September 12, 2003 - link

    #33 To be pedantic, the spec for DX9 is 24bit minimum; it has never been said by Microsoft that it was 24bit and nothing else. 24bit is just a minimum.

    Just as 640x480 is a minimum. That doesn't make 1024x768 non standard.

    But considering you are right, and 24 bit is a rock solid standard, doesn't that mean that Valve in the future will violate the DX9 spec in your eyes? Does that not mean that ATI cards will be left high and dry, in the future? After all, there will be no optimizations allowed/able?

    32bit is the future, according to Valve after all.
    Nvidia may suck at doing it, but at least they can do it.
  • XPgeek - Friday, September 12, 2003 - link

    edit, post #32-

    should read, "my ATi is so faster than YOUR nVidia"
