Original Link: http://www.anandtech.com/show/1144

It's almost ironic that the one industry we deal with that is directly related to entertainment has been the least exciting for the longest time. The graphics world has been littered with controversies over rather petty things as of late; the majority of articles you'll see relating to graphics these days don't have anything to do with how fast the latest $500 card will run. Instead, we're left to argue about the definition of the word "cheating". We pick at pixels in hopes of differentiating two of the fiercest competitors the GPU world has ever seen, and we debate over 3DMark.

What's interesting is that all of the things we have occupied ourselves with in recent times have been present throughout history. Graphics companies have always had questionable optimizations in their drivers, they have almost always differed in how they render a scene and yes, 3DMark has been around for quite some time now (only recently has it become "cool" to take issue with it).

So why is it that in the age of incredibly fast, absurdly powerful DirectX 9 hardware, we find it necessary to bicker about everything but the hardware? Because, for the most part, we've had absolutely nothing better to do with it. Our last set of GPU reviews was focused on two cards - ATI's Radeon 9800 Pro (256MB) and NVIDIA's GeForce FX 5900 Ultra, both of which carried a hefty $499 price tag. What were we able to do with this kind of hardware? Run Unreal Tournament 2003 at 1600x1200 with 4X AA enabled and still have power to spare, or run Quake III Arena at fairytale frame rates. Both ATI and NVIDIA have spent countless millions of transistors and expensive die space, and even sacrificed current-generation game performance, in order to bring us some very powerful pixel shader units in their GPUs. Yet we have been using these cards while letting their pixel shading muscles atrophy.

Honestly, since the Radeon 9700 Pro, we haven't needed any more performance to satisfy the needs of today's games. If you take the most popular game in recent history, the Frozen Throne expansion to Warcraft III, you could run that just fine on a GeForce4 MX - a $500 GeForce FX 5900 Ultra was in no way, shape or form necessary.

The argument we heard from both GPU camps was that you were buying for the future; a card you bought today could not only run all of your current games extremely well, but would also guarantee good performance in the next generation of games. The problem with this argument was that there was no guarantee of when the "next generation" of games would arrive - and by the time it did, prices on these wonderfully expensive graphics cards might have fallen significantly. Then there's the fact that how well cards perform in today's pixel-shader-free games says nothing about how DirectX 9 games will perform. And this brought us to the joyful issue of using 3DMark as a benchmark.

If you haven't noticed, we've never relied on 3DMark as a performance tool in our 3D graphics benchmark suites. The only times we've included it, we've either used it in the context of a CPU comparison or to make sure fill rates were in line with what we were expecting. With 3DMark 03, the fine folks at Futuremark had a very ambitious goal in mind - to predict the performance of future DirectX 9 titles using their own shader code, designed to mimic what various developers were working on. The goal was admirable; however, if we're going to recommend something to millions of readers, we're not going to base it solely on one synthetic benchmark that may be indicative of the performance of future games. The difference between the next generation of games and what we've seen in the past is that the performance of one game is much less indicative of the performance of the rest of the market. As you'll see, we're no longer memory bandwidth bound - we're finally going to start dealing with games whose performance is determined by their pixel shader programs and how the GPU's execution units handle them.

All of this discussion isn't for naught, as it brings us to why today is so very important. Not too long ago, we were able to benchmark Doom3 and show you a preview of its performance; but with the game being delayed until next year, we have to turn to yet another title to finally take advantage of this hardware - Half-Life 2. With the game almost done and a benchmarkable demo due out on September 30th, it isn't a surprise that we were given the opportunity to benchmark the demos shown off by Valve at E3 this year.

Unfortunately, the story here isn't as simple as how fast your card will perform under Half-Life 2; of course, given the history of the 3D graphics industry, would you really expect something like this to be without controversy?

By now you've heard that our Half-Life 2 benchmarking time took place at an ATI event called "Shader Day." The point of Shader Day was to educate the press about shaders and their importance, and to give a little insight into how ATI's R3x0 architecture is optimized for the type of shader performance necessary for DirectX 9 applications. Granted, there was still a sizable marketing push from ATI, despite efforts to tone down the usual marketing present at these sorts of events.

One of the presenters at Shader Day was Gabe Newell of Valve, and it was in Gabe's presentation that the information we published here yesterday was revealed. According to Gabe, during the development of Half-Life 2, the development team encountered some very unusual performance numbers. Taken directly from Gabe's slide in the presentation, here's the performance they saw initially:

Taken from Valve Presentation

As you can guess, the folks at Valve were quite shocked. With NVIDIA's fastest offering unable to outperform a Radeon 9600 Pro (the Pro suffix was omitted from Gabe's chart), something was wrong, given that in any other game, the GeForce FX 5900 Ultra would be much closer to the Radeon 9800 Pro in performance.

Working closely with NVIDIA (according to Gabe), Valve ended up developing a special codepath for NVIDIA's NV3x architecture that made some tradeoffs in order to improve performance on NVIDIA's FX cards. The tradeoffs, as explained by Gabe, were mainly in using 16-bit precision instead of 32-bit precision for certain floats and defaulting to Pixel Shader 1.4 (DX8.1) shaders instead of newer Pixel Shader 2.0 (DX9) shaders in certain cases. Valve refers to this new NV3x code path as a "mixed mode" of operation, as it is a mixture of full precision (32-bit) and partial precision (16-bit) floats as well as pixel shader 2.0 and 1.4 shader code. There's clearly a visual tradeoff made here, which we will get to shortly, but the tradeoff was necessary in order to improve performance.

The resulting performance that the Valve team saw was as follows:

Taken from Valve Presentation

We had to recap the issues here for those who haven't been keeping up with the situation as it unfolded over the past 24 hours, but now that you've seen what Valve has shown us, it's time to dig a bit deeper and answer some very important questions (and of course, get to our own benchmarks under Half-Life 2).

ATI & Valve - Defining the Relationship

The first thing that comes to mind when you see results like this is a cry of foul play: that Valve has unfairly optimized their game for ATI's hardware and thus it does not perform well on NVIDIA's hardware. Although it is the simplest accusation, it is actually one of the less frequent ones we've seen thrown around.

During Gabe Newell's presentation, he insisted that they [Valve] have not optimized or doctored the engine to produce these results. It also doesn't make much sense for Valve to develop an ATI-specific game simply because the majority of the market out there does have NVIDIA based graphics cards, and it is in their best interest to make the game run as well as possible on NVIDIA GPUs.

Gabe mentioned that the developers spent 5x as much time optimizing the special NV3x code path (mixed mode) as they did optimizing the generic DX9 path (what ATI's DX9 cards use). Thus, it is clear that a good attempt was made to get the game to run as well as possible on NVIDIA hardware.

To those that fault Valve for spending so much time and effort trying to optimize for the NV3x family, remember that they are in the business to sell games and with the market the way it is, purposefully crippling one graphics manufacturer in favor of another would not make much business sense.

Truthfully, we believe that Valve made an honest attempt to get the game running as well as possible on NV3x hardware but simply ran into other unavoidable issues (which we will get to shortly). One could attack the competence of Valve's developers, but we are not qualified to do so; any of those who have developed something similar in complexity to Half-Life 2's Source engine may feel free to.

According to Gabe, these performance results were the reason that Valve aligned themselves more closely with ATI. As you probably know, Valve has a fairly large OEM deal with ATI that will bring Half-Life 2 as a bundled item with ATI graphics cards in the future. We'll be able to tell you more about the cards with which it will be bundled soon enough (has it been 6 months already?).

With these sorts of deals, there's always money (e.g. marketing dollars) involved, and we're not debating the existence of that in this deal, but as far as Valve's official line is concerned, the deal came after the performance discovery.

Once again, we're not questioning Valve in this sense and honestly don't see much reason to, as it wouldn't make any business sense for them to cripple Half-Life 2 on NVIDIA cards. As always, we encourage you to draw your own conclusions based on the data we've provided.

Moving on…

What's Wrong with NVIDIA?

Getting to the meat of the problem: how can it be that NVIDIA performs so poorly in the native DirectX 9 code path, yet does better - though not dramatically so - in their own special "mixed mode"? In order to understand why, we have to look at the modifications that Valve made to the NV3x code path; taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

So the first change that was made is to use partial-precision registers where appropriate. Well, what does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can either operate on 16 or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the actual FPUs that are present in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can either hold one 32-bit floating point value or it can be used as two 16-bit floating point registers. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many physical registers as when you're running in 32-bit mode.

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
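To make the precision difference concrete, here is a small Python sketch - illustrative only, modeling the numeric format rather than NV3x hardware - that round-trips a value through IEEE 754 half precision, the same 16-bit format used in partial-precision mode:

```python
import struct

def to_fp16(x):
    # Pack x as a 16-bit half ('e' format), then unpack it: the round
    # trip rounds x to the nearest value FP16 can actually represent.
    return struct.unpack('e', struct.pack('e', x))[0]

# FP16 carries a 10-bit mantissa, so values near 1.0 are spaced roughly
# 0.001 apart; FP32's 23-bit mantissa is about 8000x finer.
print(to_fp16(1.0))     # 1.0 is exactly representable
print(to_fp16(1.0001))  # rounds back down to 1.0 - the extra digits are lost
```

Quantization steps of this size are the source of the banding artifacts discussed later in this article: when neighboring pixels snap to the same representable value, smooth gradients turn into visible bands.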

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made, presumably to conserve die space; but we're not here to criticize NVIDIA's engineers, rather to explain NVIDIA's performance.

Next, Gabe listed the tradeoff in pixel shader instruction count for texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier load on computations in the functional units. Note that this approach is much more similar to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of an optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means that it has fewer functional units or it is more picky about what and when it can execute is anyone's guess).
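As a loose illustration of this tradeoff (a hypothetical Python sketch, not Valve's actual shader code), consider replacing a per-pixel specular power calculation with a precomputed lookup table - the software analogue of baking math into a texture and fetching the result:

```python
# Direct evaluation: costs ALU work for every pixel shaded.
def specular_direct(n_dot_h, exponent=32):
    return n_dot_h ** exponent

# Precomputed 1D table, standing in for a lookup texture:
# one "texture fetch" (memory read) per pixel instead of a pow().
LUT_SIZE = 256
SPECULAR_LUT = [(i / (LUT_SIZE - 1)) ** 32 for i in range(LUT_SIZE)]

def specular_lookup(n_dot_h):
    # Nearest-neighbour fetch; real hardware would filter between texels.
    idx = min(LUT_SIZE - 1, int(n_dot_h * (LUT_SIZE - 1) + 0.5))
    return SPECULAR_LUT[idx]
```

The table answers in one memory read what the direct version computes with repeated multiplies, at the cost of bandwidth and a little quantization error - exactly the kind of exchange that favors a chip with spare memory bandwidth but scarce ALU throughput.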

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.

Improving Performance on NVIDIA

If the hypotheses mentioned on the previous page hold true, then there may be some ways around these performance issues. The most obvious is through updated drivers. NVIDIA does have a new driver release on the horizon, the Detonator 50 series. However, Valve instructed us not to use these drivers, as they do not render fog in Half-Life 2. In fact, Valve was quite insistent that we use only publicly available drivers on publicly available hardware, which is one reason you won't see Half-Life 2 benchmarks in our upcoming Athlon 64 review.

Future drivers may be the key for higher performance to be enabled on NVIDIA platforms, but Gabe issued the following warning:

"I guess I am encouraging skepticism about future driver performance."

Only time will tell if updated drivers can close the performance gap, but as you are about to see, it is a decent sized gap.

One thing that is also worth noting is that the shader-specific workarounds Valve implemented for NVIDIA will not immediately translate to other games based on Half-Life 2's Source engine. Remember that these restructured shaders are specific to the shaders used in Half-Life 2, which won't necessarily be the shaders used in a different game built on the same engine.

Gabe also cautioned that reverting to 16-bit floating point values will only become more of an issue going forward, as "newer DX9 functionality will be able to use fewer and fewer partial precision functions." The theory, though, is that by the time this happens, NV4x will be upon us and will hopefully have fixed the problems we're seeing today.

NVIDIA's Official Response

Of course, NVIDIA has their official PR response to these issues, which we've published below:

During the entire development of Half Life 2, NVIDIA has had close technical contact with Valve regarding the game. However, Valve has not made us aware of the issues Gabe discussed.

We're confused as to why Valve chose to use Release 45 (Rel. 45) - because up to two weeks prior to the Shader Day we had been working closely with Valve to ensure that Release 50 (Rel. 50) provides the best experience possible on NVIDIA hardware.

Regarding the Half Life2 performance numbers that were published on the web, we believe these performance numbers are invalid because they do not use our Rel. 50 drivers. Engineering efforts on our Rel. 45 drivers stopped months ago in anticipation of Rel. 50. NVIDIA's optimizations for Half-Life 2 and other new games are included in our Rel.50 drivers - which reviewers currently have a beta version of today. Rel. 50 is the best driver we've ever built - it includes significant optimizations for the highly-programmable GeForce FX architecture and includes feature and performance benefits for over 100 million NVIDIA GPU customers.

Pending detailed information from Valve, we are unaware of any issues with Rel. 50 and the drop of Half-Life 2 that we have. The drop of Half-Life 2 that we currently have is more than 2 weeks old. It is not a cheat or an over optimization. NVIDIA's Rel. 50 driver will be public before the game is available. Since we know that obtaining the best pixel shader performance from the GeForce FX GPUs currently requires some specialized work, our developer technology team works very closely with game developers. Part of this is understanding that in many cases promoting PS 1.4 (DirectX 8) to PS 2.0 (DirectX 9) provides no image quality benefit. Sometimes this involves converting 32-bit floating point precision shader operations into 16-bit floating point precision shaders in order to obtain the performance benefit of this mode with no image quality degradation. Our goal is to provide our consumers the best experience possible, and that means games must both look and run great.

The optimal code path for ATI and NVIDIA GPUs is different - so trying to test them with the same code path will always disadvantage one or the other. The default settings for each game have been chosen by both the developers and NVIDIA in order to produce the best results for our consumers.

In addition to the developer efforts, our driver team has developed a next-generation automatic shader optimizer that vastly improves GeForce FX pixel shader performance across the board. The fruits of these efforts will be seen in our Rel.50 driver release. Many other improvements have also been included in Rel.50, and these were all created either in response to, or in anticipation of the first wave of shipping DirectX 9 titles, such as Half-Life 2.

We are committed to working with Gabe to fully understand.

More on Mixed-Mode for NV3x

We briefly mentioned the Mixed Mode of operation for NV3x GPUs that Valve implemented in Half-Life 2, but there is much more to it than just a special NV3x code path. In fact, the mixed mode NV3x code path was really only intended for the GeForce FX 5900 Ultra (NV35). The mainstream FX chips (5200/5600) require a slightly different code path.

Here you can see the 40% performance boost NVIDIA gets from the special NV3x code path.

The GeForce FX 5600 (NV31) uses a code path that is internally referred to as dx82; this path is a combination of DX9 (pixel shader 2.0) and DX8.1 (pixel shader 1.4) code, and thus, doesn't look as good as what you'll see on the 5900 Ultra.

Although the 5900 Ultra performs reasonably well with the special NV3x mixed mode path, the 5600 and 5200 cards do not perform well at all. Valve's recommendation to owners of 5600/5200 cards is to run the DX8 (pixel shader 1.4) code path in order to receive playable performance under Half-Life 2. The performance improvement gained by dropping to the DX8 code path is seen most on the GeForce FX 5200, although there is a slight improvement on the 5600 as well, as you can see below:

The sacrifices that you encounter by running either the mixed mode path or the DX8 path are obviously visual. The 5900 Ultra, running in mixed mode, will exhibit some banding effects as a result of a loss in precision (FP16 vs. FP32), but still looks good - just not as good as the full DX9 code path. There is a noticeable difference between this mixed mode and the dx82 mode, as well as the straight DX8 path. For example, you'll notice that shader effects on the water aren't as impressive as they are in the native DX9 path.

Are the visual tradeoffs perceptible? Yes. The native DX9 path clearly looks better than anything else, especially compared to the DX8.0/8.1 modes.

The Test

Valve had very strict requirements about the test systems they let us use. The systems were only allowed to use publicly available drivers and thus, we used NVIDIA's Detonator 45.23s and ATI's Catalyst 3.7s, both publicly available from the respective websites.

The Dell PCs that we used were configured with Pentium 4 3.0C processors on 875P based motherboards with 1GB of memory. We were running Windows XP without any special modifications to the OS or other changes to the system.

We ran a total of three levels on each card - e3_techdemo_5, e3_bugbait and e3_c17_02, all of which were part of the E3 demos that were shown and are representative of actual game play under Half-Life 2.

We ran all cards at 1024x768, and the highest end cards at 1280x1024. We also used the best possible shader setting for the hardware, meaning that the R3x0 hardware used the DX9 code path, the 5900 Ultra used the NV3x code path and everything else used the DX8.x code path.

All tests were run without Anti-Aliasing or Anisotropic Filtering enabled. Anti-Aliasing was not properly supported in this demo and thus wouldn't be representative of final game play.

We only tested with a 128MB Radeon 9800 Pro, as a 256MB card wasn't available at the time (all of our 256MB cards were tied up in Athlon 64 testing). The performance difference between 128MB and 256MB is negligible, although, time permitting, we may see some higher detail textures offered for 256MB card owners. We'll see what happens once the game ships.

Half-Life 2 Performance - e3_techdemo_5.dem

In our first test, we see that ATI holds an incredible lead over NVIDIA, with the Radeon 9800 Pro outscoring the GeForce FX 5900 Ultra by almost 70%. The Radeon 9600 Pro manages to come within 4% of NVIDIA's flagship, not bad for a ~$100 card.

Although it looks much worse than the other competitors, the GeForce4 Ti 4600 running in its DX8 code path manages to offer fairly decent frame rates - outperforming both of the mainstream FX parts. The Radeon 9200 has a good showing here, but exhibited a number of visual artifacts during the testing that could impact performance. So, we'll reserve judgement on that part until everything gets worked out between it and the game.

At 1280x1024, we're shading more pixels and thus the performance difference increases even further, with the 5900 Ultra being outperformed by 73% this time around.

Half-Life 2 Performance - e3_bugbait.dem

Under this demo, the GeForce4 Ti 4600 steps up to the plate and comes in third. Once again, remember that out of the entire bunch, the visual quality on the Ti 4600 would be the worst here, as it is using the baseline DX8.0 (pixel shader 1.1) code path.

The Radeon 9800 and 9700 Pro both take the lead, outperforming the GeForce FX 5900 Ultra by around 32% here. The Radeon 9600 Pro manages to offer extremely good bang for your buck, slightly outperforming the 5900 Ultra.

The performance gap grows to be a massive 61% advantage for the Radeon 9800 Pro over the GeForce FX 5900 Ultra at 1280x1024.

Half-Life 2 Performance - e3_c17_02.dem

Here we see something very interesting, and something we haven't really seen before - the Radeon 9600 all the way up to the Radeon 9800 Pro performing within 12% of each other. This is because with shader-heavy games, such as Half-Life 2, the bottleneck is no longer memory bandwidth - rather it is pure computational power; basically, how quickly these GPUs can process through those shader programs.

The GeForce FX 5900 Ultra is just edged out by the Radeon 9600 Pro. What's even more interesting is that NVIDIA's GeForce4 Ti 4600 manages to beat all of the other contenders quite handily - granted, the Ti 4600 doesn't look as good, as it is using the base DX8.0 code path.

The Radeon 9200 puts up a good fight; however, there were some rendering issues during the benchmark, which may invalidate this score. We'll have to wait for the final build to see if things change any.

At 1280x1024, a smaller subset of the cards was run; you can tell why just by looking at the frame rates. Interestingly enough, the Radeon 9600 Pro comes out slightly ahead of the Radeon 9700 Pro - possibly due to its updated architecture. The GeForce FX 5900 Ultra still lags behind, this time even more significantly, since we're shading many more pixels at the higher resolution.

Final Words

When we first heard Gabe Newell's words, what came to mind is that this is the type of excitement that the 3D graphics industry hasn't seen in years. The days where we were waiting to break 40 fps in Quake I were gone and we were left arguing over whose anisotropic filtering was correct. With Half-Life 2, we are seeing the "Dawn of DX9" as one speaker put it; and this is just the beginning.

The performance paradigm changes here; instead of being bound by memory bandwidth and being able to produce triple digit frame rates, we are entering a world of games where memory bandwidth isn't the bottleneck - where we are bound by raw GPU power. This is exactly the type of shift we saw in the CPU world a while ago, where memory bandwidth stopped being the defining performance characteristic and the architecture/computational power of the microprocessors had a much larger impact.
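This shift can be sketched with a toy "roofline" model (illustrative Python with made-up throughput numbers, not real card specifications): a frame's time is set by whichever resource runs out first, ALU throughput or memory bandwidth.

```python
def frame_time_ms(shader_gflop, gbytes_moved, peak_gflops, peak_gbps):
    # A frame can't finish faster than either its ALU work or its
    # memory traffic allows; the larger of the two sets frame time.
    alu_time = shader_gflop / peak_gflops   # seconds of pure shader math
    mem_time = gbytes_moved / peak_gbps     # seconds of pure memory traffic
    return max(alu_time, mem_time) * 1000.0

# Hypothetical numbers: a texture-heavy, pre-DX9-style frame is limited
# by bandwidth, while a shader-heavy DX9 frame is limited by the ALUs.
old_frame = frame_time_ms(0.5, 0.4, peak_gflops=100, peak_gbps=20)  # memory-bound
dx9_frame = frame_time_ms(5.0, 0.4, peak_gflops=100, peak_gbps=20)  # ALU-bound
print(old_frame, dx9_frame)
```

In the first case, adding shader work is nearly free (the memory system is the wall); in the second, adding memory traffic is nearly free (the ALUs are the wall) - which is the point made below about anisotropic filtering.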

One of the benefits of moving away from memory bandwidth limited scenarios is that enhancements which traditionally ate up memory bandwidth will soon be offered at virtually no performance penalty. If your GPU is waiting on its ALUs to complete pixel shading operations, then the additional memory bandwidth used by something like anisotropic filtering will not negatively impact performance. Things are beginning to change, and they are beginning to do so in a very big way.

In terms of the performance of the cards you've seen here today, the standings shouldn't change by the time Half-Life 2 ships - although NVIDIA will undoubtedly have newer drivers to improve performance. Over the coming weeks we'll be digging even further into the NVIDIA performance mystery to see if our theories are correct; if they are, we may have to wait until NV4x before these issues get sorted out.

For now, Half-Life 2 seems to be best paired with ATI hardware and as you've seen through our benchmarks, whether you have a Radeon 9600 Pro or a Radeon 9800 Pro you'll be running just fine. Things are finally heating up and it's a good feeling to have back...
