Original Link: http://www.anandtech.com/show/2267
Real World DirectX 10 Performance: It Ain't Pretty
by Derek Wilson on July 5, 2007 9:00 AM EST
When it was drafted, DirectX 10 promised to once again change the way developers approach real-time 3D graphics programming. Not only would graphics hardware be capable of executing short custom programs (called shaders) on vertices and fragments (pixels), but developers would be able to move much more high-level polygon work to the GPU through geometry shaders. Pulling polygon level manipulation off the CPU opens up a whole host of possibilities to the developer.
With adequate performance, many of the geometric details simulated through other techniques could be applied in simple, straightforward ways involving less overhead. Techniques like normal mapping, parallax occlusion mapping, and many others exist solely for generating the illusion of additional geometry. Ever wonder why a face can be incredibly detailed while the silhouette of the same head looks more like a stop sign than a melon? This is because modern real-time 3D relies on low polygon models augmented with pixel level "tricks" to make up for it.
There are lots of cool things we can do with the ability to process geometry on the GPU. We could see particle systems on the GPU, fine grained model details like fur that can be affected by the physical characteristics of the world, procedural geometry for highly dynamic environments, "real" displacement mapping, and geometry amplification that can add detail to models. Some of these things may show up sooner than others in games, as we will still be limited by the performance of the hardware when it comes to implementing these features.
There are, of course, other benefits to DX10. We explored these in previous articles for those who are interested, but here's a quick rundown. Object and state change overhead has been decreased, allowing for less CPU involvement when sending data to the GPU. This should improve performance and give developers more headroom in building larger, more complex scenes. We have more rigidly defined specifications, which means developers can focus less on how individual hardware will handle their game and more on the features they want to implement. With a larger focus on data types and accuracy, the results of calculations will be more consistent between hardware, and developers will have more flexibility in choosing how their data is processed.
In general, DX10 also offers a more generic computing model with lots of flexibility. This will be very important going forward, but right now developers still have de facto limitations on shader length and complexity based on the performance of the hardware that currently exists. As developers better learn how to use the flexibility they have, and as hardware designers continue to deliver higher performance year after year, we will see DirectX 10 applications slowly start to blossom into what everyone has dreamed they could be.
For now, before we get into features and performance, we would like to temper your expectations. Many of the features currently implemented in DirectX 10 could also be done using DirectX 9. Additionally, those features that are truly DX10 only either don't add much beyond what we would get otherwise, or require quite a bit of processing power to handle. Thus, we either get something that was already possible or something that requires expensive hardware.
Our test platform is the same as the one we used in our recent articles. Necessarily departing from our norm, this round of testing is performed under the 32-bit version of Windows Vista. We used the latest beta drivers we could get our hands on from both AMD and NVIDIA. Here's a breakdown of the platform:
Performance Test Configuration:
|CPU:||Intel Core 2 Extreme X6800 (2.93GHz/4MB)|
|Chipset Drivers:||Intel 188.8.131.524|
|Hard Disk:||Seagate 7200.7 160GB SATA|
|Memory:||Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 2)|
|Video Drivers:||ATI Catalyst 184.108.40.206-rc2, NVIDIA ForceWare 162.18|
|Desktop Resolution:||1280 x 800 - 32-bit @ 60Hz|
|OS:||Windows Vista x86|
We were also able to obtain a beta version of FRAPS from Beepa in order to record average framerates in DirectX 10 applications. Without this, we were previously limited to only testing applications that generate statistics for us. Armed with a DX10 capable version of FRAPS, we can now also take a look at the performance of DX10 SDK samples and other demos that don't include built in frame counters.
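As a side note on what an "average framerate" means, here is a minimal Python sketch. It assumes a list of per-frame render times in milliseconds, which is a hypothetical input format and not a description of how FRAPS works internally. The function name and sample values are made up for illustration. The point is that the average is total frames divided by total time, which is not the same as averaging each frame's instantaneous rate.

```python
def average_fps(frame_times_ms):
    """Average frame rate: total frames divided by total elapsed seconds."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds


def mean_of_instantaneous_fps(frame_times_ms):
    """Naive alternative: average the per-frame rates (overweights fast frames)."""
    rates = [1000.0 / t for t in frame_times_ms]
    return sum(rates) / len(rates)


# Placeholder frame times, NOT measurements from this article.
frame_times_ms = [10.0, 10.0, 40.0, 40.0]

print(average_fps(frame_times_ms))                # frames / elapsed time
print(mean_of_instantaneous_fps(frame_times_ms))  # higher, and misleading
```

With these placeholder values, four frames take 100 ms in total, so the true average is 40 fps, while the naive mean of per-frame rates comes out at 62.5 fps because the two fast frames dominate.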
For now, though, we are sticking with real world performance. We'll be looking at Call of Juarez, Company of Heroes, and Lost Planet: Extreme Condition. Except for Call of Juarez, we will be looking at DirectX 9 and DirectX 10 path performance. The Call of Juarez benchmark explicitly highlights the enhanced features of their DX10 path, and they don't offer an equivalent benchmark for DX9. If there is demand for Call of Juarez benchmarking down the road, we may look at using FRAPS in both DX9 and DX10 versions. Lost Planet testing required the use of our DX10 version of FRAPS, but Company of Heroes testing was performed using the same method previously available (the performance test in the graphics options section).
In addition to looking at each game on its own, we will take a look at how DX9 and DX10 performance compare overall. Performance scaling with and without AA under each API as well as relative performance of cards under each API will be analyzed.
Call of Juarez
There has been quite a bit of controversy and drama surrounding the journey of Call of Juarez from DirectX 9 to DirectX 10. As many may remember, AMD handed out demos of the DirectX 10 version of Call of Juarez prior to the launch of R600. This build didn't fully support NVIDIA hardware, so many review sites opted not to test it. On its own, this is certainly fine and no cause for worry. It's only normal to expect a company to want to show off something cool running on their hardware even if it isn't as fully functional as the final product will be.
But, after NVIDIA found out about this, they set out to help Techland bring their demo up to par and get it to run properly on G80 based systems. Some publications were able to get an updated build of the game from Techland which included NVIDIA's fixes. When we requested the same from them, they declined to provide us with this updated code. They cited the fact that they would be releasing a finalized benchmark in the near future. Again, this was fine with us and nothing out of the ordinary. We would have liked to get our hands on the NVIDIA update, but it's Techland's code and they can do what they want with it.
Move forward to the release of the Call of Juarez benchmark we currently have for testing, and now we have a more interesting situation on our hands. Techland decided to implement something they call "HDR Correct" antialiasing. This feature is designed to properly blend polygon edges in cases with very high contrast due to HDR lighting. Using a straight average or even a "gamma corrected" blend of MSAA samples can result in artifacts in extreme cases when paired with HDR.
The real caveat here is that doing HDR correct AA requires a custom MSAA resolve. AMD hardware must always perform AA resolves in the shader hardware (as the R/RV6xx line lacks dedicated MSAA resolve hardware in its render backends), so this isn't a big deal for them. NVIDIA's MSAA hardware, on the other hand, is bypassed. The ability of DX10 to allow individual MSAA samples to be read back is used to perform the custom AA resolve work. This incurs quite a large performance hit for what NVIDIA says is little to no image quality gain. Unfortunately, we are unable to compare the two methods ourselves, as we don't have the version of the benchmark that actually ran using NVIDIA's MSAA hardware.
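To illustrate why the order of averaging and tone mapping matters on a high contrast edge, here is a minimal Python sketch. It is not Techland's shader: the Reinhard operator x / (1 + x), the function names, and the sample values are all assumptions chosen for illustration. One function averages the HDR samples first and tone maps the result (the straight average described above), the other tone maps each sample and then averages (a custom resolve).

```python
def tone_map(x):
    """Reinhard operator, an assumed stand-in for the game's real tone mapper."""
    return x / (1.0 + x)


def resolve_then_tone_map(samples):
    """Straight average of HDR samples, followed by tone mapping."""
    return tone_map(sum(samples) / len(samples))


def tone_map_then_resolve(samples):
    """Custom resolve: tone map every sample, then average the results."""
    return sum(tone_map(s) for s in samples) / len(samples)


# Hypothetical 4x MSAA edge pixel: two samples cover a very bright
# surface and two cover a dark one (values are invented).
edge = [100.0, 100.0, 0.05, 0.05]

print(resolve_then_tone_map(edge))
print(tone_map_then_resolve(edge))
```

With these assumed values the straight average lands at roughly 0.98 after tone mapping, so the half-covered edge pixel is almost as bright as the bright surface and the edge still looks jagged. Tone mapping each sample first gives roughly 0.52, close to the midpoint between the two displayed colors, which is what a smooth edge needs. The cost is that every sample must be read and processed in the shader.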
NVIDIA also tells us that some code was altered in Call of Juarez's parallax occlusion mapping shader that does nothing but degrade the performance of this shader on NVIDIA hardware. Again, we are unable to verify this claim ourselves. There are also other minor changes that NVIDIA feels unnecessarily paint AMD hardware in a better light than the previous version of the benchmark.
But Techland's response to all of this is that game developers are the ones who have the final say in what happens with their code. This is definitely a good thing, and we generally expect developers to want to deliver the best experience possible to their users. We certainly can't argue with this sentiment. But whether or not anything is going on under the surface, it's very clear that Techland and NVIDIA are having some relationship issues.
No matter what's really going on, it's better for the gamer if hardware designers and software developers are all able to work closely together to design high quality games that deliver a consistent experience to the end user. We want to see all of this as just an unfortunate series of miscommunications. And no matter what the reason, we are here today with what Techland has given us. The performance of their code as it is written is the only thing that really matters, as that is what gamers will experience. We will leave all other speculation in the hands of the reader.
So, what are the important DirectX 10 features that this benchmark uses? We see geometry shaders to simulate water particle effects, alpha-to-coverage for smooth leaf and grass edges, and custom MSAA resolve for HDR correct AA.
The AMD Radeon HD 2900 XT clearly outperforms the GeForce 8800 GTS here. At the low end, none of our cards are playable under any option the Call of Juarez benchmark presents. While all the numbers shown here are with large shadow maps and high quality shadows, even without these features, the 2400 XT only posted about 10 fps at 1024x768. We didn't bother to test it against the rest of our cards because it just couldn't stack up.
With 4xAA enabled, our low-end NVIDIA hardware really tanks. Remember that even these cards must resolve all MSAA samples in their shader hardware. AMD's parts are designed to always handle AA in this manner, but NVIDIA's parts only support the feature inasmuch as DX10 requires it.
We do see some strange numbers from the low-end NVIDIA cards at 1600x1200, but it's likely that they performed so poorly here that rendering certain aspects of the scene failed to the point of improving performance (in other words, it's likely not everything was rendered properly even though we didn't notice anything).
Company of Heroes
While Company of Heroes was first out of the gate with a DirectX 10 version, Relic didn't simply recompile their DX9 code for DX10; Company of Heroes was planned for DX10 from the start before there was any hardware available to test with. We are told that it's quite difficult to develop a game when going only by the specifications of the API. Apparently Relic was very aggressive in their use of DX10 specific features and had to scale back their effort to better fit the actual hardware that ended up hitting the street.
In spite of the fact that Microsoft requires support for specific features in order to be certified as a DX10 part, requiring a minimum level of performance for features is not part of the deal. This certainly made it hard for early adopters to produce workable code before the arrival of hardware, as developers had no idea which features would run fastest and most efficiently.
In the end, a lot of the DX10 specific features included in CoH had to be rewritten in a way that could have been implemented on DX9 as well. That's not to say that DX10 exclusive features aren't there (they do make use of geometry shaders in new effects); it's just that doing things in a way similar to how they are currently done offers better performance and consistency between hardware platforms. Let's take a look at some of what has been added in with the DX10 version.
The lighting model has been upgraded to be completely per pixel with softer and more shadows. All lights can cast shadows, making night scenes more detailed than on the DX9 version. These shadows are created by generating cube maps on the fly from each light source and using a combination of instancing and geometry shading to create the effect.
Company of Heroes DirectX 9
Company of Heroes DirectX 10
There is more debris and grass around levels to add detail to terrain. Rather than textures, actual geometry is used (through instancing and geometry shaders) to create procedurally generated "litter" like rocks and short grass.
Triple buffering is enabled by default, but has been disabled (along with vsync) for our tests.
We discovered that our cards with 256MB of RAM or less had trouble running with 4xAA and DirectX 10. Apparently this is a known issue with CoH on 32-bit Vista running out of addressable memory. Relic says the solution is to switch to the 64-bit version of the OS, which we haven't had time to test out quite yet.
DirectX 9 Tests
Under DX9, the Radeon HD 2900 XT performs quite well when running Company of Heroes. The card is able to keep up with the 8800 GTX here. In spite of a slightly heavier hit from enabling 4xAA, the 2900 XT still manages to best its 8800 GTS competition. But the story changes when we move to DX10.
DirectX 10 Tests
When running with all the DX10 features enabled, the HD 2900 XT falls to just below the performance of the GeForce 8800 GTS. Once again, the low-end NVIDIA and AMD cards are unable to run at playable framerates under DX10, though the NVIDIA cards do lead AMD.
Enabling 4xAA further hurts the 2900 XT relative to the rest of the pack. We will try to stick with Windows Vista x64 in the future in order to run numbers with this game on hardware with less RAM.
Lost Planet: Extreme Condition
Lost Planet: Extreme Condition is a port of an Xbox 360 game. Well, to be honest, it's almost as if they just tacked on support for a keyboard and mouse and recompiled it for Windows on x86 hardware. While we haven't done a full review of the game, only playing through the intro and first mission, our initial assessment is that this is absolutely the worst console port ever.
Unless you have an Xbox 360 controller for your PC, the game is almost unplayable. The menus are clunky and difficult to navigate. Moving in and out of different sections of the main menu requires combinations of left and right clicking, which is patently absurd. If during gameplay you wish to change resolutions, you must click the mouse no less than a dozen times. This does not include the need to navigate menus through hovering (who does that?) and the click required to grab the scroll bar in the settings menu.
The console shooter has always had to work hard to compete with the PC. Halo and Halo 2 did quite a good job of stepping up to the plate, and Gears of War really hit one out of the park. But simply porting a mediocre console shooter to the PC does not a great game make.
That said, if you can get past the clunky controls and stunted interface, the visuals in this game are quite stunning. It can also actually be fun and satisfying to shoot up a bunch of Akrid. Our first impression is that a good game could be buried underneath all of the problems inherent in the PC port of Lost Planet, but we'll have to take a closer look to draw a final conclusion on this one.
For now, the important information to take away is what we get from the DirectX 10 version of the game. While we haven't found an explicit list of the differences, our understanding is that the features are generally the same. Under DirectX 10, gamers can choose a "high" shadow quality option while DX9 is limited to "medium". Other than this, it seems lighting is slightly different (though not really better) under DX10. From what we've seen reported, Capcom's goal with DX10 on Lost Planet is to increase performance over their DX9 version.
Lost Planet DirectX 9
Lost Planet DirectX 10
In order to make as straightforward a comparison as possible, we used the same settings under DX9 and DX10 (meaning everything on high except for shadow quality).
DirectX 9 Tests
While it's difficult not to feel like a broken record, we need to disable quite a few settings to get our low-end cards playable. In this case, even under DX9 without AA enabled they don't perform well. We are planning on testing the 8500 and 8400 in the near future, and we'll be sure to go back to see if we can run DX10 tests at low enough settings to get interesting results for these budget and mainstream parts.
Also, once again, the 2900 XT performs well under DX9, but slips a little behind with 4xAA due to its lack of MSAA resolve hardware.
DirectX 10 Tests
Incredibly, NVIDIA's 162 drivers combined with the retail version of Lost Planet actually deliver roughly equivalent performance on DX9 and DX10. While Capcom's goal is higher performance under DX10, we would still expect that AMD and NVIDIA both have a long way to go in bringing their DX10 drivers up to parity with the quality of their DX9 drivers.
DirectX 9 vs. DirectX 10
Here we'll take a closer look at some of the scaling differences between DirectX 9 and DirectX 10 on current hardware under current drivers with Company of Heroes and Lost Planet.
First up is a look at relative scaling between cards under each API. The idea is to see whether cards that perform better under DX9 also perform better under DX10 (and vice versa). This will only give us a glimpse at what could happen going forward, as every game (and every implementation of that game) will be different.
For Company of Heroes, we see huge performance drops in moving to DirectX 10 from DirectX 9. The new lighting and shadowing techniques combined with liberal geometry shader use are responsible for at least halving performance when running the more detailed DX10 path. NVIDIA seems to handle the new features Relic added better than AMD. These results are especially impressive remembering that NVIDIA already outperformed AMD hardware under DX9.
Lost Planet is a completely different animal. With Capcom going for a performance boost under DX10, we can see that they actually succeeded with the top of the line NVIDIA cards. There isn't much else enticing about the DX10 version of Lost Planet, and it's clear that AMD's drivers haven't been optimized to tackle this game quite yet.
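For readers who want to run the same kind of comparison on their own results, the relative scaling between APIs comes down to a simple per-card ratio. The Python sketch below is only an illustration: the function name, card labels, and frame rates are placeholders and are not taken from our charts.

```python
def dx10_relative_to_dx9(fps_dx9, fps_dx10):
    """Return DX10 performance as a fraction of DX9 performance per card."""
    return {
        card: fps_dx10[card] / fps_dx9[card]
        for card in fps_dx9
        if card in fps_dx10
    }


# Placeholder frame rates, NOT measurements from this article.
fps_dx9 = {"card_a": 80.0, "card_b": 60.0}
fps_dx10 = {"card_a": 40.0, "card_b": 24.0}

for card, ratio in dx10_relative_to_dx9(fps_dx9, fps_dx10).items():
    print(f"{card}: DX10 runs at {ratio:.0%} of DX9 speed")
```

A ratio of 0.5 corresponds to the "halving" described for Company of Heroes, while a ratio at or above 1.0 would mean the DX10 path is as fast as or faster than DX9, as Capcom was aiming for with Lost Planet. Comparing the ratios between cards shows which hardware copes better with the move to the new API.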
Next we want to take a look at AA scaling difference between DirectX 9 and DirectX 10. Can we expect less impact from AA on one API or the other? Let's take a look.
Under Call of Juarez, our low-end NVIDIA cards suffer from a huge drop in performance when AA is enabled. This is likely due to the fact that they can't handle either the bandwidth or the shader requirements of Techland's HDR correct AA. The higher end parts seem to handle the AA method fairly well, though certainly NVIDIA would be happier if they retained their hardware AA advantage.
For our DX10 Company of Heroes test, which does use hardware MSAA resolve where available, AMD hardware scales much worse than NVIDIA hardware.
All of our cards scale worse under DX9 when enabling 4xAA than under DX10. While we don't have enough information to really understand why that is under Company of Heroes, it is certainly interesting to see some sort of across the board performance advantage for DX10 (even if it is in a roundabout way).
Lost Planet, with its attempt to improve performance by moving to DX10, delivers very similar performance impact from AA in either DX9 or DX10. Again we see a very slight scaling advantage in favor of DX10 (especially with AMD hardware), but nothing life changing.
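The AA scaling discussed above is the share of frame rate lost when 4xAA is switched on, computed separately for each API. Here is a minimal Python sketch of that calculation. The function name and all of the numbers are placeholders for illustration and do not come from our test results.

```python
def aa_cost_percent(fps_no_aa, fps_with_aa):
    """Percentage of frame rate lost when AA is enabled."""
    return (fps_no_aa - fps_with_aa) / fps_no_aa * 100.0


# Placeholder results for one card, NOT measurements from this article.
results = {
    "DX9": {"no_aa": 60.0, "aa_4x": 45.0},
    "DX10": {"no_aa": 30.0, "aa_4x": 24.0},
}

for api, fps in results.items():
    cost = aa_cost_percent(fps["no_aa"], fps["aa_4x"])
    print(f"{api}: 4xAA costs {cost:.1f}% of the frame rate")
```

In this made-up example the card loses 25% under DX9 and 20% under DX10, so AA "scales better" under DX10 even though the absolute frame rate is much lower. That is why a scaling advantage on its own says little about which path is more playable.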
For now, AMD does seem to have an advantage in Call of Juarez, while NVIDIA leads the way in Company of Heroes and Lost Planet. But as far as NVIDIA vs. AMD in DirectX 10 performance, we really don't want to call a winner right now. It's just way too early, and there are many different factors behind what we are seeing here. As the dust settles and everyone gets fully optimized DirectX 10 drivers out the door with a wider variety of games, then we'll be happy to take a second look.
The more important fact to realize is that DirectX 10 is finally here. While developers are used to programmable hardware after years with DirectX 9, there is still room for experimentation and learning with geometry shaders, more flexibility, lower state change and object overhead, and (especially) faster hardware. But DirectX 10 isn't an instant pass to huge performance and incredible effects.
Let's look at it like this: there are really three ways a game can come to support DirectX 10, and almost all games over the next few years will ship with a DX9 path as well. The easiest thing is to do a straight port of features from DirectX 9 (which should generally be slightly faster than the DirectX 9 counterpart if drivers are of equal quality). We could also see games offer a DirectX 10 version with enhanced features that could still be implemented in DX9 in order to offer an incentive for users to move to a DX10 capable platform. The most aggressive option is to implement a game focused around effects that can only be effectively achieved through DirectX 10.
Games which could absolutely only be done in DX10 won't hit for quite a while for a number of reasons. The majority of users will still be on DX9 platforms. It is logical to spend the most effort developing for the user base that will actually be paying for the games. Developers are certainly interested in taking advantage of DX10, but all games for the next couple of years will definitely have a DX9 path. It doesn't make sense to rewrite everything from the ground up if you don't have to.
We are also hearing that some of the exclusive DX10 features that could enable unique and amazing effects DX9 isn't capable of just don't perform well enough on current hardware. Geometry shader heavy code, especially involving geometry amplification, does not perform equally well on all available platforms (and we're looking at doing some synthetic tests to help demonstrate this). The performance of some DX10 features is lacking to the point where developers are limited in how intensely they can use these new features.
Developers (usually) won't write code that will work fine on one platform and not at all on another. The decisions on how to implement a game are in the hands of the developers, and that's where gamers rightly look when performance is bad or hardware and feature support is not complete. Building a consistent experience for all gamers is important. It won't be until most users have hardware that can handle all the bells and whistles well that we'll see games start to really push the limits of DX10 and reach beyond what DX9 can do.
In conversations with developers we've had thus far, we get the impression that straight ports of DX9 to DX10 won't be the norm either. After all, why would a developer want to spend extra time and effort developing, testing and debugging multiple code paths that do exactly the same thing? This fact, combined with the lack of performance in key DX10 features on current hardware, means it's very likely that the majority of DX10 titles coming out in the near term will only be slightly enhanced versions of what could have been done through DX9.
Both NVIDIA and AMD were very upset over how little we thought of their DX10 class mainstream hardware. They both argued that graphics cards are no longer just about 3D, and additional video decode hardware and DX10 support add a lot of value above the previous generation. We certainly don't see it this way. Yes, we can't expect last year's high-end performance to trickle down to the low-end segment, but we should at least demand that this generation's $150 part will always outperform last generation's.
This is especially important in a generation that defines the baseline of support for a new API. The 2400 and 8400 cards will always be the lowest common denominator in DX10 hardware (until Intel builds a DX10 part, but most developers will likely ignore that unless Intel can manage to pull a rabbit out of their hat). We can reasonably expect that people who want to play games will opt for at least an 8600 or a 2600 series card. Going forward, developers will have to take that into account, and we won't be able to see key features of games require more horsepower than these cards provide for the next couple of years.
AMD and NVIDIA had the chance to define the minimum performance of a DX10 class part higher than what we can expect from cards that barely get by with DX9 code. Because they chose to design their hardware without a significant, consistent performance advantage over the X1600 and 7600 class of parts, developers have even less incentive (not to mention ability) to push next generation features only possible with DX10 into their games. These cards are just not powerful enough to enable widespread use of any features that reach beyond the capability of DirectX 9.
Even our high-end hardware struggled to keep up in some cases, and the highest resolution we tested was 2.3 megapixels. Pushing the resolution up to 4 MP (with 30" display resolutions of 2560x1600) brings all of our cards to their knees. In short, we really need to see faster hardware before developers can start doing more impressive things with DirectX 10.
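The megapixel figures are easy to check with a few lines of Python. Note one assumption: the text does not name the resolution behind the 2.3 megapixel figure, and 1920x1200 is used here only because it matches that number. The function name is made up for illustration.

```python
def megapixels(width, height):
    """Number of pixels in a frame, in millions."""
    return width * height / 1_000_000


# 1920x1200 is an assumed match for the 2.3 MP figure in the text.
resolutions = [(1600, 1200), (1920, 1200), (2560, 1600)]

for width, height in resolutions:
    print(f"{width}x{height}: {megapixels(width, height):.2f} MP")

# How many more pixels a 30" panel pushes than the assumed 2.3 MP mode.
print(megapixels(2560, 1600) / megapixels(1920, 1200))
```

Under that assumption, moving from 1920x1200 to 2560x1600 means shading roughly 1.78 times as many pixels per frame, which goes a long way toward explaining why cards that already struggle at 2.3 MP fall over at 4 MP.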