Original Link: http://www.anandtech.com/show/1372
Introduction
Next week (we are hearing July 5th), Ubisoft will release their second patch to CryTek's FarCry. This is the game that shows off the beautiful CryEngine renderer that CryTek has put together. The images and scenery are truly beautiful, and with the new patch comes a much needed update to run speed (~15%) and run duration (~30%). These new features make the game an even more enjoyable experience.
But that's not the major update that we are here to talk about. The FarCry 1.2 patch will feature a new rendering path based on Shader Model 3.0 (Vertex and Pixel Shader 3.0), which is currently only supported by NVIDIA's 6800 series cards and not by ATI's X800 line of cards.
We are here today to test out the new patch on six different levels in FarCry and see if the new methods, which CryTek were able to include in their new path, offer any kind of advantage. As the game play experience is meant to be the same no matter what card we're using, we'll clear the air before we start, and say that there will be no new eye candy available through the SM3.0 path. The game should be rendered exactly the same way it was under SM2.0, and we will take a look at IQ as we go through our tests just to make sure that we keep on track. This is a very important point to take away as it means that regardless of whether you buy an ATI X800 or an NVIDIA 6800, the game will still look and play the same.
Well, if there are no new bells and whistles, why should the end user care? Because there are some performance increases that CryTek was able to squeeze out of the engine with their new render path. How much, we're about to find out, but first, let's take a look at what exactly has changed.
UPDATE: It has recently come to our attention that our 4xAA/8xAF benchmark numbers for NVIDIA 6800 series cards were incorrect when this article was first published. The control panel was used to set the antialiasing level, which doesn't work with FarCry unless set specifically in the FarCry profile (which was not done here). We apologize for the error and have updated our graphs and analysis accordingly.
For a more positive update, after a discussion with CryTek about the new rendering path, we have learned that the lighting model implemented in the SM3.0 path is exactly the same as the one used in the SM2.0 path. The only exception is that they used conditional rendering (branching in the pixel shader) to emulate multipass lighting in a single pixel shader. The performance gains we see actually indicate that PS3.0 branching does not have as significant a performance hit as previously thought (and proves to be more efficient than using multiple pixel shaders in a scene).
What's New in 1.2?
We have, unfortunately, not had the chance or pleasure of speaking with CryTek on the subject, but NVIDIA has given us a heads up on what the new patch includes in the way of SM3.0 support and the features of the SM3.0 rendering path. From what we can gather, there are two performance enhancing developments in the CryEngine made possible by SM3.0 in the new patch.
As for VS3.0, instancing is implemented when rendering grass. Instancing helps to reduce CPU and bus overhead by allowing the engine to send one model to the vertex shader where multiple "instances" of the object are manipulated and moved about the scene as necessary. With all the grass in FarCry, it's easy to see how this could be beneficial.
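To make the overhead argument concrete, here is a rough accounting sketch. It is not CryEngine code: the function names, the worst-case assumption that a non-instanced renderer re-sends the mesh for every copy, and the example sizes are all hypothetical and only illustrate why one instanced draw is cheaper to submit than many individual draws.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionCost:
    draw_calls: int   # how many times the CPU has to issue a draw
    floats_sent: int  # how much vertex/instance data crosses the bus


def naive_submission(vertices_per_mesh, floats_per_vertex, instance_count):
    # Hypothetical worst case: every copy is its own draw call
    # and re-sends its own vertices.
    return SubmissionCost(
        draw_calls=instance_count,
        floats_sent=instance_count * vertices_per_mesh * floats_per_vertex,
    )


def instanced_submission(vertices_per_mesh, floats_per_vertex,
                         instance_count, floats_per_instance):
    # One draw call: the mesh is sent once, plus a small
    # per-instance record (for example, a position offset).
    return SubmissionCost(
        draw_calls=1,
        floats_sent=(vertices_per_mesh * floats_per_vertex
                     + instance_count * floats_per_instance),
    )


if __name__ == "__main__":
    # Made-up sizes: a 12-vertex grass clump drawn 1000 times.
    print(naive_submission(12, 8, 1000))
    print(instanced_submission(12, 8, 1000, 4))
```

With these made-up sizes, the instanced version issues one draw call instead of 1000 and sends 4096 floats instead of 96000. Real savings depend on how the engine batches geometry in the first place.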
Under PS3.0, CryTek has apparently implemented single-pass per-pixel lighting. With this per-pixel lighting model, a pixel shader is run that takes into account and processes all light sources in the level that affect a particular pixel in one pass. The PS2.0 implementation apparently uses multiple rendering passes (one for each light) for each affected pixel. This means that in heavily lit scenes, one (more intense) lighting pass can run, which eliminates the time it takes to set up and execute another pass, even if both implementations have the same result. It is unclear to us exactly why this is possible in PS3.0 and not in PS2.0 (we have even seen examples of technology like this running on PS2.0 hardware). We would really love the chance to go more in-depth with CryTek about their lighting algorithms.
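The difference between the two approaches can be sketched on the CPU. This is a simplified illustration, not CryTek's shader code: it assumes a plain Lambert diffuse term, unit-length vectors, and a floating point framebuffer, and the function names are our own.

```python
def lambert(normal, light_dir, intensity):
    # Diffuse contribution of one light: intensity * max(0, N . L).
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return intensity * max(0.0, n_dot_l)


def shade_multipass(normal, lights):
    # One rendering pass per light; each pass is additively
    # blended into the framebuffer value for this pixel.
    framebuffer = 0.0
    passes = 0
    for light_dir, intensity in lights:
        framebuffer += lambert(normal, light_dir, intensity)
        passes += 1
    return framebuffer, passes


def shade_single_pass(normal, lights):
    # One pass: the shader itself loops over every light
    # that affects the pixel and sums the contributions.
    total = 0.0
    for light_dir, intensity in lights:
        total += lambert(normal, light_dir, intensity)
    return total, 1


if __name__ == "__main__":
    normal = (0.0, 0.0, 1.0)
    lights = [((0.0, 0.0, 1.0), 1.0),
              ((0.0, 1.0, 0.0), 0.5),
              ((0.0, 0.6, 0.8), 0.5)]
    print(shade_multipass(normal, lights))
    print(shade_single_pass(normal, lights))
```

Both functions arrive at the same color for the pixel; what changes is the number of passes that have to be set up and executed (three versus one for the three hypothetical lights here). On real hardware, intermediate framebuffer precision and blending could make the two results differ slightly, which this sketch ignores.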
At this point, this is all we know about the new SM3.0 rendering path in FarCry. So, theoretically, running around with your flashlight on in an outdoor (very grassy) night scene with lots of lights everywhere would offer the best performance boost that NVIDIA could see from the new rendering path. There are scenes like this in the game. We actually test a night scene with grass (the regulator level), though it doesn't have an abundance of light sources (or a flashlight).
If these are the only differences between the SM2.0 and SM3.0 paths, then all we heard about were the performance enhancing features. We do appreciate NVIDIA providing us with the patch and information before it went live on Ubisoft's website, but we would still rather have had some of this information directly from the source. Obviously, we want to present a fair and unbiased account of what's actually happening, and we should be fine as long as we take proper precautions and consider as many angles as possible.
And on that note, we've been in touch with ATI about a rendering issue that we noticed with the FarCry 1.2 patch under WinXP SP2, DX9.0c, and ATI's latest public driver (4.6). The problem appears to be incorrect mipmap selection on particular sections of the ground (usually on uneven ground) throughout the game. We've seen this appear as the result of an incorrectly set LOD in the past, but we don't know what is causing this. To be very clear about the issue, here's a portion of a screenshot of the ground right out in front of us in the volcano level.
ATI assures us that they have also been working with CryTek on their efforts. Since we have seen a performance improvement with the latest driver and new 1.2 patch, we don't have any reason to think that anything extraordinarily fishy is going on behind the scenes between NVIDIA and Crytek. We would obviously like to see this texturing issue fixed.
Since performance characteristics in FarCry are dominated by shader performance rather than the texel fill rate or the size of the texture used (especially if it's still being trilinearly filtered, as appears to be the case from the screenshot), our opinion is that this issue does not significantly inflate the performance numbers that we see from ATI.
The Benchmark
Since we are focused on one game, our testing methodologies will be a little different this time around, and we'll take the opportunity to go in-depth on our testing method for this game as well. We will look at six different levels of the game, and for each Level Analysis, we will look at image quality as well as demo benchmark performance.
One of the largest caveats about benchmarking in FarCry is that demos don't work like one would expect. For example, in Unreal Tournament 2004, we can start a game, play our hearts out, start and stop recording somewhere in the middle, and we have a very cool little benchmark of the action. This repeatable benchmark is a fair representative of gameplay as far as benchmarks go. This seems reasonable for a game with a built-in demo mode.
FarCry, on the other hand, will record the movement of the player through a level without recording any of the player's actions (like firing a weapon or pressing a button to open a door), and none of the other characters in the level are recorded either. When a demo recorded in single player mode is viewed, all the AI controlled characters man the same posts that they would in the game; only they ignore the player moving through the level when the demo is running.
This means that some demos have instances of passing through locked doors, and AI bots making their normal rounds regardless of the player (and can be in different locations for different runs of the benchmark). And as if this wasn't enough, the worst part of the whole experience (have you figured it out yet?) is that demos are entirely absent of any fighting, conflict, or gunfire.
We tried many times to benchmark this game using FRAPS, but our ability to be repeatable was worse than what demo playback gives us.
So, why are we at all OK with using FarCry's built-in demo mode? Because much of FarCry game play has to do with sneaking around, walking through the levels, and taking in the scenery. No, it's not the all-encompassing perfect benchmark, but it isn't the worst thing that we've seen either (*cough* - 3dmark - *cough*). We've compared the demo mode to our very non-repeatable FRAPS benchmarks of walking around levels and we are comfortable with the reliability of the scores that we get from the demo for that purpose.
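One simple way to put a number on repeatability is the coefficient of variation (standard deviation divided by mean) over several runs of the same test. The sketch below uses made-up frame rates purely for illustration; they are not measurements from this article, and the function name is our own.

```python
import statistics


def coefficient_of_variation(fps_runs):
    # Sample standard deviation relative to the mean;
    # lower values mean more repeatable results.
    return statistics.stdev(fps_runs) / statistics.mean(fps_runs)


if __name__ == "__main__":
    # Hypothetical numbers, NOT results from our testing.
    demo_runs = [60.0, 61.0, 59.0]    # built-in demo playback
    fraps_runs = [50.0, 60.0, 70.0]   # manual walkthroughs timed with FRAPS
    print(f"demo:  {coefficient_of_variation(demo_runs):.3f}")
    print(f"FRAPS: {coefficient_of_variation(fraps_runs):.3f}")
```

A method whose runs scatter less around their mean gives a smaller value, which is the sense in which demo playback is the more repeatable of the two options.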
Also, when we were informed that this patch was coming down the pipe, NVIDIA sent along a few demos to test the performance difference with the new SM3.0 path enabled. Ubisoft is going to include these four demos with their patch, but we were obviously a little wary of just throwing these numbers up. We took a close look at the demos, and we are including them alongside our original custom demo and a new custom demo that we recorded for this article. We are including the NVIDIA provided demos because they cover sections of the game that are genuinely part of the gameplay. Whether these are representative of overall gameplay or not, there are definitely experiences in the single player mode of the game that are represented by the demos.
It is important to keep the numbers in this test in perspective. We are specifically investigating the benefit and impact of CryTek's new SM3.0 path. To explore the technology fully, it is necessary to look both at parts of the game that benefit most from the update as well as those that don't see that much impact. We trust that the demos provided by NVIDIA will highlight the best results that we can expect to see, and it just so happened that our original demo and the new one, which we recorded, see less change (as we will soon discover).
The Test
For this test, we used the same setup as in our 6800 and X800 launch articles. This time around, we are using newer drivers, a beta Windows service pack, DX9.0c, and the 1.2 version of FarCry. The numbers that we originally ran are much different (in a good way) than the numbers that we will see here for the SM2.0 path on both cards.
In order to test image quality, we couldn't use Windows' built-in screen capture, or HyperSnap 5 (which we usually use to accommodate DX9 captures with special requirements). We had to use FarCry's built-in screen capture (default key is F12), which only captures images in .jpg format rather than any of the uncompressed formats that we would rather see for IQ comparisons. As such, pixel perfect comparisons (though not technically possible in the first place) aren't even a distant hope. Small versions of the images have only been cropped, not resized or resampled, and the full 1600x1200 images will be linked up.
|Performance Test Configuration|
|Processor(s):||AMD Athlon 64 3400+|
|RAM:||2x 512MB OCZ PC3200 (2:2:3:6)|
|Hard Drive(s):||Seagate Barracuda 7200.7|
|Video AGP & IDE Bus Master Drivers:||VIA Hyperion 4in1 4.51|
|Video Card(s):||NVIDIA GeForce 6800 Ultra Extreme; NVIDIA GeForce 6800 Ultra; NVIDIA GeForce 6800 GT; NVIDIA GeForce 6800; ATI Radeon X800 XT Platinum Edition; ATI Radeon X800 XT; ATI Radeon X800 Pro|
|Video Drivers:||NVIDIA 61.45 SM3 Beta Graphics Drivers; ATI Catalyst 4.6|
|Operating System(s):||Windows XP Professional SP2 RC2 with DX9.0c and the Summer 2004 DirectX SDK Update|
|Power Supply:||PC Power & Cooling Turbo Cool 510|
|Motherboards:||FIC K8T800 (754 pin)|
As is apparent from the table, we are introducing a couple of new cards this time around. For easy reference, here are the number of pixel pipelines, the core clock speed (MHz), and the effective memory data rate (MHz) of all the parts included:
NVIDIA GeForce 6800: 12 pipes, 325 core, 700 mem
NVIDIA GeForce 6800 GT: 16 pipes, 350 core, 1000 mem
NVIDIA GeForce 6800 Ultra: 16 pipes, 400 core, 1100 mem
NVIDIA GeForce 6800 Ultra Extreme: 16 pipes, 460 core, 1200 mem
ATI Radeon X800 Pro: 12 pipes, 475 core, 900 mem
ATI Radeon X800 XT: 16 pipes, 500 core, 1000 mem
ATI Radeon X800 XT Platinum Edition: 16 pipes, 520 core, 1120 mem
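As a rough point of comparison, the pipeline counts and core clocks in the list above can be turned into a theoretical peak pixel fill rate. The sketch assumes each pipeline outputs one pixel per clock and that the clocks are in MHz; it is a paper figure only and says nothing about shader throughput, which matters more in FarCry.

```python
# (name, pixel pipelines, core clock in MHz), copied from the list above.
CARDS = [
    ("NVIDIA GeForce 6800", 12, 325),
    ("NVIDIA GeForce 6800 GT", 16, 350),
    ("NVIDIA GeForce 6800 Ultra", 16, 400),
    ("NVIDIA GeForce 6800 Ultra Extreme", 16, 460),
    ("ATI Radeon X800 Pro", 12, 475),
    ("ATI Radeon X800 XT", 16, 500),
    ("ATI Radeon X800 XT Platinum Edition", 16, 520),
]


def peak_fill_rate_mpixels(pipes, core_mhz):
    # Assumes one pixel per pipeline per clock, so the result
    # is in millions of pixels per second.
    return pipes * core_mhz


if __name__ == "__main__":
    for name, pipes, core_mhz in CARDS:
        rate = peak_fill_rate_mpixels(pipes, core_mhz)
        print(f"{name}: {rate} Mpixels/s")
```

On paper, the ATI parts have the higher peak fill rate at each tier (for example, 8320 Mpixels/s for the X800 XT Platinum Edition against 7360 Mpixels/s for the 6800 Ultra Extreme), so any NVIDIA lead in the results has to come from somewhere other than raw fill rate.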
ATI cards are always run in SM2.0 mode (as they don't support SM3.0), so the labels on the graphs only reflect the code path that NVIDIA's cards take. Each level analysis will have an SM2.0 comparison (both NVIDIA and ATI on the same path) and an SM3.0 comparison (NVIDIA running SM3.0 with ATI running SM2.0).
Also, keep in mind that this test is performing an analysis of two different rendering paths, and not the performance difference between SM2.0 and SM3.0 code. If this were really a test of SM2.0 versus SM3.0, we would be talking about using the same rendering techniques with different instructions (in which case, the lower complexity of SM2.0 has the potential to be faster in many cases). What we are looking at here are two different rendering methods.
In other words, this is the performance difference between two different implementations of CryTek's engine, not a generalization of SM2.0 versus SM3.0 performance. In this case, CryTek determined that SM3.0 provided functionality that made changes to the rendering path worth the cost of implementation. Let's take a look at the end result.
Level Analysis: mp_airstrip
This is our default benchmark here at AnandTech. We've been using this demo to benchmark with FarCry ever since we first included it in our benchmark suite.
Image quality is not an issue here as the NVIDIA and ATI screen shots show.
With no AA or AF enabled, the SM3.0 path offers a small performance boost that nudges the GeForce 6800 Ultra Extreme up over the X800 XT in performance. The improvement in our default benchmark is negligible, and (except for the 6800 UE) statistically insignificant. Also, the 6800 shows a very small drop in performance here.
Turning on AA and AF makes the performance improvement from the SM3.0 path a little more visible, but this still isn't anything to brag about. With AA/AF and SM3.0, the only card that slips in position is the X800 Pro (which is really too close a race to call anyway).
Level Analysis: mp_mangoriver
This benchmark is focused around water, and includes an underwater section. Since it's also benchmarked in spectator mode, we fly up over the waterfall to get some long viewing range rendering in.
Here, we also see no image quality differences, but take a look at the full size jpegs as well, if you need more convincing.
Once again, no significant performance improvements except on the very high end with the 6800 UE changing place with the X800 XT.
Still, more of the same with AA and AF enabled. Not much exciting so far, but these benchmarks don't highlight the properties that have enhanced performance under the new rendering path. It is interesting to note that the X800 series of cards appears to be more resilient to turning on AA and AF than the NVIDIA cards, even under the new rendering path. The following four NVIDIA provided demos do a good job of pointing out situations where performance gains are seen.
Level Analysis: research
This level is completely indoors in a dark room with a large machine in the center. There is plenty of lighting all around, and this should give us a good idea of the impact of the new lighting model implementation.
By some miracle, the two SM2.0 rendering path screenshots are just about perfectly lined up, but the SM3.0 shot was a little early. There isn't a discernible IQ difference in these images that we can tell, but please have a look at the full versions.
In this benchmark, we see performance go from fairly evenly spread out over the range of cards tested to three of the four NVIDIA cards leading in performance under the SM3.0 rendering path. The 12-pipe NVIDIA card almost catches up to the 12-pipe ATI card.
Unlike our previous benchmark, the top of the line NVIDIA card even leads in AA/AF when SM3.0 is enabled. This is a fairly impressive performance gain, with the 6800UE leaping over two ATI cards, and the 6800U becoming competitive with the X800 XT.
Remember, this is an NVIDIA provided demo and highlights the performance benefits of the new rendering path. At the same time, if we had benchmarked this level, we would have done the same thing by following the path on which the game takes the player. This kind of benefit does exist in the game, but so do times when (as we have shown) no real improvement can be seen.
Level Analysis: regulator
This level starts inside and moves outside at night through some grass and trees around a compound.
Here is the first indication we have had that the SM3.0 path renders things ever so slightly differently than the SM2.0 path. Looking closely (swapping quickly back and forth between the images), it is apparent that the glow of the light above the door is a little dimmer when the new rendering techniques are applied (really, we promise it's different). This small a difference doesn't really add up to anything in terms of game play, but it does let us know that the new lighting model isn't a mathematically identical solution. Which one is nearer the developers' vision, only CryTek can tell us (and hopefully they will).
Again, we see a small performance improvement that pushes the 6800 GT above the X800 XT PE, but this performance improvement is very small.
Continuing the trend, performance improvement due to SM3.0 is slightly higher with 4xAA/8xAF than without. Here, we even see that NVIDIA can lead in an AA/AF enabled benchmark without the help of the new rendering path.
NVIDIA supplied this demo, and its modest improvement shows that not all performance gains are monumental.
Level Analysis: training
This is actually the very first thing that happens in the game. The player wakes up and walks through the cave out onto a beach. In this benchmark, the player walks out to the small dock and looks out over water as the demo concludes.
The new VS3.0 code that handles grass does a good job of matching up with the VS2.0 implementation. These shots aren't lined up as well as some of our others, but it is clear that there aren't any differences in rendering quality.
This time, we don't see much performance improvement.
With AA and AF enabled, we get a little more of a boost in performance.
This NVIDIA supplied level shows smaller improvement than the others (just slightly more impressive than our custom demos). The main factor in this level is grass, which our tests really didn't have. So far, it seems like lighting has the largest impact on performance.
Level Analysis: volcano
This level starts out with some very nicely lit floors in an indoor scene. Our screenshot reflects one of the different floors the demo runs across. The demo then moves outside to run across a lava-pitted volcano crater with some interesting heat and lava effects.
Even though there is no difference between the two cards under the SM2.0 rendering path, it's easy to see that the very center of the specular highlight is not as highly saturated under SM3.0. Again, this may or may not be by design, but hopefully we will be able to bring you that information soon. Either rendering seems equally likely to be the desired one (either the reflection is supposed to be brighter than the SM3.0 path shows, or the SM2.0 path's multipass lighting may over brighten the floor).
Here, we see huge performance gains with the new rendering path.
And even larger leaps with AA and AF enabled.
This is another NVIDIA supplied demo, and it shows the largest performance gains that we see in the SM3.0 render path. Judging by our other benchmarks, these numbers are not typical, but they do occur in the game: our own exploration of this level produced results in line with the numbers that we see in this demo.
Final Words
Both of our custom benchmarks show ATI cards leading without anisotropic filtering and antialiasing enabled, with NVIDIA taking over when the options are enabled. We didn't see much improvement from the new SM3.0 path in our benchmarks either. Of course, it just so happened that we chose a level that didn't really benefit from the new features the first time we recorded a demo. And, with the mangoriver benchmark, we were looking for a level to benchmark that didn't follow the style of benchmarks that NVIDIA provided us with in order to add perspective.
Even some of the benchmarks with which NVIDIA supplied us showed that the new rendering path in FarCry isn't a magic bullet that increases performance across the board through the entire game.
Image quality of both SM2.0 paths is on par with each other, and the SM3.0 path on NVIDIA hardware shows negligible differences. The very slight variations are most likely just small fluctuations between the mathematical output of a single pass and a multipass lighting shader. The difference is honestly so tiny that you can't call either rendering lower quality from a visual standpoint. We will still try to learn from CryTek what exactly causes the differences we noticed.
The main point that the performance numbers make is not that SM3.0 has a speed advantage over SM2.0 (as even the opposite may be true), but that single pass per-pixel lighting models can significantly reduce the impact of adding an ever increasing number of lights to a scene.
It remains to be seen whether or not SM3.0 offers a significant reduction in complexity for developers attempting to implement this advanced functionality in their engines, as that will be where the battle surrounding SM3.0 will be won or lost.
UPDATE: CryTek has pointed out that the new lighting implementation is essentially the same but uses branching in the pixel shader to accomplish what needed to be done in multiple shaders under the PS2.0 path. This indicates that the conditional rendering feature of SM3.0 is actually faster than using multiple shaders (which gives NVIDIA 6 series cards a performance advantage when multiple shaders would have been required).