Original Link: http://www.anandtech.com/show/1497
Linux 3D AGP GPU Roundup: More Cutting Edge Penguin Performance
by Kristopher Kubicki on October 4, 2004 12:05 AM EST
Introduction
We took a look at several performance CPUs last week, and we were incredibly impressed by the amount of interest it spawned. Our little Linux section has been making waves left and right and we are quickly establishing ourselves as the premier Linux hardware journal. We have been working very diligently on a GPU roundup to top all GPU roundups in the Linux world. It has taken us a little over 3 weeks from start to finish, but we think that the final product is well worth it.
We get dozens of emails a day from readers asking which video card is right for them, particularly if they are going to give Linux a shot. It may be due to the circles that we run in, but the sheer interest in Linux among our peers seems to have grown 100-fold over what it was last year. Simple, clean distros like SuSE, Fedora Core and Mandrake have done wonders for the Windows migration crowd - and then there is the whole Gentoo sensation as well. Linux is definitely growing, but does it really have a competitive edge in any gaming or graphics intensive application?
The focus of this analysis is not to fire up glxgears and see which card runs it faster. Instead, we want to look at some common graphics intensive applications for Linux and determine how well they run, particularly in relation to their Windows counterparts. We are interested in more than just the benchmark results - getting there is half the fun, and coincidentally, half the weighting for a purchase decision for many of us. Inevitably, we will draw some comparisons between GPU families out of the eleven cards that we have chosen to compare today.
When it comes to our quantitative data, we aren't just looking at average frames per second and declaring a winner. We have spent weeks working on a graphics benchmark utility specifically designed for AnandTech, which we are open sourcing and releasing to the world today as well.
Our New Benchmark: FrameGetter
OK, FrameGetter is not the best name for a benchmarking utility - but we are engineers and computer scientists, not marketing geniuses. Last week, we took some time to introduce everyone to our new Linux GPU benchmark. Fortunately, it was received with incredible success - both by our industry peers and our readers. You can read more of the program specifications as described by the lead developer, Wiktor Kopec, here. Just to recap, here is how the program works again:
- We install a few libraries in the lib directory that are passed data from each game.
- A shell program in the FG suite copies and modifies the game executables. All references to libGL and libSDL in the copy are replaced with our library installed in the first step.
- The modified game executable runs while happily sending data to our libraries. Our libraries watch for buffer swaps while occasionally dumping the input to the /tmp directory.
- Frames per second and time are written to the screen on some games.
- The frames per second are written into /tmp/fg_logfile.
- A batch program included in the suite converts the FG screenshots into PNG files.
Here, you can download version 0.1.0 of the AnandTech FrameGetter source and executables. Please read the documentation very carefully. FrameGetter uses a BSD style license.
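The frame counting described in the steps above can be sketched in a few lines. This is a minimal Python illustration and not FrameGetter's actual code: it assumes we already have one timestamp per buffer swap, and the function name fps_per_interval is hypothetical.

```python
from collections import Counter


def fps_per_interval(swap_times, interval=1.0):
    """Turn buffer-swap timestamps (in seconds) into FPS samples.

    Each swap is assigned to a fixed-length bucket measured from the
    first swap; the count in a bucket divided by the bucket length is
    the frame rate for that slice of the run.
    """
    if not swap_times:
        return []
    start = swap_times[0]
    counts = Counter(int((t - start) // interval) for t in swap_times)
    last_bucket = max(counts)
    # Buckets with no swaps at all report 0 FPS rather than vanishing.
    return [counts.get(i, 0) / interval for i in range(last_bucket + 1)]
```

Shortening the interval argument gives a finer-grained trace of the same run, at the cost of noisier samples.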
Let's Talk about Drivers
Installing a dozen video cards with various sets of drivers was the largest annoyance for us during the testing of this review. Obviously, due to license restrictions, NVIDIA and ATI drivers must be installed after the initial OS installation, and cannot be packaged with the kernel. For Windows users not familiar with the process, the kernel module or driver wrapper must be completely recompiled for closed source drivers to work.
NVIDIA's drivers are not only supported via SuSE's YOU (the YAST Online Updater), but the drivers easily plug into SuSE without any trouble. We just installed our kernel source, hit init 3, ran the 1.0-6111 binary install, and then followed the instructions on the screen. NVIDIA's drivers provide DRI-like support via SaX2 (the SuSE X configuration tool) as well. Typical video cards use the Direct Rendering Infrastructure (DRI) for 3D accelerated graphics. The DRI acts as somewhat of an abstraction layer between X Windows and OpenGL. NVIDIA actually uses their own DRI-like module outside of the standard DRI module. Without DRI or NVIDIA's modules, we are only running software rendering.
ATI's drivers came out of the box with several problems. We made the initial mistake of installing and testing the entire suite of video cards with the NVIDIA cards/drivers first. We are not entirely sure why, but even after completely removing the NVIDIA kernel module via NVIDIA's uninstall scripts, we still had persistent errors installing the ATI drivers correctly.
Our first test bed was an nForce3 MSI Socket 939 board. We isolated some of our problems to the agpgart module - for older ATI drivers, we need to load a separate specific AGP module on SuSE 9.1 for DRI to load correctly. On our MSI nForce3 board, this should have been the nvidia_agp module. However, try as we could, we could not get nvidia_agp and fglrx to play well with each other. Some of the issues stem from SuSE 9.1 not recognizing the nForce3 chipset correctly, but some issues may stem from ATI drivers just not recognizing everything correctly. After switching to a Socket 939 VIA motherboard, our problems suddenly disappeared. Of course, we had to re-test our entire NVIDIA suite on the new motherboard (we saved it for last the second time around).
Even after switching to a different motherboard, we were not entirely blessed. Using the fglrx package from ATI's website yielded some results, but it turned out to be a mistake. ATI's implementation of the X Windows configuration completely upsets SaX2, and X will simply ignore the DRI module when we try to load it. Somewhere between playing with various kernel builds, driver builds and hardware configurations, we finally got it right. Our best success with the newest SuSE 9.2-RC3 kernel came from using the RPMs and instructions on the supplement FTP site. The 2.6.8 kernel blew away our boot configuration a few times; for whatever reason, VIA SATA controllers are now recognized as SCSI controllers by the new Linux kernel. Without getting too much into detail, we needed to re-edit our mtab, fstab and grub configuration to point to a different device, since the serial ATA drives suddenly became SCSI drives. We finally no longer had errors on the agpgart driver:
    linux:~ # dmesg | grep agpgart
    Linux agpgart interface v0.100 (c) Dave Jones
    agpgart: Detected AGP bridge 0
    agpgart: Maximum main memory to use for agp memory: 941M
    agpgart: AGP aperture is 128M @ 0xf0000000
    linux:~ #
The kernel is 2.6.8-14-default; you can download it from the SuSE FTP site in the update directory.
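For readers who want to script the same sanity check, here is a small sketch that scans dmesg text for the agpgart lines shown above. It is written against the exact message wording in our output, which is an assumption: other kernel versions may word these messages differently. The function name parse_agpgart is hypothetical.

```python
import re


def parse_agpgart(dmesg_text):
    """Pull the AGP bridge and aperture details out of dmesg output."""
    info = {"bridge_detected": False, "aperture_mb": None, "aperture_base": None}
    for line in dmesg_text.splitlines():
        if "agpgart" not in line:
            continue
        if "Detected AGP bridge" in line:
            info["bridge_detected"] = True
        # Matches lines such as "agpgart: AGP aperture is 128M @ 0xf0000000".
        match = re.search(r"AGP aperture is (\d+)M @ (0x[0-9a-fA-F]+)", line)
        if match:
            info["aperture_mb"] = int(match.group(1))
            info["aperture_base"] = match.group(2)
    return info
```

If bridge_detected comes back False, the AGP module for the chipset most likely never loaded, which is the situation we fought with on the nForce3 board.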
Configuration (continued)
Looking back, had we completely destroyed our OS with a new kernel in this manner in the first place with the nForce3 motherboard, we probably would have ended up with the same result. Our new SuSE sanctioned ATI configuration still uses the non-GPL fglrx driver (as opposed to the in-kernel radeon driver), but comes with a partially compiled kernel module and is compatible with SuSE's SaX2 configuration. X must be configured with the commands below in order for games to load the DRI driver correctly:
    # init 3
    (login)
    # sax2 -r -m 0=fglrx -b /usr/X11R6/lib/sax/profile/firegl
Enabling 3D acceleration (DRI) still needs to be done manually by editing the /etc/X11/XF86Config file after running the SaX2 utility. Enabling FSAA must be done by editing the XF86Config file by hand as well (see our AA/AF section for details). After a little more than 8 hours of playing with configurations, we hit paydirt.
    linux:~ # glxinfo
    name of display: :0.0
    display: :0  screen: 0
    direct rendering: Yes
    server glx vendor string: SGI
    server glx version string: 1.2
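The line that matters in that output is "direct rendering: Yes". A quick way to check for it automatically is sketched below; it only inspects text in the format shown above, and the function name direct_rendering_enabled is hypothetical.

```python
def direct_rendering_enabled(glxinfo_text):
    """Return True if glxinfo output reports 'direct rendering: Yes'."""
    for line in glxinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "direct rendering":
            return value.strip().lower().startswith("yes")
    # No such line at all: treat it as not accelerated.
    return False
```

On a live system, the text could come from something like subprocess.run(["glxinfo"], capture_output=True, text=True).stdout, assuming glxinfo is installed and an X display is available.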
All in all, just getting the ATI drivers on something that isn't Red Hat feels like way too much work for basic OpenGL support. Keep in mind that we even run SuSE, an RPM-based distribution - not too different from Red Hat. Even after we got the ATI fglrx drivers working correctly, we had a couple of issues with screen corruption and poor resizing. Below, you can see a screen grab from our ATI frame buffer playing Unreal Tournament at 800x600. The image should not be surrounded by a black border, but rather, stretched to the limits of the screen.
Another issue that we came across with ATI was the lack of 64-bit Linux drivers. ATI has no 64-bit drivers for Linux, yet they have 64-bit Windows binaries. Thus, our benchmarks are limited to 32-bit binaries only.
The Test
Below, you can see our test rig configuration.
|Performance Test Configuration|
|Processor(s):||AMD Athlon 64 3800+ (130nm, 2.4GHz, 512KB L2 Cache)|
|RAM:||2 x 512MB Mushkin PC-3200 CL2 (400MHz)|
|Motherboard(s):||MSI K8T Neo2 (Socket 939)|
|Hard Drives:||Seagate 7200.7 120GB SATA|
|Video Cards:||GeForce 6800 Ultra 256MB; GeForce 6800 128MB; GeForceFX 5950 Ultra 256MB; GeForceFX 5900 Ultra 128MB; GeForce 5900 128MB; GeForceFX 5700 Ultra; Radeon X800 Pro 256MB; Radeon 9800XT 256MB; Radeon 9700 Pro 128MB; Radeon 9600XT 128MB|
|Operating System(s):||SuSE 9.1 Professional|
|Driver:||(ATI) SuSE 9.1 Supplement fglrx 3.12.0|
You may have noticed that we are running an extremely new version of the Linux kernel, and very new ATI and NVIDIA drivers as well. For all intents and purposes, we are running a completely default SuSE 9.1 Professional install with the SuSE 9.2-RC3 kernel and brand new drivers. This was not an easy accomplishment, but was unfortunately the only manner in which we could install a platform compatible with both ATI and NVIDIA video cards on the Socket 939 architecture.
Our testing procedure is very simple. We take our various video cards and run respective time demos while using our AnandTech FrameGetter tool. We rely on in-game benchmarks for some of our tests as well - since FG will not run on Wine games. We post the average frames per second scores calculated by the utility. Remember, FG calculates the frames per second every second, but it also tells us the time that our demo ran, and how many frames it took. This average is posted for most benchmarks.
However, when testing our games, we find that some interesting patterns sometimes occur. For these instances, we have specially crafted the FG program to record our timedemo by taking the frames per second every second and dumping this data into a text file. We explained this in our initial FG announcement. Some graphs, particularly Wolfenstein and Unreal Tournament, have particularly fascinating trends, which we explore more in the evaluation.
All of our benchmarks are run three times and the highest scores obtained are taken - and as a general trend, the highest score is usually the second or third pass at the timedemo. Why don't we take the median values and standard deviation? For one, IO bottlenecks tend to occur due to the hard drive and memory, even though they "theoretically" should behave the same every time that we run the program. Memory hogs like UT2004, which tend to also load a lot of data off the hard drive, are notorious for behaving strangely on the first few passes.
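To make the scoring concrete, here is a small sketch of how three passes could be reduced to a single number. The average for a pass is simply the frame count divided by the elapsed time, and we then keep the best pass. The function name summarize_runs is hypothetical, and the median is included only for comparison with the approach we chose not to use.

```python
from statistics import median


def summarize_runs(runs):
    """Summarize timedemo passes given as (frame_count, elapsed_seconds)."""
    # Average FPS of one pass = total frames / total time for that pass.
    scores = [frames / seconds for frames, seconds in runs]
    return {
        "scores": scores,
        "best": max(scores),       # the figure we report
        "median": median(scores),  # shown for comparison only
    }
```

When the first pass is slowed by disk and memory activity, it drags down the median but has no effect on the best score, which is why we report the latter.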
Since we had issues with the ATI driver running Anisotropic Filtering, we did not run any tests with AF on. However, many of our games have sets of benchmarks with 4X Anti Aliasing disabled and enabled. At the end of this analysis, we also have a small section showing some of the differences with the various AA and anisotropic filters enabled.
Unreal Tournament 2004
While Wolfenstein is our OpenGL benchmark cornerstone, Unreal Tournament is our SDL cornerstone. We place a lot of weight on our UT2004 benchmarks, since UT is perhaps the largest Linux game released to date. We are anticipating Doom3's Linux release in just a few days, so that may also change things.
We used the assault.dem timedemo in this benchmark.
During the timedemos on ATI cards, we occasionally got mild screen corruption in the game console - white flashing triangles ranging anywhere between 100 and 600 pixels long. There seems to be a documented problem with this on various websites, and it looks like the newer versions of the ATI drivers may fix this. Since we could not get the newest drivers working yet, we cannot vouch for this claim.
Below, you can see how the two video cards shaped up during the first eight seconds of the timedemo.
Our FG utility really gives us something to be proud of when we look at graphs like Unreal Tournament. It's true that the average frames per second are lower on ATI cards over their NVIDIA counterparts, but we see a lot of stability in how the card behaves. You'll notice that although the GeForce 6800 ramps up to 40FPS very quickly, it hits a local minimum while the Radeon is just starting to notch up.
The first scene in which both cards clock down can be found below.
The combination of rendering the exterior landscape (large textures) of the cargo hold and the unusual lighting and shading affected both cards - although it would seem the Radeon cringed first before the landscape was fully revealed. Our player ducks and mainly looks at the ground for a second or so after this, and you can see the NVIDIA card really ramp up in that half second.
Wolfenstein: Enemy Territory
Wolfenstein acts as the cornerstone of our OpenGL benchmarks. The program uses very simple GL calls, and runs on virtually any configuration that we could find. There is an unusual bug in Wolfenstein when used in conjunction with FrameGetter; occasionally after running the modified executable, the original executable uses the libFG libraries anyway.
Let's take a specific look at the performance between our Radeon X800 Pro and the GeForce 6800 (Non-Ultra). You can download the CSV file of the graph below here.
We expect relatively older games like Wolfenstein to be fairly CPU bound. A two and a half minute cross-section of the radar timedemo reveals very little difference in performance between the two cards - but keep in mind that for this timedemo, we captured our FPS on two second intervals. Unreal Tournament on the previous page was taken at half second intervals. There are still some interesting phenomena, however. At the 109th second of the timedemo, notice how the Radeon X800 ramps very slowly before peaking at the 115th second. The NVIDIA card peaks almost immediately at the 111th second, stays level and then peaks again at the 115th second. Here is a screenshot of that particular scene.
Our player has just walked out of a hut and onto the field. The global scene that seems to have punished our graphic cards (~43rd second) the most can be seen below.
Medal of Honor: Allied Assault
Finally, we have Medal of Honor 1.11 beta3, which uses both the OpenGL and SDL libraries. We ran into serious problems with the game crashing while we were preparing for this analysis. We almost scrapped the game from our tests until we downloaded the v1.1 client, which seems to play nicely with the NVIDIA 1.0-6111 driver.
Since MOHAA is Quake based, we were not surprised to see many of the benchmark numbers fall in line with our Wolfenstein benchmark.
Wine, Cedega
Linux has taken excellent strides in becoming a full Windows desktop replacement operating system; advances in Open Office and Mozilla being two of the most notable. Unfortunately, the decision to buy new hardware constantly goes hand in hand with the decision to play some new game - and if it's a gaming machine you want, then Linux isn't the operating system that you need. Fence sitters end up being the people who lose. For example, you may wish to buy a new Linux rig for some CAD tool, but are forced to dual boot the machine in order to play FarCry. Intel's Vanderpool and AMD's Pacifica "virtualization" technologies may make dual boot and emulation a thing of the past, but today, we are stuck emulating Windows instead of running multiple instances of it.
Maybe emulation isn't the right word. "Wine Is Not an Emulator", as they used to say. In fact, the Wine project has very little to do with emulation. Wine acts very similarly to the AnandTech FrameGetter program - running a binary while replacing and linking libraries at run-time - but on a much more complicated level. TransGaming describes the basic implementation of WineX (Cedega) below:
"Cedega loads a game's binary into memory on a Linux system and then dynamically links to code that provides an implementation of the Win32 APIs that the program is using. The APIs that Windows games are mostly built on top of are primarily based on Microsoft's DirectX system. These APIs include facilities for handling 3D graphics (Direct3D), mouse and keyboard input (DirectInput), audio (DirectSound), and so on. TransGaming works to create Linux compatible versions of these APIs that work on top of the Linux equivalents such as OpenGL, X11, and the OSS and ALSA sound APIs."
Wine continues to make an impression on the Linux gaming community. For large, major releases, Cedega provides some really great support and nearly flawless gameplay. We subscribed to TransGaming's Cedega program (previously known as WineX) several months ago and have met with some limited success.
FarCry
Of course, we just laid out extensive praise for Wine and then we ran into a game like FarCry. We wanted FarCry to be our focus Wine benchmark game, but we immediately had problems when the game would not load. We were constantly greeted by "EXCEPTION: Attempt to read from NULL at 0x00000000" in the splash screen. We actually vaguely remember this same exception error from Mechwarrior 4 several years ago (on Windows). Part of us thinks that an unusual read attempt like this may have something to do with the NX stack protection on our Athlon 64 3800+ testbed.
Jedi Knight: Jedi Academy
Jedi Knight: JA actually runs very smoothly and flawlessly on Wine. We cannot use our FG utility on Cedega (yet) unfortunately, so our benchmarks are based on the FPS averages reported by the game itself.
Check out our very recent Windows analysis of JKJA. As you can see from our benchmarks, there is a definite performance hit with Wine. Derek Wilson, our GPU Editor, uses a slightly faster processor in his benchmarks, but not enough to account for the 15% deficit that we see in our tests. Cedega is slower, but for those of us who are trying to ditch Windows, the performance levels are acceptable.
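For clarity, the performance hit is the shortfall relative to the Windows score. The sketch below shows the arithmetic; the function name relative_hit is hypothetical and any numbers fed to it should come from the respective benchmark tables.

```python
def relative_hit(windows_fps, wine_fps):
    """Percentage by which the Wine score trails the Windows score."""
    if windows_fps <= 0:
        raise ValueError("windows_fps must be positive")
    return (windows_fps - wine_fps) / windows_fps * 100.0
```

A negative result would mean that the Wine run was actually the faster of the two.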
Racer
Our graphics coverage wouldn't be entirely complete without the de facto OpenGL Linux game. Note that we did not install Racer from YAST. When we installed from YAST, there was a technical issue that we ran into with the FrameGetter utility on ATI cards. We have not isolated that cause yet.
There isn't a whole lot of science behind our Racer benchmark. We basically floor the car on a racetrack and let it hit a wall at full speed. We do not seem to have any troubles replicating this test, but it isn't exactly an elegant timedemo. You can view our CSV of the graph below here.
Let us take a closer look at some evenly matched cards. You'll see that the FPS rev up until the car hits the wall and then generates a lot of atmospheric textures (smoke).
Our cards do not vary wildly in FPS like in some of our other benchmarks. The point at which the car hits the wall should seem pretty obvious from the graph. We were interested in how similarly the cards performed during this event, even though this was just a small cross-section of the benchmark.
FSAA and AF
Enabling and disabling Full Screen Anti Aliasing and Anisotropic Filtering for both cards was met with varying success. 4X AA for the ATI cards was enabled by hand in the XF86Config file. We needed to include the additional options for our fglrx device after installing the driver properly:
    Option "FSAAEnable" "yes"
    Option "FSAAScale" "4"
    Option "FSAADisableGamma" "no"
    Option "FSAACustomizeMSPos" "no"
    Option "FSAAMSPosX0" "0.000000"
    Option "FSAAMSPosY0" "0.000000"
    Option "FSAAMSPosX1" "0.000000"
    Option "FSAAMSPosY1" "0.000000"
    Option "FSAAMSPosX2" "0.000000"
    Option "FSAAMSPosY2" "0.000000"
    Option "FSAAMSPosX3" "0.000000"
    Option "FSAAMSPosY3" "0.000000"
    Option "FSAAMSPosX4" "0.000000"
    Option "FSAAMSPosY4" "0.000000"
    Option "FSAAMSPosX5" "0.000000"
    Option "FSAAMSPosY5" "0.000000"
    Option "UseFastTLS" "0"
    Option "BlockSignalsOnLock" "on"
    Option "UseInternalAGPGART" "no"
    Option "ForceGenericCPU" "no"
    Option "EnablePrivateBackZ" "yes"
Pay considerable attention to the EnablePrivateBackZ option. Although documentation for that particular variable seems light, AntiAliasing refused to draw correctly without it. Without enabling that element on our test beds, nothing would draw to the screen.
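Since a single missing option cost us a fair amount of time, a small checker for the device section may be worth having. The sketch below parses Option lines in the format shown above and flags the two settings that mattered in our testing. The function names are hypothetical, and the checks reflect only our own experience with this driver version, not any official fglrx requirement.

```python
import re

# Matches lines such as: Option "FSAAScale" "4"
OPTION_RE = re.compile(r'^\s*Option\s+"([^"]+)"\s+"([^"]*)"')


def parse_options(section_text):
    """Collect Option name/value pairs from XF86Config-style text."""
    options = {}
    for line in section_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def fsaa_problems(options):
    """List the settings that, in our experience, break FSAA on fglrx."""
    problems = []
    if options.get("FSAAEnable", "no").lower() != "yes":
        problems.append("FSAAEnable is not set to yes")
    if options.get("EnablePrivateBackZ", "no").lower() != "yes":
        problems.append("EnablePrivateBackZ is not set to yes")
    return problems
```

An empty list from fsaa_problems means that both options are present and enabled; it says nothing about whether the rest of the configuration is valid.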
You may notice that we purposely have not discussed much about Anisotropic Filtering up until this point. There are currently no driver-level AF features in fglrx. This is a large problem with the Radeon cards in our lineup - but fortunately, we still have trilinear and bilinear filtering.
To enable FSAA for NVIDIA cards, we needed only to set the environment variable $__GL_FSAA_MODE to 4 (AF is enabled similarly by setting $__GL_DEFAULT_LOG_ANISO). We do not need to restart X to enable FSAA or AF, which is a huge relief for us. However, finding Anisotropic Filtering working correctly in a game setting proved difficult. Perhaps it was the way in which we configured our drivers, or perhaps some fluke in our testing methodology escaped us, but AF for NVIDIA cards did not work.
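Because these settings are plain environment variables, they can be applied per process rather than system-wide. The sketch below builds such an environment; the two variable names come from our testing above, while the helper name nvidia_gl_env is hypothetical. What a given value means depends on the driver version and the GPU, so check the driver README before relying on a specific number.

```python
import os


def nvidia_gl_env(fsaa_mode=None, log_aniso=None, base_env=None):
    """Return a copy of the environment with NVIDIA GL overrides set."""
    env = dict(os.environ if base_env is None else base_env)
    if fsaa_mode is not None:
        env["__GL_FSAA_MODE"] = str(fsaa_mode)
    if log_aniso is not None:
        env["__GL_DEFAULT_LOG_ANISO"] = str(log_aniso)
    return env
```

The result can be handed to subprocess.run([...], env=nvidia_gl_env(fsaa_mode=4)) to launch a single game with FSAA while leaving the rest of the session untouched.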
Since the AnandTech FrameGetter utility by default measures FPS once every second, we modified the source to take screenshots every tenth of a second for this portion of the test. After running the various benchmarks a few times, we had several hundred overlapping frames to choose some comparative screenshots for IQ testing. Below, you can see our capture of a soldier that shows two levels of anti-aliasing. Try as we could, there were no instances of one card rendering AA differently than the other. No driver cheating conspiracies today.
[Image: bilinear versus trilinear filtering comparison]
Again, everything here is on par with Windows demonstrations of trilinear/bilinear filtering. There were no differences between the ATI and NVIDIA implementations of trilinear and bilinear filtering; we get the same images on both cards.
Final Thoughts
There are so many more real world examples than just the few benchmarks that we looked at today. We did not cover many Image Quality (IQ) scenarios in this analysis either - particularly since the ATI driver has very limited (non-existent) support for Anisotropic Filtering while our NVIDIA cards just ignored any Anisotropic Filtering commands.
You can view our CSV with the performance of each video card from the roundup here.
When we started this review, we had no preconceptions about the outcome for some of our video cards. It's true that installing NVIDIA drivers on Linux is almost as painless as installing the drivers on Windows; when the SuSE Yast Online Updates are up to date, installing via the online update is actually easier than Windows. ATI's drivers, on the other hand, gave us several problems - so much so that we actually ended up re-doing the analysis a few times with different kernels/motherboards just to get it right. The lack of 64-bit ATI drivers also prevented us from doing a fair 64-bit binary comparison of our game lineup.
Since we tested only two games under Wine, and one did not work, we cannot call our Wine testing very exhaustive. With more time and energy, we will devote a separate article to analyzing some games just under Wine/Cedega to see how they perform. Jedi Knight performed exceptionally; we were very impressed for a change with how easily something actually worked under Linux. We are interested in Wine's development, but we also anticipate dilemmas that it will soon face against AMD and Intel's virtualization projects. If Intel and AMD successfully create multi-core processors that allow each core to run its own operating system - and they will, given enough time - there may be a large backlash in the Linux gaming community. Users could simply run a copy of Windows (for games) and a copy of Linux (for work) at the same time without rebooting. That is, if they are OK with the price of Windows when such technologies become available. Perhaps more developers will follow in the footsteps of id and Epic, and Linux binaries will become commonplace before multi-OS virtualization squeezes the developers out.
It is important to consider that we were not particularly comparing ATI to NVIDIA in this analysis. Although this analysis did draw some pretty strong lines as to where each card stands, we were more interested in how each game performed compared to its Windows counterpart. We drew a lot of conclusions from one of our more recent video card analyses from July. Surprisingly, most of our NVIDIA video cards scaled very similarly. Wine games like Jedi Knight took a 10% to 15% hit in performance compared to the Windows tests that we did just a few weeks ago. Other games like Unreal Tournament 2004 actually showed mild signs of an increase in frame rate on the NVIDIA graphics cards. Wolfenstein: ET generally performed with similar average FPS to our video cards from 2003. However, keep in mind that the drivers used then were almost a year old.
Medal of Honor: AA and Racer do not have direct Windows benchmarks, but they helped determine a great deal about the scalability of our video cards under Linux. We were happy to see that the ATI cards were capable of keeping pace, even though there were issues with other games. Almost all of ATI's shortcomings on Linux came from the driver set; lack of Anisotropic Filtering, difficult configuration and few accelerated games were all issues. On the other hand, even though NVIDIA claims support for Anisotropic Filtering, we could not find an instance of it working in our testing.
High performance gaming on Linux certainly isn't for everyone. We spent weeks preparing for this analysis and we still ran into problems that we could not correct. So many times, we came to a solution for a problem only to find our Linux distribution had some files in a slightly different place or our file dependency tree was completely broken. These are the things that scare people away from Linux. Although customizing our own system, contravening the Microsoft "monopoly" and roughing-it-on-our-own were refreshing and challenging, this editor immediately fired up the Tribes: Vengeance demo on Windows after the Linux testing and editing were complete. Total time to install and configure: 5 minutes, 40 seconds; now that was refreshing.
During publication of this review, we received some information from ATI about some upcoming Linux announcements which they are working on. We will keep you informed of the details as we hear them.