Original Link: http://www.anandtech.com/show/2492
GeForce 9800 GTX and 3-way SLI: May the nForce Be With You
by Derek Wilson on April 1, 2008 9:00 AM EST
Yes, NVIDIA leads the way in performance. They own the fastest single GPU card, the fastest multiGPU single card, and the fastest multi card configurations. People who want the best of the best do pay a premium for the privilege, but that isn't something everyone is comfortable with. Most of us would much rather see a high end card that doesn't totally depart from sanity in terms of actual value gained through the purchase. Is the 9800 GTX that solution? That's what we are here to find out.
We've gotten a lot of feedback lately about our test system. Yes, at the very high end we haven't seen what we would have expected if all things were equal between all platforms. But the fact is that making a single platform work for apples to apples comparisons between CrossFire and SLI is worth it. With this review, we aren't quite there, as we just uncovered a HUGE issue that has been holding us back from higher performance with our high end hardware. We do have some numbers showing what's going on, but we just didn't have time to rerun all of our hardware after we discovered the solution to the issue. But we'll get to that shortly.
The major questions we will want to answer with this review are mostly about value. This card isn't a new architecture and it isn't really faster than other single card single GPU solutions. But the price point does make a difference here. At about $400, AMD's Radeon 3870X2 will be a key comparison point to this new $300 part. With the 8800 Ultra and GTX officially leaving the scene, the 9800 GX2 and 9800 GTX are the new top two in terms of high end hardware at NVIDIA. The price gap between these two is very large (the 9800 GX2 costs about twice as much as a stock clocked 9800 GTX) and the 3870X2 falls right in between them. Does this favor AMD or NVIDIA in terms of value? Does either company need to adjust their price point?
Things are rarely straightforward in the graphics world, and with the crazy price points and multi-GPU solutions that recently burst on to the scene, we’ve got a lot of stuff to try and make sense out of. Let us take you through the looking glass...
The 9800 GTX and EVGA’s Cards
The 9800 GTX is a 128 shader, G92 based card (yes, another one) that comes in at 675MHz core, 1.69GHz shader clock, and 2.2GHz (effective) memory clock. This puts the raw power of the card up over the 8800 Ultra, but there is one major drawback to this high end part: it only has a 256-bit memory bus hooked up to 512MB of RAM.
The 8800 Ultra's added memory might not come into play a lot, but the fact that it has essentially 50% more effective memory bandwidth does put it at an advantage in memory performance limited situations. This means there is potential for performance loss on the 9800 GTX at high resolutions, high levels of AA, or in games with memory intensive effects. While we get that $300 US puts this card in a different class than the 8800 Ultra, and thus NVIDIA is targeting a different type of user, we would have liked to see a card with more bandwidth and more memory (especially when we look at the drop off in performance between Crysis at 1920x1200 and 2560x1600).
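The "essentially 50% more" figure can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch: the 9800 GTX numbers come from the text above, while the 8800 Ultra's 384-bit bus and 2160MHz effective memory clock are assumed from its published specifications and are not stated in this article. The function name is ours, purely for illustration.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz * 1e6 / 1e9

# 9800 GTX: 256-bit bus, 2.2GHz effective memory clock (from the article).
gtx_9800 = memory_bandwidth_gb_s(256, 2200)

# 8800 Ultra: 384-bit bus, 2160MHz effective (assumed from public specs).
ultra_8800 = memory_bandwidth_gb_s(384, 2160)

# Fractional bandwidth advantage of the Ultra over the 9800 GTX.
advantage = ultra_8800 / gtx_9800 - 1

print(f"9800 GTX:   {gtx_9800:.1f} GB/s")
print(f"8800 Ultra: {ultra_8800:.1f} GB/s")
print(f"Ultra advantage: {advantage:.0%}")
```

Under those assumptions the Ultra comes out a little under 50% ahead, which lines up with the rough figure quoted above.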
9800 GTX cards are capable of 3-way SLI with the two SLI connectors on the top. Of course, NVIDIA requires that we use an NVIDIA motherboard for this purpose. We are not fans of artificial technical limitations based on marketing needs and would much prefer to see SLI run on any platform that enables multiple PCIe x16 slots. With normal SLI, we do have the Skulltrail option, but NVIDIA has chosen not to enable 3-way capability on this board either.
We wanted to be able to include 3-way SLI numbers in our launch review (which has been one incredible headache, but more on that later), and EVGA was kind enough to help us out by providing the hardware. We certainly appreciate them enabling us to bring you numbers for this configuration today.
We were also able to get our hands on a C0 engineering sample 790i board for testing. Let’s just say that the experience was … character building. Running a QX9770 and 1333MHz DDR3 at 9:9:9:24, we had what could best be described as a very rough time getting 3-way and even quad SLI with two 9800 GX2 boards to work in this system. We were lucky to get the numbers we did get. Let’s take a look at what we tested with.
Once again we used the Skulltrail system for most of our comparisons. We’ve added on the 790i board for 3-way SLI performance scaling tests.
I didn’t believe I would be saying this so soon, but our experience with 790i and SLI has been much, much worse than on Skulltrail. We were plagued by power failure after power failure. With three 9800 GTX cards plugged in, the system never drew more than 400 W when booting into Windows, but after a few minutes the power would just flicker and cut out.
It didn’t make sense that it was the PSU size, because it wasn’t even being loaded. We did try augmenting the PSU with a second one to run one of the cards, but that didn’t work out either. The story is really long and arduous and for some reason involved the Power of the Dark Side, but our solution (after much effort) was to use one power supply for the system and graphics cards and one power supply for the drives and fans. Each PSU needed to be plugged into its own surge protector and needed to be on different breakers.
The working theory is that power here isn’t very clean, and the 790i board is more sensitive to fluctuations in the quality of the power supplied (which is certainly affected by the AC source). Isolating breakers and using surge protectors was the best we could do, and we are very thankful it worked out. It seems likely that a good quality 1000-1500 VA UPS would have been enough to provide cleaner power and solve the issue, but we didn’t have one to test with.
Once we handled this we were mostly able to benchmark. We could get a good 15 minutes of up time out of the system, but after repeated benchmarking instability crept back in and we’d need to wait a while before we tried again. The majority of these problems were on 3-way and Quad SLI, but we did have a hiccup with a two card SLI configuration as well. We didn’t have any trouble at all with single card solutions (even single 9800 GX2 solutions).
Before anyone says heat, we were testing in an open air environment in a room with an ambient temp of about 15 degrees C, with one 120mm fan blowing straight into the back of the GPUs and another blowing through the memory (we also took care not to interfere with the CPU HSF airflow). The graphics cards did get warm, but if heat was the issue here, I’d better get a bath of LN2 ready to run this thing submerged in.
It is very important that we note one more time that this is the C0 engineering sample stepping and that NVIDIA explicitly told us that stability might be an issue in some situations. The retail C1 stepping should not have these issues.
Here’s our test setup:
|CPU||2x Intel Core 2 Extreme QX9775 @ 3.20GHz|
|Motherboard||Intel D5400XS (Skulltrail)|
|Video Cards||ATI Radeon HD 3870, NVIDIA GeForce 8800 Ultra, NVIDIA GeForce 9800 GTX, NVIDIA GeForce 9800 GX2|
|Video Drivers||Catalyst 8.3|
|Hard Drive||Seagate 7200.9 120GB 8MB 7200RPM|
|RAM||2xMicron 2GB FB-DIMM DDR2-800|
|Operating System||Windows Vista Ultimate|
Crysis, DX10 and Forcing VSYNC Off in the Driver
Why do we keep on coming back to Crysis as a key focal point for our reviews? Honestly, because it’s the only thing out there that requires the ultra high end performance enabled by recently released hardware.
That and we’ve discovered something very interesting this time around.
We noted that some of our earlier DX10 performance numbers on Skulltrail looked better than anything we could get more recently. In general, the higher number is usually more likely to be "right", and it has been a frustrating journey trying to hunt down the issues that lead to our current situation.
Many reinstalls and configuration tweaks later and we’ve got an answer.
Every time I set up a system, because I want to ensure maximum performance, the first thing I do is force VSYNC off in the driver. I also generally run without having the graphics card scale output for my panel; centered timings allow me to see what resolution is currently running without having to check. But I was in a hurry on Sunday and I must have forgotten to check the driver after I set up an 8800 Ultra SLI for testing Crysis.
Lo and behold, when I looked at the numbers, I saw a huge performance increase. No, it couldn’t be that VSYNC was simply not forced off in the driver, could it? After all, Crysis has a setting for VSYNC and it was explicitly disabled; the driver setting shouldn’t matter.
But it does.
Forcing VSYNC off in the driver can decrease performance by 25% under the DX10 applications we tested. We see a heavier impact in CPU limited situations. Interestingly enough, as we discussed last week, with our high end hardware, Crysis and World in Conflict were heavily CPU and system limited. Take a look for yourself at the type of performance gains we saw from leaving the driver's VSYNC setting at its default instead of forcing it off. These tests were run on Crysis using the GeForce 9800 GX2 in Quad SLI.
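It is worth being careful with the direction of that percentage: a 25% drop from forcing VSYNC off is the same thing as a roughly 33% gain from leaving the driver setting alone. The short sketch below shows the arithmetic. The helper names and the 40 and 30 fps figures are hypothetical illustrations, not measurements from our testing.

```python
def percent_change(before_fps: float, after_fps: float) -> float:
    """Fractional change going from before_fps to after_fps."""
    return (after_fps - before_fps) / before_fps

def gain_from_loss(loss_fraction: float) -> float:
    """Gain seen when undoing a loss of the given fraction."""
    return 1.0 / (1.0 - loss_fraction) - 1.0

# Hypothetical frame rates chosen only to illustrate a 25% hit.
default_fps = 40.0      # driver VSYNC setting left at its default
forced_off_fps = 30.0   # VSYNC forced off in the driver

loss = percent_change(default_fps, forced_off_fps)   # default -> forced off
gain = percent_change(forced_off_fps, default_fps)   # forced off -> default

print(f"Forcing VSYNC off: {loss:+.1%}")
print(f"Reverting to default: {gain:+.1%}")
print(f"Gain implied by a 25% loss: {gain_from_loss(0.25):.1%}")
```

So depending on which number is used as the baseline, the same pair of results can be described as a 25% penalty or a 33% improvement.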
We would have tried overclocking the 790i system as well if we could have done so and maintained stability.
In looking at these numbers, we can see some of the major issues we had between NVIDIA platforms and Skulltrail diminish. There is still a difference, but 790i does have PCIe 2.0 bandwidth between its cards and it uses DDR3 rather than FB-DIMMs. We won’t be able to change those things, but right now my options are to run half the slots with 800 MHz FB-DIMMs or all four slots with 667 MHz. We should be getting a handful of higher speed lower latency FB-DIMMs in for testing soon, which we believe will help. Now that we’ve gotten a better feel for the system, we also plan on trying some bus overclocking to help alleviate the PCIe 2.0 bandwidth advantage 790i has. It also seems possible to push our CPUs up over 4GHz with air cooling, but we really need a larger PSU to keep the system stable (without any graphics going, a full CPU load can pull about 700W at the wall when running 4GHz at 1.5V), especially when you start running a GPU on top of that.
Slower GPUs will benefit less from not forcing VSYNC off in the driver, but even if the framerate is near a CPU limit (say, within 20%) performance will improve. NVIDIA seems more impacted by this than AMD, but we aren’t sure at this point whether that is because NVIDIA’s cards expose more of a CPU limit due to their higher performance.
NVIDIA is aware of the VSYNC problem, as they were able to confirm our findings yesterday.
Being that we review hardware, it is conceivable that this issue might not affect most gamers. Many people like VSYNC on, and since most games allow for the option it isn’t usually necessary or beneficial to force VSYNC off in the driver. So we decided to ask in our video forum just how many people force VSYNC off in the driver, and whether they do so always or just some of the time.
More than half our 89 respondents (at the time of this writing) never force VSYNC off, but 40% of the remaining respondents admitted to forcing VSYNC off at some point, half of these always forcing VSYNC off (just as we do in our testing).
This is a big deal, especially for members who want to play Crysis and have lower end CPUs. We didn’t have time to rerun all of our numbers without VSYNC forced off in the driver, so keep in mind that these numbers could benefit a lot by doing so.
Scaling and Performance with 3-way SLI
As we’ve explained, we had a great number of issues in testing 3-way SLI and Quad SLI on our 790i board. We couldn’t even get 8800 Ultra Tri SLI to work, as it draws so much power in addition to being finicky in the first place. We were able to get some numbers run on Crysis and Oblivion.
Here is a look at performance scaling with both; we’ll look at comparative performance further below.
These numbers were run on the 790i system and we absolutely did leave VSYNC on its default setting. Performance differences between one, two, and three 9800 GTX cards were more compressed when we forced VSYNC off in the control panel.
Here is how Crysis stacks up in a direct comparison with the major competition (except for the 8800 Ultra configuration which we could not run).
9800 GTX 3-way is absolutely playable at 1920x1200 with Crysis when using Very High settings. Clearly Quad SLI leads the way here, but for $300 less, that’s not a bad deal if what you want to do is play Crysis at 1920x1200.
The delta between Tri and Quad is lower here. In both cases, two 9800 GTX cards outperform a single 9800 GX2. While the 9800 GX2 can be plugged into any system, NVIDIA still wants to sell 790i platforms. If you’ve got or want an NVIDIA based platform, you get higher performance for the exact same price by going with two 9800 GTX cards over a single 9800 GX2.
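The pricing relationships behind these comparisons are easy to lay out explicitly. The sketch below uses the $300 and $400 figures quoted in this article and assumes the 9800 GX2 costs exactly twice a 9800 GTX (we only said "about twice"), so the totals are approximate.

```python
# Approximate launch prices in USD.
GTX_9800 = 300            # stated in the article
HD_3870X2 = 400           # stated in the article
GX2_9800 = 2 * GTX_9800   # assumption: "about twice as much" taken as exactly 2x

configs = {
    "9800 GTX": 1 * GTX_9800,
    "Radeon 3870X2": HD_3870X2,
    "9800 GTX SLI (2 cards)": 2 * GTX_9800,
    "9800 GX2": GX2_9800,
    "9800 GTX 3-way SLI": 3 * GTX_9800,
    "9800 GX2 Quad SLI (2 cards)": 2 * GX2_9800,
}

# List configurations from cheapest to most expensive.
for name, cost in sorted(configs.items(), key=lambda kv: kv[1]):
    print(f"{name:<30} ${cost}")

tri = configs["9800 GTX 3-way SLI"]
quad = configs["9800 GX2 Quad SLI (2 cards)"]
print(f"3-way saves ${quad - tri} and costs {tri / quad:.0%} of Quad SLI")
```

With those inputs, two 9800 GTX cards land at the same total as one 9800 GX2, and 3-way SLI comes in $300 under Quad SLI.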
With the hassle of huge power supplies, cooling, etc. associated with Tri and Quad SLI, our money for maintaining value with a high end solution would have to fall to the 9800 GTX SLI setup. 3-way seems to have some problems at the moment as well, as we ran into one large issue in one of the only two games we tested. Oblivion has some graphical issues that we document here on YouTube.
Once Again, The Rest
For those of you who want the rest of the data, here it is. We ran the same list of games from our previous article on the 9800 GX2 Quad SLI. This time we do have the 8800 Ultra SLI numbers in the mix as well to round out the comparison.
So, now that we have the 9800 GTX in the mix, what has changed? Honestly, not as much in terms of performance stack as in price. Yes, the 8800 Ultra is better than the 9800 GTX where memory bandwidth is a factor, but other than that the relationship of the 9800 GTX to the 3870X2 is largely the same. Of course, NVIDIA would never sell the 8800 Ultra below the 3870X2 price of $400 (the binned 90nm G80 glued on there didn’t come cheap).
The smaller die size of the G92 based 9800 GTX takes away one victory AMD had over NVIDIA: the more expensive 8800 Ultra was slower than AMD’s top of the line. Without significantly improving (and sometimes hurting) performance over the 8800 Ultra (because they didn’t really need to with the 9800 GX2 in their pocket), NVIDIA has brought more competition to AMD’s lineup, which is definitely not something they will be happy about.
It is nice to have this card come in at the $300 price point with decent performance, but the most exciting thing about it is the fact that picking up two of them will give you better performance than a single 9800 GX2 for the same amount of money. Two of them can even start to get by in Crysis with Very High settings (though it might offer a better experience with one or two features turned down a bit).
While our very limited and rocky experience with 3-way SLI may have been tainted by the engineering sample board we used, the fact that we can get near 9800 GX2 Quad SLI performance for 3/4 of the cost is definitely a good thing. The fact this setup MUST be run on an nForce board is a drawback, as we would love to test in a system that can run every configuration under the sun. We’re getting closer with Skulltrail, and we aren’t missing the fact that there are concerns among our readers over its use. But we’re confident that we can push performance up and turn it into our workhorse for graphics, especially now that the VSYNC issue has been cleared up.
While testing this group of cards has been difficult with all the problems we experienced, we are very happy to have a solid explanation for what was causing the decreased performance we were seeing. Now all we need is an explanation for why forcing VSYNC off in the driver causes such a huge performance hit.