Original Link: http://www.anandtech.com/show/2175
Performance Scaling with OCZ's 8800 GTX
by Derek Wilson on February 16, 2007 11:00 AM EST
OCZ's major contribution to the technological world has been in the form of memory and cooling. A short time back, power supplies were added to their repertoire, and we have been duly impressed with their quality and versatility. OCZ have established themselves as a company that makes enthusiast parts for people who care about performance. However, their latest scion comes in the form of high-end graphics cards, and it seems OCZ is intent on making its mark in this field as well.
Although we did a round up of 8800s some time ago, several more cards have hit the market since. As they are released, we intend to keep our card comparisons up to date. Today, we make good on this promise with our analysis of the OCZ GeForce 8800 GTX.
These days, companies are coming out with faster parts by producing cards that have faster memory, core, and shader clock speeds. Gamers who want even higher performance and don't mind spending a little extra money can get a card that's overclocked out of the box. This saves them the trouble of doing it themselves, with the added protection of a warranty. Until we're able to round up all of these cards, we want to establish how overclocking affects performance by increasing memory, core, and shader clock speeds. Is there a significant benefit for the price?
The OCZ GeForce 8800 GTX and The Test
Aside from the sticker on the face of the card, there's really no difference between the OCZ GeForce 8800 GTX and all of the other 8800 GTX cards based on NVIDIA's design. We can expect similar heat and noise characteristics because this card uses the same fan and heat sink. Since the clock speeds are also set the same, performance analysis will produce similar results. The difference between the OCZ GeForce 8800 GTX and the competition is that OCZ has paid special attention to the overclockability of their hardware. They've got a lot of experience in providing memory solutions that are easily overclockable. Hopefully this will translate into their graphics hardware as well.
In spite of its similarities to the other cards we've seen a thousand times, here's a look at the OCZ GeForce 8800 GTX. Note the sticker.
For our performance testing we used the same system we've employed in our past few articles. We'll be comparing the performance of the OCZ GeForce 8800 GTX to the data we saw in the first round up using the games F.E.A.R. and Oblivion.
|System Test Configuration|
|CPU:||Intel Core 2 Extreme X6800 (2.93GHz/4MB)|
|Motherboard:||EVGA nForce 680i SLI|
|Chipset:||NVIDIA nForce 680i SLI|
|Chipset Drivers:||NVIDIA nForce 9.35|
|Hard Disk:||Seagate 7200.7 160GB SATA|
|Memory:||Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 2)|
|Video Drivers:||ATI Catalyst 7.1; NVIDIA ForceWare 93.71 (G7x); NVIDIA ForceWare 97.92 (G80)|
|Desktop Resolution:||2560 x 1600 - 32-bit @ 60Hz|
|OS:||Windows XP Professional SP2|
NVIDIA provides software, nTune, which allows users to increase the clock speed of the memory and the GPU. This functionality used to be included in their driver but it's no significant inconvenience to have it provided as a separate download. We're still pleased with NVIDIA's support of user overclocking in general. For the 8800 series, we're still waiting on support for shader overclocking, as NVIDIA has made promises but has yet to deliver.
We were able to use nTune to find optimal core and memory clock speeds but in order to perform shader overclocking we had to rely on a tool called NiBiTor, an NVIDIA BIOS editor maintained by mvktech.net. Combined with a flash utility, we were able to download the BIOS from our video card, modify the shader clock speed, and flash it with our new settings. While we can't be sure that NVIDIA's driver doesn't alter this in any way while the card is running, if you buy a card with a higher than stock shader clock speed, this is what you'll get.
The first thing we did was find a maximum stable memory clock speed. We did this by increasing memory speed and running the fuzzy cube test within ATI Tool for about 10 minutes. Once we found the highest stable memory clock speed, we used this for all our tests in order to reduce the impact of memory on overall game performance. This also serves to reduce the number of combinations of different clock speeds we would have to test.
Maximum core and shader clock speeds were found independently and tested for stability using ATI Tool. The highest stable core clock we were able to run on our OCZ hardware is 640 MHz, while we found that 1600 MHz was about as high as we could go on the shader side. We test core clock scaling with memory fixed at 1020 MHz (2040 MHz effective data rate), and shader clock fixed at 1350 MHz (stock 8800 GTX speed). Our shader clock scaling tests are performed with the same memory speed at two different core clock speeds: 575 MHz, and 640 MHz.
To put this in perspective, here's how our OCZ overclocking compares to the other cards we've already tested.
|Card||Core Overclock||Memory Overclock|
|ASUS GeForce EN8800 GTX||629MHz||1021MHz|
|BFG GeForce 8800 GTX||649MHz||973MHz|
|EVGA e-GeForce 8800 GTX w/ ACS3||659MHz||1013MHz|
|Leadtek Winfast GeForce 8800 GTX||627MHz||1033MHz|
|MSI GeForce NX8800 GTX||652MHz||1040MHz|
|OCZ GeForce 8800 GTX||640MHz||1020MHz|
|Sparkle Calibre 8800 GTX||631MHz||914MHz|
|Sparkle GeForce 8800 GTX||629MHz||1011MHz|
|XFX GeForce 8800 GTS||654MHz||866MHz|
While OCZ doesn't have the highest showing in either the core or the memory clock speed department, their position is quite high on both lists. The combination of a respectable memory and core clock speed is a good thing to hope for when buying a video card to overclock.
It is important to note here that every single card will overclock differently, even if the company has binned their parts to achieve high clock speeds. Your mileage may, of course, vary.
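As a rough cross-check of where OCZ falls on both lists, the short Python sketch below ranks the cards using the overclock figures from the table above. The helper name `rank` and the choice to also rank only the GTX cards (leaving out the XFX GTS) are illustrative assumptions, not part of our test procedure.

```python
# Maximum overclocks from the table above: (core MHz, memory MHz).
overclocks = {
    "ASUS GeForce EN8800 GTX": (629, 1021),
    "BFG GeForce 8800 GTX": (649, 973),
    "EVGA e-GeForce 8800 GTX w/ ACS3": (659, 1013),
    "Leadtek Winfast GeForce 8800 GTX": (627, 1033),
    "MSI GeForce NX8800 GTX": (652, 1040),
    "OCZ GeForce 8800 GTX": (640, 1020),
    "Sparkle Calibre 8800 GTX": (631, 914),
    "Sparkle GeForce 8800 GTX": (629, 1011),
    "XFX GeForce 8800 GTS": (654, 866),
}

CORE, MEMORY = 0, 1


def rank(card, index, cards):
    """Return the 1-based position of `card` when sorting by clock, highest first."""
    ordered = sorted(cards, key=lambda name: cards[name][index], reverse=True)
    return ordered.index(card) + 1


# Hypothetical subset: only the GTX cards (the XFX entry is a GTS).
gtx_only = {name: clocks for name, clocks in overclocks.items() if "GTX" in name}

ocz = "OCZ GeForce 8800 GTX"
print("All cards: core rank", rank(ocz, CORE, overclocks),
      "memory rank", rank(ocz, MEMORY, overclocks))
print("GTX only:  core rank", rank(ocz, CORE, gtx_only),
      "memory rank", rank(ocz, MEMORY, gtx_only))
```

Restricting the list to GTX cards only changes the core clock ranking, since the XFX GTS reaches a higher core clock than our OCZ sample but a much lower memory clock.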
GeForce 8800 GTX Core Clock Scaling
As we previously mentioned, in order to test core clock scaling, we fixed the memory clock at 1020 MHz and the shader clock at 1350 MHz. Our tests were performed at different clock speeds, and we will report on performance vs. clock speed.
One of the issues with games that don't make use of a timedemo that renders exactly the same thing every time is consistency. Both Oblivion and F.E.A.R. can vary in their results as the action that takes place in each benchmark is never exactly the same twice. These differences are normally minimized in our testing by using multiple runs. Unfortunately, with the detail we wanted to use to look at performance, normal variance was playing havoc with our graphs. For this, we devised a solution.
Our formula for determining average framerate at each clock speed is as follows:
FinalResult = MAX(AvgFPS.run1, AvgFPS.run2, ... , AvgFPS.run5, PreviousClockSpeedResult)
What this means is that a higher clock speed can never report a lower average FPS than a lower clock speed, so normal fluctuation no longer distorts the graphs, while we still maintain a good deal of accuracy. Normally, our tests have a plus or minus 3 to 5 percent variability. Due to the number of samples we've taken and the fact that previous test results are used, deviation is cut down quite a bit. In actual gameplay, there will be much more fluctuation here.
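For readers who want to see the formula in action, here is a minimal Python sketch of it. The function name `monotone_results` and the sample framerates are hypothetical and are not values from our testing; the sketch only assumes that clock speeds are processed in ascending order.

```python
from typing import Iterable, List


def monotone_results(runs_per_clock: Iterable[Iterable[float]]) -> List[float]:
    """Apply FinalResult = MAX(run1..runN, PreviousClockSpeedResult).

    `runs_per_clock` holds the average FPS of each run, grouped by clock
    speed, with the clock speeds in ascending order.
    """
    results = []
    previous = float("-inf")  # no previous result exists for the first clock speed
    for runs in runs_per_clock:
        final = max(max(runs), previous)
        results.append(final)
        previous = final
    return results


# Hypothetical data: five runs at each of three ascending clock speeds.
sample = [
    [40.1, 39.8, 40.3, 40.0, 39.9],
    [40.2, 40.0, 39.7, 40.1, 40.2],  # best run is below the previous result
    [41.0, 41.4, 41.1, 40.9, 41.2],
]
print(monotone_results(sample))  # [40.3, 40.3, 41.4]
```

Note that the second clock speed inherits the 40.3 from the first, which is exactly how the formula keeps the curve from dipping.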
As far as settings go, Oblivion is running with maximum quality settings, the original texture pack, no anisotropic filtering and no antialiasing. We have chosen 1920x1440 in an attempt to test a highly compute limited resolution without taxing memory as much as something like 2560x1600 would. For F.E.A.R., we are also using 1920x1440. All the quality settings are on maximum with the exception of soft shadows, which we leave disabled.
While the maximum stable core clock speed of our OCZ card is 640 MHz, we were able to squeak out a couple of benchmarks at higher frequencies if we powered down for a while between runs. It seems like aftermarket cooling could help our card maintain higher clock speeds, but we'll have to save that test for another article.
It's clear that there are some granularity issues in setting the core clock frequency of the 8800 GTX. Certainly, this doesn't seem to be nearly the problem we saw with the 7800 GTX, but it is still something to be aware of. In general, we see performance improvements similar to the percent increase of clock speed. This indicates that core clock speed has a fairly direct effect on performance.
The F.E.A.R. graph looks a little more disturbingly discrete, but it is important to note that the built in benchmark only provides whole numbers and not real average framerates. Again, for the most part, performance increases keep up with the percent increase in clock speed. This becomes decreasingly true at the higher end of the spectrum, and this could indicate that extreme overclocking of 8800 GTX cards will have diminishing returns over 660 MHz. Unfortunately, we don't have a card that can handle running at higher speeds at the moment.
GeForce 8800 GTX Shader Clock Scaling
For shader clock scaling, we've performed two sets of tests. In one test, we've left the core clock at the stock 575 MHz, and in the other we are running 640 MHz. In both cases, memory speed is fixed at 1020 MHz. Not only do we get a sense of what to expect when overclocking both shader and core clock at the same time, but we can get a better idea of how shader clock impacts performance in the 8800 GTX.
Our test scenarios are the same as in our core clock speed scaling tests. Shader clock is adjusted in 50 MHz increments from 1200 to 1600. Unfortunately, the combination of 1600 MHz shader and 640 MHz core clock just wasn't stable enough to complete any of our tests, but we still have enough data to understand more about what's going on.
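The set of shader and core clock combinations described above can be enumerated with a few lines of Python. This is only an illustrative sketch; the variable names are our own, and the excluded combination is the one that would not complete our tests.

```python
# Core clocks used for the shader scaling tests (MHz).
core_clocks = (575, 640)

# Shader clock stepped in 50 MHz increments from 1200 to 1600 MHz inclusive.
shader_clocks = list(range(1200, 1601, 50))

# The one combination that was not stable enough to complete any test.
unstable = {(640, 1600)}

test_matrix = [
    (core, shader)
    for core in core_clocks
    for shader in shader_clocks
    if (core, shader) not in unstable
]

for core, shader in test_matrix:
    print(f"core {core} MHz, shader {shader} MHz, memory 1020 MHz")
```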
Just looking at the graph tells us that increasing shader clock does have an impact on performance. Digging a bit deeper is unfortunately a little disappointing. If we look at the percent improvement in performance from stock to maximum, we see only about a 5.2% increase. The percent increase in shader clock speed is actually 18.5% from stock to maximum. Shader clock speed has much less impact on Oblivion than core clock speed.
F.E.A.R. does respond a little better to shader overclocking than Oblivion, but even at 8.3% improvement at 1600 MHz, F.E.A.R. performance doesn't even improve at half the rate shader clock speed is increased. Like Oblivion, F.E.A.R. benefits much more from increasing core clock speed.
When we run our overclocked core speed with variable shader clocks, we do get higher performance in general. As far as scaling goes, at the 1550 MHz shader mark, which is a 14.8% increase in shader frequency, we only net a 3.9% performance increase. Yes, total performance is higher than just overclocking either the core or the shader, but it is very clear which has the greater impact on Oblivion performance.
Once again, we see a similar situation in F.E.A.R. with performance improving only slightly as shader clock is increased with a 640 MHz core clock. This time, though, we don't see any better performance than with just the core overclocked at stock shader clock speeds.
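The percentages quoted in this section are simple to check. The sketch below recomputes the shader clock increases from the stock 1350 MHz and divides each quoted performance gain by the matching clock gain. The "scaling efficiency" ratio is our own shorthand for this comparison, not an established metric, and the performance figures are the ones quoted in the text above.

```python
def percent_increase(base: float, new: float) -> float:
    """Percent change going from `base` to `new`."""
    return (new - base) / base * 100.0


def scaling_efficiency(perf_gain_pct: float, clock_gain_pct: float) -> float:
    """Fraction of the clock increase that shows up as performance (1.0 = perfect)."""
    return perf_gain_pct / clock_gain_pct


STOCK_SHADER_MHZ = 1350

gain_1600 = percent_increase(STOCK_SHADER_MHZ, 1600)  # about 18.5%
gain_1550 = percent_increase(STOCK_SHADER_MHZ, 1550)  # about 14.8%

# (quoted performance gain in percent, shader clock gain in percent)
cases = {
    "Oblivion, 575 MHz core, 1600 MHz shader": (5.2, gain_1600),
    "F.E.A.R., 575 MHz core, 1600 MHz shader": (8.3, gain_1600),
    "Oblivion, 640 MHz core, 1550 MHz shader": (3.9, gain_1550),
}

for label, (perf_gain, clock_gain) in cases.items():
    efficiency = scaling_efficiency(perf_gain, clock_gain)
    print(f"{label}: +{clock_gain:.1f}% clock, +{perf_gain:.1f}% fps, "
          f"efficiency {efficiency:.2f}")
```

In every case the ratio comes out below 0.5, which is the arithmetic behind the observation that performance improves at less than half the rate of the shader clock.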
These tests are slightly different than the ones used in our scaling data. For Oblivion, 16xAF was enabled through the control panel, and F.E.A.R. was run with 4xAA turned on. This means that the numbers we see here will be a little bit lower than in the tests we have run so far. First up is Oblivion.
Clearly having the fourth highest core clock and essentially tying for the third highest memory clock works out well for OCZ here. With a balance of core and memory speed, OCZ achieves the third highest performance in our Oblivion test.
For F.E.A.R. we see more of the same. Our OCZ sample is a top competitor among the cards we've overclocked, coming in third in each of our performance tests. The balance of a high core and memory clock speed provides a good platform for achieving high performance under both Oblivion and F.E.A.R.
Power, Heat, and Noise
We normally prefer to retest power, heat, and noise for every article. The problem with comparing tests from different times and places is that they are very dependent on the environment in which the test is performed. Even the time of day can affect the outcome. As we don't have all the cards we originally tested, this time we had to improvise.
The MSI NX8800 GTX and the XFX GeForce 8800 GTS were retested in our current environment, along with the OCZ GeForce 8800 GTX. In looking at the percent differences between the old and new data for the MSI and XFX parts, we were able to determine how to scale the OCZ data to fit in with the rest of our numbers.
Some of our data went through multiple conversions to different units to perform the calculations in order to maintain as much accuracy as possible. To determine relationships in temperature, Kelvin was used, while SPL data was converted from dB to pressure in pascals (assuming 20 micropascals as the reference pressure). After scaling our converted data for the OCZ hardware, numbers were converted back to Celsius and dB.
Please keep in mind that this is not as reliable as retesting all of our parts, but it does give us a good idea of where things fall. Add to that the fact that the OCZ 8800 GTX is based on the reference design, and we can be fairly certain that our numbers check out as they make sense compared to the rest of the hardware based on the same design.
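To make the unit conversions concrete, here is a hedged Python sketch of the general approach. The dB to pascal and Celsius to Kelvin conversions are standard; the way the scale factor is derived (the mean of the old-to-new ratios of the retested cards) is an assumption made for illustration, since the exact calculation is not spelled out above. All function names and the sample numbers are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)


def db_to_pascals(db: float) -> float:
    """Convert sound pressure level in dB to pressure in pascals."""
    return P_REF * 10 ** (db / 20)


def pascals_to_db(pascals: float) -> float:
    """Convert pressure in pascals back to sound pressure level in dB."""
    return 20 * math.log10(pascals / P_REF)


def c_to_k(celsius: float) -> float:
    return celsius + 273.15


def k_to_c(kelvin: float) -> float:
    return kelvin - 273.15


def scale_factor(old_values, new_values) -> float:
    """ASSUMPTION: mean ratio of old to new readings for the retested cards."""
    ratios = [old / new for old, new in zip(old_values, new_values)]
    return sum(ratios) / len(ratios)


def rescale_temperature_c(new_c, ref_old_c, ref_new_c) -> float:
    """Scale a new Celsius reading into the old environment, working in Kelvin."""
    factor = scale_factor([c_to_k(c) for c in ref_old_c],
                          [c_to_k(c) for c in ref_new_c])
    return k_to_c(c_to_k(new_c) * factor)


def rescale_spl_db(new_db, ref_old_db, ref_new_db) -> float:
    """Scale a new dB reading into the old environment, working in pascals."""
    factor = scale_factor([db_to_pascals(d) for d in ref_old_db],
                          [db_to_pascals(d) for d in ref_new_db])
    return pascals_to_db(db_to_pascals(new_db) * factor)


# Hypothetical readings for two retested reference cards and one new card.
print(rescale_temperature_c(82.0, ref_old_c=[80.0, 76.0], ref_new_c=[81.0, 77.0]))
print(rescale_spl_db(51.0, ref_old_db=[50.2, 49.8], ref_new_db=[50.6, 50.1]))
```

Working in Kelvin and pascals matters because both Celsius and dB have offsets or logarithmic scales that make simple percentage scaling misleading.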
Our power numbers put the OCZ 8800 GTX where we would expect a reference based 8800 GTX to fall. Load power is a little higher than most of the other cards, but one Watt really isn't going to make much difference.
There isn't really anything spectacular to report about our temperature data. As we would expect, the OCZ hardware falls in line with the rest.
With most of these 8800 parts falling within 1 dB of each other, there is no chance that any human can hear a difference between them.
There really is quite a lot of data to chew through, but it is difficult to really draw solid conclusions from it. Let's start by summing up what we know.
First, core clock impacts performance very heavily in TES4: Oblivion and F.E.A.R. We know this because the percent improvement in performance at a given clock speed is nearly identical to the percent increase in clock speed. We also know that the same is not true for shader clock increases under both Oblivion and F.E.A.R.
We can speculate quite a bit about this, but we would need more information to take it very far. For instance, because core clock speed affects performance more than shader clock, we can assume that the functions controlled by core clock speed are more important to performance in these games than pure shader processing. Core clock controls the input assembler, vertex, geometry, and fragment thread scheduling, triangle setup, rasterization, texture address and filtering, and final pixel output through the ROP. At best, we can say that the games we tested benefit from improved performance of these subsystems more than improved pixel and vertex shader processing speed.
This raises the question: why do we see better performance from improving aspects other than pixel and vertex shader processing? There are a few possible answers. The games we chose could be heavily texture or fill rate limited. The overhead of DX9 could require more processing on the GPU outside the shader hardware, preventing increases in shader clock speed from really mattering. Or the hardware might not be able to schedule threads fast enough to keep the shader hardware busy.
If we are texture or fill rate limited, future games that make heavier use of SM3.0 and, further down the road, DX10, will start to see more of a benefit from improved shader clock speed. If DX9 overhead is getting in the way, we might only see shader clock matter under Vista. The worst case scenario would be if the hardware isn't able to keep the shaders scheduled without help. Of course, this is also the least likely case.
While clock scaling is fun and exciting to talk about, we do have the matter of our OCZ hardware to attend to. While there isn't anything special about it on the outside, we were able to achieve very good performance through a balance of memory and core overclocking. While we only tested one sample, OCZ's selling point seems to be their card's overclockability as enabled by their chip and memory module selection. If they are true to their word and we can expect similar performance from every part they sell, these parts could fit quite nicely into anyone's overclocking plans.
At the same time, it is tough to make a recommendation based on such an untested claim, especially when our only hardware came straight from OCZ. Time will have to tell whether or not OCZ's marketing does right by its customers, as we won't be going out and testing a random sample of retail OCZ 8800 GTX cards.
For those interested in overclocked parts, it is usually a better idea to go with factory overclocked hardware. The only clock speeds that are guaranteed to run are the ones you get out of the box. But for those willing to gamble, OCZ's entry into the 8800 arena looks at least as good as the other cards we've tested. At the very least, our tests did not prove their marketing wrong.