Original Link: http://www.anandtech.com/show/2758

When we first heard about the overclocking potential of the 4890 from AMD, we were a bit skeptical. At the same time, the numbers we were hearing were impressive and AMD doesn't have a history of talking up that sort of thing to us. There have already been some investigations around the web that do point to the 4890 as having some healthy overclocking potential, so we decided to try our hand at it and see what we could come up with.

We are testing review samples, which means that our parts may have more overclockability than off the shelf cards, but we can't attest to that at this point. What we do want to explore are the overclocking characteristics of the 4890 and how different adjustments may or may not affect performance. From what we are seeing around the web, many people are getting fairly close to the speeds we tested. Every part is different, but while clock speeds may vary, the general performance you can expect at any given point will not.

So what's so special about this AMD part that we are singling it out for overclocking analysis? Well, the GPU has been massaged to allow for more headroom, some of which hasn't been exploited at stock clock speeds. This is the first time in a long time (or is it ever?) we are seeing multiple manufacturers bring out overclocked parts based on an AMD GPU at launch. With this as the flagship AMD GPU, we also want to see what kind of potential it has to compete with NVIDIA's top of the line GPU.

But it's more than just the chip. We are also interested in how well the resources on the board are balanced. Core voltages and clock speeds must be selected along with framebuffer size and memory clock. These considerations must account for target power, heat, noise and price. For high end parts, we see the emphasis on performance over other factors, but there will still be hard limits to work within.

Because of all this, balancing hardware specifications is very important. Memory bandwidth needs to be paired well with core speed in order to maximize performance. It doesn't do us much good to have an infinitely fast core if slow memory limits performance. We also aren't well served by ridiculously fast memory if the core can't consume data quickly enough. Using resources appropriately is key. And AMD did a good job balancing resources with the 4890.

Rather than just test the semi-official overclock (which is just a 50MHz core clock boost to 900MHz), we decided to test multiple core and memory overclocks (and one core + memory overclock) to better understand the performance characteristics of this beast. As expected, overclocking both core and memory saw the best results followed by only overclocking the core. Just boosting memory speed on its own didn't seem to have a significant impact on performance despite the large overclock that was possible.

So why not sell every chip at the "overclocked" speed? Well, it's all about yield. Our guess is that while the changes AMD made were certainly good enough to boost clock speed over the 4870 by a healthy margin, a good number of parts couldn't be pushed up to 900MHz, and AMD really didn't want to sell them as cheaper hardware. We haven't heard that endorsing the idea of overclocked parts is really a policy change for AMD, so it might just be that previous layout, routing, and design choices provided for a narrower range of overclockability around the target clock frequency.

Whatever the reason for it, we now have overclockable hardware from AMD. Our analysis starts with an in-depth look at percent increase in performance, but if all you care about is raw performance data, we've got plenty of that in the second half. And with it comes a surprise in our conclusion we never expected.

Cranking GDDR5 All the Way Up

The first stop on our overclocking tour is the memory subsystem. We will be increasing the memory clock frequency, which reduces latency slightly and increases bandwidth significantly. The stock clock speed is 975MHz with 1ns devices (which means they are rated at 1GHz). AMD mentioned that signaling and interference (caused by the graphics hardware) are bigger problems with 1GHz GDDR5 than actually running the memory at that speed, which is why they went with the 25MHz lower clock speed.

Even with the 975MHz default clock speed, we already have a data rate of 3.9GHz, which is pretty intense. Playing with ATI's built-in overclocking tool (OverDrive), we were able to achieve stable performance at the maximum clock speed the driver allowed: 1200MHz. Doing the math gives us a massive 4.8GHz data rate. With a 256-bit wide bus, that works out to almost 154 GB/s of bandwidth. This is more memory bandwidth than the NVIDIA GeForce GTX 280 and just a little less than the GTX 285 (both of which use GDDR3, but on 512-bit buses).
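For anyone who wants to redo the math, here is a minimal sketch of the bandwidth arithmetic. It assumes GDDR5 moves four bits per pin per memory clock, which is what the 975MHz to 3.9GHz figure above implies. The function names are ours and purely illustrative.

```python
def gddr5_data_rate_ghz(memory_clock_mhz: float) -> float:
    """Effective per-pin data rate in GHz, assuming 4 transfers per memory clock."""
    return memory_clock_mhz * 4 / 1000


def bandwidth_gb_per_s(data_rate_ghz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: data rate times bus width, converted from bits to bytes."""
    return data_rate_ghz * bus_width_bits / 8


# Stock and overclocked memory clocks on the 4890's 256-bit bus.
for clock_mhz in (975, 1200):
    rate = gddr5_data_rate_ghz(clock_mhz)
    bandwidth = bandwidth_gb_per_s(rate, 256)
    print(f"{clock_mhz}MHz -> {rate:.1f}GHz data rate, {bandwidth:.1f} GB/s")
```

At stock this works out to 124.8 GB/s, and at 1200MHz to 153.6 GB/s, the "almost 154 GB/s" quoted above.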

So armed with 1.2GHz GDDR5, what can the 850MHz core of the Radeon HD 4890 accomplish now? Let's take a look at percent increase in performance per game when just increasing memory clock.

1680x1050    1920x1200    2560x1600

Apparently not that much more, even at 2560x1600.

Because our tests are not 100% deterministic, there is some variability in our results. Generally, this is very low, though it does vary from game to game and benchmark to benchmark. We have a hard time calling anything less than a 3% difference significant, as it could be due to fluctuations in the tests. These numbers may indicate some positive change in performance, but not one that would matter. At 2560x1600, only Call of Duty showed a performance improvement that mattered. And this is from a 225MHz overclock (just about a 23.1% increase in clock speed), which is pretty large.
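If you want to apply the same rule of thumb to your own results, a rough sketch might look like the following. The 3% threshold comes from the paragraph above. The helper names and the sample frame rates are made up for illustration and are not our benchmark data.

```python
NOISE_THRESHOLD_PCT = 3.0  # differences smaller than this are treated as test noise


def percent_increase(baseline: float, new: float) -> float:
    """Percent change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


def is_significant(baseline_fps: float, new_fps: float,
                   threshold_pct: float = NOISE_THRESHOLD_PCT) -> bool:
    """True if the change (in either direction) is at least the noise threshold."""
    return abs(percent_increase(baseline_fps, new_fps)) >= threshold_pct


# The memory overclock itself: 975MHz -> 1200MHz.
print(f"Memory clock increase: {percent_increase(975, 1200):.1f}%")

# Hypothetical frame rates, for illustration only.
print(is_significant(60.0, 61.0))  # about 1.7%, within the noise
print(is_significant(60.0, 63.0))  # 5%, worth paying attention to
```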

There really isn't a huge need to delve into the raw numbers here, as they are just not that different. We'll hold off on that until it matters. Next up, we're going to look at increasing only the core clock speed.

Exploring Core Overclocking

Adjusting core clock speed has a much higher impact on performance than only adjusting memory speed. At stock clock speeds the 4890 is much more compute bound than memory bound, and this is where the difference comes in. While the 900MHz core clock variant will not offer huge performance gains over the stock card, the performance gains will be fairly proportional to the clock speed increase.

Despite the fact that a 50MHz bump only offers a maximum potential average performance improvement of about 6%, we often see realized performance gains of between 3% and 5% on 900MHz core clocked 4890 hardware. This is certainly a much better return than we saw even with a 23%+ memory overclock. Even so, 5% real world performance isn't the holy grail. So we decided to test multiple core clock frequencies ranging from 850MHz to 1000MHz in 50MHz increments. For these tests, we fixed memory clock speed at 975MHz.
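To put those percentages in context, here is a small sketch that computes the maximum theoretical gain for each core clock we tested and how much of it a given realized gain captures. It assumes performance can scale at best linearly with core clock. The function names are ours.

```python
STOCK_CORE_MHZ = 850


def theoretical_gain_pct(core_mhz: float, stock_mhz: float = STOCK_CORE_MHZ) -> float:
    """Best-case performance gain, assuming perfect linear scaling with core clock."""
    return (core_mhz - stock_mhz) / stock_mhz * 100


def scaling_efficiency(realized_gain_pct: float, core_mhz: float,
                       stock_mhz: float = STOCK_CORE_MHZ) -> float:
    """Fraction of the theoretical gain that was actually realized."""
    return realized_gain_pct / theoretical_gain_pct(core_mhz, stock_mhz)


for core_mhz in (900, 950, 1000):
    print(f"{core_mhz}MHz: up to {theoretical_gain_pct(core_mhz):.1f}% faster")

# The 3% to 5% realized gains quoted above for the 900MHz parts.
for realized in (3.0, 5.0):
    share = scaling_efficiency(realized, 900)
    print(f"{realized:.0f}% realized at 900MHz is {share:.0%} of the theoretical gain")
```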

Let's jump right in and talk about 1000MHz. Here's a look at what we get from this boost in clock speed.

1680x1050    1920x1200    2560x1600

In non-CPU limited situations, the approximately 10% to 13% performance improvement out of a potential 17.6% improvement is nothing to sneeze at. Here's the breakdown of percent increase in performance at the different clock speeds we tested across three resolutions in all the games we tested.

At each speed bump we see a pretty good proportional performance improvement. We are closer to the theoretical max at the more modest clock speed increases than at the high end, though. This could mean that our core clock speed increases are creating memory bottlenecks. It is clear that even without any boost from an accompanying memory overclock, the 4890 is capable of some impressive clock speeds and performance. As much as we want to be thorough, we can't test all of these core clock speeds with multiple different memory clock speeds, as the testing would quickly balloon. So we compromised a bit, but the results on the next page speak for themselves.

We absolutely must caution our readers once again that these are not off-the-shelf retail parts. These are parts sent directly to us from manufacturers and could very likely have a higher overclocking potential than retail parts. From what we are hearing in the field, though, many people have been able to achieve a decent boost in clock speed with the 4890.

Age of Conan Core Scaling

1680x1050    1920x1200    2560x1600

Call of Duty World At War Core Scaling

1680x1050    1920x1200    2560x1600

Crysis Warhead Core Scaling

1680x1050    1920x1200    2560x1600

Fallout 3 Core Scaling

1680x1050    1920x1200    2560x1600

FarCry 2 Core Scaling

1680x1050    1920x1200    2560x1600

Left 4 Dead Core Scaling

1680x1050    1920x1200    2560x1600

Race Driver GRID Core Scaling

1680x1050    1920x1200    2560x1600

Combined Memory and Core Overclocking: The Sweet Spot

In this round of tests, we combine our previous maximum overclocks. This is our compromise, in that we show the maximum potential of combined core and memory overclocking rather than effects of memory overclocking over each core clock speed we tested. While the latter option would be more complete, our tests do enough to show people what they need to know to find the sweet spot.

We theorized that with an extreme core clock speed, memory may have become a bottleneck to performance at some point. So even though increasing memory clock without increasing core clock didn't do much at all, we could see more benefit from a memory overclock at high core clocks than our initial memory overclocking results would suggest.

Before, we looked at varied memory clock with a stock core clock and varied core clock with a stock memory clock. Let's revisit both of those but also add in a twist. We will also look at percent increase in performance when overclocking memory with a 1GHz core clock and the percent increase in performance when overclocking the core with a 1.2GHz memory clock.

1680x1050    1920x1200    2560x1600

To get a basic idea of what's going on, here's an example of two programs. Remember that this isn't really real world and is just to illustrate the concept.

The first application is completely compute bound and the second is 50% compute bound and 50% memory bandwidth bound. Both tests generate 100 frames per second on a stock Radeon HD 4890. If we increase core clock speed 10%, the first application will generate 110 frames per second, while the second one would only generate 105. This is because we only see the 10% benefit while doing half of the work. If we look at only boosting memory performance 10%, the first program delivers only 100 fps while the second hits 105 again. Pushing both memory and core clock speed up 10% each gives us 110 frames per second from both applications. Basically.

Nothing is really that contrived or works like that, but the important thing to remember is that different applications can make varying use of different resources, and balancing those resources is important to ensuring the best performance in the most efficient package.
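For the curious, the two hypothetical programs above can be written down as a tiny model. This is a sketch of the simplified linear reasoning used in the example, not a description of how real games or GPUs behave. The function and variable names are ours.

```python
def toy_fps(base_fps: float, compute_fraction: float,
            core_gain: float, memory_gain: float) -> float:
    """Linear toy model: each speedup only helps the share of work bound by that resource."""
    memory_fraction = 1.0 - compute_fraction
    return base_fps * (1.0 + compute_fraction * core_gain + memory_fraction * memory_gain)


# Fraction of the work that is compute bound in each hypothetical program.
apps = {"fully compute bound": 1.0, "half compute, half memory": 0.5}

# (core clock gain, memory clock gain) for each overclocking scenario.
scenarios = {"core +10%": (0.10, 0.0), "memory +10%": (0.0, 0.10), "both +10%": (0.10, 0.10)}

for app_name, compute_fraction in apps.items():
    for scenario_name, (core_gain, memory_gain) in scenarios.items():
        fps = toy_fps(100, compute_fraction, core_gain, memory_gain)
        print(f"{app_name}, {scenario_name}: {fps:.0f} fps")
```

A stricter model that adds up the time spent on each half of the work would put the mixed program at about 104.8 fps rather than 105 for a single 10% boost, which is why the numbers above are only "basically" right.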

So, to find the sweet spot for your overclock, you will want to increase core clock speed as much as you can. Then bump up memory clock and see how high you can get it while remaining stable. Use a real world application to test performance at each point, and then use a binary-search-like algorithm to find the sweet spot in a small number of tests. And there you have it. We didn't do this for you, but what's better practice than a little hands-on experience, right? Besides, it gives readers the opportunity to compare notes in the comments on what the optimal memory clock for a 1GHz core clock on the 4890 would be. Have fun!
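One way to read "binary-search-like" is sketched below: with the core clock fixed, look for the lowest memory clock whose performance is within a tolerance of what you get at the highest stable memory clock. This is only a sketch under several assumptions: `measure_fps` is a hypothetical callback that runs your benchmark at a given memory clock and returns the average frame rate, performance never drops as memory clock rises, every clock in the range is already known to be stable, and the 25MHz step and 3% tolerance are arbitrary choices on our part.

```python
from typing import Callable


def find_sweet_spot(measure_fps: Callable[[int], float], low_mhz: int, high_mhz: int,
                    step_mhz: int = 25, tolerance_pct: float = 3.0) -> int:
    """Lowest memory clock whose fps is within tolerance_pct of the fps at high_mhz."""
    target = measure_fps(high_mhz) * (1 - tolerance_pct / 100)
    # Candidate clocks from low to high; high_mhz is always the last candidate.
    candidates = list(range(low_mhz, high_mhz, step_mhz)) + [high_mhz]
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if measure_fps(candidates[mid]) >= target:
            hi = mid  # good enough here, so try a lower clock
        else:
            lo = mid + 1  # too slow here, so the sweet spot is higher
    return candidates[lo]


def fake_measure_fps(memory_mhz: int) -> float:
    """Made-up curve for demonstration only: gains flatten out at 1075MHz."""
    return min(60.0, 50.0 + (memory_mhz - 975) * 0.1)


print(find_sweet_spot(fake_measure_fps, 975, 1200))
```

With the made-up curve, the search settles on 1075MHz after four benchmark runs instead of ten. In practice you would replace `fake_measure_fps` with real measurements taken at your maximum stable core clock.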

Age of Conan Performance

1680x1050    1920x1200    2560x1600

Call of Duty World at War Performance

1680x1050    1920x1200    2560x1600

Crysis Warhead Performance

1680x1050    1920x1200    2560x1600

Fallout 3 Performance

1680x1050    1920x1200    2560x1600

FarCry 2 Performance

1680x1050    1920x1200    2560x1600

Left 4 Dead Performance

1680x1050    1920x1200    2560x1600

Race Driver GRID Performance

1680x1050    1920x1200    2560x1600

Power Consumption

Yes, overclocking consumes more power. But interestingly enough, even pushed all the way to the top of what AMD allows in their own driver settings, the overclocked 4890 draws just a little less power than the GTX 285. Which is not bad at all considering the type of performance we can get out of this setup.

Idle Power

Load Power

Final Words

All but two.

That's the number of benchmarks in which our 1GHz/1.2GHz (core/mem) Radeon HD 4890 led the stock NVIDIA GeForce GTX 285. That's nothing to sneeze at. Certainly it doesn't mean that the 4890 is faster or better than the GTX 285, especially because the GTX 285 can be overclocked as well to improve performance. What this does mean is that for about $100 less we have the potential to achieve the stock performance of NVIDIA's flagship single GPU part with a highly overclocked AMD GPU. From an end user value perspective, that extra $100 is there to ensure you get at least the performance of the GTX 285 along with any potential overclocking benefits you might have from the higher end part. There is still reason to buy the GTX 285 if you need even more power. But this is quite intriguing from an architectural perspective.

These tests show that there is the potential for a 959 Million transistor AMD GPU to consistently outperform a 1.4 Billion transistor NVIDIA GPU in the same power envelope at 55nm with similar memory bandwidth.

Yields and business being what they are, it doesn't make sense for AMD to push out a part at the extreme clock speeds we tested. But from an engineering standpoint, even with its smaller die and its "less is more," multi-GPU-at-the-top-end strategy, AMD has built a part that can (when overclocked) best the stock performance of top of the line NVIDIA hardware designed to pack as much power into a single GPU as possible.

And that seems pretty significant.

At the same time, while we don't have any solid standardized OpenCL tests to run as of yet, it appears from some limited applications like Folding@home and others that NVIDIA's approach may be better suited to GPU computing or more general purpose or flexible applications beyond gaming. We can't really confirm this theory yet, as there isn't a wide enough range of GPU computing applications, but it might be that NVIDIA has been pushing CUDA so hard because they know it to be an advantage, not just in terms of software support and a feature check box, but in terms of a fundamental performance or architectural edge for these algorithms. The architectural path NVIDIA has chosen may well prove useful when DX11 hits and we see a further push away from DX9 towards really deep programmability and flexibility. Only time will tell on that front, though.

In the meantime, NVIDIA's margins are much tighter on their larger GPUs and now their single GPU performance advantage has started to erode. It seems the wonders of the RV7xx series have yet to exhaust themselves. Competition is indeed a wonderful thing, and we can't wait to see what comes out of the upcoming DX11 hardware battle.

For now, at resolutions below 2560x1600, the overclocked Radeon HD 4890 has the advantage. At 2560x1600, the lines become a little more blurry. For stock hardware the GTX 285 is still the fastest thing around in most cases. But if you want to take your chances with overclocking, 30" gaming on a single AMD GPU just got potentially a lot more attractive.
