Compute Performance

Shifting gears, as always our final set of performance benchmarks is a look at compute performance. As we saw with the launch of the GTX 680, Kepler (GK104) just doesn’t do very well here, thanks in part to NVIDIA stripping out a fair bit of compute hardware and memory bandwidth on GK104 in order to focus on gaming performance. OpenCL performance is particularly bad with NVIDIA almost completely ignoring it, but even DirectCompute performance often swings AMD’s way. This isn’t to say that GK104 doesn’t have its moments, but when it comes to compute it’s typically AMD’s time to shine.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

The 7970 already had a significant lead in this benchmark thanks to AMD’s work on improving their DirectCompute performance, and the 7970GE extends it further. The most important factor of course is actual game performance – where the 7970GE and GTX 680 are tied – but this is clear software evidence of what we already know in hardware: that the 7970GE is far more potent at compute than the GTX 680 is.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

With SmallLuxGPU being an OpenCL title that NVIDIA isn’t taking any care to optimize for, the 7970GE simply blows the GTX 680 out of the water. It’s not even a contest; only one card family is worth considering here. However it’s interesting to note that the 7970GE’s performance improvement over the 7970 is a bit below average, with the 7970GE only picking up 6%. SmallLuxGPU does stress memory bandwidth and compute performance, but in all likelihood the 7970GE isn’t boosting as much here as it is under our gaming tests. Once AMD starts exposing real clockspeeds we’ll need to revisit this assumption.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL routine that AES encrypts and decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.

While the 7970GE does improve upon the 7970’s already strong performance, we’re clearly reaching the point where the relatively long CPU/GPU transfer times over PCIe are taking their toll, explaining why the 7970GE could only shave off 5ms. This is actually an important point to make and is why APUs are so important to AMD’s GPU computing plans, but it also means that at a certain speed GPU performance ceases to matter.
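The point about transfer times can be made concrete with a simple model: if each run pays a fixed PCIe transfer cost plus a kernel time that shrinks as the GPU gets faster, the total can never drop below the transfer cost. The sketch below is only an illustration; the function name and both timing figures are hypothetical and are not measurements from this benchmark.

```python
def total_time_ms(transfer_ms: float, kernel_ms: float, gpu_speedup: float) -> float:
    """Wall time for one run: a fixed transfer cost plus a kernel that scales with GPU speed."""
    return transfer_ms + kernel_ms / gpu_speedup


# Hypothetical figures chosen only for illustration; they are not measured values.
TRANSFER_MS = 200.0
KERNEL_MS = 100.0

for speedup in (1.0, 1.1, 2.0, 10.0):
    total = total_time_ms(TRANSFER_MS, KERNEL_MS, speedup)
    print(f"{speedup:.1f}x GPU: {total:.1f} ms total")
```

Under these assumed numbers even a GPU ten times faster only approaches the 200ms floor set by the transfer, which is the sense in which GPU performance eventually ceases to matter.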

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
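For readers unfamiliar with the approach, an O(n^2) neighbor method simply compares every particle against every other particle. The Python sketch below illustrates that general idea and is not the SDK sample's shader code: the function name `neighbor_counts`, the 2D positions, and the `tile` parameter are all assumptions made for illustration. The tile loop stands in for a thread group loading a block of positions into shared memory and reusing it for every particle.

```python
def neighbor_counts(positions, radius, tile=64):
    """Count neighbors within `radius` of each particle by checking all pairs: O(n^2)."""
    n = len(positions)
    r2 = radius * radius
    counts = [0] * n
    # Walk the particle list one tile at a time. On a GPU the tile would be
    # loaded into shared memory once and then read by every thread in the group.
    for start in range(0, n, tile):
        cached = positions[start:start + tile]  # stand-in for the shared-memory cache
        for i, (xi, yi) in enumerate(positions):
            for j, (xj, yj) in enumerate(cached, start):
                if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                    counts[i] += 1
    return counts


# Tiny example: two particles close together, one far away.
particles = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(neighbor_counts(particles, radius=1.5, tile=2))
```

The tile size changes only how the work is batched, not the result; the total number of pair checks stays proportional to n^2, which is why caching the inner data matters so much at 16k particles.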

In this final compute shader benchmark NVIDIA’s performance is actually quite respectable, leading to them besting the 7970. However the 7970GE provides just enough of a performance boost to push AMD ahead of NVIDIA here, giving AMD a solid majority of our standard compute benchmarks. Even when Kepler is faced with a favorable workload, it looks like the GCN-based 7970GE is capable of taking NVIDIA head-on.

Finally, we received a number of requests for some further compute benchmarking using some of the consumer programs AMD provided the press with for the Trinity launch. In particular WinZip and Handbrake were requested, so we’ve gone ahead and run those benchmarks for this review.

Starting with WinZip, version 16.5 introduced OpenCL acceleration of both compression and AES archive encryption. Despite being accelerated via OpenCL, WinZip only supports AMD devices, presumably because only AMD provided technical assistance. As a result we’re looking solely at pure CPU performance and GPU accelerated performance across AMD’s lineup.

One thing immediately sticks out: WinZip isn’t very sensitive to GPU performance. Merely having a GPU increases performance rather significantly, but it doesn’t matter if it’s a fast GCN card or a GCN card at all for that matter, as even the VLIW4 based 6970 returns the same times. In fact AMD’s drivers report almost no GPU load, so it’s questionable how much of this is actually being run on the GPU versus being run on the CPU through AMD’s OpenCL CPU driver.

As for Handbrake, AMD sent along a newer version that works with discrete GPUs. AMD notes that this is still very much a work in progress, which we saw first-hand when OpenCL acceleration failed to handle two of our three test clips. It failed to properly crop one video, and failed to properly detelecine another. Handbrake’s OpenCL acceleration will of course continue to improve as it approaches release, but for the time being it’s definitely a beta.

Much like WinZip, Handbrake doesn’t appear to be particularly GPU performance sensitive, which doesn’t come as much of a surprise. Large parts of the H.264 encoding process are ill suited for GPU acceleration, so x264 is only offloading part of the process and the deciding factor is still CPU performance. The actual GPU load is very inconsistent, but generally tops out at around 40% usage.

The end result is nothing to sneeze at however. Whereas Handbrake averaged 25.6fps without GPU acceleration, with it performance increases by 24% to around 32fps. And unlike other GPU compute accelerated encoders the quality here is very consistent between the CPU and GPU paths (though GPU file size tends to be a bit larger), which means we’re retaining the same quality and customizability of Handbrake/x264 while gaining additional performance for free.
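As a quick check on the arithmetic, the snippet below applies the 24% gain to the 25.6fps CPU-only average and then translates the two frame rates into encode time. The helper names and the 10,000 frame clip length are hypothetical and chosen only for illustration; the clip lengths used in testing are not stated here.

```python
def accelerated_fps(base_fps: float, gain: float) -> float:
    """Frame rate after applying a fractional gain (0.24 means +24%)."""
    return base_fps * (1.0 + gain)


def encode_seconds(frames: int, fps: float) -> float:
    """Time needed to encode `frames` frames at a steady `fps`."""
    return frames / fps


cpu_fps = 25.6                              # CPU-only average from the text
gpu_fps = accelerated_fps(cpu_fps, 0.24)    # roughly 31.7fps, i.e. "around 32fps"

HYPOTHETICAL_FRAMES = 10_000                # assumed clip length, not from the review
saved = encode_seconds(HYPOTHETICAL_FRAMES, cpu_fps) - encode_seconds(HYPOTHETICAL_FRAMES, gpu_fps)
print(f"{gpu_fps:.1f} fps with acceleration, {saved:.0f} s saved on the assumed clip")
```

Because time scales with the inverse of frame rate, a 24% higher frame rate shortens the encode by roughly 19%, not 24%.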

Despite the fact that this is an AMD backed initiative it’s interesting to see that Handbrake’s performance isn’t heavily reliant on the GPU being used. We would have assumed that Handbrake was only optimized for AMD’s GPUs at this point, and even if that’s the case NVIDIA’s GPUs are still fast enough to make up the difference. The fact that Handbrake performance with NVIDIA’s GPUs is a hair faster is not at all what we would have expected, but at the same time this is very beta quality software and is likely dependent on the clip being used, so we wouldn’t advise reading too much into this at this time.

110 Comments

  • Ammaross - Friday, June 22, 2012 - link

    So, since the 7970 GE is essentially a tweaked OCed 7970, why not include a factory-overclocked nVidia 680 for fairness? There's a whole lot of headroom on those 680s as well that these benches leave untouched and unrepresented.
  • elitistlinuxuser - Friday, June 22, 2012 - link

    Can it run pong and at what frame rates
  • Rumpelstiltstein - Friday, June 22, 2012 - link

    Why is Nvidia red and AMD Green?
  • Galcobar - Friday, June 22, 2012 - link

    Standard graph colouring on Anandtech is that the current product is highlighted in green, specific comparison products in red. The graphs on page 3 for driver updates aren't a standard graph for video card reviews.

    Also, typo noted on page 18 (OC Gaming Performance), the paragraph under the Portal 2 1920 chart: "With Portal 2 being one of the 7970GE’s biggest defEcits" -- deficits
  • mikezachlowe2004 - Sunday, June 24, 2012 - link

    Computer performance is a big factor in deciding in purchase as well and I am disappointed to not see any mention of this in the conclusion. AMD blows nVidia out the water when it comes to compute performance and this should not be taken lightly seeing as games right now are implementing more and more compute capabilities in games and many other things. Compute performance has been growing and growing and today at a rate higher than ever and it is very disappointing to see no mention of this in Anand's conclusion.

    I use autoCAD for work all the time but I also enjoy playing games as well and with a workload like this, AMDs GPU provide a huge advantage over nVidia simply because nVidias GK104 compute performance is no where near that of AMDs. AMD is the obvious choice for someone like me.

    As far as the noise and temps go, I personally feel if your spending $500 on a GPU and obviously thousands on your system there is no reason not tospend a couple hundred on water cooling. Water cooling completely eliminates any concern for temps and noise which should make AMDs card the clear choice. Same goes for power consumption. If you're spending thousands on a system there is no reason you should be worried about a couple extra dollars a month on your bill. This is just how I see it. Now don't get me wrong, nVidia has a great card for gaming, but gaming only. AMD offers the best of both worlds. Both gaming and compute and to me, this makes the 7000 series the clear winner to me.
  • CeriseCogburn - Sunday, June 24, 2012 - link

    It might help if you had a clue concerning what you're talking about.

    " CAD Autodesk with plug-ins are exclusive on Cuda cores Nvidia cards. Going crossfire 7970 will not change that from 5850. Better off go for GTX580."

    " The RADEON HD 7000 series will work with Autodesk Autocad and Revitt applications. However, we recommend using the Firepro card as it has full support for the applications you are using as it has the certified drivers. For the list of compatible certified video cards, please visit http://support.amd.com/us/gpudownload/fire/certifi... "

    nVidia works out of the box, amd does NOT - you must spend thousands on Firepro.

    Welcome to reality, the real one that amd fanboys never travel in.
  • spdrcrtob - Tuesday, July 17, 2012 - link

    It might help if you knew what you are talking about...

    CAD as infer is AutoCAD by Autodesk and it doesn't have any CUDA dedicated plugin's. You are thinking of 3DS Max's method of Rendering called iRay. That's even fairly new from 2011 release.

    There's isn't anything else that uses CUDA processors on a dedicated scale unless its a 3rd Party program or plugin. But not in AutoCAD, AutoCAD barely needs anything. So get it straight.

    R-E-V-I-T ( with one T) requires more as there's rendering engine built in not to mention its mostly worked in as a 3D application, unlike AutoCAD which is mostly used in 2D.

    Going Crossover won't help because most mid-range and high end single GPU's (AMD & NVIDIA) will be fine for ANY surface modeling and/ or 3D Rendering. If you use the application right you can increase performance numbers instead of increasing card count.

    All Autodesk products work with any GPU really, there are supported or "certified" drivers and cards, usually only "CAD" cards like Fire Pro or Quadro's.

    Nvidia's and AMD's work right out of the Box, just depends on the Add In Board partner and build quality NVIDIA fan boy. If you're going to state facts , then get your facts straight where it matters. Not your self thought cute remarks.

    Do more research or don't state something you know nothing about. I have supported CAD and Engineering Environments and the applications they use for 8yrs now, before that 5 yrs more of IT support experience.
  • aranilah - Monday, June 25, 2012 - link

    please put up a graph of the 680 overclocked to its maximum potential versus this to its maximum oc, that would be a different story i believe , not sure though. Please do it because on you 680 review there is no OC testing :/
  • MrSpadge - Monday, June 25, 2012 - link

    - AMDs boost assumes the stock heatsink - how is this affected by custom / 3rd party heat sinks? Will the chip think it's melting, whereas in reality it's crusing along just fine?

    - A simple fix would be to read out the actual temperature diode(s) already present within the chip. Sure, not deterministic.. but AMD could let users switch to this mode for better accuracy.

    - AMD could implement a calibration routine into the control panel to adjust the digital temperature estimation to the atcual heat sink present -> this might avoid the problem altogether.

    - Overvolting just to reach 1.05 GHz? I don't think this is necessary. Actually, I think AMD is generously overvolting most CPUs and some GPUs in the recent years. Some calibration for the actual chip capability would be nice as well - i.e. test if MY GPU really needs more voltage to reach the boost clock.

    - 4 digit product numbers and only fully using 2 of them, plus the 3rd one to a limited extend (only 2 states to distinguish - 5 and 7). This is ridiculous! The numbers are there to indicate performance!!!

    - Bring out cheaper 1.5 GB versions for us number crunchers.

    - Bring an HD7960 with approx. the same amount of shaders as the HD7950, but ~1 GHz clock speeds. Most chips should easily do this.. and AMD could sell the same chip for more, since it would be faster.
  • Hrel - Monday, June 25, 2012 - link

    How can you write a review like this, specifically to test one card against another, then only overclock one of them in the "OC gaming performance" section. Push the GTX680 as far as you can too otherwise those results are completely meaningless; for comparison.
