Compute Performance

Shifting gears, as always our final set of performance benchmarks is a look at compute performance. As we saw with the launch of the GTX 680, Kepler (GK104) just doesn’t do very well here, thanks in part to NVIDIA stripping out a fair bit of compute hardware and memory bandwidth on GK104 in order to focus on gaming performance. OpenCL performance is particularly weak, with NVIDIA all but ignoring the API, but even DirectCompute performance often swings AMD’s way. This isn’t to say that GK104 doesn’t have its moments, but when it comes to compute it’s typically AMD’s time to shine.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. Note that this is a DX11 DirectCompute benchmark.

The 7970 already had a significant lead in this benchmark thanks to AMD’s work on improving their DirectCompute performance, and the 7970GE extends it further. The most important factor of course is actual game performance – where the 7970GE and GTX 680 are tied – but this is clear software evidence of what we already know in hardware: that the 7970GE is far more potent at compute than the GTX 680 is.

Our next benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

With SmallLuxGPU being an OpenCL title that NVIDIA isn’t taking any care to optimize for, the 7970GE simply blows the GTX 680 out of the water. It’s not even a contest here; only one card family is even worth considering. However it’s interesting to note that the 7970GE’s performance improvement over the 7970 is a bit below average, with the 7970GE only picking up 6%. SLG does stress memory bandwidth and compute performance, but in all likelihood the 7970GE isn’t boosting as much here as it is under our gaming tests. Once AMD starts exposing real clockspeeds we’ll need to revisit this assumption.

For our next benchmark we’re looking at AESEncryptDecrypt, an OpenCL sample that AES encrypts/decrypts an 8K x 8K pixel square image file. The result of this benchmark is the average time to encrypt the image over a number of iterations of the AES cipher.

While the 7970GE does improve upon the 7970’s already strong performance, we’re clearly reaching the point where the relatively long CPU/GPU transfer times over PCIe are taking their toll, which explains why the 7970GE could only shave off another 5ms. This is an important point, and a big part of why APUs are so central to AMD’s GPU computing plans, but it also means that past a certain speed, GPU performance ceases to matter.
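
To put rough numbers on that (our own back-of-the-envelope estimate, not anything AMD provided): an 8K x 8K image at 4 bytes per pixel is roughly 268MB, so simply moving it across the bus costs tens of milliseconds before the GPU does any work. A minimal sketch, assuming ~6GB/s of effective host-to-GPU bandwidth (the real figure depends on the platform, driver, and transfer method):

```c
/* Back-of-the-envelope PCIe transfer estimate for an 8K x 8K image.
 * The 6GB/s effective bandwidth figure is an assumption for illustration
 * only; actual throughput varies with platform and transfer method. */
#include <stdio.h>

int main(void)
{
    const double pixels     = 8192.0 * 8192.0;  /* 8K x 8K image        */
    const double bytes      = pixels * 4.0;     /* assume 4 bytes/pixel */
    const double bandwidth  = 6.0e9;            /* assumed ~6 GB/s      */
    const double one_way_ms = bytes / bandwidth * 1000.0;

    printf("Image size: %.0f MB\n", bytes / 1e6);        /* ~268 MB */
    printf("One-way transfer: ~%.0f ms\n", one_way_ms);  /* ~45 ms  */
    /* Against a fixed cost like that in each direction, a GPU that
     * finishes its kernels 5ms sooner barely moves the total runtime. */
    return 0;
}
```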

Our fourth benchmark is once again looking at compute shader performance, this time through the Fluid simulation sample in the DirectX SDK. This program simulates the motion and interactions of a 16K particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n²) nearest neighbor method that is optimized by using shared memory to cache data.
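
The SDK sample itself is written in HLSL; purely as an illustration of the technique, here is a minimal OpenCL C sketch of an O(n²) pairwise pass that stages particles through local (shared) memory. The kernel name, the interact() term, and the fixed tile size are our own placeholders rather than the SDK's code, and the particle count is assumed to be a multiple of the tile size:

```c
/* Illustrative O(n^2) pairwise pass with a local-memory (shared memory)
 * tile, in the spirit of the DirectX SDK fluid sample. Names and the
 * interact() term are hypothetical placeholders. */

#define TILE 128   /* must match the work-group size chosen on the host */

float4 interact(float4 a, float4 b)
{
    /* Stand-in pairwise term; the real sample evaluates SPH kernels. */
    float4 d  = b - a;
    float  r2 = d.x * d.x + d.y * d.y + d.z * d.z + 1e-6f;
    return d / r2;
}

__kernel void pairwise_tiled(__global const float4 *pos,
                             __global float4 *accel,
                             const int n)   /* n assumed multiple of TILE */
{
    __local float4 tile[TILE];

    int gid = get_global_id(0);
    int lid = get_local_id(0);

    float4 my_pos = pos[gid];
    float4 acc    = (float4)(0.0f);

    for (int base = 0; base < n; base += TILE) {
        /* Each work-item stages one particle into local memory... */
        tile[lid] = pos[base + lid];
        barrier(CLK_LOCAL_MEM_FENCE);

        /* ...then the whole work-group reads the cached slice instead of
         * re-fetching it from global memory, which is the shared-memory
         * optimization the SDK sample advertises. */
        for (int j = 0; j < TILE; ++j)
            acc += interact(my_pos, tile[j]);
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    accel[gid] = acc;
}
```

Relative to a naive version that reads the particle list from global memory in the inner loop, the tile cuts the number of global loads per particle from n down to n/TILE, which is why this kind of brute-force O(n²) kernel can still run well on GPUs.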

In this final compute shader benchmark NVIDIA’s performance is actually quite respectable, allowing the GTX 680 to best the 7970. However the 7970GE provides just enough of a performance boost to push AMD ahead of NVIDIA here, giving AMD a solid majority of our standard compute benchmarks. Even when Kepler is faced with a favorable workload, it looks like the GCN-based 7970GE is capable of taking NVIDIA head-on.

Finally, we received a number of requests for some further compute benchmarking using some of the consumer programs AMD provided the press with for the Trinity launch. In particular WinZip and Handbrake were requested, so we’ve gone ahead and run those benchmarks for this review.

Starting with WinZip, version 16.5 introduced OpenCL acceleration of both compression and AES encryption. Despite being accelerated via OpenCL, WinZip only supports AMD devices, presumably because only AMD provided technical assistance. As a result we’re looking solely at pure CPU performance and GPU-accelerated performance across AMD’s lineup.

One thing immediately sticks out: WinZip isn’t very sensitive to GPU performance. Merely having a GPU increases performance rather significantly, but it doesn’t seem to matter whether it’s a fast GCN card, or even a GCN card at all, as the VLIW4-based 6970 returns the same times. In fact AMD’s drivers report almost no GPU load, so it’s questionable how much of this is actually being run on the GPU versus being run on the CPU through AMD’s OpenCL CPU driver.
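
We can’t see inside WinZip, but checking which OpenCL devices a system actually exposes, including the CPU device that AMD’s OpenCL runtime registers, takes only a handful of stock API calls. A minimal host-side sketch (ours, not WinZip’s) that lists every device with its vendor and type:

```c
/* Minimal OpenCL enumeration sketch (ours, not WinZip's): list every
 * device with its vendor and type, e.g. to see whether AMD's OpenCL
 * CPU device is present alongside the GPU. Build with -lOpenCL. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);

    for (cl_uint p = 0; p < nplat; ++p) {
        cl_device_id devices[16];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16,
                           devices, &ndev) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < ndev; ++d) {
            char name[256] = "", vendor[256] = "";
            cl_device_type type = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,   sizeof(name),   name,   NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_VENDOR, sizeof(vendor), vendor, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_TYPE,   sizeof(type),   &type,  NULL);
            printf("%-20s | %-40s | %s\n", vendor, name,
                   (type & CL_DEVICE_TYPE_GPU) ? "GPU" :
                   (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "other");
        }
    }
    return 0;
}
```

An application that only whitelists devices whose vendor string matches AMD, and otherwise runs on whichever device is left, would behave much as WinZip appears to here.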

As for Handbrake, AMD sent along a newer version that works with discrete GPUs. AMD notes that this is still very much a work in progress, which we saw first-hand when OpenCL acceleration failed to handle two of our three test clips. It failed to properly crop one video, and failed to properly detelecine another. Handbrake’s OpenCL acceleration will of course continue to improve as it approaches release, but for the time being it’s definitely a beta.

Much like WinZip, Handbrake doesn’t appear to be particularly sensitive to GPU performance, which doesn’t come as much of a surprise. Large parts of the H.264 encoding process are ill-suited for GPU acceleration, so x264 is only offloading part of the process and the deciding factor is still CPU performance. The actual GPU load is very inconsistent, but generally tops out at around 40% usage.

The end result is nothing to sneeze at, however. Whereas Handbrake averaged 25.6fps without GPU acceleration, with it enabled performance increases by 24% to around 32fps. And unlike other GPU compute accelerated encoders, the quality here is very consistent between the CPU and GPU paths (though the GPU file size tends to be a bit larger), which means we’re retaining the same quality and customizability of Handbrake/x264 while gaining additional performance for free.

Despite the fact that this is an AMD-backed initiative, it’s interesting to see that Handbrake’s performance isn’t heavily reliant on the GPU being used. We would have assumed that Handbrake was only optimized for AMD’s GPUs at this point, and even if that’s the case NVIDIA’s GPUs are still fast enough to make up the difference. The fact that Handbrake is a hair faster with NVIDIA’s GPUs is not at all what we would have expected, but this is very much beta-quality software and the result is likely dependent on the clip being used, so we wouldn’t advise reading too much into it at this time.

110 Comments

  • piroroadkill - Friday, June 22, 2012 - link

    While the noise is bad - the manufacturers are going to spew out non-reference, quiet designs in moments, so I don't think it's an issue.
  • silverblue - Friday, June 22, 2012 - link

    Toms added a custom cooler (Gelid Icy Vision-A) to theirs which reduced noise and heat noticeably (about 6 degrees C and 7-8 dB). Still, it would be cheaper to get the vanilla 7970, add the same cooling solution, and clock it to the same levels; that way, you'd end up with a GHz Edition-clocked card which is cooler and quieter for about the same price as the real thing, albeit lacking the new boost feature.
  • ZoZo - Friday, June 22, 2012 - link

    Would it be possible to drop the 1920x1200 resolution for tests? 16:10 is dead, 1080p has been the standard for high definition on PC monitors for at least 4 years now, it's more than time to catch up with reality... Sorry for the rant, I'm probably nitpicking anyway...
  • Reikon - Friday, June 22, 2012 - link

    Uh, no. 16:10 at 1920x1200 is still the standard for high quality IPS 24" monitors, which is a fairly typical choice for enthusiasts.
  • paraffin - Saturday, June 23, 2012 - link

    I haven't been seeing many 16:10 monitors around these days. Besides, since AT even tests iGPU performance at ANYTHING BUT 1080p, your "enthusiast choice" argument is invalid. 16:10 is simply a l33t factor in a market dominated by 16:9. I'll take my cheap 27" 1080p TN's spaciousness and HD content nativeness over your pricey 24" 1200p IPS' "quality" any day.
  • CeriseCogburn - Saturday, June 23, 2012 - link

    I went over this already with the amd fanboys.
    For literally YEARS they have had harpy fits on five and ten dollar card pricing differences, declaring amd the price perf queen.

    Then I pointed out nVidia wins in 1920x1080 by 17+% and only by 10+% in 1920x1200 - so all of a sudden they ALL had 1920x1200 monitors, they were not rare, and they have hundreds of extra dollars of cash to blow on it, and have done so, at no extra cost to themselves and everyone else (who also has those), who of course also chooses such monitors because they all love them the mostest...

    Then I gave them egg counts, might as well call it 100 to 1 on availability if we are to keep to their own hyperactive price perf harpying, and the lowest available higher rez was $50 more, which COST NOTHING because it helps amd, of course....

    I pointed out Anand pointed out in the then prior article it's an ~11% pixel difference, so they were told to calculate the frame rate difference... (that keeps amd up there in scores and winning a few they wouldn't otherwise).

    Dude, MKultra, Svengali, Jim Wand, and mass media, could not, combined, do a better job brainwashing the amd fan boy.

    Here's the link, since I know a thousand red-winged harpies are ready to descend en masse and caw loudly in protest...

    http://translate.google.pl/translate?hl=pl&sl=...

    1920x1080: " GeForce GTX680 is on average 17.61% more efficient than the Radeon 7970.
    Here, the performance difference in favor of the GTX680 are even greater"

    So they ALL have a 1920x1200, and they are easily available, the most common, cheap, and they look great, and most of them have like 2 or 3 of those, and it was no expense, or if it was, they are happy to pay it for the red harpy from hades card.
  • silverblue - Monday, June 25, 2012 - link

    Your comparison article is more than a bit flawed. The PCLab results, in particular, have been massively updated since that article. Looks like they've edited the original article, which is a bit odd. Still, AMD goes from losing badly in a few cases to not losing so badly after all, as the results in this article go to show. They don't displace the 680 as the best gaming card of the moment, but it certainly narrows the gap (even if the GHz Edition didn't exist).

    Also, without a clear idea of specs and settings, how can you just grab results for a given resolution from four or five different sites for each card, add them up and proclaim a winner? I could run a comparison between a 680 and 7970 in a given title with the former using FXAA and the latter using 8xMSAA, doesn't mean it's a good comparison. I could run Crysis 2 without any AA and AF at all at a given resolution on one card and then put every bell and whistle on for the other - without the playing field being even, it's simply invalid. Take each review at its own merits because at least then you can be sure of the test environment.

    As for 1200p monitors... sure, they're more expensive, but it doesn't mean people don't have them. You're just bitter because you got the wrong end of the stick by saying nobody owned 1200p monitors then got slapped down by a bunch of 1200p monitor owners. Regardless, if you're upset that NVIDIA suddenly loses performance as you ramp up the vertical resolution, how is that AMD's fault? Did it also occur to you that people with money to blow on $500 graphics cards might actually own good monitors as well? I bet there are some people here with 680s who are rocking on 1200p monitors - are you going to rag (or shall I say "rage"?) on them, too?

    If you play on a 1080p panel then that's your prerogative, but considering the power of the 670/680/7970, I'd consider that a waste.
  • FMinus - Friday, June 22, 2012 - link

    Simply put; No!

    1080p is the second worst thing that happened to the computer market in recent years; the worst being the phasing out of 4:3 monitors.
  • Tegeril - Friday, June 22, 2012 - link

    Yeah seriously, keep your 16:9, bad color reproduction away from these benchmarks.
  • kyuu - Friday, June 22, 2012 - link

    16:10 snobs are seriously getting out-of-touch when they start claiming that their aspect ratio gives better color reproduction. There are plenty of high-quality 1080p IPS monitors on the market -- I'm using one.

    That being said, it's not really important whether it's benchmarked at x1080 or x1200. There is a negligible difference in the number of pixels being drawn (one of the reasons I roll my eyes at 16:10 snobs). If you're using a 1080p monitor, just add anywhere from 0.5 to 2 FPS to the average FPS results from x1200.

    Disclaimer: I have nothing *against* 16:10. All other things being equal, I'd choose 16:10 over 16:9. However, with 16:9 monitors being so much cheaper, I can't justify paying a huge premium for a measly 120 lines of vertical resolution. If you're willing to pay for it, great, but kindly don't pretend that doing so somehow makes you superior.
