For a while now we’ve been trying to establish a proper cross-platform compute benchmark suite to add to our GPU articles. It’s not been entirely successful.

While GPUs have been compute capable in some form since 2006 with the launch of G80, and AMD significantly improved their compute capabilities in 2009 with Cypress, the software has been slow to catch on. From gatherings such as NVIDIA’s GTC we’ve seen first-hand how GPU computing is being used in the high-performance computing market, but the consumer side hasn’t materialized as quickly: the right situations for using GPU computing aren’t as straightforward, and many developers are unwilling to tie themselves to a single platform in the process.

2009 saw the ratification of OpenCL 1.0 and the launch of DirectCompute, and while the launch of these cross-platform APIs removed some of the roadblocks, we heard as recently as last month from Adobe and others that there’s still work to be done before companies can confidently deploy GPU compute accelerated software. The immaturity of OpenCL drivers was cited as one cause; however, there’s also the fact that a lot of computers simply don’t have a suitable compute-capable GPU – it’s Intel that’s the world’s biggest GPU vendor after all.

So here in the fall of 2010 our search for a wide variety of GPU compute applications hasn’t panned out quite like we expected it to. Widespread adoption of GPU computing in consumer applications is still around the corner, so for the time being we have to get creative.

With that in mind we’ve gone ahead and cooked up a new GPU compute benchmark suite based on the software available to us. On the consumer side we have the latest version of Cyberlink’s MediaEspresso video encoding suite and an interesting sub-benchmark from Civilization V. On the professional side we have SmallLuxGPU, an OpenCL based ray tracer. We don’t expect this to be the be all and end all of GPU computing benchmarks, but it gives us a place to start and allows us to cover both cross-platform APIs and NVIDIA & AMD’s platform-specific APIs.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.

In our look at Civ V’s performance as a game, we noted that it favors NVIDIA’s GPUs at the moment, and this may be part of the reason why. NVIDIA’s GPUs clean up here, particularly when compared to the 6800 series and its reduced shader count. Furthermore within the GPU families the results are very straightforward, with the order following the relative compute power of each GPU. To be fair to AMD they made a conscious decision to not chase GPU computing performance with the 6800 series, but as a result it fares poorly here.

Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

We decided to go ahead and use MediaEspresso in this article not knowing what we’d find, and it turns out the results were both more and less than we were expecting at the same time. While our charts don’t show it, video transcoding isn’t all that GPU intensive with MediaEspresso; once we achieve a certain threshold of compute performance on a GPU – such as a GTX 460 in the case of an NVIDIA card – the rest of the process is CPU bottlenecked. As a result all of our Fermi NVIDIA cards at the GTX 460 or better take just as long to encode our sample video, and while the AMD cards show some stratification, it’s on the order of only a couple of seconds. From this it’s clear that with Cyberlink’s technology having a GPU is going to help, but it can’t completely offload what’s historically been a CPU-intensive activity.
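
The plateau described above is what a simple two-stage bottleneck model would predict. A minimal sketch in Python, assuming the encode time is set by whichever of the CPU or GPU stage is slower; the function name, the work units, and every number below are hypothetical and are not measurements from this review:

```python
def encode_time(cpu_seconds, gpu_work, gpu_throughput):
    """Estimate encode time when the CPU and GPU stages overlap.

    Assumption: the two stages run concurrently, so the slower one
    sets the overall pace.
    """
    gpu_seconds = gpu_work / gpu_throughput
    return max(cpu_seconds, gpu_seconds)


# Hypothetical figures: 30 s of CPU-side work, 600 units of GPU-side work.
CPU_SECONDS = 30.0
GPU_WORK = 600.0

# Throughput in hypothetical work units per second, slowest GPU first.
for throughput in (10.0, 20.0, 40.0, 80.0):
    seconds = encode_time(CPU_SECONDS, GPU_WORK, throughput)
    print(f"GPU throughput {throughput:5.1f} -> {seconds:5.1f} s")
```

In this model, raising GPU throughput past the crossover point changes nothing because the CPU stage dominates, which is the same shape as the flat results for cards at or above the GTX 460.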

As for an AMD/NVIDIA cross comparison, the results are straightforward but not particularly enlightening. It turns out that MediaEspresso 6 is significantly faster on NVIDIA GPUs than it is on AMD GPUs, but since we’ve already established that MediaEspresso 6 is CPU limited when using these powerful GPUs, it doesn’t say anything about the hardware. AMD and NVIDIA both provide common GPU video encoding frameworks for their products that Cyberlink taps into, and it’s here where we believe the difference lies.

In particular we see MediaEspresso 6 achieve 50% CPU utilization (4 core) when being used with an NVIDIA GPU, while it only achieves 13% CPU utilization (1 core) with an AMD GPU. At this point it would appear that the CPU portions of NVIDIA’s GPU encoding framework are multithreaded while AMD’s framework is singlethreaded. And since the performance bottleneck for video encoding still lies with the CPU, this would be why the NVIDIA GPUs do so much better than the AMD GPUs in this benchmark.
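
The pairing of 50% with four cores and 13% with one core is consistent with a CPU that exposes eight logical processors, since one saturated thread out of eight is 12.5%. A short sketch of that arithmetic; the eight-processor count is an assumption inferred from the figures rather than something stated here, and `busy_cores` is a hypothetical helper:

```python
def busy_cores(utilization_percent, logical_cores):
    """Convert a system-wide CPU utilization figure into fully busy cores."""
    return utilization_percent / 100.0 * logical_cores


# Assumption: 8 logical processors (inferred, not stated in the text).
LOGICAL_CORES = 8

for vendor, utilization in (("NVIDIA", 50), ("AMD", 13)):
    cores = busy_cores(utilization, LOGICAL_CORES)
    print(f"{vendor}: {utilization}% of {LOGICAL_CORES} -> {cores:.2f} cores busy")
```

Under that assumption the NVIDIA path keeps about four logical processors busy and the AMD path about one, which fits the multithreaded versus singlethreaded reading above.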

Our final GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.

Compared to our other two GPU computing benchmarks, SmallLuxGPU follows the theoretical performance of our GPUs much more closely. As a result our Radeon GPUs with their difficult-to-utilize VLIW5 design end up topping the charts by a significant margin, while the fastest comparable NVIDIA GPU is still 10% slower than the 6850. Ultimately what we’re looking at is what amounts to the best-case scenarios for these GPUs, with this being as good an example as any that in the right circumstances AMD’s VLIW5 shader design can go toe-to-toe with NVIDIA’s compute-focused design and still win.

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6800 series, AMD enhanced their tessellation unit to offer better tessellation performance at lower tessellation factors. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

Since Heaven is a synthetic benchmark at the moment (the DX11 engine isn’t currently used in any games) we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. Compared to the 5870 the 6870 ends up being slightly slower when using moderate amounts of tessellation, while it pulls ahead when using extreme amounts of tessellation. Considering that the 6870 is around 7% slower in games than the 5870 this is actually quite an accomplishment for Barts, and one that we can easily trace back to AMD’s tessellator improvements.

Our second tessellation test is Microsoft’s DirectX 11 Detail Tessellation sample program, which is a much more straightforward test of tessellation performance. Here we’re simply looking at the framerate of the program at different tessellation levels, specifically level 7 (the default level) and level 11 (the maximum level). Here AMD’s tessellation improvements become even more apparent, with the 6870 handily beating the 5870. In fact our results are very close to AMD’s own internal results – at level 7 the 6870 is 43% faster than the 5870, while at level 11 that improvement drops to 29% as the increased level leads to an increasingly large tessellation factor. However this also highlights the fact that AMD’s tessellation performance still collapses at high factors compared to NVIDIA’s GPUs, making it all the more important for AMD to encourage developers to use more reasonable tessellation factors.

197 Comments


  • GeorgeH - Friday, October 22, 2010 - link

    WRT comments complaining about the OC 460 -

    It's been clear from the 460 launch that a fully enabled and/or higher clocked 460 would compete very well with a 470. It would have been stupid for NVIDIA to release such a card, though - it would have made the already expensive GF100 even more so by eliminating a way to get rid of their supply of slightly defective GF100 chips (as with the 465) and there was no competitive reason to release a 460+.

    Now that there is a competitive reason to release one, do you really think Nvidia is going to sit still and take losses (or damn close to it) on the 470 when it has the capability of launching a 460+? Do you really think that Nvidia still can't make fully functional GF104 chips? Including the OC 460 is almost certainly Ryan's way of hinting without hinting (NDAs being what they are) what Nvidia is prepping for release.

    (And if you really think AT is anyone's shill, you're obviously very new to AT.)
  • AnandThenMan - Friday, October 22, 2010 - link

    "And if you really think AT is anyone's shill, you're obviously very new to AT."

    Going directly against admitted editorial policy doesn't exactly bolster your argument now does it. As for your comment about a 460+ or whatever you were trying to say, who cares? Reviews are supposed to be about hardware that is available to everyone now, not some theoretical card in the future.
  • MGSsancho - Friday, October 22, 2010 - link

    A vendor could just as likely sell an overclocked 470 card as well as a 480. But I think you made the right assumption that team green might be releasing overclocked cards that all have a minimum of 1gb of ram to make it look like their cards are faster than team red's. maybe it will be for near equal price points, the green cards will all be 20~30% overclocked to make it look like they are 10% faster than the red offerings at similar prices. Red cards could just be sold over clocked as well (we have to wait a bit more to see how well they overclock). All of this does not really matter. In the end of the day, buyers will look at whats the fastest product they can purchase at their price point. Maybe secondly they will notice that hey this thing gets hot and is very loud and just blindly blaming the green/red suits and thirdly they will look at features. Who really knows.

    Personally I purchase the slightly slower products then over clock them myself if i find a game that needs it. I would rather have the headroom vs buying a card that is always going to be hot enough to rival volcanoes even if it is factory warrantied.
  • Golgatha - Friday, October 22, 2010 - link

    The nVidia volcanoes comment is really, really overstated. I have a mid-tower case with a 120mm exhaust and 2x92mm intakes (Antec Solo for reference), and a GTX 480. None of these case fans are high performance fans. Under very stressful gaming conditions, I hit in the 80-85°C range, and Folding@Home's GPU3 client will get it up to 91°C under 100% torturous load.

    Although I don't like the power consumption of the GTX 480 for environmental reasons, it is rock solid stable, has none of the drawbacks of multi-GPU setups (I actually downgraded from a Crossfire 5850 setup due to game crashing and rendering issues), and it seems to be top dog in a lot of cases when it comes to minimum FPS (even when compared to multi-GPU setups).
  • Parhel - Friday, October 22, 2010 - link

    "And if you really think AT is anyone's shill, you're obviously very new to AT"

    I think you're referring to me, since I'm the one who used the word "shill." Let me tell you, I've been reading AT since before Tom's Hardware sucked, and that's a loooong time.

    If I were going to buy a card today, I'd buy the $180 GTX 460 1GB, no question. I'm not an AMD fan, nor am I an NVidia fan. I am, however, an Anandtech fan. And their decision to include the FTW edition card in this review means I can no longer come here and assume I'm reading something unbiased and objective.
  • GeorgeH - Friday, October 22, 2010 - link

    It was actually more of a shotgun blast aimed at the several silly posts implying AT was paid off by EVGA or Nvidia.

    If you've been reading AT for ~10 years, why would you assume that Ryan (or any other longtime contributor) suddenly decided to start bowing to outside pressure? If you stop lighting the torches and sharpening the pitchforks for half a second, you might realize that Ryan probably has a very good reason for including the OC card.

    Even if I'm smoking crack WRT a GTX460+, what's the point of a review? It's not to give AMD and Nvidia a "fair" fight, it's to give us an idea of the best card to spend our money on - and if AMD or Nvidia get screwed in the process, I'm not going to be losing any sleep.

    Typically, OC cards with a significant clock bump are fairly rare "Golden samples" and/or only provide marginal performance benefits without significantly increasing heat, noise, and power consumption. With the 460, Nvidia all but admitted they could've bumped the stock clocks quite significantly, but didn't want to threaten their other cards (*cough* 470 *cough*) if they didn't have to. This is reflected in what you can actually buy at Newegg - of the ~30 1GB 460's, only ~5 are running stock. 850MHz is still high, but is also right in line with the average of what you can expect any 460 to get to, so I don't think it's too far out of place.

    Repeating what I said above, including the OC card was unfair to AMD, but is highly relevant to me and my wallet. I couldn't care less if AMD (or Nvidia) get screwed by an AT review - I just want to know what's best for me, and this article delivers. If the tables were turned, I'm sure that Ryan would have no problem including an OC AMD card in a Nvidia review - because it isn't about being a shill, it's about informing me, the consumer.
  • SandmanWN - Friday, October 22, 2010 - link

    What? Put the crack down... Really, if you are short on time to review a product and you steal time away from that objective just to review a specially delivered hand selected opponents card instead of completing your assignment then you've not exactly been genuine to your readers or in this case to AMD.

    If you have time to add in an overclocked card then you need to do the same with the review card, otherwise the OC'd cards need to wait another day.

    I have no idea how you can claim some great influence on your wallet when you have no idea of the OC capabilities of the 6000 series. If you actually bought the 460 off this review then you are banking that the overclock will hold up against an unknown variable. That's not exactly relevant to anyone's wallet.
  • GeorgeH - Friday, October 22, 2010 - link

    An OC'd 460 competes with the 6870, and the 6870 doesn't really overclock at all.

    Even overclocked, a 6850 isn't going to touch a 6870, unless you're going to well over 1GHz (which short of a miracle isn't going to happen.)

    It was disappointing that the review wasn't fleshed out more, but I'd say what's missing isn't as relevant to my buying decisions as how well the plethora of OC'd 460s compare to the 6870.
  • Parhel - Saturday, October 23, 2010 - link

    "the 6870 doesn't really overclock at all"

    What? You're talking out of your ass. No review site has even attempted a serious overclock yet. It's not even possible, as far as I know, to modify the voltage yet! We have no way to gauge how these cards overclock, and won't for several weeks.

    "850MHz is still high, but is also right in line with the average of what you can expect any 460 to get to"

    Now you're sounding like the shill. 850Mhz is not a realistic number if we're talking about 24/7 stability with stock cooling. No way.
  • GeorgeH - Saturday, October 23, 2010 - link

    850MHz unrealistic? Nvidia flat out admitted that most cards are capable of at least ~800MHz (no volt mods, no nothing) and reviews around the web have backed this up, showing low to mid 800's on most stock cards, at stock voltages, running stock cooling. If you're worried about reliability, grab one of the many cards that come factory OC'd with a warranty.

    The 6870 doesn't now and never will overclock much at all, at least not in the way the 460 does. As with any chip, there will be golden sample cards that will go higher with voltage tweaks and extra cooling, but AMD absolutely did not leave ~20-25% of the 6870's average clockspeed potential on the table. The early OC reviews back this up as well, showing the 6870 as having minimal OC'ing headroom at stock voltages.

    If you're waiting to compare the maximum performance that you can stretch out of a cherry-picked 6870 with careful volt mods and aftermarket cooling, you're going to be comparing it with a 460 @ ~950MHz, not ~850MHz.

    As a guess, I'd say that your ignorance of these items is what led you to be so outraged at the inclusion of the OC 460 in the review. The magnitude of the OC potential of the 460 is highly atypical (at least in mid-range to high end cards), which is why I and many other posters have no issue with its similarly atypical inclusion in the review.
