For a while now we’ve been trying to establish a proper cross-platform compute benchmark suite to add to our GPU articles. It’s not been entirely successful.

While GPUs have been compute capable in some form since the launch of G80 in 2006, and AMD significantly improved their compute capabilities in 2009 with Cypress, the software has been slow to catch on. From gatherings such as NVIDIA’s GTC we’ve seen first-hand how GPU computing is being used in the high-performance computing market, but the consumer side hasn’t materialized as quickly; the right situations for using GPU computing aren’t as straightforward, and many developers are unwilling to attach themselves to a single platform in the process.

2009 saw the ratification of OpenCL 1.0 and the launch of DirectCompute, and while these cross-platform APIs removed some of the roadblocks, we heard as recently as last month from Adobe and others that there’s still work to be done before companies can confidently deploy GPU compute accelerated software. The immaturity of OpenCL drivers was cited as one cause, but there’s also the fact that a lot of computers simply don’t have a suitable compute-capable GPU – it’s Intel that’s the world’s biggest GPU vendor, after all.

So here in the fall of 2010 our search for a wide variety of GPU compute applications hasn’t panned out quite like we expected it to. Widespread adoption of GPU computing in consumer applications is still around the corner, so for the time being we have to get creative.

With that in mind we’ve gone ahead and cooked up a new GPU compute benchmark suite based on the software available to us. On the consumer side we have the latest version of Cyberlink’s MediaEspresso video encoding suite and an interesting sub-benchmark from Civilization V. On the professional side we have SmallLuxGPU, an OpenCL based ray tracer. We don’t expect this to be the be-all and end-all of GPU computing benchmarks, but it gives us a place to start and allows us to cover both the cross-platform APIs and NVIDIA’s and AMD’s platform-specific APIs.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.
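Firaxis hasn’t published the shader itself, so as a rough illustration here’s a CPU-side sketch in C++ of a standard BC1/DXT1 block decode – the kind of per-block work a DirectCompute kernel can parallelize with one thread per 4x4 block. This is our own hedged reconstruction, not Civ V’s actual code, and the game’s own compression format may well differ.

```cpp
#include <cstdint>

// One texel of decompressed output.
struct RGBA { uint8_t r, g, b, a; };

// Expand a 16-bit RGB565 endpoint color to 8-bit-per-channel RGBA.
static RGBA Expand565(uint16_t c) {
    return { uint8_t(((c >> 11) & 31) * 255 / 31),
             uint8_t(((c >>  5) & 63) * 255 / 63),
             uint8_t(( c        & 31) * 255 / 31), 255 };
}

// Decode one 8-byte BC1 block: two RGB565 endpoint colors followed by
// sixteen 2-bit palette indices, one per texel of the 4x4 block.
// Illustrative reference only, not the game's shader.
void DecodeBC1Block(const uint8_t block[8], RGBA out[16]) {
    uint16_t c0   = uint16_t(block[0] | (block[1] << 8));
    uint16_t c1   = uint16_t(block[2] | (block[3] << 8));
    uint32_t bits = uint32_t(block[4]) | (uint32_t(block[5]) << 8) |
                    (uint32_t(block[6]) << 16) | (uint32_t(block[7]) << 24);

    RGBA p[4] = { Expand565(c0), Expand565(c1), {}, {} };
    if (c0 > c1) {  // four-color mode: two interpolated colors
        p[2] = { uint8_t((2 * p[0].r + p[1].r) / 3), uint8_t((2 * p[0].g + p[1].g) / 3),
                 uint8_t((2 * p[0].b + p[1].b) / 3), 255 };
        p[3] = { uint8_t((p[0].r + 2 * p[1].r) / 3), uint8_t((p[0].g + 2 * p[1].g) / 3),
                 uint8_t((p[0].b + 2 * p[1].b) / 3), 255 };
    } else {        // three-color mode: midpoint plus transparent black
        p[2] = { uint8_t((p[0].r + p[1].r) / 2), uint8_t((p[0].g + p[1].g) / 2),
                 uint8_t((p[0].b + p[1].b) / 2), 255 };
        p[3] = { 0, 0, 0, 0 };
    }
    for (int i = 0; i < 16; ++i)  // texel (x,y) = index 4*y+x, LSB first
        out[i] = p[(bits >> (2 * i)) & 3];
}
```

Because every block decodes independently, the work maps cleanly onto thousands of GPU threads, which is what makes this kind of decompression such a tidy fit for DirectCompute.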

In our look at Civ V’s performance as a game, we noted that it favors NVIDIA’s GPUs at the moment, and this may be part of the reason why. NVIDIA’s GPUs clean up here, particularly when compared to the 6800 series and its reduced shader count. Furthermore, within each GPU family the results are very straightforward, with the order following the relative compute power of each GPU. To be fair to AMD, the company made a conscious decision not to chase GPU computing performance with the 6800 series, but as a result it fares poorly here.

Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

We decided to go ahead and use MediaEspresso in this article not knowing what we’d find, and the results turned out to be both more and less than we were expecting. While our charts don’t show it, video transcoding isn’t all that GPU intensive with MediaEspresso; once we achieve a certain threshold of compute performance on a GPU – such as a GTX 460 in the case of an NVIDIA card – the rest of the process is CPU bottlenecked. As a result all of our NVIDIA Fermi cards at the GTX 460 level or better take just as long to encode our sample video, and while the AMD cards show some stratification, it’s on the order of only a couple of seconds. From this it’s clear that with Cyberlink’s technology having a GPU is going to help, but the GPU can’t completely take over what’s historically been a CPU-intensive activity.

As for an AMD/NVIDIA cross comparison, the results are straightforward but not particularly enlightening. It turns out that MediaEspresso 6 is significantly faster on NVIDIA GPUs than it is on AMD GPUs, but since we’ve already established that MediaEspresso 6 is CPU limited when using these powerful GPUs, this doesn’t say anything about the hardware. AMD and NVIDIA both provide common GPU video encoding frameworks for their products that Cyberlink taps into, and it’s here where we believe the difference lies.

In particular we see MediaEspresso 6 achieve 50% CPU utilization (the equivalent of 4 cores) when being used with an NVIDIA GPU, while it only achieves 13% CPU utilization (roughly 1 core) with an AMD GPU. From this it would appear that the CPU portions of NVIDIA’s GPU encoding framework are multithreaded while AMD’s framework is single-threaded. And since the performance bottleneck for video encoding still lies with the CPU, this would be why the NVIDIA GPUs do so much better than the AMD GPUs in this benchmark.
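For reference, utilization figures like these come from simple process-level CPU sampling. The sketch below shows one hedged way to do it on Windows with the standard Win32 GetProcessTimes call; the command-line PID and 5-second window are our own assumptions for illustration, and none of this is MediaEspresso code.

```cpp
#include <windows.h>
#include <cstdio>
#include <cstdlib>

// Sample a process's CPU utilization over a fixed window, expressed as
// a percentage of the whole machine (all logical processors), the same
// way Task Manager reports it.
static double CpuUtilizationPercent(HANDLE process, DWORD sampleMs) {
    auto toU64 = [](const FILETIME& ft) {
        ULARGE_INTEGER u;
        u.LowPart = ft.dwLowDateTime;
        u.HighPart = ft.dwHighDateTime;
        return u.QuadPart;  // 100ns units
    };

    FILETIME create, exitTime, kern0, user0, kern1, user1;
    GetProcessTimes(process, &create, &exitTime, &kern0, &user0);
    Sleep(sampleMs);
    GetProcessTimes(process, &create, &exitTime, &kern1, &user1);

    double cpu100ns = double((toU64(kern1) - toU64(kern0)) +
                             (toU64(user1) - toU64(user0)));
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    double wall100ns = double(sampleMs) * 10000.0;  // ms -> 100ns units
    return 100.0 * cpu100ns / (wall100ns * si.dwNumberOfProcessors);
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;  // usage: cpusample <pid>
    HANDLE proc = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE,
                              DWORD(atol(argv[1])));
    if (!proc) return 1;
    printf("CPU utilization: %.1f%%\n", CpuUtilizationPercent(proc, 5000));
    CloseHandle(proc);
    return 0;
}
```

On an 8-thread CPU, a process pegged at 50% by this measure is burning the equivalent of four cores, while 13% works out to roughly one.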

Our final GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing it to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.
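Part of what makes OpenCL attractive for a project like this is that a single binary can discover and target whatever GPUs are installed, regardless of vendor. The snippet below is a minimal sketch of that discovery step – our own illustration of the standard OpenCL host calls, not SmallLuxGPU’s actual code:

```cpp
#include <CL/cl.h>
#include <cstdio>

// Enumerate every OpenCL platform (AMD, NVIDIA, ...) and each GPU
// device it exposes. A renderer like SmallLuxGPU would then build a
// context and compile its ray tracing kernels for these devices.
int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, nullptr, &numPlatforms);
    if (numPlatforms > 8) numPlatforms = 8;
    cl_platform_id platforms[8];
    clGetPlatformIDs(numPlatforms, platforms, nullptr);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        char name[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        printf("Platform: %s\n", name);

        cl_uint numDevices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, nullptr,
                           &numDevices) != CL_SUCCESS || numDevices == 0)
            continue;  // no GPU devices on this platform (e.g. a CPU-only runtime)
        if (numDevices > 8) numDevices = 8;
        cl_device_id devices[8];
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, numDevices, devices, nullptr);

        for (cl_uint d = 0; d < numDevices; ++d) {
            char dev[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dev), dev, nullptr);
            printf("  GPU: %s\n", dev);
        }
    }
    return 0;
}
```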

Compared to our other two GPU computing benchmarks, SmallLuxGPU follows the theoretical performance of our GPUs much more closely. As a result our Radeon GPUs, with their difficult-to-utilize VLIW5 design, end up topping the charts by a significant margin, while the fastest comparable NVIDIA GPU is still 10% slower than the 6850. Ultimately what we’re looking at amounts to a best-case scenario for these GPUs, and it’s as good an example as any that in the right circumstances AMD’s VLIW5 shader design can go toe-to-toe with NVIDIA’s compute-focused design and win.

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6800 series, AMD enhanced their tessellation unit to offer better tessellation performance at lower tessellation factors. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

Since Heaven is a synthetic benchmark at the moment (its DX11 engine isn’t currently used in any games), we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. Compared to the 5870 the 6870 ends up being slightly slower when using moderate amounts of tessellation, while it pulls ahead when using extreme amounts of tessellation. Considering that the 6870 is around 7% slower in games than the 5870, this is actually quite an accomplishment for Barts, and one that we can easily trace back to AMD’s tessellator improvements.

Our second tessellation test is Microsoft’s DirectX 11 Detail Tessellation sample program, which is a much more straightforward test of tessellation performance. Here we’re simply looking at the framerate of the program at different tessellation levels, specifically level 7 (the default level) and level 11 (the maximum level). In this test AMD’s tessellation improvements become even more apparent, with the 6870 handily beating the 5870. In fact our results are very close to AMD’s own internal results – at level 7 the 6870 is 43% faster than the 5870, while at level 11 that improvement drops to 29% as the increased level leads to an increasingly large tessellation factor. However this also highlights the fact that AMD’s tessellation performance still collapses at high factors compared to NVIDIA’s GPUs, making it all the more important for AMD to encourage developers to use more reasonable tessellation factors.
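To put rough numbers on why high factors hurt: with integer partitioning, a triangle patch tessellated at factor F yields on the order of F² triangles, so the triangle load grows quadratically with the factor. The quick sketch below works that out for a few values; note that F² is an approximation rather than the DirectX 11 tessellator’s exact output count, and the sample program’s levels are not raw factors.

```cpp
#include <cstdio>

// Approximate per-patch triangle amplification at a few tessellation
// factors, up to DirectX 11's maximum factor of 64.
int main() {
    const int factors[] = { 1, 7, 11, 64 };
    for (int f : factors)
        printf("factor %2d -> ~%4d triangles per patch\n", f, f * f);
    return 0;
}
```

Going from factor 7 to the maximum of 64 is roughly an 84x jump in triangles per patch, which is exactly the regime where AMD’s tessellation performance falls apart.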

Comments

  • Quidam67 - Friday, October 29, 2010

    Well that's odd.

    After reading about the EVGA FTW, and its mind-boggling factory overclock, I went looking to see if I could pick one of these up in New Zealand.

    Seems you can, or maybe not. As per this example http://www.trademe.co.nz/Browse/Listing.aspx?id=32... the clocks are 763MHz and 3.8 on the memory?!?

    What gives? How can EVGA give the same name to a card and then have different specifications on it? Good thing I checked the fine print, or else I would have been bummed out if I'd bought it and then realised it wasn't clocked like I thought it would be...
  • Murolith - Friday, October 29, 2010

    So... how about that update to the review checking out the quality/speed of MLAA?
  • CptChris - Sunday, October 31, 2010

    As the cards were compared to the OC'd nVidia card, I would be interested in seeing how the 6800 series also compares to a card like the Sapphire HD5850 2GB Toxic Edition. I know it is literally twice the price of the HD6850, but would it be enough of a performance margin to be worth the price difference?
  • gochichi - Thursday, November 4, 2010

    You know, maybe I hang in the wrong circles, but I by far keep up to date on GPUs more than anyone I know. Not only that, but I am eager to update my stuff if it's reasonable. I want it to be reasonable so badly because I simply love computer hardware (more than games per se, or as much as the games... it's about hardware for me in and of itself).

    Not getting to my point fast enough. I purchased a Radeon 3870 at Best Buy (Best Buy had an oddly good deal on these at the time, Best Buy doesn't tend to keep competitive prices on video cards at all for some reason). 10 days later (so I returned my 3870 at the store) I purchased a 4850, and wow, what a difference it made. The thing of it is, the 3870 played COD 4 like a champ, the 4850 was ridiculously better but I was already satisfied.

    In any case, the naming... the 3870 was no more than $200.00; I think it was $150.00. And it played COD4 on a 24" 1900x1200 monitor with a few settings not maxed out, and played it so well. The 4850 allowed me to max out my settings. Crysis sucked, Crysis still sucks, and Crysis is still a playable benchmark. Not to say I don't look at it as a benchmark. The 4850 on the week of its release was $199.99 at Best Buy.

    Then gosh oh golly there was the 4870 and the 4890, which simply took up too much power... I am simply unwilling to buy a card that uses more than one extra 6-pin connector just so I can go out of my way to find something that runs better. So far, my 4850 has left me wanting more in GTA IV, (notice again how it comes down to hardware having to overcome bad programming, the 4850 is fast enough for 1080p but it's not a very well ported game so I have to defer to better hardware). You can stop counting the ways my 4850 has left me wanting more at 1900 x 1200. I suppose maxing out Starcraft II would be nice also.

    Well, then came out the 5850, finally a card that would eclipse my 4850... but oh wait, though the moniker was the same (3850 = so awesome, so affordable, the 4850 = so awesome, so affordable, the 5850 = two 6-pin connectors, so expensive, so high end) it was completely out of line with what I had come to expect. The 4850 stood without a successor. Remember here that I was going from 3870 to 4850, same price range, way better performance. Then came the 5770, and it was marginally faster but just not enough change to merit a frivolous upgrade.

    Now, my "need" to upgrade is as frivolous as ever, but finally, a return to sanity with the *850 moniker standing for fast, and midrange. I am a *850 kind of guy through and through, I don't want crazy power consumption, I don't want to be able to buy a whole, really good computer for the price of just a video card.

    So, anyhow, that's my long story basically... that the strange and utterly upsetting name was the 5850; the 6850 is actually right in line with what the naming should have always stayed as. I wouldn't know why the heck AMD tossed a curve ball at me with the 5850, but I will tell you that it's been a really long time coming to get a true successor in the $200 and under range.

    You know, around the time of the 9800GT and the 4850, you actually heard people talk about buying video cards while out with friends. The games don't demand much more than that... so $500 cards that double their performance is just silly silly stuff and people would rather buy an awesome phone, an iPad, etc. etc. etc.

    So anyhow, enough of my rambling, I reckon I'll be silly and get the true successor to my 4850... though I am assured that my Q6600 isn't up to par for Starcraft II... oh well.
  • rag2214 - Sunday, November 7, 2010

    The 6800 series may not beat the 5870 yet, but it is the start of HDMI 1.4 for 3D HD, which isn't available in any other ATI graphics cards.
  • Philip46 - Monday, November 15, 2010

    The review questioned why there was a reason to buy a 460 (not OC'ed).

    How about benchmarks of games using Physx?

    For instance, Mafia 2 hits 32fps @ 1080p (i7-930 CPU) when using PhysX on high, while the 5870 manages only 16.5fps; I tested both cards.

    How about a GTA:IV benchmark? The Zotac 2GB GTX 460 runs the game more smoothly (the same avg fps, except the min fps on the 5850 are lower in the daytime) than the 5850 (2GB).

    How about even a Far Cry 2 benchmark?

    Come on, Anandtech! Let's get some real benchmarks that cover all aspects of gaming features.

    How about adding in driver stability? Etc.

    And before anyone calls me biased: I had both the Zotac GTX 460 and Sapphire 5850 2GB a couple weeks back, and overall I went with the Zotac 460. I play Crysis/Stalker/GTA IV/Mafia 2/Far Cry 2/etc. @ 1080p, and the 460 just played them all more stably... even if Crysis/Stalker were some 10% faster on the 5850.

    BTW: Bad move by Anandtech to include the 460 FTW!
  • animekenji - Saturday, December 25, 2010

    Barts is the replacement for Juniper, NOT Cypress. Cayman is the replacement for Cypress. If you're going to do a comparison to the previous generation, then at least compare it to the right card. HD6850 replaces HD5750. HD6870 replaces HD5770. HD6970 replaces HD5870. You're giving people the false impression that AMD knocked performance down with the new cards instead of up when HD6800 vastly outperforms HD5700 and HD6900 vastly outperforms HD5800. Stop drinking the green kool-aid, Anandtech.
