For a while now we’ve been trying to establish a proper cross-platform compute benchmark suite to add to our GPU articles. It’s not been entirely successful.

While GPUs have been compute-capable in some form since 2006 with the launch of G80, and AMD significantly improved their compute capabilities in 2009 with Cypress, the software has been slow to catch on. From gatherings such as NVIDIA’s GTC we’ve seen first-hand how GPU computing is being used in the high-performance computing market, but the consumer side hasn’t materialized as quickly: the situations where GPU computing makes sense aren’t as clear-cut, and many developers are unwilling to tie themselves to a single platform in the process.

The ratification of OpenCL 1.0 at the end of 2008 and the launch of DirectCompute in 2009 removed some of the roadblocks, but we heard as recently as last month from Adobe and others that there’s still work to be done before companies can confidently deploy GPU compute-accelerated software. The immaturity of OpenCL drivers was cited as one cause; another is that a lot of computers simply don’t have a suitable compute-capable GPU. It’s Intel that’s the world’s biggest GPU vendor, after all.

So here in the fall of 2010 our search for a wide variety of GPU compute applications hasn’t panned out quite like we expected it to. Widespread adoption of GPU computing in consumer applications is still around the corner, so for the time being we have to get creative.

With that in mind we’ve gone ahead and cooked up a new GPU compute benchmark suite based on the software available to us. On the consumer side we have the latest version of Cyberlink’s MediaEspresso video encoding suite and an interesting sub-benchmark from Civilization V. On the professional side we have SmallLuxGPU, an OpenCL-based ray tracer. We don’t expect this to be the be-all and end-all of GPU computing benchmarks, but it gives us a place to start and allows us to cover both the cross-platform APIs and NVIDIA’s and AMD’s platform-specific APIs.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.

In our look at Civ V’s performance as a game, we noted that it currently favors NVIDIA’s GPUs, and this may be part of the reason why. NVIDIA’s GPUs clean up here, particularly against the 6800 series and its reduced shader count. Furthermore, within each GPU family the results are very straightforward, with the order following the relative compute power of each GPU. To be fair to AMD, they made a conscious decision not to chase GPU computing performance with the 6800 series, but as a result it fares poorly here.

Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

We decided to go ahead and use MediaEspresso in this article not knowing what we’d find, and the results turned out to be both more and less than we were expecting. While our charts don’t show it, video transcoding with MediaEspresso isn’t all that GPU-intensive; once a GPU reaches a certain threshold of compute performance (a GTX 460, in the case of NVIDIA cards), the rest of the process is CPU-bottlenecked. As a result, all of our Fermi-based NVIDIA cards from the GTX 460 up take just as long to encode our sample video, and while the AMD cards show some stratification, the spread is only a couple of seconds. From this it’s clear that with Cyberlink’s technology having a GPU is going to help, but it can’t completely offload what’s historically been a CPU-intensive activity.
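The plateau is what you’d expect when the encode runs as a pipeline whose slowest stage sits on the CPU. Below is a minimal sketch, assuming a hypothetical two-stage pipeline in which a GPU stage and a CPU stage work on consecutive frames at the same time; the function name and per-frame costs are made up for illustration and are not MediaEspresso measurements.

```python
# Minimal model of a two-stage transcode pipeline (hypothetical, for
# illustration only): one stage runs on the GPU, the other on the CPU, and
# both work on different frames at the same time.

def pipeline_seconds(frames: int, gpu_ms: float, cpu_ms: float) -> float:
    """Total time in seconds for `frames` frames through a two-stage pipeline.

    The first frame pays for both stages; every later frame completes one
    bottleneck-stage interval after the previous one, so the total is
    gpu_ms + cpu_ms + (frames - 1) * max(gpu_ms, cpu_ms).
    """
    bottleneck = max(gpu_ms, cpu_ms)
    total_ms = gpu_ms + cpu_ms + (frames - 1) * bottleneck
    return total_ms / 1000.0


if __name__ == "__main__":
    CPU_MS = 20.0  # hypothetical CPU cost per frame
    for gpu_ms in (40.0, 20.0, 10.0, 5.0):  # progressively faster GPUs
        t = pipeline_seconds(1000, gpu_ms, CPU_MS)
        print(f"GPU stage {gpu_ms:>4.0f} ms/frame -> {t:.2f} s")
```

Once the GPU stage is faster than the CPU stage, halving its cost only shaves a few milliseconds off the total, which is the same flat shape we saw from the GTX 460 upward.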

As for an AMD/NVIDIA cross-comparison, the results are straightforward but not particularly enlightening. It turns out that MediaEspresso 6 is significantly faster on NVIDIA GPUs than it is on AMD GPUs, but since we’ve already established that MediaEspresso 6 is CPU-limited when using these powerful GPUs, that doesn’t say anything about the hardware. AMD and NVIDIA each provide a common GPU video encoding framework for their products that Cyberlink taps into, and it’s here where we believe the difference lies.

In particular we see MediaEspresso 6 achieve 50% CPU utilization (4 cores) when used with an NVIDIA GPU, while it only achieves 13% CPU utilization (1 core) with an AMD GPU. This suggests that the CPU portions of NVIDIA’s GPU encoding framework are multithreaded while AMD’s framework is single-threaded. And since the performance bottleneck for video encoding still lies with the CPU, this would explain why the NVIDIA GPUs do so much better than the AMD GPUs in this benchmark.
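The utilization figures are easier to read as “logical CPUs kept busy”. A minimal sketch, assuming an 8-thread CPU (a quad-core with Hyper-Threading, which is what 50% ≈ 4 cores and 13% ≈ 1 core implies); the Amdahl’s-law helper uses a hypothetical parallel fraction, since we don’t know how much of either vendor’s CPU-side work can actually be spread across threads.

```python
# Sketch: convert overall CPU utilization into busy logical CPUs, then
# estimate how much threading a CPU-bound stage could help (Amdahl's law).

LOGICAL_CPUS = 8  # assumption: quad-core CPU with Hyper-Threading (8 threads)


def busy_threads(utilization_pct: float, logical_cpus: int = LOGICAL_CPUS) -> float:
    """Average number of fully busy logical CPUs for a given total utilization."""
    return utilization_pct / 100.0 * logical_cpus


def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup when parallel_fraction of the work scales across threads."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    print(f"NVIDIA path: {busy_threads(50):.2f} threads busy")  # 4.00
    print(f"AMD path:    {busy_threads(13):.2f} threads busy")  # 1.04, i.e. one thread
    # Hypothetical: if 90% of the CPU-side work parallelizes, 4 threads give
    # roughly a 3x speedup over a single thread.
    print(f"Speedup with 4 threads at p=0.9: {amdahl_speedup(0.9, 4):.2f}x")
```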

Our final GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open-source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing it to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.

Compared to our other two GPU computing benchmarks, SmallLuxGPU follows the theoretical performance of our GPUs much more closely. As a result our Radeon GPUs, despite their difficult-to-utilize VLIW5 design, end up topping the charts by a significant margin, while the fastest comparable NVIDIA GPU is still 10% slower than the 6850. Ultimately, what we’re looking at amounts to a best-case scenario for these GPUs, and it’s as good an example as any that in the right circumstances AMD’s VLIW5 shader design can go toe-to-toe with NVIDIA’s compute-focused design and still win.
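“Theoretical performance” here means peak single-precision throughput, conventionally computed for both vendors as shader count × shader clock × 2 (one multiply-add per shader per clock). A minimal sketch, using reference specifications as we understand them (treat the table as an assumption and check it against the vendors’ spec sheets). The VLIW5 helper is a hypothetical illustration of why AMD’s peak is hard to reach: AMD counts each of the five lanes in a VLIW unit as a stream processor, so the compiler has to fill all five slots to hit the headline number.

```python
# Peak single-precision throughput, counting a multiply-add as 2 FLOPs.
# Specs are reference clocks as we understand them (an assumption; verify
# against vendor spec sheets). NVIDIA figures use the shader clock.

SPECS = {
    # name: (shader / stream processor count, shader clock in MHz)
    "Radeon HD 6870": (1120, 900),
    "Radeon HD 6850": (960, 775),
    "GeForce GTX 460 1GB": (336, 1350),
    "GeForce GTX 470": (448, 1215),
    "GeForce GTX 480": (480, 1401),
}


def peak_gflops(shaders: int, clock_mhz: float) -> float:
    """Peak GFLOPS = shaders * clock * 2 FLOPs (one multiply-add) per clock."""
    return shaders * clock_mhz * 2 / 1000.0


def vliw5_effective_gflops(peak: float, avg_slots_filled: float) -> float:
    """Hypothetical: scale an AMD VLIW5 peak by how many of the 5 slots are filled."""
    if not 0.0 <= avg_slots_filled <= 5.0:
        raise ValueError("a VLIW5 bundle has between 0 and 5 filled slots")
    return peak * avg_slots_filled / 5.0


if __name__ == "__main__":
    for name, (shaders, mhz) in SPECS.items():
        print(f"{name:<20} {peak_gflops(shaders, mhz):7.1f} GFLOPS peak")
    # Made-up utilization figure, purely to show the scaling:
    hd6850 = peak_gflops(*SPECS["Radeon HD 6850"])
    print(f"HD 6850 at 3.5/5 slots: {vliw5_effective_gflops(hd6850, 3.5):.1f} GFLOPS")
```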

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6800 series, AMD enhanced their tessellation unit to offer better tessellation performance at lower tessellation factors. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

Since Heaven is currently a synthetic benchmark (its DX11 engine isn’t used in any games yet), we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. Compared to the 5870, the 6870 ends up being slightly slower when using moderate amounts of tessellation, while it pulls ahead when using extreme amounts of tessellation. Considering that the 6870 is around 7% slower than the 5870 in games, this is actually quite an accomplishment for Barts, and one that we can easily trace back to AMD’s tessellator improvements.

Our second tessellation test is Microsoft’s DirectX 11 Detail Tessellation sample program, which is a much more straightforward test of tessellation performance. We’re simply looking at the framerate of the program at different tessellation levels, specifically level 7 (the default) and level 11 (the maximum). Here AMD’s tessellation improvements become even more apparent, with the 6870 handily beating the 5870. In fact our results are very close to AMD’s own internal results: at level 7 the 6870 is 43% faster than the 5870, while at level 11 that advantage drops to 29% as the higher level leads to an increasingly large tessellation factor. However, this also highlights the fact that AMD’s tessellation performance still collapses at high factors compared to NVIDIA’s GPUs, making it all the more important for AMD to encourage developers to use more reasonable tessellation factors.
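To see why high factors hurt, it helps to count what the tessellator emits. A minimal sketch, assuming a quad-domain patch with integer partitioning and the same factor N on every edge and inside; under those assumptions the output has as many triangles as an N × N grid of quads, 2N², so doubling the factor quadruples the triangle count. The function name is hypothetical, and we don’t assume any particular mapping between the sample program’s levels 7 and 11 and N.

```python
# Triangle count for one quad patch, assuming integer partitioning with the
# same tessellation factor N on all edges and inside. Under that assumption
# the output matches an N x N grid of quads, i.e. 2 * N * N triangles.

MAX_TESS_FACTOR = 64  # Direct3D 11's maximum tessellation factor


def quad_patch_triangles(factor: int) -> int:
    """Triangles generated for one quad patch at a uniform integer factor."""
    if not 1 <= factor <= MAX_TESS_FACTOR:
        raise ValueError(f"factor must be in 1..{MAX_TESS_FACTOR}")
    return 2 * factor * factor


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32, 64):
        print(f"factor {n:>2}: {quad_patch_triangles(n):>5} triangles per patch")
```

Since the work per patch grows with the square of the factor, any fixed per-triangle bottleneck in the tessellation pipeline becomes far more costly at high factors.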


197 Comments


  • Setsunayaki - Friday, October 22, 2010

    There was a graph where a 4XXX series card beat the 6XXX series card... There were many where the 5XXX series was higher... Tessellation performance is higher on the 460 GTX and SLI scales better than CrossFire...

    What the tessellation performance graph really means is that if you were to take a 460 GTX and a 6870, turn off tessellation and play a game... the 6870 gets a higher framerate, but if you turn on tessellation on both cards and go full force with tessellation and other features (considering that Nvidia has support for PhysX and most games now have some physics implementation)... the outcome shows the 6870 taking such a performance hit that, as far as framerates go, a 460 actually matches it or beats it outright.

    What ATI/AMD really needs to work on is integrating more technologies on its card to actually have more options during a game. No physics processing, just an optimization on AA and AF... and tessellation performance that doesn't come close to a 460, along with horrible Linux support... I really wonder and hope that their flagship card shows something stellar...

    Not to argue against it, but for the deserving ATI/AMD fans who have stuck with them over the years. ^_^
  • Alilsneaky - Friday, October 22, 2010

    Prices are high for both in my country (Belgium).

    199 euro for the 6850 and 279 euro (in the cheaper shops, up to 350 in others) for the 6870.

    Very bland release for us, nothing to get excited about at that price point.

    I also take offense to the naming scheme; why pick a name that will inevitably deceive many people into buying a sidegrade?
  • Pastuch - Friday, October 22, 2010

    There was not nearly enough discussion on DTS HD MA and TrueHD pass through in this article. Gaming is 50% of the reason to upgrade, the rest of my focus is HTPC use. Please compare the GTX 460 vs the 6870 regarding bit-streaming, video quality and hardware decoding.

    Thanks.

    P.S. Nvidia usually does a pathetic job on anything not related to gaming.
  • Scootiep7 - Friday, October 22, 2010

    I think you guys are a little off on calling the 6870 the $200 price point King. The cheapest retail for the card right now is $239.99 for any model, and then you have to add in another $5~10 for shipping. That sticks it at $245 - $250. That's nowhere near the $200 price point. And with most GTX 460 1GBs sitting at about $170 - $190 (w/ shipping), this card is not competing with them on price at all. Maybe in a few months if prices drop, but not now. It's more in the GTX 470 range, and that is much tougher competition. I'm sorry, but the 6870 is NOT the $200 price point King. It's not even close.
  • Lolimaster - Sunday, October 24, 2010

    HD6850 offers better performance than the 460 1GB
    HD6850 costs $175

    HD6870 kills both of them, and also beats the 470 on performance/power consumption (80W less)
  • Scootiep7 - Sunday, October 24, 2010

    Ok, I'm sorry, but I have to laugh at this. Where the hell are you finding a 6850 for $175? The cheapest ANYWHERE is $199, and you still have to factor in $8ish shipping. Re-read my post and realize that the prices I quoted are accurate and you're still looking at a $30 price difference between the 6850 and the 460 1GB. Yes, the performance is better, but it's not amazingly better, and I don't think it justifies it. Hey, I'm all for the red team this time around. I picked up a 5770, which is an amazing bang-for-the-buck card. I'm just saying that calling the 6870 or the 6850 the new $200 price point king is wrong. Too many variables.
  • orthancstone - Friday, October 22, 2010

    I'm especially pleased to see the 4870 included in some benchmarks. As someone who owns one and who was never impressed with the performance boost/cost ratio of the 58/59xx lines, I've been wondering how the 6xxx line would compare to the two generation old stuff. I'd love to see it included in the third party 6xxx reviews.
  • Edison5do - Friday, October 22, 2010

    As an owner of an HD 4850, I was planning to get an HD 5770, but at this point the HD 6850 looks like a better option for a few more bucks... or I could wait to see if the HD 5770 drops in price a little more...
  • Sando_UK - Friday, October 22, 2010

    Anandtech is one of my favourite review sites and it's a real shame to see what's happened here. I don't know the reasons why you guys needed to include the 460 OC in this review (it does sound like a fine card btw, but this wasn't the place for it); I can't see any reason it wouldn't have been much better compared in a separate article. The fact that Tom's Hardware did a very similar thing makes the whole thing fishy...

    New generations/architectures don't come along very often and deserve proper comparison and coverage. I'm not an AMD or Nvidia fanboi (happy to go with whichever is best price/performance/extras at the time), but we rely on you guys to give us the facts on a level playing field. I'm sure you have in this case, but even the suggestion of impropriety damages your (extremely good) reputation, and I think it's something you should really try to avoid in the future, be it AMD or Nvidia reviews.

    Otherwise, thanks for all your hard work.
  • Natfly - Friday, October 22, 2010

    It's sad to say, but this review fucking sucks. UVD and the display controller have been overhauled but you make no mention of any of the changes. Are there still only 2 RAMDAC clocks? Or can you now use passive DP converters while using both DVI ports?

    And including an OC'd card because nVidia pushed you into it? Way to take a shot to your credibility. And no mention of its clocks or price... AND no overclocking numbers for these new cards when you are specifically comparing it to an OC'd card? I mean wtf, this review is not up to previous Anandtech standards.
