For a while now we’ve been trying to establish a proper cross-platform compute benchmark suite to add to our GPU articles. It’s not been entirely successful.

GPUs have been compute capable in some form since 2006 with the launch of G80, and AMD significantly improved their compute capabilities in 2009 with Cypress, but the software has been slow to catch on. From gatherings such as NVIDIA’s GTC we’ve seen first-hand how GPU computing is being used in the high-performance computing market. The consumer side hasn’t materialized as quickly: the right situations for using GPU computing aren’t as straightforward, and many developers are unwilling to attach themselves to a single platform in the process.

2009 saw the ratification of OpenCL 1.0 and the launch of DirectCompute, and while the launch of these cross-platform APIs removed some of the roadblocks, we heard as recently as last month from Adobe and others that there’s still work to be done before companies can confidently deploy GPU compute accelerated software. The immaturity of OpenCL drivers was cited as one cause. Another is that a lot of computers simply don’t have a suitable compute-capable GPU; it’s Intel that’s the world’s biggest GPU vendor after all.

So here in the fall of 2010 our search for a wide variety of GPU compute applications hasn’t panned out quite like we expected it to. Widespread adoption of GPU computing in consumer applications is still around the corner, so for the time being we have to get creative.

With that in mind we’ve gone ahead and cooked up a new GPU compute benchmark suite based on the software available to us. On the consumer side we have the latest version of Cyberlink’s MediaEspresso video encoding suite and an interesting sub-benchmark from Civilization V. On the professional side we have SmallLuxGPU, an OpenCL based ray tracer. We don’t expect this to be the be-all and end-all of GPU computing benchmarks, but it gives us a place to start and allows us to cover both the cross-platform APIs and NVIDIA’s and AMD’s platform-specific APIs.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.

In our look at Civ V’s performance as a game, we noted that it favors NVIDIA’s GPUs at the moment, and this may be part of the reason why. NVIDIA’s GPUs clean up here, particularly when compared to the 6800 series and its reduced shader count. Furthermore, within the GPU families the results are very straightforward, with the order following the relative compute power of each GPU. To be fair to AMD, they made a conscious decision to not chase GPU computing performance with the 6800 series, but as a result it fares poorly here.

Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

We decided to go ahead and use MediaEspresso in this article not knowing what we’d find, and it turns out the results were both more and less than we were expecting at the same time. While our charts don’t show it, video transcoding isn’t all that GPU intensive with MediaEspresso; once we achieve a certain threshold of compute performance on a GPU – such as a GTX 460 in the case of an NVIDIA card – the rest of the process is CPU bottlenecked. As a result all of our Fermi NVIDIA cards at the GTX 460 or better take just as long to encode our sample video, and while the AMD cards show some stratification, it’s on the order of only a couple of seconds. From this it’s clear that with Cyberlink’s technology having a GPU is going to help, but it can’t completely offload what’s historically been a CPU-intensive activity.
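
The plateau described above can be illustrated with a toy model. This is only a sketch: it assumes the CPU and GPU stages of the encode overlap, so that total time is set by the slower of the two, and every number in it is a made-up placeholder and not one of our measurements. The function name `encode_seconds` and its parameters are hypothetical.

```python
def encode_seconds(cpu_seconds, gpu_work_units, gpu_units_per_second):
    """Toy model: CPU and GPU stages overlap, so the slower stage sets the total time."""
    gpu_seconds = gpu_work_units / gpu_units_per_second
    return max(cpu_seconds, gpu_seconds)


# Placeholder values chosen only to show the shape of the curve (assumptions).
CPU_SECONDS = 30.0      # fixed CPU-side cost of the encode
GPU_WORK_UNITS = 60.0   # abstract amount of GPU work in the encode

# Sweep GPU throughput from a slow card to one eight times faster.
times = {
    rate: encode_seconds(CPU_SECONDS, GPU_WORK_UNITS, rate)
    for rate in (1.0, 2.0, 4.0, 8.0)
}

for rate, seconds in times.items():
    print(f"GPU throughput {rate:>3.0f}x -> {seconds:.0f} s")
```

In this model the encode time stops improving once the GPU stage is no longer the slower one, which is the same shape as the GTX 460 threshold. A real encoder will not overlap its stages this cleanly, so treat it as an illustration of the bottleneck and not a prediction.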

As for an AMD/NVIDIA cross comparison, the results are straightforward but not particularly enlightening. It turns out that MediaEspresso 6 is significantly faster on NVIDIA GPUs than it is on AMD GPUs, but since we’ve already established that MediaEspresso 6 is CPU limited when using these powerful GPUs, it doesn’t say anything about the hardware. AMD and NVIDIA both provide common GPU video encoding frameworks for their products that Cyberlink taps into, and it’s here where we believe the difference lies.

In particular we see MediaEspresso 6 achieve 50% CPU utilization (4 cores) when being used with an NVIDIA GPU, while it only achieves 13% CPU utilization (1 core) with an AMD GPU. At this point it would appear that the CPU portions of NVIDIA’s GPU encoding framework are multithreaded while AMD’s framework is single-threaded. And since the performance bottleneck for video encoding still lies with the CPU, this would be why the NVIDIA GPUs do so much better than the AMD GPUs in this benchmark.
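
The mapping from utilization percentages to core counts is simple arithmetic, sketched below. It assumes the test system exposes 8 logical threads (a quad-core CPU with Hyper-Threading), which is an assumption stated here for illustration and not something spelled out in this section. The helper name `busy_threads` is hypothetical.

```python
def busy_threads(utilization_pct, logical_threads):
    """Convert overall CPU utilization into the equivalent number of fully busy threads."""
    return utilization_pct / 100 * logical_threads


LOGICAL_THREADS = 8  # assumption: 4 cores x 2 threads per core

nvidia_busy = busy_threads(50, LOGICAL_THREADS)  # utilization seen with an NVIDIA GPU
amd_busy = busy_threads(13, LOGICAL_THREADS)     # utilization seen with an AMD GPU

print(f"NVIDIA path: about {nvidia_busy:.1f} threads busy")
print(f"AMD path:    about {amd_busy:.1f} threads busy")
```

Under that assumption, 50% works out to four busy threads and 13% to roughly one, which is what you would expect from a multithreaded and a single-threaded CPU path respectively.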

Our final GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.

Compared to our other two GPU computing benchmarks, SmallLuxGPU follows the theoretical performance of our GPUs much more closely. As a result our Radeon GPUs with their difficult-to-utilize VLIW5 design end up topping the charts by a significant margin, while the fastest comparable NVIDIA GPU is still 10% slower than the 6850. Ultimately what we’re looking at is what amounts to the best-case scenarios for these GPUs, with this being as good an example as any that in the right circumstances AMD’s VLIW5 shader design can go toe-to-toe with NVIDIA’s compute-focused design and still win.

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6800 series, AMD enhanced their tessellation unit to offer better tessellation performance at lower tessellation factors. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

Since Heaven is a synthetic benchmark at the moment (the DX11 engine isn’t currently used in any games) we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. Compared to the 5870, the 6870 ends up being slightly slower when using moderate amounts of tessellation, while it pulls ahead when using extreme amounts of tessellation. Considering that the 6870 is around 7% slower in games than the 5870, this is actually quite an accomplishment for Barts, and one that we can easily trace back to AMD’s tessellator improvements.

Our second tessellation test is Microsoft’s DirectX 11 Detail Tessellation sample program, which is a much more straightforward test of tessellation performance. Here we’re simply looking at the framerate of the program at different tessellation levels, specifically level 7 (the default level) and level 11 (the maximum level). Here AMD’s tessellation improvements become even more apparent, with the 6870 handily beating the 5870. In fact our results are very close to AMD’s own internal results – at level 7 the 6870 is 43% faster than the 5870, while at level 11 that improvement drops to 29% as the increased level leads to an increasingly large tessellation factor. However this also highlights the fact that AMD’s tessellation performance still collapses at high factors compared to NVIDIA’s GPUs, making it all the more important for AMD to encourage developers to use more reasonable tessellation factors.
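
A note on reading those percentages: "43% faster" for the 6870 is not the same as "43% slower" for the 5870. The short sketch below converts one into the other using only the two figures quoted above; the helper name `percent_slower` is hypothetical.

```python
def percent_slower(percent_faster):
    """If card A is `percent_faster` % faster than card B, return how much slower B is than A, in %."""
    ratio = 1 + percent_faster / 100   # A's framerate relative to B's
    return (1 - 1 / ratio) * 100


level_7 = percent_slower(43)    # 6870 lead at tessellation level 7
level_11 = percent_slower(29)   # 6870 lead at tessellation level 11

print(f"Level 7:  the 5870 is about {level_7:.1f}% slower than the 6870")
print(f"Level 11: the 5870 is about {level_11:.1f}% slower than the 6870")
```

Viewed from the 5870’s side the gaps are closer to 30% and 22%, so which card is used as the baseline matters when comparing these results against other charts.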

197 Comments

  • campbbri - Friday, October 22, 2010 - link

    Thanks for the great review. I don't know why everyone is complaining about mixing OC and Non-OC cards when you were extremely explicit in pointing it out.
  • krumme - Friday, October 22, 2010 - link

    I dont think you dont know why everyone is complaining.

    First. To be fair its far from everyone :), unfortunately because Anand is surrounded by far to many yes sayers. All positve. Great in many ways. But it does not develop the site as it could. There is a great huge community, and there is plenty of ressources to get ideas to new methology.

    Its good - if not vital - that Kyle is explicit about it. Otherwise it wouldnt be worth critizicing, then it would just look like a payed job, and nobody would care. Its not. But beeing explicit is not enough even if its most important and a huge quality. You need to have a good case. And Anand does have a very bad case.

    Read what Kyle wrote againg. Do you think this is his best and most sound decicion in his life? do he feel comfortable about it?

    He did betray himself a little bit. And he shouldnt do it. He should lissen to his own doubt.
  • snarfbot - Friday, October 22, 2010 - link

    yes i understand that, but i cant see how you can call a direct replacement that fails to outperform its predecessor as a success.

    especially when you consider that the prices have increased after launch as opposed to decrease as is normal. and have remained artificially high since, due to limitations at tsmc, which renders the cost argument pretty much moot.

    how about an analogy.

    6870 is to 5870 as 4770 is to 4870.

    and its on the same process which makes it even worse, although you cant really blame amd for that.

    you can very much blame their marketing department for making such a terrible decision though.

    its a terrible name, thats the whole point, at whatever price you cant call it a 6870 if it cant beat a 5870.
  • Trefugl - Friday, October 22, 2010 - link

    yes i understand that, but i cant see how you can call a direct replacement that fails to outperform its predecessor as a success.


    But the issue is that the 68xx series alone aren't really replacing the 58xx series. I think they are really splitting what the direct replacement to that market would have been into two - the 69xx (high-end enthusiast) and the 68xx (high-end mid-range).

    I agree that the naming scheme isn't the best, but I think a lot of that could have been mitigated (and maybe even made a non-issue) if the 68xx's weren't the first to launch. If the 69xx came out first people would have accepted them and been happy, but instead we have b*tching because of naming confusion...
  • Targon - Sunday, October 24, 2010 - link

    I missed this too until someone pointed out what I missed. The Radeon 6900 series will replace the 5800 series at the high end, and IS the proper high end part you are looking for.

    Back when DirectX 9 first came out, ATI only had DirectX 9 support in the old Radeon 9500 and 9700. When the X300, X600, and X800 came out, notice that AMD took the cards and started at 600 and 800, rather than 500 and 700 for the mid ranged and high end cards. This has continued a bit. In the HD 2000 series, you even had the HD 2900XT on the high end of the series, but then they went to the 3800, 4800, and 5800 series to mark the high end cards.

    So, AMD/ATI has been tweaking the names a fair bit. What initially threw me off is that the next generation high end cards are not the first cards to show up, and we have the mid-ranged cards showing up first.

    If the article said clearly, "We are reviewing the next generation mid range cards with the high generation 6900 due out next month" right up front in the article instead of buried in the text somewhere on page 2(or was it 3), there would have been less confusion.

    I don't mind the change in numbers if all parts come out at the same time, but for now, there is ONLY confusion because we have yet to see the 6970.
  • GaMEChld - Friday, October 14, 2011 - link

    I love how people are arguing over this naming change. As if people who buy discrete cards or look at video card specs don't know what their doing. If you don't know what you're buying, it serves you right.

    I don't know why this was so hard for people to understand. The 5700 was incredibly successful. AMD wanted to preserve that card for its performance and value. Thus, the 6700 name was taken. The 6800 model is a new model that sits BETWEEN where the 5700 and 5800 line had. If you recall, there was a MASSIVE performance gap between those lines, and AMD felt they should have something to bridge that gap.

    The new 6800 line bridges that gap. It offers NEAR 5800 power at a significant price reduction.

    And now ALL of the top tier cards are housed under the 6900 bracket, with the 6990 taking the dual GPU slot. If I had anything to complain about its the abandonment of the X2 designation on dual GPU cards.

    In fact, the only thing people should be angry about is the fact that the 6700 is virtually identical to the 5700 and offers little performance advantage. THAT is what is reminiscent of the 8800GT -> 9800GT transition. However, since the 5700 was a midrange product, maybe it received less attention than it should have.
  • DanaG - Friday, October 22, 2010 - link

    Now, if the 6870 is what should've been a 6770, and a 6970 is what should've been a 6870... then what'll they call what should've been a 6970? 6-10-70 / 6ten70? 6X70? 6999? Or will they go to 6970 X2?
  • spigzone - Saturday, October 23, 2010 - link

    6990 ... yhat wasn't so hard now, was it?
  • AMD_Pitbull - Saturday, October 23, 2010 - link

    Gotta say, I agree 100%. I really don't understand why everyone is getting so bloody upset with this. New product, new line. You couldn't predict what was going to happen? Sorry. Companies like to keep people guessing.

    Also, if you really want to get technical, this 6870 DOES beat the 5870 if a few things as well. Overall greater effective product AND cheaper? Win in my books. Sorry QQ'ers.
  • dvijaydev46 - Saturday, October 23, 2010 - link

    I tried converting a video file using my 5770 Hawk with MediaEspresso 6 (with hardware acceleration enabled of course), I wasn't impressed but Mediashow 5 properly utilized the GPU power and the speed difference in converting was clear. I'm not sure if there was a problem in the installation of my copy of MediaEspresso 6, but I think you guys can use Mediashow 5 to see if there is any difference in video conversion time with an AMD GPU as I don't have any other card.
