For a while now we’ve been trying to establish a proper cross-platform compute benchmark suite to add to our GPU articles. It’s not been entirely successful.

While GPUs have been compute capable in some form since 2006 with the launch of G80, and AMD significantly improved their compute capabilities in 2009 with Cypress, the software has been slow to catch on. From gatherings such as NVIDIA’s GTC we’ve seen first-hand how GPU computing is being used in the high-performance computing market, but the consumer side hasn’t materialized as quickly: the right situations for using GPU computing aren’t as straightforward, and many developers are unwilling to attach themselves to a single platform in the process.

2009 saw the ratification of OpenCL 1.0 and the launch of DirectCompute, and while the launch of these cross-platform APIs removed some of the roadblocks, we heard as recently as last month from Adobe and others that there’s still work to be done before companies can confidently deploy GPU compute accelerated software. The immaturity of OpenCL drivers was cited as one cause; there’s also the fact that a lot of computers simply don’t have a suitable compute-capable GPU – it’s Intel that’s the world’s biggest GPU vendor after all.

So here in the fall of 2010 our search for a wide variety of GPU compute applications hasn’t panned out quite like we expected it to. Widespread adoption of GPU computing in consumer applications is still around the corner, so for the time being we have to get creative.

With that in mind we’ve gone ahead and cooked up a new GPU compute benchmark suite based on the software available to us. On the consumer side we have the latest version of Cyberlink’s MediaEspresso video encoding suite and an interesting sub-benchmark from Civilization V. On the professional side we have SmallLuxGPU, an OpenCL based ray tracer. We don’t expect this to be the be all and end all of GPU computing benchmarks, but it gives us a place to start and allows us to cover both cross-platform APIs and NVIDIA & AMD’s platform-specific APIs.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.

In our look at Civ V’s performance as a game, we noted that it favors NVIDIA’s GPUs at the moment, and this may be part of the reason why. NVIDIA’s GPUs clean up here, particularly when compared to the 6800 series and its reduced shader count. Furthermore within the GPU families the results are very straightforward, with the order following the relative compute power of each GPU. To be fair to AMD they made a conscious decision to not chase GPU computing performance with the 6800 series, but as a result it fares poorly here.

Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

We decided to go ahead and use MediaEspresso in this article not knowing what we’d find, and it turns out the results were both more and less than we were expecting at the same time. While our charts don’t show it, video transcoding isn’t all that GPU intensive with MediaEspresso; once we achieve a certain threshold of compute performance on a GPU – such as a GTX 460 in the case of an NVIDIA card – the rest of the process is CPU bottlenecked. As a result all of our Fermi NVIDIA cards at the GTX 460 or better take just as long to encode our sample video, and while the AMD cards show some stratification, it’s on the order of only a couple of seconds. From this it’s clear that with Cyberlink’s technology having a GPU is going to help, but it can’t completely offload what’s historically been a CPU-intensive activity.
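To make the bottleneck argument concrete, here is a minimal sketch of a two-stage encode in which the CPU and GPU portions overlap perfectly, so the slower stage sets the total time. The function name, the perfect-overlap assumption, and every number below are hypothetical illustrations. They are not measurements from our testing, and this is not a description of how MediaEspresso is actually structured.

```python
def encode_time_seconds(cpu_seconds: float, gpu_work: float, gpu_throughput: float) -> float:
    """Total time for a hypothetical pipelined encode.

    Assumes the CPU and GPU stages overlap perfectly, so the slower
    of the two stages determines how long the whole job takes.
    """
    if gpu_throughput <= 0:
        raise ValueError("gpu_throughput must be positive")
    gpu_seconds = gpu_work / gpu_throughput
    return max(cpu_seconds, gpu_seconds)


# Hypothetical figures: a fixed 30 s of CPU-side work and 60 units of GPU work.
CPU_SECONDS = 30.0
GPU_WORK = 60.0

# Relative GPU throughput (1.0 = slowest card in this made-up lineup).
for throughput in (1.0, 2.0, 4.0):
    total = encode_time_seconds(CPU_SECONDS, GPU_WORK, throughput)
    print(f"GPU throughput {throughput:.1f}x -> {total:.1f} s")
```

In this toy model, doubling GPU throughput helps only until the GPU stage drops below the CPU stage. Past that threshold a faster GPU changes nothing, which matches the flat results we see from the GTX 460 upward.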

As for an AMD/NVIDIA cross comparison, the results are straightforward but not particularly enlightening. It turns out that MediaEspresso 6 is significantly faster on NVIDIA GPUs than it is on AMD GPUs, but since we’ve already established that MediaEspresso 6 is CPU limited when using these powerful GPUs, it doesn’t say anything about the hardware. AMD and NVIDIA both provide common GPU video encoding frameworks for their products that Cyberlink taps into, and it’s here where we believe the difference lies.

In particular we see MediaEspresso 6 achieve 50% CPU utilization (4 cores) when being used with an NVIDIA GPU, while it only achieves 13% CPU utilization (1 core) with an AMD GPU. At this point it would appear that the CPU portions of NVIDIA’s GPU encoding framework are multithreaded while AMD’s framework is single-threaded. And since the performance bottleneck for video encoding still lies with the CPU, this would be why the NVIDIA GPUs do so much better than the AMD GPUs in this benchmark.
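The utilization figures line up with simple arithmetic if the test CPU exposes 8 logical processors (for example 4 cores with 2 threads each). That processor count is an assumption for this sketch, as it is not stated on this page. The helper name is likewise hypothetical.

```python
def cpu_utilization_percent(busy_threads: int, logical_processors: int) -> float:
    """Overall CPU utilization when each busy thread saturates one logical processor."""
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    if not 0 <= busy_threads <= logical_processors:
        raise ValueError("busy_threads must be between 0 and logical_processors")
    return 100.0 * busy_threads / logical_processors


# Assumption: 8 logical processors (4 cores x 2 threads); not stated in the article.
LOGICAL_PROCESSORS = 8

for label, threads in (("AMD path, 1 busy thread", 1), ("NVIDIA path, 4 busy threads", 4)):
    share = cpu_utilization_percent(threads, LOGICAL_PROCESSORS)
    print(f"{label}: {share:.1f}%")
```

Under that assumption one saturated thread works out to 12.5%, consistent with the roughly 13% we observed on AMD, and four saturated threads work out to the 50% we observed on NVIDIA.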

Our final GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.

Compared to our other two GPU computing benchmarks, SmallLuxGPU follows the theoretical performance of our GPUs much more closely. As a result our Radeon GPUs with their difficult-to-utilize VLIW5 design end up topping the charts by a significant margin, while the fastest comparable NVIDIA GPU is still 10% slower than the 6850. Ultimately what we’re looking at is what amounts to the best-case scenarios for these GPUs, with this being as good an example as any that in the right circumstances AMD’s VLIW5 shader design can go toe-to-toe with NVIDIA’s compute-focused design and still win.

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6800 series, AMD enhanced their tessellation unit to offer better tessellation performance at lower tessellation factors. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

Since Heaven is a synthetic benchmark at the moment (the DX11 engine isn’t currently used in any games) we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. Compared to the 5870 the 6870 ends up being slightly slower when using moderate amounts of tessellation, while it pulls ahead when using extreme amounts of tessellation. Considering that the 6870 is around 7% slower in games than the 5870 this is actually quite an accomplishment for Barts, and one that we can easily trace back to AMD’s tessellator improvements.

Our second tessellation test is Microsoft’s DirectX 11 Detail Tessellation sample program, which is a much more straightforward test of tessellation performance. Here we’re simply looking at the framerate of the program at different tessellation levels, specifically level 7 (the default level) and level 11 (the maximum level). Here AMD’s tessellation improvements become even more apparent, with the 6870 handily beating the 5870. In fact our results are very close to AMD’s own internal results – at level 7 the 6870 is 43% faster than the 5870, while at level 11 that improvement drops to 29% as the increased level leads to an increasingly large tessellation factor. However this also highlights the fact that AMD’s tessellation performance still collapses at high factors compared to NVIDIA’s GPUs, making it all the more important for AMD to encourage developers to use more reasonable tessellation factors.


197 Comments


  • 529th - Saturday, October 23, 2010 - link

    the marketers wanted to differentiate themselves from Nvidia, that's why they are using their second place cards to be in the same category as nvidias second place cards

    If you are shopping for a top of the line card you should know atleast a little bit about them although the un-educated video-card shopper would think that a 470 and 5870 or 6870 is on the SAME performance level, WHICH ISN'T TOO FAR FROM THE TRUTH, but I think it's here where AMD marketers are trying to make a statement

    i could be wrong, i have had very little sleep last night, cedar point was a blast!
  • SininStyle - Saturday, October 23, 2010 - link

    Can I just say THANK YOU for adding a OC edition of the 460. Don't know why everyone is whining. If you don't want to know how an OC edition compares then ignore the stupid bench for it. Why is such a huge deal?
    I personally am glad they included it and this is why. The 460 1gb stock is 675mhz and can OC "reliably" to 850mhz.. That's 175mhz gain and its noticeable. Stock volt stock fan. And for those that wanna claim heat, mine shows 64c at 75% fan on OCCT. The 6870 get 50hz OC at stock volt/fan. SEE why this is important people? $180 vs $240 with same results.

    Now with volt changes I'm sure they both have room to go I'm not sure how much. I tend to shy away from higher voltages at least for now.

    The 6850 is the better buy between the 2 68xx cards. That has allot of headroom to OC. That would even be a better comparison to the 460 due to the price. And owning the 460 doesn't make me a fanboy and I will say you can flip a coin for value on these 2.

    So again thanks for the added information. Cant see why anyone would complain about more info. If you don't like the info ignore it if it makes you feel better. Feel free to add OCed 6850s and 6870s I look forward to the comparison.
  • Parhel - Saturday, October 23, 2010 - link

    "The 460 1gb stock is 675mhz and can OC "reliably" to 850mhz"

    No, it absolutely cannot. the FTW card is a "golden sample" which is why there are so few available. Stock cooling on a stock card will not get you to 850Mhz with 24/7 reliability. You *might* get to 800Mhz, probably a bit less. That's a great value, IMO. If I were in the market at the moment, I'd pick a base model GTX 460 and OC it. Not arguing that point at all. But presenting this card in the 6870 launch article is a sham and major black eye to Anandtech's credibility.
  • rom0n - Saturday, October 23, 2010 - link

    Is it possible to post the GPUZ of the HD6850. It seems there are numerous cases where HD6850 has 1120 sent out to reviewers. See
    http://benchmarkreviews.com/index.php?option=com_c... If this happens to be one of them the results may be a little misleading. If not then it'll reaffirm the results.
  • GullLars - Saturday, October 23, 2010 - link

    This means a 6870 with open-air fan optimized for noise will be my early winter solstice present for myself, togheter with the 4x C300 64GB i just got :D
    I went for a value-upgrade of my old rigg with P2x6 1090T, 8GB kingston value DDR3, and AM3 mobo with SB850, so once i get both the SSD in RAID-0 and the GPU, I'll be a happy camper (or rusher) <3
    It'll tide me over untill i can get Bulldozer or a next gen Intel (high end/workstation) around winter 2011/2012.
  • poohbear - Saturday, October 23, 2010 - link

    "Apparently a small number of the AMD Radeon HD 6850 press samples shipped from AIB partners have a higher-than-expected number of stream processors enabled.

    This is because some AIBs used early engineering ASICs intended for board validation on their press samples. The use of these ASICs results in the incorrect number of stream processors. If you have an HD 6850 board sample from an AIB, please test using a utility such as GPU-z to determine the number of active stream processors. If that number is greater than 960, please contact us and we will work to have your board replaced with a production-level sample.

    All boards available in the market, as well as AMD-supplied media samples, have production-level GPUs with the correct 960 stream processors."

    so which one did Anandtech get? false marketing is such BS, just wanna be sure your benchmarks for the 6850 are reliable and we're not getting overrated benchmarks due to a cherry picked review sample.
  • lakrids - Saturday, October 23, 2010 - link

    The review ended up looking like an advertisement for EVGA at page 7 and beyond. Why EVGA? Why not some other brand?
    Why include that brand at all? Just mark the card "GTX 460 OC'd 850MHz".

    At the very first benchmark: Crysis 2560x1600, you didn't include the reference GTX 460, you pitched the HD6870 against the EVGA overclocked version. EVGA here, EVGA there, EVGA everywhere.

    Would you blame me if I suspect you of being on EVGA's paycheck?
  • Lolimaster - Sunday, October 24, 2010 - link

    When I call you a Intel/Nvidia biased site I'm saying the truth. Are you reviewing the HD6000 or doins an EVGA product reviews.

    This is an insult.

    Message:
    Nvidia will disappear like the dodo, just a bit more time and at that time all this sh1t will end.
  • SininStyle - Sunday, October 24, 2010 - link

    You do understand if Nvidia vanishes the price of GPUs goes through the roof right? Nvidia isnt going to vanish any earlier then Radeon. Saying either just translates into "Im a fanboy"

    Stop defending a sticker and start shopping price performance. Neither company would hesitate to rape your wallet if the other would allow it. Case in point look at the price of the 57xx and 58xx 2 months ago. Then look at the price of the same cards including the 68xx cards now. Any of these cards perform less then they did 2 months ago? But the price is a whole lot cheaper isnt it? Well you can thank the 460 for that. Competition results in better pricing for the same performance. You should be thanking Nvidia not hating them.
  • Super_Herb - Sunday, October 24, 2010 - link

    I love it - "as a matter of policy we do not include overclocked cards on general reviews"..........but this time nVidia said pretty please so we did. But because our strict ethical policy doesn't allow us to include them we'll just tell you we did it this one special time because a manufacturer specifically sent us a special card and then our integrity is still 100% intact......right? Besides, the "special" card nVidia sent us was so shiny and pretty!

    Back to [H]ard to get the real story.
