Compute & Tessellation

Moving on from gaming performance, we have our customary look at compute performance, bundled with a look at theoretical tessellation performance. This gives us our best chance not only to examine the theoretical side of AMD’s tessellation improvements, but also to isolate shader performance and see whether the theoretical advantages and disadvantages of AMD’s VLIW4 design translate to real-world scenarios.

Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.

Civilization V’s compute shader benchmark has always favored NVIDIA, but that’s not the real story here. The real story is just how poorly the 6900 series does compared to the 5870. The 6970 barely does better than the 5850, while the 6950 is closest to the 768MB version of NVIDIA’s GTX 460. If what AMD says about the Cayman shader compiler needing further optimization is true, then this is a benchmark where that’s readily apparent. As an application of GPU computing, we’d expect the 6900 series to do at least somewhat better than the 5870, not notably worse.

Our second GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing it to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.

Unlike Civ V, SmallLuxGPU’s performance is much closer to where things should be theoretically. Even with all of AMD’s shader changes, both the 5870 and 6970 have a theoretical 2.7 TFLOPS of compute performance, and SmallLuxGPU backs up that number. The 5870 and 6970 are virtually tied, exactly where we’d expect performance to be if everything is running under reasonably optimal conditions. Note that this means that the 6950 and 6970 both outperform the GTX 580 here, as SmallLuxGPU does a good job of setting AMD’s drivers up to extract ILP out of the OpenCL kernel it uses.
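For readers who want to check the 2.7 TFLOPS figure, here is a rough sketch of the usual peak-throughput arithmetic. The stream processor counts and core clocks below are not stated on this page; they are assumed from AMD's published reference specifications (1600 SPs at 850MHz for the 5870, 1536 SPs at 880MHz for the 6970), and the helper name `peak_tflops` is purely illustrative. The calculation counts one multiply-add per stream processor per clock as two floating point operations.

```python
def peak_tflops(stream_processors: int, core_clock_mhz: float,
                flops_per_sp_per_clock: int = 2) -> float:
    """Theoretical single-precision peak in TFLOPS.

    Assumes every stream processor retires one multiply-add
    (counted as 2 FLOPs) on every clock, which real code rarely achieves.
    """
    flops = stream_processors * flops_per_sp_per_clock * core_clock_mhz * 1e6
    return flops / 1e12


# Assumed reference specifications (not taken from this article).
cards = {
    "Radeon HD 5870": (1600, 850.0),
    "Radeon HD 6970": (1536, 880.0),
}

for name, (sps, mhz) in cards.items():
    print(f"{name}: {peak_tflops(sps, mhz):.2f} TFLOPS")
```

Under those assumptions both cards land at roughly 2.7 TFLOPS: the 6970 has fewer stream processors but a higher clock, so the two changes nearly cancel. This is only a ceiling; how close a given workload gets depends on how well the compiler fills the VLIW slots.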

Our final compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently use a common API, and instead has codepaths for both AMD’s APP and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see, this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.

MediaEspresso 6 quickly gets CPU bottlenecked when paired with a faster GPU, leading to our clusters of results. For the 6900 series this mostly serves as a sanity check, proving that transcoding performance has not slipped even with AMD’s new architecture.

At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6900 series, AMD significantly enhanced their tessellation by doubling up on tessellation units and the graphics engines they reside in, which can result in up to 3x the tessellation performance of the 5870. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.

Since Heaven is a synthetic benchmark at the moment (the DX11 engine isn’t currently used in any games), we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. With AMD’s tessellation improvements we see the 6970 shoot to life on this benchmark, coming in nearly 50% faster than the 5870 at both moderate and extreme tessellation settings. This is actually on the low end of AMD’s theoretical tessellation performance improvements, but then even the geometrically overpowered GTX 580 doesn’t get such clear gains. On that note, while the 6970 does well at moderate tessellation levels, at extreme tessellation levels it still falls to the more potent GTX 400/500 series.

As for Microsoft’s DirectX 11 Detail Tessellation sample program, a different story is going on. The 6970 once again shows significant gains over the 5870, but this time not over the 6870. With the 6870 implementing AMD’s tessellation-factor-optimized tessellator, most of the 6970’s improvements are already accounted for here. At the same time, we can still easily see just how much of an advantage NVIDIA’s GTX 400/500 series has in the theoretical tessellation department.


168 Comments


  • Ryan Smith - Wednesday, December 15, 2010 - link

    Exactly the same as on Cypress.

    L2: 128KB per ROP block (so 512KB)
    L1: 8KB per SIMD
    LDS: 32KB per SIMD
    GDS: 64KB

    http://images.anandtech.com/doci/4061/MidLevelView...

    I don't have the register file size readily available.
  • DanNeely - Wednesday, December 15, 2010 - link

    How likely is the decrease from 2 to 1 operations per clock to affect real world applications?
  • yeraldin37 - Wednesday, December 15, 2010 - link

    My current cards are running at 870MHz (GPU) and 1100MHz (clock), faster than a stock 5870. The benchmarks for the new 6970 are really disappointing. I was seriously expecting to get a single 6970 for Christmas to replace my 5850 OC CF cards and make room for additional cards, or even have a free PCIe slot to plug in my GTX 460 for PhysX. I would have been happy to get at least 80% of my current 5850 CF setup's performance from the new 6970. What a joke! I will not make any move and will wait for AMD's upcoming next-generation 28nm GPUs. To be fair, we have to mention the AMD team's great efforts to bring new technology to the newest Radeon cards, but it is not enough performance for die-hard gamers. If the GTX 580 were 20% cheaper I might consider buying one; I personally never pay more than $400 for one (1) video card.
  • Nfarce - Wednesday, December 15, 2010 - link

    Reading Tom's Hardware, they essentially slam AMD's marketing of these cards as 570/580 beaters. Guru3D is also less than friendly. Interestingly, *both* sites have benches showing the 570 and 580 beating the 6950 and 6970 commandingly. What's up with that exactly?
  • fausto412 - Wednesday, December 15, 2010 - link

    it's called AMD didn't deliver on the hype...they deserve to get slammed.
  • medi01 - Wednesday, December 15, 2010 - link

    AMD delivers cards with better performance/price ratio that also consume less power. How come there is a reason to "slam", eh?
  • zst3250 - Friday, December 31, 2010 - link

    Off yourself cretin, prefearbly by getting your cranium kicked in.
  • Mr Perfect - Thursday, December 16, 2010 - link

    Wait, is Tom's reputable again? Haven't read that site since the Athlon XP was new....
  • AnnonymousCoward - Wednesday, December 15, 2010 - link

    As a 30" owner and gamer, I would never run at 2560x1600 with AA enabled if that causes <60fps. I'd disable AA. Who wouldn't value framerate over AA? So when the fps is <60, please compare cards at 2560x1600 without AA, so that I'm able to apply the results to a purchase decision.
  • SimpJee - Wednesday, December 15, 2010 - link

    Greetings, also a 30'' gamer. If you see the FPS above 30 with AA enabled, you can assume it will be (much) higher without it enabled so what's the point in actually having the author bench it without AA? Plus, anything above 30 FPS is just icing on the cake as far as I'm concerned.
