AMD's Radeon HD 6970 & Radeon HD 6950: Paving The Future For AMD
by Ryan Smith on December 15, 2010 12:01 AM EST
Compute & Tessellation
Moving on from our look at gaming performance, we have our customary look at compute performance, bundled with a look at theoretical tessellation performance. This will give us our best chance to not only look at the theoretical aspects of AMD’s tessellation improvements, but to isolate shader performance to see whether AMD’s theoretical performance advantages and disadvantages from VLIW4 map out to real world scenarios.
Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.
Civilization V’s compute shader benchmark has always benefitted NVIDIA, but that’s not the real story here. The real story is just how poorly the 6900 series does compared to the 5870. The 6970 barely does better than the 5850, while the 6950 is closest to the 768MB version of NVIDIA’s GTX 460. If what AMD says is true about the Cayman shader compiler needing some further optimization, then this is a benchmark where that’s readily apparent. As an application of GPU computing, we’d expect the 6900 series to do at least somewhat better than the 5870, not notably worse.
Our second GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.
Unlike Civ V, SmallLuxGPU’s performance is much closer to where things should be theoretically. Even with all of AMD’s shader changes, both the 5870 and 6970 have a theoretical 2.7 TFLOPs of compute performance, and SmallLuxGPU backs up that number. The 5870 and 6970 are virtually tied, exactly where we’d expect our performance to be if everything is running under reasonably optimal conditions. Note that this means that the 6950 and 6970 both outperform the GTX 580 here, as SmallLuxGPU does a good job setting AMD’s drivers up to extract ILP out of the OpenCL kernel it uses.
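For readers who want to see where the roughly 2.7 TFLOPs figure comes from, the arithmetic can be sketched as below. This is a minimal sketch, not AMD's own methodology: the stream processor counts and core clocks are the publicly listed reference specifications for each card and are not stated in this section, and the helper name `peak_tflops` is hypothetical. It assumes each stream processor retires one multiply-add (2 FLOPs) per clock.

```python
def peak_tflops(stream_processors: int, core_clock_mhz: float) -> float:
    """Theoretical single-precision peak, assuming 2 FLOPs (one multiply-add)
    per stream processor per clock."""
    flops_per_clock = 2
    return stream_processors * core_clock_mhz * 1e6 * flops_per_clock / 1e12


# Assumed reference specs (not given in this section of the article):
# name -> (stream processors, core clock in MHz)
CARDS = {
    "Radeon HD 5870": (1600, 850.0),
    "Radeon HD 6970": (1536, 880.0),
}

for name, (sps, mhz) in CARDS.items():
    print(f"{name}: {peak_tflops(sps, mhz):.2f} TFLOPs")
```

Under those assumptions the 6970's fewer stream processors are almost exactly offset by its higher clock, which is why the two cards land at essentially the same theoretical peak.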
Our final compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see, this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.
MediaEspresso 6 quickly gets CPU bottlenecked when paired with a faster GPU, leading to our clusters of results. For the 6900 series this mostly serves as a sanity check, proving that transcoding performance has not slipped even with AMD’s new architecture.
At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6900 series, AMD significantly enhanced their tessellation by doubling up on tessellation units and the graphics engines they reside in, which can result in up to 3x the tessellation performance over the 5870. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.
Since Heaven is a synthetic benchmark at the moment (the DX11 engine isn’t currently used in any games), we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. So with AMD’s tessellation improvements we see the 6970 shoot to life on this benchmark, coming in at nearly 50% faster than the 5870 at both moderate and extreme tessellation settings. This is actually on the low end of AMD’s theoretical tessellation performance improvements, but then even the geometrically overpowered GTX 580 doesn’t get such clear gains. On that note, while the 6970 does well at moderate tessellation levels, at extreme tessellation levels it still falls to the more potent GTX 400/500 series.
As for Microsoft’s DirectX 11 Detail Tessellation Sample program, a different story is going on. The 6970 once again shows significant gains over the 5870, but this time not against the 6870. With the 6870 implementing AMD’s tessellation factor optimized tessellator, most of the 6970’s improvements are already accounted for here. At the same time we can still easily see just how much of an advantage NVIDIA’s GTX 400/500 series still has in the theoretical tessellation department.
168 Comments
MeanBruce - Wednesday, December 15, 2010 - link
TechPowerUp.com shows the 6850 as 95 percent or almost double the performance of the 4850 and 100 percent more efficient than the 4850 @ 1920x1200. I also am upgrading an old 4850; as far as the 6950, check their charts when they come up later today.
mapesdhs - Monday, December 20, 2010 - link
Today I will have completed my benchmark pages comparing 4890, 8800GT and
GTX 460 1GB (800 and 850 core speeds), in both single and CF/SLI, for a range
of tests. You should be able to extrapolate between known 4850/4890 differences,
the data I've accumulated, and known GTX 460 vs. 68xx/69xx differences (bearing
in mind I'm testing with 460s with much higher core clocks than the 675 reference
speed used in this article). Email me at mapesdhs@yahoo.com and I'll send you
the URL once the data is up. I'm testing with 3DMark06, Unigine (Heaven, Tropics
and Sanctuary), X3TC, Stalker COP, Cinebench, Viewperf and PT Boats. Later
I'll also test with Vantage, 3DMark11 and AvP.
Ian.
ZoSo - Wednesday, December 15, 2010 - link
Helluva 'Bang for the Buck' that's for sure! Currently I'm running a 5850, but I have been toying with the idea of SLI or CF. For a $300 difference, CF is the way to go at this point.
I'm in no rush, I'm going to wait at least a month or two before I pull any triggers ;)
RaistlinZ - Wednesday, December 15, 2010 - link
I'm a bit underwhelmed from a performance standpoint. I see nothing that will make me want to upgrade from my trusty 5870.
I would like to see a 2x6950 vs 2x570 comparison though.
fausto412 - Wednesday, December 15, 2010 - link
exactly my feelings.
it's like thinking Miss Universe is about to screw you and then you find out it's her mom....who's probably still hot...but def not miss universe
Paladin1211 - Wednesday, December 15, 2010 - link
CF scaling is truly amazing now, I'm glad that nVidia has something to catch up on in terms of drivers. Meanwhile, the ATI wrong refresh rate is not fixed, it's stuck at 60hz where the monitor can do 75hz. "Refresh force", "refresh lock", "ATI refresh fix", disable/enable EDID, manually set monitor attributes in CCC, EDID hack... nothing works. Even the "HUGE" 10.12 driver can't get my friend's old Samsung SyncMaster 920NW to work at its native 1440x900@75hz, both in XP 32bit and win 7 64bit. My next monitor will be a 120hz for sure, and I don't want to risk and ruin my investment, AMD.
mapesdhs - Monday, December 20, 2010 - link
I'm not sure if this will help fix the refresh issue (I do the following to fix max res
limits), but try downloading the drivers for the monitor but modify the data file
before installing them. Check to ensure it has the correct genuine max res and/or
max refresh.
I've been using various models of CRT which have the same Sony tube that can
do 2048 x 1536, but every single vendor that sells models based on this tube has
drivers that limited the max res to 1800x1440 by default, so I edit the file to enable
2048 x 1536 and then it works fine, eg. HP P1130.
Bit daft that drivers for a monitor do not by default allow one to exploit the monitor
to its maximum potential.
Anyway, good luck!!
Ian.
techworm - Wednesday, December 15, 2010 - link
future DX11 games will stress GPU and video RAM incrementally and it is then that 6970 will shine so it's obvious that 6970 is a better and more future proof purchase than GTX570 that will be frame buffer limited in near future games
Nickel020 - Wednesday, December 15, 2010 - link
In the table about whether PowerTune affects an application or not there's a yes for 3DMark, and in the text you mention two applications saw throttling (with 3DMark it would be three). Is this an error?
Also, you should maybe include that you're measuring the whole system power in the PowerTune tables, it might be confusing for people who don't read your reviews very often to see that the power draw you measured is way higher than the PowerTune level.
Reading the rest now :)
stangflyer - Wednesday, December 15, 2010 - link
Sold my 5970 waiting for 6990. With my 5970 playing games at 5040x1050 I would always have a 4th extended monitor hooked up to a tritton uve-150 usb to vga adapter. This would let me game while having the fourth monitor display my teamspeak, afterburner, and various other things.
Question is this!! Can I use the new 6950/6970 and use triple monitor and also use a 4th screen extended at the same time? I have 3 matching dell native display port monitors and a fourth with vga/dvi. Can I use the 2 dp's and the 2 dvi's on the 6970 at the same time? I have been looking for this answer for hours and can't find it! Thanks for the help.