Closing Thoughts

Unlike our normal GPU reviews, looking at multi-GPU scaling is much more about the tests than it is about the architectures. With AMD and NVIDIA both using the same basic alternate frame rendering strategy, there's not a lot to separate the two on the technology side. Whether a game scales poorly or well has much more to do with the game than with the GPU.
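For readers unfamiliar with alternate frame rendering, the core idea is simple enough to capture in a few lines. The sketch below is purely illustrative (not vendor code): it only shows how AFR hands out whole frames round-robin, which is why scaling depends so heavily on how independent each game's frames actually are.

```python
# Illustrative sketch of alternate frame rendering (AFR), not vendor code:
# successive frames are handed to successive GPUs in round-robin order.
def afr_schedule(num_frames: int, num_gpus: int) -> dict[int, int]:
    """Map each frame index to the GPU that renders it under AFR."""
    return {frame: frame % num_gpus for frame in range(num_frames)}

if __name__ == "__main__":
    # With 3 GPUs, frames 0, 3, 6 land on GPU 0; 1, 4, 7 on GPU 1; and so on.
    print(afr_schedule(num_frames=9, num_gpus=3))
```

In practice it's inter-frame dependencies (render-to-texture reuse, CPU submission limits) that keep real games from scaling perfectly, and that is exactly the game-by-game variation the numbers below capture.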

                         Radeon HD 6970            GeForce GTX 580
GPUs                     1->2    2->3    1->3      1->2    2->3    1->3
Average Avg. FPS Gain    185%    127%    236%      177%    121%    216%
Average Min. FPS Gain    196%    140%    274%      167%     85%    140%

In terms of average FPS gains for two GPUs, AMD has the advantage here. It's not much of an advantage at under 10%, but it is mostly consistent. The same can be said for three-GPU setups, where going from two GPUs to three nets AMD an average gain of 127% versus 121% for NVIDIA. The fact that the Radeon HD 6970 is normally the weaker card in a single-GPU configuration makes things all the more interesting though. Are we seeing AMD close the gap thanks to CPU bottlenecks, or are we really looking at an advantage for AMD's CrossFire scaling? One thing is for certain: CrossFire scaling has gotten much better over the last year; at the start of 2010 these numbers would not have been nearly as close.
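For reference, the percentages in the table above are read as the multi-GPU framerate relative to the baseline configuration, averaged across our test suite, so 200% would be a perfect doubling. A minimal sketch with made-up framerates:

```python
# How the scaling figures are read (made-up example FPS numbers; the table
# averages these per-game ratios across the whole test suite).
def scaling_gain(baseline_fps: float, multi_gpu_fps: float) -> float:
    """Gain as the table expresses it: 200% means a perfect doubling."""
    return multi_gpu_fps / baseline_fps * 100.0

one_gpu, two_gpu, three_gpu = 40.0, 74.0, 94.0  # hypothetical framerates
print(f"1->2: {scaling_gain(one_gpu, two_gpu):.0f}%")    # 185%
print(f"2->3: {scaling_gain(two_gpu, three_gpu):.0f}%")  # 127%
print(f"1->3: {scaling_gain(one_gpu, three_gpu):.0f}%")  # 235%
```

Because the table averages per-game ratios rather than raw framerates, the 1->3 column won't exactly equal the product of the other two columns.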

Overall the gains for SLI or CrossFire in a dual-GPU configuration are very good, which fits well with the fact that most users will never have more than two GPUs. Scaling is heavily game dependent, but on average it’s good enough that you’re getting your money’s worth from a second video card. Just don’t expect perfect scaling in more than a handful of games.

As for triple-GPU setups, the gains are decent, but on average they're not nearly as good. A lot of this has to do with the fact that some games simply don't scale beyond two GPUs at all: Civilization V always comes out as a loss, and the GPU-heavy Metro 2033 only makes limited gains at best. With a single monitor it's hard to tell whether this is solely due to poor scaling or due to CPU limitations, but CPU limitations alone do not explain it all. There are a couple of cases where a triple-GPU setup makes sense when paired with a single monitor, particularly Crysis, but elsewhere framerates are already quite high with two GPUs and there's little to gain from a third. I believe super-sample anti-aliasing is the best argument for a triple-GPU setup with one monitor, but at the same time that restricts our GPU options to NVIDIA, as they're the only one with DX10/DX11 SSAA.

Minimum framerates with three GPUs do give us reason to pause for a moment and ponder some things. For the games we do collect minimum framerate data for, Crysis and Battlefield: Bad Company 2, AMD has a massive lead in minimum framerates. In practice I don't completely agree with the numbers, and it's unfortunate that most games don't generate consistent enough minimum framerates to be useful. From the two games we do test AMD definitely has an advantage, but having watched and played a number of games I don't believe this is consistent for every game. I suspect the games we can generate consistent data for are the ones that happen to favor the 6970, and likely because of its VRAM advantage at that.

Ultimately triple-GPU performance and scaling cannot be evaluated solely on a single monitor, which is why we won’t be stopping here. Later this month we’ll be looking at triple-GPU performance in a 3x1 multi-monitor configuration, which should allow us to put more than enough load on these setups to see what flies, what cracks under the pressure, and whether multi-GPU scaling can keep pace with such high resolutions. So until then, stay tuned.

Comments (97)

  • DanNeely - Sunday, April 3, 2011 - link

    Does DNetc not have 4xx/5xx nVidia applications yet?
  • Pirks - Sunday, April 3, 2011 - link

    They have CUDA 3.1 clients that work pretty nicely with Fermi cards. Except that AMD cards pwn them violently; we're talking about an order of magnitude difference between 'em. Somehow RC5-72 code executes 10x faster on AMD than on nVidia GPUs. I could never find a precise explanation why; it must be related to a poor match between the RC5 algorithm and nVidia's GPU architecture or something.

    I crack RC5-72 keys on my AMD 5850 and it's almost 2 BILLION keys per second. Out of 86,000+ participants my machine is ranked #43 from the top (in the daily stats graph, but still, man... #43! I'm gonna buy two 5870s sometime and my rig may just make it into the top 10!!! Out of 86,000!!! This is UNREAL man...)

    On my nVidia 9800 GT I was cracking like 146 million keys per second; such a low rate is so shameful compared to AMD :)))
  • DanNeely - Monday, April 4, 2011 - link

    It's not just dnetc; ugly differences in performance also show up in the milkyway@home and collatz conjecture projects on the BOINC platform. They're much larger than the 1/8 vs 1/5 (1/4 in 69xx?) FP64/FP32 rate differences between the two would justify; IIRC both are about 5:1 in AMD's favor.
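A rough back-of-the-envelope check of the ratios mentioned above, using approximate peak-throughput figures assumed from public specifications (illustrative only, not measured data):

```python
# Rough check of the FP64 gap discussed above. Peak FP32 figures are
# approximate, assumed from public specs; treat this as illustrative only.
PEAK_FP32_GFLOPS = {"HD 6970": 2703.0, "GTX 580": 1581.0}
FP64_RATE = {"HD 6970": 1 / 4, "GTX 580": 1 / 8}  # FP64 as a fraction of FP32

fp64_gflops = {gpu: PEAK_FP32_GFLOPS[gpu] * FP64_RATE[gpu] for gpu in PEAK_FP32_GFLOPS}
print(fp64_gflops)  # roughly {'HD 6970': 675.75, 'GTX 580': 197.625}
ratio = fp64_gflops["HD 6970"] / fp64_gflops["GTX 580"]
print(f"Theoretical FP64 ratio: {ratio:.1f}:1")  # ~3.4:1, short of the ~5:1 observed
```

If those assumed figures are in the right ballpark, the raw FP64 rate gap alone predicts roughly 3.4:1, so the ~5:1 seen in those projects suggests code or architecture factors beyond the FP64 ratio.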
  • Ryan Smith - Sunday, April 3, 2011 - link

    I love what the Dnet guys do with their client, and in the past it's been a big help to us in our articles, especially on the AMD side.

    With that said, it's a highly hand-optimized client that almost perfectly traces theoretical performance. It doesn't care about cache, it doesn't care about memory bandwidth; it only cares about how many arithmetic operations can be done in a second. That's not very useful to us; it doesn't tell us anything about the hardware beyond its peak arithmetic throughput.

    We want to stick to distributed computing clients that have a single binary for both platforms, so that we're looking at the performance of a common OpenCL/DirectCompute codepath and how it performs on two different GPUs. The Dnet client just doesn't meet that qualification.
  • tviceman - Sunday, April 3, 2011 - link

    Ryan, are you going to be using the NVIDIA 270 drivers in future tests? I know they're beta, but it looks like you aren't using WHQL AMD drivers either (11.4 preview).
  • Ryan Smith - Sunday, April 3, 2011 - link

    Yes, we will. The benchmarking for this article was actually completed shortly after the GTX 590 launch, so by the time NVIDIA released the 270 drivers the article was already being written up.
  • ajp_anton - Sunday, April 3, 2011 - link

    When looking at the picture of those packed sardines, I had an idea.
    Why don't the manufacturers make the radial fan hole go all the way through the card? With three or four cards tightly packed, the middle card(s) will still have some air coming through the other cards, assuming the holes are aligned.
    Even with only one or two cards, the (top) card will have access to more fresh air than before.
  • semo - Sunday, April 3, 2011 - link

    Correct me if I'm wrong, but the idea is to keep the air flowing through the shroud body rather than passing straight through the fans. I think this is a moot point though, as I can't see anyone using a 3x GPU config without water cooling or something even more exotic.
  • casteve - Sunday, April 3, 2011 - link

    "It turns out adding a 3rd card doesn’t make all that much more noise."

    Yeah, I guess if you enjoy 60-65dBA noise levels, the 3rd card won't bother you. Wouldn't it be cheaper to just toss a hairdryer inside your PC? You'd get the same level of noise and room heater effect. ;)
  • slickr - Sunday, April 3, 2011 - link

    I mean, Mass Effect 2 is a console port, Civilization 5 is the worst game to choose to benchmark as it's turn-based rather than real-time, and HAWX is 4 years outdated and basically a game that Nvidia made; it's not even funny anymore seeing how it gives the advantage to Nvidia cards every single time.

    Replace with:
    Crysis Warhead with Aliens vs. Predator
    BattleForge with Shogun 2: Total War
    HAWX with Shift 2
    Civilization 5 with StarCraft 2
    Mass Effect 2 with Dead Space 2
    Wolfenstein with ArmA 2
    + add Mafia 2
