Mass Effect 2, Wolfenstein, and Civ V Compute

Mass Effect 2 is a game we figured would be CPU limited with three GPUs, so it’s quite surprising that it’s not. There does appear to be a framerate cap at around 200fps, but we can’t hit it at 2560 even with three GPUs. With two or more GPUs, however, you can be quite confident that your framerates will be nothing short of amazing.

For that reason, and because ME2 is a DX9-only game, we also gave it a shot with SSAA enabled on both the AMD and NVIDIA setups at 1920. Surprisingly, it’s nearly fluid in this test even with one GPU. Move to two GPUs and we’re looking at 86fps – again, with 4x supersampling in effect. With this kind of performance, I don’t think we’re too far off from being able to supersample a number of games (at least the console ports).
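For a rough sense of what 4x supersampling asks of the GPU, here’s a back-of-the-envelope sketch; the 1920x1200 resolution and the reuse of the 86fps figure are our own illustrative assumptions rather than exact test settings:

```python
# Back-of-the-envelope math for what 4x supersampling (SSAA) costs.
# 4x SSAA renders the scene at twice the width and height and then
# downsamples, so the GPU shades roughly 4x as many pixels per frame.
# Resolution and framerate below are illustrative assumptions.

base_w, base_h = 1920, 1200   # assumed output resolution
ssaa_factor = 4               # 4x SSAA = a 2x2 grid of samples per pixel

pixels_per_frame = base_w * base_h * ssaa_factor
print(f"Pixels shaded per frame with 4x SSAA: {pixels_per_frame:,}")  # 9,216,000

# A setup holding 86fps under 4x SSAA is doing roughly the pixel work
# of ~344fps at native resolution (ignoring fixed per-frame costs).
fps_with_ssaa = 86
print(f"Equivalent native-res pixel throughput: ~{fps_with_ssaa * ssaa_factor} fps")
```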

Wolfenstein is quite CPU limited even with two GPUs, so we didn’t expect much from a third. In fact the surprising bit wasn’t the performance, it was the fact that AMD’s drivers completely blew a gasket with this game. It runs fine with two GPUs, but with three GPUs it crashes almost immediately after launch. Short of a BSOD, this is the worst possible failure mode for an AMD setup, as AMD does not provide per-game CrossFire settings, unlike NVIDIA, which allows SLI to be enabled or disabled on a game-specific basis. As a result, the only way to play Wolfenstein on a triple-GPU setup is to change CrossFire modes globally, a hardware reconfiguration that takes several seconds and a couple of blank screens.

Wolfenstein is the only OpenGL game in our suite, so we can’t determine whether this is a broader AMD OpenGL issue or a problem specific to Wolfenstein. Either way, it’s disappointing to see AMD have this problem.

We don’t normally look at multi-GPU numbers with our Civilization V compute test, but in this case we had the data, so we wanted to throw it out there as an example of where SLI/CF and the concept of alternate frame rendering just don’t contribute much to a game. Texture decompression needs to happen on each card, so it can’t be divided up the way rendering can. As a result, additional GPUs reduce NVIDIA’s score, while a second GPU does help AMD somewhat, only for a third GPU to bring scores crashing down. None of these scores are worth worrying about – performance is still more than fast enough for the leader scenes these textures are used for – but it’s a nice theoretical example.
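To illustrate the point, here’s a minimal toy model (all numbers hypothetical, and no relation to the actual Civ V code) of how AFR scales frame throughput while a per-card task like texture decompression stays flat:

```python
# Toy model (not our benchmark code) of why alternate frame rendering (AFR)
# scales rendering throughput but not a task every card must repeat locally.
# Under AFR each GPU renders every Nth frame, so frame throughput grows with
# GPU count; texture decompression must run on each card's own copy of the
# data, so extra GPUs add only synchronization overhead. Numbers are made up.

def afr_fps(single_gpu_fps: float, gpus: int, efficiency: float = 0.9) -> float:
    """Idealized AFR throughput: each extra GPU contributes another frame in flight."""
    return single_gpu_fps * (1 + (gpus - 1) * efficiency)

def decompress_seconds(one_gpu_seconds: float, gpus: int, overhead: float = 0.05) -> float:
    """Per-GPU work doesn't shrink with more GPUs; coordination only adds cost."""
    return one_gpu_seconds * (1 + (gpus - 1) * overhead)

for n in (1, 2, 3):
    print(f"{n} GPU(s): ~{afr_fps(60, n):.0f} fps rendering, "
          f"~{decompress_seconds(2.0, n):.2f}s to decompress textures")
```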

                        Radeon HD 6970            GeForce GTX 580
GPUs                  1->2    2->3    1->3     1->2    2->3    1->3
Mass Effect 2         180%    142%    256%     195%    139%    272%
Mass Effect 2 SSAA    187%    148%    280%     198%    138%    284%
Wolfenstein           133%      0%      0%     151%     96%    145%

(The Wolfenstein 0% entries reflect the triple-GPU CrossFire crash noted above.)
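For reference, each percentage in the table is simply a ratio of average framerates at two GPU counts, and the 1->3 column is (rounding aside) the product of the two stage ratios. A minimal sketch, with hypothetical framerates chosen to reproduce the GTX 580 Mass Effect 2 row:

```python
# How the scaling percentages are derived: each is a ratio of average
# framerates at two GPU counts. The framerates below are hypothetical,
# picked so the output matches the GTX 580 Mass Effect 2 row above.

fps = {1: 60.0, 2: 117.0, 3: 163.0}   # avg fps at 1, 2, and 3 GPUs (made up)

def scaling(a: int, b: int) -> float:
    """Percent scaling going from a GPUs to b GPUs."""
    return fps[b] / fps[a] * 100

print(f"1->2: {scaling(1, 2):.0f}%")  # 195%
print(f"2->3: {scaling(2, 3):.0f}%")  # 139%
print(f"1->3: {scaling(1, 3):.0f}%")  # 272% (~= 1.95 * 1.39, bar rounding)
```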

Since Wolfenstein is so CPU limited, the scaling story in these games is really about Mass Effect 2. Again, dual-GPU scaling is very good both with MSAA and SSAA; NVIDIA in particular achieves almost perfect scaling. What makes this all the more interesting is that with three GPUs the roles are reversed: scaling is still strong, but now it’s AMD achieving almost perfect scaling in Mass Effect 2 with SSAA (148% out of an ideal 150% going from two cards to three), which is quite a feat given the uneven scaling of triple-GPU configurations overall. It’s just a shame that AMD doesn’t have an SSAA mode for DX10/DX11 games; if it were anything like their DX9 SSAA mode, it could certainly sell the idea of a triple-GPU setup to users looking to completely eliminate all forms of aliasing at any price.

As for Wolfenstein, with two GPUs NVIDIA has the edge in scaling, but they also started from the lower framerate. With the game undoubtedly CPU limited even at two GPUs, there’s not much to draw from it here.

Comments

  • DanNeely - Sunday, April 3, 2011

    Does DNetc not have 4xx/5xx nVidia applications yet?
  • Pirks - Sunday, April 3, 2011

    They have CUDA 3.1 clients that work pretty nicely with Fermi cards. Except that AMD cards pwn them violently; we're talking about an order of magnitude difference between 'em. Somehow RC5-72 code executes 10x faster on AMD than on nVidia GPUs. I could never find a precise explanation why; it must be related to a poor match between the RC5 algorithm and nVidia's GPU architecture or something.

    I crack RC5-72 keys on my AMD 5850 and it's almost 2 BILLION keys per second. Out of 86,000+ participants my machine is ranked #43 from the top in the daily stats graph. But still, man... #43! I'm gonna buy two 5870s sometime and my rig may just make it to the top 10!!! Out of 86,000!!! This is UNREAL man...

    On my nVidia 9800 GT I was cracking like 146 million keys per second; such a low rate is so shameful compared to AMD :)))
  • DanNeely - Monday, April 4, 2011

    It's not just dnetc; ugly differences in performance also show up in the milkyway@home and collatz conjecture projects on the BOINC platform. They're much larger than the 1/8 vs 1/5 (1/4 on the 69xx?) FP64-to-FP32 rate differences would justify; IIRC both are about 5:1 in AMD's favor.
  • Ryan Smith - Sunday, April 3, 2011

    I love what the Dnet guys do with their client, and in the past it's been a big help to us in our articles, especially on the AMD side.

    With that said, it's a highly hand-optimized client that almost perfectly traces theoretical performance. It doesn't care about cache, it doesn't care about memory bandwidth; it only cares about how many arithmetic operations can be done in a second. That's not very useful to us; it doesn't tell us anything about the hardware.

    We want to stick to distributed computing clients that have a single binary for both platforms, so that we're looking at the performance of a common OpenCL/DirectCompute codepath and how it performs on two different GPUs. The Dnet client just doesn't meet that qualification.
  • tviceman - Sunday, April 3, 2011

    Ryan, are you going to be using the nvidia 270 drivers in future tests? I know they're beta, but it looks like you aren't using WHQL AMD drivers either (11.4 preview).
  • Ryan Smith - Sunday, April 3, 2011

    Yes, we will. The benchmarking for this article was actually completed shortly after the GTX 590 launch, so by the time NVIDIA released the 270 drivers, the article was already being written up.
  • ajp_anton - Sunday, April 3, 2011

    When looking at the picture of those packed sardines, I had an idea.
    Why don't the manufacturers make the radial fan hole go all the way through the card? With three or four cards tightly packed, the middle card(s) will still have some air coming through the other cards, assuming the holes are aligned.
    Even with only one or two cards, the (top) card will have access to more fresh air than before.
  • semo - Sunday, April 3, 2011

    Correct me if I'm wrong, but the idea is to keep the air flowing through the shroud body rather than straight through the fans. I think this is a moot point though, as I can't see anyone using a 3x GPU config without water cooling or something even more exotic.
  • casteve - Sunday, April 3, 2011

    "It turns out adding a 3rd card doesn’t make all that much more noise."

    Yeah, I guess if you enjoy 60-65dBA noise levels, the 3rd card won't bother you. Wouldn't it be cheaper to just toss a hairdryer inside your PC? You'd get the same level of noise and room heater effect. ;)
  • slickr - Sunday, April 3, 2011

    I mean, Mass Effect 2 is a console port, Civilization 5 is the worst game to choose to benchmark as it's a turn-based game and not real time, and HAWX is 4 years outdated and basically a game that Nvidia made; it's not even funny anymore seeing how it gives the advantage to Nvidia cards every single time.

    Replace with:
    Crysis Warhead with Aliens vs Predator
    BattleForge with Shogun 2: Total War
    HAWX with Shift 2
    Civilization 5 with Starcraft 2
    Mass Effect 2 with Dead Space 2
    Wolfenstein with Arma 2
    +add Mafia 2
