DirectX 12 Multi-GPU Performance

Shifting gears, let’s take a look at multi-GPU performance on the latest Ashes beta. The focus of our previous article, Ashes’ support for DX12 explicit multi-GPU makes it the first game to support the ability to pair up RTG and NVIDIA GPUs in an AFR setup. Like traditional same-vendor AFR configurations, Ashes’ AFR setup works best when both GPUs are similar in performance, so although this technology does allow for some unusual cross-vendor comparisons, it does not (yet) benefit from pairing up GPUs that differ widely in performance, such as a last-generation video card with a current-generation video card. Nonetheless, running a Radeon and a GeForce card together is an interesting sight, if only for the sheer audacity of it.
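To make the sensitivity to mismatched cards concrete, here is a rough back-of-the-envelope sketch of strict two-GPU AFR. It is not Oxide's implementation; the function name and the sample frame rates are illustrative assumptions, and synchronization and transfer overhead are ignored, so real results would come in lower.

```python
def afr_fps_upper_bound(gpu_a_fps: float, gpu_b_fps: float) -> float:
    """Upper bound on frame rate for strict two-GPU alternate-frame rendering.

    The cards take turns rendering frames, so every pair of frames takes as
    long as the slower card needs for its one frame. That caps the combined
    rate at twice the slower card's standalone rate.
    """
    return 2.0 * min(gpu_a_fps, gpu_b_fps)


# Hypothetical standalone frame rates, chosen only for illustration.
matched = afr_fps_upper_bound(40.0, 40.0)     # two similar cards
mismatched = afr_fps_upper_bound(60.0, 20.0)  # new card + much older card

print(matched)     # 80.0
print(mismatched)  # 40.0, which is below the 60.0 of the faster card alone
```

Under this model a pairing only pays off when the slower card delivers more than half the frame rate of the faster one, which is why well-matched cards are the interesting case.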

Meanwhile, the significant performance optimizations between the last beta build and this latest build have had an equally significant knock-on effect on multi-GPU performance compared to the last time we looked at the game.

Ashes of the Singularity (Beta) - 3840x2160 - High Quality - MGPU

Even at 4K a pair of GPUs ends up being almost too much at Ashes’ High quality setting. All four multi-GPU configurations are over 60fps, with the fastest Fury X + 980 Ti configuration nudging past 70fps. Meanwhile the lead over our two fastest single-GPU configurations is not especially great, particularly compared to the Fury X, with the Fury X + 980 Ti configuration only coming in 15fps (27%) faster than a single GPU. The all-NVIDIA comparison does fare better in this regard, but only because of GTX 980 Ti’s lower initial performance.

Digging deeper, what we find is that even at 4K we’re actually CPU limited according to the benchmark data. Across all four multi-GPU configurations, our hex-core overclocked Core i7-4960X can only set up frames at roughly 70fps, versus 100fps+ for a single-GPU configuration.


Top: Fury X. Bottom: Fury X + 980 Ti

The increased CPU load from utilizing multi-GPU is to be expected, as the CPU now needs to spend time synchronizing the GPUs and waiting on them to transfer data between each other. However, dropping to 70fps means that Ashes has become a surprisingly heavy CPU test as well, and that 4K at High quality alone isn’t enough to max out our dual-GPU configurations.
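The bottleneck reasoning here can be sketched as a simple "slower side wins" model. This is a simplification under stated assumptions: the function names are made up for illustration, the 70fps CPU figure is the rough frame setup rate quoted above, and the GPU throughput numbers are hypothetical stand-ins rather than measured values.

```python
def delivered_fps(cpu_fps: float, gpu_fps: float) -> float:
    """Frame rate actually delivered: the slower of CPU setup and GPU rendering."""
    return min(cpu_fps, gpu_fps)


def limiting_side(cpu_fps: float, gpu_fps: float) -> str:
    """Name whichever side is holding the frame rate back."""
    return "CPU" if cpu_fps < gpu_fps else "GPU"


CPU_SETUP_FPS_MGPU = 70.0  # approximate multi-GPU CPU frame setup rate from the text

# Hypothetical dual-GPU rendering throughput at two quality settings.
for label, gpu_fps in [("lighter GPU load", 90.0), ("heavier GPU load", 58.0)]:
    fps = delivered_fps(CPU_SETUP_FPS_MGPU, gpu_fps)
    side = limiting_side(CPU_SETUP_FPS_MGPU, gpu_fps)
    print(f"{label}: {fps:.0f}fps, {side} limited")
```

In this model, extra GPU throughput beyond the CPU's setup rate is simply wasted, so raising the GPU workload is the only way to make a dual-GPU configuration the limiting factor again.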

Ashes of the Singularity (Beta) - 3840x2160 - Extreme Quality - MGPU

Cranking up the quality setting to Extreme finally gives our dual-GPU configurations enough of a workload to back off from the CPU performance cap. Once again the fastest configuration is the Fury X + 980 Ti, which lands just short of 60fps, followed by the Fury X + Fury configuration at 55.1fps. In our first look at Ashes multi-GPU scaling we found that having a Fury X card as the lead card resulted in better performance, and this has not changed for the newest beta. The Fury continues to be faster at reading data off of other cards. Still, the gap between the Fury X + 980 Ti configuration and the 980 Ti + Fury X configuration has closed some as compared to last time, and now stands at 11%.

Backing off from the CPU limit has also put the multi-GPU configurations well ahead of the single-GPU configurations. We’re now looking at upwards of a 65% performance boost versus a single GTX 980 Ti, and a smaller 31% performance boost versus a single Fury X. These are smaller gains for multi-GPU configurations than we first saw last year, but it’s also very much a consequence of Ashes’ improved performance across the board. Though we didn’t have time to test it, Ashes does have one higher quality setting – Crazy – which may drive a bit of a larger wedge between the multi-GPU configurations and the Fury X, though the overhead of synchronization will always present a roadblock.


153 Comments


  • extide - Wednesday, February 24, 2016

    If you are CPU limited, and it's using lots of threads, then yeah more cores would be faster. They were CPU limited on an overclocked 4960X, which is no slouch, that was very surprising!
  • rhysiam - Wednesday, February 24, 2016

    I agree that will be very interesting. I'm surprised more hasn't been made of the seemingly pretty hard CPU limit to ~70fps, irrespective of the detail settings or resolution. And that on a still very capable 4960X @ 4.2GHz. If we estimate Skylake has a 20% IPC advantage, that would still see the current top tier 6700K (at stock) maxing out in the mid 80s, a long way short of what you might like on a 144Hz monitor. Does that mean a brand new quad core CPU like the i5 6400 with its low base clock might struggle to sustain 60fps, even on lower detail settings?

    I realise this is beta and all preliminary, but it's interesting nonetheless.
  • DanNeely - Wednesday, February 24, 2016

    Does DX12 Multi-adapter offer any benefits with cards that are mismatched in performance? I'm currently running a GTX 980 in my main PC and also have an older GTX 770 sitting around; would pairing them offer any speedup over just the 980, or would the faster card end up held back by the slower one?

    I'd be equally interested in seeing how AMD does with significantly mismatched GPUs, since they've been trying (with varying degrees of success) to push XFire between their IGPs and the significantly faster chips in midrange Radeon cards.
  • BigLan - Wednesday, February 24, 2016

    The article has a quote from the developer about using mismatched cards...
    "For example, you will never get more than twice the speed of the slowest video card. You would be better off just using the new card alone."

    You might get some benefit, but likely not that much.
  • Friendly0Fire - Wednesday, February 24, 2016

    I think that's rather narrow-minded and way too absolute. Mismatched cards can be used to their full potential, but you'd need some smart coding to make it so. For instance, you could offload some of the work to the weaker GPU, keeping the stronger one for the main rendering.

    One excellent example which would fully utilize two mismatched cards is VR: multiadapter rendering would be used to offload the VR projection and transformation steps to the integrated GPU in most modern CPUs, while the main GPU would do the regular rendering. The data transfer requirement is minimal, but there's a fair amount of computations required, making it an ideal scenario.

    Other examples include doing post-processing on the weaker card (SSAO, subsurface scattering, screenspace reflections, etc.). The big problem is judging just how much work should be offloaded to the secondary GPU - just detecting the hardware would be extremely laborious.
  • Ryan Smith - Wednesday, February 24, 2016

    It's a correct description for how Ashes works. They implement a (relatively) straightforward AFR setup, so the cards need to be similar in performance.
  • Senti - Wednesday, February 24, 2016

    What Multi-adapter does is left completely to the developer. In some cases it can give you nothing; in others every bit of hardware can be useful, including the iGPU.
  • extide - Wednesday, February 24, 2016

    Their current implementation is AFR, so the performance of the cards should be as close to identical as possible. In the future I think they may plan on offloading some of the raw compute onto a second GPU, and in that case an older slower GPU would be beneficial.
  • Drumsticks - Wednesday, February 24, 2016

    These are always interesting results to see. I'm pretty excited for Polaris - I can't wait to pick up a higher end GPU to replace my old, old 7850.
  • mattevansc3 - Wednesday, February 24, 2016

    Isn't Oxide's statement that they don't optimise for certain hardware a bit disingenuous?

    If you read their developer diaries not only was AoS built around Mantle, not only was the engine built upon Mantle but they've stated that they developed more of Mantle than AMD did.

    Before DX12 was even announced Oxide were working directly with AMD and building AoS to champion Mantle and take advantage of it at a low level while only supporting nVidia hardware on DX11. That of course will automatically bias results in favour of RTG even if there is no intention to do so at this stage.
