DirectX 12 Multi-GPU Performance

Shifting gears, let’s take a look at multi-GPU performance on the latest Ashes beta. The focus of our previous article, Ashes’ support for DX12 explicit multi-GPU makes it the first game to support the ability to pair up RTG and NVIDIA GPUs in an AFR setup. Like traditional same-vendor AFR configurations, Ashes’ AFR setup works best when both GPUs are similar in performance, so although this technology does allow for some unusual cross-vendor comparisons, it does not (yet) benefit from pairing up GPUs that differ widely in performance, such as a last-generation video card with a current-generation video card. Nonetheless, running a Radeon and a GeForce card together is an interesting sight, if only for the sheer audacity of it.
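
To see why AFR favors evenly matched cards, here is a minimal sketch of an idealized alternate-frame rendering model. It assumes strict frame alternation with even pacing and no synchronization or transfer overhead; the function name `afr_fps` and the frame rates are hypothetical illustrations, not measurements from Ashes.

```python
def afr_fps(gpu_a_fps: float, gpu_b_fps: float) -> float:
    """Idealized AFR throughput for two GPUs that strictly alternate frames.

    Each GPU renders every other frame, so the pair can go no faster than
    twice the rate of the slower card (assumption: no sync/transfer cost).
    """
    return 2.0 * min(gpu_a_fps, gpu_b_fps)


# Hypothetical single-GPU frame rates, for illustration only.
matched = afr_fps(40.0, 40.0)      # two similar cards
mismatched = afr_fps(40.0, 20.0)   # fast card paired with a much slower one

print(matched)     # 80.0
print(mismatched)  # 40.0, no better than the fast card on its own
```

Under these assumptions the mismatched pair gains nothing over the faster card alone, which is the behavior described above for widely differing GPUs.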

Meanwhile, the significant performance optimizations between the last beta build and this latest build have had an equally significant knock-on effect on multi-GPU performance compared to the last time we looked at the game.

Ashes of the Singularity (Beta) - 3840x2160 - High Quality - MGPU

Even at 4K a pair of GPUs ends up being almost too much at Ashes’ High quality setting. All four multi-GPU configurations are over 60fps, with the fastest Fury X + 980 Ti configuration nudging past 70fps. Meanwhile the lead over our two fastest single-GPU configurations is not especially great, particularly compared to the Fury X, with the Fury X + 980 Ti configuration only coming in 15fps (27%) faster than a single GPU. The all-NVIDIA comparison does fare better in this regard, but only because of GTX 980 Ti’s lower initial performance.
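
As a quick sanity check on those figures, the absolute and relative leads quoted above pin down the approximate frame rates involved. This is only a back-of-the-envelope sketch: the inputs are the rounded 15fps and 27% values from the text, and the helper name `implied_rates` is hypothetical.

```python
def implied_rates(gain_fps: float, gain_pct: float) -> tuple[float, float]:
    """Recover (single_fps, multi_fps) from an absolute and a relative lead.

    gain_fps = multi - single, and gain_pct / 100 = gain_fps / single.
    """
    single_fps = gain_fps / (gain_pct / 100.0)
    multi_fps = single_fps + gain_fps
    return single_fps, multi_fps


# Rounded values quoted in the text, so the results are approximate.
single_fps, multi_fps = implied_rates(15.0, 27.0)

print(round(single_fps, 1))
print(round(multi_fps, 1))
```

The implied multi-GPU figure lands just above 70fps, consistent with the fastest configuration "nudging past 70fps".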

Digging deeper, what we find is that even at 4K we’re actually CPU limited according to the benchmark data. Across all four multi-GPU configurations, our hex-core overclocked Core i7-4960X can only set up frames at roughly 70fps, versus 100fps+ for a single-GPU configuration.


Top: Fury X. Bottom: Fury X + 980 Ti

The increased CPU load from utilizing multi-GPU is to be expected, as the CPU now needs to spend time synchronizing the GPUs and waiting on them to transfer data between each other. However, dropping to 70fps means that Ashes has become a surprisingly heavy CPU test as well, and that 4K at High quality alone isn’t enough to max out our dual-GPU configurations.
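
The CPU cap described above can be pictured with a simple bottleneck model, in which the delivered frame rate is the lower of what the CPU can set up and what the GPUs can render. This is a simplified sketch, not how the Ashes benchmark computes its numbers; the CPU caps are the rough figures from the text, while the GPU throughput values and function names are hypothetical.

```python
def delivered_fps(cpu_fps: float, gpu_fps: float) -> float:
    """Frame rate under a simple bottleneck model: the slower stage wins."""
    return min(cpu_fps, gpu_fps)


def limiting_factor(cpu_fps: float, gpu_fps: float) -> str:
    """Name the stage that caps the frame rate."""
    return "CPU" if cpu_fps <= gpu_fps else "GPU"


CPU_CAP_MULTI = 70.0    # from the text: roughly 70fps with two GPUs
CPU_CAP_SINGLE = 100.0  # from the text: 100fps+ with a single GPU

# Hypothetical GPU-side throughput, for illustration only.
gpu_pair_light_load = 90.0  # dual GPUs with headroom to spare
gpu_pair_heavy_load = 58.0  # dual GPUs under a heavier quality setting

print(delivered_fps(CPU_CAP_MULTI, gpu_pair_light_load),
      limiting_factor(CPU_CAP_MULTI, gpu_pair_light_load))   # 70.0 CPU
print(delivered_fps(CPU_CAP_MULTI, gpu_pair_heavy_load),
      limiting_factor(CPU_CAP_MULTI, gpu_pair_heavy_load))   # 58.0 GPU
```

In this model, adding GPU throughput beyond the CPU cap buys nothing, and only a heavier GPU workload moves the limit back to the graphics cards.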

Ashes of the Singularity (Beta) - 3840x2160 - Extreme Quality - MGPU

Cranking up the quality setting to Extreme finally gives our dual-GPU configurations enough of a workload to back off from the CPU performance cap. Once again the fastest configuration is the Fury X + 980 Ti, which lands just short of 60fps, followed by the Fury X + Fury configuration at 55.1fps. In our first look at Ashes multi-GPU scaling we found that having a Fury X card as the lead card resulted in better performance, and this has not changed for the newest beta. The Fury continues to be faster at reading data off of other cards. Still, the gap between the Fury X + 980 Ti configuration and the 980 Ti + Fury X configuration has closed some as compared to last time, and now stands at 11%.

Backing off from the CPU limit has also put the multi-GPU configurations well ahead of the single-GPU configurations. We’re now looking at upwards of a 65% performance boost versus a single GTX 980 Ti, and a smaller 31% performance boost versus a single Fury X. These are smaller gains for multi-GPU configurations than we first saw last year, but it’s also very much a consequence of Ashes’ improved performance across the board. Though we didn’t have time to test it, Ashes does have one higher quality setting – Crazy – which may drive a bit of a larger wedge between the multi-GPU configurations and the Fury X, though the overhead of synchronization will always present a roadblock.


153 Comments


  • Koenig168 - Wednesday, February 24, 2016 - link

    There is a brief mention of GTX 680 2GB "CPU memory limitations". I take it you mean "VRAM memory limitations". It would be interesting to know if this can be overcome by DX12 memory stacking, either a pair of GTX 680s or the GTX 690.
  • Ryan Smith - Wednesday, February 24, 2016 - link

    That was meant to be "GPU memory limitations", thanks for the catch.
  • B3an - Wednesday, February 24, 2016 - link

    Why is Beta 2 still not available on Steam? Have the media got early access? At the time of posting this there's still only Beta 1 available.
  • Ryan Smith - Wednesday, February 24, 2016 - link

    It's out to the public tomorrow.
  • hemipepsis5p - Wednesday, February 24, 2016 - link

    Hey, so I'm confused by the mixed GPU testing. I thought that both cards had to be the same in order to run them in SLI/Crossfire? How did they test a Fury X + 980Ti?
  • Ext3h - Wednesday, February 24, 2016 - link

    That's no longer the case with DX12. It used to be like this with DX11 and earlier versions, when the driver decided if/how to split the workload onto multiple GPUs, but with DX12 that choice is now up to the application.

    So if the developer chooses to support asymmetric configurations, even cross vendor or exotic combinations like Intel IGP + AMD dGPU, then it can be made to work.
  • anubis44 - Thursday, February 25, 2016 - link

    I'm willing to bet that nVidia's Maxwell cards can't use DX12's async compute at all, and they're falling back to the DX11 code path, even when you 'enable' DX12 for them.
  • Ext3h - Thursday, February 25, 2016 - link

    You lose that bet.

    The asynchronous compute term only defines how tasks are synchronized against each other, whereby the "asynchronous" term only states tasks won't block while waiting for each other. The default of doing that in software, in order to create a sequential schedule, is perfectly legit and fulfills the specification in whole.

    Hardware support isn't required for this feature at all, even though you *can* optionally use hardware to perform much better than the software solution. Parallel execution does require hardware support and can bring a huge performance boost, but "asynchronous compute" does not specify that parallel execution would be required.
  • BradGrenz - Thursday, February 25, 2016 - link

    The whole point of async compute is to take advantage of parallel execution. It doesn't matter what nVidia's drivers tell an application; if the driver accepts these commands but is forced to reorder them for serial execution because the hardware can do nothing else, then it doesn't really support the technology at all. It'd be like claiming support for texture compression even though your driver has to decompress every texture to an uncompressed format before the GPU can read it. It doesn't matter if the application thinks compressed textures are being used if the hardware actually provides none of the benefits the technology intended (in this case more/larger textures in a given amount of VRAM, and in the case of async compute, more efficient utilization of shader ALUs).
  • Sajin - Thursday, February 25, 2016 - link

    "Update 02/24: NVIDIA sent a note over this afternoon letting us know that asynchronous shading is not enabled in their current drivers, hence the performance we are seeing here. Unfortunately they are not providing an ETA for when this feature will be enabled."

    Source: http://www.anandtech.com/show/10067/ashes-of-the-s...
