Ashes GPU Performance: Single & Mixed High-End GPUs

Since seeing is believing, we’ll start things off with a look at some directly captured recordings of Ashes running the built-in benchmark. These recordings showcase a Radeon R9 Fury X and a GeForce GTX 980 Ti in AFR mode, with each video swapping the primary and secondary video card. Both videos were captured with our 2560x1440 settings and with v-sync on, though YouTube currently limits 60fps videos to 1080p.

Overall you’d be hard-pressed to find a difference between the two videos. No matter which GPU is primary, both setups render correctly and without issue, showing that DirectX 12 explicit multi-adapter, and Ashes’ AFR implementation built on top of it, are working as expected.

Diving into our results then, let’s start with performance at 2560x1440.

Ashes of the Singularity (Alpha) - 2560x1440 - High Quality - 2x MSAA

It’s interesting to note that when picking these settings, we chose the settings first and the cards second. So the fact that the GTX 980 Ti and R9 Fury X end up so close in average performance comes as a pleasant surprise. Since AFR performance gains depend in part on how similar the two cards are, this pairing should give us a better look at multi-adapter performance than a pair of cards that differ widely in performance.

In any case, what we find is that in a single-card setup the GTX 980 Ti and R9 Fury X are within 5% of each other with the older driver sets we needed to use for AFR compatibility. Both AMD and NVIDIA do see some performance gains with newer driver sets, with NVIDIA picking up 8% and AMD picking up 5%.

But more importantly, let’s talk about multi-GPU setups. If everything is working correctly and there are no unexpected bottlenecks, then on paper a mixed GPU setup should deliver similar results no matter which card is the primary. And indeed that’s exactly what we find here, with only 1.4fps (2%) separating the two GeForce + Radeon setups. Using the R9 Fury X as the primary card gets the best results at 70.8fps, while swapping the order to let the GTX 980 Ti lead gives us 69.4fps.

However, would you believe that the mixed GPU setups are faster than the homogeneous setups? Trailing the mixed setups is the R9 Fury X + R9 Fury setup, averaging 67.1fps and trailing the slower mixed setup by 3.5%. Slower still, and unexpectedly so, is the GTX 980 Ti + GTX Titan X setup, which averages just 61.6fps, some 12% slower than the GTX 980 Ti + R9 Fury X setup. The card setup is admittedly somewhat unusual here. In order to consistently use the GTX 980 Ti as the primary card we had to make the secondary card a GTX Titan X, seeing as we don’t have another GTX 980 Ti or a third-tier GM200 card comparable to the R9 Fury. But even so, the only impact here should be that the GTX Titan X doesn’t get to stretch its legs quite as much, since it needs to wait on the slightly slower GTX 980 Ti primary in order to stay in sync.
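Why a faster secondary card can't help much is easiest to see with an idealized model. The sketch below is our own simplification, not Ashes’ actual scheduler: it assumes strict frame alternation between two GPUs and zero copy or synchronization overhead, in which case the pair can never outrun twice the frame rate of the slower card. The function names and example frame rates are hypothetical.

```python
def afr_ceiling_fps(fps_a: float, fps_b: float) -> float:
    """Idealized upper bound for two-GPU AFR throughput (hypothetical model).

    Assumes strict alternation (each GPU renders every other frame) and no
    transfer or sync overhead. Each card must deliver half of all frames,
    so the pair is limited to twice the frame rate of the slower card.
    """
    return 2.0 * min(fps_a, fps_b)


def afr_efficiency(measured_fps: float, fps_a: float, fps_b: float) -> float:
    """Fraction of the idealized AFR ceiling that a measured result reaches."""
    return measured_fps / afr_ceiling_fps(fps_a, fps_b)


# Hypothetical cards: 40fps and 44fps alone -> 80fps ceiling.
print(afr_ceiling_fps(40.0, 44.0))
print(afr_efficiency(60.0, 40.0, 44.0))
```

Under this model the faster card's extra headroom simply goes unused while it waits for its partner, which is why closely matched cards are the better case for AFR.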

Ashes of the Singularity (Alpha) - 2560x1440 - Multi-GPU Perf. Gains

Looking at the results in terms of multi-GPU performance gains, the extra performance we see here isn’t going to be chart-topping (an optimized SLI/CF setup can get better than 80% gains), but overall the data confirms our earlier raw results: we’re seeing a significant uptick in performance with the mixed GPU setups. R9 Fury X + GTX 980 Ti is some 75% faster than a single R9 Fury X, while GTX 980 Ti + R9 Fury X is 64% faster than a single GTX 980 Ti. Meanwhile the dual AMD setup sees a 66% performance gain, followed by the dual NVIDIA setup at only 46%.
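The gain figures above are ratios against the single-card result of the primary GPU, expressed as a percentage. A minimal helper (the name is hypothetical) for reproducing that kind of figure:

```python
def multi_gpu_gain(multi_fps: float, single_fps: float) -> float:
    """Percentage by which a multi-GPU result exceeds a reference result."""
    return (multi_fps / single_fps - 1.0) * 100.0


# "75% faster" means the multi-GPU result is 1.75x the single card.
print(multi_gpu_gain(175.0, 100.0))

# The same formula gives the gap between the two mixed setups at 2560x1440.
print(round(multi_gpu_gain(70.8, 69.4), 1))
```

Plugging in the two mixed-GPU results, 70.8fps and 69.4fps, gives about 2.0%, matching the gap quoted earlier.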

The most surprising thing about all of this is that the greatest gains come from the mixed GPU setups. It’s not immediately clear why this is: whether there’s something more efficient about having each vendor and its drivers operate one GPU instead of two, or whether AMD and NVIDIA are simply more compatible than either company cares to admit. Either way, this shows that even with Ashes’ basic AFR implementation, multi-adapter rendering is working, and working well. Meanwhile the one outlier, as we briefly discussed before, is the dual NVIDIA setup, which just doesn’t scale quite as well.

Finally, let’s take a quick look at the GPU utilization statistics from MSI Afterburner for our top-performing R9 Fury X + GTX 980 Ti setup. Here we can see the R9 Fury X, in its primary card role, operate at near-100% load the entire time, while the GTX 980 Ti secondary card averages close to 90% utilization. The difference, as best we can tell, is that the secondary card has to wait on additional synchronization information from the primary card, while the primary card is always either rendering or reading in a frame from the secondary card.
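To illustrate the primary/secondary split, here is a minimal sketch of how a two-GPU AFR loop might assign frames. It assumes the primary card renders even-numbered frames and is the only card driving the display, so every frame rendered on the secondary must be copied to the primary before presentation. The names are hypothetical and this is not Ashes’ real implementation.

```python
from dataclasses import dataclass


@dataclass
class FramePlan:
    frame: int
    render_gpu: str
    # True when the frame must be copied to the primary before it is shown.
    needs_copy_to_primary: bool


def plan_afr(num_frames: int, primary: str, secondary: str) -> list[FramePlan]:
    """Assign frames alternately to two GPUs (hypothetical AFR sketch).

    Assumption: even frames render on the primary, odd frames on the
    secondary, and only the primary is attached to the display.
    """
    plans = []
    for i in range(num_frames):
        on_secondary = i % 2 == 1
        gpu = secondary if on_secondary else primary
        plans.append(FramePlan(i, gpu, needs_copy_to_primary=on_secondary))
    return plans


for p in plan_afr(4, "R9 Fury X", "GTX 980 Ti"):
    print(p.frame, p.render_gpu, "copy to primary" if p.needs_copy_to_primary else "local")
```

In this model the primary has two jobs, rendering its own frames and reading in the secondary’s, while the secondary only renders and then hands off its result, which is consistent with the utilization gap Afterburner shows.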

3840x2160

Now we’ll kick things up a notch by increasing the resolution to 3840x2160.

Ashes of the Singularity (Alpha) - 3840x2160 - High Quality - 2x MSAA

Despite the fact that this was just a resolution increase, the performance landscape has shifted more than we would expect. The top configuration is still a mixed GPU configuration, with the R9 Fury X + GTX 980 Ti setup taking the top spot. However, the inverse GTX 980 Ti + R9 Fury X configuration isn’t neck-and-neck this time; rather, it’s some 15% slower. Meanwhile the dual NVIDIA setup, pokey at 2560x1440, is now in second place by a hair, trailing the top mixed GPU setup by 10%. Following that, just 0.1fps lower, is the dual AMD setup at 46.7fps.

These tests were run multiple times and we consistently get these results, so we are not looking at a fluke. At 3840x2160 the fastest setup, by a respectable 10.5% margin, is the R9 Fury X + GTX 980 Ti, which for an experimental implementation of DX12 unlinked explicit multi-adapter is quite astonishing. “Why not both” may be more than just a meme here…

From a technical perspective, the wider divergence between the mixed GPU setups is unexpected, but not necessarily irrational. If the R9 Fury X is better at reading shared resources than the GTX 980 Ti, be it due to hardware differences, driver differences, or both, then that would explain what we’re seeing. At the same time, we can’t rule out that this is simply an artifact of an early tech demo of this functionality.

Ashes of the Singularity (Alpha) - 3840x2160 - Multi-GPU Perf. Gains

As you’d expect from those raw numbers, the best performance gains come from the R9 Fury X + GTX 980 Ti, which picks up 66% over a single card. After that the dual AMD setup picks up 50%, the dual NVIDIA setup a more respectable 46%, and finally the mixed GTX 980 Ti + R9 Fury X setup tops out at a 40% performance gain.

Finally, taking a quick look at GPU frametimes as reported by Ashes’ internal performance counters, we can see that all of the multi-GPU setups show a consistent amount of frame-to-frame variance. Overall there’s at least a few milliseconds’ difference between alternating frames, with the dual NVIDIA setup faring the best and the dual AMD setup faring the worst. Meanwhile the performance leader, the R9 Fury X + GTX 980 Ti, averages only a bit more variance than the dual NVIDIA setup. At least in this one instance, there’s some evidence to suggest that NVIDIA secondary cards have a harder time supplying frame data to the primary card (regardless of its make), while NVIDIA and AMD primary cards are similar in read performance.
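One simple way to quantify alternating-frame variance is to compare the average frametime of even-numbered frames with that of odd-numbered frames, since under two-GPU AFR they come from different cards. This is a sketch under that assumption; the function name and sample frametimes are hypothetical, not output from Ashes’ own counters.

```python
from statistics import mean


def afr_alternation_gap_ms(frametimes_ms: list[float]) -> float:
    """Absolute gap between mean even-frame and mean odd-frame times.

    Assumes two-GPU AFR, where even and odd frames are rendered by
    different cards; a persistent gap indicates alternating-frame variance.
    """
    even = frametimes_ms[0::2]
    odd = frametimes_ms[1::2]
    if not even or not odd:
        raise ValueError("need at least two frametimes")
    return abs(mean(even) - mean(odd))


# Hypothetical trace where every other frame takes about 2.5ms longer.
print(afr_alternation_gap_ms([20.0, 23.0, 20.5, 22.5]))
```

A gap of a few milliseconds shows up directly as a difference between the two means, even when the overall average frame rate looks healthy.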

180 Comments

  • jimjamjamie - Tuesday, October 27, 2015

    [pizza-making intensifies]
  • geniekid - Monday, October 26, 2015

    On one hand the idea of unlinked EMA is awesome. On the other hand, I have to believe 95% of developers will shy away from implementing anything other than AFR in their game due to the sheer amount of effort the complexity would add to their QA/debugging process. If Epic manages to pull off their post-processing offloading I would be very impressed.
  • DanNeely - Monday, October 26, 2015

    I'd guess it'd be the other way around. SLI/XFire AFR is complicated enough that it's normally only done for big budget AAA games. Other than replacing two vendor APIs with a single OS API, DX12 doesn't seem to offer a whole lot of help there, so I don't expect to see a lot change.

    Handing off the tail end of every frame seems simpler, especially since the frame pacing difficulties that make AFR so hard and require a large amount of per-game work won't be a factor. This sounds like something that could be baked into the engines themselves, and that shouldn't require a lot of extra work on the game devs' part. Even if it ends up only being a modest gain for those of us with mid/high end GPUs, it seems like it could end up being an almost free gift.
  • nightbringer57 - Monday, October 26, 2015

    That's only half relevant.
    I wonder how much can be implemented at the engine level. This kind of thing may be at least partially transparent to devs if, say, Unreal Engine and Unity get compatibility for it... I don't know how much it can do, though.
  • andrewaggb - Monday, October 26, 2015

    Agreed, I would hope that if the Unreal Engine, Unity, Frostbite etc support it that maybe 50% or more of new games will support it.

    We'll have to see though. The idea of having both an AMD and Nvidia card in the same machine is both appealing and terrifying. Occasionally games work better on one than the other, so you might avoid some pain sometimes, but I'm sure you'd get a whole new set of problems sometimes as well.

    I think making use of the iGPU and discrete cards is probably the better scenario to optimize for. (Like Epic is apparently doing)
  • Gigaplex - Monday, October 26, 2015

    Problems such as NVIDIA intentionally disabling PhysX when an AMD GPU is detected in the system, even if it's not actively being used.
  • Friendly0Fire - Monday, October 26, 2015

    It really depends on a lot of factors I think, namely how complex the API ends up being.

    For instance, I could really see shadow rendering being offloaded to one GPU. There's minimal crosstalk between the two GPUs, the shadow renderer only needs geometry and camera information (quick to transfer/update) and only outputs a single frame buffer (also very quick to transfer), yet the process of shadow rendering is slow and complex and requires extremely high bandwidth internally, so it'd be a great candidate for splitting off.

    Then you can also split off the post-processing to the iGPU and you've suddenly shaved maybe 6-8ms off your frame time.
  • Oogle - Monday, October 26, 2015

    Yikes. Just one more exponential factor to add when doing benchmarks. More choice is great for us consumers, but reviews and comparisons are going to start looking more complicated. I'll be interested to see how you guys will make recommendations when it comes to multi-GPU setups.
  • tipoo - Monday, October 26, 2015

    Wow, seems like a bigger boost than I had anticipated. Will be nice to see all that unused silicon (in dGPU environments) getting used.
  • gamerk2 - Monday, October 26, 2015

    As this test is a smaller number of combinations it’s not clear where the bottlenecks are, but it’s nonetheless very interesting how we get such widely different results depending on which card is in the lead. In the GTX 680 + HD 7970 setup, either the GTX 680 is a bad leader or the HD 7970 is a bad follower, and this leads to this setup spinning its proverbial wheels. Otherwise letting the HD 7970 lead and GTX 680 follow sees a bigger performance gain than we would have expected for a moderately unbalanced setup with a pair of cards that were never known for their efficient PCIe data transfers. So long as you let the HD 7970 lead, at least in this case you could absolutely get away with a mixed GPU pairing of older GPUs.


    Drivers. Pretty much that simple. Odds are, the NVIDIA drivers are treating the HD 7970 the same way they treat the GTX 680, which will result in performance problems. AMD and NVIDIA use very different GPU architectures, and you're seeing it here. NVIDIA is probably attempting to utilize the 7970 in a way it just can't handle.

    I'd be very interested to see something like 680/Titan, or some form of lower/newer setup, which is what most people would actually use this for (GPU upgrade).
