The Performance Impact of Asynchronous Shading

Finally, let’s take a look at Ashes’ latest addition to its stable of DX12 headlining features; asynchronous shading/compute. While earlier betas of the game implemented a very limited form of async shading, this latest beta contains a newer, more complex implementation of the technology, inspired in part by Oxide’s experiences with multi-GPU. As a result, async shading will potentially have a greater impact on performance than in earlier betas.

Update 02/24: NVIDIA sent a note over this afternoon letting us know that asynchornous shading is not enabled in their current drivers, hence the performance we are seeing here. Unfortunately they are not providing an ETA for when this feature will be enabled.

Ashes of the Singularity (Beta) - High Quality - Async Shader Performance

Since async shading is turned on by default in Ashes, what we’re essentially doing here is measuring the penalty for turning it off. Not unlike the DirectX 12 vs. DirectX 11 situation – and possibly even contributing to it – what we find depends heavily on the GPU vendor.

Ashes of the Singularity (Beta) - High Quality - Async Shading Perf. Gain

All NVIDIA cards suffer a minor regression in performance with async shading turned on. At a maximum of -4% it’s really not enough to justify disabling async shading, but at the same time it means that async shading is not providing NVIDIA with any benefit. With RTG cards on the other hand it’s almost always beneficial, with the benefit increasing with the overall performance of the card. In the case of the Fury X this means a 10% gain at 1440p, and though not plotted here, a similar gain at 4K.

These findings do go hand-in-hand with some of the basic performance goals of async shading, primarily that async shading can improve GPU utilization. At 4096 stream processors the Fury X has the most ALUs out of any card on these charts, and given its performance in other games, the numbers we see here lend credit to the theory that RTG isn’t always able to reach full utilization of those ALUs, particularly on Ashes. In which case async shading could be a big benefit going forward.

As for the NVIDIA cards, that’s a harder read. Is it that NVIDIA already has good ALU utilization? Or is it that their architectures can’t do enough with asynchronous execution to offset the scheduling penalty for using it? Either way, when it comes to Ashes NVIDIA isn’t gaining anything from async shading at this time.

Ashes of the Singularity (Beta) - Extreme Quality - Async Shading Perf. Gain

Meanwhile pushing our fastest GPUs to their limit at Extreme quality only widens the gap. At 4K the Fury X picks up nearly 20% from async shading – though a much smaller 6% at 1440p – while the GTX 980 Ti continues to lose a couple of percent from enabling it. This outcome is somewhat surprising since at 4K we’d already expect the Fury X to be rather taxed, but clearly there’s quite a bit of shader headroom left unused.

DirectX 12 vs. DirectX 11 Closing Thoughts
POST A COMMENT

153 Comments

View All Comments

  • extide - Wednesday, February 24, 2016 - link

    If you are CPU limited, and it's using lots of threads, then yeah more cores would be faster. They were CPU limited on an overclocked 4960X, which is no slouch, that was very surprising! Reply
  • rhysiam - Wednesday, February 24, 2016 - link

    I agree that will be very interesting. I'm surprised more hasn't been made of the seemingly pretty hard CPU limit to ~70fps, irrespective of the detail settings or resolution. And that on a still very capable 4960X @ 4.2Ghz. If we estimate Skylake has a 20% IPC advantage, that would still see the current top tier 6700K (at stock) maxing out in the mid 80s, a long way short of what you might like on a 144hz monitor. Does that mean a brand new quad core CPU like the i5 6400 with its low base clock might struggle to sustain 60fps, even on lower detail settings?

    I realise this is beta and all preliminary, but it's interesting nonetheless.
    Reply
  • DanNeely - Wednesday, February 24, 2016 - link

    Does DX12 Multi-adapter offer any benefits with cards that are mismatched in performance? I'm currently running a GTX 980 in my main PC and also have an older GTX 770 sitting around; would pairing them offer any speedup over just the 980, or would the faster card end up held back by the slower one?

    I'd be equally interested in seeing how AMD does with significantly mismatched GPUs; since they've been trying (with varying degrees of success) to push XFire between their IGPs and the significantly faster chips in midrange Radeon cards.
    Reply
  • BigLan - Wednesday, February 24, 2016 - link

    The article has a quote from the developer about using mismatched cards...
    "For example, you will never get more than twice the speed of the slowest video card. You would be better off just using the new card alone."

    You might get some benefit, but likely not that much.
    Reply
  • Friendly0Fire - Wednesday, February 24, 2016 - link

    I think that's rather narrow minded and way too absolute. Mismatched cards can be used to their full potential, but you'd need some smart coding to make it so. For instance, you could offload some of the work to the weaker GPU, keeping the stronger one for the main rendering.

    One excellent example which would fully utilize two mismatched cards is VR: multiadapter rendering would be used to offload the VR projection and transformation steps to the integrated GPU in most modern CPUs, while the main GPU would do the regular rendering. The data transfer requirement is minimal, but there's a fair amount of computations required, making it an ideal scenario.

    Other examples include doing post-processing on the weaker card (SSAO, subsurface scattering, screenspace reflections, etc.). The big problem is judging just how much work should be offloaded to the secondary GPU - just detecting the hardware would be extremely laborious.
    Reply
  • Ryan Smith - Wednesday, February 24, 2016 - link

    It's a correct description for how Ashes works. They implement a (relatively) straightforward AFR setup, so the cards need to be similar in performance. Reply
  • Senti - Wednesday, February 24, 2016 - link

    What Multi-adapter does is left completely to developer. In some cases it can give you nothing, in others every bit of hardware can be useful including iGPU. Reply
  • extide - Wednesday, February 24, 2016 - link

    Their current implementation is AFR, so the performance of the cards should be as close to identical as possible. In the future I think they may plan on offloading some of the raw compute onto a second GPU, and in that case an older slower GPU would be beneficial. Reply
  • Drumsticks - Wednesday, February 24, 2016 - link

    These are always interesting results to see. I'm pretty excited for Polaris - I can't wait to pickup a higher end GPU to replace my old, old 7850. Reply
  • mattevansc3 - Wednesday, February 24, 2016 - link

    Isn't Oxide's statement that they don't optimise for certain hardware a bit disingenuous?

    If you read their developer diaries not only was AoS built around Mantle, not only was the engine built upon Mantle but they've stated that they developed more of Mantle than AMD did.

    Before DX12 was even announced Oxide were working directly with AMD and building AoS to champion Mantle and take advantage of it a low level while only supporting nVidia hardware on DX11. That of course will automatically bias results in favour of RTG even if there is no intention to do so at this stage.
    Reply

Log in

Don't have an account? Sign up now