The Performance Impact of Asynchronous Shading

Finally, let’s take a look at Ashes’ latest addition to its stable of DX12 headlining features; asynchronous shading/compute. While earlier betas of the game implemented a very limited form of async shading, this latest beta contains a newer, more complex implementation of the technology, inspired in part by Oxide’s experiences with multi-GPU. As a result, async shading will potentially have a greater impact on performance than in earlier betas.

Update 02/24: NVIDIA sent a note over this afternoon letting us know that asynchornous shading is not enabled in their current drivers, hence the performance we are seeing here. Unfortunately they are not providing an ETA for when this feature will be enabled.

Ashes of the Singularity (Beta) - High Quality - Async Shader Performance

Since async shading is turned on by default in Ashes, what we’re essentially doing here is measuring the penalty for turning it off. Not unlike the DirectX 12 vs. DirectX 11 situation – and possibly even contributing to it – what we find depends heavily on the GPU vendor.

Ashes of the Singularity (Beta) - High Quality - Async Shading Perf. Gain

All NVIDIA cards suffer a minor regression in performance with async shading turned on. At a maximum of -4% it’s really not enough to justify disabling async shading, but at the same time it means that async shading is not providing NVIDIA with any benefit. With RTG cards on the other hand it’s almost always beneficial, with the benefit increasing with the overall performance of the card. In the case of the Fury X this means a 10% gain at 1440p, and though not plotted here, a similar gain at 4K.

These findings do go hand-in-hand with some of the basic performance goals of async shading, primarily that async shading can improve GPU utilization. At 4096 stream processors the Fury X has the most ALUs out of any card on these charts, and given its performance in other games, the numbers we see here lend credit to the theory that RTG isn’t always able to reach full utilization of those ALUs, particularly on Ashes. In which case async shading could be a big benefit going forward.

As for the NVIDIA cards, that’s a harder read. Is it that NVIDIA already has good ALU utilization? Or is it that their architectures can’t do enough with asynchronous execution to offset the scheduling penalty for using it? Either way, when it comes to Ashes NVIDIA isn’t gaining anything from async shading at this time.

Ashes of the Singularity (Beta) - Extreme Quality - Async Shading Perf. Gain

Meanwhile pushing our fastest GPUs to their limit at Extreme quality only widens the gap. At 4K the Fury X picks up nearly 20% from async shading – though a much smaller 6% at 1440p – while the GTX 980 Ti continues to lose a couple of percent from enabling it. This outcome is somewhat surprising since at 4K we’d already expect the Fury X to be rather taxed, but clearly there’s quite a bit of shader headroom left unused.

DirectX 12 vs. DirectX 11 Closing Thoughts
Comments Locked

153 Comments

View All Comments

  • Koenig168 - Wednesday, February 24, 2016 - link

    There is a brief mention of GTX 680 2GB "CPU memory limitations". I take it you mean "VRAM memory limitations". It would be interesting to know if this can be overcome by DX12 memory stacking, either a pair of GTX 680s or the GTX 690.
  • Ryan Smith - Wednesday, February 24, 2016 - link

    That was meant to be "GPU memory limitations", thanks for the catch.
  • B3an - Wednesday, February 24, 2016 - link

    Why is Beta 2 still not available on Steam? Have the media got early access? At the time of posting this there's still only Beta 1 available.
  • Ryan Smith - Wednesday, February 24, 2016 - link

    It's out to the public tomorrow.
  • hemipepsis5p - Wednesday, February 24, 2016 - link

    Hey, so I'm confused by the mixed GPU testing. I thought that both cards had to be the same in order to run them in SLI/Crossfire? How did they test a Fury X + 980Ti?
  • Ext3h - Wednesday, February 24, 2016 - link

    That's no longer the case with DX12. It used to be like this with DX11 and earlier versions, when the driver decided if/how to split the workload onto multiple GPUs, but with DX12 that choice is now up to the application.

    So if the developer chooses to support asymmetric configurations, even cross vendor or exotic combinations like Intel IGP + AMD dGPU, then it can be made to work.
  • anubis44 - Thursday, February 25, 2016 - link

    I'm willing to bet that nVidia's Maxwell cards can't use DX12's async compute at all, and they're falling back to the DX11 code path, even when you 'enable' DX12 for them.
  • Ext3h - Thursday, February 25, 2016 - link

    You loose that bet.

    The asynchronous compute term only defines how tasks are synchronized against each other, whereby the "asynchronous" term only states tasks won't block while waiting for each other. The default of doing that in software, in order to create a sequential schedule, is perfectly legit and fulfills the specification in whole.

    Hardware support isn't required for this feature at all, even though you *can* optionally use hardware to perform much better than the software solution. Parallel execution does require hardware support and can bring an huge performance boost, but "asynchronous compute" does not specify that parallel execution would be required.
  • BradGrenz - Thursday, February 25, 2016 - link

    The whole point of async compute is to take advantage of parallel execution. It doesn't matter what nVidia's drivers tell an application, if it accepts these commands but is forced to reorder them for serial execution because the hardware can do nothing else then it doesn't really support the technology at all. It's be like claiming support for texture compression even though your driver has to decompress every texture to an uncompressed format before the GPU can read it. It doesn't matter if the application thinks compressed textures are being used if the hardware actually provides none of the benefits the technology intended (in this case more/larger textures in a given amount of VRAM, and in the case of async compute, more efficient utilization of shader ALUs).
  • Sajin - Thursday, February 25, 2016 - link

    "Update 02/24: NVIDIA sent a note over this afternoon letting us know that asynchornous shading is not enabled in their current drivers, hence the performance we are seeing here. Unfortunately they are not providing an ETA for when this feature will be enabled."

    Source: http://www.anandtech.com/show/10067/ashes-of-the-s...

Log in

Don't have an account? Sign up now