The Performance Impact of Asynchronous Shading

Finally, let’s take a look at Ashes’ latest addition to its stable of DX12 headlining features: asynchronous shading/compute. While earlier betas of the game implemented a very limited form of async shading, this latest beta contains a newer, more complex implementation of the technology, inspired in part by Oxide’s experiences with multi-GPU. As a result, async shading will potentially have a greater impact on performance than in earlier betas.
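For those wondering what this looks like from the developer’s side, the sketch below (our own illustration, not Oxide’s code; the CreateAsyncQueues name is ours) shows the basic DirectX 12 mechanism involved: alongside the usual direct queue for graphics, the application creates a second command queue of type COMPUTE, and work submitted on that queue is eligible to execute concurrently with graphics work, subject to what the driver and hardware actually do with it.

    // A minimal sketch of creating the two queues involved in async shading.
    // Not Oxide's code; error handling is reduced to passing HRESULTs through.
    #include <windows.h>
    #include <d3d12.h>

    HRESULT CreateAsyncQueues(ID3D12Device* device,
                              ID3D12CommandQueue** graphicsQueue,
                              ID3D12CommandQueue** computeQueue)
    {
        // Standard direct queue: accepts graphics, compute, and copy commands.
        D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
        gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
        HRESULT hr = device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(graphicsQueue));
        if (FAILED(hr))
            return hr;

        // Additional compute-only queue: command lists submitted here are the
        // "async" work that may overlap with whatever the direct queue is doing.
        D3D12_COMMAND_QUEUE_DESC computeDesc = {};
        computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
        return device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(computeQueue));
    }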

Update 02/24: NVIDIA sent a note over this afternoon letting us know that asynchronous shading is not enabled in their current drivers, which explains the performance we are seeing here. Unfortunately, they are not providing an ETA for when this feature will be enabled.

Ashes of the Singularity (Beta) - High Quality - Async Shader Performance

Since async shading is turned on by default in Ashes, what we’re essentially doing here is measuring the penalty for turning it off. Not unlike the DirectX 12 vs. DirectX 11 situation – and possibly even contributing to it – what we find depends heavily on the GPU vendor.
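To be explicit about what the chart percentages represent, we read them as simple relative changes in average frame rate with async shading on versus off; the snippet below is a trivial illustration of that arithmetic, with made-up FPS figures rather than measured results.

    // Trivial illustration of how we interpret the "perf. gain" numbers:
    // the relative change in average FPS from enabling async shading.
    // The sample figures are invented for the example, not benchmark data.
    #include <cstdio>

    double AsyncGainPercent(double fpsAsyncOn, double fpsAsyncOff)
    {
        return (fpsAsyncOn - fpsAsyncOff) / fpsAsyncOff * 100.0;
    }

    int main()
    {
        // e.g. 55 fps with async shading vs. 50 fps without -> a 10% gain,
        // the size of uplift the Fury X shows at 1440p.
        std::printf("%.1f%%\n", AsyncGainPercent(55.0, 50.0));
        return 0;
    }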

Ashes of the Singularity (Beta) - High Quality - Async Shading Perf. Gain

All NVIDIA cards suffer a minor regression in performance with async shading turned on. At a maximum of -4% it’s really not enough to justify disabling async shading, but at the same time it means that async shading is not providing NVIDIA with any benefit. With RTG cards, on the other hand, it’s almost always beneficial, with the benefit increasing with the overall performance of the card. In the case of the Fury X this means a 10% gain at 1440p and, though not plotted here, a similar gain at 4K.

These findings do go hand-in-hand with some of the basic performance goals of async shading, chiefly that async shading can improve GPU utilization. With 4096 stream processors the Fury X has the most ALUs of any card on these charts, and given its performance in other games, the numbers we see here lend credence to the theory that RTG isn’t always able to reach full utilization of those ALUs, particularly in Ashes, in which case async shading could be a big benefit going forward.
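To make the utilization argument more concrete, here is a hedged sketch (our own, not Ashes’ renderer; all function and variable names are illustrative) of the submission pattern that lets compute work soak up otherwise-idle ALU time: independent compute work goes on the compute queue, graphics passes that don’t consume its results go on the direct queue and are free to overlap with it, and a GPU-side fence wait orders only the passes that actually depend on the compute output.

    // Hedged sketch of overlapping graphics and compute submissions in DX12.
    // Names are illustrative; this is not code from any shipping renderer.
    #include <windows.h>
    #include <d3d12.h>

    void SubmitFrame(ID3D12CommandQueue* graphicsQueue,
                     ID3D12CommandQueue* computeQueue,
                     ID3D12CommandList* independentGraphics,  // e.g. shadow / G-buffer passes
                     ID3D12CommandList* asyncCompute,         // e.g. lighting or post-process compute
                     ID3D12CommandList* dependentGraphics,    // passes that read the compute output
                     ID3D12Fence* computeDone,
                     UINT64& fenceValue)
    {
        // Kick off the compute work; on a GPU with spare shader capacity it can
        // run while the graphics queue is busy with the independent passes.
        computeQueue->ExecuteCommandLists(1, &asyncCompute);
        computeQueue->Signal(computeDone, ++fenceValue);

        // Graphics work with no dependency on the compute results overlaps freely.
        graphicsQueue->ExecuteCommandLists(1, &independentGraphics);

        // GPU-side wait: the graphics queue stalls here (not the CPU) until the
        // compute queue has signaled, then runs the dependent passes.
        graphicsQueue->Wait(computeDone, fenceValue);
        graphicsQueue->ExecuteCommandLists(1, &dependentGraphics);
    }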

As for the NVIDIA cards, that’s a harder read. Is it that NVIDIA already has good ALU utilization? Or is it that their architectures can’t do enough with asynchronous execution to offset the scheduling penalty for using it? Either way, when it comes to Ashes NVIDIA isn’t gaining anything from async shading at this time.

Ashes of the Singularity (Beta) - Extreme Quality - Async Shading Perf. Gain

Meanwhile pushing our fastest GPUs to their limit at Extreme quality only widens the gap. At 4K the Fury X picks up nearly 20% from async shading – though a much smaller 6% at 1440p – while the GTX 980 Ti continues to lose a couple of percent from enabling it. This outcome is somewhat surprising since at 4K we’d already expect the Fury X to be rather taxed, but clearly there’s quite a bit of shader headroom left unused.

Comments

  • CiccioB - Sunday, February 28, 2016

    The so-called async compute implementation AMD has in hardware IS NOT PART OF THE DX12 SPEC.
    I hope that's clear, written that way.

    DX12 describes the use of multiple threads in flight at the same time. NVIDIA does support them, with some limitations in number and preemption capability compared with what AMD's hardware can do.
    This, however, does not mean that NVIDIA hardware does not support async compute or that it is out of spec. AMD just made a better implementation of it.
    Think of it as it was with tessellation: NVIDIA's implementation is way better than AMD's, but the fact that AMD can't go over certain values does not mean they are not DX11 compliant.

    What you are looking at here is a benchmark (more than a game) that stresses the multi-threaded capabilities of AMD hardware. You can see that AMD is in a better position here. But the question is: how many other games are going to benefit from using such a technique, and how many of them are going to implement such a heavy-duty load?

    We just don't know now. We have to wait and see whether this technique can really improve performance (and thus image quality) in many other situations, or whether it is just a showcase for AMD (which has clearly partnered to make this feature even heavier on NVIDIA hardware).
    When NVIDIA starts getting developers to use its hardware-accelerated voxels, we will start to see which feature hits the other vendor's hardware harder, and which gives the better image quality improvements.

    For now I just think this is an overused feature that, like many other engine characteristics in DX11, is going to give an advantage to one side rather than the other.
  • anubis44 - Thursday, February 25, 2016

    That's because it never will be. You can't enable missing hardware.
  • xTRICKYxx - Wednesday, February 24, 2016

    I hate to be that guy, but I think it is time to dump the X79 platform for X99 or Z170.
  • Ryan Smith - Wednesday, February 24, 2016

    Yep, Broadwell-E is on our list of things to do once it's out.
  • Will Robinson - Wednesday, February 24, 2016

    NVidia got rekt.
    DX12 lays the smak on Chizow's green dreams.
  • Roboyt0 - Wednesday, February 24, 2016

    Do you have 3840x2160 results for the R9 290X per chance?
  • Ryan Smith - Wednesday, February 24, 2016

    No. We only ran 4K on Fury X and 980 Ti.
  • Stuka87 - Wednesday, February 24, 2016

    Really hating the colors of the graphs here. All grey, legend has one blue item, but no blue on the graph....
  • Ryan Smith - Wednesday, February 24, 2016

    It's something of a limitation of the CMS. The color bar is the average; the grey bars are in the same order as they are in the legend: normal, medium, and heavy batch counts.
  • Mr Perfect - Thursday, February 25, 2016

    I was wondering what was up with that. Maybe someone could do a little MS-Paint bucket fill on the images before publishing? :)
