The Performance Impact of Asynchronous Shading

Finally, let’s take a look at Ashes’ latest addition to its stable of DX12 headlining features: asynchronous shading/compute. While earlier betas of the game implemented a very limited form of async shading, this latest beta contains a newer, more complex implementation of the technology, inspired in part by Oxide’s experiences with multi-GPU. As a result, async shading will potentially have a greater impact on performance than in earlier betas.

Update 02/24: NVIDIA sent a note over this afternoon letting us know that asynchronous shading is not enabled in their current drivers, hence the performance we are seeing here. Unfortunately they are not providing an ETA for when this feature will be enabled.

Ashes of the Singularity (Beta) - High Quality - Async Shader Performance

Since async shading is turned on by default in Ashes, what we’re essentially doing here is measuring the penalty for turning it off. Not unlike the DirectX 12 vs. DirectX 11 situation – and possibly even contributing to it – what we find depends heavily on the GPU vendor.

Ashes of the Singularity (Beta) - High Quality - Async Shading Perf. Gain

All NVIDIA cards suffer a minor regression in performance with async shading turned on. At a maximum of -4% it’s really not enough to justify disabling async shading, but at the same time it means that async shading is not providing NVIDIA with any benefit. With RTG cards on the other hand it’s almost always beneficial, with the benefit increasing with the overall performance of the card. In the case of the Fury X this means a 10% gain at 1440p, and though not plotted here, a similar gain at 4K.
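For readers who want to reproduce the percentages in these charts from their own runs, here is a minimal sketch of the arithmetic, assuming the gain is simply the relative change in average frame rate with async shading on versus off. The function name and the frame rates below are hypothetical placeholders, not figures from our testing.

```python
def async_gain_percent(fps_async_on: float, fps_async_off: float) -> float:
    """Relative change in frame rate from enabling async shading, in percent.

    Positive means async shading helps; negative means it costs performance.
    """
    if fps_async_off <= 0:
        raise ValueError("fps_async_off must be positive")
    return (fps_async_on / fps_async_off - 1.0) * 100.0


# Hypothetical frame rates chosen for illustration only (not measured data).
print(round(async_gain_percent(55.0, 50.0), 1))  # 10.0
print(round(async_gain_percent(48.0, 50.0), 1))  # -4.0
```

Normalizing against the async-off result matches how the charts are framed: since async shading is on by default, the off run serves as the baseline.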

These findings do go hand-in-hand with some of the basic performance goals of async shading, primarily that async shading can improve GPU utilization. At 4096 stream processors the Fury X has the most ALUs of any card on these charts, and given its performance in other games, the numbers we see here lend credence to the theory that RTG isn’t always able to reach full utilization of those ALUs, particularly on Ashes. If so, async shading could be a big benefit going forward.

As for the NVIDIA cards, that’s a harder read. Is it that NVIDIA already has good ALU utilization? Or is it that their architectures can’t do enough with asynchronous execution to offset the scheduling penalty for using it? Either way, when it comes to Ashes NVIDIA isn’t gaining anything from async shading at this time.
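The trade-off between better ALU utilization and scheduling cost can be illustrated with a toy model. This is only a sketch under loud assumptions: it treats a frame as graphics work plus compute work, lets some fraction of the compute work hide in otherwise idle ALU time, and charges a fixed overhead for asynchronous scheduling. None of the parameter names or values come from Oxide, AMD, or NVIDIA; they are hypothetical and chosen purely for illustration.

```python
def async_frame_time_ms(graphics_ms: float, compute_ms: float,
                        overlap_fraction: float, sched_overhead_ms: float) -> float:
    """Toy frame time with async shading enabled.

    overlap_fraction is the (hypothetical) share of compute work that runs
    on ALUs the graphics workload would have left idle; the remainder is
    still exposed. sched_overhead_ms is an assumed fixed cost of scheduling
    work asynchronously.
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")
    exposed_compute_ms = compute_ms * (1.0 - overlap_fraction)
    return graphics_ms + exposed_compute_ms + sched_overhead_ms


def modeled_gain_percent(graphics_ms: float, compute_ms: float,
                         overlap_fraction: float, sched_overhead_ms: float) -> float:
    """Percent frame rate change versus running graphics and compute serially."""
    serial_ms = graphics_ms + compute_ms
    async_ms = async_frame_time_ms(graphics_ms, compute_ms,
                                   overlap_fraction, sched_overhead_ms)
    return (serial_ms / async_ms - 1.0) * 100.0


# A GPU with plenty of idle ALUs: half of the compute work is hidden.
print(modeled_gain_percent(16.0, 4.0, 0.5, 0.2) > 0)   # True
# A GPU that is already well utilized: nothing overlaps, only overhead remains.
print(modeled_gain_percent(16.0, 4.0, 0.0, 0.5) < 0)   # True
```

In this model a small regression can come from either explanation raised above: little idle ALU time to fill, or overhead that outweighs what overlap recovers. The benchmark numbers alone cannot distinguish between the two.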

Ashes of the Singularity (Beta) - Extreme Quality - Async Shading Perf. Gain

Meanwhile pushing our fastest GPUs to their limit at Extreme quality only widens the gap. At 4K the Fury X picks up nearly 20% from async shading – though a much smaller 6% at 1440p – while the GTX 980 Ti continues to lose a couple of percent from enabling it. This outcome is somewhat surprising since at 4K we’d already expect the Fury X to be rather taxed, but clearly there’s quite a bit of shader headroom left unused.


153 Comments


  • BurntMyBacon - Thursday, February 25, 2016 - link

    @anubis44: "nVidia wasn't expecting AMD to force Microsoft's hand and release DX12 so soon."

    I do believe you are correct. Given the lack of ability to throw driver optimizations at the DX12 code path and nVidia's proficiency at doing it, I'd say this will be quite damaging. They've lost one clear advantage they held (at least in DX11).

    @anubis44: "It's beginning to look like nVidia's been check-mated by AMD here."

    I wouldn't go that far. They probably won't have the necessary hardware in Pascal, but you can be sure Volta will have what it needs. Besides, most games will likely have a DX11 code path for the foreseeable future as developers wouldn't want to lock themselves out of an entire market. Also, at the moment, nVidia can still play DX12 fine, they just don't appear to have the advantage at the moment given the small sample set of available data points.

    In conclusion, it is more like they have lost a rook or queen. Of course, they've taken a few of ATi's pieces as well, so let's just wait and see who plays their remaining pieces better.
  • rhysiam - Thursday, February 25, 2016 - link

    The other thing I would add to this is that it's not like Nvidia have nowhere to go here. Take the GTX 970 vs the R9 390 for example... they're in a similar price & performance tier. Yet the 970 is smaller with fewer transistors (usually meaning it's cheaper to produce) and generally has a much higher overclocking headroom (because Nvidia wasn't under pressure to clock the card closer to the limit to reach relevant performance). So it's reasonable to expect Nvidia could both lower the price and clock it higher to get a significantly better value card with basically no substantive engineering/architectural changes.

    I'm not suggesting Nvidia will do that with the 970 specifically. Rather, what I'm saying is that if they find Pascal is similarly behind AMD they've got plenty of room to tweak performance and price before we can start calling them "check-mated". But it's certainly good news for us if DX12 performance like this continues and AMD essentially forces Nvidia to lower its margin.
  • CiccioB - Sunday, February 28, 2016 - link

    They can do exactly as AMD has done with GCN: they can just start using 30 or 50% bigger GPUs to close the performance gap if they really need to.
  • The_Countess - Thursday, February 25, 2016 - link

    nvidia's entire performance advantage in DX11 is based on game specific driver optimizations. they have a virtual army of developers slaving away on those (and coming up with ways to hurt everyone's performance as long as it hurts AMD the most or makes their own latest gen cards look better... but that's a different matter)

    with DX12 however the driver becomes MUCH thinner and doesn't have nearly as much influence. so basically nvidia's main competitive advantage is gone with dx12 and vulkan.

    as for being relevant: this year pretty much every game where performance matters will have either a DX12 or Vulkan render option. adding in the fact that AMD cards generally age better than nvidia's (those game specific optimizations focus pretty much exclusively on their latest generation of cards) and i would say that yes it is very relevant.
  • BurntMyBacon - Thursday, February 25, 2016 - link

    @The_Countess: "nvidia's entire performance advantage in DX11 is based on game specific driver optimizations. they have a virtual army of developers slaving away on those ..."

    True, they have lost a large advantage. Keep in mind, though, that nVidia's developer relations are still in play. What they once achieved through the use of driver optimizations may still be accomplished through code path optimization and design guidance for nVidia architecture. The first beta for Vulkan (The Talos Principle) showed that merely replacing a high level API (OpenGL/DX11) with a low level one (Vulkan/DX12) does not automatically improve the experience. If nVidia can convince developers to avoid certain non-optimal features or program in such a way as to take better advantage of nVidia hardware in their titles (for the sake of performance on the majority of discrete card owners out there of course) then ATi will be in the same position as they are now. Better hardware, worse software support. Then again, low level API cross-platform titles will most assuredly be programmed to take advantage of the console architectures, which happen to be ATi's at the moment.
  • nevcairiel - Wednesday, February 24, 2016 - link

    Considering the Fury X just has a tad bit more raw power than a (older) 980Ti, I would say the DX12 numbers are fine, and what is really showing is AMD's lack of performance in DX11?
  • tuxRoller - Wednesday, February 24, 2016 - link

    I don't agree with this. I think this is more a case of nvidia not being able to rely so much on the ENORMOUS number of special cases in their driver.
    IOW, this is about two things: hardware and game design. The drivers are trivial next to d3d11/ogl.
  • jasonelmore - Wednesday, February 24, 2016 - link

    Fury X's Architecture is much newer than Maxwell 2's. Let's see what the true DX12 cards can do this summer.
  • tuxRoller - Wednesday, February 24, 2016 - link

    Did you not notice the across the board improvements for all gcn cards?
    The point I was making, and that others have made for some time, is that AMD makes really good hardware but this is typically masked by poor drivers.
    You can see this by looking at their excellent performance in compute workloads where the code in the driver is more recent and doesn't have the legacy cruft of their d3d/ogl code.
  • Despoiler - Thursday, February 25, 2016 - link

    It's not their drivers. It's purely architectural. GCN moved their schedulers into hardware. GCN requires the API to be able to feed it enough work. What people have been calling "driver overhead" is nothing of the sort. DX11 is just not capable of fully utilizing AMD hardware. DX12 is and that is why AMD created Mantle. It forced MS to create DX12 and that set off the creation of Vulkan. All of the next gen APIs are tailored to exploit AMD's already-shipping hardware.
