DirectX 12 vs. Mantle, Power Consumption

Although the bulk of our coverage today is focused on DirectX 12 versus DirectX 11, we also wanted to take a moment to look at how DirectX 12 compares to AMD’s Mantle. Mantle offers an interesting point of contrast, both because it has been in beta longer than DirectX 12 and because it’s an even lower-level API than DirectX 12. Since Mantle only needs to work on AMD’s GPUs and can be tweaked for AMD’s architectures, it gives AMD the chance to exploit their GPUs in a few additional ways that a common, cross-vendor API like DirectX 12 cannot.

Star Swarm - Direct3D 12 vs. Mantle (4 Cores) - Extreme Quality

With 4 cores we find that AMD achieves better results with Mantle than with DirectX 12 across the board. The gains are never very great – a few percent here and there – but they are consistent and just outside our window of variability for the Star Swarm benchmark. With such a small gain, a number of factors could explain this outcome – better-developed drivers, a better-developed application, or further benefits of working with a known hardware platform – so we can’t credit any one factor. But it’s safe to say that at least in this one instance, at this time, Star Swarm’s Mantle rendering path produces even better results than its DirectX 12 path on AMD cards.

Star Swarm - Direct3D 12 vs. Mantle (2 Cores) - Extreme Quality

On the other hand, Mantle doesn’t seem to handle a two-core configuration as well, with the 290X seeing a small but distinct performance regression when switching from DirectX 12 to Mantle. Though we didn’t have time to look at an AMD APU for this article, it would be interesting to see whether this regression also occurs on their 2-module/4-core (2M/4C) parts; AMD is banking heavily on low-level APIs like Mantle to help level the CPU playing field with Intel, so if Mantle needs 4 CPU cores to fully spread its wings with faster cards, that might be a problem.

Star Swarm CPU Batch Submission Time (4 Cores) - D3D vs. Mantle - Extreme Quality

Diving deeper, we can see that part of the explanation for our Mantle performance regression may come from the batch submission process. DirectX 12 is unexpectedly well ahead of Mantle here, with batch submission taking on average a bit more than half as long as it does under Mantle. As batch submission times are highly correlated with CPU bottlenecking in Star Swarm, this implies that DirectX 12 would become CPU-bottlenecked later than Mantle in this instance. That said, since we’re so strongly GPU-bound right now, it’s not at all clear whether either API would be CPU-bottlenecked any time soon.
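To make that reasoning concrete, here is a minimal sketch built on a deliberately simple assumption: per-frame CPU work and GPU work overlap, so whichever side is slower sets the frame time. It treats batch submission as the entire CPU cost of a frame, which overstates its role, and the function names and millisecond values are hypothetical illustrations rather than Star Swarm measurements.

```python
def frame_time_ms(cpu_ms: float, gpu_ms: float) -> float:
    """Simple overlap model: the slower of the CPU and GPU sets the frame time."""
    return max(cpu_ms, gpu_ms)


def limiting_stage(cpu_ms: float, gpu_ms: float) -> str:
    """Report which side limits throughput under this model."""
    return "CPU" if cpu_ms > gpu_ms else "GPU"


def cpu_fps_ceiling(cpu_ms: float) -> float:
    """Highest frame rate the CPU side allows, no matter how fast the GPU is."""
    return 1000.0 / cpu_ms


if __name__ == "__main__":
    # Hypothetical CPU cost per frame: one API needs roughly half the submission time.
    apis = (("faster submission", 5.0), ("slower submission", 9.0))
    for gpu_ms in (20.0, 7.0):  # a slow GPU vs. a hypothetical much faster one
        for name, cpu_ms in apis:
            print(f"GPU {gpu_ms} ms, {name}: "
                  f"{frame_time_ms(cpu_ms, gpu_ms)} ms, "
                  f"{limiting_stage(cpu_ms, gpu_ms)}-bound")
```

With a 20ms GPU both hypothetical APIs are GPU-bound and deliver the same frame time; only once the GPU gets faster than the CPU-side cost does the API with slower submission fall behind, which is why a strongly GPU-bound test hides the difference.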

Update: Oxide Games has emailed us this evening with a bit more detail about what's going on under the hood, and why Mantle batch submission times are higher. When working with large numbers of very small batches, Star Swarm is capable of throwing enough work at the GPU that the GPU's command processor becomes the bottleneck. For this reason the Mantle path includes an optimization routine for small batches (OptimizeSmallBatch=1), which spends extra CPU time to relieve the GPU: it does a second pass over the batches on the CPU to combine some of them before submitting them to the GPU. This bypasses the command processor bottleneck, but it increases the amount of work the CPU needs to do (though note that in AMD's case, it's still several times faster than DX11).
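Oxide did not share the details of its routine, so the following is only a hypothetical sketch of the general idea of a CPU-side combining pass: walk the batch list a second time and fold small batches into the preceding batch when both need the same GPU state. The `Batch` type, its `state` and `objects` fields, and the `small` and `max_objects` thresholds are all invented for illustration; a real renderer would also have to merge the underlying vertex or instance data and respect draw-order constraints.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    state: str    # hypothetical key for the pipeline/material state the batch needs
    objects: int  # hypothetical count of objects drawn by this batch


def combine_small_batches(batches: List[Batch], small: int = 16,
                          max_objects: int = 256) -> List[Batch]:
    """Second CPU pass: fold a small batch into the previous output batch when
    both use the same state and the merged batch stays within max_objects.

    The result has fewer, larger batches (fewer commands for the GPU's command
    processor), paid for with extra CPU work walking the list again.
    """
    merged: List[Batch] = []
    for b in batches:
        last = merged[-1] if merged else None
        if (last is not None
                and last.state == b.state
                and b.objects <= small
                and last.objects + b.objects <= max_objects):
            last.objects += b.objects  # merge into the previous output batch
        else:
            merged.append(Batch(b.state, b.objects))  # copy, so the input is untouched
    return merged


if __name__ == "__main__":
    frame = [Batch("ship", 4), Batch("ship", 4), Batch("ship", 4),
             Batch("laser", 4), Batch("laser", 100), Batch("ship", 4)]
    out = combine_small_batches(frame)
    print(len(frame), "batches in,", len(out), "batches out")
```

In this toy frame the three adjacent small "ship" batches collapse into one, while the large "laser" batch and the trailing "ship" batch stay separate, so six submissions become four.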

This feature is enabled by default in our build, and combining those small batches is the likely reason that the Mantle path holds a slight performance edge over the DX12 path on our AMD cards. The tradeoff is that in a 2-core configuration, the extra CPU workload from the optimization pass is just enough to make Star Swarm CPU-bottlenecked again. For the time being this is a user-adjustable setting in Star Swarm, and Oxide notes that in a shipping game the small batch optimization would likely be turned off by default on slower CPUs.
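The tradeoff Oxide describes can be sketched with the same kind of simple model: if CPU and GPU work overlap, the frame time is the slower of the two, so the combining pass only pays off while its extra CPU time does not make the CPU the slower side. The function name, its parameters, and the example timings below are hypothetical; this is not how Star Swarm or any shipping game actually makes the decision.

```python
def should_combine_small_batches(cpu_ms: float, gpu_ms: float,
                                 extra_cpu_ms: float, gpu_saved_ms: float) -> bool:
    """Return True if the CPU-side combining pass shortens the frame.

    cpu_ms, gpu_ms: per-frame cost with the optimization off.
    extra_cpu_ms:   added CPU time for the second pass over the batches.
    gpu_saved_ms:   GPU time saved by relieving the command processor.
    """
    frame_off = max(cpu_ms, gpu_ms)
    frame_on = max(cpu_ms + extra_cpu_ms, gpu_ms - gpu_saved_ms)
    return frame_on < frame_off


if __name__ == "__main__":
    # Hypothetical 4-core case: plenty of CPU headroom, so the pass helps.
    print(should_combine_small_batches(8.0, 20.0, 4.0, 3.0))   # True: 17 ms vs. 20 ms
    # Hypothetical 2-core case: the pass costs more and the CPU becomes the limit.
    print(should_combine_small_batches(16.0, 20.0, 8.0, 3.0))  # False: 24 ms vs. 20 ms
```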

Star Swarm CPU Batch Submission Time (4 Cores) - Small Batch Optimization

Star Swarm - Direct3D 12 vs. Mantle (4 Cores) - Small Batch Optimization

If we turn off the small batch optimization, Mantle's batch submission time drops nearly in half, to an average of 4.4ms. With the second pass removed, Mantle and DirectX 12 take roughly the same amount of time to submit batches in a single pass. However, as Oxide noted, there is a performance hit: the Mantle rendering path goes from being ahead of DirectX 12 to trailing it. So given sufficient CPU power to pay the price for batch optimization, it can have a significant impact (16%) on performance under Mantle.
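A quick sanity check on those two figures, written as small helpers whose names are ours, for illustration only: if submission time "drops nearly in half" to 4.4ms, the optimized path was taking somewhat under 8.8ms, and because frame time is the reciprocal of frame rate, a 16% frame-rate gain corresponds to a frame time roughly 14% shorter.

```python
def upper_bound_before_ms(after_ms: float) -> float:
    """If a time 'drops nearly in half' to after_ms, it was at most about 2x after_ms."""
    return 2.0 * after_ms


def frame_time_reduction(fps_gain: float) -> float:
    """Convert a relative frame-rate gain (0.16 for +16%) into the relative
    reduction in frame time, using frame_time = 1 / fps."""
    return 1.0 - 1.0 / (1.0 + fps_gain)


if __name__ == "__main__":
    print(f"submission time with the optimization: under ~{upper_bound_before_ms(4.4):.1f} ms")
    print(f"+16% fps = {frame_time_reduction(0.16):.1%} shorter frame time")
```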

Star Swarm System Power Consumption (6 Cores)

Finally, we wanted to take a quick look at power consumption across cards and APIs. To repeat what we said earlier, Star Swarm is an imperfect, non-deterministic benchmark, and coupled with the in-development status of DirectX 12, everything here is subject to change. However, we thought this was interesting enough to include in our evaluation.

As expected, the increased throughput from DirectX 12 and Mantle drives up system power consumption. With the CPU no longer the bottleneck, the GPU never gets a chance to idle, and video card power consumption ramps up to full load.
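Under the same simplified overlap model, the share of each frame the GPU spends busy is its own frame time divided by the overall frame time, which illustrates why removing the CPU bottleneck pushes the card toward full load. The linear interpolation between idle and full-load board power is a crude assumption of ours, not how GPUs actually scale power, and the timings and wattages below are placeholders, not our measurements.

```python
def gpu_busy_fraction(cpu_ms: float, gpu_ms: float) -> float:
    """Share of each frame the GPU is working, if CPU and GPU work overlap."""
    return gpu_ms / max(cpu_ms, gpu_ms)


def rough_card_power_w(idle_w: float, load_w: float, busy: float) -> float:
    """Crude linear estimate between idle and full-load board power (assumption)."""
    return idle_w + (load_w - idle_w) * busy


if __name__ == "__main__":
    # Hypothetical: a high-overhead, CPU-bound path vs. a low-overhead path.
    for label, cpu_ms in (("CPU-bound", 30.0), ("GPU-bound", 8.0)):
        busy = gpu_busy_fraction(cpu_ms, 15.0)
        print(label, f"GPU busy {busy:.0%}, ~{rough_card_power_w(20.0, 250.0, busy):.0f} W")
```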

245 Comments

  • ObscureAngel - Saturday, February 7, 2015 - link

    Ryan, can you do an article demonstrating the low performance of AMD GPUs with low-end CPUs like an i3 in more CPU-bound games, compared to nvidia GPUs on the same CPUs?

    Less prominent websites have done it, like GameGPU.ru or Digital Foundry.
    They don't carry as much weight because, well, sometimes they are a bit dumb.
    I recently confirmed it with my own benchmarks: AMD GPUs really do have much lower performance than an nvidia GPU on the same (low-end) CPU.

    If you look into it and publish, maybe that would put a little pressure on AMD and they would start to look into it.
    But I'm not sure you can do it; AMD gives your website AMD GPUs and CPUs to benchmark, and I'm pretty sure AMD wouldn't like to read the truth..

    But since Futuremark is close to releasing the new 3DMark benchmark that measures overhead/draw calls,
    it would be nice to give that problem with AMD a little highlight.
    Many people are starting to notice that problem, but AMD is ignoring everyone who complains about the lack of performance, so we need somebody strong like AnandTech or another website to analyse these problems and publish them so everyone can see that something is wrong.

    Keep in mind that AMD only fixed the frame time problem in CrossFire because one website (which I don't remember) published it, people started to complain about it, and AMD started to fix it, and they really did fix it.
    Now we already have the complaints, but we don't have a loud voice like you guys.
  • okp247 - Sunday, February 8, 2015 - link

    Sorry, my bad. The numbers I've stated in the above posts were indeed from either the Follow or Attract scenario.

    So what is up with the underutilized AMD cards? Clearly, they are not stretching their legs under DX11. In the article you touch upon the CPU batch submission times, and how these are taking a (relatively) long time on the AMD cards. Is this also the case with other draw-call-heavy games, or is it a fluke in Star Swarm?
  • ObscureAngel - Monday, February 9, 2015 - link

    It happens in games too.
    I made a video and everything about it.

    Spread the word, we need to get AMD's attention on this. Since they don't answer me, I decided to start publicly saying bad things about them :D

    https://www.youtube.com/watch?v=2-nvGOK6ud8
  • killeak - Saturday, February 7, 2015 - link

    Both APIs (D3D12 and Mantle) are under NDA. In the case of D3D12, in theory, if you are working with D3D12 you can't speak about it unless you have explicit authorization from MS. The same goes for Mantle and AMD.

    I hope D3D12 goes public by GDC time, I mean the public beta, not the final version; after that, things will change ;)
  • Klimax - Saturday, February 7, 2015 - link

    Thanks for the numbers. They show perfectly how broken and craptastic the entire POS is. There are so many idiocies and stupidities in it that it couldn't pass any review by a competent developer.

    1) Insane number of batches. You want at least 100 objects in one to actually see a benefit (Civilization V default settings). To see noticeably better performance, I would say at least 1000 objects should be in one (Civilization V test with adjusted config). Star Swarm has between 10 and 50 times more batches than Civilization. (A precise number cannot be given, as that "benchmark" doesn't report the number of objects to be drawn.)

    2) Absolutely insane number of superfluous calls. Things like IASetPrimitiveTopology are called (almost) every time an object is drawn, with the same parameters (constants), and with a large number of batches those functions add to the overhead. That's why you see such a large time for DX11 draw: it has to reprocess many things repeatedly. (Some caching and shortcuts can be done, and I am sure NVidia implemented them, but there are limits even for otherwise very cheap functions.)

    3) The simulation itself is so atrociously written that it doesn't really scale at all! This is in space, where the number of intersections is very small, so you could process it with maximum possible parallelization.
    A 360s run had 4 cores in use for 5.65s, and 5+ cores for 6.1s in total. Bad is a weak word...

    And I am pretty sure I haven't uncovered everything. Note: I used Intel VTune for the analysis a year ago. No update has come since then, so I don't think anything has changed at all... (Seeing those numbers, I am sure of it.)
  • nulian - Saturday, February 7, 2015 - link

    The draw calls are misused on purpose in this demo to show how much better things have become. The advantage for normal games is that they can do more lights and more effects that use a lot of draw calls without breaking performance on PC. Draw calls are one of the biggest performance differences between console and PC.
  • BehindEnemyLines - Saturday, February 7, 2015 - link

    Or maybe they are doing that on purpose to show the bottleneck of the DX11 API? Just a thought. If this is a "poorly" written performance demo, then you can only imagine the DX12 improvements once it's "properly" written.
  • Teknobug - Saturday, February 7, 2015 - link

    Wasn't there some kind of leaked info that DX12 was basically a copy of Mantle with a DX API? It wouldn't surprise me if it came close to Mantle's performance.
  • dragonsqrrl - Sunday, February 8, 2015 - link

    Right, cause Microsoft only started working on DX12 when Mantle was announced...
  • bloodypulp - Sunday, February 8, 2015 - link

    You're missing the point. Mantle and DX12 are so similar you could essentially call DX12 the Windows-only version of Mantle. By releasing Mantle, AMD gave developers an opportunity to use the new low-level APIs nearly two years before Microsoft was ready to release its own, which was naturally tied to its OS. Those developers who had the foresight to take advantage of Mantle during those two years clearly benefited. They'll launch DX12-ready games before their competitors.
