CPU Scaling

Diving into our look at DirectX 12, let’s start with what is going to be the most critical component for a benchmark like Star Swarm, the CPU scaling.

Because Star Swarm is designed to exploit the threading inefficiencies of DirectX 11, the biggest gains from switching to DirectX 12 on Star Swarm come from removing the CPU bottleneck. Under DirectX 11 the bulk of Star Swarm’s batch submission work happens under a single thread, and as a result the benchmark is effectively bottlenecked by single-threaded performance, unable to scale out with multiple CPU cores. This is one of the issues DirectX 12 sets out to resolve, with the low-level API allowing Oxide to more directly control how work is submitted, and as such better balance it over multiple CPU cores.

Star Swarm CPU Scaling - Extreme Quality - GeForce GTX 980

Star Swarm CPU Scaling - Extreme Quality - Radeon R9 290X

Starting with a look at CPU scaling on our fastest cards, what we find is that besides the absurd performance difference between DirectX 11 and DirectX 12, performance scales roughly as we’d expect among our CPU configurations. Star Swarm's DirectX 11 path, being single-threaded bound, scales very slightly with clockspeed and core count increases. The DirectX 12 path on the other hand scales up moderately well from 2 to 4 cores, but doesn’t scale up beyond that. This is due to the fact that at these settings, even pushing over 100K draw calls, both GPUs are solidly GPU limited. Anything more than 4 cores goes to waste as we’re no longer CPU-bound. Which means that we don’t even need a highly threaded processor to take advantage of DirectX 12’s strengths in this scenario, as even a 4 core processor provides plenty of kick.

Meanwhile this setup also highlights the fact that under DirectX 11, there is a massive difference in performance between AMD and NVIDIA. In both cases we are completely CPU bound, with AMD’s drivers only able to deliver 1/3rd the performance of NVIDIA’s. Given that this is the original Mantle benchmark I’m not sure we should read into the DirectX 11 situation too much since AMD has little incentive to optimize for this game, but there is clearly a massive difference in CPU efficiency under DirectX 11 in this case.

Star Swarm D3D12 CPU Scaling - Extreme Quality

Having effectively ruled out the need for 6 core CPUs for Star Swarm, let’s take a look at a breakdown across all of our cards for performance with 2 and 4 cores. What we find is that Star Swarm and DirectX 12 are so efficient that only our most powerful card, the GTX 980, finds itself CPU-bound with just 2 cores. For the AMD cards and other NVIDIA cards we can get GPU bound with the equivalent of an Intel Core i3 processor, showcasing just how effective DirectX 12’s improved batch submission process can be. In fact it’s so efficient that Oxide is running both batch submission and a complete AI simulation over just 2 cores.

Star Swarm CPU Batch Submission Time (4 Cores)

Speaking of batch submission, if we look at Star Swarm’s statistics we can find out just what’s going on with batch submission. The results are nothing short of incredible, particularly in the case of AMD. Batch submission time is down from dozens of milliseconds or more to just 3-5ms for our fastest cards, an improvement just overof a whole order of magnitude. For all practical purposes the need to spend CPU time to submit batches has been eliminated entirely, with upwards of 120K draw calls being submitted in a handful of milliseconds. It is this optimization that is at the core of Star Swarm’s DirectX 12 performance improvements, and going forward it could potentially benefit many other games as well.


Another metric we can look at is actual CPU usage as reported by the OS, as shown above. In this case CPU usage more or less perfectly matches our earlier expectations: with DirectX 11 both the GTX 980 and R9 290X show very uneven usage with 1-2 cores doing the bulk of the work, whereas with DirectX 12 CPU usage is spread out evenly over all 4 CPU cores.

At the risk of speaking to the point that it’s redundant, what we’re seeing here is exactly why Mantle, DirectX 12, OpenGL Next, and other low-level APIs have been created. With single-threaded performance struggling to increase while GPUs continue to improve by leaps and bounds with each generation, something must be done to allow games to better spread out their rendering & submission workloads over multiple cores. The solution to that problem is to eliminate the abstraction and let the developers do it themselves through APIs like DirectX 12.

Star Swarm & The Test GPU Scaling
Comments Locked

245 Comments

View All Comments

  • ObscureAngel - Saturday, February 7, 2015 - link

    Ryan can you do an article demonstrating the low performance of AMD GPUs in low end CPUs like i3 or anything, in more CPU Bound games comparing to nvidia GPUs in the same CPUs?

    Unworthy websites have done it, like GameGPU.ru or Digital foundry.
    They don't have so much expression because well, sometimes they are a bit dumb.
    I confirmed that recently with my own benchmarks, AMD GPUs really have much less performance in the same CPU (low-end CPUs) than an nvidia GPU.

    If you look into it and publish maybe that would put a little pressure on AMD and they start to look into it.
    But not sure if you can do it, AMD gives your website AMD GPUS and CPUs to benchmark, i'm pretty sure AMD wouldn't like to read the truth..

    But since Futuremark new 3dmark is close to release that new benchmark that benchmarks overhead/drawcalls.

    It could be nice to give a little highlight of that problem with AMD.
    Many people are starting to notice that problem, but AMD are ignoring everyone that claims the lack of performance, so we need somebody strong like Anandtech or other website to analyse these problems and publish to everyone see that something is wrong.

    Keep in mind that AMD just fixed the frametime problem in crossfire, cause one website (which i dont remember) publish that, and people start to complain about it, and they start to fix it, and they really fix it.
    Now, we already have the complains but we dont have the upper voice like you guys.
  • okp247 - Sunday, February 8, 2015 - link

    Sorry, my bad. The numbers I've stated in the above posts were indeed from either the Follow or Attract scenario.

    So what is up with the underutilized AMD cards? Clearly, they are not stretching their legs under DX11. In the article you touch upon the CPU batch submission times, and how these are taking a (relatively) long time on the AMD cards. Is this the case also with other draw-call heavy games or is it a fluke in Star Swarm?
  • ObscureAngel - Monday, February 9, 2015 - link

    It happens on games too.
    I did a video and everything about it.

    Spread the word, we need to get AMD attention for this..Since they dont answer me i decided to publicly start to say bad things about them :D

    https://www.youtube.com/watch?v=2-nvGOK6ud8
  • killeak - Saturday, February 7, 2015 - link

    Both API (D3D12 and Mantle) are under NDA. In the case of D3D12, in theory if you are working with D3D12 you can't speak about it unless you have explicit authorization from MS. The same with Mantle and AMD.

    I hope D3D12 goes public by GDC time, I mean the public beta no the final version, after that things will change ;)
  • Klimax - Saturday, February 7, 2015 - link

    Thanks for numbers. They show perfectly how broken and craptastick entire POS is. There are extreme number of idiocies and stupidities in it that it couldn't pass any review by any competent developer.

    1)Insane number of batches. You want to have at least 100 objects in one to actually see benefit. (Civilization V default settings) To see quite better performance I would say at least 1000 objects to be in one. (Civilization V test with adjusted config) Star Swarm has between 10 to 50 times more batches then Civilization. (Precise number cannot be said as I don't have number of objects to be drawn reported from that "benchmark")

    2)Absolutely insane number of superfluous calls. Things like IASetPrimitiveTopology are called (almost) each time an object is to be drawn with same parameters(constants) and with large number of batches those functions add to overhead. That's why you see such large time for DX11 draw - it has to reprocess many things repeatedly. (Some caching and shortcuts can be done as I am sure NVidia implemented them, but there are limits even for otherwise very cheap functions)

    3)Simulation itself is so atrociously written that it doesn't really scale at all! This is in space, where number of intersection is very small, so you can process it at maximum possible parallelization.
    360s run had 4 cores used for 5,65s with 5+ for 6,1s in total. Bad is weak word...

    And I am pretty sure I haven't uncovered all. Note: I used Intel VTune for analysis 1 year ago. Since then no update came so I don't think anything changed at all... (Seeing those numbers I am sure of it)
  • nulian - Saturday, February 7, 2015 - link

    The draw calls are misused on purpose in this demo to show how much better it has become. The advantage for normal games is they can do more light and more effects that use a lot of draw calls without breaking the performance on pc. It is one of the biggest performance different between console and PC draw calls.
  • BehindEnemyLines - Saturday, February 7, 2015 - link

    Or maybe they are doing that on purpose to show the bottleneck of DX11 API? Just a thought. If this is a "poorly" written performance demo, then you can only imagine the DX12 improvements after it's "properly" written.
  • Teknobug - Saturday, February 7, 2015 - link

    Wasn't there some kind of leaked info that DX12 was basically a copy of Mantle with DX API? Wouldn't surprise me that it'd come close to Mantle's performance.
  • dragonsqrrl - Sunday, February 8, 2015 - link

    Right, cause Microsoft only started working on DX12 when Mantle was announced...
  • bloodypulp - Sunday, February 8, 2015 - link

    You're missing the point. Mantle/D12 are so similar you could essentially call DX12 the Windows-only version of Mantle. By releasing Mantle, AMD gave developers an opportunity to utilize the new low-level APIs nearly two years before Microsoft was ready to release their own as naturally it was tied to their OS. Those developers who had the foresight to take advantage of Mantle during those two years clearly benefited. They'll launch DX12-ready games before their competitors.

Log in

Don't have an account? Sign up now