GPU Scaling

Switching gears, let’s take a look at performance from a GPU standpoint, including how well Star Swarm performance scales with more powerful GPUs now that we have eliminated the CPU bottleneck. Until now Star Swarm has never been GPU bottlenecked on high-end NVIDIA cards, so this is our first time seeing just how much faster Star Swarm can get until it runs into the limits of the GPU itself.

Star Swarm GPU Scaling - Extreme Quality (4 Cores)

As it stands, with the CPU bottleneck swapped out for a GPU bottleneck, Star Swarm currently favors NVIDIA GPUs. Even accounting for the usual performance differences between these cards, NVIDIA comes out well ahead here, with the GTX 980 beating the R9 290X by over 50% and the GTX 680 some 25% ahead of the R9 285, both well beyond their average leads in real-world games. With virtually every aspect of this test still under development (OS, drivers, and Star Swarm itself), we would advise against reading too much into this right now, but it will be interesting to see if the trend holds with the final release of DirectX 12.

Meanwhile it’s interesting to note that largely due to their poor DirectX 11 performance in this benchmark, AMD sees the greatest gains from DirectX 12 on a relative basis and comes close to seeing the greatest gains on an absolute basis as well. The GTX 980’s performance improves by 150% and 40.1fps when switching APIs; the R9 290X improves by 416% and 34.6fps. As for AMD’s Mantle, we’ll get back to that in a bit.
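
To put those two sets of gains on the same footing, the DirectX 11 baselines can be backed out from the figures quoted above. What follows is a rough Python sketch rather than anything from our test suite: the function names are our own, and because the quoted percentages are rounded, the derived frame rates are approximations and not measured results.

```python
def dx11_baseline(gain_fps, gain_pct):
    """Estimate the DX11 frame rate from an absolute and a relative gain.

    If DX12 = DX11 + gain_fps and gain_fps = DX11 * gain_pct / 100,
    then DX11 = gain_fps / (gain_pct / 100).
    """
    return gain_fps / (gain_pct / 100.0)


def dx12_result(gain_fps, gain_pct):
    """Estimate the DX12 frame rate as baseline plus absolute gain."""
    return dx11_baseline(gain_fps, gain_pct) + gain_fps


# (absolute gain in fps, relative gain in percent), as quoted in the text
cards = {
    "GTX 980": (40.1, 150.0),
    "R9 290X": (34.6, 416.0),
}

for name, (gain_fps, gain_pct) in cards.items():
    base = dx11_baseline(gain_fps, gain_pct)
    final = dx12_result(gain_fps, gain_pct)
    print(f"{name}: DX11 ~{base:.1f}fps -> DX12 ~{final:.1f}fps")

# Estimated DX12 lead of the GTX 980 over the R9 290X
lead = dx12_result(*cards["GTX 980"]) / dx12_result(*cards["R9 290X"])
print(f"GTX 980 lead under DX12: ~{(lead - 1.0) * 100.0:.0f}%")
```

Working the arithmetic through gives a DirectX 11 baseline of roughly 27fps for the GTX 980 against roughly 8fps for the R9 290X, which is why AMD’s relative gain is so much larger even though its absolute gain is smaller.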

Star Swarm GPU Scaling - Extreme Quality (2 Cores)

Having already established that even 2 CPU cores are enough to keep Star Swarm fed on anything less than a GTX 980, the results are much the same here for our 2 core configuration. Other than the GTX 980 being CPU limited, the gains from enabling DirectX 12 are consistent with what we saw for the 4 core configuration. In other words, even a relatively weak CPU can benefit from DirectX 12, at least when paired with a strong GPU.

However the GTX 750 Ti result in particular also highlights the fact that until a powerful GPU comes into play, the benefits today from DirectX 12 aren’t nearly as great. Though the GTX 750 Ti does improve in performance by 26%, this is a far cry from the 150% of the GTX 980, or even the gains for the GTX 680. While AMD is terminally CPU limited here, NVIDIA can get just enough out of DirectX 11 that a 2 core configuration can almost feed the GTX 750 Ti. Consequently in the NVIDIA case, a weak CPU paired with a weak GPU does not currently see the same benefits that we get elsewhere. However, DirectX 12 is meant to be forward looking, to be out before it’s too late, and as GPU performance gains continue to outstrip CPU performance gains, the benefits even for low-end configurations will continue to increase.
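
The reasoning here can be illustrated with a simple bottleneck model, sketched below in Python. It assumes that CPU submission and GPU rendering overlap, so the slower of the two sets the frame time. The function names are our own, and the millisecond figures are hypothetical values chosen only to show the shape of the effect; they are not measurements from any card in this article.

```python
def fps(cpu_ms, gpu_ms):
    """Frame rate when CPU and GPU work overlap: the slower stage wins."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def api_gain_pct(cpu_ms_old, cpu_ms_new, gpu_ms):
    """Percentage gain from cutting CPU time per frame on a fixed GPU."""
    old = fps(cpu_ms_old, gpu_ms)
    new = fps(cpu_ms_new, gpu_ms)
    return (new / old - 1.0) * 100.0


# Hypothetical per-frame costs in milliseconds (illustrative only).
CPU_OLD_MS = 40.0   # assumed CPU submission time under the older API
CPU_NEW_MS = 10.0   # assumed CPU submission time under the newer API
STRONG_GPU_MS = 15.0
WEAK_GPU_MS = 32.0

strong_gain = api_gain_pct(CPU_OLD_MS, CPU_NEW_MS, STRONG_GPU_MS)
weak_gain = api_gain_pct(CPU_OLD_MS, CPU_NEW_MS, WEAK_GPU_MS)

print(f"Strong GPU gain: {strong_gain:.0f}%")
print(f"Weak GPU gain:   {weak_gain:.0f}%")
```

With the same reduction in CPU time, the fast GPU in this toy model gains far more than the slow one, because the slow GPU becomes the limiting stage almost immediately.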

245 Comments

  • ObscureAngel - Saturday, February 7, 2015 - link

    Ryan, can you do an article demonstrating the low performance of AMD GPUs on low-end CPUs like an i3, in more CPU-bound games, compared to NVIDIA GPUs on the same CPUs?

    Unworthy websites have done it, like GameGPU.ru or Digital foundry.
    They don't have so much expression because well, sometimes they are a bit dumb.
    I confirmed that recently with my own benchmarks, AMD GPUs really have much less performance in the same CPU (low-end CPUs) than an nvidia GPU.

    If you look into it and publish maybe that would put a little pressure on AMD and they start to look into it.
    But not sure if you can do it, AMD gives your website AMD GPUS and CPUs to benchmark, i'm pretty sure AMD wouldn't like to read the truth..

    But since Futuremark's new 3DMark is close to releasing a new benchmark that measures overhead/draw calls, it could be nice to give a little highlight of that problem with AMD.
    Many people are starting to notice that problem, but AMD are ignoring everyone that claims the lack of performance, so we need somebody strong like Anandtech or other website to analyse these problems and publish to everyone see that something is wrong.

    Keep in mind that AMD only fixed the frametime problem in CrossFire because one website (which I don't remember) published that, people started to complain about it, and then they started to fix it, and they really did fix it.
    Now, we already have the complaints but we don't have a louder voice like you guys.
  • okp247 - Sunday, February 8, 2015 - link

    Sorry, my bad. The numbers I've stated in the above posts were indeed from either the Follow or Attract scenario.

    So what is up with the underutilized AMD cards? Clearly, they are not stretching their legs under DX11. In the article you touch upon the CPU batch submission times, and how these are taking a (relatively) long time on the AMD cards. Is this the case also with other draw-call heavy games or is it a fluke in Star Swarm?
  • ObscureAngel - Monday, February 9, 2015 - link

    It happens on games too.
    I did a video and everything about it.

    Spread the word, we need to get AMD attention for this..Since they dont answer me i decided to publicly start to say bad things about them :D

    https://www.youtube.com/watch?v=2-nvGOK6ud8
  • killeak - Saturday, February 7, 2015 - link

    Both APIs (D3D12 and Mantle) are under NDA. In the case of D3D12, in theory if you are working with D3D12 you can't speak about it unless you have explicit authorization from MS. The same with Mantle and AMD.

    I hope D3D12 goes public by GDC time, I mean the public beta not the final version, after that things will change ;)
  • Klimax - Saturday, February 7, 2015 - link

    Thanks for the numbers. They show perfectly how broken and craptastic the entire POS is. There is such an extreme number of idiocies and stupidities in it that it couldn't pass any review by any competent developer.

    1)Insane number of batches. You want to have at least 100 objects in one to actually see benefit. (Civilization V default settings) To see quite better performance I would say at least 1000 objects to be in one. (Civilization V test with adjusted config) Star Swarm has between 10 and 50 times more batches than Civilization. (Precise number cannot be said as I don't have number of objects to be drawn reported from that "benchmark")

    2)Absolutely insane number of superfluous calls. Things like IASetPrimitiveTopology are called (almost) each time an object is to be drawn with same parameters(constants) and with large number of batches those functions add to overhead. That's why you see such large time for DX11 draw - it has to reprocess many things repeatedly. (Some caching and shortcuts can be done as I am sure NVidia implemented them, but there are limits even for otherwise very cheap functions)

    3)Simulation itself is so atrociously written that it doesn't really scale at all! This is in space, where the number of intersections is very small, so you can process it at maximum possible parallelization.
    360s run had 4 cores used for 5,65s with 5+ for 6,1s in total. Bad is a weak word...

    And I am pretty sure I haven't uncovered all. Note: I used Intel VTune for analysis 1 year ago. Since then no update came so I don't think anything changed at all... (Seeing those numbers I am sure of it)
  • nulian - Saturday, February 7, 2015 - link

    The draw calls are misused on purpose in this demo to show how much better it has become. The advantage for normal games is they can do more lights and more effects that use a lot of draw calls without breaking the performance on PC. Draw calls are one of the biggest performance differences between console and PC.
  • BehindEnemyLines - Saturday, February 7, 2015 - link

    Or maybe they are doing that on purpose to show the bottleneck of DX11 API? Just a thought. If this is a "poorly" written performance demo, then you can only imagine the DX12 improvements after it's "properly" written.
  • Teknobug - Saturday, February 7, 2015 - link

    Wasn't there some kind of leaked info that DX12 was basically a copy of Mantle with DX API? Wouldn't surprise me that it'd come close to Mantle's performance.
  • dragonsqrrl - Sunday, February 8, 2015 - link

    Right, cause Microsoft only started working on DX12 when Mantle was announced...
  • bloodypulp - Sunday, February 8, 2015 - link

    You're missing the point. Mantle/DX12 are so similar you could essentially call DX12 the Windows-only version of Mantle. By releasing Mantle, AMD gave developers an opportunity to utilize the new low-level APIs nearly two years before Microsoft was ready to release their own, as naturally it was tied to their OS. Those developers who had the foresight to take advantage of Mantle during those two years clearly benefited. They'll launch DX12-ready games before their competitors.