Star Swarm & The Test

For today’s DirectX 12 preview, Microsoft and Oxide Games have supplied us with a newer version of Oxide’s Star Swarm demo. Originally released in early 2014 as a demonstration of Oxide’s Nitrous engine and the capabilities of Mantle, Star Swarm is a massive space combat demo designed to push the limits of high-level APIs and demonstrate the performance advantages of low-level APIs. With its thousands of units and other draw call-heavy effects, Star Swarm can push over 100K draw calls, a massive workload that causes high-level APIs to simply crumple.

Because Star Swarm generates so many draw calls, it is essentially a best-case scenario test for low-level APIs, exploiting the fact that high-level APIs can’t effectively spread the draw call workload over several CPU threads. As a result the performance gains from DirectX 12 in Star Swarm are going to be much greater than in most (if not all) video games, but nonetheless it’s an effective tool for demonstrating the performance capabilities of DirectX 12 and for showcasing how it can better distribute work over multiple CPU threads.
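
To illustrate the threading model DirectX 12 enables, below is a minimal, hypothetical C++ sketch (not Oxide’s actual engine code) of how each CPU thread can record draw calls into its own D3D12 command list, with all of the recorded lists submitted to the GPU queue at once. The function name and thread-count parameter are purely illustrative, and error handling is omitted.

```cpp
// Hypothetical sketch: per-thread D3D12 command list recording.
// Each worker thread records a slice of the scene's draw calls into its own
// command list, so draw-call submission is no longer serialized on one thread.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void RecordDrawCallsAcrossThreads(ID3D12Device* device,
                                  ID3D12CommandQueue* queue,
                                  unsigned threadCount)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(threadCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(threadCount);
    std::vector<std::thread>                       workers;

    for (unsigned i = 0; i < threadCount; ++i)
    {
        // Each thread gets its own allocator and command list (HRESULTs ignored here).
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));

        workers.emplace_back([i, &lists]()
        {
            // ... set pipeline state and root signature, then issue this
            // thread's share of the draw calls -- no shared lock required ...
            lists[i]->Close();
        });
    }

    for (auto& w : workers)
        w.join();

    // Submission itself is cheap: hand all recorded lists to the GPU queue at once.
    std::vector<ID3D12CommandList*> submit;
    for (auto& l : lists)
        submit.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(submit.size()), submit.data());
}
```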

It should be noted that while Star Swarm itself is a synthetic benchmark, the underlying Nitrous engine is relevant and is being used in multiple upcoming games. Stardock is using the Nitrous engine for their forthcoming Star Control game, and Oxide is using the engine for their own game, set to be announced at GDC 2015. So although Star Swarm is still a best case scenario, many of its lessons will be applicable to these future games.

As for the benchmark itself, we should also note that Star Swarm is a non-deterministic simulation. The benchmark pits two AI fleets against each other, so the outcome can differ from run to run. The good news is that the benchmark’s RTS mode keeps run-to-run variation low enough to produce reasonably consistent results; individual runs will still show some fluctuation, but the benchmark reliably demonstrates the larger performance trends.


Star Swarm RTS Mode

The Test

For today’s preview Microsoft, NVIDIA, and AMD have provided us with the necessary WDDM 2.0 drivers to enable DirectX 12 under Windows 10. The NVIDIA driver is 349.56 and the AMD driver is 15.200. At this time we do not know when these early WDDM 2.0 drivers will be released to the public, though we would be surprised not to see them released by the time of GDC in early March.

In terms of bugs and other known issues, Microsoft has informed us that there are some known memory and performance regressions in the current WDDM 2.0 path that have since been fixed in interim builds of Windows. In particular the WDDM 2.0 path may see slightly lower performance than the WDDM 1.3 path for older drivers, and there is an issue with memory exhaustion. For this reason Microsoft has suggested that a 3GB card is required to use the Star Swarm DirectX 12 binary, although in our tests we have been able to run it on 2GB cards seemingly without issue. Meanwhile DirectX 11 deferred context support is currently broken in the combination of Star Swarm and NVIDIA's drivers, causing Star Swarm to immediately crash, so these results are with D3D 11 deferred contexts disabled.
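
For readers unfamiliar with deferred contexts, they are D3D11’s existing mechanism for recording rendering commands on worker threads. The following is a minimal, hypothetical sketch (not Star Swarm’s actual code, and with error handling omitted) of how they are typically used:

```cpp
// Hypothetical sketch of D3D11 deferred context usage: a worker thread records
// commands into a deferred context, producing a command list that the immediate
// context later replays on the main thread.
#include <d3d11.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void RecordOnWorkerThread(ID3D11Device* device,
                          ID3D11DeviceContext* immediateContext)
{
    ComPtr<ID3D11DeviceContext> deferredContext;
    device->CreateDeferredContext(0, &deferredContext);

    // ... worker thread sets state and issues draw calls on deferredContext ...

    ComPtr<ID3D11CommandList> commandList;
    deferredContext->FinishCommandList(FALSE, &commandList);

    // Unlike D3D12, final submission still funnels through the single immediate
    // context, and the driver does much of the real work at this point -- one
    // reason deferred contexts often deliver limited scaling in practice.
    immediateContext->ExecuteCommandList(commandList.Get(), FALSE);
}
```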

For today’s article we are looking at a small range of cards from both AMD and NVIDIA to showcase both performance and compatibility. For NVIDIA we are looking at the GTX 980 (Maxwell 2), GTX 750 Ti (Maxwell 1), and GTX 680 (Kepler). For AMD we are looking at the R9 290X (GCN 1.1), R9 285 (GCN 1.2), and R7 260X (GCN 1.1). As we mentioned earlier, support for Fermi and GCN 1.0 cards will be forthcoming in future drivers.

Meanwhile on the CPU front, to showcase the performance scaling of Direct3D we are running the bulk of our tests on our GPU testbed with three different settings to roughly emulate high-end Core i7 (6 cores), i5 (4 cores), and i3 (2 cores) processors. Unfortunately we cannot control for our 4960X’s L3 cache size; however, that should not be a significant factor in these benchmarks.

DirectX 12 Preview CPU Configurations (i7-4960X)
Configuration        Emulating
6C/12T @ 4.2GHz      Overclocked Core i7
4C/4T @ 3.8GHz       Core i5-4670K
2C/4T @ 3.8GHz       Core i3-4370

Though not included in this preview, AMD’s recent APUs should slot between the 2 and 4 core options thanks to the design of AMD’s CPU modules.

CPU: Intel Core i7-4960X @ 4.2GHz
Motherboard: ASRock Fatal1ty X79 Professional
Power Supply: Corsair AX1200i
Hard Disk: Samsung SSD 840 EVO (750GB)
Memory: G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26)
Case: NZXT Phantom 630 Windowed Edition
Monitor: Asus PQ321
Video Cards: AMD Radeon R9 290X
AMD Radeon R9 285
AMD Radeon R7 260X
NVIDIA GeForce GTX 980
NVIDIA GeForce GTX 750 Ti
NVIDIA GeForce GTX 680
Video Drivers: NVIDIA Release 349.56 Beta
AMD Catalyst 15.200 Beta
OS: Windows 10 Technical Preview 2 (Build 9926)

Finally, while we’re going to take a systematic look at DirectX 12 from both a CPU standpoint and a GPU standpoint, we may as well answer the first question on everyone’s mind: does DirectX 12 work as advertised? The short answer: a resounding yes.

Star Swarm GPU Scaling - Extreme Quality (4 Cores)

Comments

  • alaricljs - Friday, February 6, 2015

    It takes time to devise such tests, more time to validate that the test is really doing what you want, and yet more time to DO the testing... and meanwhile I'm pretty sure they're not going to just drop everything else that's in the pipeline.
  • monstercameron - Friday, February 6, 2015

    and AMD knows that well, but maybe NVIDIA should also... maybe?
  • JarredWalton - Friday, February 6, 2015

    As an owner of a GTX 970 -- an owner that actually BOUGHT my 970, even though I could have asked for one -- I can honestly say that the memory segmentation issue isn't much of a concern. The reality is that when you're running settings that come close to 3.5GB of VRAM use, you're also coming close to the point where performance is too low to really matter in most games.

    Case in point: in Far Cry 4, Assassin's Creed Unity, Dying Light, Dragon Age: Inquisition, or pretty much any other game I've played/tested, the GTX 980 consistently comes in around 20-25% faster than the GTX 970. In cases where we actually come close to the 4GB VRAM on those cards (e.g. Assassin's Creed Unity at 4K High or QHD Ultra), both cards struggle to deliver acceptable performance. And there are dozens of other games that won't come near 4GB VRAM that still provide unacceptable performance with these GPUs at QHD Ultra settings (Metro: Last Light, Crysis 3, Company of Heroes 2 -- though that uses SSAA so it really kills performance at higher quality settings).

    Basically, with current games, finding a situation where the GTX 980 performs fine but GTX 970 performance tanks is difficult at best, and in most cases it's a purely artificial scenario. Most games really don't need 4GB of textures to look good, and when you drop texture quality from Ultra to Very High (or even High), the loss in quality is frequently negligible while the performance gains are substantial.

    Finally, I think it's worth noting again that NVIDIA has had memory segmentation on other GPUs, though perhaps not quite at this level. The GTX 660 Ti has a 192-bit memory interface with 2GB VRAM, which means there's 512MB of "slower" VRAM on one of the channels. That's one fourth of the total VRAM and yet no one really found cases where it mattered, and here we're talking about 1/8 of the total VRAM. Perhaps games in the future will make use of precisely 3.75GB of VRAM at some popular settings and show more of an impact, but the solution will still be the same: twiddle a few settings to get back to 80% of the GTX 980 performance rather than worrying about the difference between 10 FPS and 20 FPS, since neither one is playable.
  • shing3232 - Friday, February 6, 2015

    Those people who own two 970s will not agree with you.
  • JarredWalton - Friday, February 6, 2015

    I did get a second one, thanks to Zotac (I didn't pay for that one, though). So sorry to disappoint you. Of course, there are issues at times, but that's just the way of multiple GPUs, whether it be SLI or CrossFire. I knew that going into the second GPU acquisition.

    At present, I can say that Far Cry 4 and Dying Light are not working entirely properly with SLI, and neither are Wasteland 2 or The Talos Principle. Assassin's Creed: Unity seems okay to me, though there is a bit of flicker perhaps on occasion. All the other games I've tried work fine, though by no means have I tried "all" the current games.

    For CrossFire, the list is mostly the same with a few minor additions. Assassin's Creed: Unity, Company of Heroes 2, Dying Light, Far Cry 4, Lords of the Fallen, and Wasteland 2 all have problems, and scaling is pretty poor on at least a couple of other games (Lichdom: Battlemage and Middle-Earth: Shadow of Mordor scale, but more like 25-35% instead of 75% or more).

    Overall, GTX 970 SLI and R9 290X CF are basically tied at both 4K and QHD testing in my results across quite a few games, with NVIDIA taking a slight lead at 1080p and lower. In fact for single GPUs, 290X wins on average by 10% at 4K (but neither card is typically playable except at lower quality settings), while the difference is 1% or less at QHD Ultra.
  • Cryio - Saturday, February 7, 2015

    "Overall, GTX 970 SLI and R9 290X CF are basically tied at both 4K and QHD testing in my results across quite a few games, with NVIDIA taking a slight lead at 1080p and lower."

    Judging by *every* benchmark I've seen on the internets, on literally every game at 4K with maxed out settings, CrossFired 290Xs are faster than both SLI 970s *and* 980s.

    At 1080p and 1440p, for all intents and purposes, the 290Xs trade blows with the 970s and the 980s reign supreme. But at 4K the situation completely shifts and the 290Xs come out on top.
  • JarredWalton - Saturday, February 7, 2015

    Note that my list of games is all relatively recent stuff, so the fact that CF fails completely in a few titles certainly hurts -- and that's reflected in my averages. If we toss out ACU, CoH2, DyLi, FC4, LotF... then yes, it would do better, but then I'm cherry picking results to show the potential rather than the reality of CrossFire.
  • Kjella - Saturday, February 7, 2015

    Owner of 2x970s here. Reviews show that 2x780 Ti generally wins in current games at 3840x2160 with only 3GB of memory, so it doesn't seem to matter much today; I've seen no non-synthetic benchmarks at playable resolutions/frame rates to indicate otherwise. Nobody knows what future games will bring, but I would have bought them as a "3.5GB" card too, though of course I feel a little cheated that they're worse than the GTX 980 in a way I didn't expect.
  • JarredWalton - Saturday, February 7, 2015

    I don't have a 780 Ti (or 780 SLI for that matter), but interestingly the GTX 780 just barely ends up ahead of a single GTX 970 at QHD Ultra and 4K High/Ultra settings. There are times when the 970 leads, but when the 780 leads it does so by slightly higher margins. Effectively, the GTX 970 is equal to the GTX 780 but at a lower price point and with less power.
  • mapesdhs - Tuesday, February 10, 2015


    That's the best summary I've read on all this IMO, i.e. situations which would demonstrate the 970's RAM issue are where performance isn't good enough anyway, typically 4K gaming, so who cares? Right now, if one wants better performance at that level, then buy one or more 980, 290X, whatever, because two of any lesser card aren't going to be quick enough by definition.

    I bought two 980s, first all-new GPU purchase since I bought two of EVGA's infamous GTX 460 FTW cards when they first came out. Very pleased with the 980s, they're excellent cards. Bought a 3rd for benchmarking, etc.; the three combined give 8731 for Fire Strike Ultra (result no. 4024577), I believe the highest S1155 result atm, but the fps numbers still aren't really that high.

    Truth is, by the time a significant number of people will be concerned about a typical game using more than 3.5GB RAM, GPU performance needs to be a heck of a lot quicker than a 970. It's a non-issue. None of the NVIDIA-hate I've seen changes the fact that the 970 is a very nice card, and nothing changes how well it performs as shown in initial reviews. I'll probably get one for my brother's bday PC I'm building, to go with a 3930K setup.

    Most of those complaining about all this are people who IMO have chosen to believe that NVIDIA did all of this deliberately, because they want that to be the case, irrespective of what actually happened, and no amount of evidence to the contrary will change their minds. The 1st Rule gets broken again...

    As I posted elsewhere, all those complaining about the specs discrepancy do however seem perfectly happy for AMD (and indeed NVIDIA) to market dual-GPU cards as having double RAM numbers, which is completely wrong, not just misleading. Incredible hypocrisy here.

    Ian.
