Exploring DirectX 12: 3DMark API Overhead Feature Test

by Ryan Smith & Ian Cutress on March 27, 2015 8:00 AM EST

To say there’s a bit of excitement for DirectX 12 and other low-level APIs is probably an understatement. A big understatement. With DirectX 12 ramping up for a release later this year, Mantle 1.0 already in pseudo-release, and its successor Vulkan under active development, the world of graphics APIs is changing in a way not seen since the earliest days, when APIs such as Direct3D, OpenGL, and numerous vendor proprietary APIs were first released. From a consumer standpoint this change will still take a number of years, but from a development standpoint 2015 is going to be the year that everything changed for PC graphics programming.

So far much has been made of the benefits of these APIs, the potential performance improvements, and ultimately what new things can be achieved with them. The true answer to those questions is that this is going to be a multi-generational effort; until games are built from the ground up for these APIs, developers won’t be able to make full use of their capabilities. Even then, the coolest tricks will take some number of years to develop, as developers become better acquainted with these new APIs, their idiosyncrasies, and the capabilities of the underlying hardware when interfaced with these APIs. In other words, right now we’re just scratching the surface.

The first DirectX 12 games are expected towards the end of the year, and in the meantime Microsoft and their hardware partners have been ramping up the DirectX 12 ecosystem, hammering out the API implementation in Windows 10 while the hardware vendors write and debug their WDDM 2.0 drivers. While this has been going on, we’ve seen a slow trickle of software designed to showcase DirectX 12 features in a proof-of-concept manner. A number of internal demos exist, and we saw the first semi-public DirectX 12 software release last month with our look at Star Swarm.

This week the benchmarking gurus over at Futuremark are releasing their own first run at a DirectX 12 test with their latest update for the 3DMark benchmark. Futuremark has been working away at DirectX 12 for some time – in fact they were the first partner to show DirectX 12 code in action at Microsoft’s 2014 DX12 unveiling – and now they are releasing their first DirectX 12 project.

In keeping with the general theme of the demos we’ve seen so far, Futuremark’s new DirectX 12 release is another proof-of-concept test. Dubbed the 3DMark API Overhead Feature Test, it is a purely synthetic benchmark designed to showcase the draw call benefits of the new API even more strongly than earlier benchmarks. Whereas Star Swarm was a best-case scenario test within the confines of a realistic graphics workload, the API Overhead Feature Test is designed to test one thing and one thing only: how many draw calls a system can handle. The end result, as we’ll see, showcases just how great the benefits of DirectX 12 are in this situation, allowing for an order-of-magnitude improvement, if not more.

To do this, Futuremark has written a relatively simple test that draws a very simple scene with an ever-increasing number of objects in order to measure how many draw calls a system can handle before it becomes saturated. As expected for a synthetic test, the underlying rendering task is very simple (render an immense number of building-like objects at both the top and bottom of the screen), and the bottleneck is in processing the draw calls. Generally speaking, under this test you should either be limited by the number of draw calls you can generate (CPU limited) or limited by the number of draw calls you can consume (GPU’s command processor limited), and not by the GPU’s actual rendering capabilities. The end result is that the API Overhead Feature Test can push an even larger number of draw calls than Star Swarm could.
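As a rough illustration of the two limits described above, the sustained draw call rate can be modelled as the smaller of what the CPU can submit and what the GPU's command processor can consume. The Python sketch below assumes exactly that simplification; the function name and the figures in the usage lines are hypothetical and are not measured results.

```python
def sustained_draw_calls_per_second(cpu_submit_rate, gpu_consume_rate):
    """Return (rate, limiter) for a simplified two-stage pipeline.

    cpu_submit_rate:  draw calls per second the CPU and API can generate.
    gpu_consume_rate: draw calls per second the GPU command processor can accept.
    Both inputs are hypothetical; this is a model, not how 3DMark measures.
    """
    if cpu_submit_rate <= gpu_consume_rate:
        return cpu_submit_rate, "CPU"
    return gpu_consume_rate, "GPU command processor"


# Hypothetical example: a slow submission path starves a faster GPU front end.
rate, limiter = sustained_draw_calls_per_second(2_000_000, 15_000_000)
print(f"{rate:,} draw calls/s, limited by the {limiter}")
```

In this model, lowering API overhead only raises the result for as long as the CPU side is the smaller of the two numbers; past that point the command processor becomes the limit.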

To showcase the difference between various APIs, this test is available with DirectX 12 and Mantle, but also with two different DirectX 11 modes. Standard DirectX 11 single-threading is one mode, alongside support for DirectX 11 multi-threading. The latter has a checkered history – it never did work as well in the real world as initially hoped – and in practice only NVIDIA supports it to any decent degree. Regardless, as we’ll see, DirectX 12’s throughput will put even DX11MT to shame.

Futuremark’s complete technical description is posted below:

The test is designed to make API overhead the performance bottleneck. The test scene contains a large number of geometries. Each geometry is a unique, procedurally-generated, indexed mesh containing 112–127 triangles.

The geometries are drawn with a simple shader, without post processing. The draw call count is increased further by drawing a mirror image of the geometry to the sky and using a shadow map for directional light.

The scene is drawn to an internal render target before being scaled to the back buffer. There is no frustum or occlusion culling to ensure that the API draw call overhead is always greater than the application side overhead generated by the rendering engine.

Starting from a small number of draw calls per frame, the test increases the number of draw calls in steps every 20 frames, following the figures in the table below.

To reduce memory usage and loading time, the test is divided into two parts. The second part starts at 98304 draw calls per frame and runs only if the first part is completed at more than 30 frames per second.

Draw calls per frame	Draw calls per frame increment per step	Accumulated duration in frames
192 – 384	12	320
384 – 768	24	640
768 – 1536	48	960
1536 – 3072	96	1280
3072 – 6144	192	1600
6144 – 12288	384	1920
12288 – 24576	768	2240
24576 – 49152	1536	2560
49152 – 98304	3072	2880
98304 – 196608	6144	3200
196608 – 393216	12288	3520
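
The table follows a regular pattern: each row doubles the draw call count in 16 equal steps of 20 frames, for 320 frames per row. The Python sketch below, assuming that pattern holds for every row (the step count of 16 is inferred from the table rather than stated by Futuremark), regenerates the rows and derives two figures from the description. The first is the per-frame triangle range at the final draw call count, assuming each draw call submits one mesh of 112 to 127 triangles. The second is the draw call rate implied by holding 30 frames per second at the point where the second part starts. All names are illustrative.

```python
FRAMES_PER_STEP = 20        # stated in Futuremark's description
STEPS_PER_ROW = 16          # inferred from the table: (384 - 192) / 12
FIRST_ROW_START = 192       # first figure in the table
ROW_COUNT = 11              # number of rows in the table
PART_TWO_START = 98_304     # draw calls per frame where part two begins
MIN_TRIANGLES, MAX_TRIANGLES = 112, 127   # triangles per mesh
PART_ONE_MIN_FPS = 30       # frame rate part one must exceed


def tiers():
    """Yield (start, end, increment, accumulated_frames) for each table row."""
    start = FIRST_ROW_START
    for index in range(1, ROW_COUNT + 1):
        end = start * 2
        increment = (end - start) // STEPS_PER_ROW
        accumulated = index * STEPS_PER_ROW * FRAMES_PER_STEP
        yield start, end, increment, accumulated
        start = end


if __name__ == "__main__":
    rows = list(tiers())
    for start, end, increment, accumulated in rows:
        print(f"{start} - {end}\t{increment}\t{accumulated}")

    # Triangle range per frame at the last draw call count in the table,
    # assuming one mesh per draw call.
    peak = rows[-1][1]
    print("triangles per frame at peak:",
          peak * MIN_TRIANGLES, "to", peak * MAX_TRIANGLES)

    # Draw calls per second implied by 30 fps at the start of part two.
    print("draw calls/s needed to reach part two:",
          PART_TWO_START * PART_ONE_MIN_FPS)
```

Because the increment is always one sixteenth of the row's starting value, each step raises the load by at most 6.25 percent, which keeps the ramp fine-grained across the whole range.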

113 Comments

tipoo - Friday, March 27, 2015 - link
4X gains seen here

http://www.pcworld.com/article/2900814/tested-dire...
Ryan Smith - Friday, March 27, 2015 - link
Sorry, that was an error in that table. We didn't have the 4770R for this article.
geekfool - Saturday, March 28, 2015 - link
hhm pcw says " All of our tests were performed at 1280x720 resolution at Microsoft's recommendation."
if that's the case with your tests too then it seems that the real test today should be 1080p and a provisional 4k/UHD1 to get a set of future core numbers regardless of MS's wishes...
Ryan Smith - Sunday, March 29, 2015 - link
720p is the internal rendering resolution, and is used to avoid potential ROP bottlenecks (especially at the early stages). This is supposed to be a directed, synthetic benchmark, and the ability to push pixels is not what is intended to be tested.

That said, the actual performance impact from switching resolutions on most of these GPUs is virtually nil since there's more than enough ROP throughput for all of this.
Winterblade - Friday, March 27, 2015 - link
Very interesting results, and very informative article, the only small caveat I find is that for proper comparison of 2, 4 and 6 cores (seems to be one of the focal points of the article) the clock should be the same for all 3 configurations, it is a bit misleading otherwise. The difference seems to be around 10 - 15% in going from 4 to 6 cores but there is also a 10% difference in clock rate between them.
chizow - Friday, March 27, 2015 - link
Fair point, it almost looks like they are trying to artificially force some contrast in the results there. Biggest issue I have with that is you are more likely to find higher clocked 4-cores in the wild since they tend to overclock better than the TDP and size limited 6-core chips.

That's the tradeoff any power-user faces there, higher overclock on that 4790K (and soon Broadwell-K) chip or the higher L3 cache and more cores of a 6-core chip with lower OC potential.
dragonsqrrl - Friday, March 27, 2015 - link
I got 1.7M draw calls per second with an i7-970 and GTX480 in DX11, and 2.3M in DX11MT. Pretty much identical to every other Nvidia card benchmarked. Interested to see what kind of draw call gains I get with a 480 once Windows 10 and DX12 come out with finalized drivers.
godrilla - Friday, March 27, 2015 - link
Vulkan seems more attractive for devs though.

The battle of the APIs incoming.
junky77 - Friday, March 27, 2015 - link
Well, currently, the limiting factor is almost always the GPU, with a powerful GPU, unless we are talking AMD CPUs which are TDP limited in many cases or an I3 and even then the differences are not great

So, I think that it's mainly a look for the future, allowing higher draws scenes, potentially
Mat3 - Friday, March 27, 2015 - link
Would be interesting to see how the FX-8350 compares to the i7-4960X for this test.
