Direct3D 12 In Depth

This brings us to Direct3D 12, which is Microsoft's entry into the world of low level graphics programming. Microsoft is still neck-deep in the development of Direct3D 12 – they're currently targeting it for game releases in Holiday 2015, roughly 18 months off – and as such Microsoft hasn't released a ton of details about the API to the public yet. But they have given us a broad overview of what they plan to accomplish, along with a couple of technical details on how they will be doing it.

At a high level there is no denying the fact that Direct3D 12 looks a lot like Mantle. Microsoft has set out with the same basic goals as AMD did with Mantle, and looks to be achieving some of them in the same manner. It should come as no surprise, then, that the end products are going to be similar as a result.

As with Mantle, the primary goal for Direct3D 12 is to greatly reduce the CPU overhead that we’ve talked about previously. As the biggest source of CPU overhead is having Direct3D assemble the command lists/buffers for a GPU, Direct3D 12 will be moving that job over to developers. By assembling their own command lists developers can more easily spread out the task over multiple cores, and this alone will have a significant impact on CPU utilization. At this point we don’t know what Direct3D 12 command lists will look like, and this will likely be one of the design choices that separates Direct3D 12 from Mantle, but there’s no reason at this time to expect them to be much different.
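
To make the concept concrete, here is a minimal, hypothetical sketch of what developer-driven, multithreaded command list recording could look like. To be clear, the interface and method names (ID3D12Device, ID3D12GraphicsCommandList, and so on) are our own illustration of what a D3D12-style API might expose, not something Microsoft has published at this point.

```cpp
// Hypothetical sketch: each worker thread records its own command list,
// so no thread ever blocks on a single shared immediate context.
// All interface and method names are illustrative assumptions.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void RecordFrameInParallel(ID3D12Device* device, UINT threadCount)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(threadCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(threadCount);
    std::vector<std::thread>                       workers;

    for (UINT i = 0; i < threadCount; ++i)
    {
        // One allocator/command list pair per thread; no shared state to lock.
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));

        workers.emplace_back([&lists, i]
        {
            // ... record this thread's share of the frame's draw calls ...
            lists[i]->Close();
        });
    }

    for (auto& worker : workers)
        worker.join();

    // The finished lists would then be handed off to the GPU's queue in order.
}
```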


For Comparison: D3D11 Command Buffer

Microsoft will also be introducing a similar concept, the bundle, which is functionally a reusable command list. This is another CPU saving step, as using a bundle in place of multiple command lists further cuts down on the amount of CPU time spent making submissions. The idea behind a bundle is to record work once, and then allow the bundle to be executed multiple times with minor variations. Microsoft specifically notes having a character drawn twice with different textures as a use case for this structure.
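
As a rough illustration of that use case (reusing the illustrative, unconfirmed names from the sketch above), a bundle would be recorded once and then replayed with a different texture binding around each call. The exact rules for what a bundle inherits from the command list that executes it are likewise our assumption.

```cpp
// Hypothetical sketch: record a character's draw commands into a bundle once,
// then replay it twice with a different texture bound each time. Names such
// as characterPSO and textureTableA/B are placeholders, and the assumption
// that a bundle picks up bindings from the calling command list is ours.
#include <d3d12.h>
#include <wrl/client.h>

void DrawCharacterTwice(ID3D12Device* device,
                        ID3D12CommandAllocator* bundleAllocator,
                        ID3D12GraphicsCommandList* directList,
                        ID3D12RootSignature* rootSig,
                        ID3D12PipelineState* characterPSO,
                        const D3D12_VERTEX_BUFFER_VIEW& characterVB,
                        UINT vertexCount,
                        D3D12_GPU_DESCRIPTOR_HANDLE textureTableA,
                        D3D12_GPU_DESCRIPTOR_HANDLE textureTableB)
{
    // Recorded once, up front.
    Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> bundle;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE,
                              bundleAllocator, characterPSO,
                              IID_PPV_ARGS(&bundle));
    bundle->SetGraphicsRootSignature(rootSig);
    bundle->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    bundle->IASetVertexBuffers(0, 1, &characterVB);
    bundle->DrawInstanced(vertexCount, 1, 0, 0);
    bundle->Close();

    // Replayed twice, with only the texture table changing between calls.
    directList->SetGraphicsRootDescriptorTable(0, textureTableA);
    directList->ExecuteBundle(bundle.Get());
    directList->SetGraphicsRootDescriptorTable(0, textureTableB);
    directList->ExecuteBundle(bundle.Get());
}
```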

Meanwhile it’s interesting to note that with this change Microsoft has admitted that Direct3D 11 style immediate/deferred command lists haven’t lived up to their goals, stating “deferred contexts also do not map perfectly to hardware, and so relatively little work can be done in them.” To our knowledge the only game able to make significant use of the feature was Civilization V, and even then we’ve seen AMD video cards perform very well without supporting the feature.
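
For reference, the Direct3D 11 mechanism being criticized works like this: a deferred context records commands on a worker thread into an ID3D11CommandList, which the immediate context later replays. The variable names below are our own placeholders; the API calls themselves are standard D3D11.

```cpp
// The existing D3D11 path: a deferred context records commands on a worker
// thread, and the immediate context replays them later. Variable names
// (device11, vs, ps, etc.) are illustrative placeholders.
#include <d3d11.h>
#include <wrl/client.h>

void RecordAndReplayD3D11(ID3D11Device* device11,
                          ID3D11DeviceContext* immediate,
                          ID3D11VertexShader* vs,
                          ID3D11PixelShader* ps,
                          UINT vertexCount)
{
    // Worker thread: record state changes and draws into a deferred context.
    Microsoft::WRL::ComPtr<ID3D11DeviceContext> deferred;
    device11->CreateDeferredContext(0, &deferred);
    deferred->VSSetShader(vs, nullptr, 0);
    deferred->PSSetShader(ps, nullptr, 0);
    deferred->Draw(vertexCount, 0);

    // Close the recording into a command list object.
    Microsoft::WRL::ComPtr<ID3D11CommandList> commandList;
    deferred->FinishCommandList(FALSE, &commandList);

    // Main thread: submit the recorded commands on the immediate context.
    immediate->ExecuteCommandList(commandList.Get(), TRUE);
}
```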

Moving on, Direct3D 12 will also be introducing pipeline state objects. With pipeline state objects we're really getting into the nitty-gritty of command buffer execution and how the various graphics architectures differ, but the important bit to take away is that most architectures can't freely transition between pipeline states as often as Direct3D 11 would like. This creates problems for how quickly hardware state can be set, as Direct3D must stop and take these hardware limitations into account.

The solution to this will be the aforementioned pipeline state objects (PSOs). PSOs bypass some of these pipeline limitations by using objects that are finalized on creation. Nitty-gritty details aside, the outcome from this is that it further reduces CPU overhead, once again increasing the number of draw calls the CPU can submit or freeing it up for other tasks.
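
As an illustration of what "finalized on creation" could mean in practice, the sketch below bakes the shaders, blend, rasterizer, depth, and output formats into one immutable object up front. Once again, the structure and field names are our own assumptions rather than a published API.

```cpp
// Hypothetical sketch: the full pipeline configuration is declared up front
// and compiled into a single immutable object, so at draw time switching
// state is one call instead of many individually validated changes.
// Structure and field names are illustrative assumptions.
#include <d3d12.h>

ID3D12PipelineState* CreateOpaquePSO(ID3D12Device* device,
                                     ID3D12RootSignature* rootSig,
                                     const void* vsBlob, SIZE_T vsSize,
                                     const void* psBlob, SIZE_T psSize,
                                     const D3D12_INPUT_ELEMENT_DESC* elements,
                                     UINT elementCount)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature                = rootSig;
    desc.VS                            = { vsBlob, vsSize };
    desc.PS                            = { psBlob, psSize };
    desc.RasterizerState.FillMode      = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode      = D3D12_CULL_MODE_BACK;
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.DepthStencilState.DepthEnable = FALSE;
    desc.InputLayout                   = { elements, elementCount };
    desc.PrimitiveTopologyType         = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets              = 1;
    desc.RTVFormats[0]                 = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count              = 1;
    desc.SampleMask                    = 0xFFFFFFFFu;

    ID3D12PipelineState* pso = nullptr;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));

    // At draw time: commandList->SetPipelineState(pso); and that's it.
    return pso;
}
```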

The final major addition to Direct3D 12 is descriptor heaps. Going back to 2012, one of the features introduced on NVIDIA's then-new Kepler architecture was bindless resources, which bypassed the previous 128 slot limitation on resources (textures, etc.). Through bindless, an essentially infinite number of resources can be addressed, albeit at a performance penalty due to an additional layer of indirection in memory accesses.

Descriptor heaps in turn appear to be the integration of bindless resources into Direct3D 12. Microsoft does not specifically call descriptor heaps bindless, but the description of slots and draw calls makes it clear that they're intending to solve the problem with the bindless solution. With descriptor heaps, and the descriptor tables that reside in those heaps, Direct3D 12 will be able to perform bindless operations, both expanding the number of resources available to shader programs and even allowing outright dynamic indexing of resources.
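
As a sketch of what that could look like in practice (with the usual caveat that every name here is our own assumption), a large, shader-visible descriptor heap would be filled with texture descriptors and then bound as a single table, rather than filling a small number of fixed slots one at a time.

```cpp
// Hypothetical sketch: fill a large shader-visible heap with texture
// descriptors and bind the whole table in one call, so shaders can index
// textures directly instead of relying on a handful of fixed bind slots.
// All names, and the lifetime handling, are illustrative assumptions.
#include <d3d12.h>
#include <wrl/client.h>

Microsoft::WRL::ComPtr<ID3D12DescriptorHeap>
BindTextureTable(ID3D12Device* device,
                 ID3D12GraphicsCommandList* commandList,
                 ID3D12Resource* const* textures, UINT textureCount)
{
    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    heapDesc.NumDescriptors = textureCount;   // no fixed 128-slot ceiling
    heapDesc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> heap;
    device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&heap));

    // Write one shader resource view per texture into consecutive heap slots.
    const UINT stride = device->GetDescriptorHandleIncrementSize(
        D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    D3D12_CPU_DESCRIPTOR_HANDLE slot = heap->GetCPUDescriptorHandleForHeapStart();
    for (UINT i = 0; i < textureCount; ++i)
    {
        device->CreateShaderResourceView(textures[i], nullptr, slot);
        slot.ptr += stride;
    }

    // One call binds the entire table; the shader picks a texture by index.
    ID3D12DescriptorHeap* heaps[] = { heap.Get() };
    commandList->SetDescriptorHeaps(1, heaps);
    commandList->SetGraphicsRootDescriptorTable(
        0, heap->GetGPUDescriptorHandleForHeapStart());

    // Returned so the caller keeps the heap alive until the GPU is done with it.
    return heap;
}
```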

Finally, there are a few miscellaneous features that have popped up in Microsoft's slides that have caught our attention, if only due to the lack of details provided. Specifically, the mention of compressed resources stands out. The formats mentioned, ASTC and JPEG, are not resource formats that we know to be supported on any current PC GPU. In the case of ASTC, Khronos's next generation texture compression format, it is a finalized standard that will be supported on all GPUs in time as a core part of the OpenGL standard. JPEG, meanwhile, is not a format we've seen on any API roadmap before.


Image Courtesy PC Perspective

To that end, the addition of ASTC is not all that surprising. Since it is royalty free and not otherwise restricted to OpenGL-only, there’s no reason not to support it when all of the underlying hardware will (eventually) support it anyhow.

JPEG on the other hand is a very curious thing to mention, as its absence from any API roadmap goes along with the fact that we're not aware of anyone having announced plans to support JPEG in hardware. Furthermore JPEG is not a fixed ratio compressor – the number of bits a given input will compress to can vary – which for GPUs would typically be a bad thing. It stands to reason then that Microsoft knows a bit more about what features are in the R&D pipelines of the GPU makers, and that someone will be implementing hardware JPEG support. So we'll have to keep an eye on this and see what pops up.

Making a Common Low Level API

The need for a low level graphics API like Direct3D 12 is clear, but establishing a common API is no easy task. Abstraction is both what gives Direct3D 11 its ability to work across many different GPUs and what robs it of some of its performance. So making a low level API that works across AMD, NVIDIA, Intel, Qualcomm, and others' GPUs requires a careful balancing act: delivering low level improvements while adding no more abstraction than is necessary.

At this stage in development Microsoft is not ready to talk about that aspect of API development; for the moment that level of access is restricted to a small group of approved developers. But given their hardware requirements we can make a few educated guesses about what’s going on behind the scenes.

All of the big 3 GPU vendors have confirmed which of their GPUs will be supported. For Intel, their Gen 7.5 GPUs (Haswell generation) will support Direct3D 12. For NVIDIA, Fermi, Kepler, and Maxwell will support Direct3D 12. And for AMD, GCN 1.0 and GCN 1.1 will support Direct3D 12.

Direct3D 12 Confirmed Supported GPUs
AMD: GCN 1.0 (Radeon 7000/8000/200), GCN 1.1 (Radeon 200)
Intel: Gen 7.5 (Haswell/4th Gen Core)
NVIDIA: Fermi (GeForce 400/500), Kepler (GeForce 600/700/800), Maxwell (GeForce 700/800)

The interesting thing about all of this is what's excluded: namely, AMD's D3D11-era VLIW5 and VLIW4 architectures (the Radeon HD 5000/6000 series). We've written about VLIW in comparison to GCN in great depth, and the takeaway is that AMD was the only vendor here using a VLIW design. Every architecture has its strengths and weaknesses, and while VLIW could pack a lot of hardware into a small amount of space, the inflexible scheduling inherent to the execution model was a very big part of the reason that AMD moved to GCN, along with a number of special cases regarding pipeline and memory operations.

Now why do we bring this up? Because with GCN, Fermi, and Gen 7.5, all PC GPUs suddenly started looking a lot more alike. To be clear, there are still a number of differences between these architectures, from their warp/wavefront size to how their SIMDs are organized and what they're capable of. But the important point is that with each successive generation, driven by the flexibility required for efficient GPU computing, these architectures have become more and more alike. They're more similar now than they have been at any point since the earliest days of programmable GPUs.


Wavefront Execution Example: SIMD vs. VLIW. Not To Scale - Wavefront Size 16

Ultimately, all of this is a long-winded way of saying that a big part of the reason there can even be a common low level graphics API is because the hardware has homogenized to the point where less and less abstraction is necessary. On a spectrum ranging from a shared ISA (e.g. x86) to widely divergent designs, we're nowhere near the former, but importantly we're also nowhere near the latter. This is a subject we're going to have to watch with great interest, because Microsoft and the GPU vendors (through their drivers) are still going to have to introduce some level of abstraction to make everyone work together through a single common low level API. But the situation with modern hardware means that (with any luck) the additional abstraction in Direct3D 12 over something like Mantle will prove to be insignificant.

Finally, it's worth pointing out that last week's developments with Direct3D couldn't be happening without a degree of political backbone, too. The problem in introducing any new graphics standard is not just technical, but also political: bringing together companies with differing interests, whose best interests don't necessarily involve fast-tracking every technology proposed.

Microsoft to that end currently holds a very interesting spot in the world of PC graphics, being the maintainer of the most popular PC graphics API. And unlike the designed-by-committee OpenGL, Microsoft has some (but not complete) leverage to push new technologies through when the GPU vendors and software vendors would otherwise be at loggerheads with each other. So while Microsoft is being clear this is a joint effort between all of the involved parties, there's still something to be said for having the influence and power to push through changes that may not be popular with everyone.

Comments
  • ninjaquick - Tuesday, March 25, 2014 - link

    It is not BS at all. Developers have been asking, even crying out, for low level access to GPU hardware on PC for ages. The Xbox One was the last straw though; currently it is no more programmable than a PC. This caused Crytek a massive headache, as they budgeted rendering based on 'to the metal' efficiency and were instead met with massive draw overheads, forcing them to severely reduce the quality of their work in 'Ryse'. Other developers have complained about the very same thing. The Xbox 360 is more programmable. The benefit D3D12 has this time around is that the X1 is based on a hybrid WindowsRT/8 x64, meaning D3D12 can be pushed to all Win8 gen devices.
  • rootheday3 - Monday, March 24, 2014 - link

    I am pretty sure that at least one of the demos (3DMark?) was actually run on a Haswell iGPU - meaning Intel is well along on driver development. Some of the announced features (support for order independent transparency) also sound like Intel extensions on DX11 (PixelSync).
  • rootheday3 - Monday, March 24, 2014 - link

    Also- reducing driver cost and single threaded perf should also help ensure that mobile gaming on laptops and tablets is less likely to be cpu bound due to frequency constraints. Should also allow more of the thermal budget to go to gpu for better rendering/ less throttling.
  • Zak - Tuesday, March 25, 2014 - link

    "Powered by a GeForce GTX Titan Black, Microsoft tells us the demo is capable of sustaining 60fps."

    Titan Black? No kidding. At what resolution?
  • Scali - Tuesday, March 25, 2014 - link

    "To use consoles as an example once again, this is why they are capable of so much with such a (relatively) weak CPU, as they’re better able to utilize their multiple CPU cores than a high level programmed PC can."

    This is patently false.
    Namely, the PS3 with its Cell has only a single regular CPU core. The SPEs are very limited and not suitable for batching up draw calls.
    The XBox 360 is a more 'regular' CPU, but it has 'only' 3 cores, and the rendering is mostly done on a single core. (PS4 and XBox One are too new to draw any conclusions yet, so 'console efficiency' is what we know of consoles that are not all about multithreading).

    You are confusing low-level with multithreading. Low-level is about programming the GPU with a very direct path, little abstraction. That is why it is efficient on consoles. There is a much thinner layer between OS, GPU, driver, API and application than on a regular PC.

    Multithreading is another way of speeding up graphics, but this does not necessarily require programming the GPU directly. Which is also not what DX12 is going to do. It will still abstract the hardware to a point where vendor and implementation are not relevant. But it will allow better control of batching up calls on threads other than the master rendering thread (D3D11 already has support for multithreading, but it has some limitations).

    It seems that AMD has done a great job on confusing the general public, with its Mantle propaganda.
  • Death666Angel - Tuesday, March 25, 2014 - link

    All these articles make me think of John Carmack's QuakeCon 2013 (I think) keynote where he talked about having the same access to the GPU as he does to the CPU and basically programming in machine code. Hope this is coming. :) I need the performance for 120fps/4k Oculus Rift games! :D
  • Scali - Tuesday, March 25, 2014 - link

    The irony is that the original D3D had low-level access to the GPU, with its execute buffer system. This would easily allow multiple threads to batch up commands for the GPU in parallel efficiently.
    But Carmack complained that the API was too hard to use.
    It looks like we're going back in time somewhat, getting closer to the original D3D.
  • Ramon Zarat - Tuesday, March 25, 2014 - link

    The only thing I really want to know is this:

    Is the current gen GPU hardware, mainly Kepler/Maxwell and Tahiti/Hawaii, *technically* ABLE to support DX12? With strong emphasis on ABLE, as even if they are able, I seriously doubt AMD or Nvidia will be "charitable" enough to actually do it; instead they'll force us all to upgrade again, thanks to the wonder of artificial market segmentation.

    Will we see modded drivers enabling (at least partially) DX12 features on current hardware? That would be interesting... For example, I'm currently running a modded Intel OROM BIOS on my Z68 board so it can use TRIM under RAID0 with SSDs. With the Z77 and Z87, no TRIM problem out of the box, and they are 99.9% identical to the Z68 SATA controller. I had ZERO problems in 2 1/2 years, so yeah, TRIM works in RAID0 SSD on the Z68. Thanks a lot, Intel...for nothing.
  • inighthawki - Wednesday, March 26, 2014 - link

    Considering nvidia made it a huge point that Fermi and above will be supported and represent >50% of the existing market, yeah, probably. Why would they announce that and then not write drivers for it?
  • TheJian - Wednesday, March 26, 2014 - link

    "For low level PC graphics APIs Mantle will be the only game in town for the next 18-20 months; but after that, then what?"

    Ignorance, or the dumbest comment I've seen this year ;) Either ignorant or just trying to pretend OpenGL can't do this already - and has for years. You forgot the OpenGL speeches showing 5-30x better draw calls, and it is ALREADY here, not 2yrs away like DX12.
    http://blogs.nvidia.com/blog/2014/03/20/opengl-gdc...
    Still showing bias here I guess... Feel free to watch the 52min video in that link on DRAW CALLS and how pointless Mantle is, as OpenGL has had this stuff for years. He even shows some code.
    "How to get a crap-ton more draw calls" (a new technical term he said in the video…LOL), which clearly is telling you in the first minute, there is no need for mantle as it is about 10x draw calls right? Mantle=dead. If not by DX12 (but this isn’t out for a while-Win9?), then OpenGL/SteamOS/Android pushing OpenGL.

    NV’s speech wasn't a SMALL speech or there wouldn't be a 130 slide doc (at least you guys mentioned it, that's not enough) explaining what they covered right (on top of the 52min dev day video covering Draw Calls +many others)? Considering it is ALREADY working in opengl (same crap as mantle, even carmack said you can do the same thing already and get as close to metal as you'd like in OpenGL with extensions a YEAR ago), I'm sure the OpenGL speeches had more concrete info than the DX speech at GDC (valve didn’t say a word about anything BUT OpenGL recently and how to port DX9 to it). How can you not detail the info on OpenGL and call yourself a hardware site? I know you guys hate cuda (or you'd test it vs. opencl/AMD repeatedly to death), but why the hate for OpenGL which NV has no control over?

    Mantle's main competitor for 2yrs while we wait for DX12 is ...wait for it...OPENGL, which already does the same crap mantle does... ;) I expected nothing less from Ryan Smith. Where is the big OpenGL speeches coverage? Devs skipped the VR speech to hear about the draw call speech (it was going on right next door and John Mcdonald even says at the end of the vid he'd be in there if he wasn't giving the speech...LOL) and NV was a bit surprised those devs skipped VR. They expected 5 people, got a crowd wanting OpenGL draw call info ;)

    But believe Ryan, Mantle's only competition is DX even though OpenGL (ES) is the only thing really used in games on mobile which will further Valves desktop opengl push too. Steam Dev Days was all about leaving DX and going OpenGL, much of GDC was the same or on ES3.1 mobile info. DX12 won't get far unless you believe MS will take out Android/SteamOS/iOS. They are already stuck in cement being so far ahead with pure unit sales off the charts, game devs' mind share already on ios/android, etc, they can easily push OpenGL together and all 3 want DX/Wintel dead.

    Wintel lost 21% of ALL notebook share last year. This year we have 64bit cpus coming from all ARM soc players and they'll will use those to go further up the chain from crapbooks (chromebooks to me...LOL) to REAL notebooks, and low-end desktops (+some servers) and move up again with the next revs into everything high-end that x86 owns today.

    NV’s speech wasn't a SMALL speech or there wouldn't be a 130 slide doc explaining what they covered right (on top of the 52min video from steam dev days+many others)? Considering it is ALREADY working in opengl (same stuff as mantle, even carmack said you can do the same thing already and get as close to metal as you'd like in OpenGL with extensions a YEAR ago on stage), I'm sure the OpenGL speeches had more concrete info than the DX speech at GDC (valve didn’t say a word about anything BUT OpenGL). How can you not detail the info on OpenGL and call yourself a hardware site? I know you guys hate cuda (or you'd test it vs. opencl/AMD repeatedly to death), but why the hate for OpenGL which NV has no control over? It hurts mantle, is DONE now, AMD pays you for a portal etc, so it's off limits?
