Original Link: http://www.anandtech.com/show/7889/microsoft-announces-directx-12-low-level-graphics-programming-comes-to-directx



With GDC 2014 having drawn to a close, we have finally seen what is easily the most exciting piece of news for PC gamers. As previously teased, Microsoft took to the stage last week to announce the next iteration of DirectX: DirectX 12. And as hinted at by the session description, Microsoft’s session was all about bringing low level graphics programming to Direct3D.

As is often the case for these early announcements, Microsoft has been careful not to release too many technical details at once. But from their presentation and the smaller press releases put together by their GPU partners, we’ve been given our first glimpse at Microsoft’s plans for low level programming in Direct3D.

Preface: Why Low Level Programming?

The subject of low level graphics programming has become a very hot topic very quickly in the PC graphics industry. In the last 6 months we’ve gone from low level programming being a backburner subject, to being a major public initiative for AMD, to now being a major initiative for the PC gaming industry as a whole through Direct3D 12. The sudden surge in interest and development isn’t a mistake – this is a subject that has been brewing for years – but it’s within the last couple of years that all of the pieces have finally come together.

But why are we seeing so much interest in low level graphics programming on the PC? The short answer is performance, and more specifically what can be gained from returning to low level programming.

Something worth pointing out right away is that low level programming is not new or even all that uncommon. Most high performance console games are written in such a manner, thanks to the fact that consoles are fixed platforms and therefore easily allow this style of programming to be used. By working with hardware at such a low level programmers are able to tease a great deal of performance out of this hardware, which is why console games look and perform as well as they do given the consoles’ underpowered specifications relative to the PC hardware from which they’re derived.

However with PCs the same cannot be said. PCs, being a flexible platform, have long worked off of high level APIs such as Direct3D and OpenGL. Through the powerful abstraction provided by these high level APIs, PCs have been able to support a wide variety of hardware and over a much longer span of time. With low level PC graphics programming having essentially died with DOS and vendor specific APIs, PCs have traded some performance for the convenience and flexibility that abstraction offers.

The nature of that performance tradeoff has shifted over the years though, requiring that it be reevaluated. As we’ve covered in great detail in our look at AMD’s Mantle, these tradeoffs were established at a time when CPUs and GPUs were growing in performance by leaps and bounds year after year. But in the last decade or so that has changed – CPUs are no longer rapidly increasing in performance, especially in the case of single-threaded performance. CPU clockspeeds have reached a point where higher clockspeeds are increasingly power-expensive, and the “low hanging fruit” for improving CPU IPC has long been exhausted. Meanwhile GPUs have roughly continued their incredible pace of growth, owing to the embarrassingly parallel nature of graphics rendering.

The result is that when looking at single threaded CPU performance, GPUs have greatly outstripped CPU performance growth. This in and of itself isn’t necessarily a problem, but it does present a problem when coupled with the high level APIs used for PC graphics. The bulk of the work these APIs do in preparing data for GPUs is single threaded by its very nature, causing the slowdown in CPU performance increases to create a bottleneck. As a result of this gap and its ever-increasing nature, the potential for bottlenecking has similarly increased; the price of abstraction is the CPU performance required to provide it.

Low level programming in contrast is more resistant against this type of bottlenecking. There is still the need for a “master” thread and hence the possibility of bottlenecking on that master, but low level programming styles have no need for a CPU-intensive API and runtime to prepare data for GPUs. This makes it much easier to farm out work to multiple CPU cores, protecting against this bottlenecking. To use consoles as an example once again, this is why they are capable of so much with such a (relatively) weak CPU, as they’re better able to utilize their multiple CPU cores than a high level programmed PC can.
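To put rough numbers on that argument, here is a minimal sketch of the arithmetic. The timings are hypothetical values chosen for illustration, not measurements, and the model (a serial "master" portion plus a portion that divides evenly across cores) is a simplifying assumption.

```python
def frame_cpu_time_ms(serial_ms: float, parallel_ms: float, cores: int) -> float:
    """Wall-clock CPU time for one frame.

    serial_ms   -- work stuck on the "master" thread
    parallel_ms -- work that can be divided evenly across cores (an idealization)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return serial_ms + parallel_ms / cores


# Hypothetical frame with 16 ms of total CPU work on a 4 core CPU.
# High level API: most of the submission work is serialized in the runtime.
high_level = frame_cpu_time_ms(serial_ms=12.0, parallel_ms=4.0, cores=4)
# Low level API: the same total work, but most of it can be farmed out.
low_level = frame_cpu_time_ms(serial_ms=4.0, parallel_ms=12.0, cores=4)

print(high_level, low_level)  # 13.0 7.0
```

In this toy model the total amount of work never changes; only the share pinned to the master thread does, which is why extra cores help one case far more than the other.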

The end result of this situation is that it has become time to seriously reevaluate the place of low level graphics programming in the PC space. Game developers and GPU vendors alike want better performance. Meanwhile, though it’s a bit cynical, there’s a very real threat posed by the latest crop of consoles, putting PC gaming in a tight spot where it needs to adapt to keep pace with the consoles. PCs still hold a massive lead in single-threaded CPU performance, but given the limits we’ve discussed earlier, too much bottlenecking can lead to the PC being the slower platform despite the significant hardware advantage. A PC platform that can process fewer draw calls than a $400 game console is a poor outcome for the industry as a whole.



Direct3D 12 In Depth

This brings us to Direct3D 12, which is Microsoft’s entry into the world of low level graphics programming. Microsoft is still neck-deep in development of Direct3D 12 – they’re currently targeting it for game releases in Holiday 2015, roughly 18 months off – and as such Microsoft hasn’t released a ton of details about the API to the public yet. But they have given us a broad overview of what they plan to accomplish, with a couple of technical details on how they will be doing this.

At a high level there is no denying the fact that Direct3D 12 looks a lot like Mantle. Microsoft has set out with the same basic goals as AMD did with Mantle and looks to be achieving some of them in the same manner. It should come as no surprise then that the end products are going to be similar as a result.

As with Mantle, the primary goal for Direct3D 12 is to greatly reduce the CPU overhead that we’ve talked about previously. As the biggest source of CPU overhead is having Direct3D assemble the command lists/buffers for a GPU, Direct3D 12 will be moving that job over to developers. By assembling their own command lists developers can more easily spread out the task over multiple cores, and this alone will have a significant impact on CPU utilization. At this point we don’t know what Direct3D 12 command lists will look like, and this will likely be one of the design choices that separates Direct3D 12 from Mantle, but there’s no reason at this time to expect them to be much different.
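Since Microsoft has not published the API, the following is only a conceptual sketch of the idea, not Direct3D 12 code. `CommandList`, `record_chunk`, and `build_frame` are hypothetical names: worker threads each record a command list for a slice of the scene, and the master thread then submits the finished lists in order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class CommandList:
    """Hypothetical stand-in for a GPU command list: an ordered list of commands."""
    commands: list = field(default_factory=list)

    def draw(self, mesh: str) -> None:
        self.commands.append(("draw", mesh))


def record_chunk(meshes):
    """Runs on a worker thread: records one command list for a slice of the scene."""
    command_list = CommandList()
    for mesh in meshes:
        command_list.draw(mesh)
    return command_list


def build_frame(scene, workers=4):
    """Splits the scene into contiguous chunks, records them in parallel,
    then submits the resulting command lists in their original order."""
    size = max(1, -(-len(scene) // workers))  # ceiling division
    chunks = [scene[i:i + size] for i in range(0, len(scene), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so draw order is preserved
        command_lists = list(pool.map(record_chunk, chunks))
    submitted = []  # the "master" thread only concatenates finished lists
    for command_list in command_lists:
        submitted.extend(command_list.commands)
    return submitted


frame = build_frame([f"mesh{i}" for i in range(10)])
```

Python threads will not actually speed up this recording work, so the sketch shows only the structure: recording is spread out, while the master thread is left with the comparatively cheap job of submission.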


For Comparison: D3D11 Command Buffer

Microsoft will also be introducing a similar concept, a bundle, which is functionally a form of a reusable command list. This again is another CPU saving step, as using a bundle in place of multiple command lists further cuts down on the amount of CPU time spent making submissions. In this case the idea behind a bundle is to submit work once, and then allow the bundle to be executed multiple times with minor variations. Microsoft specifically notes having a character drawn twice with different textures as being a use case for this structure.
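Microsoft's character example can be sketched as follows. This is a toy model under our own assumptions rather than the real API: the `Bundle` class, the command names, and the `$skin` placeholder convention are all hypothetical.

```python
class Bundle:
    """Hypothetical reusable command list: recorded once, executed many times."""

    def __init__(self, commands):
        # Freeze the recording so every execution replays the same commands.
        self._commands = tuple(commands)

    def execute(self, gpu_queue, **bindings):
        """Replays the recording, substituting "$name" placeholders
        with the values supplied for this particular execution."""
        for op, arg in self._commands:
            if isinstance(arg, str) and arg.startswith("$"):
                arg = bindings[arg[1:]]
            gpu_queue.append((op, arg))


# Record the character once...
character = Bundle([
    ("set_mesh", "soldier"),
    ("set_texture", "$skin"),
    ("draw", "soldier"),
])

# ...then draw it twice with different textures.
queue = []
character.execute(queue, skin="red_uniform")
character.execute(queue, skin="blue_uniform")
```

The CPU cost of building the command sequence is paid once at recording time; each later execution only supplies the small part that varies.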

Meanwhile it’s interesting to note that with this change Microsoft has admitted that Direct3D 11 style immediate/deferred command lists haven’t lived up to their goals, stating “deferred contexts also do not map perfectly to hardware, and so relatively little work can be done in them.” To our knowledge the only game able to make significant use of the feature was Civilization V, and even then we’ve seen AMD video cards perform very well without supporting the feature.

Moving on, Direct3D 12 will also be introducing pipeline state objects. With pipeline state objects we’re really getting into the nitty-gritty of command buffer execution and how the various graphics architectures differ, but the important bit to take away is that most architectures don’t have the ability to freely transition between pipeline states as much as Direct3D 11 would like. This leads to problems for how quickly the hardware state can be set, as Direct3D must go back and take into account these hardware limitations.

The solution to this will be the aforementioned pipeline state objects (PSOs). PSOs bypass some of these pipeline limitations by using objects that are finalized on creation. Nitty-gritty details aside, the outcome from this is that it further reduces CPU overhead, once again increasing the number of draw calls the CPU can submit or freeing it up for other tasks.
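As a rough illustration of "finalized on creation", consider the sketch below. The `PipelineState` fields and the set of blend modes are hypothetical choices of ours, as the actual contents of a PSO have not been detailed. The point is simply that validation happens once, when the object is built, and that the object cannot change afterwards.

```python
from dataclasses import dataclass

# Hypothetical set of legal values, checked once when a PSO is created.
VALID_BLEND_MODES = {"opaque", "alpha", "additive"}


@dataclass(frozen=True)
class PipelineState:
    """Hypothetical pipeline state object: immutable after construction."""
    vertex_shader: str
    pixel_shader: str
    blend_mode: str
    depth_test: bool

    def __post_init__(self):
        # All validation cost is paid here, at creation time.
        if self.blend_mode not in VALID_BLEND_MODES:
            raise ValueError(f"unknown blend mode: {self.blend_mode!r}")


def bind(pso, gpu_queue):
    """Binding is a single cheap command; nothing is re-validated per draw."""
    gpu_queue.append(("set_pipeline", pso))


opaque = PipelineState("basic_vs", "lit_ps", "opaque", True)
queue = []
bind(opaque, queue)
```

Because a frozen object can never end up in a half-updated state, a driver working this way would have no need to re-check state combinations at draw time.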

The final major addition to Direct3D 12 is descriptor heaps. Going back to 2012, one of the features introduced on NVIDIA’s then-new Kepler architecture was bindless resources, which bypassed the previous 128 slot limitation on resources (textures, etc). Through bindless an essentially infinite number of resources could be addressed, at a performance penalty, through an additional layer of indirection in memory accesses.

Descriptor heaps in turn appear to be the integration of bindless resources in Direct3D 12. Microsoft does not specifically call descriptor heaps bindless, but the description of slots and draw calls makes it clear that they’re intending to solve the problem with the bindless solution. With descriptor heaps and descriptor tables to reside in those heaps, Direct3D 12 will be able to perform bindless operations, both expanding the number of resources available to shader programs, and even outright dynamic indexing of resources.
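To make the heap and table relationship concrete, here is a small model of it. The `DescriptorHeap` class and its methods are hypothetical and only mirror the description above: a table is a contiguous range within the heap, and a shader reaches a resource by indexing into that range at run time.

```python
class DescriptorHeap:
    """Hypothetical descriptor heap: a flat array of resource descriptors."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._next = 0  # first unused slot

    def allocate_table(self, resources):
        """Copies descriptors into contiguous slots and returns the
        table as a (base, count) pair."""
        resources = list(resources)
        count = len(resources)
        if self._next + count > len(self._slots):
            raise RuntimeError("descriptor heap is full")
        base = self._next
        self._slots[base:base + count] = resources
        self._next += count
        return base, count

    def load(self, table, index):
        """Dynamic indexing: the index is chosen at run time, at the cost
        of one extra lookup (the indirection) before reaching the resource."""
        base, count = table
        if not 0 <= index < count:
            raise IndexError("descriptor index out of range")
        return self._slots[base + index]


heap = DescriptorHeap(capacity=1024)
# 300 textures in one table, well past a fixed 128 slot binding model.
materials = heap.allocate_table(f"tex{i}" for i in range(300))
last_texture = heap.load(materials, 299)
```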

Finally, there are a few miscellaneous features that have popped up in Microsoft’s slides that have caught our attention, if only due to the lack of details provided. Specifically, the mention of compressed resources stands out. The resources mentioned, ASTC and JPEG, are not resource formats that we know to be supported on any current PC GPU. In the case of ASTC, Khronos’s next generation texture compression format, it is a finalized standard that will be supported on all GPUs in time as a core part of the OpenGL standard. Meanwhile JPEG is not a feature we’ve seen on any API roadmaps before.


Image Courtesy PC Perspective

To that end, the addition of ASTC is not all that surprising. Since it is royalty free and not otherwise restricted to OpenGL-only, there’s no reason not to support it when all of the underlying hardware will (eventually) support it anyhow.

JPEG on the other hand is a very curious thing to mention, as its lack of existence on any API roadmaps goes along with the fact that we’re not aware of anyone having announced plans to support JPEG in hardware. Furthermore JPEG is not a fixed ratio compressor – the number of bits a given sized input will generate can vary – which for GPUs would typically be a bad thing. It stands to reason then that Microsoft knows a bit more about what features are in the R&D pipelines for the GPU makers, and that someone will be implementing hardware JPEG support. So we’ll have to keep an eye on this and see what pops up.
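The fixed ratio point is easier to see with a little arithmetic. ASTC stores every block in 128 bits regardless of the block footprint, so the location of the block holding any given texel can be computed directly. The sketch below assumes a simple row-major block layout, which is our simplification (real GPUs may tile or swizzle memory differently), and `block_offset` is a hypothetical helper.

```python
ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint


def block_offset(x, y, width, block_w, block_h, block_bytes=ASTC_BLOCK_BYTES):
    """Byte offset of the compressed block containing texel (x, y),
    assuming blocks are stored row by row (a simplifying assumption)."""
    blocks_per_row = -(-width // block_w)  # ceiling division
    block_index = (y // block_h) * blocks_per_row + (x // block_w)
    return block_index * block_bytes


# Texel (17, 9) in a 256 texel wide texture using 8x8 blocks:
# block row 1, block column 2, with 32 blocks per row.
offset = block_offset(17, 9, width=256, block_w=8, block_h=8)
print(offset)  # 544
```

With a variable ratio format such as JPEG no such formula exists, since the size of each compressed unit depends on its contents; random access would need either an extra index of offsets or decoding from an earlier known position.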

Making a Common Low Level API

The need for a low level graphics API like Direct3D 12 is clear, but establishing a common API is no easy task. Abstraction is both what gives Direct3D 11 its ability to work on multiple platforms and robs Direct3D 11 of some of its performance. So to make a low level API that works across AMD, NVIDIA, Intel, Qualcomm, and others’ GPUs requires a careful balancing act to bring low level API improvements while adding no more abstraction than is necessary.

At this stage in development Microsoft is not ready to talk about that aspect of API development; for the moment that level of access is restricted to a small group of approved developers. But given their hardware requirements we can make a few educated guesses about what’s going on behind the scenes.

Of the big 3 GPU vendors, all of them have confirmed what GPUs will be supported. For Intel their Gen 7.5 GPUs (Haswell generation) will support Direct3D 12. As for NVIDIA, Fermi, Kepler, and Maxwell will support Direct3D 12. And for AMD, GCN 1.0 and GCN 1.1 will support Direct3D 12.

Direct3D 12 Confirmed Supported GPUs
AMD: GCN 1.0 (Radeon 7000/8000/200), GCN 1.1 (Radeon 200)
Intel: Gen 7.5 (Haswell/4th Gen Core)
NVIDIA: Fermi (GeForce 400/500), Kepler (GeForce 600/700/800), Maxwell (GeForce 700/800)

The interesting thing about all of this is what’s excluded: namely, AMD’s D3D11 VLIW5 and VLIW4 architectures. We’ve written about VLIW in comparison to GCN in great depth, and the takeaway from that is that unlike any of the other architectures here, only AMD was using a VLIW design. Every architecture has its strengths and weaknesses, and while VLIW could pack a lot of hardware in a small amount of space, the inflexible scheduling inherent to the execution model was a very big part of the reason that AMD moved to GCN, along with a number of special cases regarding pipeline and memory operations.

Now why do we bring this up? Because with GCN, Fermi, and Gen 7.5, all PC GPUs suddenly started looking a lot more alike. To be clear there are still a number of differences between these architectures, from their warp/wavefront size to how their SIMDs are organized and what they’re capable of. But the important point is that with each successive generation, driven by the flexibility required for efficient GPU computing, these architectures have become more and more alike. They’re far more similar now than they have been since even the earliest days of programmable GPUs.


Wavefront Execution Example: SIMD vs. VLIW. Not To Scale - Wavefront Size 16

Ultimately, all of this is a long-winded way of saying that a big part of the reason that there can even be a common low level graphics API is because the hardware has homogenized to the point where less and less abstraction is necessary. On a spectrum ranging from a shared ISA (e.g. x86) to widely divergent designs, we’re nowhere near the former, but importantly we’re also nowhere near the latter. This is a subject we’re going to have to watch with great interest, because MS and the GPU vendors (through their drivers) are still going to have to introduce some level of abstraction to make everyone work together through a single common low level API. But the situation with modern hardware means that (with any luck) the additional abstraction with Direct3D 12 over something like Mantle will prove to be insignificant.

Finally, it’s worth pointing out that last week’s developments with Direct3D couldn’t be happening without a degree of political backbone, too. The problem in introducing any new graphics standard is not just technical, but in bringing together companies with differing interests and whose best interests don’t necessarily involve fast-tracking every technology proposed.

Microsoft to that end currently holds a very interesting spot in the world of PC graphics, being the maintainer of the most popular PC graphics API. And unlike the designed-by-committee OpenGL, Microsoft has some (but not complete) leverage to push new technologies through when the GPU vendors and software vendors would otherwise be at loggerheads with each other. So while Microsoft is being clear this is a joint effort between all of the involved parties, there’s still something to be said for having the influence and power to bring down changes that may not be popular with everyone.



The Changing State of Game Development

The entry of Microsoft and Direct3D into this world stands to significantly change the status quo, due to the fact that Direct3D is by far the most widely used PC graphics API. As the maintainer of Direct3D Microsoft gets to set the pace in the PC graphics industry in several ways, so while Direct3D 12 won’t be the first modern low level graphics API, there’s little question after this announcement that it’s going to have the widest impact on game developers.

Perhaps the biggest reason for this is that, like every version of Direct3D before it, Direct3D 12 is going to be a cross-vendor standard that works on multiple GPUs. Though I don’t think it’s wise to treat Mantle and Direct3D as competitors at this point, the fact that this is a cross-vendor standard and not an AMD standard means that using it targets every video card and not just AMD video cards. So for all of the impact Mantle has had over the past 6 months, and will continue to have over the coming years, the fact that we’re at a point where there’s a cross-vendor standard will be a significant milestone.

That said, whenever we talk about low level programming it’s good to also recall who this model is and isn’t for. The purpose of abstraction is not only to provide wider hardware compatibility, but to outright hide certain types of execution ugliness from programmers. The reduction in abstraction will bring with it a reduction in the amount of this ugliness that gets hidden, and as a result the amount of knowledge needed to efficiently program at a low level goes up. Low level programming should not require a code wizard, but it’s unquestionably harder than straightforward (no optimization tricks) Direct3D 11.

Which is why the launch of Direct3D 12 is poised to increase the number of options available to graphics programmers, but not replace the high level programming model entirely. The development teams best suited for taking advantage of Direct3D 12 will be the well-funded AAA game studios, particularly those doing multi-platform titles across PCs and consoles. If you’re already doing low level programming for Xbox One and Playstation 4 – and more importantly have the staff and institutional knowledge for such an endeavor – then Direct3D 12 is but a small step, mostly one of learning the syntax of the new API. But for smaller game developers that aren’t able to put together large, experienced game development teams, the need for a high level programming API will remain. Microsoft has not talked about high level programming within the context of Direct3D 12 thus far, but one way or another – be it Direct3D 11 or a high level friendly Direct3D 12 – high level programming will be here to stay.

Though when it comes to development, the role of middleware cannot be ignored. AMD and NVIDIA already target middleware developers for integration of their proprietary technologies, and the same concept applies on a larger scale when we’re talking about making low level programming accessible to more developers. Furthermore with the massive change in middleware licensing terms we’re seeing with this generation – Unreal Engine 4 for example is just 5% of gross revenues for smaller developers that can’t negotiate otherwise – powerful middleware is increasingly accessible to all categories of developers. So even if smaller developers can’t internally develop their own Direct3D 12 code, they will have the ability to target it by inheriting the capabilities through the middleware they use.

Consoles & Mobile Devices Too

The introduction of Direct3D 12 stands to change the nature of graphics development not only for Windows, but for other Microsoft platforms too. With Microsoft’s consumer arm having their hand in everything from phones to consoles, Microsoft is seeking to extend Direct3D 12 and its benefits to these platforms too.

Specifically, Microsoft is already committing to bringing Direct3D 12 to the Xbox One, their current-generation console. Powered by an AMD SoC whose GPU in turn is based on GCN 1.1, the Xbox One is functionally an x86 PC with a modern AMD GPU, so the fact that this is even technically possible is not a surprise. But what does come as a surprise is that the Direct3D 12 API is different enough that this is even necessary.

The Xbox One, as you may recall, uses Microsoft’s Direct3D 11.X API. The details of this API are scarce as they’re only open to registered Xbox One developers, but fundamentally it’s said to be a variant of Direct3D 11 with a number of Xbox One additions, including low level API features that would be suitable for programming a console. Having the Xbox One be in alignment with Direct3D 12 is going to be a good thing regardless – it will make porting between the platforms easier – but the fact that Direct3D 12 will bring any kind of meaningful improvement to the Xbox One is unexpected. Without more details on the Xbox One API it’s impossible to say with any certainty what exact functionality isn’t currently available in Direct3D 11.X or what kind of performance benefit this would bring the Xbox One, but it stands to reason that unless most Xbox One programmers have been doing high level programming, the gains won’t be as great as for the PC.

Moving on, we have the fact that Microsoft will also be bringing Direct3D 12 to handheld devices. We’re presumably talking about Windows RT tablets and Windows Phone phones, extending Direct3D 12 to the bottom as well as it goes to the top on the PC. Handheld devices stand to gain just as much from this as PCs and consoles do, due to the fact that handheld devices are even more CPU-bottlenecked than PC laptops and desktops, so a low level API is as much a natural development for these platforms as it is the PC.

The question on our end is what kind of impact this will have on the Direct3D 12 standard with respect to abstraction. SoC-class GPUs are typically years behind PC GPUs in functionality (never mind performance), and at least among current GPUs wildly differ from each other in ways the PC GPU market hasn’t seen in years. So while extending Direct3D 12 to cover multiple PC GPUs should be relatively easy, having to support SoC GPUs certainly muddles the picture. This may mean Microsoft is looking at the long view here, when SoCs such as the Tegra K1 come along with feature sets that match recent PC architectures, coupled with the fact that Windows RT/Phone has not traditionally supported a large number of SoC GPU architectures. In which case only having to cover a handful of SoC GPU architectures instead of all 7 would certainly be an easier task.



Early Direct3D 12 Demos

While DirectX 12 is not scheduled for public release until the Holiday 2015 time period, Microsoft tells us that they’ve already been working on the API for a number of years now. So although the API is 18-20 months off from its public release, Microsoft already has a very early version up and running on partner NVIDIA’s hardware.

In their demos Microsoft showed off a couple of different programs. The first of these was Futuremark’s 3DMark 11, which along with being a solid synthetic benchmark for heavy workloads, can also be easily dissected to find bottlenecks and otherwise monitor the rendering process.


3DMark 11 CPU Time: Direct3D 11 vs. Direct3D 12

As part of their presentation Microsoft showed off some CPU utilization data comparing the Direct3D 11 and Direct3D 12 versions of 3DMark, which succinctly summarize the CPU performance gains. By moving the benchmark to Direct3D 12, Microsoft and Futuremark were able to significantly reduce the single-threaded bottlenecking, distributing more of the User Mode Driver workload across multiple threads. Meanwhile the use of the Kernel Mode Driver and the CPU time it consumed were eliminated entirely, as was some time within the Windows kernel itself. Finally, the amount of time spent within Direct3D was again reduced.

This benchmark likely leans towards a best case outcome for the use of Direct3D 12, but importantly it does show all of the benefits of a low level API at once. Some of the CPU workload has been distributed to other threads, other aspects of the CPU workload have been eliminated entirely. Yet despite all of this there’s still a clear “master” thread, showcasing the fact that not even the use of a low level graphics API can result in the workload being perfectly distributed among CPU threads. So there will still be a potential single-threaded bottleneck even with Direct3D 12, however it will be greatly diminished compared to the kinds of bottlenecking that could occur before.

Moving on, Microsoft’s other demo was a game demo, showcasing Forza Motorsport 5 running on a PC. Developer Turn 10 had ported the game from Direct3D 11.X to Direct3D 12, allowing the game to easily be run on a PC. Powered by a GeForce GTX Titan Black, Microsoft tells us the demo is capable of sustaining 60fps.

First Thoughts

Wrapping things up, it’s probably best to start with a reminder that this is a beginning rather than an end. While Microsoft has finally publicly announced DirectX 12, what we’ve seen thus far is the parts that they are ready to show off to the public at large, and not what they’re telling developers in private. So although we’ve seen some technical details about the graphics API, it’s very clear that we haven’t seen everything DirectX 12 will bring. Even as far as Direct3D is concerned, it’s a reasonable bet right now that Microsoft will have some additional functionality in the works – quite possibly functionality relating to next-generation GPUs – that will be revealed as the API is closer to completion.

But even without a complete picture, Microsoft has certainly released enough high level and low level information for us to get a good look at what they have planned; and based on what we’re seeing we have every reason to be excited. A lot of this is admittedly a rehash of what we said several months ago when Mantle was unveiled, but then again if Direct3D 12 and Mantle are as similar as some developers are hinting, then there may not be very many differences to discuss.

The potential for improved performance in PC graphics is clear, as are the potential benefits to multi-platform developers. A strong case has been laid out by AMD, and now Microsoft, NVIDIA, and Intel that we need a low level graphics API to better map to the capabilities of today’s GPUs and CPUs. Direct3D 12 in turn will be the common API needed to bring those benefits to everyone at once, as only a common API can do.

It’s important to be exceedingly clear that at least for the first phase the greatest benefits are on the CPU side and not the GPU side – something we’ve already seen in practice with Mantle – so the benefits in GPU-bound scenarios will not be as great at first. But in the long run this means changing how the GPU itself is fed work and how that work is processed, so through features such as descriptor heaps the door to improved GPU efficiency is at least left open. But since we are facing an increasing gap between GPU performance and single-threaded CPU performance, even just the CPU bottlenecking reductions alone can be worth it as developers look to push larger and larger batches.

Finally, while I feel it’s a bit too early to say anything definitive, I do want to close with the question of what this means for AMD’s Mantle. For low level PC graphics APIs Mantle will be the only game in town for the next 18-20 months; but after that, then what? If nothing else Mantle is an incredibly important public proving ground for the benefits of low level graphics APIs, so even if Direct3D 12 were to supplant Mantle, Mantle has done its job. But I’m nowhere close to declaring Mantle’s fate yet, as we only have a handful of details on Direct3D 12 and Mantle itself is still in beta. Does Mantle continue alongside Direct3D 12, an easy target for porting since the two APIs are (apparently) so similar? Does Mantle disappear entirely? Or does AMD take Mantle and make it an open API, setting it up against Direct3D 12 in a similar manner as OpenGL sits against Direct3D 11 today? I imagine AMD already has a plan in mind, but that will be a discussion for another day…
