Original Link: http://www.anandtech.com/show/2716
No matter whether we've got a low end or high end system, we all expect the realtime 3D revolution to continue until we achieve near parity with reality. The push forward is backed by many factors including pure hardware performance and brilliant advances in techniques for better approximating what we see. But there's another side to the equation beyond just hardware and developers: there is the graphics API.
Unlike CPUs, graphics hardware (GPUs) do not have a common instruction set upon which tools and software can be built. In order to get the power of the hardware out to the public, we need a common interface that works no matter what GPU is underneath. It's left to the graphics hardware designer to take the code generated by this application programming interface (API) and translate it into something that their chip can use. Because it's the developer's single point of contact, the graphics API is incredibly important. It defines how much flexibility programmers have in using hardware and shapes the world of high performance realtime 3D graphics.
Some of the key work done through the graphics API is taking descriptions of 3D objects in a 3D world, sending those objects and other resources to the hardware, and then telling the hardware what to do with them. There is sort of a step by step process that needs to be followed that we generally call a pipeline. Graphics API pipelines have stages where different work is done. Here's the general structure of a 3D graphics pipeline:
First vertex data (information about the position of the corners of shapes) is taken in and processed. Then those shapes can then be further manipulated and re-processed if needed. After this, 3D objects are broken down from 3D shapes by projecting them into 2D fragments called pixels (this step is called rasterization), and then these pixels are each processed by looking up texture information and using lighting techniques and so on. When pixels are finished processing, they are output and displayed on the screen. And that's the mile high overview of how 3D graphics work.
For the past dozen years (it seems longer doesn't it?), we've seen makers of 3D graphics hardware accelerate two very prominent APIs: OpenGL and DirectX.
We recently touched on advancements tangential to OpenGL in our OpenCL article, but today our focus will be on DirectX. Microsoft's DirectX graphics API is much more heavily used in game engines than OpenGL, in a large part because DirectX tends to move much more quickly and sets the bar for both the hardware and DirectX in terms of feature set and flexibility. That always makes upcoming versions of DirectX exciting to talk about: they define the future capabilities of hardware and expose improved tools to developers. Upcoming DirectX versions are glimpses into our graphical future. Currently we have a lot of DirectX 9 and DirectX 10 games available and in development, but DirectX 11 looms on the horizon.
As usual, Microsoft will be trying to time the release of their next DirectX revision with the release of compatible graphics hardware. As with last time, DirectX 11 will also be released with Windows 7. With the Windows 7 Beta already under way, we expect the OS to be done some time this year.
Microsoft has been rather aggressive with Windows 7 scheduling in light of the rejection of Vista, so it appears they are stepping up to the plate to get everything out sooner rather than later. There was a little more than 4 years between the release of DirectX 9 and DirectX 10. As it hit the streets with Vista in January of 2007, DirectX 10 has just turned 2 and we are already anticipating it's replacement in the very near future. As we will learn, this speedy transition should be very good for DirectX 11 adoption as DirectX 10 hasn't even become pervasive yet: many games are still DirectX 9 only.
But let's take a closer look at what we are talking about before we go any further.
Introducing DirectX 11: The Pipeline and Features
This is DirectX 10.
We all remember him from our G80 launch article back in the day when no one knew how much Vista would really suck. Some of the shortfalls of DirectX 10 have been in operating system support, driver support, time to market issues, and other unfortunate roadblocks that kept developers from making full use of all the cool new features and tools DirectX 10 brought.
Meet DirectX 11.
She's much cooler than her older brother, and way hotter too. Many under-the-hood enhancements mean higher performance for features available but less used under DX10. The major changes to the pipeline mark revolutionary steps in graphics hardware and software capabilities. Tessellation (made up of the hull shader, tessellator and domain shader) and the Compute Shader are major developments that could go far in assisting developers in closing the gap between reality and unreality. These features have gotten a lot of press already, but we feel the key to DirectX 11 adoption (and thus exploitation) is in some of the more subtle elements. But we'll get in to all that in due time.
Along with the pipeline changes, we see a whole host of new tweaks and adjustments. DirectX 11 is actually a strict superset of DirectX 10.1, meaning that all of those features are completely encapsulated in and unchanged by DirectX 11. This simple fact means that all DX11 hardware will include the changes required to be DX 10.1 compliant (which only AMD can claim at the moment). In addition to these tweaks, we also see these further extensions:
While changes in the pipeline allow developers to write programs to accomplish different types of tasks, these more subtle changes allow those programs to be more complex, higher quality, and/or higher performance. Beyond all this, Microsoft has also gone out of its way to help make parallel programming a little bit easier for game developers.
From Evolution to Expansion and Multi-Threading: The Mile High Overview
The November DirectX SDK update was the first to include some DirectX 11 features for developers to try out. Of course, there is no DX11 hardware yet, but what is included will run on the current DX10 setup with DX10 hardware under Vista and the beta Windows 7. This combined with the fact that Khronos finished the OpenCL specification last month mark two major developments on the path to more general purpose computing on the GPU. Of course, DX11 is more geared toward realtime 3D and OpenCL is targeted at real general purpose data parallel programming (across multiple CPUs and GPUs) distinct from graphics, but these two programming APIs are major milestones in the future history of computing.
There is more than just the compute shader included in DX11, and since our first real briefing about it at this year's NVISION, we've had the chance to do a little more research, reading slides and listening to presentations from SIGGRAPH and GameFest 2008 (from which we've included slides to help illustrate this article). The most interesting things to us are more subtle than just the inclusion of a tessellator or the addition of the Compute Shader, and the introduction of DX11 will also bring benefits to owners of current DX10 and DX10.1 hardware, provided AMD and NVIDIA keep up with appropriate driver support anyway.
Many of the new aspects of DirectX 11 seem to indicate to us that the landscape is ripe for a fairly quick adoption, especially if Microsoft brings Windows 7 out sooner rather than later. There have been adjustments to the HLSL (high-level shader language) that should make it much more attractive to developers, the fact that DX10 is a subset of DX11 has some good transitional implications, and changes that make parallel programming much easier should all go a long way to helping developers pick up the API quickly. DirectX 11 will be available for Vista, so there won't be as many complications from a lack of users upgrading, and Windows 7 may also inspire Windows XP gamers to upgrade, meaning a larger install base for developers to target as well.
The bottom line is that while DirectX 10 promised features that could bring a revolution in visual fidelity and rendering techniques, DirectX 11 may actually deliver the goods while helping developers make the API transition faster than we've seen in the past. We might not see techniques that take advantage of the exclusive DirectX 11 features right off the bat, but adoption of the new version of the API itself will go a long way to inspiring amazing advances in realtime 3D graphics.
From DirectX 6 through DirectX 9, Microsoft steadily evolved their graphics programming API from a fixed function vehicle for setting state and moving data structures around to a rich, programmable environment enabling deep control of graphics hardware. The step from DX9 to DX10 was the final break in the old ways, opening up and expanding on the programmability in DX9 to add more depth and flexibility enabled by newer hardware. Microsoft also forced a shift in the driver model with the DX10 transition to leave the rest of the legacy behind and try and help increase stability and flexibility when using DX10 hardware. But DirectX 11 is different.
Rather than throwing out old constructs in order to move towards more programmability, Microsoft has built DirectX 11 as a strict superset of DirectX 10/10.1, which enables some curious possibilities. Essentially, DX10 code will be DX11 code that chooses not to implement some of the advanced features. On the flipside, DX11 will be able to run on down level hardware. Of course, all of the features of DX11 will not be available, but it does mean that developers can stick with DX11 and target both DX10 and DX11 hardware without the need for two completely separate implementations: they're both the same but one targets a subset of functionality. Different code paths will be necessary if something DX11 only (like the tessellator or compute shader) is used, but this will still definitely be a benefit in transitioning to DX11 from DX10.
Running on lower spec'd hardware will be important, and this could make the transition from DX10 to DX11 one of the fastest we have ever seen. In fact, with lethargic movement away from DX9 (both by developers and consumers), the rush to bring out Windows 7, and slow adoption of Vista, we could end up looking back at DX10 as merely a transitional API rather than the revolutionary paradigm shift it could have been. Of course, Microsoft continues to push that the fastest route to DX11 is to start developing DX10.1 code today. With DX11 as a superset of DX10, this is certainly true, but developer time will very likely be better spent putting the bulk of their effort into a high quality DX9 path with minimal DX10 bells and whistles while saving the truly fundamental shifts in technique made possible by DX10 for games targeted at the DX11 hardware and timeframe.
We are especially hopeful about a faster shift to DX11 because of the added advantages it will bring even to DX10 hardware. The major benefit I'm talking about here is multi-threading. Yes, eventually everything will need to be drawn, rasterized, and displayed (linearly and synchronously), but DX11 adds multi-threading support that allows applications to simultaneously create resources or manage state and issue draw commands, all from an arbitrary number of threads. This may not significantly speed up the graphics subsystem (especially if we are already very GPU limited), but this does increase the ability to more easily explicitly massively thread a game and take advantage of the increasing number of CPU cores on the desktop.
With 8 and 16 logical processor systems coming soon to a system near you, we need developers to push beyond the very coarse grained and heavy threads they are currently using that run well on two core systems. The cost/benefit of developing a game that is significantly assisted by the availability of more than two cores is very poor at this point. It is too difficult to extract enough parallelism to matter on quad core and beyond in most video games. But enabling simple parallel creation of resources and display lists by multiple threads could really open up opportunities for parallelizing game code that would otherwise have remained single threaded. Rather than one thread to handle all the DX state change and draw calls (or very well behaved and heavily synchronized threads sharing the responsibility), developers can more naturally create threads to manage types or groups of objects or parts of a world, opening up the path to the future where every object or entity can be managed by it's own thread (which would be necessary to extract performance when we eventually expand into hundreds of logical cores).
The fact that Microsoft has planned multi-threading support for DX11 games running on DX10 hardware is a major bonus. The only caveat here is that AMD and NVIDIA will need to do a little driver work for their existing DX10 hardware to make this work to its fullest extent (it will "work" but not as well even without a driver change). Of course, we expect that NVIDIA and especially AMD (as they are also a multi-core CPU company) will be very interested in making this happen. And, again, this provides major incentives for game developers to target DX11 even before DX11 hardware is widely available or deployed.
All this is stacking up to make DX11 look like the go-to technology. The additions to and expansions of DX10, the timing, and the ability to run on down level hardware could create a perfect storm for a relatively quick uptake. By relatively quick, we are still looking at years for pervasive use of DX11, but we expect that the attractiveness of the new features and benefit to the existing install base will provide a bigger motivation for game developers to transition than we've seen before.
If only Microsoft would (and could) back-port DX11 to Windows XP, there would be no reason for game developers to maintain legacy code paths. I know, I know, that'll never (and can't by design) happen. While we wholeheartedly applaud the idea of imposing strict minimum requirements on hardware for a new operating system, unnecessarily cutting off an older OS at the knees is not the way to garner support. If Windows 7 ends up being a more expensive Vista in a shiny package, we may still have some pull towards DX9, especially for very mainstream or casual games that tend to lag a bit anyway (and as some readers have pointed out because consoles will still be DX9 for the next few years). It's in these incredibly simple but popular games and console games that the true value of amazing realtime 3D graphics could be brought to the general computing populous, but craptacular low end hardware and limiting API accessibility on popular operating systems further contribute to the retardation of graphics in the mainstream.
But that's the overview. Let's take some time to drill down a bit further into some of the technology.
Drilling Down: DX11 And The Multi-Threaded Game Engine
In spite of the fact that multi-threaded programming has been around for decades, mainstream programmers didn't start focusing on parallel programming until multi-core CPUs started coming along. Much general purpose code is straightforward as a single thread; extracting performance via parallel programming can be difficult and isn't always obvious. Even with talented programmers, Amdahl's Law is a bitch: your speed up from parallelization is limited by the percent of code that is necessarily sequential.
Currently, in game development, rendering is one of those "necessarily" sequential tasks. DirectX 10 isn't set up to appropriately handle multiple threads all throwing commands at the GPU. That doesn't mean parallelization of renderers can't happen, but it does limit speed up because costly synchronization techniques or management threads need to be implemented in order to make sure nothing steps out of line. All this limits the benefit of parallelization and discourages programmers from trying too hard. After all, it's a better idea to put more of your effort into areas where performance can be improved more significantly. (John Carmack put it really well once, but I can't remember the quote... and I'm doing too much benchmarking to go look for it now. :-P)
No matter what anyone does, some stuff in the renderer will need to be sequential. Programs, textures, and resources must be loaded up; geometry happens before pixel processing; draw calls intended to be executed while a certain state is active must have that state set first and not changed until completion. Even in such a massively parallel machine, order must be maintained for many things. But order doesn't always matter.
Making more things thread-safe through an extended device interface using multiple contexts and making a lot of synchronization overhead the responsibility of the API and/or graphics driver, Microsoft has enabled game developers to more easily and effortlessly thread not only their rendering code, but their game code as well. These things will also work on DX10 hardware running on a system with DX11, though some missing hardware optimizations will reduce the performance benefit. But the fundamental ability to write code differently will go a long way to getting programmers more used to and better at parallelization. Let's take a look at the tools available to accomplish this in DX11.
First up is free threaded asynchronous resource loading. That's a bit of a mouthful, but this feature gives developers the ability to upload programs, textures, state objects, and all resources in a thread-safe way and, if desired, concurrent with the rendering process. This doesn't mean that all this stuff will get pushed up in parallel with rendering, as the driver will manage what gets sent to the GPU and when based on priority, but it does mean the developer no longer has to think about synchronizing or manually prioritizing resource loading. Multiple threads can start loading whatever resources they need whenever they need them. The fact that this can also be done concurrently with rendering could improve performance for games that stream in data for massive open worlds in addition to enabling multi-threaded opportunities.
In order to enable this and other threading, the D3D device interface is now split into three separate interfaces: the Device, the Immediate Context, and the Deferred Context. Resource creation is done through the Device. The Immediate Context is the interface for setting device state, draw calls, and queries. There can only be one Device and one Immediate Context. The Deferred Context is another interface for state and draw calls, but many can exist in one program and can be used as the per-thread interface (Deferred Contexts themselves are thread unsafe though). Deferred Contexts and the free threaded resource creation through the device are where DX11 gets it multi-threaded benefit.
Multiple threads submit state and draw calls to their Deferred Context which complies a display list that is eventually executed by the Immediate Context. Games will still need a render thread, and this thread will use the Immediate Context to execute state and draw calls and to consume the display lists generated by Deferred Contexts. In this way, the ultimate destination of all state and draw calls is the Immediate Context, but fine grained synchronization is handled by the API and the display driver so that parallel threads can be better used to contribute to the rendering process. Some limitations on Deferred Contexts include the fact that they cannot query the device and they can't download or read back anything from the GPU. Deferred Contexts can, however, consume the display lists generated by other Deferred Contexts.
The end result of all this is that the future will be more parallel friendly. As two and four core CPUs become more and more popular and 8 and 16 (logical) core CPUs are on the horizon, we need all the help we can get when trying to extract performance from parallelism. This is a good move for DirectX and we hope it will help push game engines to more fully utilize more than two or even four cores when the time comes.
Going Deeper: The DX11 Compute Shader and OpenCL/OpenGL
Many developers are excited about the added flexibility of the Compute Shader (also referred to as the CS). This addition to the pipeline steps further from a render-centric API and enables more general purpose algorithms. We see added flexibility in both the type of operations that can be preformed on data and the type of data that can be operated on.
In other pipeline stages, we see limitations imposed that are designed to speed up execution that get in the way of general purpose code. Although we can shoehorn general purpose algorithms into a pixel shader program, we don't have the freedom to use data structures like trees, sharing data between pixels (and thus threads) is difficult and costly, and we have to go through the motions of drawing triangles and mapping solutions onto this.
Enter DirectX11 and the CS. Developers have the option to pass data structures over to the Compute Shader and run more general purpose algorithms on them. The Compute Shader, like the other fully programmable stages of the DX10 and DX11 pipeline, will share a single set of physical resources (shader processors).
This hardware will need to be a little more flexible than it currently is as when it runs CS code it will have to support random reads and writes and irregular arrays (rather than simple streams or fixed size 2D arrays), multiple outputs, direct invocation of individual or groups of threads as per the programmer's needs, 32k of shared register space and thread group management, atomic instructions, synchronization constructs, and the ability to perform unordered IO operations.
At the same time, the CS loses some features as well. As each thread is no longer treated as a pixel, so the association with geometry is lost (unless specifically passed in a data structure). This means that, although CS programs can still use texture samplers, automatic trilinear LOD calculations are not automatic (LOD must be specified). Additionally, depth culling, anti-aliasing, alpha blending, and other operations that have no meaning to generic data cannot be performed inside a CS program.
The type of new applications opened up by the CS are actually infinite, but the most immediate interest will come from game developers looking to augment their graphics engines with fancy techniques not possible in the Pixel Shader. Some of these applications include A-Buffer techniques to allow very high quality anti-aliasing and order independent transparency, more advanced deferred shading techniques, advanced post processing effects and convolution, FFTs (fast Fourier transforms) for frequency domain operations, and summed area tables.
Beyond the rendering specific applications, game developers may wish to do things like IK (inverse kinematics), physics, AI, and other traditionally CPU specific tasks on the GPU. Having this data on the GPU by performing calculations in the CS means that the data is more quickly available for use in rendering and some algorithms may be much faster on the GPU as well. It might even be an option to run things like AI or physics on both the GPU and the CPU if algorithms that always yield the same result on both types of processors can be found (which would essentially substitute compute power for bandwidth).
Even though the code will run on the same hardware, PS and CS code will perform very differently based on the algorithms being implemented. One of the interesting things to look at is exposure and histogram data often used in HDR rendering. Calculating this data in the PS requires several passes and tricks to take all the pixels and either bin them or average them. Despite the fact that sharing data is going to slow things down quite a bit, sharing data can be much faster than running many passes and this makes the CS an ideal stage for such algorithms.
A while back we took a look at OpenCL, and we know that OpenCL will be able to share data structures with OpenGL. We haven't yet gotten a developer's take on comparing OpenCL and the DX11 CS, but at first blush it seems that the possibilities opened up for game developers and graphics processing with DX11 and the Compute Shader will also be possible with OpenGL+OpenCL. Although the CS can be used as a general purpose hardware accelerated GPU computing interface, OpenCL is targeted more at that arena and its independence from Microsoft and DirectX will likely mean wider adoption as a GPU compute language for general purpose tasks.
The use of OpenGL has declined significantly in the game developer community over the last five years. While OpenCL may enable DX11 like applications to be written in combination with OpenGL, it is more likely that this will be the venue of workstation applications like CAD/CAM and simulations that require visualization. While I'm a fan of OpenGL myself, I don't see the flexibility of OpenCL as a significant boon to its adoption in game engines.
So What's a Tessellator?
This has been covered before now in other articles about DirectX 11, but we first touched on the subject with the R600 launch. Both R6xx and R7xx hardware have tessellators, but since these are proprietary implementations, they won't be directly compatible with DirectX 11 which uses a much more sophisticated setup. While neither AMD nor the DX11 tessellator itself are programmable, DX11 includes programmable input to and output from the tesselator (TS) through two additional pipeline stages called the Hull Shader (HS) and the Domain Shader (DS).
The tessellator can take coarse shapes and break them up into smaller parts. It can also take these smaller parts and reshape them to form geometry that is much more complex and that more closely approximates reality. It can take a cube and turn it into a sphere with very little overhead and much fewer space requirements. Quality, performance and manageability benefit.
The Hull Shader takes in patches and control points out outputs data on how to configure the tessellator. Patches are a new primitive (like vertices and pixels) that define a segment of a plane to be tessellated. Control points are used to define the parametric shape of the desired surface (like a curve or something). If you've ever used the pen tool in Photoshop, then you know what control points are: these just apply to surfaces (patches) instead of lines. The Hull Shader uses the control points to determine how to set up the tessellator and then passes them forward to the Domain Shader.
The tessellator just tessellates: it breaks up patches fed to it by the Hull Shader based on the parameters set by the Hull shader per patch. It outputs a stream of points to the Domain Shader, which then needs to finish up the process. While programmers must write HS programs for their code, there isn't any programming required for the TS. It's just a fixed function block that processes input based on parameters.
The Domain Shader takes points generated by the tessellator and manipulates them to form the appropriate geometry based on control points and/or displacement maps. It performs this manipulation by running developer designed DS programs which can manipulate how the newly generated points are further shifted or displaced based on control points and textures. The Domain Shader, after processing a point, outputs a vertex. These vertices can be further processed by a Geometry Shader, which can also feed them back up to the Vertex Shader using stream out functionality. More likely than heading back up for a second pass, we will probably see most output of the Domain Shader head straight on to rasterization so that its geometry can be broken down into screen space fragments for Pixel Shader processing.
That covers what the basics of what the tesselator can do and how it does it. But do you find your self wondering: "self, can't the Geometry Shader just be used to create tessellated surfaces and move the resulting vertices around?" Well, you would be right. That is technically possible, but not practical at this point. Let's dive into that a bit more.
Tessellation: Because The GS Isn't Fast Enough
Microsoft and AMD tend to get the most excited about tessellation whenever the topic of DX11 comes up. AMD jumped on the tessellation bandwagon long ago, and perhaps it does make sense for consoles like the XBox 360. Adding fixed function hardware to quickly and efficiently handle a task that improves memory footprint has major advantages in the living room. We still aren't sold on the need for a tessellator on the desktop, but who's to argue with progress?
Or is it really progressive? The tessellator itself is fixed function rather than programmable. Sure, the input to and output of the tessellator can be manipulated a bit through the Hull Shader and Domain Shader, but the heart of the beast is just not that flexible. The Geometry Shader is the programmable block in the pipeline that is capable of tessellation as well as much more, but it just doesn't have the power to do tessellation on any useful scale. So while most everything has been moving towards programmability in the rendering pipe, we have sort of a step backward here. But why?
The argument between fixed function and programmable hardware is always one of performance versus flexibility and usefulness. In the beginning, fixed function was necessary to get the desired performance. As time went on, it became clear that adding in more fixed function hardware to graphics chips just wasn't feasible. The transistors put into specialized hardware just go unused if developers don't program to take advantage of it. This made a shift toward architectures where expanding the pool of compute resources that could be shared and used for many different tasks became a much more attractive way to go. In the general case anyway. But that doesn't mean that fixed function hardware doesn't have it's place.
We do still have the problem that all the transistors put into the tessellator are worthless unless developers take advantage of the hardware. But the reason it makes sense is that the ROI (return on investment: what you get for what you put in) on those transistors is huge if developers do take advantage of the hardware: it's much easier to get huge tessellation performance out of a fixed function tessellator than to put the necessary resources into the Geometry Shader to allow it to be capable of the same tessellation performance programmatically. This doesn't mean we'll start to see a renaissance of fixed function blocks in our graphics hardware; just that significantly advanced features going forward may still require the sacrifice of programability in favor of early adoption of a feature. The majority of tasks will continue to be enabled in a flexible programmable way, and in the future we may see more flexibility introduced into the tessellator until it becomes fully programmable as well (or ends up just being merged into some future version of the Geometry Shader).
Now don't let this technical assessment of fixed function tessellation make you think we aren't interested in reaping the benefits of the tessellator. Currently, artists need to create different versions of their objects for different LODs (Level of Detail -- reducing or increasing complexity as the object moves further or nearer the viewer), and geometry simulation through texturing at each LOD needs to be done by pixel shaders. This requires extra work from both artists and programmers and costs a good bit in terms of performance. There are also some effects than can only be done with more geometry.
Tessellation is a great way to get that geometry in there for more detail, shadowing, and smooth edges. High geometry also allows really cool displacement mapping effects. Currently, much geometry is simulated through textures and techniques like bump mapping or parallax occlusion mapping or some other technique. Even with high geometry, we will want to have large normal maps for our lighting algorithms to use, but we won't need to do so much work to make things like cracks, bumps, ridges, and small detail geometry appear to be there when it isn't because we can just tessellate and displace in a single pass through the pipeline. This is fast, efficient, and can produce very detailed effects while freeing up pixel shader resources for other uses. With tessellation, artists can create one sub division surface that can have a dynamic LOD free of charge; a simple hull shader and a displacement map applied in the domain shader will save a lot of work, increase quality, and improve performance quite a bit.
If developers adopt tessellation, we could see cool things, and with the move to DX11 class hardware both NVIDIA and AMD will be making parts with tessellation capability. But we may not see developers just start using tessellation (or the compute shader for that matter) right away. Because DirectX 11 will run on down level hardware and at the release of DX11 we will already have a huge number cards on the market capable of running a subset of DX11 bringing with it a better, more refined, programming language in the new version of HLSL and seamless parallelization optimizations, we will very likely see the first DX11 games only implementing features that can run completely on DX10 hardware.
Of course, at that point developers can be fully confident of exploiting all the aspects of DX10 hardware, which they still aren't completely taking advantage of. Many people still want and need a DX9 path because of Vista's failure, which means DX10 code tends to be more or less an enhanced DX9 path rather than something fundamentally different. So when DirectX 11 finally debuts, we will start to see what developers could really do with DX10.
Certainly there will be developers experimenting with tessellation, but these will probably just be simple amplification to get rid of those jagged edges around curved surfaces at first. It will take time for the real advanced tessellation techniques everyone is excited about to come to fruition.
One Last Thing and Closing Thoughts
The final bit of DX11 we'll touch on is the update to HLSL (MS's High Level Shader Language) in version 5.0 which brings some very developer friendly adjustments. While HLSL has always been similar in syntax to C, 5.0 adds support for classes and interfaces. We still don't get to use pointers though.
These changes are being made because of the sheer size of shader code. Programmers and artists need to build or generate either a single massive shader or tons of smaller shader programs for any given game. These code resources are huge and can be hard to manage without OOP (Object Oriented Programming) constructs. But there are some differences to how things work in other OOP languages. For instance, there is no need for memory management (because there are no pointers) or constructors / destructors in HLSL. Tasks like initialization are handled through updates to constant buffers, which generally reflect member data.
Aside from the programmability aspect, classes and interfaces were added to support dynamic shader linkage to combat the intricacy of developing with huge numbers of resources and effects. Dynamic linking allows the application to decide at runtime what shaders to compile and link and enables interfaces to be left ambiguous until runtime. At runtime, shaders are dynamically linked and based on what is linked all possible function bodies are then compiled and optimized. Compiled hardware-native code isn't inlined until the appropriate SetShader function is called.
The flexibility this provides will enable development of much more complex and dynamic shader code, as it won't all need to be in one giant block with lots of "ifs", nor will there need to be thousands of smaller shaders cluttering up the developers mind. Performance of the shaders will still limit what can be done, but with this step DirectX helps reduce code complexity as a limiting factor in development.
With all of this - the ability to perform unordered memory accesses, multi-threading, tessellation, and the Compute Shader - DX11 is pretty aggressive. The complexity of the upgrade, however, is mitigated by the fact that this is nothing like the wholesale changes made in the move from DX9 to DX10: DX11 is really just a superset of DX10 in terms of features. This enables the ability for DX11 to run on down-level hardware (where DX11 specific features are not used), which when combined with the enhancements to HLSL with OOP and dynamic shader linking mean that developers should really have fewer qualms about moving from DX10 to DX11 than we saw with the transition from DX9. (Of course, that's nothing new: the first DX8 games shipped when DX9 was out, and it wasn't until DX10 that we saw a reasonable number of DX9 titles.)
To be fair, the OS upgrade requirement also threw a wrench in the gears. That won't be a problem this time, as Vista still sucks but will be getting DX11 support and Windows 7 looks like a better upgrade option for XP users than Vista. Developers who haven't already moved from DX9 may well skip DX10 altogether in favor of DX11 depending on the predicted ship dates of their titles; all signs point to DX11 as setting the time frame when we start to see the revolution promised with the move to DX10 take place. Developers have had time to familiarize themselves with the extended advantages of programmability offered by DX10, coding for DX11 will be much easier though OOP constructs and multi-threaded support, and if the features don't entice them, the ability to run on down-level hardware with a better coding environment might just seal the deal.
I'm still an OpenGL developer at this point, and I've dabbled a bit with DirectX at times. But DirectX 11 (and my disappointment with OpenGL 3.0) mark the first time I think I might actually make the switch. The first preview of DX11 is already available in the latest DX SDK. When I've got time I'll have to download it and get started. Hopefully the implementation is as attractive as the pitch. Wish me luck.