Going Deeper: The DX11 Compute Shader and OpenCL/OpenGL

Many developers are excited about the added flexibility of the Compute Shader (also referred to as the CS). This addition to the pipeline steps further from a render-centric API and enables more general purpose algorithms. We see added flexibility in both the type of operations that can be preformed on data and the type of data that can be operated on.

In other pipeline stages, we see limitations imposed that are designed to speed up execution that get in the way of general purpose code. Although we can shoehorn general purpose algorithms into a pixel shader program, we don't have the freedom to use data structures like trees, sharing data between pixels (and thus threads) is difficult and costly, and we have to go through the motions of drawing triangles and mapping solutions onto this.

Enter DirectX11 and the CS. Developers have the option to pass data structures over to the Compute Shader and run more general purpose algorithms on them. The Compute Shader, like the other fully programmable stages of the DX10 and DX11 pipeline, will share a single set of physical resources (shader processors).

This hardware will need to be a little more flexible than it currently is as when it runs CS code it will have to support random reads and writes and irregular arrays (rather than simple streams or fixed size 2D arrays), multiple outputs, direct invocation of individual or groups of threads as per the programmer's needs, 32k of shared register space and thread group management, atomic instructions, synchronization constructs, and the ability to perform unordered IO operations.

At the same time, the CS loses some features as well. As each thread is no longer treated as a pixel, so the association with geometry is lost (unless specifically passed in a data structure). This means that, although CS programs can still use texture samplers, automatic trilinear LOD calculations are not automatic (LOD must be specified). Additionally, depth culling, anti-aliasing, alpha blending, and other operations that have no meaning to generic data cannot be performed inside a CS program.

The type of new applications opened up by the CS are actually infinite, but the most immediate interest will come from game developers looking to augment their graphics engines with fancy techniques not possible in the Pixel Shader. Some of these applications include A-Buffer techniques to allow very high quality anti-aliasing and order independent transparency, more advanced deferred shading techniques, advanced post processing effects and convolution, FFTs (fast Fourier transforms) for frequency domain operations, and summed area tables.

Beyond the rendering specific applications, game developers may wish to do things like IK (inverse kinematics), physics, AI, and other traditionally CPU specific tasks on the GPU. Having this data on the GPU by performing calculations in the CS means that the data is more quickly available for use in rendering and some algorithms may be much faster on the GPU as well. It might even be an option to run things like AI or physics on both the GPU and the CPU if algorithms that always yield the same result on both types of processors can be found (which would essentially substitute compute power for bandwidth).

Even though the code will run on the same hardware, PS and CS code will perform very differently based on the algorithms being implemented. One of the interesting things to look at is exposure and histogram data often used in HDR rendering. Calculating this data in the PS requires several passes and tricks to take all the pixels and either bin them or average them. Despite the fact that sharing data is going to slow things down quite a bit, sharing data can be much faster than running many passes and this makes the CS an ideal stage for such algorithms.

A while back we took a look at OpenCL, and we know that OpenCL will be able to share data structures with OpenGL. We haven't yet gotten a developer's take on comparing OpenCL and the DX11 CS, but at first blush it seems that the possibilities opened up for game developers and graphics processing with DX11 and the Compute Shader will also be possible with OpenGL+OpenCL. Although the CS can be used as a general purpose hardware accelerated GPU computing interface, OpenCL is targeted more at that arena and its independence from Microsoft and DirectX will likely mean wider adoption as a GPU compute language for general purpose tasks.

The use of OpenGL has declined significantly in the game developer community over the last five years. While OpenCL may enable DX11 like applications to be written in combination with OpenGL, it is more likely that this will be the venue of workstation applications like CAD/CAM and simulations that require visualization. While I'm a fan of OpenGL myself, I don't see the flexibility of OpenCL as a significant boon to its adoption in game engines.

Drilling Down: DX11 And The Multi-Threaded Game Engine So What's a Tessellator?
Comments Locked

109 Comments

View All Comments

  • epyon96 - Sunday, February 1, 2009 - link

    That's very insightful. Can you go into more detail?

    I am confused because there appeared to be significant differences between Dx9C and Dx9B since NVidia made it sound like the difference was like the difference between Dx8.1/2/3 and Dx8.4 which did seem very significant if memory serves me right.

    The difference between 8.4 and 9 seemed minimal in quality of the final output.
  • GourdFreeMan - Monday, February 2, 2009 - link

    The guidelines I spoke of were mentioned on the MSDN Forums circa 2003 regarding how changes to Direct3D would affect DirectX versioning, but seem to have been abandoned in favor of the bimonthly SDK updates following the DX 9.0c release. Bimonthly updates led to faster bug fixes, which in prior versions of DirectX sometimes required a letter update.

    If you are interested in the exact technical changes between DirectX versions, I suggest downloading the old SDK versions prior to the move to bimonthly updates and looking at the Changes section of the documentation.

    Regarding the move between DirectX 8 and DirectX 9, Shader Model 2.0 was introduced making way for games such as Far Cry (admittedly Far Cry was a DX 9.0b game, but the changes from 9.0 to 9.0b mainly involved SM 2.0a and SM 2.0b which for Far Cry meant enhanced performance on ATi and nVIDIA cards). Far Cry would later be patched to support DX 9.0c and SM 3.0, adding features like HDR, but I would argue that the unpatched game still looked considerably better than DX8 titles.

    (Incidentally there is no DirectX 8.3 and 8.4 -- there was 8.1a and 8.1b in the progression instead).
  • epyon96 - Saturday, January 31, 2009 - link

    I wish the article had more background on what you just hypothesized (obviously with some substantiated facts) instead instead of the unnecessary vista bashing. It wound satisfy an actual curiosity.

    I remember that's one of the reasons why the in depth analysis of the development cycle of R770 was so well liked.
  • gamerk2 - Saturday, January 31, 2009 - link

    The issue with DX11 is this: You need to supply a DX10 codepath for those who won't update GFX cards (you can't release a game no one has hardware for), but also would need a DX9 codepath for XP.

    Why would anyone release a game with three seperate grpahics code paths? Its for that reason I see a slow use of DX11, as long as XP holds 15-20% market share.
  • ltcommanderdata - Saturday, January 31, 2009 - link

    If I remember those OS market share reports correctly, as of the end of last year Windows XP had about 65% market share, Vista has about 20% after 2 years, and Mac is nearing 10%. Even if Windows 7 is a roaring success, XP just has too much built-up market share to disappear overnight, so XP and DX9 compatibility will be required for at least another 2 years. The other thing that works against Windows 7 is that even if it isn't released until next year, it's introduction looks to be right in the middle of this economic recession, since things probably won't really pick up until late 2010 or 2011. When the economy does pick up again, there will be huge demand as companies finally switch from XP which would be 10 years old by then, but the first year of Windows 7 sales will probably be slow.
  • bobvodka - Saturday, January 31, 2009 - link

    Well, to be fair, you don't have to have a DX10 path and a DX11 path as such. A few important features work on DX10 cards anyway, such as the multi-threaded rendering stuff, so you need a DX11 and a DX9 path at most; you just have to do some feature detection to find out if you are on a DX11, DX10 or DX10.1 card.

    Still a slight pain, but not as much as 3 real code paths.
  • DarkMadMax - Saturday, January 31, 2009 - link

    And main reason is consoles. There are practically no PC exclusives anymore among large budget titles (e.g. the ones who concentrate on graphics) . So all games target xbox360 hardware (if they dont they are ps3 exclusives). So until new generation of consoles appears there will be no progress in graphics. Period

  • haukionkannel - Saturday, January 31, 2009 - link

    To me, this article mostly talks about new features of DX11 and that some fundamental fealtures can benefit allso dx10 and dx10.1 hardware...
    To me it seems that the Vista part was only there to say why there are not any real DX10 games now, even the features are there. I didn't read it as an Vista hate like many people here seems to think of it.
    All in all it was very good article abou how DX11 can allow those promises that DX10 promised to flourish better this time.
  • scruffypup - Saturday, January 31, 2009 - link

    That though this article was supposed to be about DirectX 11, Derek's bias and opinion about Vista overshadowed the subject of the article,...

    This article shows poor writing at its finest,.. afterall doesn't writing 101 teach one to make the article about the subject you are writing about and not something else?

    Again I say,... Derek does a disservice to anandtech with this bias. If you want to put in your bias towards an unrelated subject,.. at least show clearly the links (relevancy) to your intended subject material and how you come by a conclusion to support that claim other than just spouting off needlessly,... for that is what you have done essentially, as it held no relevance to the subject material the way you wrote the article.
  • chizow - Saturday, January 31, 2009 - link

    Article summary:

    1)DX11 offers nothing new over DX10, as quoted in the article its just a strict superset that builds on and adds features to DX10 capability.
    2)Vista and DX10 sucked because no one wanted to use them.

    Derek, like many others I disagree with your assessment of Vista's importance in the overall OS hierarchy, here's just a quick list:

    1) First OS to bring 64-bit support to the mainstream.
    2) First OS to offer multi-threaded driver improvements. Look at Rel 180 and 8.12 Hot Fix, where multi-threaded drivers are all the rage.
    3) First OS to offer DX10 support. We're finally seeing some of the performance benefits we were promised in DX10 with multi-threaded drivers and improved AA with reading of the multi-sample depth buffer.
    4) Much better OS stability compared to XP. It wasn't always the case, but contrary to your article, most of the problems were fixed in July/August with the various video hot fixes (Ryan Smith can probably confirm or deny this).

    I think Win7 just emphasizes how good Vista is, and how many light years ahead both are compared to XP. You could say Win7 is like Mojave SE, not Vista SE, as you can clearly see all the Vista-haters who are running Win7 glowing about all the features and stability they've missed out on for at least a year (since Vista SP1).

Log in

Don't have an account? Sign up now