Going Deeper: The DX11 Compute Shader and OpenCL/OpenGL
Many developers are excited about the added flexibility of the Compute Shader (also referred to as the CS). This addition to the pipeline steps further from a render-centric API and enables more general purpose algorithms. We see added flexibility in both the type of operations that can be preformed on data and the type of data that can be operated on.
In other pipeline stages, we see limitations imposed that are designed to speed up execution that get in the way of general purpose code. Although we can shoehorn general purpose algorithms into a pixel shader program, we don't have the freedom to use data structures like trees, sharing data between pixels (and thus threads) is difficult and costly, and we have to go through the motions of drawing triangles and mapping solutions onto this.
Enter DirectX11 and the CS. Developers have the option to pass data structures over to the Compute Shader and run more general purpose algorithms on them. The Compute Shader, like the other fully programmable stages of the DX10 and DX11 pipeline, will share a single set of physical resources (shader processors).
This hardware will need to be a little more flexible than it currently is as when it runs CS code it will have to support random reads and writes and irregular arrays (rather than simple streams or fixed size 2D arrays), multiple outputs, direct invocation of individual or groups of threads as per the programmer's needs, 32k of shared register space and thread group management, atomic instructions, synchronization constructs, and the ability to perform unordered IO operations.
At the same time, the CS loses some features as well. As each thread is no longer treated as a pixel, so the association with geometry is lost (unless specifically passed in a data structure). This means that, although CS programs can still use texture samplers, automatic trilinear LOD calculations are not automatic (LOD must be specified). Additionally, depth culling, anti-aliasing, alpha blending, and other operations that have no meaning to generic data cannot be performed inside a CS program.
The type of new applications opened up by the CS are actually infinite, but the most immediate interest will come from game developers looking to augment their graphics engines with fancy techniques not possible in the Pixel Shader. Some of these applications include A-Buffer techniques to allow very high quality anti-aliasing and order independent transparency, more advanced deferred shading techniques, advanced post processing effects and convolution, FFTs (fast Fourier transforms) for frequency domain operations, and summed area tables.
Beyond the rendering specific applications, game developers may wish to do things like IK (inverse kinematics), physics, AI, and other traditionally CPU specific tasks on the GPU. Having this data on the GPU by performing calculations in the CS means that the data is more quickly available for use in rendering and some algorithms may be much faster on the GPU as well. It might even be an option to run things like AI or physics on both the GPU and the CPU if algorithms that always yield the same result on both types of processors can be found (which would essentially substitute compute power for bandwidth).
Even though the code will run on the same hardware, PS and CS code will perform very differently based on the algorithms being implemented. One of the interesting things to look at is exposure and histogram data often used in HDR rendering. Calculating this data in the PS requires several passes and tricks to take all the pixels and either bin them or average them. Despite the fact that sharing data is going to slow things down quite a bit, sharing data can be much faster than running many passes and this makes the CS an ideal stage for such algorithms.
A while back we took a look at OpenCL, and we know that OpenCL will be able to share data structures with OpenGL. We haven't yet gotten a developer's take on comparing OpenCL and the DX11 CS, but at first blush it seems that the possibilities opened up for game developers and graphics processing with DX11 and the Compute Shader will also be possible with OpenGL+OpenCL. Although the CS can be used as a general purpose hardware accelerated GPU computing interface, OpenCL is targeted more at that arena and its independence from Microsoft and DirectX will likely mean wider adoption as a GPU compute language for general purpose tasks.
The use of OpenGL has declined significantly in the game developer community over the last five years. While OpenCL may enable DX11 like applications to be written in combination with OpenGL, it is more likely that this will be the venue of workstation applications like CAD/CAM and simulations that require visualization. While I'm a fan of OpenGL myself, I don't see the flexibility of OpenCL as a significant boon to its adoption in game engines.