Going Deeper: The DX11 Compute Shader and OpenCL/OpenGL

Many developers are excited about the added flexibility of the Compute Shader (also referred to as the CS). This addition to the pipeline steps further from a render-centric API and enables more general purpose algorithms. We see added flexibility in both the type of operations that can be preformed on data and the type of data that can be operated on.

In other pipeline stages, we see limitations imposed that are designed to speed up execution that get in the way of general purpose code. Although we can shoehorn general purpose algorithms into a pixel shader program, we don't have the freedom to use data structures like trees, sharing data between pixels (and thus threads) is difficult and costly, and we have to go through the motions of drawing triangles and mapping solutions onto this.

Enter DirectX11 and the CS. Developers have the option to pass data structures over to the Compute Shader and run more general purpose algorithms on them. The Compute Shader, like the other fully programmable stages of the DX10 and DX11 pipeline, will share a single set of physical resources (shader processors).

This hardware will need to be a little more flexible than it currently is as when it runs CS code it will have to support random reads and writes and irregular arrays (rather than simple streams or fixed size 2D arrays), multiple outputs, direct invocation of individual or groups of threads as per the programmer's needs, 32k of shared register space and thread group management, atomic instructions, synchronization constructs, and the ability to perform unordered IO operations.

At the same time, the CS loses some features as well. As each thread is no longer treated as a pixel, so the association with geometry is lost (unless specifically passed in a data structure). This means that, although CS programs can still use texture samplers, automatic trilinear LOD calculations are not automatic (LOD must be specified). Additionally, depth culling, anti-aliasing, alpha blending, and other operations that have no meaning to generic data cannot be performed inside a CS program.

The type of new applications opened up by the CS are actually infinite, but the most immediate interest will come from game developers looking to augment their graphics engines with fancy techniques not possible in the Pixel Shader. Some of these applications include A-Buffer techniques to allow very high quality anti-aliasing and order independent transparency, more advanced deferred shading techniques, advanced post processing effects and convolution, FFTs (fast Fourier transforms) for frequency domain operations, and summed area tables.

Beyond the rendering specific applications, game developers may wish to do things like IK (inverse kinematics), physics, AI, and other traditionally CPU specific tasks on the GPU. Having this data on the GPU by performing calculations in the CS means that the data is more quickly available for use in rendering and some algorithms may be much faster on the GPU as well. It might even be an option to run things like AI or physics on both the GPU and the CPU if algorithms that always yield the same result on both types of processors can be found (which would essentially substitute compute power for bandwidth).

Even though the code will run on the same hardware, PS and CS code will perform very differently based on the algorithms being implemented. One of the interesting things to look at is exposure and histogram data often used in HDR rendering. Calculating this data in the PS requires several passes and tricks to take all the pixels and either bin them or average them. Despite the fact that sharing data is going to slow things down quite a bit, sharing data can be much faster than running many passes and this makes the CS an ideal stage for such algorithms.

A while back we took a look at OpenCL, and we know that OpenCL will be able to share data structures with OpenGL. We haven't yet gotten a developer's take on comparing OpenCL and the DX11 CS, but at first blush it seems that the possibilities opened up for game developers and graphics processing with DX11 and the Compute Shader will also be possible with OpenGL+OpenCL. Although the CS can be used as a general purpose hardware accelerated GPU computing interface, OpenCL is targeted more at that arena and its independence from Microsoft and DirectX will likely mean wider adoption as a GPU compute language for general purpose tasks.

The use of OpenGL has declined significantly in the game developer community over the last five years. While OpenCL may enable DX11 like applications to be written in combination with OpenGL, it is more likely that this will be the venue of workstation applications like CAD/CAM and simulations that require visualization. While I'm a fan of OpenGL myself, I don't see the flexibility of OpenCL as a significant boon to its adoption in game engines.

Drilling Down: DX11 And The Multi-Threaded Game Engine So What's a Tessellator?
POST A COMMENT

109 Comments

View All Comments

  • DerekWilson - Saturday, January 31, 2009 - link

    Hi, thanks for the feedback ... I've already talked the vista issue to death elsewhere in these comments, so I'll skip that, but ...

    3) You are right that to get the most out of DX10 you need renderer designed for DX10 not ported from DX9. At the same time, there are things that can be done to make DX9 stuff faster by using DX10 capabilities that don't require an engine rewrite. Yes this also requires development time, but it seems developers have opted to put time into adding effects with DX10 rather than increasing framerate. I'm not disappointed with that direction, but performance was an option.

    I agree with what you said about OGL.

    4) I know it's not going to happen, but it didn't need to not be possible. MS could have designed the API to expose new hardware featuers without requiring the driver model change. DX9 can run on XP or Vista's new driver model for example. They chose to make it so that it was impossible to back-port rather than designing DX10 (and subsequent versions) to be tied to the driver model. OpenGL exposes most of the featuers of DX10 to WinXP and all of the things that are interesting to graphics programmers).

    5) You are right -- it's not required to support multithreading but it is required if you want a performance benefit from that multithreading.
    Reply
  • bobvodka - Saturday, January 31, 2009 - link

    Well, to be fair, many of my comments were directed at others in the thread :)

    4) The thing is, OpenGL and DX9, specifically D3D9, live in different places. D3D9 had alot more kernel side code which caused expensive switches when issuing certain commands; this is why D3D9 got all that instance draw stuff and OpenGL didnt, because on small batch sizes OpenGL could be between 2.3x and 1.4x quicker at executing the draw call then D3D9. OpenGL sits the otherside of the kernel calls and gives the implimenters more control on when that switch occures. D3D10 also sits the other side, again not wasting that time.

    There were also changes in the resource model, the driver model and various other areas; some of which were to make the Vista windowing system possible. This resource model and everything about it was very much tied to D3D10 and how it does things. OpenGL's resource model was also fundamentally different to the D3D9 model, with again the implimenter having alot more control; you never suffered a 'lost device' in OpenGL for example and the runtime automatically controlled allocated memory unlike in D3D9. There are some things OpenGL could do which D3D9 couldn't as well; such as on-card async memory copies and render-to-vertex buffer (well, ATI had a hack for it but the OpenGL method was cross-hardware).

    Could they have done it? Well, of course they could have done but unlike previous DX updates it would have taken alot more effort, time and money. All for a, at the time, 5 year old OS they were hoping to begin getting rid of.

    Personally, I think this was a good move all in all, the problem was with how it was presented to the general masses who suddenly saw they had to pay a few 100 USD for a new DX version.
    Reply
  • Dribble - Saturday, January 31, 2009 - link

    The reason all our games are DX9c are because that's what the consoles support (yes I know PS3 uses open GL, but it has 9c feature support - basically has 7800GTX in it).
    Most games are made for console as well as PC, the majority use some cross platform renderer (e.g. unreal 3 engine) and that will support DX9c. Cross platform DX support won't change until the Xbox 720 and PS4 arrive.
    I don't see how DX11 will change this?
    Reply
  • DerekWilson - Saturday, January 31, 2009 - link

    That's a really good point and something I should have considered.

    I think the gap in performance between the PC and consoles will so heavily favor PCs that it will inspire developers to once again shift their focus to the PC. I could be wrong though.
    Reply
  • ssj4Gogeta - Sunday, February 1, 2009 - link

    That will change if Microsoft choose Larrabee for XBox 720. Cause then it will support all the DirectX (even unreleased) versions and all the OpenGL versions. If MS chooses Larrabee, and it also becomes popular for PC gaming, we PC gamers may see a huge benefit because then the games will be built for the latest DX version. Reply
  • bobvodka - Sunday, February 1, 2009 - link

    It's all well and good saying that but Larrabee is currently utterly unproven technology.

    Don't get me wrong, I'd like to see it do well if only because 3 players in the GPU race will be better than 2 from a technology and consumer stand point, however everyone seems to be pinning their hopes on this technology when there hasn't even been a working demo of DX9 at the same speed as NV/AMD, never mind DX10 or DX11.
    Reply
  • ssj4Gogeta - Sunday, February 1, 2009 - link

    You're right. But I'm really excited. It would be so nice if Intel can really pull off such feat. we'll have 2 TFLOPS of general purpose parallel processing power! Reply
  • bobvodka - Saturday, January 31, 2009 - link

    The problem is, the consoles is where the money is. Combine that with a fixed hardware platform (well, hard drives and different screen sizes not withstanding) it makes for a much easier time for devs. Reply
  • piroroadkill - Saturday, January 31, 2009 - link

    That's definitely a good point. Any game engine these days has to support the major consoles for it to be successful - even if that only means the 360, you're still hamstrung by DirectX9, regardless of whether XP has it or not.

    From a personal point of view I'm still running XP because I run a clusterfuck of graphics cards that vista would shit the bed thinking about.
    Reply
  • William Gaatjes - Saturday, January 31, 2009 - link

    "Many under-the-hood enhancements mean higher performance for features available but less used under DX10. "


    I must be remembering it wrong but the same thing was sad about dx10. However it turned out to be nothing more then getting people to buy vista. I wonder how much will come true this time.
    Reply

Log in

Don't have an account? Sign up now