OpenCL 1.0: The Road to Pervasive GPU Computing
by Derek Wilson on December 31, 2008 6:40 PM EST
Why NVIDIA Thinks CUDA for C and Brook+ Are Viable Alternatives
While OpenCL is a high-level API, it still requires the programmer to perform certain tasks that have little to do with the parallel algorithm being implemented. The OpenCL devices in the system must be found and set up to handle the task at hand. This involves a lot of overhead: creating a context, selecting devices, creating the command queue(s), managing buffers that supply data to and collect results from the OpenCL device, and dynamically compiling OpenCL kernels from within the program. All of this comes in addition to writing the kernels (data parallel functions) and actually using them in a program that does useful work.
The overhead and management work required is similar to what goes on with OpenGL. This makes sense, considering that both use GPUs, that they can share data with each other, and that the same standards body that manages OpenGL now manages OpenCL. But the fact remains that this kind of overhead is cumbersome and can be a real headache for anyone who is more interested in the algorithm, such as scientists writing HPC code, who most of the time know the theory much better than the programming.
Both Brook+ and CUDA for C hide the complexity of setting up the hardware by letting the driver handle the details. Developers can write a kernel, use it, and for the most part forget about what is actually going on in the hardware. Offering something like this first was a good move for both NVIDIA and AMD, as it lets developers get familiar with the kind of programming they will be doing for data parallel problems in the future without tackling more complexity than necessary.
NVIDIA, for one, believes that a language extension, as opposed to an API like OpenCL, has major benefits and will always have a place in GPU computing, especially in the HPC space, where scientists don't want to be programmers any more than they need to be. When asked whether they would submit their language to a standards body, NVIDIA said that was highly unlikely, since other language efforts already exist and NVIDIA has been advancing CUDA for C much more rapidly than a standards body would.
On the downside, hiding these details also takes control away from the developer, and putting more control in the developer's hands can result in better, faster code. There is a bit of a "black box" feeling to these solutions: you put code in and get results out, but you can't be sure what goes on in the middle to make it happen. OpenCL gives you more ability to fine-tune the software and make sure that exactly what you intend is what actually happens. Despite NVIDIA's assertion that scientists coding for HPC systems will have a better experience with CUDA, the cost/benefit of ultra-fine tuning code for HPC machines leans heavily in favor of spending the time and money on optimization. This means OpenCL will likely be the choice for performance-sensitive HPC applications, while CUDA for C and Brook+ will likely be used more for trying out ideas before settling on a final direction.
So there you have it. OpenCL will enable applications in the consumer space to take advantage of data parallel hardware, while Brook+ and CUDA may still have a place in the industry (though not on the consumer side of things). That is, until some other, more popular standard data parallel language extensions come along and push both CUDA for C and Brook+ out of the market.