OpenCL 1.0: The Road to Pervasive GPU Computing

by Derek Wilson on 12/31/2008 6:40 PM EST



  • v12v12 - Wednesday, January 07, 2009 - link

    Testing123, ignore plz
  • corporategoon - Tuesday, January 06, 2009 - link

    Did this article go through an editor?
  • chizow - Friday, January 02, 2009 - link

    Kind of surprising you didn't directly address this given the amount of FUD being thrown around with regards to PhysX, particularly from AMD and its supporters. You indirectly answered what I had already suspected however, that given Nvidia has stated they plan CUDA to be fully portable to both OpenCL and DX11 there should also be no portability issues for AMD and Brook+:


    AMD could make an investment in the CUDA for C language and create either their own compiler (nothing is stopping them). But then you still have the same problem of interoperability as if NVIDIA implemented Brook+. If NVIDIA or AMD want to make their solution work with the other guy, they would need to write a wrapper to translate CAL to PTX or PTX to CAL.

    I'm guessing the unfinished thought from the first sentence should read something like "or write a CUDA to Brook+ wrapper" as that's essentially what the last part suggests. Since both vendors will write wrappers for their code to OpenCL, perhaps this wrapper could pull double duty, although it would double the amount of transcoding needed. Less than efficient for sure, but certainly better than a complete impasse due to incompatibility.

  • ltcommanderdata - Friday, January 02, 2009 - link

    Are you suggesting that hardware PhysX acceleration will come to AMD GPUs as soon as nVidia and AMD enable hardware OpenCL support? Because I don't think it's that simple.

    nVidia seems to have rebranded the meaning of CUDA. Maybe it's all just marketing speak, but CUDA before seemed to mean using nVidia GPUs for GPGPU operation in general. But now since OpenCL, CUDA seems to be more specifically related to the GPGPU interface to nVidia GPUs with languages being separate on top, namely OpenCL, DX11 and C for CUDA. If PhysX is written in C for CUDA, which it no doubt is seeing there wasn't anything else available up to now, then adding support for the OpenCL language in the CUDA interface layer won't help get PhysX supported on AMD GPUs. PhysX will still be written in nVidia's proprietary language which AMD GPUs can't understand. To support AMD GPUs, either nVidia will have to rewrite PhysX from C for CUDA to OpenCL, which would be awfully generous of them, or AMD will have to make a C for CUDA to CAL translator and hope PhysX doesn't have any nVidia hardware specific optimizations, which it no doubt has, to mess things up.
  • apanloco - Friday, January 02, 2009 - link

    Does anyone know if multiple applications can take advantage of OpenCL at the same time? I think OpenGL is exclusive to one application, but if OpenCL is used by regular applications this could be a problem?
  • yyrkoon - Thursday, January 01, 2009 - link

    "With R580 AMD (then ATI) actually published part of their ISA and called the initiative CTM (for Close to Metal). Before we had a beta version of CUDA, we had folding@home GPU accelerated on R520 and R580"

    I also read an interview through where ATI was emulating Direct 3D 10 calls in hardware on one of their x1900xtx's (Direct 3D 9 hardware) long before I heard about folding@home on the GPU. I remember being so impressed with the technology that I could not wait until Vista + DirectX 10 titles became available. Too bad that there are so few (if any) titles that currently take advantage of this technology in the ways I had hoped. Hopefully that will change soon.
  • ltcommanderdata - Thursday, January 01, 2009 - link

    It's interesting that you mentioned that AMD and nVidia look to be continuing to push their proprietary GPGPU solutions, but AMD has actually made statements they are abandoning their proprietary CTM GPGPU implementation and are moving fully to OpenCL. Admittedly, it's probably just a realization that CTM isn't taking off as fast as CUDA and it's in their best interest to push OpenCL. In comparison, nVidia will continue to develop their own CUDA implementation alongside OpenCL.

    I wonder if you can get a statement from nVidia whether they will move PhysX to OpenCL? Right now I believe PhysX is written in C for CUDA and of course requires nVidia GPUs for hardware acceleration. If they moved to OpenCL, then AMD GPUs would support it as well. Although perhaps nVidia prefers to keep PhysX to themselves as a product differentiator.

    It'd also be interesting if you could ask AMD whether older GPUs like the X1600, X1800, and X1900 will be supported in OpenCL? You already pointed out in your article that the RV530, R520, and R580 had GPGPU folding@home clients so they are certainly capable of GPGPU operation. It'd probably be in ATI's own interest to have as large an OpenCL base as possible and ATI's original FireStream dedicated GPGPU card was R580 based as well. Apple could probably help them as well seeing the number of X1600 and X1900 used in various iMac, MacBook Pro, and Mac Pro generations that could use support for OpenCL in Snow Leopard.

    And I agree with melgross that it's strange Apple got no mention in the article seeing that they pretty much developed OpenCL, then submitted it to Khronos, and were no doubt a major driving force behind the quick ratification in order to get it ready for Snow Leopard. And I believe Apple's Aaftab Munshi was the chair of the OpenCL working group.
  • danger22 - Thursday, January 01, 2009 - link

    i am looking forward to the day when I can run my finite element simulations on my GPU. come on Ansys, it's time for a GPGPU Multiphysics!
  • Amiga500 - Thursday, January 01, 2009 - link

    Same boat, same boat... with both CFD and FEA.

    Have you heard of FEAST-GPU (from Dortmund university)?

    It's a GPU accelerated FE package - unfortunately it isn't out in the public domain yet.

    Anyhow - from my own digging, I'm not sure if the CPU is a major bottleneck for FE simulations - a lot of what I see tends to point towards the hard-drive and I/O performance.
  • Sheep100 - Sunday, January 04, 2009 - link

    If you provide enough RAM to the analysis you definitely end up CPU limited for single core runs. We have 24 - 32 GB per node for Abaqus and Nastran analyses. The nodes get RAM-bandwidth limited when stepping up the number of cores used or the number of concurrent runs on a node. We are looking forward to the Core i7/Nehalem Xeon systems coming soon that will provide a big improvement here. (These codes run slower on Opteron cores.)

    GPGPU versions of Abaqus, Nastran & Ansys would be very interesting given the large memory bandwidth available on the high end cards. I suspect that re-writing & validating the various solver algorithms to target OpenCL would be a long process. I'm also unsure how possible it is to get data parallelism out of them since the scaling rate of Abaqus, for example, on multi-core systems, even with good bandwidth, is not anywhere near linear. Although this might just highlight the deficiency of the current method of extracting parallelism.
  • melgross - Thursday, January 01, 2009 - link

    It's interesting that while ATI and Nvidia are heavily mentioned with their rapidly depreciating standards, Apple, which, after all, developed OpenCL, isn't mentioned even once, though it will also likely be the first to implement OpenCL in 10.6 later this year, possibly by March. Even their logo isn't shown. Very strange!
  • Wwhat - Monday, January 05, 2009 - link

    By March they might (should) not be the first, as graphics card makers should have updated their drivers to support it already. After all, they were well aware of OpenCL long before and already announced they would support it, and nvidia said that porting to it would be easy. Plus both ATI and nvidia have no problem at all releasing unstable software/drivers, none at all, as we all experienced.
    Oh and nvidia had an OpenGL3 driver out in like 2 days after final specs and ATI in a few weeks, so that makes you think they can put some steam behind their efforts if they want to.
  • dvinnen - Thursday, January 01, 2009 - link

    The logo picture was taken from their site
  • rdbrown - Friday, January 02, 2009 - link

    On the Khronos website, right above the "Logos", Apple is named as the one who initially proposed the working group; Apple is also mentioned in the list of companies. They must not have posted Apple's logo knowing that everyone who knows anything about OpenCL knows that it is Apple's technology. Heck, Apple even owns the trademark rights.
  • melgross - Thursday, January 01, 2009 - link

    At least they should have been mentioned in the article.
  • yyrkoon - Thursday, January 01, 2009 - link

    And to say what ? That Apple feeling left out in the cold has made efforts to take the next obvious step and standardize GPU processing( very late in the game )? That is, assuming what you're saying is true.

    Gee, how very innovative of them.
  • hakime - Saturday, January 03, 2009 - link

    Shut up you are trolling!! You don't know what you are talking about, period.

    The fact that there is no reference to Apple in the article is a serious drawback. Apple invented and designed Open CL as much as SGI invented and designed Open GL; ignoring it is simply wrong. Credit where it is due: Apple deserves the credit for inventing Open CL, and you have to admit it whether you like Apple or not.

    Apple has turned the HPC industry upside down with Open CL. For the first time there is one single state of the art API and environment for high performance, multi-core and GPU programming, which is also OS and hardware independent. Open CL goes well beyond Direct X, as the latter is not only limited in what you can do for GPGPU, but is also only designed for the GPU (Microsoft is very late in the world of GPGPU; Apple has been targeting the GPU for high performance processing for a while now with Core Image and Core Video).

    Open CL offers a unique interface for both CPU and GPU, which in other words means that it brings together different technologies like Open MP or CUDA. This is unique in the industry, and Apple deserves the credit for having created this single interface.

    Open CL is designed to target a large set of devices like CPUs, GPUs, Cell chips and DSPs; Direct X can't do that. Open CL targets small form factor devices like the iPhone; Direct X does not and can not.

    Not only does the author of the article fail to recognize this unique aspect of Open CL, but he also fails to comment on the effort made by Apple in creating Open CL. Again, whether you like Apple or not does not matter; give credit where it is due and get the facts right.

    Please correct the article and make it more interesting on what Open CL is really for, not the general bla, bla which is written.

  • ltcommanderdata - Thursday, January 01, 2009 - link

    Which part isn't true? That Apple developed OpenCL and then submitted to Khronos? Since even Khronos admits that is true.

    "Apple has proposed the Open Computing Language (OpenCL) specification to enable any application to tap into the vast gigaflops of GPU and CPU resources through an approachable C-based language."

    Apple's Aaftab Munshi was also the chairman of the OpenCL working group.

    And how is OpenCL late in the game? I'm pretty sure that DirectX 11 is the only standardized GPGPU implementation across multiple vendors, but it's still in beta. In comparison OpenCL has been ratified, in record time compared to OpenGL 3.0, probably due to Apple's pressure to get it ready for Snow Leopard. And nVidia has already released OpenCL drivers for Windows and Linux.
  • yyrkoon - Thursday, January 01, 2009 - link

    Oh, and sorry, my original point was something like this. While the true innovative companies are squabbling about whose product is superior, Apple sneaks up behind them, and claims to have invented the internet. In other words, whether Apple participated or not, an open standard would have been made.
  • melgross - Friday, January 02, 2009 - link

    You're not very knowledgeable. You ARE very anti-Apple apparently.

    And why do gamers have to be the most beneficial parties? What's so great about gaming? Besides, OpenCL will benefit them, as well as parties that won't be benefitted by DirectX. Is that a bad thing? To you, it seems to be.

    If MS had developed this, you would be jumping up and down, and claiming that it was the next step beyond the now old DirectX methodology, and far more useful.

    Like it or not, this IS a major innovation, otherwise, so many companies of note wouldn't be signing on so quickly.

    Whether Windows users benefit from this, or are left out of it is up to MS, who seems only interested in destroying standards that don't result in MS's increasing dominance. Too bad for them! That doesn't work too well anymore.

    You know nothing about innovation at all. That's sad. Just go on being blinded by your prejudices, we all see it for what it is.
  • yyrkoon - Saturday, January 03, 2009 - link

    Apparently I *am* more knowledgeable than some here. How you can twist the context of comments to your misguided reasoning ( that I favor Microsoft ) is beyond me. Do I prefer Windows to OSX ? Yes. Why? Because maybe Microsoft is not perfect, but at least they do not force unwanted hardware on me to use their software.

    Windows is the only real gaming OS. Period. And I suppose my comment about Cross platform applications, and other good strong possible uses in a *NIX environment fell on deaf ears too( uses for OpenCL ).

    There is nothing wrong with OSX, it is after all based on BSD. However I will not over pay for hardware *just* to use it either. There are too many free operating systems that are just as good. If I need Windows application compatibility, I will just run Windows. Apple offers me *nothing* I have to have.

    Now, who here is truly blind ?

  • melgross - Saturday, January 03, 2009 - link

    You just want to think you are.

    You have gaming on the brain. I guess you must BE a gamer as that's all they think about anyway.
  • Penti - Saturday, January 03, 2009 - link

    Really who cares about the gaming? This isn't a physics framework or engine.

    It can be used in games, but this isn't really about a discussion on Apple gaming. That's not really why it can "speak" to each other.

    Apple has a lot of professional applications on its platform that today use the open standard OpenGL, like photo editing, video editing, VFX and others (scientific apps etc.), not only for graphics but for GPGPU, not only from Apple itself but from vendors such as Adobe and Avid. Most of the apps also use OpenGL for acceleration in Windows too. Besides that, OpenCL will be available for handheld devices such as mobile phones. Even though Microsoft does software for phones you won't see DX11 or GPGPU there. Not that I'm an Apple fanboy, but I can see why Apple builds on what's already around and extends OpenGL and free standards. They can't rely on closed standards; most of their apps (other vendors' for OS X) are to some degree cross platform, as they should be. CUDA is already available on the Mac too. But you can't expect them to run DX. This isn't about Apple as an OEM either. It's about software (Microsoft does hardware too). It's engineered to fit a wider picture and a wider array of devices including Windows; there isn't anything bad about that. There isn't anything bad about giving consumer and professional apps a boost by using GPGPU. It's certainly what some ISVs want. There's more than gaming in the world. Microsoft is free to do whatever and nobody has said that they aren't best on games, but people are also free to criticize and complain about Microsoft, just as they are about Apple, and there certainly is a lot to criticize both about. Apple for certain can't just be catering to itself, not when they and their software vendors want something else. Microsoft essentially can, as most are already deeply invested in Microsoft tech and software. That doesn't mean Windows users can't benefit from the Apple-developed OpenCL. There certainly are Windows-only apps that will use it. Even non-OpenGL ones. It's not only a cross platform library.
  • Atechie - Friday, January 02, 2009 - link

    Drop the Apple-preaching, it's uninteresting as Apple is neither HPC nor the mainstay platform for CUDA/Brook+/OpenCL.

    .oO(I swear, Apple-jocks are like religious zealots, they can stop pushing their religion down everybody else's throat...interested or not.)
  • melgross - Saturday, January 03, 2009 - link

    Yeah, just like people like you who do the opposite?

    Why mention the company who did all the work, as long as it's Apple? Right? That makes people fanboys if we think a proper mention should be made?
  • Shadowself - Friday, January 02, 2009 - link

    So anyone says anything positive about Apple and immediately that equates to being an Apple zealot? It appears more likely that your personal bias is showing.

    It is absolutely true that Apple's Mac has NEVER been a gamer's platform -- and it probably never will be. Additionally, Apple has never fully supported (or even properly supported, IMHO) any development other than their core groups (K-12, Undergraduate to some extent, graphics and motion picture artist communities, and publishing). Thus Apple supports low to mid range graphics cards and very high end 3D cards -- but absolutely nothing for the moderate to high end gamer.

    However, Apple did do the vast majority of OpenCL before submitting it to become an open standard. Apple wants to expand its role in the graphics and motion picture communities. The only way to do this was to do something like OpenCL. Additionally, Apple knew that a completely closed set of APIs was not going to gain any traction. Thus they submitted it as an open standard and gave up control of it.

    Not mentioning that Apple did the majority of OpenCL is wrong. For anyone to claim Apple did this altruistically is wrong. To bash Apple for coming up with something that has become a cross platform standard that can utilize both AMD and nVidia cards as well as a host of other hardware is wrong.
  • yyrkoon - Thursday, January 01, 2009 - link

    I never said it wasn't true. Let us just say that I am less than inspired to even bother looking. OpenGL is very low on my personal list of priorities, and I could care less what Apple does( unless perhaps if someday they compete head to head with Microsoft ).

    Still, no matter how much I like or dislike OpenCL, chances are pretty good that on Windows platforms, it is going to be rendered( pun? ) moot. Maybe it will make the next greatest XGL even more powerful, so all those people who like to play with their application windows in linux can spend all day every day bragging/ making youtube videos about how their desktop UI can do *this*, and *that* while remaining even less productive than before ; )

    Yes, the above is sarcasm to some extent, but it is also true to an extent as well. OpenCL will help those who prefer an alternative to Windows do similar things without having to own Windows. Scientists who want to use GPGPU(s) to crunch some serious numbers, etc. What it will not do however is make the majority of gamers out there happy. *Unless* the majority of game developers start using OpenGL/CL on the Windows platform (which is very unlikely). Certain cross platform applications however could benefit, sure.
  • Penti - Friday, January 02, 2009 - link

    So OpenCL and OpenGL are bad because they're cross platform and open standards? If you look at who's involved you see companies like ARM and embedded computing companies; they can't really use anything like DX11. This isn't just for games but GPGPU in general.

    It's not like there aren't apps using OpenGL on Windows either. But it's rather about a broader spectrum than owning or not owning Windows. It's for a wider category of devices than DX11 is. You won't have DX11 cellphones. But you will have OpenCL on the next gen Sony and Nintendo consoles, handhelds, set-top boxes etc. In HPC too, there will be libraries/frameworks to help you out.

    Of course there are professional apps such as photo editing, video editing and encoding, VFX, CAD / GIS, math and other engineering software that could benefit widely from OpenCL. And a lot of them are cross-platform. Or at least would need OpenCL on, for example, the Mac, where they might have many customers.
  • kevinkreiser - Wednesday, December 31, 2008 - link

    a while back i published a paper that involved performing an iterative deconvolution on the GPU. the point of the paper was that we could do it in real-time and use it on videos with arbitrary spatially varying blur kernels.

    anyway the largest overhead was copying the render target (single iteration of the algorithm) to initialize the next iteration. if dx11 and opencl allow the gpu and cpu to work with the same memory, without the need to copy between the two, this will speed up gpgpu apps tremendously.
  • has407 - Monday, January 12, 2009 - link

    OpenCL itself is neutral; it provides both explicit copy and map functions, in both synchronous and asynchronous forms. Obviously what works best will depend on platform capabilities and run-time intelligence (e.g., copy/map optimizations based on platform capabilities and program behavior).

    However, that still doesn't necessarily allow for a large mapped/shared memory between the CPU and GPU. That and its efficacy is going to be implementation dependent and OpenCL has simply defined a model that should be portable and useful, even if suboptimal on a given implementation--but if you know enough about the implementation, gives you sufficient optimization choices.

    That requires some constraints on the memory model, in particular the consistency/correctness of various memory regions with respect to computational elements at different points and times, and especially with respect to mapped memory (NB: sec of the spec).
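The copy-versus-map distinction has407 describes can be illustrated with a host-side analogy. In OpenCL 1.0 the explicit copies go through calls such as clEnqueueReadBuffer and clEnqueueWriteBuffer, and mapping goes through clEnqueueMapBuffer and clEnqueueUnmapMemObject, each with a blocking flag that selects the synchronous or asynchronous form. The sketch below does not call OpenCL at all: it assumes a plain Python bytearray standing in for a device buffer, and every variable name in it is made up for illustration.

```python
# Host-side analogy only: a bytearray stands in for a device buffer.
# Nothing here calls OpenCL; it only contrasts copy and map semantics.

device_buffer = bytearray(b"\x01\x02\x03\x04")

# "Copy" style: the host receives its own snapshot of the data.
host_copy = bytes(device_buffer)

# "Map" style: the host receives a window onto the same storage.
host_view = memoryview(device_buffer)

# A later write to the buffer is visible through the view, not the copy.
device_buffer[0] = 0xFF
copy_sees_update = host_copy[0] == 0xFF
view_sees_update = host_view[0] == 0xFF

# A write through the view lands in the buffer with no copy back.
host_view[1] = 0x7F
buffer_sees_view_write = device_buffer[1] == 0x7F

# Loosely analogous to unmapping once the host is finished.
host_view.release()
```

Whether a real map avoids a transfer underneath is, as the comment says, up to the implementation; the analogy only shows why the two styles need different consistency rules.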
  • DerekWilson - Wednesday, December 31, 2008 - link

    that is not possible -- when using the CPU to process data you need to copy it off the GPU ... when using the GPU to do processing, you need to copy data onto the GPU.

    What you don't need to do is to worry about copying data from an OpenGL buffer that resides on the GPU to another buffer in order to work on it with OpenCL.

    In DX11, you can share buffers between the Pixel Shader and the Compute Shader. Both of these are processed on the graphics card. You can do graphics work and general purpose compute work on the same data ... this is useful for effects physics, visualization of calculations, or complex shaders that might not be possible in the constraints of HLSL.

    With OpenGL + OpenCL, you can do the same thing -- share data between graphics buffers and OpenCL buffers. But these buffers reside on the GPU.

    In both DX11 and OpenCL, data must be moved off the GPU to process it with the CPU.

    If OpenGL and OpenCL did not have binary level buffer compatibility, worst case we would need to copy OpenGL buffers off the GPU, convert them, copy them into OpenCL buffers in the correct format, and then re-upload the data to the GPU. Alternately, we could modify the buffer on the GPU, but that would still require processing power and incur a performance penalty.
  • Jaybus - Friday, January 02, 2009 - link

    I think kevinkreiser was advocating a shared memory architecture, where CPU and GPU could access the same physical RAM, so that there would be no need to copy buffers. However, I disagree with such an approach, because that is only eliminating the buffer copy overhead by forcing the use of a global mutex or some other method of shared memory arbitration. The bottleneck would then become memory contention, offsetting any performance gained by eliminating the copy.
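The trade-off Jaybus raises (no copy, but every access goes through arbitration) can be sketched with two CPU threads sharing one buffer under a lock. This is a toy under stated assumptions: two Python threads stand in for the CPU and GPU, a threading.Lock stands in for whatever arbitration real hardware would use, and the names producer, consumer and torn_reads are hypothetical.

```python
import threading

shared = bytearray(8)      # one buffer both workers touch; it is never copied
lock = threading.Lock()    # stands in for the shared-memory arbitration

def producer(rounds):
    # Fill the whole buffer with one value per round, holding the lock.
    for i in range(rounds):
        with lock:
            for j in range(len(shared)):
                shared[j] = i % 256

def consumer(rounds, torn):
    # Under the lock every byte should match; record any mixed snapshot.
    for _ in range(rounds):
        with lock:
            if len(set(shared)) != 1:
                torn.append(bytes(shared))

torn_reads = []
workers = [
    threading.Thread(target=producer, args=(1000,)),
    threading.Thread(target=consumer, args=(1000, torn_reads)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The buffer is never duplicated, which is the gain; the cost is that each side waits whenever the other holds the lock, which is the contention being argued about.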
  • Loki726 - Friday, January 02, 2009 - link

    PCIe latency is incredibly large compared to memory copy latency. For example, a synchronous copy of a single byte from CPU memory to GPU memory on an 8800GT using CUDA takes around 100k cpu cycles to complete, where non-cached CPU memory copies of the same size are in the order of 100-1000s of cycles. PCIe transfers only become fast when copying large chunks of data.

    You are right that a shared memory architecture would require synchronization via some mechanism (mutex or other), but this would still be much faster than a DMA copy over PCIe for small data sizes if it was implemented correctly. There is no reason it should be any slower than sharing data between two threads in an SMP.

    I think the reason why no one builds systems like this is because low latency access to a shared DRAM would require complex protocols between the GPU and CPU memory controllers to ensure memory consistency and coherence, and no one builds CPUs and GPUs that closely integrated.
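Loki726's point that PCIe transfers only become fast for large chunks follows from a simple fixed-latency-plus-streaming cost model. The sketch below uses the roughly 100k-cycle figure from the comment; the CPU clock and the sustained bus bandwidth are assumptions picked for illustration, not measurements from the thread.

```python
CPU_HZ = 2.4e9             # assumed host clock, not from the thread
LATENCY_CYCLES = 100_000   # Loki726's rough figure for a 1-byte synchronous copy
BANDWIDTH_BPS = 4e9        # assumed sustained bus bandwidth in bytes per second

def transfer_seconds(nbytes):
    """Fixed per-transfer latency plus time to stream the payload."""
    return LATENCY_CYCLES / CPU_HZ + nbytes / BANDWIDTH_BPS

def effective_bandwidth(nbytes):
    """Bytes per second actually achieved for one transfer of this size."""
    return nbytes / transfer_seconds(nbytes)

def break_even_bytes():
    """Payload size at which streaming time equals the fixed latency."""
    return LATENCY_CYCLES / CPU_HZ * BANDWIDTH_BPS

for size in (1, 4096, 1 << 20, 64 << 20):
    share = effective_bandwidth(size) / BANDWIDTH_BPS
    print(f"{size:>10} bytes: {share:.2%} of assumed peak bandwidth")
```

Under these assumptions a transfer has to carry well over a hundred kilobytes before the fixed latency stops dominating, which is why batching small copies matters.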
  • DerekWilson - Saturday, January 03, 2009 - link

    Some people built / build systems like this -- they are called game consoles ;-)
  • Loki726 - Saturday, January 03, 2009 - link

    Good point Derek. The Xbox 360 supports tightly integrated CPU-GPU communication:

    "The bus design and the CPU L2 provide added support that allows the GPU to read directly from the CPU L2 cache."[1]

    [1] Andrews, J. and Baker, N. 2006. Xbox 360 System Architecture. IEEE Micro 26, 2 (Mar. 2006), 25-37

  • Wwhat - Monday, January 05, 2009 - link

    Whatever happened to the hypertransport bus on motherboards and making graphics card for it? That would nicely cover both issues, plus since intel also is going in that direction they might agree with AMD at some far point in the future on a universal direct CPU transport bus connector.

    Or perhaps the graphics card makers should consider making a universal socket on their graphics cards that connects to the motherboard via a dedicated connector designed for DMA to a memory space shared with the CPU, a cache designed for shared GPU/CPU use. The advantage would be that people would yet again be forced to buy a new motherboard, and chipset, and that will keep the money rolling in ;]

    Personally I think they should sit down in some room alone with themselves and think a bit until they realise having everybody doing their own proprietary interfaces and systems is NOT a nice and positive and helpful and even economical way to go about things and that making a plan then talking in a group with the 'opposition' and then tweaking it before releasing isn't such a bad idea and might actually lead to MORE profit and innovation.
  • Loki726 - Monday, January 05, 2009 - link

    The interface you are thinking of is called HTX and there are some specialized products that use it. Hypertransport may be an open spec, but the memory transfer and coherence protocols used by AMD are not open. So it is not possible for a third party vendor to sit down and implement an HTX card that could work cooperatively with an AMD processor without negotiating a license from AMD. Intel's equivalent Quickpath is similar, but not even an open spec. PCIe is not an open spec either, but is controlled by a consortium that offers third parties pretty much equal opportunities to obtain a license.

    Someone correct me if I'm wrong, but I'm not sure if dramatically reduced CPU/GPU memory copy latency would be useful for graphics applications. Games seem to scale just fine with the PCIe. Obviously there will be specific cases where it will be useful, but in general, the industry hasn't had a problem getting huge speedups over CPUs without it.
