NVIDIA Announces CUDA 6: Unified Memory for CUDA

by Ryan Smith on November 14, 2013 9:00 AM EST

Posted in: GPUs, CUDA, NVIDIA, Compute, Maxwell

Kicking off next week is the annual International Conference for High Performance Computing, Networking, Storage, and Analysis, better known as SC. For NVIDIA, SC is their second biggest GPU compute conference after their own annual GPU Technology Conference, and is typically the venue for NVIDIA’s summer/fall announcements. To that end NVIDIA has a number of announcements lined up for this year, so many in fact that they’re pushing some of them out ahead of the conference just to keep the show itself from being overwhelming. The most important of those is the next version of CUDA, CUDA 6.

Unlike some prior CUDA releases, NVIDIA isn’t touting a large number of new features for this version of CUDA. But what few elements NVIDIA is working on are going to be very significant.

The big news here – and the headlining feature for CUDA 6 – is that NVIDIA has implemented complete unified memory support within CUDA. The toolkit has possessed unified virtual addressing support since CUDA 4, allowing the disparate x86 and GPU memory pools to be addressed together in a single space. But unified virtual addressing only simplified memory management; it did not get rid of the explicit memory copying and pinning operations needed to bring data over to the GPU before the GPU could work on it.
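For readers who have not written CUDA before, the explicit workflow described above looks roughly like the following minimal sketch. The `scale` kernel, the array size, and the `check` helper are hypothetical names chosen purely for illustration; `cudaMalloc`, `cudaMemcpy`, and `cudaFree` are the standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // The host and the device each get their own allocation.
    float *host = static_cast<float *>(malloc(bytes));
    if (host == nullptr) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = nullptr;
    check(cudaMalloc(&dev, bytes), "cudaMalloc");

    // Explicit copy to the GPU before the kernel can touch the data.
    check(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice), "copy to device");

    scale<<<(n + 255) / 256, 256>>>(dev, n);
    check(cudaGetLastError(), "kernel launch");

    // Explicit copy back before the CPU can read the result.
    check(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost), "copy to host");

    printf("host[0] = %f\n", host[0]);

    check(cudaFree(dev), "cudaFree");
    free(host);
    return 0;
}
```

Note that the programmer has to track two pointers for the same logical array and keep them in sync by hand.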

With CUDA 6 NVIDIA has finally taken the next step towards removing those explicit memory copies from the programmer’s code, by making it possible to abstract the memory management away from the programmer. This is achieved through the CUDA 6 unified memory implementation, which is layered on top of the existing memory pool structure. With unified memory, programmers can access any resource or address within the legal address space, regardless of which pool the address actually resides in, and operate on its contents without first explicitly copying the memory over.
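As a rough illustration of what this means in practice, here is a minimal sketch using `cudaMallocManaged`, the runtime call that CUDA 6 provides for allocating unified (managed) memory. The announcement itself does not walk through the API, so treat this as our own example under stated assumptions: the `scale` kernel, the array size, and the `check` helper are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // One allocation, one pointer, usable from both CPU and GPU code.
    float *data = nullptr;
    check(cudaMallocManaged(&data, bytes), "cudaMallocManaged");

    // The CPU writes through the same pointer the kernel will use.
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, n);
    check(cudaGetLastError(), "kernel launch");

    // Wait for the GPU to finish before the CPU touches the data again.
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    printf("data[0] = %f\n", data[0]);

    check(cudaFree(data), "cudaFree");
    return 0;
}
```

There are no `cudaMemcpy` calls in the source; any data movement between the host and GPU pools is left to the CUDA runtime. The synchronization call is still the programmer's responsibility.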

Now to be clear here, CUDA 6’s unified memory system doesn’t resolve the technical limitations that require memory copies – specifically, the limited bandwidth and high latency of PCIe – rather it’s a change in who’s doing the memory management. Data still needs to be copied to the GPU to be operated upon, but whereas CUDA 5 required explicit memory operations (higher level toolkits built on top of CUDA notwithstanding), CUDA 6 offers the ability to have CUDA do it instead, freeing the programmer from the task.

The end result as such isn’t necessarily a shift in what CUDA devices can do or their performance while doing it, since the memory copies didn’t go away; rather, it further simplifies CUDA programming by removing the need for programmers to manage those copies themselves. This in turn is intended to make CUDA programming more accessible to wider audiences that may not have been interested in doing their own memory management, or even just to free existing CUDA developers from having to do it in the future, speeding up code development.

With that said, NVIDIA isn’t talking about the performance impact at this time. Memory abstractions such as these typically have some kind of performance penalty over manual memory management – after all, who knows more about the memory needs of an application than the application itself – but of course manual memory management isn’t going anywhere, as there will still be scenarios where the higher complexity is worth the tradeoff.

Meanwhile it’s interesting to note that this comes ahead of NVIDIA’s upcoming Maxwell GPU architecture, whose headline feature is also unified memory. From what NVIDIA is telling us they developed the means to offer a unified memory implementation today entirely in software, so they went ahead and developed that ahead of Maxwell’s release. Maxwell will have some kind of hardware functionality for implementing unified memory (and presumably better performance for it), though it’s not something NVIDIA is talking about until Maxwell is ready for its full unveiling. In the interim NVIDIA has laid the groundwork for what Maxwell will bring by getting unified memory into the toolkit before Maxwell even ships.

Moving on, there is a pair of further, smaller additions coming with CUDA 6. The first is that CUDA 6 will come with new BLAS and FFT libraries that are further tuned for multi-GPU scaling, with these new libraries supporting scaling up to 8 GPUs in a node. Meanwhile NVIDIA will also be releasing drop-in compatible libraries for BLAS and FFTW, allowing applications that use those libraries to use the GPU accelerated versions of their respective routines just by replacing the library.

Wrapping things up, NVIDIA will be showing off CUDA 6 and the rest of their announcements at SC13 next week. Meanwhile we’ll be back on Monday with coverage of the rest of NVIDIA’s SC13 announcements.

43 Comments

AMDshit - Thursday, November 14, 2013 - link
You mean AMD pUTA marketing vs real world CUDA?
Spunjji - Thursday, November 14, 2013 - link
What a laugh you are! How'd you slip through the mods with a username like that? :D But seriously, this is fairly typical nVidia "no look at me" marketing. It usually works for them, so good luck to them. More tools for GPU programming is a good thing.
wwwcd - Thursday, November 14, 2013 - link
Previously stacked DRAM was announced as a Maxwell feature :D
AMDshit - Thursday, November 14, 2013 - link
Cool story, AMD lunatic.
ddriver - Thursday, November 14, 2013 - link
It will boost productivity, but not performance; under the table, memory is still being copied around.

Maybe nvidia should scrap pimping their proprietary closed tech and contribute a bit to something open, portable and platform independent, like OpenCL. The effort would be much more appreciated than those attempts to perpetuate the fragmentation.
AMDshit - Thursday, November 14, 2013 - link
So why isn't Mantle based on OpenCL?
DanNeely - Thursday, November 14, 2013 - link
Because mantle is a GPU rendering API not a compute API?
ddriver - Thursday, November 14, 2013 - link
Because mantle is a GRAPHICS API, and OpenCL is a COMPUTE API. But then again, looking at your screen name I sit and wonder why I even bother acknowledging your pitiful existence :)
AMDshit - Thursday, November 14, 2013 - link
OK. Why not OpenGL?
ddriver - Thursday, November 14, 2013 - link
So, according to your brilliant logic, AMD should use OpenGL to implement a low level alternative to OpenGL? And no, don't bother answering, it was a rhetorical question, if you don't know what that is, look it up.
