NVIDIA Announces CUDA 4.0

by Ryan Smith on 2/28/2011 9:00 AM EST
45 Comments

  • HangFire - Monday, February 28, 2011 - link

    I'm not clear what previous generations, or even current generations, of NVIDIA hardware CUDA 4 is supported on (or not).

    Nvidia has already jerked around the audio/video production market by turning off features in their driver for the 8800 generation, even though the h/w is capable. They seem to be doing the same thing with CUDA support, abandoning older hardware before delivering on their promises.
    Reply
  • mmrezaie - Monday, February 28, 2011 - link

    It must be just for fermi generation. Reply
  • seibert - Monday, February 28, 2011 - link

    I don't know anything about the AV issue you describe, but as a CUDA developer, I think NVIDIA has done a good job of backwards compatibility when feasible. CUDA-capable hardware is evolving quickly, but I still have 4 year-old code which happily compiles with CUDA 3.2 and runs with the latest drivers on an 8800 GTX. The CUDA Programmer's Guide is pretty clear about what extra features will work on each generation of card.

    I have no idea what the hardware requirements will be for the Unified Virtual Addressing, but I could imagine that it would require features deployed in the GT200 series and a few G90 integrated chipsets ("zero copy support" is what I'm thinking of here).

    Full C++ support on the device was not promised until Fermi, so I fully expect that the addition of virtual functions and new/delete will still work on Fermi as well.

    From the perspective of the developer community, this is an exciting release addressing a number of highly requested features, but it's not revolutionary to the point of requiring people to go throw away all their hardware. :)
    Reply
  • seibert - Monday, February 28, 2011 - link

    In case anyone cares: The Unified Virtual Addressing requires 64-bit addressing support on the host (so only 64-bit Linux, Windows, and Mac) and 64-bit on the GPU, which means Fermi and later.

    In retrospect, this seems obvious, since you would blow through a 32-bit address space very easily with an average amount of host memory and two decent GPUs.
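
    A quick back-of-the-envelope sketch of that arithmetic. The memory sizes are illustrative assumptions (not figures from NVIDIA), and `fits_in_address_space` is a hypothetical helper, not part of any CUDA API:

```python
# Rough address-space arithmetic; all memory sizes are illustrative assumptions.
GIB = 1024 ** 3


def fits_in_address_space(region_sizes_bytes, address_bits):
    """Return True if all regions can be mapped into one flat address space."""
    return sum(region_sizes_bytes) <= 2 ** address_bits


# Hypothetical system: 4 GiB of host RAM plus two GPUs with 1.5 GiB each.
regions = [4 * GIB, int(1.5 * GIB), int(1.5 * GIB)]

print(fits_in_address_space(regions, 32))  # False: 7 GiB exceeds the 4 GiB limit
print(fits_in_address_space(regions, 64))  # True
```

    The same check fails for a single 6 GiB card on its own, since a 32-bit address space tops out at 4 GiB.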
    Reply
  • habibo - Monday, February 28, 2011 - link

    Since there are 6 GB Tesla cards now, a 32-bit address space is no longer sufficient to address even a single GPU :-) Reply
  • GuinnessKMF - Monday, February 28, 2011 - link

    I think you're calling them out on something that is inconsequential in the real world. I have played around with CUDA and would love to work with it more, but professionally I'm not in their target market. I can still develop with it on an older card and learn CUDA, but if I were developing software for medical imaging or searching for oil, I would have access to the highest end hardware.

    I think it sucks that I don't have access to the full feature set, but the people they are targeting do. I think the consumer market would really benefit from having that kind of GP computing power (for things like Folding@home), but they can't hamstring future versions of CUDA for backwards compatibility. As the architecture matures and consumers start to actually have a market for GPGPU computing, they will be more conscious of backwards compatibility.

    The reality is that if you bought a graphics card with CUDA 2.0 support, there wasn't a promise of CUDA 4.0 features, you hoped it might get them, but sometimes the hardware just isn't capable of supporting the features 4.0 brings to the table.

    TLDR version: I'm with you, it sucks, but you can't blame them IMO.
    Reply
  • IceDread - Monday, February 28, 2011 - link

    I strongly dislike nvidia for cuda. Some years ago it might have had a place, but it does not today since there is tech available that everyone can use. Nvidia also shuts down their cards if you purchase an nvidia card to use for cuda but use a card from, say, AMD as the main card. Reply
  • A5 - Monday, February 28, 2011 - link

    Not sure what your point is here - CUDA is still much further along than OpenCL in any reasonable measure (tools, libraries, etc). Reply
  • raddude9 - Monday, February 28, 2011 - link

    Wrong, "Openness" is a very reasonable measure, and in that respect nvidia is far behind OpenCL. Reply
  • Genx87 - Monday, February 28, 2011 - link

    And OpenGL is way behind DX now. Openness imo means nothing if nobody uses it. Reply
  • moozoo - Monday, February 28, 2011 - link

    and everyone uses windows. Or did I miss a DX for linux and Mac OS announcement...
    For those that require the feature "works on platforms other than windows" DX is a massive fail.
    Reply
  • ET - Tuesday, March 01, 2011 - link

    Wine includes DX compatibility. Reply
  • ET - Tuesday, March 01, 2011 - link

    It's not. OpenGL has often been behind DX, but it keeps catching up. OpenGL 4.0 was released quite a few months after DX11, but it did catch up, and OpenGL (currently at 4.1) offers some features not available in DX11. That's not something new, actually. OpenGL has always had some features not supported by Direct3D.

    So while OpenGL will probably never be at the front line of any new 3D generation, it's not "way behind DX now".
    Reply
  • habibo - Monday, February 28, 2011 - link

    NVIDIA has been one of the primary supporters of OpenCL since its inception. They're always the first with drivers and continue to support OpenCL today.

    They continue to develop CUDA to give developers access to new hardware features sooner than they would otherwise have them with OpenCL which, like all open standards, suffers from the "design by committee" and lowest common denominator problems. Furthermore, CUDA and OpenCL have different programming models. Some developers prefer one over the other, so there will possibly always be a demand for both languages. There are many languages for programming CPUs; I'm unclear as to why having more than one language for programming GPUs is heresy. To claim that "NVIDIA is far behind OpenCL" simply shows a shocking ignorance of the GPU computing landscape.
    Reply
  • moozoo - Monday, February 28, 2011 - link

    I hate to break it to you, but Nvidia basically pulled the plug on all their OpenCL work once they were beaten by AMD to releasing OpenCL 1.1 drivers.

    Their OpenCL 1.1 has been stuck in beta for 8 months now:
    http://forums.nvidia.com/index.php?showtopic=19374...

    My guess is that they transferred all their software development resources to Parallel Nsight and apparently CUDA 4.0
    Reply
  • habibo - Monday, February 28, 2011 - link

    You're not breaking anything to me. :-)

    Khronos Group certified NVIDIA's OpenCL 1.1 drivers as the first conformant implementation back in July:
    http://www.khronos.org/adopters/conformant-product...

    Since they were the first available OpenCL 1.1 drivers that were certified conformant, I'm not sure where you got the idea that AMD beat them to it. Is it because AMD "released" theirs and NVIDIA's were just "pre-release"? As for why NVIDIA does not consider these drivers more than pre-release after successfully passing all of the Khronos conformance tests, I have no idea. It certainly seems odd that NVIDIA would not officially release the drivers.

    You're right, though: NVIDIA invests far more heavily in CUDA than in OpenCL because CUDA makes them money and OpenCL does not. This has nothing to do with proprietary vs open, however. It has to do with the fact that no one uses OpenCL. If OpenCL had even the modest adoption rate of OpenGL, I'm sure NVIDIA would invest in it the way they do with OpenGL.
    Reply
  • Namarrgon - Wednesday, March 02, 2011 - link

    True enough, if by "available" you mean "can be only downloaded by registered developers who have GTX 465 hardware or earlier".

    Fact is, after 8 months I still can't use OpenCL 1.1 features in my visual fx app because none of the 90% of my customers who use nVidia hardware can actually get a driver that supports it. Not to mention that many of them have cards that aren't supported by that 8-month-old developer-release driver. What's the use of being "first" if it's on paper only?

    Since 10% of my customers do use ATI cards, CUDA is not an option for me. OpenCL has just as much potential to sell cards for them as CUDA, probably more, but this ongoing "delay" to release final drivers is looking suspiciously deliberate, despite nVidia's claim to be open & API-agnostic.
    Reply
  • B3an - Monday, February 28, 2011 - link

    I wondered how long it would be before morons like yourself start commenting about "openness".

    All you people have one thing in common - you have absolutely no idea what you're talking about. Most of you tools will even be making comments like this on OS's that are open, and happily using proprietary software and hardware.
    Reply
  • raddude9 - Tuesday, March 01, 2011 - link

    Wow, bitter.

    What in my comment makes you think that I don't know what I'm talking about? Please point it out. I do in fact know about the level of openness of the OS's and tools I use.
    Reply
  • ET - Tuesday, March 01, 2011 - link

    I agree and disagree. NVIDIA is catering to the high performance crowd and CUDA users will definitely appreciate this update. A GPGPU programmer will find this CUDA version a lot easier to use than OpenCL, so NVIDIA is doing a good job here.

    On the other hand I agree that for the consumer market NVIDIA is deliberately working against the use of AMD graphics cards alongside NVIDIA ones, which is annoying.
    Reply
  • DanNeely - Monday, February 28, 2011 - link

    Dunno. NVidia's compatibility page hasn't been updated past v2.0.

    http://www.nvidia.com/object/cuda_gpus.html
    Reply
  • mmrezaie - Monday, February 28, 2011 - link

    I have wished for years for the features just presented in CUDA 4. It has again gone a step beyond OpenCL in simplicity of programming. The new memory model is a great step, but I am not sure how they are doing it in the driver. Maybe by reading the SDK manual and samples I will get it. Reply
  • StormyParis - Monday, February 28, 2011 - link

    It's nice and all, and nVidia's PR people can certainly get sites to talk about it (ad budgets and linked review samples must help, too). But I've never seen CUDA actually used anywhere.

    Except for the 0.5-1% of us who use photo and video editing tools intensively, why should we care about it?
    Reply
  • Ryan Smith - Monday, February 28, 2011 - link

    I'll be the first person to tell you that the consumer market so far has been close to a flop. Outside of Adobe and Cyberlink/Corel/Arcsoft (and anyone else using video encode APIs) almost nothing on the consumer side of things is GPGPU accelerated in a meaningful manner. So you're correct in that you've never seen CUDA used anywhere.

    But, and this is the key point, it's quite a different story in industry. Engineering firms, research labs, etc. are all using GPGPU products from both sides of the aisle. These are products that are developed in-house, or that cost thousands of dollars a seat and are used by a small handful of people. So CUDA is being used in a number of places (quite how much, I can't say), but you and I just aren't in environments where we regularly see it.
    Reply
  • haplo602 - Monday, February 28, 2011 - link

    that's because consumer market is a production one. you have to create data out of data which is slow (video/photo processing) once the working set does not fit into the card onboard memory.

    scientific uses are different. you run compute intensive analysis/simulation on a small working set (or one that can be split to fit the memory constraints).

    even the new NUMA like architecture still does not address the memory I/O problems, it just enables to split the workload a bit better.

    I guess once the integrated arm cores start to be used, we'll get something like multi-cell computers with PCIe buses as a backplane. However Infiniband would be much better here. basically a new type of industrial/super computer will be born. again nothing consumer friendly.
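
    A rough sketch of what splitting a working set to fit the card's onboard memory could look like. The sizes are made-up assumptions and `split_working_set` is a hypothetical helper; real code would also have to move each chunk over the bus, which is where the memory I/O cost mentioned above comes in:

```python
def split_working_set(total_bytes, device_bytes):
    """Yield (offset, size) chunks that each fit in device memory."""
    if device_bytes <= 0:
        raise ValueError("device_bytes must be positive")
    offset = 0
    while offset < total_bytes:
        # The last chunk may be smaller than the device capacity.
        size = min(device_bytes, total_bytes - offset)
        yield offset, size
        offset += size


# Hypothetical example: a 10 GiB data set processed on a card with 3 GiB free.
GIB = 1024 ** 3
chunks = list(split_working_set(10 * GIB, 3 * GIB))
print(len(chunks))  # 4 chunks: three of 3 GiB and one of 1 GiB
```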
    Reply
  • formulav8 - Monday, February 28, 2011 - link

    Well, an average consumer wouldn't care about it. If you are a CUDA developer then you definitely would. Reply
  • InternetGeek - Monday, February 28, 2011 - link

    In a way you are right, anything performing video encoding using VC-1 will use CUDA to accelerate encoding, otherwise it will take days to finish a half-assed encode. If you like to encode your movies while keeping the cool sound this is the way to go, because even cellphones can play the movie. Otherwise you have to use AAC-LC which is Apple's funky way of calling stereo sound. Reply
  • Genx87 - Monday, February 28, 2011 - link

    If you haven't seen it then clearly nobody has. Reply
  • habibo - Monday, February 28, 2011 - link

    You're right that CUDA has not made much of a difference in consumer applications. This is fundamentally a problem of economics for software developers. It's a tough sell to cut out 50% of your potential customers by locking yourself into NVIDIA hardware!

    But it's made a huge difference in lots of industrial applications. NVIDIA claims to have sold around $100 million worth of Tesla computing GPUs last fiscal year. And if you look at last November's "Top 500" supercomputing list, 3 of the top 5 supercomputers in the world are built with NVIDIA GPUs.

    So CUDA is definitely important, though as you mention, not in a lot of places your typical user will notice. :-)
    Reply
  • MrSpadge - Monday, February 28, 2011 - link

    Matlab supports CUDA.

    MrS
    Reply
  • DaveGirard - Monday, February 28, 2011 - link

    I think the Mac CUDA audience is actually video pros, not Unix devs using Mac Pros. There are probably some educational/scientific users coding for CUDA on OS X but it would be dumb to use an expensive desktop to do a dumb Linux cruncher's job. Premiere Pro CS5, some Nuke and AE plug-ins, Da Vinci Resolve, etc are where they see CUDA being used on Macs.

    Sean Kilbride at NVIDIA told me that their focus appeal is on these video pros.
    Reply
  • LTG - Monday, February 28, 2011 - link

    CUDA reminds me of the old Prodigy or GEnie online services which were successful until internet standards took them out.

    GPUs are desperately in need of more successful standards so things like OpenCL can flourish.

    Yes CUDA has more advanced capabilities but why shouldn't it when NVidia invests so much more heavily in it? In the case of online services there were so many companies who could benefit from investing in Internet standards that it became a tidal wave and thwarted quite a few proprietary techs.

    However so far GPUs don't have the open standard building investments to match CUDA so customers suffer. Internet standards steamrolled huge companies, but the same motivations don't exist here.

    I don't like CUDA. Not because there is something better but because it diverts resources from something that could be better.
    Reply
  • seibert - Monday, February 28, 2011 - link

    OpenCL exists because of CUDA. NVIDIA is highly involved in the OpenCL process, and the programming model of OpenCL bears a strong resemblance to CUDA.

    You are asking for the end before the beginning. Ultimately, all data parallel hardware (GPUs, AMD Fusion, multicore + AVX, etc) will be programmed with something like OpenCL, but first we need to figure out as a development community what mix of hardware and software features we need. The committee process of OpenCL necessarily limits the feature set to the lowest common denominator. (Why would any company want to put out a standard that their hardware cannot support?) That's fine, and if OpenCL meets your needs, you should use it!

    But these are still early days, and CUDA is an environment where NVIDIA is free to add new hardware features (or new language features) and immediately expose the API to developers. That's a great practical way to learn what is useful. Features which become critical will be adopted by many vendors and later appear in future OpenCL standards. Innovation seldom happens by committee.

    But developers have to be aware that they trade multivendor hardware compatibility for features when they pick CUDA. That works for some people. (Although there is fascinating research going on at Georgia Tech investigating on-the-fly translation of CUDA to run directly on CPUs and AMD GPUs. This could be very interesting if you want to take advantage of CUDA language features not available in OpenCL yet.)

    Basically, we don't have to operate under Highlander Rules. There can be many solutions to a problem without weakening the community. Standardization is important, but only after you know what the solution ought to look like.
    Reply
  • raddude9 - Tuesday, March 01, 2011 - link

    Wow, CUDA is openly specified.

    That's about as useful as an "open" Microsoft Word document, or Adobe's Flash Player, i.e. it's the kind of "openness" that gives "openness" a bad name.

    The fact is that Nvidia is doing its best to use CUDA as a tool to lock people into its own hardware. That's why I don't trust CUDA: Nvidia is free to make changes in every new version to force you to upgrade, and it will use CUDA as a tool to make money, and supporting users will come second.

    C++ may be open, but once you start using microsoft's proprietary C++ libraries, you are letting yourself in for a world of hurt, I know, I've been there. Years ago Microsoft had promised that it would release MFC 5.0 for the Mac. So people started to upgrade the windows version knowing the mac version was on the way. Did they release it... No. Could a 3rd party port it. No.
    Reply
  • Shining Arcanine - Tuesday, March 01, 2011 - link

    I believe that AMD and Intel are free to implement CUDA support on their own hardware, much like other companies were free to implement FORTRAN. The only thing Nvidia will not do is do that for them.

    Whether you like it or not, CUDA is the FORTRAN of the GPGPU world. OpenCL is basically ALGOL, which means that aside from some code examples from organizations that do not write production code, no one will use it.
    Reply
  • Shining Arcanine - Tuesday, March 01, 2011 - link

    That phrase should have been "much like how other companies were free to implement FORTRAN". Reply
  • samirsshah - Tuesday, March 01, 2011 - link

    NVIDIA is very strong in mobile but they need to boost their efforts even more, three times more. Yes, PCs are good, but the insights that NVIDIA gets designing for PCs may not always give good results for mobile; 'the law of diminishing returns' comes to the fore as one goes deeper and deeper into PC-based design. A crude analogy is that the insights Intel gets from Core i7 may not always work for Atom. So sometimes you have to say 'I am going to invert the pyramid' and care for mobile first. Reply
  • ChuckMilic - Wednesday, March 02, 2011 - link

    Dig this: UVM + GPU-Direct 2.0 will allow GTX 590's two processors to share the 3 GB memory through 384-bit buses each.

    This explains all the delays and brings the software and hardware releases together. This totally redefines both compute and graphics capabilities. Imagine e.g. SLI without the need to store the entire image twice in two separate memories. Well worth all the wait!
    Reply
  • IanCutress - Wednesday, March 02, 2011 - link

    Best thing is, I've been reading about multi-GPU programming and host-pinned memory this week. Now I can throw it all out the window(s) with UVM.

    CUDA has been a big boon for my normal work - using an OCed 460, I've got a 4000x speed increase over single thread simulations previously used, and I'm able to probe molecular scales without long, drawn out simulation or scaling. The fact that I don't have to use a driver API also helps quite a bit.

    But I'm a Windows developer, and sometimes trying to get it to work in Visual Studio on a fresh OS install is frustrating. I'd like to see some effort towards that of course.

    Ian
    Reply
  • orionmike - Saturday, March 05, 2011 - link

    Well, for me GPU rendering has been great. This release should make it even better.

    If you need the speed then you will buy the hardware to use it.

    See Octane Renderer for the best possible Cuda use.
    Reply
  • woogitboogity - Friday, November 15, 2013 - link

    Not sure how I feel about this... I am having to adapt a Runge-Kutta based simulation for GPU processing right now on CUDA 5.0 on a Kepler Tesla. I have learned so far that someone really needs a background in hardware, x86-assembly and how they relate to PC architecture to program effectively with GPU's in the first place.

    You want to abstract away as much as possible, but treating all threads as being in the same device namespace seems to take it way too far. People who don't know much about this sort of thing could end up constantly making design decisions that eliminate most performance gains... decisions that are not easily reversed. It makes about as much sense as adding the filesystem address space to the RAM. One takes nanoseconds and the other takes milliseconds.

    Well, Nvidia has impressed me so far with the documentation and design of CUDA. cuda-gdb is VERY well done in my opinion, because as its name suggests it tries and succeeds in general to be a gdb + other stuff/commands. I have high hopes that whatever they make will follow the "as simple as possible, but no simpler" axiom (belongs to Einstein, but it definitely applies here).
    Reply