Tesla, CUDA, and the Future

We haven't been super excited about the applicability of CUDA on the desktop. Sure, NVIDIA has made wondrous promises and bold claims, but the applications for end users just aren't there yet (and the ones that exist are rather limited in scope and applicability). But the same has not been true for CUDA in the workstation and HPC markets.

Tesla, NVIDIA's workstation-level GPU computing version of its graphics cards (it has no display output and is spec'd a bit differently), has been around for a while, but we are seeing more momentum in that area lately. As of yesterday, NVIDIA has announced partnerships with Dell, Lenovo, Penguin Computing and others to bring desktop boxes featuring 4-way Tesla action. These 4-Tesla desktop systems, called Tesla Personal Supercomputers, will cost less than $10k US. This is an important number to come in under (says NVIDIA) because this is below the limit for discretionary spending at many major universities. Rather than needing to follow in the footsteps of Harvard, MIT, UIUC, and others who have built their own GPU computing boxes and clusters, universities and businesses can now trust in a reliable computing vendor to deliver and support the required hardware.

We don't have any solid specs on the new boxes yet. Different vendors may do things slightly differently and we aren't sure if NVIDIA is pushing for a heavily standardized box or will give these guys complete flexibility. But regardless of the rest of the box, the Tesla cards themselves are the same cards that have been available since earlier this year.

These personal supercomputers aren't going to end up in homes anytime soon, as they are squarely targeted at workstation and higher-level computing. But that doesn't mean this development won't have an impact on the end user. By targeting universities through the retail support of their new partners in this effort, NVIDIA is making it much more attractive (and possible) for universities to teach GPU computing and massively parallel programming using their hardware. Getting CUDA into the minds of future developers will go a long way, not just for the HPC market, but for every market touched by these future graduates.

It's also much easier for an engineer to sell a PHB on picking up "that new Dell system" rather than a laundry list of expensive components to be built and supported either by IT staff or by the engineer himself. Making inroads into industry (no matter the industry) will start getting parts moving, expose more developers to the CUDA environment, and create demand for more CUDA developers. This will also help gently nudge students and universities towards CUDA, and even if the initial target is HPC research and engineering, increased availability of hardware and programs will attract students who are interested in applying the knowledge to other areas.

It's all about indoctrination really. Having a good product or a good API does nothing without having developers and support. The more people NVIDIA can convince that CUDA is the greatest thing since sliced bread, the closer to the greatest thing since sliced bread CUDA will become (in the lab and on the desktop). Yes, they've still got a long, long way to go, but the announcement of partners in providing Tesla Personal Supercomputer systems is a major development and not something the industry (and especially AMD) should underappreciate.


63 Comments

  • Finally - Friday, November 21, 2008 - link

    I could have taken you seriously if it wasn't for your child-like pronunciation of that green firm's name.

    Do you also write about "Micro$oft"?
  • Paratus - Thursday, November 20, 2008 - link

    I always see both camps complaining about the state of each company's drivers.

    IMHO I'll take AMD's bad drivers every month instead of NV's bad drivers whenever they decide to release them.

    Sorry
  • ggathagan - Thursday, November 20, 2008 - link

    We would like to have seen the performance gains NVIDIA talked about. While we don't doubt that they are in there, it is likely we just didn't look at the right settings or hardware.

    If NVIDIA claims "Up to 38% performance increase in Far Cry 2", they should be able to tell you the exact circumstances where that 38% increase can be seen. If it's reproducible, great. If not, they're lying and should be called on it.

    As for PhysX: I'm all for realizing its potential, but Mirror's Edge strikes me as having PhysX simply for the sake of having PhysX.
    Granted, it's just a trailer, but I wasn't that impressed with the look of the game. It looked as if they spent their time on the PhysX and ignored the character modeling. The arm/body movement looks rather bizarre.
  • Kode - Thursday, November 20, 2008 - link

    Although I agree that some ATI/AMD driver updates aren't that good, the good thing about a monthly release is that when you have a small bug/glitch in a certain game, it can be fixed within a month. If you have the same thing on an NVIDIA card, you don't know when to expect a new driver, and so you are stuck with it until the next driver release unless they release a hotfix or perhaps a beta. But installing hotfixes/betas isn't done often by regular people.
  • Casper42 - Thursday, November 20, 2008 - link

    Title says it all. Driver enhancements and TESLA are great and all, but where are the darn die shrinks?

    I was really hoping nVidia would have their stuff together and have released the GTX 279/290 or whatever they decide to call the 55nm parts when Intel released the i7 processors. When gamers are blowing $1000+ on a new Board/Chip/RAM, what's another $600 for that top of the line nVidia card?

    After all, wasn't the point of allowing SLI on x58 to sell more cards?
  • Casper42 - Thursday, November 20, 2008 - link

    The HPC Market seems to be going more and more toward Blade servers these days as you can cram an awful lot of computer power into a 10U space with hardware from 2 or 3 different vendors.

    I am curious if nVidia is working with HP or Dell or IBM on making a special Blade version of their TESLA cards. The expansion cards in the HP c series are very small which may prohibit TESLA from physically even fitting into the Blade server. BUT, they also have a way of channelling PCI Express lanes into an adjacent blade slot (for instance, to support their "Storage Blade") so if TESLA won't fit inside the blade itself, why not put together a TESLA blade that contains 2/3/4 Cards and connects to the adjacent blade server.

    This would allow you (for instance) to take an HP c7000 chassis and put 8 BL460c Blades with up to 2 Xeon 54xx chips, 64GB of RAM (assuming 8GB DIMMs), and then have 2-4 TESLA cards attached to each, and cram all that into a 10U space. At a minimum that would be 16 Processors, 256GB of RAM (32GB/node) and 16 TESLA Cards.

    You even get your choice of 10Gb Ethernet or Infiniband to connect all the nodes.
  • Spoelie - Thursday, November 20, 2008 - link

    This is the first time I've seen someone complain about AMD's driver mantra.

    AMD provides a constant evolution in their drivers; it's the user's choice to update the driver or not. You can not fault them for providing lots of updates. Their readme is also very clear and concise in what is fixed and what is not.

    The possible sacrifices do not outweigh the advantages IMO. That comment was a bit of a potshot
  • kilkennycat - Thursday, November 20, 2008 - link

    For at least the last 5 years, ATi's drivers have periodically had the spotty reputation that the next update fixes a bunch of problems with the latest games, but then introduces brand-new problems with earlier "legacy" games. Seemed as if they rushed QC, with only a handful of the latest titles. And for an obvious reason.... the burden of a monthly release cycle is no help in enabling thorough QC at all!!! Much better if the official releases were at least 3 months apart, with beta updates for the "brave" to try out. The 'next driver breaks something not previously broken' problem was particularly bad when ATi transitioned their architecture with the introduction of the X1800 series. This ATi legacy problem had got much, much better, but they seem to have slid backwards recently.
  • DerekWilson - Thursday, November 20, 2008 - link

    We have complained about AMD's driver development issues in the past. But we always try and keep it as fair and neutral as possible.

    If all things were equal, I would agree that "you can not fault them for providing lots of updates" ... but that is not what they do.

    NVIDIA regression tests with hundreds of games for every driver release. In fact, comprehensive regression testing was one of the major reasons NVIDIA acquired 3dfx back in the day.

    AMD only regression tests with 25 games. These 25 games change with driver versions so that over time they'll cover many games. The problem is that this doesn't work well. for example ...

    Let's say some x.y driver is regression tested with ... let's pick bioshock. The next month, bioshock falls off the list and x.(y+1) breaks crossfire with bioshock. crossfire isn't as popular as single card performance so there aren't as many users to complain and it will either take them adding bioshock back to their regression test list (which could be never or 6 months or a year), or a large hardware review site will need to go test it and publish an article on how broken it is only to get a hotfix driver 2 days later that fixes the issue.

    that happened by the way. and not only with bioshock. it has happened with other games as well, and most of the time it is an issue that affects crossfire. sometimes it's other bugs, but multi-GPU support is the thing that seems to be at highest risk in our experience.

    this is not an infrequent problem.

    and let's say you find a bug in the recently released 8.11 -- no let's say AMD finds a bug in 8.11 ... It will not be fixed until at least 9.1 as they can't push 8.12 back to include more fixes. until then, if it's a big name title that has a fix, AMD will put out a hotfix. But then you've got to use a non-WHQL version of 8.11 for upwards of two months, even if there are features in 8.12 you want/need.

    We are currently in a situation where we have to stick with an 8.10 + hotfix until 8.12 comes out.

    I am very conservative in my articles about mentioning problems with driver teams. Driver work is tough, and reviewers tend to hit many more problems than the average gamer. We test much more software on a wide variety of hardware and are more prone to running into issues. While the problems do exist for end users, it's always just a subset of users at a time. It has to be that way to some extent no matter what (there will always be tradeoffs made), but AMD's tradeoffs do impact us quite a bit. And I also feel like they cut too many corners and make too many tradeoffs to the point where it negatively impacts too many end users. If we hit more problems with one vendor than another, that is a very relevant bit of information for every consumer. Even if it isn't of the same magnitude it is for us, it's still an issue.

    Thus, I am aware that my view of AMD driver development will be more negative than most users out there. But it does still negatively impact end users in a bigger way than NVIDIA's approach in general (though NVIDIA's execution isn't always spot on either).

    Here's the best way I can put it.

    If you find an AMD driver that works, stick with it. Don't change drivers unless something is broken that got fixed that you need. Upgrading when not necessary will likely break something else that you might find you needed.

    By contrast, I would never recommend against upgrading to an NVIDIA WHQL driver. They are much better about not breaking things that have previously been fixed and are much more hardened by the extensive regression testing. All the fixes that go into one driver (beta or WHQL) will be included in the next beta or WHQL driver, unlike with AMD and their multiple trunk or overlapping branch system or whatever you want to call it.

    There are simply few to no real advantages (other than for marketing purposes) with AMD's driver development approach, so if there are negatives at all they've already outweighed everything else.
  • JonnyDough - Friday, November 21, 2008 - link

    Care to explain to me what happened to Neverwinter Nights 2 and Nvidia then? It doesn't work.
