Final Words

At a high level, the Titan supercomputer delivers an order of magnitude increase in performance over the outgoing Jaguar system at roughly the same energy price. Using over 200,000 AMD Opteron cores, Jaguar could deliver roughly 2.3 petaflops of performance at around 7MW of power consumption. Titan approaches 300,000 AMD Opteron cores but adds nearly 19,000 NVIDIA K20 GPUs, delivering over 20 petaflops of performance at "only" 9MW. The question remains: how can it be done again?
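
To put the gap in perspective, here's a quick back-of-the-envelope sketch (using only the rounded petaflop and megawatt figures quoted above, not official Linpack or facility numbers) of how the performance-per-watt picture changes when the GPUs enter the mix:

```python
# Back-of-the-envelope efficiency comparison using the rounded figures quoted
# above (approximate peak petaflops and power draw, not official submissions).
systems = {
    "Jaguar (CPU only)": {"pflops": 2.3, "megawatts": 7.0},
    "Titan (CPU + GPU)": {"pflops": 20.0, "megawatts": 9.0},
}

for name, s in systems.items():
    # 1 petaflop = 1e6 gigaflops and 1 megawatt = 1e6 watts, so the scale
    # factors cancel and GFLOPS/W is simply petaflops divided by megawatts.
    gflops_per_watt = s["pflops"] / s["megawatts"]
    print(f"{name}: ~{gflops_per_watt:.2f} GFLOPS/W")

# Roughly 0.33 GFLOPS/W for Jaguar vs. roughly 2.2 GFLOPS/W for Titan,
# i.e. close to a 7x gain in efficiency for about 30% more power.
```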

In four years Titan will be obsolete, and another round of upgrades will be needed to increase performance within the same power envelope. By 2016, ORNL hopes to build a supercomputer capable of 10x Titan's performance at similar power. The trick is that the efficiency jump from first adopting GPUs for compute is a one-time gain; it can't be repeated. ORNL will have to rely on process node shrinks and improvements in architectural efficiency, on both the CPU and GPU fronts, to deliver the next 10x increase in performance. Over the next few years we'll see tighter integration between the CPU and GPU, including an on-die communication fabric. That march toward integration will help improve usable performance in supercomputers just as it will in client machines.

Increasing performance by 10x in four years doesn't seem so far-fetched, but breaking the 1 exaflop barrier by 2020 - 2022 will require something far more exotic. One possibility is to move from big, beefy x86 CPU cores to billions of simpler cores. Given ORNL's close relationship with NVIDIA, it's likely that the smartphone-core approach is being advocated internally. Everyone involved has a different definition of what constitutes a simple core (by 2020, Haswell will look pretty darn simple), but it's clear that whatever comes after Titan's replacement won't just look like a bigger, faster Titan. There will have to be more fundamental shifts to increase performance by two orders of magnitude over the next decade. Luckily there are many research projects that have yet to come to fruition; die stacking and silicon photonics both come to mind, even though we'll need more than just those.
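
To make the scale of that jump concrete, a minimal bit of arithmetic (assuming Titan's roughly 20 PF and the round targets mentioned above) shows the multipliers involved:

```python
# Rough scaling math using the round targets mentioned above: ~20 PF today,
# a hoped-for 10x system around 2016, and an exaflop (1,000 PF) by 2020-2022.
titan_pf = 20.0
target_2016_pf = 10 * titan_pf   # ~200 PF
exaflop_pf = 1000.0

print(f"Titan to exaflop: ~{exaflop_pf / titan_pf:.0f}x")              # ~50x
print(f"2016 system to exaflop: ~{exaflop_pf / target_2016_pf:.0f}x")  # ~5x
```

In other words, an exaflop is roughly a 50x jump from Titan, and still about 5x beyond the hoped-for 2016 system.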

It's incredible to think that the most recent jump in supercomputer performance has its roots in PC gaming. These multi-billion-transistor GPUs first came about to improve performance and visual fidelity in 3D games; the first consumer GPUs were built to better simulate reality so we could have more realistic games. It's not too surprising, then, that the same demand exists in the research space, albeit in pursuit of a different goal: creating realistic models of the world and universe around us. It's honestly one of the best uses of compute I've ever seen.

Comments

  • martixy - Thursday, November 1, 2012 - link

    Thank you for this article! It was absolutely awesome to read through it and a nice break from the usual consumer stuff.
    Faith in humanity restored... :)
  • bigboxes - Thursday, November 1, 2012 - link

    I want to see the Performance tab on Windows Task Manager! :o
  • Abi Dalzim - Thursday, November 1, 2012 - link

    We all know the answer is 42.
  • easp - Thursday, November 1, 2012 - link

    For all the people speculating or suggesting that they should have used AMD GPUs or Intel CPUs, I think you need to think more like engineers, and less like "cowboys."

    To get started, reread this:

    "By adding support for ECC, enabling C++ and easier Visual Studio integration, NVIDIA believes that Fermi will open its Tesla business up to a group of clients that would previously not so much as speak to NVIDIA. ECC is the killer feature there."

    Now, why on earth would ECC memory on a GPU (which, apparently, AMD wasn't offering) be important? The answer is simple: because a supercomputer that doesn't produce trustworthy results is worse than useless. Shaving some money off the power and cooling budget, or even a 50% boost to raw performance and/or price performance doesn't really matter if the results of calculations that take weeks or months to run can't be trusted.

    Since this machine gets much of its compute performance from GPU operations, it is essential that it use GPUs that support ECC memory to allow both detection and recovery from memory corruption.

    As to the CPUs, I'm not suggesting that Intel CPUs are significantly less computationally sound than AMD's, but Cray and ORNL already have extensive experience with AMD's CPUs and supporting hardware. Switching to Intel would almost certainly require additional validation work.

    And don't underestimate the effort that goes into validating or optimizing these systems. Street price on the raw components alone has to be tens of millions of dollars. You can bet there is a lot of time and effort spent making sure things work right before things make it to full-scale production use.

    I know a guy with a PhD in mathematics who used to work for Cray. These days he's working for Boeing, where his full-time job, as best as I can understand it, is to make sure that some CFD code they run from NASA is used properly so the results can be trusted. When he worked at Cray, his job was much more technical: he hand-optimized the assembly code for critical portions of application code from Cray's clients so it ran optimally on their vector CPU architecture. When doing computation at this scale, things that are completely insignificant on individual consumer systems, or even enterprise servers, can be hugely important.
  • CeriseCogburn - Monday, November 5, 2012 - link

    I note that with 225,000-plus AMD CPUs, they get barely over 2 petaflops.

    Add just 18,000 plus nVidia video cards, and ACHIEVE 20+ PETAFLOPS.

    LOL - once again, amd sucks, and nVidia does not.
  • Azethoth - Friday, November 2, 2012 - link

    So you are sitting at home playing monopoly on your iMac?
  • 2kfire - Friday, November 2, 2012 - link

    Can someone ban this joker?
  • Daggarhawk - Friday, November 2, 2012 - link

    Anand I LOVE this post. Breath of fresh air to get to see some of the real world applications for all this awesome tech we love. The interviews with scientists are especially fascinating and eye opening. Love the use of video to hear the insights, affect and passion of the researchers and see them at work. Please more of this sort of thing!!
  • armandc001-tech lover - Saturday, November 3, 2012 - link

    dammm what an article....!
  • philosofa - Saturday, November 3, 2012 - link

    Thank you Anand!

    I've been noting till I'm blue in the face that GK-110 formed Nvidia's backup plan, should the GCN/Kepler power ratio not have worked out as much to AMD's disadvantage as it did (presumably 'Big Fermi' was a similar action plan being enacted).

    It's not something I've seen anyone else say explicitly, so it's (confirmation bias aside) just lovely to hear that's your take too :)
