Final Words

At a high level, the Titan supercomputer delivers an order of magnitude increase in performance over the outgoing Jaguar system within roughly the same power budget. Using over 200,000 AMD Opteron cores, Jaguar could deliver roughly 2.3 petaflops of performance at around 7MW of power consumption. Titan approaches 300,000 AMD Opteron cores but adds nearly 19,000 NVIDIA K20 GPUs, delivering over 20 petaflops of performance at "only" 9MW. The question remains: how can it be done again?
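To make the efficiency jump concrete, the quoted figures can be reduced to performance per watt. This is a back-of-the-envelope sketch using the rounded numbers above, not official Green500 measurements:

```python
# Rough performance-per-watt comparison from the figures quoted above:
# Jaguar ~2.3 PF at ~7 MW, Titan ~20 PF at ~9 MW (rounded public numbers).

def gflops_per_watt(petaflops, megawatts):
    """Convert system-level petaflops and megawatts to GFLOPS/W."""
    # 1 PF = 1e6 GF; 1 MW = 1e6 W
    return (petaflops * 1e6) / (megawatts * 1e6)

jaguar = gflops_per_watt(2.3, 7)   # ~0.33 GFLOPS/W
titan = gflops_per_watt(20, 9)     # ~2.2 GFLOPS/W

print(f"Jaguar: {jaguar:.2f} GFLOPS/W")
print(f"Titan:  {titan:.2f} GFLOPS/W")
print(f"Efficiency gain: {titan / jaguar:.1f}x")
```

Roughly a 6.8x improvement in flops per watt, most of it attributable to moving the bulk of the compute onto GPUs.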

In 4 years, Titan will be obsolete and another round of upgrades will be needed to increase performance in the same power envelope. By 2016, ORNL hopes to build a supercomputer capable of 10x Titan's performance within a similar power envelope. The trick is, the one-time efficiency gain from first adopting GPUs for compute can't be repeated. ORNL will have to rely on process node shrinks and improvements in architectural efficiency, on both the CPU and GPU fronts, to deliver the next 10x performance increase. Over the next few years we'll see tighter integration between the CPU and GPU, with an on-die communication fabric. The march toward integration will help improve usable performance in supercomputers just as it will in client machines.

Increasing performance by 10x in 4 years doesn't seem so far-fetched, but breaking the 1 Exaflop barrier by 2020 - 2022 will require something much more exotic. One possibility is to move from big, beefy x86 CPU cores to billions of simpler cores. Given ORNL's close relationship with NVIDIA, the smartphone-core approach is likely being advocated internally. Everyone involved has a different definition of what constitutes a simple core (by 2020, Haswell will look pretty darn simple), but it's clear that whatever comes after Titan's replacement won't just be a bigger, faster Titan. More fundamental shifts will be needed to increase performance by two orders of magnitude over the next decade. Luckily, there are many research projects that have yet to come to fruition; die stacking and silicon photonics both come to mind, even though we'll need more than just those.
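The same arithmetic shows just how large the exascale efficiency jump would have to be. Note that the 20MW power budget below is an assumption (a commonly cited planning target for exascale systems), not a figure from this article:

```python
# Rough math on the exascale gap: 1 EFLOPS in a hypothetical 20 MW
# envelope, versus Titan's ~2.2 GFLOPS/W from the figures above.
# The 20 MW budget is an assumed planning target, not an ORNL number.

EXAFLOP_GF = 1e9              # 1 EFLOPS expressed in GFLOPS
BUDGET_W = 20e6               # assumed 20 MW power budget
titan_gf_per_w = 20e6 / 9e6   # ~2.2 GFLOPS/W

required = EXAFLOP_GF / BUDGET_W  # GFLOPS/W needed for 1 EF at 20 MW
print(f"Required: {required:.0f} GFLOPS/W")
print(f"Gap vs Titan: ~{required / titan_gf_per_w:.1f}x")
```

At roughly 50 GFLOPS/W required, that's a further ~22x efficiency improvement over Titan, which is why incremental node shrinks alone won't get there.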

It's incredible to think that the most recent leap in supercomputer performance has its roots in PC gaming. These multi-billion-transistor GPUs were first built to better simulate reality, improving performance and visual fidelity in 3D games. It's not too surprising, then, that the same demands apply in the research space, albeit in pursuit of a different goal: creating realistic models of the world and universe around us. It's honestly one of the best uses of compute I've ever seen.

130 Comments


  • Death666Angel - Thursday, November 08, 2012 - link

    When they built this, the Intel stuff wasn't better than AMDs. And now they already have all this hardware which is tuned for the AMD stuff. Wouldn't make sense to update to Intel this time.
  • CeriseCogburn - Saturday, November 10, 2012 - link

    Wouldn't make sense only because nVidia is smoking the daylights out of the barely over 2Pflops from crap AMD cpus, adding multiple DOZENS of Pflops.

    So neither Intel nor AMD can hang.
  • wwwcd - Wednesday, October 31, 2012 - link

    Moore's law will be broken...at basement and the above ;)
  • tomek1984 - Wednesday, October 31, 2012 - link

    Thus even four years since the release of the original Crysis, “but can it run Crysis?” is still an important question, and the answer is finally "yes, it can" LOL
  • davegraham - Wednesday, October 31, 2012 - link

    Anand,

    you missed a huge data item in your article. by saying it's "just a bunch of SATA drives" you completely glossed over the WAY those SATA drives are organized (by DDN). DDN uses a wide/shallow bus topology to keep parallel writes going to the drives organized and processed in a VERY optimal manner. consequently, they're able to ingest at over 6GB/s per head...now, multiply that across the requirements from ORNL and you can see why this becomes important.

    next time, don't just skip over it. ;)

    D
  • webmastir - Wednesday, October 31, 2012 - link

    9 million $ a year to power it? Yikes.

    Either way, I'm glad to have this in my state :)
  • mikato - Wednesday, October 31, 2012 - link

    Lots of hydroelectric in Tennessee :)
  • bill.rookard - Wednesday, October 31, 2012 - link

    I just can't imagine trying to build a program that scales up to that kind of performance, it's just staggering.

    That being said, I have this little program I'd like to run on it... called... SkyNet....
  • mfenn - Wednesday, October 31, 2012 - link

    I want more coverage of big iron! Hope you talk about it in depth on the podcast as well.
  • harezzebra - Wednesday, October 31, 2012 - link

    Hi Anand,

    please do a indepth virtualization review, as you did earlier. your review is must for latest virtualization offerings from vmware, microsoft and citrix for unbiased decision making.

    regards
    harsh
