Final Words

At a high level, the Titan supercomputer delivers an order-of-magnitude increase in performance over the outgoing Jaguar system at roughly the same energy price. Using over 200,000 AMD Opteron cores, Jaguar could deliver roughly 2.3 petaflops of performance at around 7MW of power consumption. Titan approaches 300,000 AMD Opteron cores and adds nearly 19,000 NVIDIA K20 GPUs, delivering over 20 petaflops of performance at "only" 9MW. The question remains: how can it be done again?
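To put the Jaguar and Titan figures side by side, here is a minimal sketch of the performance-per-watt arithmetic. It uses only the approximate numbers quoted above; treating "over 20 petaflops" as exactly 20 is an assumption, and the function name is purely illustrative.

```python
# Approximate figures quoted in the article. "Over 20 petaflops" is
# treated as exactly 20 here, which is an assumption.
JAGUAR_PFLOPS, JAGUAR_MW = 2.3, 7.0
TITAN_PFLOPS, TITAN_MW = 20.0, 9.0


def gflops_per_watt(petaflops: float, megawatts: float) -> float:
    """Convert a petaflops / megawatts pair into gigaflops per watt."""
    flops = petaflops * 1e15
    watts = megawatts * 1e6
    return flops / watts / 1e9


jaguar_eff = gflops_per_watt(JAGUAR_PFLOPS, JAGUAR_MW)
titan_eff = gflops_per_watt(TITAN_PFLOPS, TITAN_MW)

print(f"Jaguar: {jaguar_eff:.2f} GFLOPS/W")
print(f"Titan:  {titan_eff:.2f} GFLOPS/W")
print(f"Performance gain: {TITAN_PFLOPS / JAGUAR_PFLOPS:.1f}x")
print(f"Efficiency gain:  {titan_eff / jaguar_eff:.1f}x")
```

With these inputs Jaguar lands at about 0.33 GFLOPS per watt and Titan at about 2.2, so the efficiency gain is a little under 7x while raw performance rises by a little under 9x.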

In 4 years, Titan will be obsolete and another set of upgrades will be needed to increase performance within the same power envelope. By 2016, ORNL hopes to be able to build a supercomputer capable of 10x the performance of Titan within a similar power envelope. The trick is that the efficiency gain from first adopting GPUs for compute is a one-time benefit; you don't get it a second time. ORNL will have to rely on process node shrinks and improvements in architectural efficiency, on both the CPU and GPU fronts, to deliver the next 10x performance increase. Over the next few years we'll see more integration between the CPU and GPU with an on-die communication fabric. The march towards integration will help improve usable performance in supercomputers just as it will in client machines.

Increasing performance by 10x in 4 years doesn't seem so far-fetched, but breaking the 1 exaflop barrier by 2020 to 2022 will require something much more exotic. One possibility is to move from big beefy x86 CPU cores to billions of simpler cores. Given ORNL's close relationship with NVIDIA, it's likely that the smartphone core approach is being advocated internally. Everyone involved has a different definition of what a simple core is (by 2020 Haswell will look pretty darn simple), but it's clear that whatever comes after Titan's replacement won't just look like a bigger, faster Titan. There will have to be more fundamental shifts in order to increase performance by 2 orders of magnitude over the next decade. Luckily there are many research projects that have yet to come to fruition. Die stacking and silicon photonics both come to mind, although we'll need more than just those.
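As a rough sketch of the scaling targets above, the snippet below works out the compound annual growth each goal implies. It assumes a 2012 baseline of 20 petaflops (the year is taken from the comment dates on this page, not stated in the text), and the helper name is hypothetical.

```python
import math


def annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate needed to scale by `factor` in `years`."""
    return factor ** (1.0 / years) - 1.0


BASELINE_YEAR = 2012        # assumption: inferred from the comment dates
TITAN_PFLOPS = 20.0         # "over 20 petaflops", rounded down
EXAFLOP_PFLOPS = 1000.0     # 1 exaflop expressed in petaflops

# Titan's successor: 10x in 4 years.
next_gen = annual_growth(10.0, 4)

# Exascale: scaling from Titan to 1 exaflop by 2020 or 2022.
exa_factor = EXAFLOP_PFLOPS / TITAN_PFLOPS
exa_2020 = annual_growth(exa_factor, 2020 - BASELINE_YEAR)
exa_2022 = annual_growth(exa_factor, 2022 - BASELINE_YEAR)

print(f"10x in 4 years: {next_gen:.1%} per year")
print(f"Exaflop factor: {exa_factor:.0f}x "
      f"({math.log10(exa_factor):.2f} orders of magnitude)")
print(f"Exaflop by 2020: {exa_2020:.1%} per year")
print(f"Exaflop by 2022: {exa_2022:.1%} per year")
```

These rates cover raw throughput only. They say nothing about the power budget, which is the constraint this section keeps returning to.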

It's incredible to think that the most recent increase in supercomputer performance has its roots in PC gaming. These multi-billion transistor GPUs first came about to improve performance and visual fidelity in 3D games. The first consumer GPUs were built to better simulate reality so we could have more realistic games. It's not too surprising then to think that in the research space the same demands apply, although in pursuit of a different goal: to create realistic models of the world and universe around us. It's honestly one of the best uses of compute that I've ever seen.

130 Comments

  • Krysto - Wednesday, October 31, 2012

    It's 46 million GPU cores:

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    This is embarrassing.
  • iMacmatician - Wednesday, October 31, 2012

    No, it's not. BSN's numbers are incorrect.
  • MetalManTN - Wednesday, October 31, 2012

    I've read articles on AnandTech for years, but I registered an account for the first time today to comment on how wonderful this article is. The scope of what is covered in the article is nothing short of fascinating, and the quality of the writing and attention to detail are superb. Thank you!
  • Creig - Wednesday, October 31, 2012

    and we already know the answer is 42.
  • spaghetti_taco - Wednesday, October 31, 2012

    Very interesting article. I loved the 30,000-foot explanation of the supernova modeling; it really helped me understand in more concrete detail what types of things scientists are using these supercomputers for.

    One thing I'd love to see is a more in-depth discussion of the networking. As you pointed out, the networking connectivity is just as important as the data processing, but you really just glossed over it. Even something as simple as the vendor, models, host bus adapters, etc. would help.
  • WaitingForNehalem - Wednesday, October 31, 2012

    Anand, you should have visited me at the University of Tennessee!
  • zero2dash - Wednesday, October 31, 2012

    I want to work there. :D
    Holy Santa Claus shit. I think I'd need a motorized cart for the tour; I'd be too weak to walk.
    [drool]

    Great article and major kudos for all the photos...a few of these are gonna be desktop wallpapers. ;)
  • BSMonitor - Wednesday, October 31, 2012

    If they used Sandy Bridge Xeons, that'd be about 4 megawatts and no giant pipes with coolant!!
  • Braincruser - Wednesday, October 31, 2012

    And new motherboards, memory systems, optimisations... practically, they would have to drop the GPUs to fit it within any realistic budget.
  • JMC2000 - Wednesday, October 31, 2012

    Problem is, there more than likely was a queue of clients that couldn't wait for ORNL/Cray to completely replace every node, which would have taken much longer.
