Final Words

At a high level, the Titan supercomputer delivers an order of magnitude increase in performance over the outgoing Jaguar system at roughly the same energy price. Using over 200,000 AMD Opteron cores, Jaguar could deliver roughly 2.3 petaflops of performance at around 7MW of power consumption. Titan approaches 300,000 AMD Opteron cores but adds nearly 19,000 NVIDIA K20 GPUs, delivering over 20 petaflops of performance at "only" 9MW. The question remains: how can it be done again?
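To put rough numbers on that efficiency jump, here is a minimal Python sketch that uses only the figures quoted above (2.3 petaflops at 7MW for Jaguar, 20 petaflops at 9MW for Titan). The function and variable names are illustrative assumptions. Because Titan is quoted at "over" 20 petaflops, treat the results as back-of-the-envelope estimates rather than measured values.

```python
# Back-of-the-envelope efficiency comparison using the article's quoted figures.
# All names here are hypothetical; the inputs are the rounded numbers from the text.

def gflops_per_watt(petaflops: float, megawatts: float) -> float:
    """Convert a (petaflops, megawatts) pair into gigaflops per watt."""
    gigaflops = petaflops * 1e6   # 1 petaflop = 1,000,000 gigaflops
    watts = megawatts * 1e6       # 1 megawatt = 1,000,000 watts
    return gigaflops / watts

jaguar_eff = gflops_per_watt(2.3, 7.0)    # Jaguar: ~2.3 PF at ~7MW
titan_eff = gflops_per_watt(20.0, 9.0)    # Titan: "over" 20 PF at ~9MW

performance_gain = 20.0 / 2.3             # raw throughput ratio
power_increase = 9.0 / 7.0                # power draw ratio
efficiency_gain = titan_eff / jaguar_eff  # performance-per-watt ratio

print(f"Jaguar: {jaguar_eff:.2f} GFLOPS/W")
print(f"Titan:  {titan_eff:.2f} GFLOPS/W")
print(f"Performance: {performance_gain:.1f}x, power: {power_increase:.2f}x, "
      f"efficiency: {efficiency_gain:.1f}x")
```

On these rounded inputs, Jaguar works out to about a third of a gigaflop per watt and Titan to a little over 2 gigaflops per watt, so most of the performance gain comes from efficiency rather than from a larger power budget.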

In four years Titan will be obsolete, and another round of upgrades will be needed to increase performance within the same power envelope. By 2016 ORNL hopes to build a supercomputer capable of 10x the performance of Titan within a similar power envelope. The catch is that the one-time efficiency gain from first adopting GPUs for compute can't be had a second time. ORNL will have to rely on process node shrinks and improvements in architectural efficiency, on both the CPU and GPU fronts, to deliver the next 10x performance increase. Over the next few years we'll see tighter integration between the CPU and GPU, with an on-die communication fabric. The march towards integration will help improve usable performance in supercomputers just as it will in client machines.

Increasing performance by 10x in 4 years doesn't seem so far-fetched, but breaking the 1 exaflop barrier by 2020 - 2022 will require something much more exotic. One possibility is to move from big beefy x86 CPU cores to billions of simpler cores. Given ORNL's close relationship with NVIDIA, it's likely that the smartphone core approach is being advocated internally. Everyone involved has a different definition of what counts as a simple core (by 2020 Haswell will look pretty darn simple), but it's clear that whatever comes after Titan's replacement won't just look like a bigger, faster Titan. There will have to be more fundamental shifts in order to increase performance by 2 orders of magnitude over the next decade. Luckily there are many research projects that have yet to come to fruition. Die stacking and silicon photonics both come to mind, although we'll need more than just those.

It's incredible to think that the most recent increase in supercomputer performance has its roots in PC gaming. These multi-billion transistor GPUs first came about to improve performance and visual fidelity in 3D games. The first consumer GPUs were built to better simulate reality so we could have more realistic games. It's not too surprising then to think that in the research space the same demands apply, although in pursuit of a different goal: to create realistic models of the world and universe around us. It's honestly one of the best uses of compute that I've ever seen.


  • mdlam - Wednesday, October 31, 2012

    Will it run Crysis?
  • tspacie - Wednesday, October 31, 2012

    Did you get any information about the network (yarc-2 , gemini) ? Cray's claim to fame has been their network architecture which is supposed to be a key contributor to the actual performance of the supercomputer.
  • thebluephoenix - Wednesday, October 31, 2012

    They should have used Radeons 7970. You can buy 6 for the price of one K20, no ECC though (and for that is Fire Pro S).
  • HighTech4US - Wednesday, October 31, 2012

    Toy GPUs have no place in HPC Computers.
  • thebluephoenix - Wednesday, October 31, 2012

    1TFLOPS Double precision Toy?
  • garadante - Wednesday, October 31, 2012

    You missed the point in the article saying ECC memory was a -must- for a usage scenario like this. With nearly 20,000 GPUs, and all of that information being continuously communicated between the GPU memory and the GPU itself, without ECC, errors would pop up very quickly, and would make useful computation nigh impossible.
  • HighTech4US - Thursday, November 01, 2012

    Can you guarantee that the Toy GPU you recommend would not produce a single error on a software run that takes 6 months?

    You may accept an occasional graphics glitch while gaming but no HPC customer will.
  • RussianSensation - Wednesday, October 31, 2012

    It's also about the specific software that works better with CUDA. GCN GPUs are no toys but the software support is nowhere near as prevalent in the professional GPGPU space compared to what NV has accomplished. This makes a lot of sense since NV essentially invented the GPGPU space starting with G80 in 2006. They spent a lot more money creating the CUDA eco-system and making sure they were the pioneers in this space. Given the higher widespread adoption of CUDA and proven track record of working with NV, larger companies are far more likely to go with Nvidia.

    This is actually no different than what we saw in the Distributed Computing space. For more than half a decade, NV's GPUs were faster in many apps. As the DC community is more dynamic and adapts much more quickly to modern code and technologies, in the last 3 years, almost all of the new DC projects are dominated by AMD GPUs.

    On paper, the HD7970 GE delivers 1.075 TFlops of DP and a 1200MHz 7970 has 1.23 TFlops. Without software support, for now it doesn't mean much in the professional space but the horsepower is already there.
  • mikato - Wednesday, October 31, 2012

    Is this the supercomputer that will also be crunching away on the massive amount of data NSA is storing on everyone from strategic points in the telecom backbone?
  • Luscious - Wednesday, October 31, 2012

    I'm curious if they ever went near F@H during burn-in and testing to see how much PPD that supercomputer could do.
