Final Words

At a high level, the Titan supercomputer delivers an order of magnitude increase in performance over the outgoing Jaguar system at roughly the same energy price. Using over 200,000 AMD Opteron cores, Jaguar could deliver roughly 2.3 petaflops of performance at around 7MW of power consumption. Titan approaches 300,000 AMD Opteron cores but adds nearly 19,000 NVIDIA K20 GPUs, delivering over 20 petaflops of performance at "only" 9MW. The question remains: how can it be done again?
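To put the efficiency claim in concrete terms, here is a minimal back-of-the-envelope sketch that uses only the rounded figures quoted above (2.3 petaflops at 7MW for Jaguar, 20 petaflops at 9MW for Titan). The names `SYSTEMS` and `gflops_per_watt` are illustrative, and the inputs are the article's approximations rather than measured benchmark results.

```python
# Approximate figures quoted in the article; these are rounded, not measured values.
SYSTEMS = {
    "Jaguar": {"pflops": 2.3, "megawatts": 7.0},
    "Titan": {"pflops": 20.0, "megawatts": 9.0},
}


def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Convert a PFLOPS rating and a MW power draw into GFLOPS per watt."""
    gflops = pflops * 1e6    # 1 PFLOPS = 1,000,000 GFLOPS
    watts = megawatts * 1e6  # 1 MW = 1,000,000 W
    return gflops / watts


efficiency = {name: gflops_per_watt(**spec) for name, spec in SYSTEMS.items()}

# Ratios of Titan to Jaguar for raw performance, power draw, and efficiency.
performance_gain = SYSTEMS["Titan"]["pflops"] / SYSTEMS["Jaguar"]["pflops"]
power_gain = SYSTEMS["Titan"]["megawatts"] / SYSTEMS["Jaguar"]["megawatts"]
efficiency_gain = efficiency["Titan"] / efficiency["Jaguar"]

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} GFLOPS/W")
print(f"Performance: {performance_gain:.1f}x, power: {power_gain:.2f}x, "
      f"efficiency: {efficiency_gain:.1f}x")
```

With these rounded inputs, Titan comes out at a little under 9x the performance for about 1.3x the power, or a bit under 7x the performance per watt.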

In four years, Titan will be obsolete and another round of upgrades will be needed to increase performance within the same power envelope. By 2016, ORNL hopes to be able to build a supercomputer capable of 10x the performance of Titan within a similar power envelope. The catch is that the one-time efficiency gain from first adopting GPUs for compute can't be had a second time. ORNL will have to rely on process node shrinks and improvements in architectural efficiency, on both CPU and GPU fronts, to deliver the next 10x performance increase. Over the next few years we'll see more integration between the CPU and GPU with an on-die communication fabric. The march towards integration will help improve usable performance in supercomputers just as it will in client machines.

Increasing performance by 10x in four years doesn't seem so far-fetched, but breaking the 1 exaflop barrier by 2020 to 2022 will require something much more exotic. One possibility is to move from big beefy x86 CPU cores to billions of simpler cores. Given ORNL's close relationship with NVIDIA, it's likely that the smartphone core approach is being advocated internally. Everyone involved has a different definition of what counts as a simple core (by 2020 Haswell will look pretty darn simple), but it's clear that whatever comes after Titan's replacement won't just look like a bigger, faster Titan. There will have to be more fundamental shifts in order to increase performance by two orders of magnitude over the next decade. Luckily there are many research projects that have yet to come to fruition. Die stacking and silicon photonics both come to mind, even though we'll need more than just those.

It's incredible to think that the most recent increase in supercomputer performance has its roots in PC gaming. These multi-billion transistor GPUs first came about to improve performance and visual fidelity in 3D games. The first consumer GPUs were built to better simulate reality so we could have more realistic games. It's not too surprising then to think that in the research space the same demands apply, although in pursuit of a different goal: to create realistic models of the world and universe around us. It's honestly one of the best uses of compute that I've ever seen.




  • just4U - Wednesday, October 31, 2012

    "The evolution of Cray's XT/XK lines simply stemmed from that point, with Opteron being the supported CPU of choice."


    I would have liked more of an explanation here. Does that mean that Intel's line doesn't work as well? Are there plans by Cray to move to Intel?

    Power draw must be key. I wonder what sort of power use they'd be looking at running Intel's processors.

    Great to see AMD in that super computer though.. I just have questions about future plans based on the current situation in the cpu market.
  • Th-z - Wednesday, October 31, 2012

    Very nice article and love your last paragraph, Anand. It's a revelation. It is indeed incredible to think that when we wanted that 3D accelerator to play GLQuake, it actually turned the wheel for great things to come. To think back, something as ordinary or insignificant as gaming actually paved the way to accelerate our knowledge today. This goes to show even ordinary things can morph into great things that one can never imagine. It really humbles you to not look down on anything, to be respectful in this intertwined world, the same way it humbles us as human beings as we know more about the universe.
  • pman6 - Wednesday, October 31, 2012

    so that's where all of AMD's revenue came from.

    I was wondering who was buying AMD products
  • CeriseCogburn - Saturday, November 10, 2012

    What AMD revenue?

    Just look up and down, left and right here, the AMD fanboys are legion - granted they can barely bone up 10 cents a week, but after a few years they can buy 2 generations back.
  • lorribot - Wednesday, October 31, 2012

    Wonder if PC game piracy will be blamed for the failure of the supercomputer industry?
  • Braincruser - Saturday, November 03, 2012

    Well, you see the more someone pirates games, the more money he has to invest in hardware. So the better the hardware gets. <- nothing beats simple logic.
  • ClagMaster - Wednesday, October 31, 2012

    I have been working with supercomputers for 25 years.

    Although parallelism is very important for processing large models, there is one important feature Mr Anand failed to discuss about Titan, choosing instead to obsess over transistor count, CPUs and GPUs.

    And that is how much memory per box is available. 96GB? 256GB? of DDR3-1333 memory?

    The problem is usually memory for those large reactor or coupled neutron-gamma transport problems analyzed with Monte Carlo or Advanced Discrete Ordinates, not the number of processors. You need lots of memory for the geometry, depletable materials, and cross-section data.

    And once the computing is done, how much space is available for storing the results? I have seen models so large that they run for 2 weeks with over 2000 processors only to fail because the file storage system ran out of space to store the output files.
  • garadante - Wednesday, October 31, 2012

    You failed to read the entire article. Anand stated there was something like 32 GB of RAM per CPU and 6 GB per GPU (if I remember correctly, going off the top of my head) for a grand total of 710 TB of RAM, as well as 1 PB of HDD storage available. Check back through the pages to find what exactly he posted.
  • chemist1 - Wednesday, October 31, 2012

    So Sandy Bridge does ~160 GFlops on the LINPACK benchmark, while Titan should do ~20 PFlops, making it 125K times faster. 125K ~ 2^17, so with 17 doublings a PC will be as fast as Titan. If we assume 1.5 years/doubling, that gives us 25 years. And just imagine the capabilities of a 2037 supercomputer....
  • pandemonium - Wednesday, October 31, 2012

    What a treat, for you, to be able to witness this. Thanks for the adventurous article, Anand! :)
