Final Words

At a high level, the Titan supercomputer delivers an order of magnitude increase in performance over the outgoing Jaguar system at roughly the same power budget. Using over 200,000 AMD Opteron cores, Jaguar delivered roughly 2.3 petaflops at around 7MW of power consumption. Titan uses nearly 300,000 AMD Opteron cores but adds nearly 19,000 NVIDIA K20 GPUs, delivering over 20 petaflops at "only" 9MW. The question remains: how can it be done again?
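Before asking that, it's worth noting where the current jump comes from: mostly performance per watt rather than added power. Here's a rough back-of-the-envelope sketch (in Python) using only the round figures quoted above, not exact system specs:

```python
# Back-of-the-envelope performance-per-watt comparison.
# Inputs are the approximate figures quoted in the article, not exact specs.

jaguar_pflops, jaguar_mw = 2.3, 7.0
titan_pflops, titan_mw = 20.0, 9.0

# Convert petaflops to gigaflops (x1e6) and megawatts to watts (x1e6).
jaguar_gflops_per_w = (jaguar_pflops * 1e6) / (jaguar_mw * 1e6)
titan_gflops_per_w = (titan_pflops * 1e6) / (titan_mw * 1e6)

print(f"Jaguar: ~{jaguar_gflops_per_w:.2f} GFLOPS/W")                        # ~0.33
print(f"Titan:  ~{titan_gflops_per_w:.2f} GFLOPS/W")                         # ~2.22
print(f"Efficiency gain: ~{titan_gflops_per_w / jaguar_gflops_per_w:.1f}x")  # ~6.8x
```

In other words, roughly a 7x improvement in performance per watt, with the rest of the overall gain coming from the modestly higher 9MW power draw.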

In four years Titan will be obsolete, and another round of upgrades will be needed to increase performance within the same power envelope. By 2016 ORNL hopes to be able to build a supercomputer capable of 10x the performance of Titan within a similar power budget. The trick is that the one-time efficiency gain from first adopting GPUs for compute can't be repeated. ORNL will have to rely on process node shrinks and improvements in architectural efficiency, on both the CPU and GPU fronts, to deliver the next 10x increase in performance. Over the next few years we'll see more integration between the CPU and GPU, with an on-die communication fabric between the two. The march towards integration will help improve usable performance in supercomputers just as it will in client machines.

Increasing performance by 10x in four years doesn't seem so far-fetched, but breaking the 1 exaflop barrier by 2020 - 2022 will require something much more exotic. One possibility is to move from big, beefy x86 CPU cores to billions of simpler cores. Given ORNL's close relationship with NVIDIA, it's likely that the smartphone-core approach is being advocated internally. Everyone involved has a different definition of what constitutes a simple core (by 2020 Haswell will look pretty darn simple), but it's clear that whatever comes after Titan's replacement won't just be a bigger, faster Titan. There will have to be more fundamental shifts in order to increase performance by two orders of magnitude over the next decade. Luckily there are many research projects that have yet to come to fruition; die stacking and silicon photonics both come to mind, even though we'll need more than just those.

It's incredible to think that the most recent increase in supercomputer performance has its roots in PC gaming. These multi-billion transistor GPUs first came about to improve performance and visual fidelity in 3D games; the first consumer GPUs were built to better simulate reality so we could have more realistic games. It's not too surprising, then, that the same demands apply in the research space, albeit in pursuit of a different goal: creating realistic models of the world and universe around us. It's honestly one of the best uses of compute I've ever seen.

130 Comments

  • Ryan Smith - Wednesday, October 31, 2012 - link

    We have other reasons to back our numbers, though I can't get into them. Suffice it to say, if we didn't have 100% confidence we would not have used it.
  • RussianSensation - Wednesday, October 31, 2012 - link

    Hey Ryan, what about this?

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    The Jaguar is thus renamed into Titan, and the sheer numbers are quite impressive:
    46,645,248 CUDA Cores (yes, that's 46 million)
    299,008 x86 cores
    91.25 TB ECC GDDR5 memory
    584 TB Registered ECC DDR3 memory
    Each x86 core has 2GB of memory

    1 node in the new Cray XK7 system consists of a 16-core AMD Opteron CPU and one Nvidia Tesla K20 compute card.

    The Titan supercomputer has 18,688 nodes.

    46,645,248 CUDA Cores / 18,688 Nodes = 2,496 CUDA cores per 1 Tesla K20 card.
  • Ryan Smith - Thursday, November 1, 2012 - link

    Among other things: note that Titan has 6GB of memory per K20 (and this is published information).

    http://nvidianews.nvidia.com/Releases/NVIDIA-Power...

    "The upgrade includes the Tesla K20 GPU accelerators, a replacement of the compute modules to convert the system’s 200 cabinets to a Cray XK7 supercomputer, and 710 terabytes of memory."

    18,688 nodes, each with 32GB of RAM + 6GB of VRAM = 710,144 GB

    (Press agencies tend to use powers of 10, hence "710" TB; a quick check of the arithmetic follows below.)
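A quick sanity check of the arithmetic in the comment above, as a rough sketch (the per-node figures are the 32GB of DDR3 and 6GB of GDDR5 cited in this thread; the decimal/binary distinction is the point of the parenthetical):

```python
# Verify the "710 TB" total memory figure from the per-node numbers cited above.
nodes = 18_688
ddr3_gb_per_node = 32    # system RAM per XK7 node
gddr5_gb_per_node = 6    # VRAM per K20

total_gb = nodes * (ddr3_gb_per_node + gddr5_gb_per_node)
print(total_gb)          # 710144 GB
print(total_gb / 1000)   # ~710 TB counting in powers of 10, as the press release does
print(total_gb / 1024)   # ~693.5 counting in powers of 2
```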
  • Ryan Smith - Thursday, November 1, 2012 - link

    The 6GB number is also in the slide deck: http://images.anandtech.com/reviews/video/NVIDIA/T...
  • RussianSensation - Wednesday, October 31, 2012 - link

    Tom's Hardware reported that Titan Supercomputer Packs 46,645,248 Nvidia CUDA Cores
    http://www.tomshardware.com/news/oak-ridge-ORNL-nv...

    46,645,248 CUDA Cores / 18,688 Tesla K20s also gives 2,496 CUDA cores per GPU, instead of 2,688.
  • ypsylon - Wednesday, October 31, 2012 - link

    Great article. Fantastic way of showing us tiny PC users what really big stuff looks like. A data center is one thing, but my word this stuff is, is... well that is Ultimate Computing Pr0n. For people who will never ever have a chance to visit one of the supercomputer centers it is quite something. Enjoyed that very much!

    @Guspaz

    If we get that kind of performance in phones then it's a really scary prospect. :D
  • twotwotwo - Wednesday, October 31, 2012 - link

    We currently have 1-billion-transistor chips. We'd get from there to 128 trillion, or Titan-magnitude computers, after 17 iterations of Moore's Law, or about 25 years (the doubling math is sketched after this comment). If you go 25 years back, it's definitely enough of a gap that today's technology looks like flying cars to folks of olden times. So even if 128-trillion-transistor devices aren't exactly what happens, we'll have *something* plenty exciting on the other end.

    *Something*, but that may or may not be huge computers. It may not be an easy exponential curve all the way. We'll almost certainly put some efficiency gains towards saving cost and energy rather than increasing power, as we already are now. And maybe something crazy like quantum computers, rather than big conventional computers, will be the coolest new thing.

    I don't imagine those powerful computers, whatever they are, will all be doing simulations of physics and weather. One of the things that made some of today's everyday tech hard to imagine was that the inputs involved (social graphs, all the contents of the Web, phones' networks and sensors) just weren't available--would have been hard, before 1980, to imagine trivially having a metric of your connectedness to an acquaintance (like Facebook's 'mutual friends') or having ads matching your interest.

    I'm gonna say that 25 years out the data, power, and algorithms will be available to everyone to make things that look like Strong AI to anyone today. Oh, and the video games will be friggin awesome. If we don't all blow each other up in the next couple-and-a-half decades, of course. Any other takers? Whoever predicts it best gets a beer (or soda) in 25 years, if practical.
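For what it's worth, a minimal sketch of the doubling math in the comment above (the ~18-month doubling period is an assumed cadence, not something the commenter spells out):

```python
# Extrapolate transistor counts under Moore's Law, per the comment above.
start_transistors = 1e9       # "1-billion-transistor chips" today
doublings = 17
months_per_doubling = 18      # assumed classic Moore's Law cadence

final_transistors = start_transistors * 2 ** doublings
years = doublings * months_per_doubling / 12

print(f"{final_transistors:.3g} transistors")  # ~1.31e14, on the order of 128 trillion
print(f"~{years:.1f} years")                   # ~25.5 years
```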
  • JAH - Wednesday, October 31, 2012 - link

    Must've been a fun trip for a geek/nerd. I'm jealous!

    Question, what do they do with the old CPUs that got replaced? Resale, recycled, donation?
  • silverblue - Wednesday, October 31, 2012 - link

    I'd wondering which model Opterons they threw in there. The Interlagos chips were barely faster and used more power than the Magny-Cours CPUs they were destined to replace, though I'm sure these are so heavily taxed that the Bulldozer architecture would shine through in the end.

    Okay, I've checked - these are 6274s, which are Interlagos and clocked at 2.2GHz base with an ACP of 80W and a TDP of 115W apiece. This must be the CPU purchase mentioned prior to Bulldozer's launch.
  • silverblue - Wednesday, October 31, 2012 - link

    I WAS wondering, rather. Too early for posting, it seems.
