Final Words

At a high level, the Titan supercomputer delivers an order of magnitude increase in performance over the outgoing Jaguar system at roughly the same energy price. Using over 200,000 AMD Opteron cores, Jaguar could deliver roughly 2.3 petaflops of performance at around 7MW of power consumption. Titan approaches 300,000 AMD Opteron cores but adds nearly 19,000 NVIDIA K20 GPUs, delivering over 20 petaflops of performance at "only" 9MW. The question remains: how can it be done again?
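
The gap is easiest to see as performance per watt. A quick back-of-the-envelope sketch using the round numbers above (the article's figures, not official Top500 entries):

```python
# Back-of-the-envelope efficiency comparison using the article's round figures.

def gflops_per_watt(petaflops: float, megawatts: float) -> float:
    """System-level efficiency: petaflops and megawatts to GFLOPS/W."""
    return (petaflops * 1e6) / (megawatts * 1e6)  # PF -> GF and MW -> W are both 1e6

jaguar = gflops_per_watt(2.3, 7.0)   # CPU-only system
titan = gflops_per_watt(20.0, 9.0)   # CPU + GPU system

print(f"Jaguar: {jaguar:.2f} GFLOPS/W")
print(f"Titan:  {titan:.2f} GFLOPS/W")
print(f"Efficiency gain: {titan / jaguar:.1f}x")
```

Roughly 0.33 GFLOPS/W for Jaguar against about 2.2 GFLOPS/W for Titan: a nearly 7x jump in efficiency, almost all of it from adding the GPUs.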

In 4 years, Titan will be obsolete and another set of upgrades will have to happen to increase performance in the same power envelope. By 2016 ORNL hopes to be able to build a supercomputer capable of 10x the performance of Titan within a similar power envelope. The trick is that the efficiency windfall from first adopting GPUs for compute can only be claimed once. ORNL will have to rely on process node shrinks and improvements in architectural efficiency, on both the CPU and GPU fronts, to deliver the next 10x performance increase. Over the next few years we'll see more integration between the CPU and GPU, with an on-die communication fabric. The march towards integration will help improve usable performance in supercomputers just as it will in client machines.

Increasing performance by 10x in 4 years doesn't seem so far-fetched, but breaking the 1 exaflop barrier by 2020 - 2022 will require something much more exotic. One possibility is to move from big, beefy x86 CPU cores to billions of simpler cores. Given ORNL's close relationship with NVIDIA, it's likely that the smartphone-core approach is being advocated internally. Everyone involved has a different definition of what counts as a simple core (by 2020 Haswell will look pretty darn simple), but it's clear that whatever comes after Titan's replacement won't just look like a bigger, faster Titan. There will have to be more fundamental shifts in order to increase performance by two orders of magnitude over the next decade. Luckily there are many research projects that have yet to come to fruition. Die stacking and silicon photonics both come to mind, even though we'll need more than just those.

It's incredible to think that the most recent increase in supercomputer performance has its roots in PC gaming. These multi-billion transistor GPUs first came about to improve performance and visual fidelity in 3D games. The first consumer GPUs were built to better simulate reality so we could have more realistic games. It's not too surprising then to think that in the research space the same demands apply, although in pursuit of a different goal: to create realistic models of the world and universe around us. It's honestly one of the best uses of compute that I've ever seen.

130 Comments

  • galaxyranger - Sunday, November 04, 2012 - link

    I am not intelligent in any way but I enjoy reading the articles on this site a great deal. It's probably my favorite site.

    What I would like to know is how does Titan compare in power to the CPU that was at the center of the star ship Voyager?

    Also, surely a supercomputer like Titan is powerful enough to become self aware, if it had the right software made for it?
    Reply
  • Hethos - Tuesday, November 06, 2012 - link

    For your second question, if it has the right software then any high-end consumer desktop PC could become self-aware. It would work rather sluggishly, compared to some sci-fi AIs like those in the Halo universe, but would potentially start learning and teaching itself. Reply
  • Daggarhawk - Tuesday, November 06, 2012 - link

    Hethos, that is not by any stretch certain. Since "self awareness" or "consciousness" has never been engineered or simulated, it is still quite uncertain what the specific requirements would be to produce it. Yet here you're not only postulating that all it would take is the right software, but also how well it would perform. My guess is that Titan would be able to simulate a brain (and therefore be able to learn, think, dream, and do all the things that brains do) much sooner than it would /become/ "a brain". It took a 128-core computer a 10-hour run to render a simulation of a few minutes in the life of a complete single-celled organism. Hard to say how much more compute power it would take to fully simulate a brain and be able to interact with it in real time. As for other methods of AI, it may take totally different kinds of hardware and networking altogether. Reply
  • quirksNquarks - Sunday, November 04, 2012 - link

    Thank You,

    this was a perfectly timed article - as people have forgotten why it is important the Technology keeps pushing boundaries regardless of *daily use* stagnation.

    Also is a great example of why AMD does offer 16-core Chips. For These Kinds of Reasons! More Cores on One Chip means Less Chips are needed to be implemented - powered - tested - maintained.

    an AMD 4 socket Mobo offers 64 cores. A personal Supercomputer. (Just think of how many they'll stuff full of ARM cores).

    why Nvidia GPUs ?
    a) Error Correction Code
    b) CUDA

    as to the CPUs...

    http://www.newegg.ca/Product/Product.aspx?Item=N82...
    $599 for every AMD 6274 chip (obvi they don't pay as much when ordering 300k).

    vs

    http://www.newegg.ca/Product/Product.aspx?Item=N82...
    $1329 for an Intel Sandy Bridge equivalent which isn't really an equivalent considering these do NOT run in 4 socket designs. (obvi a little less when ordering in bulk numbers).

    now multiply that price difference (the ratio) on the order of 10's of THOUSANDS!!

    COMMON SENSE people.... Less Money for MORE CORES - or - More Money for LESS CORES ?
    which road would YOU take? if you were footing the $ Bill.

    but the Biggest thing to consider...

    ORNL Upgraded from Jaguar to Titan - which meant they ONLY needed a CHIP upgrade in that regard (( SAME SOCKET )) .. TRY THAT WITH INTEL > :P
    Reply
  • phoenicyan - Monday, November 05, 2012 - link

    I'd like to see description of logical architecture. I guess it could be 16x16x73 3D Torus. Reply
  • XyaThir - Saturday, November 10, 2012 - link

    Nice article, too bad there is nothing about the storage in this HPC cluster! Reply
  • logain7997 - Tuesday, November 13, 2012 - link

    Imagine the PPD this baby could produce folding. 0.0 Reply
  • hyperblaster - Tuesday, December 04, 2012 - link

    In addition to the bit about ECC, nVidia really made headway over AMD primarily because of CUDA. nVidia specially targeted a whole bunch of developers of popular academic software and loaned out free engineers. Experienced devs from nVidia would actually do most of the legwork to port MPI code to CUDA, while AMD did nothing of the sort. Therefore, there is now a large body of well-optimized computational simulation software that supports CUDA (and not OpenCL). However, this is slowly changing and OpenCL is catching on. Reply
  • Jag128 - Tuesday, January 15, 2013 - link

    I wonder if it could play crysis on full? Reply
  • mikbe - Friday, June 28, 2013 - link

    I was actually surprise at how many actual times the word "actually" was actually used. Actually, the way it's actually used in this actual article it's actually meaningless and can actually be dropped, actually, most of the actual time. Reply
