Inside the Titan Supercomputer: 299K AMD x86 Cores and 18.6K NVIDIA GPUs

Name: Inside the Titan Supercomputer: 299K AMD x86 Cores and 18.6K NVIDIA GPUs
Item: Inside the Titan Supercomputer: 299K AMD x86 Cores and 18.6K NVIDIA GPUs
Author: Anand Lal Shimpi

by Anand Lal Shimpi on October 31, 2012 1:28 AM EST

130 Comments | Add A Comment

130 Comments

Applying for Time on Titan

The point of building supercomputers like Titan is to give scientists and researchers access to hardware they wouldn't otherwise have. In order to actually book time on Titan, you have to apply for it through a proposal process.

There's an annual call for proposals, based on which time on Titan will be allocated. The machine is available to anyone who wants to use it, although the problem you're trying to solve needs to be approved by Oak Ridge.

If you want to get time on Titan you write a proposal through a program called Incite. In the proposal you ask to use either Titan or the supercomputer at Argonne National Lab (or both). You also outline the problem you're trying to solve and why it's important. Researchers have to describe their process and algorithms as well as their readiness to use such a monster machine. Any program will run on a simple computer, but to need a supercomputer with hundreds of thousands of cores the requirements are very strict. As a part of the proposal process you'll have to show that you've already run your code on machines that are smaller, but similar in nature (e.g. 1/3 the scale of Titan).

Your proposal would then be reviewed twice - once for computational readiness (can it run on Titan) and once for scientific peer review. The review boards rank all of the proposals received, and based on those rankings time is awarded on the supercomputers.

The number of requests outweighs the available compute time by around 3x. The proposal process is thus highly competitive. The call for proposals goes out once a year in April, with proposals due in by the end of June. Time on the supercomputers is awarded at the end of October with the accounts going live on the first of January. Proposals can be for 1 - 3 years, although the multiyear proposals need to renew each year (proving the time has been useful, sharing results, etc...).

Programs that run on Titan are typically required to run on at least 1/5 of the machine. There are smaller supercomputers available that can be used for less demanding tasks. Given how competitive the proposal process is, ORNL wants to ensure that those using Titan actually have a need for it.

Once time is booked, jobs are scheduled in batch and researchers get their results whenever their turn comes up.

The end user costs for using Titan depend on what you're going to do with the data. If you're a research scientist and will publish your findings, the time is awarded free of charge. All ORNL asks is that you provide quarterly updates and that you credit the lab and the Department of Energy for providing the resource.

If, on the other hand, you're a private company wanting to do proprietary work you have to pay for your time on the machine. On Jaguar the rate was $0.05 per core hour, although with Titan ORNL will be moving to a node-hour billing rate since the addition of GPUs throws a wrench in the whole core-hour billing increment.

Supercomputing Applications

In the gaming space we use additional compute to model more accurate physics and graphics. In supercomputing, the situation isn't very different. Many of ORNL's supercomputing projects model the physically untestable (either for scale or safety reasons). Instead of getting greater accuracy for the impact of an explosion on an enemy, the types of workloads run at ORNL use advances in compute to better model the atmosphere, a nuclear reactor or a decaying star.

I never really had a good idea of specifically what sort of research was done on supercomputers. Luckily I had the opportunity to sit down with Dr. Bronson Messer, an astrophysicist looking forward to spending some time on Titan. Dr. Messer's work focuses specifically on stellar decay, or what happens immediately following a supernova. His work is particularly important as many of the elements we take for granted weren't present in the early universe. Understanding supernova explosions gives us unique insight into where we came from.

For Dr. Messer's studies, there's a lot of CUDA Fortran that's used although the total amount of code that runs on GPUs is pretty small. There may be 20K - 1M lines of code, but in that complex codebase you're only looking at tens of lines of CUDA code for GPU acceleration. There are huge speedups from porting those small segments of code to run on GPUs (much of that code is small because it's contained within a loop that gets pushed out in parallel to GPUs vs. executing serially). Dr. Messer tells me that the actual process of porting his code to CUDA isn't all that difficult, after all there aren't that many lines to worry about, but it's changing all of the data around to make the code more GPU friendly that is time intensive. It's also easy to screw up. Interestingly enough, in making his code more GPU friendly a lot of the changes actually improved CPU performance as well thanks to improved cache locality. Dr. Messer saw a 2x improvement in his CPU code simply by making data structures more GPU friendly.

Many of the applications that will run on Titan are similar in nature to Dr. Messer's work. At ORNL what the researchers really care about are covers of Nature and Science. There are researchers focused on how different types of fuels combust at a molecular level. I met another group of folks focused on extracting more efficiency out of nuclear reactors. These are all extremely complex problems that can't easily be experimented on (e.g. hey let's just try not replacing uranium rods for a little while longer and see what happens to our nuclear reactor). Scientists at ORNL and around the world working on Titan are fundamentally looking to model reality, as accurately as possible, so that they can experiment on it. If you think about simulating every quark, atom, molecule in whatever system you're trying to model (e.g. fuel in a combustion engine), there's a ton of data that you have to keep track of. You have to look at how each one of these elementary constituents impacts one another when exposed to whatever is happening in the system at the time. It's these large scale problems that are fundamentally driving supercomputer performance forward, and there's simply no letting up. Even at two orders of magnitude better performance than what Titan can deliver with ~300K CPU cores and 50M+ GPU cores, there's not enough compute power to simulate most of the applications that run on Titan in their entirety. Researchers are still limited by the systems they run on and thus have to limit the scope of their simulations. Maybe they only look at one slice of a star, or one slice of the Earth's atmosphere and work on simulating that fraction of the whole. Go too narrow and you'll lose important understanding of the system as a whole. Go too broad and you'll lose fidelity that helps give you accurate results.

Given infinite time you'll be able to run anything regardless of hardware, but for researchers (who happen to be human) time isn't infinite. Having faster hardware can help shorten run times to more manageable amounts. For example, reducing a 6 month runtime (which isn't unheard of for many of these projects) to something that can execute to completion in a single month can have a dramatic impact on productivity. Dr. Messer put it best when told me that keeping human beings engaged for a month is a much different proposition than keeping human beings engaged for half a year.

There are other types of applications that will run on Titan without the need for enormous runtimes, instead they need lots of repetitions. Doing hurricane simulation is one of those types of problems. ORNL was in between generations of supercomputers at one point and donated some compute time to the National Tornado Center in Oklahoma during that transition. During the time they had access to the ORNL supercomputer, their forecasts improved tremendously.

ORNL also has a neat visualization room where you can plot, in 3D, the output from work you've run on Titan. The problem with running workloads on a supercomputer is the output can be terabytes of data - which tends to be difficult to analyze in a spreadsheet. Through 3D visualization you're able to get a better idea of general trends. It's similar to the motivation behind us making lots of bar charts in our reviews vs. just publishing a giant spreadsheet, but on a much, much, much larger scale.

The image above is actually showing some data run on Titan simulating a pressurized water nuclear reactor. The video below explains a bit more about the data and what it means.

Physical Architecture, OS, Power & Cooling Final Words

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

130 Comments

View All Comments

galaxyranger - Sunday, November 4, 2012 - link
I am not intelligent in any way but I enjoy reading the articles on this site a great deal. It's probably my favorite site.

What I would like to know is how does Titan compare in power to the CPU that was at the center of the star ship Voyager?

Also, surely a supercomputer like Titan is powerful enough to become self aware, if it had the right software made for it?
Hethos - Tuesday, November 6, 2012 - link
For your second question, if it has the right software then any high-end consumer desktop PC could become self-aware. It would work rather sluggishly, compared to some sci-fi AIs like those in the Halo universe, but would potentially start learning and teaching itself.
Daggarhawk - Tuesday, November 6, 2012 - link
Hethos that is not by any stretch certain. Since "self awareness" or "consciousness" has never been engineered or simulated, it is still quite uncertain what the specific requirements would be to produce it. Yet here you're not only postulating that all it would take would be the right OS but also how well it would perform. My guess is that Titan would much sooner be able to simulate a brain (and therefore be able to learn, think, dream, and do all the things that brains do) much sooner than it would /become/ "a brain" It look a 128 core computer a 10hr run render a few-minute simulation of a complete single celled organism . Hard to say how much more compute power it would take to fully simulate a brain and be able to interact with it in real time. as for other methods of AI, it may take totally different kinds of hardware and networking all together.
quirksNquarks - Sunday, November 4, 2012 - link
Thank You,

this was a perfectly timed article - as people have forgotten why it is important the Technology keeps pushing boundaries regardless of *daily use* stagnation.

Also is a great example of why AMD does offer 16-core Chips. For These Kinds of Reasons! More Cores on One Chip means Less Chips are needed to be implemented - powered - tested - maintained.

an AMD 4 socket Mobo offers 64 cores. A personal Supercomputer. (Just think of how many they'll stuff full of ARM cores).

why Nvidia GPUs ?
a) Error Correction Code
b) CUDA

as to the CPUs...

http://www.newegg.ca/Product/Product.aspx?Item=N82...
$599 for every AMD 6274 chip (obvi they don't pay as much when ordering 300k).

vs

http://www.newegg.ca/Product/Product.aspx?Item=N82...
$1329 for an Intel Sandy Bridge equivalent which isn't really an equivalent considering these do NOT run in 4 socket designs. (obvi a little less when ordering in bulk numbers).

now multiple that price difference (the ratio) in the order of 10's of THOUSANDS!!

COMMON SENSE people.... Less Money for MORE CORES - or - More Money for LESS CORES ?
which road would YOU take? if you were footing the $ Bill.

but the Biggest thing to consider...

ORNL Upgraded from Jaguar <> Titan - which meant they ONLY needed a CHIP upgrade in that regards (( SAME SOCKET )) .. TRY THAT WITH INTEL > :P
phoenicyan - Monday, November 5, 2012 - link
I'd like to see description of logical architecture. I guess it could be 16x16x73 3D Torus.
XyaThir - Saturday, November 10, 2012 - link
Nice article, too bad there is nothing about the storage in this HPC cluster!
logain7997 - Tuesday, November 13, 2012 - link
Imagine the PPD this baby could produce folding. 0.0
hyperblaster - Tuesday, December 4, 2012 - link
In addition to the bit about ECC, nVidia really made headway over AMD primarily because of CUDA. nVidia specially targeted a whole bunch of developers of popular academic software and loaned out free engineers. Experienced devs from nVidia would actually do most of the legwork to port MPI code to CUDA, while AMD did nothing of the sort. Therefore, there is now a large body of well-optimized computational simulation software that supports CUDA (and not OpenCL). However, this is slowly changing and OpenCL is catching on.
Jag128 - Tuesday, January 15, 2013 - link
I wonder if it could play crysis on full?
mikbe - Friday, June 28, 2013 - link
I was actually surprise at how many actual times the word "actually" was actually used. Actually, the way it's actually used in this actual article it's actually meaningless and can actually be dropped, actually, most of the actual time.

Inside the Titan Supercomputer: 299K AMD x86 Cores and 18.6K NVIDIA GPUs

Applying for Time on Titan

Supercomputing Applications

Post Your Comment

130 Comments

View All Comments

galaxyranger - Sunday, November 4, 2012 - link

Hethos - Tuesday, November 6, 2012 - link

Daggarhawk - Tuesday, November 6, 2012 - link

quirksNquarks - Sunday, November 4, 2012 - link

phoenicyan - Monday, November 5, 2012 - link

XyaThir - Saturday, November 10, 2012 - link

logain7997 - Tuesday, November 13, 2012 - link

hyperblaster - Tuesday, December 4, 2012 - link

Jag128 - Tuesday, January 15, 2013 - link

mikbe - Friday, June 28, 2013 - link

Log in

Don't have an account? Sign up now