Physical Architecture

The physical architecture of Titan is just as interesting as the high level core and transistor counts. I mentioned earlier that Titan is built from 200 cabinets. Inside each cabinets are Cray XK7 boards, each of which has four AMD G34 sockets and four PCIe slots. These aren't standard desktop PCIe slots, but rather much smaller SXM slots. The K20s NVIDIA sells to Cray come on little SXM cards without frivolous features like display outputs. The SXM form factor is similar to the MXM form factor used in some notebooks.

There's no way around it. ORNL techs had to install 18,688 CPUs and GPUs over the past few weeks in order to get Titan up and running. Around 10 of the formerly-Jaguar cabinets had these new XK boards but were using Fermi GPUs. I got to witness one of the older boards get upgraded to K20. The process isn't all that different from what you'd see in a desktop: remove screws, remove old card, install new card, replace screws. The form factor and scale of installation are obviously very different, but the basic premise remains.

As with all computer components, there's no guarantee that every single chip and card is going to work. When you're dealing with over 18,000 computers as a part of a single entity, there are bound to be failures. All of the compute nodes go through testing, and faulty hardware swapped out, before the upgrade is technically complete.

OS & Software

Titan runs the Cray Linux Environment, which is based on SUSE 11. The OS has to be hardened and modified for operation on such a large scale. In order to prevent serialization caused by interrupts, Cray takes some of the cores and uses them to run all of the OS tasks so that applications running elsewhere aren't interrupted by the OS.

Jobs are batch scheduled on Titan using Moab and Torque.

AMD CPUs and NVIDIA GPUs

If you're curious about why Titan uses Opterons, the explanation is actually pretty simple. Titan is a large installation of Cray XK7 cabinets, so CPU support is actually defined by Cray. Back in 2005 when Jaguar made its debut, AMD's Opterons were superior to the Intel Xeon alternative. The evolution of Cray's XT/XK lines simply stemmed from that point, with Opteron being the supported CPU of choice.

The GPU decision was just as simple. NVIDIA has been focusing on non-gaming compute applications for its GPUs for years now. The decision to partner with NVIDIA on the Titan project was made around 3 years ago. At the time, AMD didn't have a competitive GPU compute roadmap. If you remember back to our first Fermi architecture article from back in 2009, I wrote the following:

"By adding support for ECC, enabling C++ and easier Visual Studio integration, NVIDIA believes that Fermi will open its Tesla business up to a group of clients that would previously not so much as speak to NVIDIA. ECC is the killer feature there."

At the time I didn't know it, but ORNL was one of those clients. With almost 19,000 GPUs, errors are bound to happen. Having ECC support was a must have for GPU enabled Jaguar and Titan compute nodes. The ORNL folks tell me that CUDA was also a big selling point for NVIDIA.

Finally, some of the new features specific to K20/GK110 (e.g. Hyper Q and GPU Direct) made Kepler the right point to go all-in with GPU compute.

Power Delivery & Cooling

Titan's cabinets require 480V input to reduce overall cable thickness compared to standard 208V cabling. Total power consumption for Titan should be around 9 megawatts under full load and around 7 megawatts during typical use. The building that Titan is housed in has over 25 megawatts of power delivered to it.

In the event of a power failure there's no cost effective way to keep the compute portion of Titan up and running (remember, 9 megawatts), but you still want IO and networking operational. Flywheel based UPSes kick in, in the event of a power interruption. They can power Titan's network and IO for long enough to give diesel generators time to come on line.

The cabinets themselves are air cooled, however the air itself is chilled using liquid cooling before entering the cabinet. ORNL has over 6600 tons of cooling capacity just to keep the recirculated air going into these cabinets cool.

Oak Ridge National Laboratory Applying for Time on Titan & Supercomputing Applications
POST A COMMENT

130 Comments

View All Comments

  • Ryan Smith - Wednesday, October 31, 2012 - link

    We have other reasons to back our numbers, though I can't get into them. Suffice it to say, if we didn't have 100% confidence we would not have used it. Reply
  • RussianSensation - Wednesday, October 31, 2012 - link

    Hey Ryan, what about this?

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    The Jaguar is thus renamed into Titan, and the sheer numbers are quite impressive:
    46,645,248 CUDA Cores (yes, that's 46 million)
    299,008 x86 cores
    91.25 TB ECC GDDR5 memory
    584 TB Registered ECC DDR3 memory
    Each x86 core has 2GB of memory

    1 Node = the new Cray XK7 system, consists of 16-core AMD Opteron CPU and one Nvidia Tesla K20 compute card.

    The Titan supercompute has 18,688 nodes.

    46,645,248 CUDA Cores / 18,688 Nodes = 2,496 CUDA cores per 1 Tesla K20 card.
    Reply
  • Ryan Smith - Thursday, November 01, 2012 - link

    Among other things: note that Titan has 6GB of memory per K20 (and this is published information).

    http://nvidianews.nvidia.com/Releases/NVIDIA-Power...

    "The upgrade includes the Tesla K20 GPU accelerators, a replacement of the compute modules to convert the system’s 200 cabinets to a Cray XK7 supercomputer, and 710 terabytes of memory."

    18,688 nodes, each with 32GB of RAM + 6GB of VRAM = 710,144 GB

    (Press agencies are bad about using power of 10, hence "710" TB).
    Reply
  • Ryan Smith - Thursday, November 01, 2012 - link

    The 6GB number is also in the slide deck: http://images.anandtech.com/reviews/video/NVIDIA/T... Reply
  • RussianSensation - Wednesday, October 31, 2012 - link

    Tom's Hardware reported that Titan Supercomputer Packs 46,645,248 Nvidia CUDA Cores
    http://www.tomshardware.com/news/oak-ridge-ORNL-nv...

    46,645,248 CUDA Cores / 18,688 Tesla K20s also gives 2,496 CUDA cores per GPU, instead of 2,688.
    Reply
  • ypsylon - Wednesday, October 31, 2012 - link

    Great article. Fantastic way of showing to us tiny PC users what really big stuff looks like. Data center is one thing, but my word this stuff is, is... well that is Ultimate Computing Pr0n. For people who will never ever have a chance to visit one of the super computer centers it is quite something. Enjoyed that very much!

    @Guspaz

    If we get that kind of performance in phones then it is really scary prospect. :D
    Reply
  • twotwotwo - Wednesday, October 31, 2012 - link

    We currently have 1-billion-transistor chips. We'd get from there to 128 trillion, or Titan-magnitude computers, after 17 iterations of Moore's Law, or about 25 years. If you go 25 years back, it's definitely enough of a gap that today's technology looks like flying cars to folks of olden times. So even if 128-trillion-transistor devices isn't exactly what happens, we'll have *something* plenty exciting on the other end.

    *Something*, but that may or may not be huge computers. It may not be an easy exponential curve all the way. We'll almost certainly put some efficiency gains towards saving cost and energy rather than increasing power, as we already are now. And maybe something crazy like quantum computers, rather than big conventional computers, will be the coolest new thing.

    I don't imagine those powerful computers, whatever they are, will all be doing simulations of physics and weather. One of the things that made some of today's everyday tech hard to imagine was that the inputs involved (social graphs, all the contents of the Web, phones' networks and sensors) just weren't available--would have been hard, before 1980, to imagine trivially having a metric of your connectedness to an acquaintance (like Facebook's 'mutual friends') or having ads matching your interest.

    I'm gonna say that 25 years out the data, power, and algorithms will be available to everyone to make things that look like Strong AI to anyone today. Oh, and the video games will be friggin awesome. If we don't all blow each other up in the next couple-and-a-half decades, of course. Any other takers? Whoever predicts it best gets a beer (or soda) in 25 years, if practical.
    Reply
  • JAH - Wednesday, October 31, 2012 - link

    Must've been a fun trip for a geek/nerd. I'm jealous!

    Question, what do they do with the old CPUs that got replaced? Resale, recycled, donation?
    Reply
  • silverblue - Wednesday, October 31, 2012 - link

    I'd wondering which model Opterons they threw in there. The Interlagos chips were barely faster and used more power than the Magny-Cours CPUs they were destined to replace, though I'm sure these are so heavily taxed that the Bulldozer architecture would shine through in the end.

    Okay, I've checked - these are 6274s, which are Interlagos and clocked at 2.2GHz base with an ACP of 80W and a TDP of 115W apiece. This must be the CPU purchase mentioned prior to Bulldozer's launch.
    Reply
  • silverblue - Wednesday, October 31, 2012 - link

    I WAS wondering, rather. Too early for posting, it seems. Reply

Log in

Don't have an account? Sign up now