The launch of Oak Ridge National Laboratory’s Titan supercomputer was in many ways a turning point for NVIDIA’s GPU compute business. Though the company was already into its third generation of Tesla products by that time, getting Tesla into the world’s most powerful supercomputer was as singular a mark of “making it” as there can be. Supercomputer contracts are not just large orders in and of themselves; they indicate that the HPC industry has accepted GPUs as reliable and performant, and is ready to invest in them significantly. Since then Tesla has ended up in several other supercomputer contracts, with Tesla K20 systems powering 2 of the world’s top 10 supercomputers, and overall Tesla sales for this generation have greatly surpassed those of the Fermi generation.

Of course, while landing their first supercomputer contract was a major accomplishment for NVIDIA, it’s not the only factor in making Tesla’s current success a sustainable one. To steal a restaurant analogy, NVIDIA was able to get customers in the door, but could they get them to come back? As announced by the US Department of Energy at the end of last week, the answer to that is yes. The DoE is building 2 more supercomputers, and it will be NVIDIA and IBM powering them.

The two supercomputers will be Summit and Sierra. At a combined price tag of $325 million, the supercomputers will be built by IBM for Oak Ridge National Laboratory and Lawrence Livermore National Laboratory respectively. They will be the successors to the laboratories’ respective current supercomputers, Titan and Sequoia.

Hardware

Both systems will be of similar design, with Summit being the more powerful of the two. Powering the systems will be a triumvirate of technologies: IBM POWER9 CPUs, NVIDIA Volta-based Tesla GPUs, and Mellanox EDR InfiniBand for the system interconnect.

Starting with the CPU, this is the first real attention POWER9 has received. Relatively little information is available on the CPU, though IBM has previously stated that POWER9 will emphasize the use of accelerators (specialist hardware), which meshes well with what is being done for these supercomputers. Beyond that, we know little else other than that it will build on top of IBM’s existing POWER8 technologies.

Meanwhile on the GPU side, this supercomputer announcement marks the reintroduction of Volta, which NVIDIA had gone quiet on after announcing Pascal earlier this year. Volta was then and still remains a blank slate, so not unlike the POWER9 CPU we don’t know what new functionality it will bring, only that it is a product distinct from Pascal and that it will build on top of Pascal. Pascal of course introduces support for 3D stacked memory and NVLink, both of which will be critical for these supercomputers.

Speaking of NVLink, as IBM’s POWER family is the first CPU family to support NVLink, it should come as no surprise that NVLink will serve as both the CPU-GPU and GPU-GPU interconnect for these computers. NVIDIA’s high-speed PCIe replacement, NVLink is intended to allow faster, lower latency, and lower energy communication between processors, and is expected to play a big part in NVIDIA’s HPC performance goals. While GPU-GPU NVLink has been expected to reach production systems from day one, the DoE supercomputer announcement means that the CPU-GPU implementation is also becoming a reality. Until now it was unknown whether an NVLink-equipped POWER CPU would be manufactured (it was merely an option to licensees), so this confirms that we’ll be seeing NVLink CPUs as well as GPUs.
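
NVIDIA has not yet detailed how NVLink will be exposed to software, but given that it is positioned as a PCIe replacement, a reasonable assumption is that today’s CUDA peer-to-peer mechanisms would simply run over the faster link rather than requiring new code. As a point of reference, here is a minimal sketch of how two GPUs already exchange data directly in CUDA over PCIe (the device IDs and buffer size are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Check whether GPU 0 can directly address GPU 1's memory. Today this
    // path runs over PCIe; on an NVLink system the same API would, in
    // principle, simply ride the faster interconnect.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("No peer access between GPU 0 and GPU 1\n");
        return 1;
    }

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0); // flags argument must be 0

    // Allocate a buffer on each GPU.
    const size_t bytes = 64 << 20; // 64MB, illustrative
    float *buf0, *buf1;
    cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // Copy GPU 0's buffer to GPU 1 directly, without staging through
    // host memory.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```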

With NVLink in place for CPU-GPU communications, these supercomputers will be able to offer unified memory support, which should go a long way towards opening up these systems to tasks that require frequent CPU/GPU interaction, as opposed to the more homogeneous nature of systems such as Titan. Meanwhile it is likely – though unconfirmed – that these systems will be using NVLink 2.0, which as originally announced was expected for the GPU after Pascal. NVLink 2.0 introduces cache coherency, which would allow for further performance improvements and the ability to more readily execute programs in a heterogeneous manner.
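
For a sense of what unified memory means in practice, CUDA 6 already implements a software-managed version of it through cudaMallocManaged; what NVLink (and NVLink 2.0’s cache coherency) would change is making this sharing fast enough for fine-grained CPU/GPU cooperation rather than emulating it over PCIe. A minimal sketch of the programming model (the kernel and sizes are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: the GPU scales the shared data in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data;

    // A single allocation visible to both the CPU and the GPU.
    // Note that there is no explicit cudaMemcpy anywhere below.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;     // CPU writes

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // GPU reads and writes
    cudaDeviceSynchronize();  // required before the CPU touches data again

    printf("data[0] = %f\n", data[0]);              // CPU reads the result
    cudaFree(data);
    return 0;
}
```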

Systems

US Department of Energy Supercomputers
                      Summit           Titan            Sierra              Sequoia
CPU Architecture      IBM POWER9       AMD Opteron      IBM POWER9          IBM BlueGene/Q
                                       (Bulldozer)
GPU Architecture      NVIDIA Volta     NVIDIA Kepler    NVIDIA Volta        N/A
Performance (Rpeak)   150-300 PFLOPS   27 PFLOPS        100+ PFLOPS         20 PFLOPS
Power Consumption     ~10MW            ~9MW             N/A                 ~8MW
Nodes                 3,400            18,688           N/A                 N/A
Laboratory            Oak Ridge        Oak Ridge        Lawrence Livermore  Lawrence Livermore
Vendor                IBM              Cray             IBM                 IBM

Though similar in design, the total computational power and respective workloads will differ for Summit and Sierra. Sierra, the smaller of the systems, is to be delivered to Lawrence Livermore National Laboratory to replace their current 20 PetaFLOP Sequoia supercomputer. LLNL will be using Sierra for the National Nuclear Security Administration’s ongoing nuclear weapon simulations, with LLNL noting that “the machine will be dedicated to high-resolution weapons science and uncertainty quantification for weapons assessment.”


Sierra: 100+ PetaFLOPS

Due to its use in nuclear weapons simulations, information on Sierra is more restricted than it is for Summit. Publicly, Sierra is being quoted as offering 100+ PFLOPS of performance, over five times the performance of Sequoia. As these supercomputers are still in development, the final performance figures are unknown – power consumption and clockspeeds cannot be guaranteed this early in the process, not to mention performance scaling on such a large system – and it is likely Sierra will exceed its 100 PFLOPS performance floor.

Meanwhile the more powerful of the two systems, Summit, will be delivered to Oak Ridge National Laboratory. When building Titan, ORNL expected to get 4-5 years out of the system, and adhering to that schedule, Summit will be Titan’s replacement.


Summit: 150 to 300 PetaFLOPS

Summit’s performance is expected to be in the 150-300 PFLOPS range, once again varying depending on the final clockspeeds and attainable performance of the cluster. In 2012 ORNL wanted their next system to offer 10x the performance of Titan, and at this point Summit’s performance estimates range from 5x to 10x Titan, so while not guaranteed at this time, it is still a possibility that Summit will hit that 10x goal.
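
For a sense of scale against Titan’s 27 PFLOPS, the quoted figures work out as follows (a back-of-the-envelope check, not official numbers):

    150 PFLOPS ÷ 27 PFLOPS ≈ 5.6x Titan (the performance floor)
    10 × 27 PFLOPS = 270 PFLOPS (the 10x goal, comfortably inside the 150-300 PFLOPS window)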

As Summit is geared towards public work, we know quite a bit more about its construction than we do Sierra’s. Summit will be built out of roughly 3,400 nodes, with each node containing multiple CPUs and GPUs (as opposed to 1 of each per Titan node). Each node in turn will be backed by at least 512GB of memory, most likely composed of 512GB of DDR4 and a to-be-determined amount of High Bandwidth Memory (stacked memory) on each GPU. Backing that in turn will be another 800GB of NVRAM per node.
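
Taken across the whole machine, those per-node figures add up quickly (assuming the full ~3,400-node count, and excluding the still-unannounced HBM capacity):

    3,400 nodes × 512GB DDR4 ≈ 1.7PB of system memory
    3,400 nodes × 800GB NVRAM ≈ 2.7PB of non-volatile memory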

From a power standpoint Summit is expected to draw 10MW at peak, roughly 10% higher than Titan’s 9MW. However despite the slight increase in power consumption, Summit is expected to be physically far smaller than Titan. With each Summit node taking up roughly the same amount of space as a Titan node, Summit’s nodes will collectively occupy only around 20% of the volume of Titan’s. Key to this of course is increasing the number of processors per node; along with multiple CPUs per node, NVIDIA’s new mezzanine form factor GPUs should play a large part here, as they allow GPUs to be installed and cooled in a fashion similar to socketed CPUs, as opposed to bulky PCIe cards.
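
The node counts bear that figure out (simple division on the publicly quoted counts):

    3,400 Summit nodes ÷ 18,688 Titan nodes ≈ 18%, in line with the ~20% volume estimate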


NVIDIA Pascal Test Vehicle Showing New GPU Form Factor

Like Titan before it, Summit will be dedicated to what ORNL calls “open science.” Time on the supercomputer will be granted to researchers through application proposals. Much of the science expected to be done on Summit is similar to the science already done on Titan – climate simulations, (astro)physics, nuclear science, and the like – with Summit’s greater performance allowing for more intricate simulations.

Finally, Summit is expected to come online in 2017, with trials and qualifications leading up to the machine being opened to users in 2018. As it stands, when Summit launches it will be the most powerful supercomputer in the world; its 150 PFLOPS lower bound is roughly 3x faster than the current record holder, China’s Xeon Phi powered Tianhe-2 (54.9 PFLOPS peak), and no other supercomputer has been announced (yet) that is expected to surpass that number.

Wrapping things up, for both IBM and NVIDIA, securing new supercomputer contracts is a major win. With IBM indirectly scaling back its role in the supercomputer race – BlueGene/Q being the last of the BlueGenes – the company will continue providing supercomputers through heterogeneous systems powered by a mix of its own hardware and NVIDIA GPUs. NVIDIA of course is no less thrilled to be in not only the successor to Titan, but in another DoE lab’s supercomputer as well, and with a greater share of the underlying technology than before.

Though with that said, it should be noted that this is not the last major supercomputer order the DoE will be placing. The CORAL project behind these supercomputers also includes a supercomputer for Argonne National Laboratory, which will be replacing its Mira supercomputer in the same timeframe. The details of that supercomputer will be announced at a later date, so there is still one more supercomputer contract to be awarded.

Comments

  • wurizen - Monday, November 17, 2014 - link

    I do have a PowerPC ala 12" Apple Powerbook G4!!! Still works, too. And, stable as shit. The only time I ran into instabilities using my Mac and OS X was when I updated her with an intel Mac. Although, I am not an engineer or expert, so as to know what I am talking about. Just an observation from a pseudo-tech observer.
  • wurizen - Monday, November 17, 2014 - link

    CLI Display? I guess I have always been fond of the word "Super Computers" since the 90's when there were RISC based workstations from Silicon Grafx running UNIX that I could never afford. Until, of course, I got a 12" G4 Apple Powerbook because it was cheaper than the 15" and had "RISC" cpu guts. Seemed cool then and seems even cooler now this Powerbook. So much stabler and seems not as sensitive as the new MBP's. I even dropped her a couple times and one big one when my shoulder bag broke while running to the train. It left a dent on her but she still ticks. I even disassembled her to try to bang the dent back into place from the inside, reapplied thermal paste and just out of curiosity and for its own sake since I got a 15" MBP unibody replacement. And I didn't even break her and still works to this day. The mid-2010 MBP on the other hand... is working now... but, oh boy, was she a headache. Hope she will last as long as her smaller sister.... but I'm rambling on now for too long. So, bye....
  • tipoo - Monday, November 17, 2014 - link

    Yeah, usually on mainframes or headless supercomputers like this, what you'll see when you plug in a display or remote into one of the individual systems is something akin to this, very basic, not much put into the UI, but functional as heck and never a hiccup with all the power behind it.

    http://upload.wikimedia.org/wikipedia/commons/2/29...
  • tipoo - Monday, November 17, 2014 - link

    Also those SGI "supercomputers" you were probably thinking of were desktop personal-ish computers, while the thing in the image above is the size of multiple rooms.

    Something like this I'm assuming? I think the IBM Supercomputers would be larger than your house :P

    http://upload.wikimedia.org/wikipedia/commons/5/55...
  • wurizen - Tuesday, November 18, 2014 - link

    Yep. They were desktops. Mid tower size. And, I think they were in grey, which were totally different than the beige window pc's or the off-white apple quadras....

    i would assume that if I have one of these things, being an IBM Power9 running this architecture that I'd be able to interconnect as many pc boxes that I can afford, accordingly to the size of the room where the computers will be?....
  • alxxx - Tuesday, November 25, 2014 - link

    You can buy a ppc board from freescale or Applied micro
    http://www.freescale.com/webapp/sps/site/homepage....
    500MHz -1.8GHz
    or buy a board with an older xilinx virtex2 fpga with 2 ppc hardcores
    Embedded planet (and others element14) sell ppc sbc's
    http://www.embeddedplanet.com/product/single-board...

    Should be no problems running linux (ubuntu, fedora, deb), all have ports.
    A very few set top boxes are ppc.

    I think you mean you want a power not ppc or powerpc.

    You could always run a ppc softcore in an fpga
  • mamisano - Monday, November 17, 2014 - link

    All of this is still 2+ years away!
  • dragonsqrrl - Tuesday, November 18, 2014 - link

    At least. Actually I'm kind of surprised this will be coming online in 2017 since it's supposed to be based on Volta, which you wouldn't expect to see until 2018 given Nvidia's 2 year architecture schedule. In any case this is a long way out. We basically have 2 generations worth of Nvidia architectures to go before we hit Volta (big Maxwell ~2015, Pascal ~2016).

    ... actually this seems like a really aggressive (borderline unrealistic) schedule.
  • Ktracho - Monday, November 17, 2014 - link

    This is certainly a win for IBM/NVIDIA, but I wonder what it says about Cray/Intel? Has Cray lost its competitive edge? It would be one thing for LLNL to go for IBM, since they have traditionally used IBM supercomputers due to the work load they tend to run, but ORNL has most often procured Cray supercomputers. The problem is that Cray has been using x86 CPUs (AMD in the past, and now Intel), and Intel is certainly not keen on using NVLINK, or anything NV related, for that matter. That means Cray's only hope to stay competitive is for Xeon Phi to do better than NVIDIA Tesla. It doesn't look like they were able to make it in this round. Can they catch up and overtake IBM/NVIDIA? If not, the consequences for Cray could be very significant.
