El Capitan Installation Begins: First APU-based Exascale System Shaping Up For 2024
by Anton Shilov on July 6, 2023 8:30 AM EST
Lawrence Livermore National Laboratory has received the first components of its upcoming El Capitan supercomputer and begun to install them, the laboratory announced on Wednesday. The system is set to come online in mid-2024 and is expected to deliver performance of over 2 ExaFLOPS.
LLNL's El Capitan is based on Cray's Shasta supercomputer architecture and will be built by HPE, just like two other exascale systems in the U.S., Frontier and Aurora. Unlike the first two exascale machines, which use a traditional discrete CPU plus discrete GPU configuration, the El Capitan supercomputer will be the first one based on AMD server-grade APUs that integrate both processor types into a single, highly connected package.
AMD's Instinct MI300A APU incorporates both CPU and GPU chiplets, offering 24 general-purpose Zen 4 cores, compute GPUs powered by the CDNA 3 architecture, and 128 GB of unified on-package HBM3 memory. AMD has been internally evaluating its Instinct MI300A APU for months, and it appears that AMD and HPE are now ready to start installing the first pieces of hardware that make up El Capitan.
According to pictures released by the Lawrence Livermore National Laboratory, its engineers have already put a substantial number of servers into racks. LLNL's announcement leaves it unclear, though, whether these are "completed" servers with production-quality silicon or pre-production servers that will be filled out with production silicon at a later date. Notably, parts of Aurora were initially assembled with pre-production CPUs, which were only swapped out for Xeon CPU Max chips over the past couple of months. Given the amount of validation work required to stand up a world-class supercomputer, AMD and HPE may be employing a similar strategy here.
"We have begun receiving & installing components for El Capitan, first #exascale #supercomputer," a Tweet by LLNL reads. "While we are still a ways from deploying it for national security purposes in 2024, it is exciting to see years of work becoming reality."
When it comes online in 2024, LLNL is expecting El Capitan to be the fastest supercomputer in the world. With its full specifications still being held back, though, it's not clear how much faster it will be on paper than the 2 EFLOPS Aurora, let alone in real-world performance. Part of the design goal of AMD's MI300A APU is to exploit the additional performance and efficiency gains that come from placing CPU and GPU blocks so close together, so it will be interesting to see what the software development teams programming for El Capitan can achieve, especially as they further optimize their software.
LLNL's El Capitan is expected to cost $600 million. The system will be used for nuclear weapons simulations and will be crucial for U.S. national security. It replaces Sierra, a supercomputer based on IBM Power 9 CPUs and NVIDIA Volta accelerators, and promises to offer performance that is 16 times higher.
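As a rough sanity check on those figures, the short Python sketch below works backwards from the article's two numbers (a target of over 2 ExaFLOPS and a 16x improvement) to the Sierra baseline they imply. The function name `implied_baseline_pflops` is hypothetical and used only for illustration, and because 2 ExaFLOPS is a lower bound, the result is a floor rather than an exact figure.

```python
# Figures quoted in the article; the El Capitan target is a lower bound.
EL_CAPITAN_EFLOPS = 2.0    # "over 2 ExaFLOPS"
SPEEDUP_OVER_SIERRA = 16   # "performance that is 16 times higher"

PFLOPS_PER_EFLOPS = 1000   # 1 ExaFLOPS = 1,000 PetaFLOPS


def implied_baseline_pflops(target_eflops: float, speedup: float) -> float:
    """Return the baseline performance (in PFLOPS) implied by a target and a speedup."""
    return target_eflops * PFLOPS_PER_EFLOPS / speedup


sierra_pflops = implied_baseline_pflops(EL_CAPITAN_EFLOPS, SPEEDUP_OVER_SIERRA)
print(f"Implied Sierra baseline: {sierra_pflops:.0f} PFLOPS")
# prints "Implied Sierra baseline: 125 PFLOPS"
```

The implied baseline of about 125 PFLOPS is in line with Sierra's commonly cited peak performance, which suggests the 16x claim refers to peak rather than sustained throughput.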
Source: LLNL Twitter