Argonne National Laboratory and Intel said on Thursday that they had installed all 10,624 blades for the Aurora supercomputer, a machine announced back in 2015 that has had a particularly bumpy history. The system promises to deliver a peak theoretical compute performance of over 2 FP64 ExaFLOPS using its array of tens of thousands of Xeon Max 'Sapphire Rapids' CPUs with on-package HBM2E memory as well as Data Center GPU Max 'Ponte Vecchio' compute GPUs. The system will come online later this year.

"Aurora is the first deployment of Intel's Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world," said Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group.

The Aurora supercomputer looks quite impressive just by the numbers. The machine is powered by 21,248 general-purpose processors with over 1.1 million cores for workloads that require traditional CPU horsepower and 63,744 compute GPUs that will serve AI and HPC workloads. On the memory side, Aurora has 1.36 PB of on-package HBM2E memory and 19.9 PB of DDR5 memory attached to the CPUs, as well as 8.16 PB of HBM2E carried by the Ponte Vecchio compute GPUs.
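Those aggregate figures line up with the per-blade building blocks. Below is a minimal sanity-check sketch, assuming the widely reported blade layout of two Xeon Max CPUs and six Ponte Vecchio GPUs per blade, with 64 GB of HBM2E per CPU and 128 GB per GPU (none of these per-blade numbers are stated in the article itself):

```python
# Back-of-the-envelope check of Aurora's aggregate specs.
# Assumed (not from the article): 2 Xeon Max CPUs and 6 Ponte Vecchio GPUs
# per blade, 64 GB of HBM2E per CPU and 128 GB of HBM2E per GPU.
BLADES = 10_624
CPUS_PER_BLADE, GPUS_PER_BLADE = 2, 6
HBM_PER_CPU_GB, HBM_PER_GPU_GB = 64, 128

cpus = BLADES * CPUS_PER_BLADE            # 21,248 CPUs
gpus = BLADES * GPUS_PER_BLADE            # 63,744 GPUs
cpu_hbm_pb = cpus * HBM_PER_CPU_GB / 1e6  # ~1.36 PB of CPU HBM2E
gpu_hbm_pb = gpus * HBM_PER_GPU_GB / 1e6  # ~8.16 PB of GPU HBM2E

print(cpus, gpus, round(cpu_hbm_pb, 2), round(gpu_hbm_pb, 2))
```

The derived totals reproduce the 21,248 CPUs, 63,744 GPUs, 1.36 PB and 8.16 PB figures quoted above.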

The Aurora machine uses 166 racks that house 64 blades each. It spans eight rows and occupies a space equivalent to two basketball courts. That does not count the storage subsystem of Aurora, which employs 1,024 all-flash storage nodes offering 220 PB of storage capacity and a total bandwidth of 31 TB/s. For now, Argonne National Laboratory has not published official power consumption figures for Aurora or its storage subsystem.
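The rack and storage arithmetic checks out as well. A quick sketch dividing the stated totals into per-rack and per-node averages (the per-node numbers are derived estimates, not official specifications):

```python
# Derived averages from the stated totals; not official per-node specifications.
RACKS, BLADES_PER_RACK = 166, 64
STORAGE_NODES = 1_024
STORAGE_CAPACITY_PB, STORAGE_BANDWIDTH_TBS = 220, 31

blades = RACKS * BLADES_PER_RACK                                        # 10,624 blades
capacity_per_node_tb = STORAGE_CAPACITY_PB * 1_000 / STORAGE_NODES      # ~215 TB per node
bandwidth_per_node_gbs = STORAGE_BANDWIDTH_TBS * 1_000 / STORAGE_NODES  # ~30 GB/s per node

print(blades, round(capacity_per_node_tb), round(bandwidth_per_node_gbs))
```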

The supercomputer, which will be used for a wide variety of workloads from nuclear fusion simulations to weather prediction and from aerodynamics to medical research, uses HPE's Shasta supercomputer architecture with Slingshot interconnects. In the meantime, before the system passes ANL's acceptance tests, it will be used to train large-scale generative AI models for science.

"While we work toward acceptance testing, we are going to be using Aurora to train some large-scale open-source generative AI models for science," said Rick Stevens, Argonne National Laboratory associate laboratory director. "Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all-solid-state mass storage system, is the perfect environment to train these models."

Even though Aurora's blades have been installed, the supercomputer still has to undergo and pass a series of acceptance tests, a common procedure for supercomputers. Once it successfully clears these and comes online later in the year, it is projected to attain a theoretical peak performance exceeding 2 ExaFLOPS (two billion billion floating point operations per second). With such vast performance, it is expected to secure the top position in the Top500 list.
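For the prefix-averse: 'exa' is 10^18, so the headline figure really is two billion billion operations per second. A short sketch to keep the units straight:

```python
# SI prefixes: exa = 10**18, peta = 10**15, tera = 10**12.
EXA, PETA, TERA = 10**18, 10**15, 10**12
peak = 2 * EXA                       # 2 ExaFLOPS of peak FP64 throughput
print(peak / PETA)                   # 2,000 PetaFLOPS
print(peak == 2 * (10**9) ** 2)      # True: "two billion billion" per second
```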

The installation of the Aurora supercomputer marks several milestones: it is the industry's first supercomputer with a peak performance higher than 2 ExaFLOPS and the first Intel-based ExaFLOPS-class machine. Finally, it marks the conclusion of a saga that began eight years ago and has seen its fair share of bumps along the way.

Originally unveiled in 2015, Aurora was initially intended to be powered by Intel's Xeon Phi co-processors and was projected to deliver approximately 180 PetaFLOPS in 2018. However, Intel decided to abandon the Xeon Phi in favor of compute GPUs, resulting in the need to renegotiate the agreement with Argonne National Laboratory to provide an ExaFLOPS system by 2021.

The delivery of the system was further delayed by complications with Ponte Vecchio's compute tile: the delay of Intel's 7 nm (now known as Intel 4) production node forced Intel to redesign the tile for TSMC's N5 (5 nm-class) process technology. Intel finally introduced its Data Center GPU Max products late last year and has now shipped over 60,000 of these compute GPUs to ANL.

Source: Intel

39 Comments

  • speedping - Friday, June 23, 2023 - link

    At this point it's reckless to not find a way to use all that waste heat to generate electricity.
  • The Von Matrices - Friday, June 23, 2023 - link

    The waste heat is too cool to be of use for electrical generation. It may be useful for heating buildings and water though.
  • Samus - Saturday, June 24, 2023 - link

    Read about Argonne's CHP plant. It is effectively the world's largest heat pump. It reclaims heat in the summer to produce electricity, and generates steam as a byproduct of electricity in the winter to create heat that is piped to multiple buildings throughout the campus. It is an experiment/proof of concept that went online in 2016 to use the existing steam pipe infrastructure left over from the original steam production facility that used to heat the site dating back a century.
  • Vendicar - Tuesday, June 27, 2023 - link

    They should use it to generate electricity.
  • duploxxx - Friday, June 23, 2023 - link

    compared with the TOP Frontier which is 4Y old... they will get an estimated 2Tflops vs 1.2TFlops

    however they need more than double the resources

    CPU: 9408 vs 21248
    GPU: 37632 vs 63744

    The Frontier consumes 21MW, the Aurora is estimated at 60MW

    Poor team that ordered this...
  • zsdersw - Monday, June 26, 2023 - link

    Your math is wrong. This thing won't be 2Tflops.. it'll be 2 exaflops. 1 exaflop is 1000 petaflops and 1 petaflop is 1000 teraflops.
  • jvl - Wednesday, June 28, 2023 - link

    Meh, their math is not wrong, just their prefix. And their *point* regarding CPUs is spot on I believe.. Frontier uses ~606k cores, this one needs ~1.1M.
  • wanderer66 - Friday, June 23, 2023 - link

    That's like asking about the fuel efficiency of a top fuel dragster.
  • 3dgaming - Wednesday, July 5, 2023 - link

    POWER is your big issue here? It is interesting to see how much they use, and the megawatt per PETAflop ratio has to be going up - quite a bit - but then again we are getting incredible computing power to model many things and push forward in incredible ways - and remain competitive in the market.
  • shplatt - Thursday, June 22, 2023 - link

    I believe the storage is PB not TB.
