It was recently announced that the Fugaku supercomputer, located at Riken in Japan, has taken the #1 position on the TOP500 supercomputer list, as well as #1 positions in a number of key supercomputer benchmarks. At the heart of Fugaku isn’t a standard x86 processor, but one based on Arm: specifically, the A64FX 48+4-core processor, which uses Arm’s Scalable Vector Extension (SVE) to enable high-throughput FP64 compute. At 415 PetaFLOPs and 7.3 million cores, Fugaku beat the former #1 system by 2.8x in performance. Fugaku is currently being used for COVID-19 related research, such as modelling tracking rates or the dispersion of the virus in liquid droplets.

The Fujitsu A64FX card is a unique piece of kit, offering 48 compute cores and 4 control cores, with monumental memory bandwidth to keep the 512-bit wide SVE units fed. The chip runs at 2.2 GHz, and can operate in FP64, FP32, FP16 and INT8 modes for a variety of AI applications. There is 1 TB/sec of bandwidth from the 32 GB of HBM2 on each card, and because there are four control cores per chip, it runs standalone, with no need for an external host/device arrangement.
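To put those figures in context, here is a rough back-of-the-envelope sketch of peak FP64 throughput and memory bandwidth per FLOP. The core count, vector width, clock speeds and 1 TB/sec figure come from the article; the two FMA pipes per core, the choice to count only the 48 compute cores, and the 1 TB/sec = 1000 GB/sec conversion are assumptions made for illustration, and the function names are hypothetical.

```python
def peak_fp64_gflops(cores, clock_ghz, vector_bits=512, fma_pipes=2):
    """Theoretical peak FP64 GFLOPs for a SIMD chip.

    fma_pipes=2 is an assumption (not stated in the article).
    Each fused multiply-add counts as two floating point operations.
    """
    lanes = vector_bits // 64              # FP64 lanes per vector unit
    flops_per_cycle = lanes * 2 * fma_pipes
    return cores * clock_ghz * flops_per_cycle


def bytes_per_flop(bandwidth_gb_s, gflops):
    """Memory bytes available per peak FP64 operation."""
    return bandwidth_gb_s / gflops


HBM2_GB_S = 1000.0  # "1 TB/sec", assumed to mean 1000 GB/sec

# Only the 48 compute cores are counted; the 4 control cores are excluded.
chip_2_2 = peak_fp64_gflops(cores=48, clock_ghz=2.2)  # clock quoted for the chip
chip_1_8 = peak_fp64_gflops(cores=48, clock_ghz=1.8)  # clock quoted for the FX700

for label, gflops in (("2.2 GHz", chip_2_2), ("1.8 GHz", chip_1_8)):
    ratio = bytes_per_flop(HBM2_GB_S, gflops)
    print(f"{label}: {gflops:.1f} GFLOPs peak, {ratio:.2f} bytes/FLOP")
```

These are theoretical peaks under the stated assumptions, not measured results; sustained performance on real workloads will be lower.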

It was never clear whether the A64FX module would be available on a wider scale beyond supercomputer sales, but today's announcement confirms that it is: Japan-based HPC Systems is set to offer a Fujitsu PrimeHPC FX700 server that contains up to eight A64FX nodes (at 1.8 GHz) within a 2U form factor. Each node is paired with 512 GB of SSD storage and gigabit Ethernet, with room for expansion (InfiniBand EDR, etc.). The current deal at HPC Systems is for a 2-node implementation at a price of ¥4,155,330 (~$39,000 USD), with the deal running to the end of the year.

The A64FX card already has listed support for the quantum chemical calculation software Gaussian16, the molecular dynamics software AMBER, and the non-linear structure analysis software LS-DYNA. Other commercial packages in the structure and fluid analysis fields will be coming on board in due course. There is also Fujitsu’s Software Compiler Package v1.0 to enable developers to build their own software.

Source: HPC Systems, PDF Flyer


32 Comments


  • saratoga4 - Friday, June 26, 2020

    >This architecture kind of begs the question, what does an x86 CPU with HBM on-package perform like?

    Probably similar to standard DDR4. Even the big Skylake-SP dies have a more limited number of cores than typical high bandwidth applications like GPUs or these ARM vector accelerators, so having huge numbers of parallel memory channels doesn't make as much sense. You just need enough channels to keep up with your demand, having more than you need doesn't make individual accesses any faster.
  • MenhirMike - Friday, June 26, 2020

    The naming of the CPU still makes me do a double take on whether AMD just came out with a new Athlon 64 FX-series :)

    I think it's interesting how in the span of a month or so, we went from "ARM in the cloud is nice and all, but there's no real desktop systems to develop on" to several options (including whatever Apple's gonna do), though of course, a $40000 machine is not a developer desktop machine. Then again, for essentially having a slice of a TOP500 supercomputer, it's not bad pricing?

    Any way, good times for ARM outside of just mobile devices ahead.
  • MenhirMike - Friday, June 26, 2020

    (Sidenote: I do wish that CPU companies would offer defective chips as souvenir/decorative pieces. I wouldn't mind wallmounting one of these next to an Itanium and Opteron, but I doubt these will show up on eBay for less than 100 bucks anytime soon :P)
  • Deicidium369 - Friday, June 26, 2020

    https://www.youtube.com/watch?v=rUieSdFbLA4 - goes a little past Itanium and Opteron
  • thetrashcanisfull - Friday, June 26, 2020

    Seems less appealing without the built in TOFU interconnect. Unless that is (hopefully) used for nodes within a single chassis?
  • Stele - Saturday, June 27, 2020

    Nah, this card's meant to interface with a string of others in a self-contained computing pod, so it uses the EDAMAME interconnect instead.
  • SuperiorSpecimen - Saturday, June 27, 2020

    Fantastic! I lol'd
  • ozzuneoj86 - Friday, June 26, 2020

    I still read it as "Athlon 64 FX". I just can't help it.
  • Oxford Guy - Saturday, June 27, 2020

    Obviously, it was intentional. Creative naming is not this company's strong suit, clearly.

    Ripping someone else off, though, is.

    I am strongly reminded of a certain company's laptops that look almost exactly like a MacBook Pro.
  • ravyne - Sunday, June 28, 2020

    2 nodes for 40k seems like not a great deal TBH, and I'm very much a fan of the A64FX otherwise. That's only rough parity to the cost of a modest server and a couple high-end Tesla cards, and not much different in practice since both platforms are effectively SIMD architectures. Tough to sell a newcomer at the same price as an incumbent without offering something very different, or at least significantly undercutting their price. 4 Nodes at 40-50k and you'd really be talking; 8 nodes at 65-70k even better.
