For at least four years, Arm has been pushing to become a true enterprise player in the modern server, the data center, and the cloud. Arm cores are already found in plenty of places in the server world, with big deployments of its smartphone-focused Cortex core family in large chips. However, for those same four years, we have been asking for a high-performance core that can compete with x86 in single-threaded workloads. That core is Ares, due out in 2019, and while Arm hasn’t officially lifted the lid on the details yet, Huawei has already announced hardware with Ares cores at its center.

Huawei Is A BIG Company

Normally at AnandTech when we discuss Huawei, it is in the context of smartphones and devices such as the Mate 20, or smartphone chips like the Kirin family. These both fall under Huawei’s ‘Consumer Business Group’, which accounts for just under half of the company’s revenue. One of Huawei’s other groups is its Enterprise wing, which is almost as big, and it creates a lot of custom hardware and silicon using its in-house design team, HiSilicon. HiSilicon’s remit goes all the way from smartphones to modems to SSD controllers to PCIe controllers and also high-performance enterprise compute processors.

...And It Makes Server CPUs

Last month, Huawei’s Enterprise Group lifted the lid on its fourth generation data center processor. Part of the TaiShan family, the Hi1620 follows hardware such as the Hi1616 in being built using Arm IP. The new Hi1620 was announced as the world’s first 7nm processor for the data center, with the Ares cores providing the high performance for its deployments.

While Huawei didn’t have any Hi1620 at the show, it was promoting the fact that it will be a cornerstone in its portfolio, and lifted the lid on a number of key parts of the chip.

Huawei Hi16xx Family

                  Hi1620            Hi1616            Hi1612            Hi1610
Announced         2018              2017              2016              2015
Cores             24 to 64          32                32                16
Architecture      Ares              Cortex-A72        Cortex-A57        Cortex-A57
Frequency (GHz)   2.4 to 3.0        2.4               2.1               2.1
L1                64 KB L1-I        48 KB L1-I        48 KB L1-I        48 KB L1-I
                  64 KB L1-D        32 KB L1-D        32 KB L1-D        32 KB L1-D
L2                512 KB private    1 MB/4 cores      1 MB/4 cores      1 MB/4 cores
L3                1 MB/core shared  32 MB CCN         32 MB CCN         16 MB CCN
Memory            8x DDR4-3200      4x DDR4-2400      4x DDR4-2133      2x DDR4-1866
Interconnect      Up to 4S          Up to 2S          ?                 ?
                  240 Gbps/port     96 Gbps/port
IO                40 PCIe 4.0       46 PCIe 3.0       16 PCIe 3.0       16 PCIe 3.0
                  2 x 100 GE        8 x 10 GE
Process           TSMC 7nm          TSMC 16nm         TSMC 16nm         TSMC 16nm
Power             100 to 200 W      85 W              ?                 ?

The new Hi1620 will feature 24 to 64 cores per socket, running from 2.4 to 3.0 GHz. Each of these cores will have a 64 KB L1-Data cache and a 64 KB L1-Instruction cache, with 512 KB of private L2 cache per core. The shared L3 comes in at 1 MB per core, for up to 64 MB on the largest part. Compared with a consumer Skylake core, that means more L2 cache per core, but less L3. No word on associativity, however. One of the key question marks is performance: a lot of vendors are hoping for an Arm core with Skylake-levels of raw performance.
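To put the cache figures in context, here is a minimal sketch that totals the L2 and L3 across a socket and compares the per-core numbers against a consumer Skylake core. The Hi1620 figures come from Huawei's announcement; the Skylake figures (256 KB L2, 2 MB of L3 per core) and all names in the code are our own assumptions for illustration, as is the idea that L3 scales linearly with core count.

```python
# Per-core cache sizes in KB. Hi1620 values are from Huawei's announcement.
HI1620 = {"l1i": 64, "l1d": 64, "l2": 512, "l3_per_core": 1024}

# Assumption (not from Huawei): consumer Skylake per-core cache sizes.
SKYLAKE_CLIENT = {"l1i": 32, "l1d": 32, "l2": 256, "l3_per_core": 2048}


def socket_cache_mb(per_core_kb, cores):
    """Total L2 and L3 across a socket in MB, assuming L3 scales with cores."""
    return {
        "l2_total_mb": per_core_kb["l2"] * cores / 1024,
        "l3_total_mb": per_core_kb["l3_per_core"] * cores / 1024,
    }


for cores in (24, 64):
    print(cores, "cores:", socket_cache_mb(HI1620, cores))

# Per-core comparison: more L2 than Skylake, but less L3.
print("L2 ratio:", HI1620["l2"] / SKYLAKE_CLIENT["l2"])
print("L3 ratio:", HI1620["l3_per_core"] / SKYLAKE_CLIENT["l3_per_core"])
```

On these assumptions, a 64-core part carries 32 MB of L2 in total alongside its 64 MB of L3.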

Memory is set at 8 channels of up to DDR4-3200, and the chip will support multi-socket configurations up to 4S, with the coherent SMP interface rated at 240 Gbps per port for chip-to-chip communication. The 4S layout would be a fully connected design.
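As a rough guide to what those memory numbers mean, the sketch below works out theoretical peak DRAM bandwidth per socket and the number of links a fully connected multi-socket layout needs. It assumes a standard 64-bit (8-byte) DDR4 channel, which Huawei did not spell out, and the function names are ours. Sustained bandwidth in practice will be lower than these peak figures.

```python
def ddr4_peak_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s.

    Assumption: each DDR4 channel is 64 bits (8 bytes) wide.
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000


def full_mesh_links(sockets):
    """Number of point-to-point links when every socket connects to every other."""
    return sockets * (sockets - 1) // 2


print("Hi1620, 8x DDR4-3200:", ddr4_peak_gbs(8, 3200), "GB/s")
print("Hi1616, 4x DDR4-2400:", ddr4_peak_gbs(4, 2400), "GB/s")
print("Links in a fully connected 4S system:", full_mesh_links(4))
print("Ports needed per socket in 4S:", 4 - 1)
```

By this estimate the Hi1620 has a little over 2.5x the peak memory bandwidth of the Hi1616, and a fully connected 4S system needs three coherent ports on every socket.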

IO for the Hi1620 is set at 40 PCIe 4.0 lanes, which is fewer than the 46 lanes on the Hi1616, but those were only rated for PCIe 3.0. The Hi1620 will also have CCIX support, as well as dual 100GbE MACs, some USB 3.0, and some SAS connectivity.
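Fewer lanes does not mean less bandwidth. The following sketch estimates aggregate PCIe bandwidth per direction for both chips, using the standard per-lane signalling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b encoding. Those rates come from the PCIe specifications rather than from Huawei, the names are ours, and protocol overhead beyond line encoding is ignored.

```python
# Per-lane raw signalling rate in GT/s (from the PCIe specs, not from Huawei).
PCIE_GT_PER_S = {"3.0": 8, "4.0": 16}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gbs(lanes, generation):
    """Approximate usable bandwidth per direction in GB/s, ignoring packet overhead."""
    per_lane_gbs = PCIE_GT_PER_S[generation] * ENCODING_EFFICIENCY / 8
    return lanes * per_lane_gbs


hi1620 = pcie_bandwidth_gbs(40, "4.0")
hi1616 = pcie_bandwidth_gbs(46, "3.0")
print(f"Hi1620, 40 lanes of PCIe 4.0: {hi1620:.1f} GB/s")
print(f"Hi1616, 46 lanes of PCIe 3.0: {hi1616:.1f} GB/s")
```

Under these assumptions the Hi1620 comes out at roughly 79 GB/s per direction against roughly 45 GB/s for the Hi1616, despite having six fewer lanes.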

The package listed is a 60x75 mm BGA, which gives no real indication of the size of the chip inside. But that’s a lot of balls on the back, and the package is larger than the 57.5x57.5 mm design from the last generation. Huawei states that the Hi1620 will be offered at TDPs from 100W to 200W depending on core count, and that some chips will be offered that can be fine-tuned for memory-bound workloads.

There are still plenty of unanswered questions, such as the interconnect, but we really want to get to grips with the microarchitecture of Ares to see what is under the hood. A number of journalists at the show were predicting that Arm would hold an event in the first half of 2019 to lift the lid on the design of the core.

12 Comments

  • Kevin G - Monday, January 07, 2019

    Diversification can help a company weather a storm that takes out their cash crop. In the case of Intel, yes, FPGA and Optane can help the company weather the CPU threat in the server area from AMD, ARM and IBM/nVidia. What is worrisome is that Optane DIMMs are late (supposed to launch alongside Skylake-SP but delayed due to errata). Their FPGA roadmap inherited from Altera has been steady with the exception of the combo Xeon + FPGA parts. There is the Gold 6138P and some prototypes floating around under NDA. The idea of putting an FPGA into a Xeon package predates Intel buying Altera. Cascade Lake and Intel finally pushing out 10 nm chips in volume will address these two concerns this year.

    Intel also has their Nervana accelerators coming this year, which can be another pillar to support the company during this time of increased CPU competition.
  • peevee - Wednesday, November 21, 2018

    "Arm core with Skylake-levels of raw performance"

    What is "raw performance"? Adding two AVX (or NEON for ARM) registers to each other in a loop on 64 cores gives you one performance figure; real life is usually something 100 times lower, simply because to do anything actually useful the code needs to read and write memory (and the fight starts at the first shared cache actually, not even memory), and mostly in patterns very different from what those vector instructions assume...