First Thoughts & End Remarks

2020 was indeed a super-exciting year for Arm’s server ambitions, and one can easily claim that the Neoverse N1 has been a resounding success: implementations can be seen as being on the same playing field as the best that AMD and Intel are able to achieve, even against today’s newest generations.

The new Neoverse V1 and N2 continue the story in a two-pronged approach. For the Neoverse V1, when the design was initially teased back in September, I was quite amazed at the claim of +50% IPC. After today’s figures, while the design is still very impressive, the disclosures of the power, area, and resulting power efficiency requirements have somewhat dulled my expectations of the new CPU microarchitecture.

What’s clear about the Neoverse V1 is that this really seems to be an HPC-oriented design. Alongside the known SiPearl Rhea chip, backed by the European Processor Initiative’s goals for HPC uses, Korea’s ETRI (Electronics and Telecommunications Research Institute) also has a V1 design dubbed “K-AB21” in the works, also with hybrid HBM2E and DDR5 memory. Along with today’s announcement of the V1, India’s Centre for Development of Advanced Computing has also announced that they’re a V1 licensee and will be using it in an exascale supercomputer project.

Essentially, it seems the V1 will serve as the foundation of many new custom HPC projects, which is a great win both for Arm as an IP vendor and for its licensees, which are able to build something to their exact needs.

For enterprise and cloud usage, given the CPU’s power efficiency, I now doubt that we’ll see implementations from cloud or merchant silicon vendors such as Amazon or Ampere, particularly because the N2 will be available.

The Neoverse N2 is a more straightforward migration from the N1. IPC is improved by a significant amount, which should result in good generational performance increases. I do have concerns about power efficiency, as the performance increases come at a linear cost in power. There’s a one-time opportunity to increase performance in many workloads by closing the power gap for workloads which do not fully fill the TDP of a system today (while throttling others); however, any further performance increases beyond that depend on actual good physical implementations by the vendors that fully take advantage of the next-generation process nodes and execute on those theoretical gains. We’ll see how that pans out – for now I’ll give Arm the benefit of the doubt, however we’ll also see similar gains in 5nm designs from the likes of AMD. How the competitive situation will end up in 2022 remains to be seen.

Arm also noted that while the N2 is a newer-generation IP than the V1, roughly a year apart in design, the company actually expects N2 products to come out only shortly after V1 products, sometime by the end of this year. This further reinforces my view that we’ll probably not see many V1 designs outside of the HPC market, and that Amazon and Ampere are likely to follow up with N2-based Gravitons and Altras. I want to be explicit here that none of the usual cloud vendors / CSPs / hyperscalers have yet officially commented on what kind of IP they'll be using in their next-generation designs.

The star of the show today was, I think, the CMN-700, and the vast new flexibility it allows vendors to achieve. The new architectural improvements and the move towards CCIX 2.0 and CXL are definite big advances that will allow licensees to create more exotic designs. At the very least, it allows for effective use of chiplet architectures, a much-needed capability that vendors need to adopt to ensure the affordability and manufacturability of products on leading-edge nodes.

I’ll be looking forward to new V1 and N2 designs in 2022, and hope we’ll hear more details from licensees through the course of the year.

Comments

  • nandnandnand - Tuesday, April 27, 2021 - link

    Looking at Cortex-X-next. It seems like Arm can put out a new Cortex-X for every new Cortex-A78 successor, since the Cortex-X is very similar but bigger.
  • mode_13h - Tuesday, April 27, 2021 - link

    From an earlier article:

    > The Cortex-X1 was designed within the frame of a new program at Arm,
    > which the company calls the “Cortex-X Custom Program”.
    > The program is an evolution of what the company had previously
    > already done with the “Built on Arm Cortex Technology” program
    > released a few years ago. As a reminder, that license allowed
    > customers to collaborate early in the design phase of a new
    > microarchitecture, and request customizations to the configurations,
    > such as a larger re-order buffer (ROB), differently tuned prefetchers,
    > or interface customizations for better integrations into the SoC designs.
    > Qualcomm was the predominant benefactor of this license,
  • Alistair - Tuesday, April 27, 2021 - link

    I just want to be able to use ARM in standard DIY with an Asus motherboard and a socket, just like AMD and Intel.
  • mode_13h - Tuesday, April 27, 2021 - link

    I wonder if Nvidia will put out a Jetson-style board in something like a mini-ITX form factor.
  • Alistair - Wednesday, April 28, 2021 - link

    i sure hope so, and something not massively overpriced like right now
  • mode_13h - Thursday, April 29, 2021 - link

    Yeah, because Nvidia is known for their bargain pricing!
    ; )

    Although, if they wanted to create a whole new product segment, it's conceivable they might keep prices rather affordable for a couple generations.
  • nandnandnand - Wednesday, April 28, 2021 - link

    I want it. You want it. Some people seem to want it. Maybe demand is forming? Get on it, China.

    16-core Cortex-X2 please.
  • mode_13h - Wednesday, April 28, 2021 - link

    They already did, sort of. See: https://e.huawei.com/us/products/servers/kunpeng/k...

    Whoops! Had to get this out of Google cache, because the page 404'd:

    Board Model: D920S10
    Processors: 1 Kunpeng 920 processor, 4/8 cores, 2.6 GHz
    Internal Storage: 6 SATA 3.0 hard drive interfaces, 2 M.2 SSD slots
    Memory: 4 DDR4-2666 UDIMM slots, up to 64 GB
    PCIe Expansion: 1 PCIe 3.0 x16, 1 PCIe 3.0 x4, and 1 PCIe 3.0 x1 slots
    LOM Network Ports: 2 LOM NIC, supporting GE network ports or optical ports
    USB: 4 USB 3.0 and 4 USB 2.0
  • mode_13h - Tuesday, April 27, 2021 - link

    Do any of the current x86 cores pair up SSE operations for >= 4x throughput per cycle?

    AVX2 has been around for long enough that a lot of the code which could benefit from it has already been written to do so, yet *most* people are still compiling to baseline x86-64 (or just above that), since Intel is still making low-power cores without any AVX. So, I'm sure there's still *some* code that could benefit from >= 4x SSEn execution.
  • AntonErtl - Wednesday, April 28, 2021 - link

    Zen has 4 128-bit FP units (2 FMA and 2 FADD). Not sure if that's what you are interested in.
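As an aside on the compilation-baseline point raised in the comments above, here is a minimal sketch, in C, of why vector width still matters for plain SSE code. The file and function names are only illustrative; the flags shown reflect the usual auto-vectorization behavior of GCC/Clang at -O3, where baseline x86-64 implies SSE2 (128-bit, 4 floats per operation), and 256-bit AVX2/FMA code is only generated when explicitly enabled via -mavx2 -mfma or -march=x86-64-v3 (GCC 11+ / Clang 12+).

    /* saxpy.c - illustrative only: y[i] = a*x[i] + y[i]
     *
     *   gcc -O3 saxpy.c                   -> SSE2: mulps/addps, 4 floats per op
     *   gcc -O3 -mavx2 -mfma saxpy.c      -> AVX2+FMA: vfmadd231ps, 8 floats per op
     *   gcc -O3 -march=x86-64-v3 saxpy.c  -> same as above (the v3 level includes AVX2/FMA)
     */
    #include <stddef.h>

    void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
    {
        /* Simple, dependency-free loop that auto-vectorizers handle well. */
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

On a core with four 128-bit FP pipes, as in the Zen example above, even the SSE2 build of a loop like this can sustain multiple 128-bit operations per cycle, which is the kind of "paired-up SSE" throughput the original question was getting at.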
