2020 has been an extremely successful year for Arm’s infrastructure and enterprise endeavours, as it was the year where we’ve seen fruition of the company’s “Neoverse” line of CPU microarchitectures hit the market in the form of Amazon’s new Graviton2 design as well as Ampere’s Altra server processor. Arm had first introduced the Neoverse N1 back in early 2019 and if you weren’t convinced of the Arm server promise with the Graviton2, the more powerful and super-sized Altra certainly should have turned some heads.

Inarguably the first generation of Arm servers that are truly competitive at the top end of performance, Arm is now finally achieving a goal the company has had in their sights for several years now, gaining real market share against the x86 incumbents.

Fast-forward to 2021, the Neoverse N1 design today employed in designs such as the Ampere Altra is still competitive, or beating the newest generation AMD or Intel designs – a situation that which a few years ago seemed farfetched. We recommend catching up on these important review pieces over the last 2 years to get an accurate picture of today’s market:


(Note: Y axis left chart starts at 50%)

Arm is very open that their main priority with the Neoverse line of products is gaining cloud footprint deployment market share, and as an example of the new-found success is an estimate into Amazon’s own AWS instance additions throughout 2020, where the new Arm-based Graviton2 is said to be the dominant hardware deployment, picking up the majority of share that’s being lost by Intel.

Looking towards 2022 and Beyond

Today, we’re pivoting towards the future and the new Neoverse V1 and Neoverse N2 generation of products. Arm had already tested the new products last September, teasing a few characteristics of the new designs, but falling short of disclosing more concrete details about the new microarchitectures. Following last month’s announcement of the Armv9 architecture, we’re now finally ready to dive into the two new CPU microarchitectures as well as the new CMN-700 mesh network.

As presented back in September, this generation of Neoverse CPU microarchitectures differ themselves in that we’re talking about two quite different products, aimed at different goals and market segments. The Neoverse V1 represents a new line-up for Arm, with a CPU microarchitecture that is aiming itself for more HPC-like workloads and designs oriented towards such markets, while the Neoverse N2 is more of a straight-up successor to the Neoverse N1 and infrastructure and cloud deployments in the same way that the N1 sees itself today in products such as the Graviton or Altra processors.

For readers who are familiar with Arm’s mobile CPU microarchitectures, there’s definitely very large similarities between the designs – even though Arm’s marketing seems to be oddly reluctant to make such kind of comparisons, which is why I made the above chart which more clearly tries to depict the similarities between design generations.

The original Neoverse N1 as seen in the Graviton2 and Altra Q processors had been a derivative, or better said, a sibling microarchitecture, to the Cortex-A76, which had been employed in the 2019 generation of Cortex-A76 mobile SoCs such as the Snapdragon 855. Naturally, the Neoverse designs had server-oriented features and changes that aren’t present in the mobile counterparts.

Similarly to how the N1 was related to the A76, the new generation V1 and N2 microarchitectures are related to newer designs in the Cortex-portfolio. The V1 is related to the Cortex-X1 which we’ve seen in this year’s new mobile SoCs such as the Snapdragon 888 or Exynos 2100. The Neoverse N2 on the other hand is related to an upcoming new Cortex-A microarchitecture which we expect to hear more about in the following few months. Throughout the piece today we’ll make a few more references to this generational disconnect between the V1 and N2, and it’s important to remember that the N2 is a newer design, albeit aimed at different performance and efficiency points.

This decoupling of design goals between the V1 and N2 for Arm comes through the company’s attempt to target more specific markets where the end products might have different priorities, much like how in the mobile space the new Cortex-X series prioritises per-core performance while the Cortex-A series continues to focus on the best PPA. Similarly, the V1 focuses on maximised performance at lower efficiency, with features such as wider SIMD units (2x256b SVE), while the N2 continues the scale-out philosophy of having the best power-efficiency while still moving forward performance through generational IPC improvements.

In today’s piece, we’ll be diving into the new microarchitectural changes of the V1, N2, as well as Arm’s newest generation mesh interconnect IP, the CMN-700, which is expected to serve as the foundation of the next-generation Arm infrastructure processors.

Table of contents:

The Neoverse V1 Microarchitecture: X1 with SVE?
POST A COMMENT

95 Comments

View All Comments

  • yeeeeman - Tuesday, April 27, 2021 - link

    what about a cortex a55 successor? Reply
  • SarahKerrigan - Tuesday, April 27, 2021 - link

    I'd expect to see one next month launching alongside Matterhorn. Reply
  • eastcoast_pete - Tuesday, April 27, 2021 - link

    Hi Sarah, can you post any links (including rumors) about that? Given ARM's focus on bigger, high performance-oriented designs, the LITTLE cores haven't gotten a lot of love in recent years. The persistence of the in-order designs for ARM LITTLE cores is one of the reasons why I find the dominance of ARM troubling; that clearly stood still because there is nowhere else to turn to for many, i.e. they didn't have to change it. In x86, at least we have two larger players having their own, yet compatible designs. Reply
  • SarahKerrigan - Tuesday, April 27, 2021 - link

    I've seen it reported in a few places, including on RWT which is a pain to search - but since task migration generally requires compatible instruction sets between big and little cores, it's pretty clear that Matterhorn will bring a small, low-power friend when it arrives. Reply
  • Raqia - Tuesday, April 27, 2021 - link

    I wonder if they could simply repurpose either a refresh of the A73 or A75 as the little core. Surely with the new fabrication processes available, die area relative to a big Matterhorn core should be comparable to A55 vs A78/X1, but the question becomes performance / energy. Integer performance of A75/73 vs. Ice Storm is comparable with the former winning by a bit in FP, but efficiency is light years apart:

    https://images.anandtech.com/doci/13614/SPEC-perf-...

    https://images.anandtech.com/doci/16192/spec2006_A...
    Reply
  • SarahKerrigan - Tuesday, April 27, 2021 - link

    I think use of a refreshed A65 without multithreading and with the new ops seems more plausible to me. Reply
  • Raqia - Tuesday, April 27, 2021 - link

    That could make sense; there's fairly little information on the micro-architecture of the A65 or A65AE at present except that it does do OoOE, and it's unclear what clocks and efficiency it can achieve as well:

    https://developer.arm.com/documentation/100439/010...

    It does sport a bigger maximum L2 configuration than the A55. They do need to up their game here as the A55 makes a pretty poor showing for efficiency compared to Apple's small core (which got even worse in the A14 generation):

    https://images.anandtech.com/doci/14072/SPEC2006ef...

    At least wattage and hence current draw is low.
    Reply
  • SarahKerrigan - Tuesday, April 27, 2021 - link

    A65 is E1, which has had a uarch dive on this site. Reply
  • Raqia - Wednesday, April 28, 2021 - link

    Got it, thanks for that! The A65 is interesting, without SMT they are quoting a pretty modest bump in integer performance < 20% at a bit more than half the power of A55 at 7nm:

    https://images.anandtech.com/doci/13959/07_Infra%2...

    https://images.anandtech.com/doci/13959/07_Infra%2...

    They could probably tune this to be better without SMT, but are you against having SMT for security reasons?

    It's still not close to Apple's small cores in performance, but efficiency might be in the same ballpark now. ARM designs are quite good in terms of PPA but even their performance oriented X1 is likely only 70% the die area as a Firestorm core, and their cache hierarchies are more complex as core designs pull double duty for servers parts too.

    It probably made sense to have fewer transistors per CPU core as quite a few Android SoC vendors integrated modems on die, but this may change once Qualcomm digests its Nuvia purchase and move to a smaller node. All parties may hit a wall for per core improvements as slowing SRAM density improvements at new nodes bottleneck what gains are gotten from logic density improvements.
    Reply
  • Kangal - Thursday, April 29, 2021 - link

    TL;DR - ARM needs to focus on a new product stack. It needs to have a diverse ARMv9 lineup of small, medium, large chipset options. With the small chipset being very scalable down to Tiny IoT Sensor level. Whereas the large chipset being scalable up to large supercomputers and servers. Whilst the medium chipset focusing on phones and tablets. As this covers full SoC, it includes both CPUs and GPUs.

    Long version:
    I know making these architectures is a huge challenge, but ARM has been a little lazy in some scenarios. I know they're basically following the money in the industry, and that means chasing the "phablet" market for CPUs and GPUs. But they've been leaving themselves vulnerable to gaps, in either smaller power or larger power systems, that can be exploited by competitors, such as RISC-V. If not, even x86 might poke some wins here and there.

    Ages ago, like 2013, they had the A7 (tiny), A15 (small), and A57 (medium) core designs. Basically covering most bases. Along with the Mali-400 iGPU, and 1GB-2GB Shared-RAM, to do some compute tasks. To say ARM was innovative would be a disservice to the technology they brought forward. That's in contrast to x86 Intel's Atom (small) and Intel's Core-i7Y/M (large), as well as Intel Iris Pro iGPU with 8GB Shared-RAM in systems of the time. Then ARM made the leap into 64bit processing around 2016. The lineup evolved into the A35 (tiny), A53 (small), A73 (medium) core designs, running with 1GB-2GB-4GB sRAM, and used modest G31 (tiny) to G51 (small) to G71 (medium) iGPU options. Again, this lineup was very innovative and impressive. Contrast that to the new x86 competition in AMD's 16nm Vega Large-iGPU, and Zen1 Large-CPU.

    However... There hasn't been any upgrades for the "tiny" portfolio, being stuck to the offerings of Cortex A35 CPU and G31 GPU ever since. There has been only a slight refresh to the "small" portfolio, upgrading to the Cortex A55 CPU, and the G52 and G57 iGPUs. To the point that they're a joke, and easily surpassable by the competitors. ARM really needs a revolutionary new design here, it needs to be super-efficient. Perhaps something that can scale between both tiny and small categories: with performance ranging from the A55 (or more) at the "tiny" power-level, to the A73 (or more) at the "small" power-level. Basically catching up to Apple, if unable to surpass them.

    Whereas the "medium" portfolio has seen very frequent upgrades, in the CPU-side to the Cortex A75, A76, A77, and A78. In the GPU-side we've seen G72, G76, G77, G78 which have been mostly competitive, surpassing some custom implementations (Samsung/MediaTek) and losing to others (Apple/Qualcomm). Not much needs to change here to be honest. We've also seen the emergence of a new "large" category of ARM processors. Firstly popularised by custom implementations from Apple (A10 and onwards), then Samsung (Mongoose M3, and onwards). Now it's supported officially by ARM in the form of the Cortex A77+ and the Cortex A78X / X1. This has been mostly underwhelming and uncompetitive, with Apple being the only one implementing good designs. There hasn't been any new "large" category for iGPUs from ARM or competitors, with the only Large-iGPU exception actually being inside the Apple Silicon M1. ARM (without counting Apple) needs to do better here, and it looks like ARM might already be focussing here in the future with ARMv9. Again contrast this to the x86 markets offering 7nm Large-CPUs of Zen2 and Zen3, with RDNA-1 and RDNA-2 Large-GPUs.
    Reply

Log in

Don't have an account? Sign up now