Arm’s ambitions for the server market have made for a very long journey that’s taken years to materialise. After many doubts and false starts, today in 2020 nobody can deny that server chips powered by the company’s CPU IP are not only competitive, but actually class-leading on several metrics.

Amazon’s Graviton2 64-core Neoverse N1 server chip is the first of what should become a wider range of designs that will be driving the Arm server ecosystem forward and actively assaulting the infrastructure CPU market share that’s currently dominated by the x86 players such as Intel and AMD.

The journey has been a long one, but it has its roots in roadmaps publicly laid out by the company back in 2018. Fast-forward to 2020, and not only have we seen products with the first-generation Neoverse N1 infrastructure CPU IP hit the market in commercial and publicly available form, but we’ve also seen the company exceed its targeted 30% generational gain by a factor of 2x.

The Neoverse V1: A New Maximum Performance Tier Infrastructure CPU

Today, we’re ready to take the next step towards the next generation of the Neoverse platform, not only revealing the CPU microarchitecture previously known as Zeus, but a whole new product category that goes beyond the Neoverse N-series: Introducing the new Neoverse V-series and the Neoverse V1 (Zeus), as well as a new roadmap insertion in the form of the Neoverse N2 (Perseus).

The new Neoverse V1 introduces the new V-series into Arm’s infrastructure IP portfolio, and essentially this represents the company’s push for higher absolute performance, no matter the cost.

Earlier this spring we covered the company’s new mobile Cortex-X1 CPU IP, which represented a significant business model change for Arm: Instead of offering only a single one-size-fits-all CPU microarchitecture which licensees had to make do with in a wide range of designs and performance points, we’ve now seen a divergence of the microarchitectures, with one IP offering now focusing on pure maximum performance (Cortex-X1), no matter the area or power cost, while the other design (Cortex-A78) focuses on Arm’s more traditional maximised PPA (Power, Performance, Area) design philosophy.

The Zeus microarchitecture in the form of the Neoverse V1 is essentially the infrastructure counterpart to what Arm has achieved in the mobile IP offering with the Hera Cortex-X1 CPU IP: A focus on maximum performance, with a lesser regard to power and area.

This means that the V1 has significantly larger caches and core structures, using up more area and power to achieve unprecedented performance levels.

In terms of generational performance uplift, it’s akin to Arm throwing down the gauntlet to the competition, achieving a ground-breaking +50% IPC boost compared to the Neoverse N1 that we’re seeing in silicon today. The performance uplift potential here is tremendous, as this is merely a same-process, iso-frequency upgrade, and actual products based on the V1 will in all likelihood also see additional performance gains thanks to increased frequencies through process node advancements.

If we take the conservatively clocked Graviton2 with its 2.5GHz N1 cores as a baseline, a theoretical 3GHz V1 chip would represent an 80% uplift in per-core single-threaded performance. Not only would such a performance uptick vastly exceed any current x86 competition in the server space in terms of per-core performance, it would be enough to match the current best high-performance desktop chips from AMD and Intel today (Though we have to remember it’ll compete against next-gen Zen3 Milan and Sapphire Rapids products).
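The 80% figure follows from combining the IPC gain with the frequency ratio. A minimal sketch of that arithmetic, assuming per-core performance scales as IPC times clock frequency (a simplification that ignores memory and other system effects); the function name is purely illustrative:

```python
def relative_performance(ipc_gain, base_ghz, new_ghz):
    """Per-core speedup, assuming performance = IPC x frequency.

    ipc_gain is a fraction (0.50 for +50%). This simple model ignores
    memory latency, bandwidth and other system-level effects.
    """
    return (1.0 + ipc_gain) * (new_ghz / base_ghz)


# Figures from the text: +50% IPC, 2.5GHz Graviton2 baseline,
# and a theoretical 3GHz V1 part.
speedup = relative_performance(ipc_gain=0.50, base_ghz=2.5, new_ghz=3.0)
uplift_percent = round((speedup - 1.0) * 100)
print(f"Estimated per-core uplift: {uplift_percent}%")
```

In other words, 1.5x from IPC multiplied by 1.2x from clocks gives roughly 1.8x.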

Neoverse N2 is Perseus – Continues the PPA Focus

Alongside the Neoverse V1 platform, we’ve seen a roadmap insertion that previously wasn’t there. The Perseus design will become the Neoverse N2, and will be the effective product-positioning successor to the N1. This new CPU IP represents a 40% IPC uplift compared to the N1, but still maintains the same design philosophy of maximising performance within the lowest power and smallest area.

It can be a bit confusing when it comes to the microarchitectural generations that we’re talking about here, so I made a graph to illustrate what we could call generational siblings between Arm’s mobile, and server CPU IP:

Although this is just a general rough outline of Arm’s products, the important thing to note is that there are similarities between generations of Cortex and Neoverse products, as they’ve been developed in tandem at similar moments in time during their design. The Neoverse N1 was developed in conjunction with the Cortex-A76, and thus the two microarchitectures can be regarded as sibling designs as they share a lot of similarities.

The Neoverse V1 can be regarded as a sibling design to the Cortex-X1, likely sharing a lot of the supersized core structures that had been developed for these two flagship CPUs.

The Neoverse N2 is a bit more special as it represents the sibling design to a next-generation Cortex-A core which is the follow-up to the A78. Arm says they’ll be licensing out this “Perseus” design by the end of the year and that customers already are engaging on beta RTL – we’re likely to hear more about this generation of products at next year’s TechDay event. The N2 would be lagging behind the V1 by one year and subsequently it'll take more time to see this in products.

As a note, the above designs are all based in Austin and can be regarded as being in the same microarchitecture family that started off with the Cortex-A76. If I’m not mistaken, next-generation “Poseidon” designs will be on a fresh new microarchitecture started by Arm’s Sophia-Antipolis design team – although Arm does note that there’s a lot more collaboration and blur between the different teams nowadays. Here Arm already notes a +30% IPC uplift for this generation of designs, likely to hit products in 2023.

An Undisclosed Architecture with SVE: Armv9?

One very notable characteristic of both the Neoverse V1 and N2 is the fact that they now support SVE (Scalable Vector Extension), with the V1 having two native 256-bit pipelines and the N2 being a 2x128-bit design. The advantage of SVE over other SIMD ISAs is the fact that code written in it can scale with the varying execution width of a microarchitecture, something that’s just not possible with today’s Neon or AVX SIMD instructions.
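To illustrate what "scaling with the execution width" means, here is a plain-Python simulation of a vector-length-agnostic loop. It is only a sketch of the idea and does not use real SVE instructions or intrinsics: the function name, the 32-bit element size and the lane-by-lane inner loop are all assumptions made for illustration. The loop asks for the lane count instead of hard-coding it, and a per-lane predicate masks off the tail, loosely mirroring how SVE loops are typically structured.

```python
def vla_add(a, b, vector_bits, element_bits=32):
    """Simulate a vector-length-agnostic element-wise add.

    vector_bits stands in for the hardware vector width (e.g. 128 or 256).
    The loop body never assumes a fixed width; it derives the lane count
    and uses a predicate to handle the final partial vector.
    """
    lanes = vector_bits // element_bits  # lanes processed per iteration
    n = len(a)
    out = [0] * n
    iterations = 0
    i = 0
    while i < n:
        # Predicate: a lane is active only if its index is still in range.
        active = [i + lane < n for lane in range(lanes)]
        for lane, is_active in enumerate(active):
            if is_active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
        iterations += 1
    return out, iterations


a = list(range(10))
b = [10] * 10

# The same code runs unchanged at both widths; only the trip count differs.
result_128, iters_128 = vla_add(a, b, vector_bits=128)
result_256, iters_256 = vla_add(a, b, vector_bits=256)
print(result_128 == result_256, iters_128, iters_256)
```

The same source produces identical results at either width, with the wider configuration simply needing fewer loop iterations. A fixed-width SIMD ISA would instead need separate code paths for each vector width.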

Fujitsu’s A64FX chip and custom core microarchitecture had been to date the only CPU announced and available with SVE, meaning the V1 and N2 will be Arm’s first own designs actually implementing SVE.

Today’s announcements around this part of the V1 and N2 CPUs raised more questions than they answered, as the company wasn’t willing to disclose whether this support referred to the first-generation SVE instruction set, or whether they already supported SVE2.

In fact, the company wouldn’t confirm even the base architecture of the designs, whether these were Armv8 designs or one of the subsequent iterations. This is extremely unusual for the company as it’s traditionally transparent on such basic aspects of its IPs.

What I think is happening here is that the V1 and N2 might be both Armv9 designs, and the company will be publicly revealing the new ISA iteration sometime between today’s announcement and mid next year at the latest – of course this is all just my own interpretation of the situation as Arm refused to comment on the topic.

Update: Actually it does seem that Arm had already publicly upstreamed the initial compiler entries to GCC for Zeus back in June, confirming that at least the Neoverse V1 is an Armv8.4+SVE(1) design. I still think the N2 might be a v9+SVE2 design.

At the end of the day, what we end up with are two extremely compelling new microarchitectures that significantly push Arm’s positioning in the infrastructure market. The Neoverse N2 is an obvious design that focuses on Arm’s PPA metrics, and the company sees customers designing products that are primarily focused on “scale-out” workloads that require a lot of CPU cores. Here we could see designs up to 128 cores.

The Neoverse V1 will see designs with lower core counts as the CPUs are just bigger and more power hungry. Arm sees the 64-96 core range as being what’s most likely to be adopted by licensees. These are the premier products that will be going against the best of what Intel and AMD have to offer, and if the performance projections pan out (as they usually do for Arm), then we’re in for a brutally competitive fight unlike any we’ve seen before.

The first publicly known design confirmed to employ the new Neoverse V1 cores is SiPearl’s “Rhea” chip that looks to feature 72 cores in a 7nm TSMC process node. Ampere’s “Siryn” design would also be a candidate for applying the V1 microarchitecture, targeted for a 2022 release on TSMC’s 5nm node.

Today’s announcement has been more of a teaser or unveiling, with the company planning to go into more details about the architecture and microarchitectures of the designs at a later date. Arm's DevSummit is scheduled for October 6-8th - and might be where we'll hear a bit more about the new architecture.


73 Comments


  • Tabalan - Tuesday, September 22, 2020 - link

    For vast majority of society x86 could die and they wouldn't see difference if they could use browser, office, some simple programmes and games on their PC/laptop. And you already have those programmes for ARM (iPads, Android).
  • serendip - Wednesday, September 23, 2020 - link

    Linux already runs native ARM code using WSL on Windows on ARM, like on the Surface Pro X. The new Edge browser is ARM native and Office 365 is also ARM native with an x86 plug-in interface.

    x86-32 translation is usable but slow. x86-64 support is expected to arrive next year so that should take care of older programs that can't be recompiled for ARM. Bit by bit, piece by piece, an ARM native ecosystem is appearing on the consumer side. On servers, Linux and most open source tools have been available on ARM for years.
  • Railander - Tuesday, September 22, 2020 - link

    the backwards compatibility IMO is the strongest argument in favor of x86. it's literally why windows is king on the desktop. frankly windows only didn't dominate the server market for coming late to the party, having a good chunk of your software just break every 1-5 years is a nightmare to every sysadmin and software dev.
  • michael2k - Wednesday, September 23, 2020 - link

    5nm parts? Intel at least can't release a 5nm x86 part and are stuck at 10nm. AMD's Zen3 is currently manufactured at 7nm, so we won't be seeing 5nm parts until next year, possibly later if the rumors of Apple buying all of TSMC's 5nm capacity are true.

    So the question is who is buying Samsung's 5nm capacity. Those are the ARM chips most likely to compete with Intel and AMD in the short term. Even without Samsung's 5nm capacity, there is still Samsung and TSMC's 7nm/8nm capacity, which still beats Intel and matches AMD.
  • dotjaz - Sunday, September 27, 2020 - link

    Apple can't buy 100% of TSMC's 5nm capacity for more than a quarter or two. TSMC can push more than 50k wafers per month. Apple can get around 500 A14 per wafer. Maybe 300 "A14X". Let's say 450 per wafer on average because A14 is clearly the vast majority, that's over 22 million chips per month. Don't forget by mid next year TSMC could increase capacity to 100k per month.
  • Samus - Wednesday, September 23, 2020 - link

    Those monolithic CPU designs didn't have to compete with the scalability things do now. At the moment and the foreseeable future, it's all about density and performance/watt.

    x86 inherently cannot compete without ditching its legacy roots, at which point, what is the point?
  • abufrejoval - Wednesday, September 23, 2020 - link

    "It needs to be drastically better and execute all x86 software notably faster in order to replace the x86"

    That's where you err: x86 binary software compatibility is no issue for native cloud workloads (vs. cloud hosting of legacy), because all of that stuff is just re-compiled for ARM.

    And while the inherent architectural advantages may not be huge, they seem to be significant enough to allow squeezing significantly more computing out of the same transistor and Watt budget: How much of that is ARM, the architecture vs. ARM the CPU designer I don't know, but for the success vs. x86, that's what counts, power being the main operational expense in any cloud.

    And I can't help thinking that there must be something in the ISA, too, because Intel failed to deliver similar amounts of computing power in the <5Watt mobile handset space: It was either more power or less performance, often enough both. Don't think it was because Intel engineers are lazy or stupid, so that leaves something 'inherent' like the ISA still on the table.

    And SVE is really, really helpful creating portable libraries for HPC-type code.
  • grant3 - Wednesday, September 23, 2020 - link

    Trying to compare the fates of Alpha, Sparc, PowerPC, etc. with ARM is flawed. The IT landscape is nothing like it was back then.

    Now we have billions of pre-existing ARM devices deployed into the hands of consumers, an army of millions of software engineers who are already making arm-compatible software, linux is a seasoned, enterprise-level server platform, and cloud-computing has abstracted hardware so completely that the underlying CPU is irrelevant to most organizations.

    No one needs to convince people to reduce their dependence on x86 processors- the inevitable march of progress has already done it.
  • sing_electric - Wednesday, September 23, 2020 - link

    The difference between ARM and PPC, Alpha, SPARC, MIPS, etc. is that for the past decade, WAY more ARM CPUs have been sold than x86. Sure, a lot end up in phones, tablets, smart toasters, etc., but for consumer-facing applications, a lot already run on ARM. In fact, we're nearing a point where the boxes a developer needs to check are frequently iOS, Android, web, eschewing desktop entirely.

    x86 emulation is not NEARLY as important as it was even 5 years ago. Server and DC is a different story (but you generally don't trust 'mission critical' apps to emulation if you don't have to).
  • 0ldman79 - Tuesday, September 22, 2020 - link

    The biggest thing holding back ARM is legacy x86 software.

    If they get emulation in hardware or software running good enough, doesn't have to match performance just 100% accurate emulation, then Intel and AMD have a problem.

    We're at the point where even low end x86 is plenty for 99% of the population. If they can beat a quad core without SMT enabled then it'll be good enough.

    That being said 90% of the population only use apps and browse the web, legacy software support is just keeping them out of the mainstream mostly because it has in the past, it's not even an issue until you start talking about commercial applications. Most businesses have some low volume custom software running their business. They can still run ARM for their personal computers.
