Ampere Roadmap Update: Switching to In-House CPU Designs, 128+ 5nm Cores in 2022

Author: Andrei Frumusanu

by Andrei Frumusanu on May 19, 2021 11:00 AM EST

Today we’re covering some news of the more unusual type: a roadmap update from Ampere, with a closer look at what the company is planning in terms of architectural and microarchitectural choices for its upcoming next-generation server CPUs in 2022 and onwards.

For people not familiar with Ampere, the company was founded back in 2017 by former Intel president Renée James, and was notably built upon a group of former Intel engineers who had left along with her for the new adventure. Initially, the company relied on IP and design talent from the former AppliedMicro’s X-Gene CPUs, and it still supports legacy products such as the eMAG line-up.

With Arm having started a more emphasised focus on designing and releasing datacentre and enterprise CPU IP line-ups in the form of the new Neoverse core offerings a few years back, over the last year or so we have finally seen the fruits of these efforts in the release of several implementations of the first-generation Neoverse N1 server CPU cores, such as Amazon’s Graviton2 and, more importantly, Ampere’s “Altra Quicksilver” 80-core server CPU.

The Altra Q line-up, for which we reviewed the flagship Q80-33 SKU last winter, was inarguably one of the most impressive Arm server CPU executions in past years, with the chip being able to keep up with or beat the best AMD and Intel had to offer, even extending that positioning against the latest generation of Xeon and EPYC.

Ampere’s next-generation "Mystique" Altra Max is the next product on the roadmap, and is targeted to be sampling in the next few months and released later this year. The design relies on the same first-generation Arm Neoverse N1 cores, at the same maximum 250W TDP, as a drop-in replacement on the same platform, but with an optimised implementation that now allows for up to 128 CPU cores – 60% more cores than the first iteration of Altra we have today, and double the number of cores of competitor systems from AMD or Amazon’s Graviton2.
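
As a quick sanity check of the figures above, the core-count increase and the power budget per core can be worked out in a few lines. This is only a back-of-the-envelope sketch: the function names are ours, and dividing the whole 250W TDP evenly across the cores is a simplifying assumption that ignores the power drawn by the mesh, memory controllers and I/O.

```python
# Figures quoted in the article.
ALTRA_CORES = 80        # current Altra flagship core count
ALTRA_MAX_CORES = 128   # upcoming Altra Max core count
TDP_WATTS = 250         # maximum TDP shared by both parts


def core_increase_percent(old_cores: int, new_cores: int) -> float:
    """Percentage increase in core count going from old_cores to new_cores."""
    return (new_cores - old_cores) / old_cores * 100


def naive_watts_per_core(tdp_watts: float, cores: int) -> float:
    """Upper bound on per-core power, ASSUMING the entire TDP goes to the cores.

    Real chips also spend power on the uncore, so actual per-core power is lower.
    """
    return tdp_watts / cores


if __name__ == "__main__":
    increase = core_increase_percent(ALTRA_CORES, ALTRA_MAX_CORES)
    print(f"Core count increase: {increase:.0f}%")
    for label, cores in (("Altra", ALTRA_CORES), ("Altra Max", ALTRA_MAX_CORES)):
        budget = naive_watts_per_core(TDP_WATTS, cores)
        print(f"{label}: at most {budget:.2f} W per core at {TDP_WATTS} W TDP")
```

Under that assumption, fitting 128 cores into the same TDP leaves each core a noticeably smaller share of the power budget than on the 80-core part.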

For designs beyond the Altra Max, Ampere is promising a continued emphasis on what it considers “predictable performance” for workloads as socket load scales, on increasing core counts with a linear increase in performance, and, what I found interesting as a metric, on continuing to reduce power per core – something to keep in mind as we discuss the next big news today:

Replacing Neoverse with Full Custom Cores

Today’s big reveal comes in regard to the microarchitecture choices that Ampere is going to be using starting in their next generation 2022 “Siryn” design, successor to the Altra Max, and relates to the CPU IP being used:

Starting with Siryn, Ampere will be switching over from Arm’s Neoverse cores to their new in-house full custom CPU microarchitecture. This announcement admittedly caught us completely off-guard, as we had largely expected Ampere to continue using Arm’s Neoverse cores for the foreseeable future. The switch to a new full custom microarchitecture puts Ampere on a completely different trajectory than we had initially expected from the company.

In fact, Ampere explains that the move towards a full custom microarchitecture core design was actually always the plan for the company since its inception, and that their custom CPU design has been in the works for the past 3+ years.

In terms of background, the design team is led by Ampere’s CTO Atiq Bajwa, who is also acting as the chief architect on the project. Bajwa and the team surrounding him appear to be mostly composed of high-profile ex-Intel engineers and veterans who had left the company along with Renée James in 2017, topped off with talent from a slew of other companies in the industry who joined them in the effort. The pedigree and history of the team are marked by achievements such as working on Intel’s Haswell and Broadwell processors.

Ampere’s rationale for designing a full custom core from the ground up is that they claim to be able to achieve better performance and better power efficiency in datacentre workloads than Arm’s “more general purpose” Neoverse designs can. This is quite an interesting claim to make, and contrasts with Arm’s projections and goals for their Neoverse cores. The recent Neoverse V1 and N2 cores were unveiled in more detail last month and are claimed to achieve significant generational IPC gains.

For Ampere to relinquish the reliance on Arm’s next-gen cores, and instead to rely on their own design and actually go forward with that switch in the next-gen product, shows great confidence in their custom microarchitecture design – and at the same time one could interpret it as a sign of no confidence in Arm’s Neoverse IP and roadmap. This stands in stark contrast to what some others in the industry are doing: Marvell has stopped development of their own ThunderX CPU IP in favour of adopting Arm Neoverse cores. On the other hand, not specifically related to the cloud and server market, Qualcomm acquired Nuvia earlier this year, with a rationale similar to Ampere’s: the claim that the new in-house design capabilities offer performance that otherwise wouldn’t have been possible with Arm’s Cortex CPU IP.

In our talks with Jeff Wittich, Ampere’s Chief Product Officer, he explains that today’s announcement should hopefully help paint a better picture of where Ampere is heading as a company – whether they’d continue to be content with “just” being an Arm IP integrator, or if they had plans for more. Jeff was pretty clear that in a few years’ time they’re envisioning and aiming for Ampere to be a top CPU provider for the cloud market and a major player in the industry.

The technical details of how Ampere’s CPU microarchitecture will differ in approach, and how and why they see it as a superior performer in the cloud, are questions for which we’ll have to be a bit more patient. The company wouldn’t comment on the exact status of the Siryn design right now, on whether it’s been taped in or taped out yet, but they do reiterate that they’re planning customer sampling in early 2022, in accordance with prior roadmap disclosures. By the tone of the discussions, it seems the design is mostly complete, and Ampere is putting the finishing touches on the whole SoC. Jeff mentioned that in due time they will also be making microarchitectural disclosures on the new core, explaining their design choices in areas such as front-end or back-end design, and why they see it as a better fit for the cloud market.

Altra Max later this year, more cloud customer disclosures

Beyond the longer-term plans for 2022 and later, today’s roadmap update also contained a few more reiterated performance claims for Ampere’s upcoming 128-core Altra Max product, which is planned to hit the market in the second half of the year, with customers being sampled in the next few months.

The “Mystique” code-named Altra Max design is characterised by a 60% increase in core count versus the current-generation Altra design, all while remaining at or below the same 250W TDP. The performance slides here showcase comparisons and performance claims against what are by now previous-generation competitor products; Ampere simply explains that they haven’t been able to get their hands on more recent Milan or Ice Lake-SP hardware to test. Nevertheless, the relative positioning against the Altra Q80-30 and the EPYC 7742 would indicate that the new chip would easily surpass the performance of even AMD’s latest EPYC 7763.

In the slide, Ampere actually discloses the SKU model name being used for the comparison, which is the "Altra Max M128-30" – meaning for the first time we have confirmation that all 128 cores are running at up to 3GHz clock speed, which is impressive given that we’re supposed to be seeing the same TDP and power characteristics between it and the Q80-33. We’ll be verifying these figures in the next few months once we get to review the Altra Max.
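
To illustrate how we’re reading that model name, here is a small sketch that splits an SKU string into a family letter, a core count and a maximum clock. The pattern is only inferred from the two names mentioned in this article (Q80-33 and M128-30), with the two-digit suffix assumed to be the peak clock in units of 100MHz; it is not an official Ampere naming specification, and `parse_sku` and `AltraSku` are hypothetical names used for illustration.

```python
import re
from typing import NamedTuple


class AltraSku(NamedTuple):
    family: str     # "Q" for Altra, "M" for Altra Max (as seen in the article)
    cores: int      # number of CPU cores
    max_ghz: float  # maximum clock in GHz


# ASSUMED pattern: one letter, core count, dash, clock in tenths of a GHz.
_SKU_PATTERN = re.compile(r"^([A-Z])(\d+)-(\d{2})$")


def parse_sku(name: str) -> AltraSku:
    """Parse a model name such as 'M128-30' under the assumed naming pattern."""
    match = _SKU_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised SKU name: {name!r}")
    family, cores, clock = match.groups()
    return AltraSku(family=family, cores=int(cores), max_ghz=int(clock) / 10)


if __name__ == "__main__":
    for name in ("Q80-33", "M128-30"):
        sku = parse_sku(name)
        print(f"{name}: {sku.cores} cores at up to {sku.max_ghz:.1f} GHz")
```

Read this way, the M128-30 gives up a few hundred MHz of peak clock relative to the Q80-33 in exchange for 48 more cores within the same TDP.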

Today’s announcement also comes with an update on Ampere’s customers. Oracle was notably one of the first Altra adopters, but today’s disclosure also includes a wider range of cloud providers, with big names such as ByteDance and Tencent Cloud, two of the biggest hyperscalers in China.

Microsoft in particular is a big addition to the customer list, and while Ampere’s Jeff Wittich couldn’t comment on whether Microsoft has other internal plans in the works, he said that today’s announcement should give more clarity around the rumours of the Redmond company working on Arm-based servers, reports of which had surfaced back in December. Microsoft’s Azure cloud service is second only to Amazon’s AWS in terms of size and scale, and the company onboarding Altra products is a massive win for Ampere.

Taking control of one’s own future

Ampere’s announcement today that it will deploy its own microarchitecture in future products is a major change in the company’s prospects. The news admittedly took us by surprise, but in the grand scheme of things it makes a lot of sense given that the company aims to be a major industry player in the next few years – taking full control of one’s own product future is critical in terms of assuring that success.

While over the years we’ve seen many CPU design teams disbanded, actually having a new player and microarchitecture pop up is a very welcome change for the industry. While the news is a blow to Arm’s Neoverse IP, the fact that Ampere continues to use the Arm architecture is a further encouragement and win for the Arm ecosystem.

160 Comments

mode_13h - Friday, May 21, 2021 - link
> that doesn't change the performance gain of Neoverse V1

But it uses a lot more power. So, it's not the typical way we're accustomed to looking at perf gains, where it's at most ISO-power. By ARM's own admission, it's 0.7x to 1x as efficient, which means 1.5x to 2.14x the power!

Also, 1.7x the area, which means > 1.7x the cost (at ISO process).

So, it's really disingenuous to talk about the V1 as an example of uArch gains. Again, your heavy bias is clear for all to see.

And even the N2 doesn't look so great, once the power estimates are taken into account. 1.4x the IPC at 1.45x the power (ISO frequency). It starts to look like ARM has finally hit a wall on efficiency.
Wilco1 - Friday, May 21, 2021 - link
No, in fact larger, faster cores are less efficient (just like large heavy cars are less efficient than small cars). Small in-order cores like Cortex-A55 are the most efficient.

So higher IPC typically comes at a cost in area and power. V1 is still smaller and use less power than the latest x86 cores. So I'm not sure what your point is?

And it is not disingenuous in any way to mention the fact that Neoverse V1 has 50% higher IPC. It's simply Arm's fastest core, you can't argue with that achievement. Should we ignore Milan because it is larger and less efficient than Rome? Milan is still a great achievement. Or would you argue it is not?
mode_13h - Friday, May 21, 2021 - link
> larger, faster cores are less efficient (just like large heavy cars are less efficient than small cars).

yes.

> V1 is still smaller and use less power than the latest x86 cores.

But you didn't compare it to an x86 core. You compared it to the N1, which is like comparing a sports car to a previous year's family sedan, if we use your automotive analogy.

> it is not disingenuous in any way to mention the fact that Neoverse V1 has 50% higher IPC.

While it's an accurate repetition of a credible claim (i.e. something we can treat as a fact), it's what you're implying by it that makes it disingenuous. It's comparing cores in 2 different product lines, with different optimization targets. That's why it's called V1 and not N2. It's not a like-for-like comparison, which makes it difficult to infer much of anything from it. All it really tells us is how much faster ARM can make a server core on 7 nm, if they don't care as much about power or area.
Wilco1 - Saturday, May 22, 2021 - link
No. Both Neoverse N2 and V1 are not only successors of N1 but their microarchitectures are related to N1. So it is completely reasonable to compare N1 with V1. And the timeframe and performance gains are matching Cortex-A76 to Cortex-X1.

> All it really tells us is how much faster ARM can make a server core on 7 nm, if they don't care as much about power or area.

And that refutes the original claim that Arm has ran out of steam in terms of IPC gains. Indeed, V1 uses more power and loses efficiency but that's the cost of pushing IPC hard. Hence the split in product lines as different licencees likely want different PPA targets.
mode_13h - Sunday, May 23, 2021 - link
> Both Neoverse N2 and V1 are not only successors of N1 but their microarchitectures are related to N1.

Only in the most roundabout of ways. Andrei's graphic shows the relationship:

https://images.anandtech.com/doci/16640/siblings.j...

> So it is completely reasonable to compare N1 with V1.

No, that's nonsense. The V1 is 33% bigger than the N2 and 70% bigger than the N1, and burns up to 2.14x as much power. A lot of its performance comes from the same places as the X1 vs. A78, as well as SVE. So, it makes as much sense as comparing a car with an 8-cylinder engine to a cheaper & more fuel-efficient 4-cylinder car of the previous model year, from the same manufacturer. One can certainly make such a comparison, but it's unclear what practical relevance it would have.

If you want to look at microarchitectural efficiency improvements, then you'd best focus on an apples-to-apples comparison, like A76 -> A78 or N1 -> N2.

> that refutes the original claim that Arm has ran out of steam in terms of IPC gains.

It would, if anyone had made such a claim. What I said was: "It starts to look like ARM has finally hit a wall on efficiency."

As that was before I noticed the 1.45x power figure was at ISO process with the N1, I'll walk that back a couple steps.
ikjadoon - Wednesday, May 19, 2021 - link
I think this move genuinely justifies Arm’s business model: make the ISA—not your architectures—the product.

Startup vendors can…

1. License stock core IP so anyone with money & some silicon expertise can whip up chips. Start making money quickly and build a reputation (and customer list).

2. Then, if vendors feel confident and have the money & expertise, let us license the ISA alone. Keep all our current customers, sell them something even better and customized to their needs, and put us in near-total control.

It’s like the franchise model on steroids: imagine if owning a Subway restaurant franchise let you “upgrade” to an international logistics contract. Imagine the innovation markets it’d create.

If only Intel understood that 25+ years ago. Today, we get vague IDM “2.0” promises to allow some vague, non-committal licensing options in the unspecified future (and absolutely not the ISA because Intel shamelessly believes it does x86 the best).
Blastdoor - Wednesday, May 19, 2021 - link
Building a business based on Intel's alleged commitment to licensing sounds about as safe a bet as the companies that tried to license MacOS back in the 90s. Intel now, like Apple then, is being forced to adopt a model they really don't believe in. It's just not who they are. Apple ultimately returned to its true self and that has clearly worked out well for them. I have no idea if Intel can pull off something similar.
Silver5urfer - Wednesday, May 19, 2021 - link
So Omega gg ? lol

We will see what Sapphire Rapids and Genoa will have, if AMD increases the cores past 64 to near 80-90s then AMD will be on the moon.
mode_13h - Friday, May 21, 2021 - link
> if AMD increases the cores past 64 to near 80-90s then AMD will be on the moon.

Depends if their interconnect can continue to scale. Milan is really getting hurt by it, but some of that is due to the old process node of their IO die. Still, even on a newer node, more cores *and* higher speeds could continue to take a big bite out of their power budget. Meshes scale best.
Linustechtips12#6900xt - Wednesday, May 19, 2021 - link
So I wasn't the only one who had a tear of joy that AnandTech was going back to GPU reviews when i read ampere?
