First Thoughts & End Remarks

2020 was indeed a super-exciting year for Arm’s server ambitions, and one can easily claim that the Neoverse N1 has been a resounding success: its implementations can be seen as playing on the same field as the best that AMD and Intel are able to achieve, even against today’s newest generations.

The new Neoverse V1 and N2 continue the story with a two-pronged approach. When the Neoverse V1 was first teased back in September, I was quite amazed at the claim of +50% IPC. After today’s figures, the design is still very impressive, but the disclosed power, area, and resulting power efficiency figures have somewhat dulled my expectations of the new CPU microarchitecture.

What’s clear about the Neoverse V1 is that it really seems to be an HPC-oriented design. Alongside the known SiPearl Rhea chip, backed by the European Processor Initiative’s HPC goals, Korea’s ETRI (Electronics and Telecommunications Research Institute) also has a V1 design dubbed “K-AB21” in the works, likewise with hybrid HBM2E and DDR5 memory. Along with today’s announcement of the V1, India’s Center for Development of Advanced Computing has also announced that it is a V1 licensee and will be using the core in an exascale supercomputer project.

Essentially, it seems the V1 will serve as the foundation of many new custom HPC projects, which is a great win both for Arm as an IP vendor and for its licensees, who are able to build something to their exact needs.

For enterprise and cloud usage, given the V1’s less compelling power efficiency, I now doubt that we’ll see implementations from cloud or merchant silicon vendors such as Amazon or Ampere, particularly since the N2 will also be available.

The Neoverse N2 is a more straightforward migration from the N1. IPC improves by significant amounts, which should result in good generational performance increases. I do have concerns about power efficiency, as the performance increases come at a linear cost in power. There’s a one-time opportunity to raise performance in many workloads by closing the power gap for workloads that don’t fully fill a system’s TDP today (while throttling others). Beyond that, however, any further performance gains depend on vendors delivering good physical implementations that fully take advantage of next-generation process nodes and actually realize those theoretical gains. We’ll see how that pans out. For now I’ll give Arm the benefit of the doubt, but we’ll also see similar gains from 5nm designs by the likes of AMD. How the competitive situation will look in 2022 remains to be seen.

Arm also noted that while the N2 is a newer-generation IP than the V1, roughly a year apart in design, the company actually expects N2 products to come out only shortly after V1 products, sometime by the end of this year. This further reinforces my view that we probably won’t see many V1 designs outside of the HPC market, and that Amazon and Ampere are likely to follow up with N2-based Gravitons and Altras. I want to be explicit here that none of the usual cloud vendors / CSPs / hyperscalers have yet officially commented on what kind of IP they’ll be using in their next-generation designs.

I think the star of the show today was the CMN-700 and the vast new flexibility it gives vendors. The new architectural improvements and the move towards CCIX 2.0 and CXL are definite big advances that will allow licensees to create more exotic designs. At the very least, it allows for effective use of chiplet architectures, a much-needed capability that vendors must adopt to ensure the affordability and manufacturability of products on leading-edge nodes.

I’ll be looking forward to new V1 and N2 designs in 2022, and hope we’ll hear more details from licensees through the course of the year.


95 Comments


  • GeoffreyA - Friday, April 30, 2021

    "This is in comparison to x86 which seems to live in (probably justified) terror that any change they make, no matter how low level"

    P6, Netburst, Sandy Bridge, and Bulldozer seem like pretty big changes.
  • name99 - Friday, April 30, 2021

    (a) Sandy Bridge was the last such.
    (b) Look at the relative spacing (in time) for the two cases.

    Look, I'm not interested in "x86 vs ARM. FIGHT!!!"
    I'm simply pointing out various patterns I've noted that strike me as interesting and significant. If other people have similar patterns to point out -- interesting and non-obvious aspects of new x86 micro-architectures, or patterns in how those micro-architectures have evolved over the past few years -- they should add a comment.
    But to this outsider the micro-architectures look stagnant -- utterly so in the case of Intel, mostly so in the case of AMD. In particular, slight scaling up of an existing micro-architecture because a new process is more dense is not interesting! What is interesting is a new way of conceptualizing the problem that allows for a step change in the micro-architecture; and that is what I am not seeing on the x86 side.
    I do see it in IBM (though for purposes that are, to me, uninteresting, both for POWER and for z/)
    I do see it in ARM Ltd.
  • mode_13h - Friday, April 30, 2021

    > What is interesting is a new way of conceptualizing the problem that allows for a step change in the micro-architecture

    Yes, but I think that largely depends on the ISA. And there, ARM has indeed been rather stagnant. Besides SVE and their new security features, most of their ISA changes have been tweaking around the margins. Not a fundamental rethink, or anything close to it.

    What we need is more willingness to rethink the SW/HW divide and look at what more software can do to make hardware more efficient. Whenever I say this, people immediately seem to think I mean doing a VLIW-like approach, but that's too extreme for most workloads. You just have to look at an energy breakdown of a modern CPU and think creatively about where compilers could make the hardware's job a little bit easier or simpler, for the same or better result.

    You can also flip it around, and ask where the primitives CPUs provide don't quite match up with what software is trying to do. I think TSX/HLE stands as an interesting example of that, and probably one where Intel doesn't get enough credit (granted, partly due to their own missteps).
  • name99 - Friday, April 30, 2021

    Architecture and micro-architecture are two different things.
    You want to fantasize about different architectures, be my guest. But I'm interested in MICRO-ARCHITECTURE and that was the content of my comments.
  • mode_13h - Saturday, May 1, 2021

    > Architecture and micro-architecture are two different things.

    The principal manifestation of the HW/SW divide is the ISA. That's why I talk about it rather than "architecture", which is a word that can mean different things to different people and in different contexts.

    > You want to fantasize about different architectures, be my guest.

    It's about as on-topic here as ever, given that we've gotten our most detailed look at ARMv9 yet. And performance + efficiency numbers!

    > But I'm interested in MICRO-ARCHITECTURE and that was the content of my comments.

    There's only so much you can do, within the constraints of an ISA. ARM had a chance to think really big, but they chose to play it safe and be very incremental. That could turn out to be a very costly mistake, for them and some of their licensees.

    I just want what I think we all want, which is another decade of progress in performance and efficiency like the last one. So far, I'm not very hopeful. I guess we need to really hit the wall before people are ready to get serious about embracing options to push it back a bit further.
