The Neoverse N2 µArch: First Armv9 For Enterprise

Moving from the performance-oriented Neoverse V1 to the more balanced Neoverse N2 core, we’re seeing a different approach to performance, more akin to the Cortex-A78’s PPA focus than to the X1’s performance focus.

Arm makes note of the “balance” keyword here – the microarchitecture only adopts features and design changes if those changes actually contribute to an improvement in the PPA (Performance, Power, Area) equation of the IP. In contrast, the V1 would opt for performance-increasing features even if that meant a disproportionate increase in power and area, reducing the total PPA of the design.

Architecturally, the N2 is a newer core than the V1 and takes a higher architectural baseline as the foundation of its capabilities. It’s Arm’s first disclosed Armv9-capable core, including important new features such as SVE2. It should be noted that although Arm talked a lot about the Armv9 CCA (Confidential Compute Architecture) last month, the Neoverse N2 core does not feature this capability, which is an extension we’re told to expect in future microarchitecture designs.

Arm’s microarchitectural disclosures on the N2 were rather limited compared to the details we’ve seen on the V1. As this is a sibling core to the yet-undisclosed next-generation Cortex-A78 successor, we’ll have to wait a few more months to see exactly what differentiates this newer iteration from the Cortex-A78, besides the notable Armv9 features and new SVE2 pipelines.

Arm at least confirms that it’s a narrower microarchitecture in the sense that there’s only a 5-wide dispatch (compared to 8-wide in the V1), and the design features 2x128b native SVE2 and NEON pipelines.
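
One practical consequence of SVE2 worth spelling out: like SVE, it is vector-length agnostic, so the same binary runs unchanged on the N2’s 128-bit pipelines and on the V1’s 256-bit ones; the hardware width only changes how many elements each loop iteration covers. Below is a minimal sketch using the ACLE intrinsics (the function and the scaling operation are purely illustrative, and the calls shown are base-SVE ones, which SVE2 builds on):

```c
#include <arm_sve.h>
#include <stdint.h>

/* Multiply a float array by a scalar in a vector-length-agnostic loop.
 * svcntw() reports how many 32-bit lanes the hardware provides per
 * vector (4 on a 128-bit implementation like the N2, 8 on the 256-bit
 * V1), and the svwhilelt predicate masks off the tail elements. */
void scale_f32(float *dst, const float *src, float factor, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {
        svbool_t    pg = svwhilelt_b32(i, n);      /* active-lane predicate */
        svfloat32_t v  = svld1_f32(pg, src + i);   /* predicated load */
        v = svmul_n_f32_x(pg, v, factor);          /* v *= factor */
        svst1_f32(pg, dst + i, v);                 /* predicated store */
    }
}
```

On the N2 each iteration would cover four floats, and on the V1’s wider SVE datapath eight, with no recompilation needed; this is what lets Arm vary the native vector width between sibling designs without fragmenting the software ecosystem.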

The company states that the new design should still achieve an impressive +40% increase in IPC compared to the Neoverse N1, which is actually substantial given that we’re promised only a linear increase in power and area.

In terms of “smarts”, or rather microarchitectural innovations, the N2 is a superset of the V1, just with a more conservative approach to block and structure sizes.

On the system side, on top of MPMM and DT, the N2 adds PDP, or Performance Defined Power Management, a feature that promises to vary the CPU’s microarchitectural capabilities depending on the workload, in order to reduce power consumption without impacting performance. I imagine that we’re talking about smarter, workload-dependent clock-gating of microarchitectural features, for example a narrowing of the execution resources in low-IPC workloads.
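
To make that speculation a bit more concrete, here is a purely illustrative sketch of the kind of control loop such a mechanism might implement; none of the names, thresholds, or hooks below come from Arm’s disclosure, they just formalize the clock-gating idea described above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interval sampling counters and gating hook;
 * illustrative only, not part of any Arm interface. */
struct sample_window { uint64_t retired_insts, cycles; };

static bool extra_ports_gated;
static void gate_extra_issue_ports(bool gate) { extra_ports_gated = gate; }

/* Hypothetical low-IPC threshold: 0.75 instructions per cycle, scaled
 * by 100 to keep the arithmetic in integers. */
#define LOW_IPC_X100 75u

void pdp_tick(const struct sample_window *w) {
    if (w->cycles == 0)
        return;
    uint64_t ipc_x100 = (w->retired_insts * 100u) / w->cycles;

    /* In low-IPC phases (e.g. memory-bound work), clock-gate execution
     * resources the workload can't use; re-enable them once IPC recovers. */
    gate_extra_issue_ports(ipc_x100 < LOW_IPC_X100);
}
```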

95 Comments

  • GeoffreyA - Friday, April 30, 2021 - link

    "This is in comparison to x86 which seems to live in (probably justified) terror that any change they make, no matter how low level"

    P6, Netburst, Sandy Bridge, and Bulldozer seem like pretty big changes.
  • name99 - Friday, April 30, 2021 - link

    (a) Sandy Bridge was the last such.
    (b) Look at the relative spacing (in time) for the two cases.

    Look, I'm not interested in "x86 vs ARM. FIGHT!!!"
    I'm simply pointing out various patterns I've noted that strike me as interesting and significant. If other people have similar such patterns to point out -- interesting and non-obvious aspects of new x86 micro-architectures, or patterns in how those micro-architectures have evolved over the past few years, they should add a comment.
    But to this outsider the micro-architectures look stagnant -- utterly so in the case of Intel, mostly so in the case of AMD. In particular, slight scaling up of an existing micro-architecture because a new process is more dense is not interesting! What is interesting is a new way of conceptualizing the problem that allows for a step change in the micro-architecture; and that is what I am not seeing on the x86 side.
    I do see it in IBM (though for purposes that are, to me, uninteresting, both for POWER and for z/)
    I do see it in ARM Ltd.
  • mode_13h - Friday, April 30, 2021 - link

    > What is interesting is a new way of conceptualizing the problem that allows for a step change in the micro-architecture

    Yes, but I think that largely depends on the ISA. And there, ARM has indeed been rather stagnant. Besides SVE and their new security features, most of their ISA changes have been tweaking around the margins. Not a fundamental rethink, or anything close to it.

    What we need is more willingness to rethink the SW/HW divide and look at what more software can do to make hardware more efficient. Whenever I say this, people immediately seem to think I mean doing a VLIW-like approach, but that's too extreme for most workloads. You just have to look at an energy breakdown of a modern CPU and think creatively about where compilers could make the hardware's job a little bit easier or simpler, for the same or better result.

    You can also flip it around, and ask where the primitives CPUs provide don't quite match up with what software is trying to do. I think TSX/HLE stands as an interesting example of that, and probably one where Intel doesn't get enough credit (granted, partly due to their own missteps).
  • name99 - Friday, April 30, 2021 - link

    Architecture and micro-architecture are two different things.
    You want to fantasize about different architectures, be my guest. But I'm interested in MICRO-ARCHITECTURE and that was the content of my comments.
  • mode_13h - Saturday, May 1, 2021 - link

    > Architecture and micro-architecture are two different things.

    The principal manifestation of the HW/SW divide is the ISA. That's why I talk about it rather than "architecture", which is a word that can mean different things to different people and in different contexts.

    > You want to fantasize about different architectures, be my guest.

    It's about as on-topic here as ever, given that we've gotten our most detailed look yet at ARMv9. And performance + efficiency numbers!

    > But I'm interested in MICRO-ARCHITECTURE and that was the content of my comments.

    There's only so much you can do, within the constraints of an ISA. ARM had a chance to think really big, but they chose to play it safe and be very incremental. That could turn out to be a very costly mistake, for them and some of their licensees.

    I just want what I think we all want, which is another decade of progress in performance and efficiency like the last one. So far, I'm not very hopeful. I guess we need to really hit the wall before people are ready to get serious about embracing options to push it back a bit further.
