The Neoverse N2 µArch: First Armv9 For Enterprise

Moving from the performance-oriented Neoverse V1 to the more balanced Neoverse N2 core, we’re seeing a different approach to performance, more akin to the Cortex-A78’s PPA focus than to the X1’s performance focus.

Arm stresses the “balance” keyword here: the microarchitecture only adopts features and design changes if those changes actually improve the PPA (Performance, Power, Area) equation of the IP. In contrast, the V1 would opt for performance-increasing features even if that meant a disproportionate increase in power and area, reducing the total PPA of the design.

Architecturally, the N2 is a newer core than the V1 and takes a higher architectural baseline as the foundation of its capabilities. It’s Arm’s first disclosed Armv9-capable core, including important new features such as SVE2. Note that although Arm talked a lot about Armv9 CCA (Confidential Compute Architecture) last month, the Neoverse N2 core does not feature this capability; we’re told to expect that extension in future microarchitecture designs.

Arm’s microarchitectural disclosures on the N2 were rather limited compared to the details we’ve seen on the V1. Since this is a sibling core to the as-yet-undisclosed next-generation Cortex-A78 successor, we’ll have to wait a few more months to see exactly what differentiates this newer iteration from the Cortex-A78, besides the notable Armv9 features and new SVE2 pipelines.

Arm at least confirms that it’s a narrower microarchitecture in the sense that there’s only a 5-wide dispatch (compared to 8-wide in the V1), and the design features 2x128b native SVE2 and NEON pipelines.
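To put the 2x128b figure into perspective, here is a minimal back-of-the-envelope sketch of peak vector throughput per cycle. Only the pipeline count and width come from Arm's disclosure; the function names are made up for illustration, and the sketch assumes that every pipeline can issue one fused multiply-add per cycle (counted as two floating-point operations), which Arm has not confirmed here.

```python
# Peak per-cycle vector throughput from pipeline count and width.
# From the article: 2 pipelines, 128 bits each.
# Assumption (not from the article): each pipeline can issue one FMA per
# cycle, and an FMA counts as two floating-point operations.

N2_PIPES = 2
N2_PIPE_BITS = 128


def lanes_per_cycle(pipes: int, pipe_bits: int, element_bits: int) -> int:
    """Number of elements processed per cycle across all vector pipelines."""
    return pipes * (pipe_bits // element_bits)


def peak_flops_per_cycle(pipes: int, pipe_bits: int, element_bits: int,
                         fma: bool = True) -> int:
    """Upper bound on floating-point operations per cycle."""
    ops_per_lane = 2 if fma else 1
    return lanes_per_cycle(pipes, pipe_bits, element_bits) * ops_per_lane


if __name__ == "__main__":
    for name, bits in (("FP64", 64), ("FP32", 32), ("FP16", 16)):
        lanes = lanes_per_cycle(N2_PIPES, N2_PIPE_BITS, bits)
        flops = peak_flops_per_cycle(N2_PIPES, N2_PIPE_BITS, bits)
        print(f"{name}: {lanes} lanes/cycle, up to {flops} FLOPs/cycle")
```

These are theoretical ceilings only; sustained throughput depends on dispatch width, memory bandwidth, and the instruction mix.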

The company states that the new design should still achieve an impressive +40% increase in IPC compared to the Neoverse N1, which is substantial given that we’re promised only a linear increase in power and area.
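As a rough illustration of what that claim implies, the sketch below works through the arithmetic. It assumes that "linear" means power and area grow in the same proportion as performance, and that clock frequency is unchanged from the N1; both are my reading rather than figures from Arm, and the function names are hypothetical.

```python
# Relative performance and efficiency of a new core versus a baseline.
# From the article: +40% IPC over the Neoverse N1.
# Assumptions (not from the article): equal clock frequency, and power
# scaling in the same proportion as performance ("linear").

def relative_performance(ipc_gain: float, freq_ratio: float = 1.0) -> float:
    """Performance relative to the baseline, where performance = IPC x frequency."""
    return (1.0 + ipc_gain) * freq_ratio


def relative_efficiency(ipc_gain: float, cost_ratio: float,
                        freq_ratio: float = 1.0) -> float:
    """Performance per unit of cost (power or area) relative to the baseline."""
    return relative_performance(ipc_gain, freq_ratio) / cost_ratio


if __name__ == "__main__":
    ipc_gain = 0.40      # +40% IPC, per Arm's claim
    power_ratio = 1.40   # assumed: power grows in step with performance
    perf = relative_performance(ipc_gain)
    eff = relative_efficiency(ipc_gain, power_ratio)
    print(f"relative performance: {perf:.2f}x")
    print(f"relative perf/W:      {eff:.2f}x")
```

Under those assumptions performance per watt stays flat, so the generational gain comes without an efficiency penalty; any super-linear power increase would push the ratio below 1.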

In terms of “smarts”, or better said, microarchitectural innovations, the N2 is a super-set of the V1, just with a more conservative approach to block and structure sizes.

On the system side, on top of MPMM and DT, the N2 adds PDP, or Performance Defined Power Management, a newer feature that promises to vary the CPU’s microarchitectural features depending on the workload, in order to reduce power consumption without impacting performance. I imagine that we’re talking about smarter workload-dependent clock-gating of microarchitectural features, for example narrowing the execution resources in low-IPC workloads.


95 Comments


  • nandnandnand - Tuesday, April 27, 2021 - link

    Looking at Cortex-X-next. It seems like Arm can put out a new Cortex-X for every new Cortex-A78 successor, since the Cortex-X is very similar but bigger.
  • mode_13h - Tuesday, April 27, 2021 - link

    From an earlier article:

    > The Cortex-X1 was designed within the frame of a new program at Arm,
    > which the company calls the “Cortex-X Custom Program”.
    > The program is an evolution of what the company had previously
    > already done with the “Built on Arm Cortex Technology” program
    > released a few years ago. As a reminder, that license allowed
    > customers to collaborate early in the design phase of a new
    > microarchitecture, and request customizations to the configurations,
    > such as a larger re-order buffer (ROB), differently tuned prefetchers,
    > or interface customizations for better integrations into the SoC designs.
    > Qualcomm was the predominant benefactor of this license,
  • Alistair - Tuesday, April 27, 2021 - link

    I just want to be able to use ARM in standard DIY with an Asus motherboard and a socket, just like AMD and Intel.
  • mode_13h - Tuesday, April 27, 2021 - link

    I wonder if Nvidia will put out a Jetson-style board in something like a mini-ITX form factor.
  • Alistair - Wednesday, April 28, 2021 - link

    i sure hope so, and something not massively overpriced like right now
  • mode_13h - Thursday, April 29, 2021 - link

    Yeah, because Nvidia is known for their bargain pricing!
    ; )

    Although, if they wanted to create a whole new product segment, it's conceivable they might keep prices rather affordable for a couple generations.
  • nandnandnand - Wednesday, April 28, 2021 - link

    I want it. You want it. Some people seem to want it. Maybe demand is forming? Get on it, China.

    16-core Cortex-X2 please.
  • mode_13h - Wednesday, April 28, 2021 - link

    They already did, sort of. See: https://e.huawei.com/us/products/servers/kunpeng/k...

    Whoops! Had to get this out of Google cache, because the page 404'd:

    Board Model: D920S10
    Processors: 1 Kunpeng 920 processor, 4/8 cores, 2.6 GHz
    Internal Storage: 6 SATA 3.0 hard drive interfaces, 2 M.2 SSD slots
    Memory: 4 DDR4-2666 UDIMM slots, up to 64 GB
    PCIe Expansion: 1 PCIe 3.0 x16, 1 PCIe 3.0 x4, and 1 PCIe 3.0 x1 slots
    LOM Network Ports: 2 LOM NIC, supporting GE network ports or optical ports
    USB: 4 USB 3.0 and 4 USB 2.0
  • mode_13h - Tuesday, April 27, 2021 - link

    Do any of the current x86 cores pair up SSE operations for >= 4x throughput per cycle?

    AVX2 has been around for long enough that a lot of the code which could benefit from it has already been written to do so, yet *most* people are still compiling to baseline x86-64 (or just above that), since Intel is still making low-power cores without any AVX. So, I'm sure there's still *some* code that could benefit from >= 4x SSEn execution.
  • AntonErtl - Wednesday, April 28, 2021 - link

    Zen has 4 128-bit FP units (2 FMA and 2 FADD). Not sure if that's what you are interested in.
