First N1 Silicon: Enabling the Ecosystem with SDPs

A little known fact about Arm is that the company designs its own silicon test platform – actually deploying them on development board to enable validation and software development on hardware that Arm and developers have full control of. The latest generation was the Juno platform, which in its first revision started off with a Cortex A57 and served as the fundamental silicon testbed for ARMv8 software.

Ever since Arm started the programme in 2014, Arm has shipped over 1400 boards both internally and to its partners. The amount of chips we’re talking about here sounds paltry, however we have to keep in mind we’re talking about very limited shuttle runs on MPW (multi-project wafers) where Arm shares wafer space with numerous other companies.

For today’s announcement, Arm had the pleasure to reveal that it received back the first working Neoverse N1 silicon back in December – with the chips meant to be integrated into the new Neoverse System Development Platform (SDP).

The N1 SDP represents major step for Arm as it not only is the first silicon to come back with the N1 CPU, but also is Arm’s first own 7nm silicon. The platform represents a major proof of concept of the IP, as well as interoperability with third-party IP, employing a lot of the peripheral IP such as PCIe and DDR PHY supplied by Cadence.

The actual hardware is a limited implementation of an N1 SoC – we find a 4-core N1 CPU with 1MB L2 configuration in the form of 2xMP2 connected to a CMN-600 with an 8MB SLC setup.

The board includes a CCIX compatible PCIe 4.0 x16 slot which serves the crucial role of enabling development and demonstrating cache-coherent integration with CCIX hardware such as Xilinx’s FPGA.

The N1 SoC actually doesn’t contain dedicated I/O IP, rather Arm implements all connectivity via a dedicated FPGA which serves as the I/O hub, supporting various connectivity options such as Ethernet, USB, SATA and so on.

Naturally the big selling point of the SDP is its completely open-source firmware stack from not only the OS drivers, but more importantly the SCP and MCP firmware.

An important new feature that is first employed by the new N1 CPU is the introduction of statistical profiling extensions (SPE). The new extension enables the first ever self-hosted profiling capability in an Arm CPU – meaning we don’t require a separate CPU or system having to read out microarchitectural counters. Instead the new SPE can be configured to directly write this information into memory. The tool is extremely useful for tracing code and analysing core behaviour, identifying possible performance issues and further squeezing out the maximum performance out of a platform, something Arm is taking very seriously if it wants to succeed and gain adoption in HPC.

Finally, the N1 SDP will be available later this quarter – although don’t expect the board to be easily attainable for the average user.

E1 Implementation & Performance Targets End Remarks: Strengthening the Infrastructure Ecosystem
Comments Locked

101 Comments

View All Comments

  • Andrei Frumusanu - Wednesday, February 20, 2019 - link

    > It also shows a result showing Zen roughly half the performance of Intel

    The W-3175X was at 4.5GHz with the whole 38MB of L3 for the one thread, while the 7601 ran at a peak of 3.2GHz.
  • Meteor2 - Wednesday, February 20, 2019 - link

    I wish you’d normalised for frequency!
  • Andrei Frumusanu - Wednesday, February 20, 2019 - link

    That's not the point of the article.
  • ZolaIII - Wednesday, February 20, 2019 - link

    Next time read twice before posting. AVX on integer benchmark really?
  • Wilco1 - Wednesday, February 20, 2019 - link

    Of course. Never heard of how SIMD hugely affects libquantum for example?
  • Andrei Frumusanu - Wednesday, February 20, 2019 - link

    AVX works on integer ...
  • ZolaIII - Wednesday, February 20, 2019 - link

    The era of general purpose core's being used for HPC is long time gone. While general purpose core's are hire to stay they will do that with modest number of core's per system, the real push is towards special purpose and multi purpose accelerators. FPGA's being put in the first row because their reprogrammable nature. The ARM actually have an edge over the CISC (X86) because it's simply more efficient which having stellar integer performance for the size of the core. If you look at the development bord it's very clear ARM is pushing into right direction.
  • Meteor2 - Wednesday, February 20, 2019 - link

    Kind of. But bottom line is the 20-odd codes used predominantly in the world still run best on general purpose CPUs. Bending software to work on specialised architectures is really hard.
  • ZolaIII - Thursday, February 21, 2019 - link

    On the FPGA you bend hardware. That's the whole idea.
  • wumpus - Thursday, February 21, 2019 - link

    HPC traditionally meant double precision FLOPS. AI work or similar might want FPGAs until GPUs are sufficiently ready for such things (then FPGA can't keep up).

    FPGAs are painfully slow at what they do, but can take an entirely new architecture on the fly. We saw that with cryptomining as things went CPU->GPU->FPGA->ASIC. And if you need a lot of multiply-accumulate (like most AI), don't expect anything between GPU and ASIC.

Log in

Don't have an account? Sign up now