First N1 Silicon: Enabling the Ecosystem with SDPs

A little known fact about Arm is that the company designs its own silicon test platform – actually deploying them on development board to enable validation and software development on hardware that Arm and developers have full control of. The latest generation was the Juno platform, which in its first revision started off with a Cortex A57 and served as the fundamental silicon testbed for ARMv8 software.

Ever since Arm started the programme in 2014, Arm has shipped over 1400 boards both internally and to its partners. The amount of chips we’re talking about here sounds paltry, however we have to keep in mind we’re talking about very limited shuttle runs on MPW (multi-project wafers) where Arm shares wafer space with numerous other companies.

For today’s announcement, Arm had the pleasure to reveal that it received back the first working Neoverse N1 silicon back in December – with the chips meant to be integrated into the new Neoverse System Development Platform (SDP).

The N1 SDP represents major step for Arm as it not only is the first silicon to come back with the N1 CPU, but also is Arm’s first own 7nm silicon. The platform represents a major proof of concept of the IP, as well as interoperability with third-party IP, employing a lot of the peripheral IP such as PCIe and DDR PHY supplied by Cadence.

The actual hardware is a limited implementation of an N1 SoC – we find a 4-core N1 CPU with 1MB L2 configuration in the form of 2xMP2 connected to a CMN-600 with an 8MB SLC setup.

The board includes a CCIX compatible PCIe 4.0 x16 slot which serves the crucial role of enabling development and demonstrating cache-coherent integration with CCIX hardware such as Xilinx’s FPGA.

The N1 SoC actually doesn’t contain dedicated I/O IP, rather Arm implements all connectivity via a dedicated FPGA which serves as the I/O hub, supporting various connectivity options such as Ethernet, USB, SATA and so on.

Naturally the big selling point of the SDP is its completely open-source firmware stack from not only the OS drivers, but more importantly the SCP and MCP firmware.

An important new feature that is first employed by the new N1 CPU is the introduction of statistical profiling extensions (SPE). The new extension enables the first ever self-hosted profiling capability in an Arm CPU – meaning we don’t require a separate CPU or system having to read out microarchitectural counters. Instead the new SPE can be configured to directly write this information into memory. The tool is extremely useful for tracing code and analysing core behaviour, identifying possible performance issues and further squeezing out the maximum performance out of a platform, something Arm is taking very seriously if it wants to succeed and gain adoption in HPC.

Finally, the N1 SDP will be available later this quarter – although don’t expect the board to be easily attainable for the average user.

E1 Implementation & Performance Targets End Remarks: Strengthening the Infrastructure Ecosystem
Comments Locked

101 Comments

View All Comments

  • surt - Thursday, February 21, 2019 - link

    That raw power comes at a .... power cost. And as soon as you try to start z-stacking your cpus that power is going to be the most important factor.
  • peevee - Tuesday, February 26, 2019 - link

    "The future is way more related to modularity than the chip architecture."

    Debatable. Both ARM and x64 are essentially the same in terms of efficiency if the same levels of performance are required. A breakthrough can only come from in-memory computing, which neither ARM nor x64 can sustain for many reasons.
  • rahvin - Thursday, February 21, 2019 - link

    ARM is not "more efficient at every level". That's just plain fanboi BS. The architecture is the least important aspect of any processor these days.

    ARM processors were traditionally designed for power efficiency above all else, now that Intel is designing down for efficiency and ARM is designing up for power there will likely be some real competition but so far ARM has not demonstrated that they can provide equivalent power for the same power budget at the high end and Intel has had difficulty matching the lower power budget and performance on the low end (though this is likely due to them wishing to avoid cannibalizing higher end products with performant low power versions).

    As ARM tries to enter the server market we'll finally see if they can provide something equivalent, but it's not been a hopeful showing given that all but one ARM server design has been canceled and it's not equivalent to an x86 server processor of the same character in either power or performance.
  • Wilco1 - Thursday, February 21, 2019 - link

    Today you can buy Arm-based servers like Operon A1100, Centriq, ThunderX, ThunderX2, eMAG and HiSilicon. The first Arm supercomputer entered the TOP500 list recently, and Fujitsu has prototypes of their Post-K computer. You can buy Arm compute time from several cloud vendors today, including AWS. That all adds up to one Arm server in your book?
  • rahvin - Thursday, February 21, 2019 - link

    ThunderX is gone, displaced by the ThunderX2 which is the Centriq processor after it was abandoned by it's creator. eMAG, A1100 and the HiSilicon Last I saw are all canceled.

    Commercially you can buy one ARM server, the ThunderX2. Go ahead, TRY to buy one.
  • Wilco1 - Thursday, February 21, 2019 - link

    How could you be so clueless? ThunderX2 is based on Vulcan made by Broadcomm, no relation with Centriq at all. ThunderX is still being used and sold. Centriq is still being sold, a few months ago Gigabyte announced a brand new motherboard for it. eMAG is just announced. HiSilicon/Huawei has 2 generations of Arm servers already and is working on several more. That's the only one that isn't for sale outside of China according to AnandTech.

    What's next? Are you going to tell us that Arm servers did not beat Xeon and SkyLake in various benchmarks, eventhough the evidence was published in an article on AnandTech?
  • rahvin - Thursday, February 21, 2019 - link

    Your right I confused the Vulcan and the Centriq. The Centriq is dead, the design teams gone,and there is no plan to even spin the silicon from what I've seen. Qualacom abandoned the product under threat from an activist investor. Yea there was a motherboard at CES but that doesn't mean anything at all and there is literally no way to buy one.

    ThunderX is depreciated (show me where you can buy one, they depreciated the silicon over a year ago, there may still be some inventory out there but I seriously doubt it), ThunderX2 is available, and from everything I saw it's awful. The best work case was as a nginx master server because the compute capacity was so awful. Basically you need a workload with a lot of threads and no actual work to even make it worth anything at all, especially considering the price.

    The Huawei junk is a nonstarter, you can't buy it anywhere but China that I've seen and it's not exactly flying off the shelves either. I've seen more ARM servers announced and canceled a year later than any that made it off the shelves into an actual product. So there is an eMag, that's great show me where you can buy it.

    That's my point, you can't buy them, other than the ThunderX2 or the Huawei if you want to go to china to get it. The Arm server has been a flash in the pan and I have no doubt it will continue to be so.
  • FunBunny2 - Wednesday, February 20, 2019 - link

    one has to wonder: given the existence of C compilers for any ISA, and thus *nix OS for said ISA, when (or already?) will the maths dictate both the 'optimal' ISA and underlying microarch? both, after all, are just maths optimization problems. to some delta, there is a unique solution.
  • zmatt - Wednesday, February 20, 2019 - link

    Baring any major design flaws there shouldn't really be a difference in theoretical performance between ISAs. Its important to note that the ISA isn't the actual logic of the chip, its better thought of as a paper standard a given chip needs to conform to if it wants to be binary compatible. The real determination in performance is the microarchitecture. People conflate this with ISA a lot because they are both architectures but the Micro arch is what describes the actual logic design in the circuit. That is what Intel and AMD apply codenames to. So things like Skylake, Thunderbird, Cortex A53 etc are micro architectures.
  • Wilco1 - Wednesday, February 20, 2019 - link

    There certainly are differences between ISAs which cannot be overcome with micro-architecture no matter how much money, power or transistors you throw at it. Given equal resources, the best possible implementations of various ISAs will exhibit major performance differences.

Log in

Don't have an account? Sign up now