End Remarks: Strengthening the Infrastructure Ecosystem

If there’s one thing that readers should take away from today’s presentations, it’s the fact that Arm is taking the infrastructure and server push extremely seriously. The last year in particular has been transformative for the Arm ecosystem as we’ve for the first time seen Arm vendor platforms be competitive with the major incumbents such as Intel and AMD.

The elephant in the room is Amazon, and last year’s reveal of a new AWS instance based on their own-in house ARMv8 Graviton processors marked a significant moment showcasing that Arm is now irrefutably becoming mainstream in the industry.

While Arm did not divulge any information on who will be employing the new Neoverse N1 platforms first – I would not be surprised if the next generation Graviton processor will based on the N1 CPU.

The N1 CPU looks to be an excellent CPU that targets a sweet-spot point between peak compute performance, overall throughput. And most importantly it maintains the leading power efficiency that is already found in Arm's mobile products. Arm has high hopes for N1 and its eventual successors, and for good reason: they're looking to steal market share away from the likes of Intel (and x86 servers in general), which has proven to be an entrenched market full of very high performance processors. For that reason Arm is bringing their best to the table, and while N1 isn't going to be a core-for-core competitor with flagship x86, it stands to pose a significant threat, especially in workloads that can easily scale up to a larger number of cores.

Meanwhile the new E1 CPU targets the expanding market for high throughput processors, which with the upcoming shift to 5G will require more throughput performance at low power levels. Here Arm seems to have custom-tailored a CPU specifically to serve such use-cases. This is a move that's arguably less about stealing market share from any one player, and more about being in the right place at the right time to secure their place in what should be a rapidly growing market. In that sense the E1 is a very traditional Arm move – focus on cost and simpler processors – and this has been a move that's continued to serve Arm well over the years.

Although the new hardware IP is impressive, what also matters greatly is Arm’s efforts into strengthening the Arm software ecosystem. Working with various industry hardware and software partners in trying to facilitate the software stack and interoperability with Arm not only benefits vendors using Arm’s own hardware IP, but also vendors who chose the route of employing their own custom CPU and SoC designs. Similarly, those vendors who are trying to improve and strengthen their own products will inevitably feed back into strengthening the Arm ecosystem as well – creating essentially what is a group effort between many companies that in the future will continue to gain momentum.

It's said that the Neoverse N1 will be commercially deployed by partners in the next 12-18 months, and I think this will be a crucial moment for Arm and the company’s server endeavours. If the major breakthrough in mind-share hasn’t already happened, if all goes well and Arm and partners deliver on the promised improvements, the next 1-2 years will certainly represent a major shift in the industry.

First N1 Silicon: Enabling the Ecosystem with SDPs
Comments Locked

101 Comments

View All Comments

  • Antony Newman - Thursday, February 21, 2019 - link

    (Arbitrary example)

    If a SoC can run at 5GHz when 8 cores single core, but throttles down to 2.5GHz when 16 cores are active - then it cannot scale (due to the TDP limit).

    If ARM are designing their CPUs so that 128 (ie all) of them can run flat out without requiring throttling, then ARMs single core performance is indicative of the overall performance.

    If ARM increase their single core performance by 1.7 times in two years - and keep this same MO (of no Throttling cause to keep within the TDP) - it will be more than just data centres that want to buy into this new architecture.

    AJ
  • wumpus - Thursday, February 21, 2019 - link

    Very few problems scale without penalty. Having high single core performance (for each core in a multichip server CPU, obviously. The Intel result using all of its cache on one core is obviously irrelevant and why it was so anomalous vs. AMD) means far less cores are needed, when scaling up. Also adding more and more cores require as much cache or more. If not, your bandwidth will scale even worse.

    Single core is absolutely critical for servers, and why it is taking ARM so long to break in. IBM is the exception that proves the rule: but they rely on weird licensing rules and making sure all the threads can access the same cache.
  • eastcoast_pete - Thursday, February 21, 2019 - link

    I actually think we are in agreement. While this borders on semantics, per core performance is, of course, very important for servers, while high single (one) core is not. As you point out, Intel getting really high one core performance from a 18 core Xeon by running a strictly single core/thread test while allocating all the cache and much of the thermal envelope to that one core is an artificial situation for a server.
  • The_Assimilator - Wednesday, February 20, 2019 - link

    Remember when "system on chip" meant IO too? Apparently Arm doesn't.

    Remember when Arm chips didn't need HSFs to run? Pepperidge Farm remembers.

    I'm going to enjoy it when this, like all of Arm's previous attempts at the high-end, fails once again. Or when Lakefield eats Arm's lunch, whichever comes first.
  • wumpus - Wednesday, February 20, 2019 - link

    When your volume is 1400 chips (not all the same design) over 4 years, you use FPGA for anything you can. Doing anything else is pretty dumb. I'm surprised they bothered with an actual layout, but I suspect that they've been bitten by tiny details in FPGA simulation that never quite worked the same at speed.

    HSF? You want the MIPS, you burn the Watts. Presumably this is your "tell" in your troll.

    When has ARM made a previous attempt at the high-end? Certainly more than a few of their architectural licensees have, but there's a huge difference between a server architecture backed by ARM and even one backed by Qualcom. For one thing, they pretty much need to standardize remote adminstration to Intel levels (possibly circa ~2008ish to get off the ground). That's a lot of pesky little details, but something they absolutely need standardized to allow server use in the datacenter (yes, the Big Boys can roll their own, but everybody else needs a common server definition.
  • Antony Newman - Wednesday, February 20, 2019 - link

    Fascinating article.

    Do you think Ampere, Huawei, Cavium and Amazon will all switch to the Neoverse?

    In terms of IPC - do you have a view on if ARM have Caught up with Apples Vortex yet?

    Is there any reason why a mobile phone (or Tablet) maker would’t use the ARM ‘server’ chip in a fondleslab?

    AJ
  • ballsystemlord - Wednesday, February 20, 2019 - link

    Spelling and grammar corrections:
    ...the actual real-life performance improvements will higher due other SoC-level improvements as well as software improvements that aren't available in existing actual A72 silicon products.
    Missing be:
    ...the actual real-life performance improvements will behigher due other SoC-level improvements as well as software improvements that aren't available in existing actual A72 silicon products.

    The figured weren't run actual silicon but rather estimated on Arm's server farm in an emulation environment with RTL.
    Miswritten sentence:
    The figures weren't calculated on actual silicon but rather estimated on Arm's server farm in an emulation environment with RTL.

    The E1's CPU pipeline actually represents a brand new-design which (besides the A65) haven't seen employed before.
    Missing we:
    The E1's CPU pipeline actually represents a brand new-design which (besides the A65) we haven't seen employed before.

    Here we have to clusters of 8 cores in a small CMN-600 2x4 mesh network, ...
    Wrong 2:
    Here we have two clusters of 8 cores in a small CMN-600 2x4 mesh network, ...

    I was half asleep when I read it so there might be more.
  • sohntech43 - Wednesday, February 20, 2019 - link

    Could someone help me understand why the Spec CPU2006 results are so different from those recorded for the AMD 7601 (1000 - 1200 vs. 690.63) and Xeon Platinum results (1300+ vs 730) in the Spec data base?

    https://www.spec.org/cpu2006/results/cpu2006.html

    They are also different from what AMD was boasting at the time of the original EPYC launch:

    https://www.microway.com/download/whitepaper/AMD-E...

    I'm probably missing something obvious...
  • Wilco1 - Wednesday, February 20, 2019 - link

    Yes you're missing the fact these are GCC8 scores using -Ofast as mentioned in the article - ie. like when you build code yourself.

    Official SPEC scores are quite different and use special trick compilers to get the highest score. For example libquantum shows a completely unrealistic result in most SPEC submissions which artificially inflates the integer score by 30+%.
  • sohntech43 - Wednesday, February 20, 2019 - link

    Thanks - was surprised by the sheer magnitude of the delta caused by the compilers. Impressive results for N1 and will be interesting to see when silicon is available.

Log in

Don't have an account? Sign up now