SPEC - Single Threaded Performance

We have some great expectations for the single-threaded performance of the Graviton2 and the Neoverse N1 CPU. In the mobile space, we’ve already seen the Cortex-A76 showcase some extremely competitive performance when compared to x86 platforms running at server frequencies. In particular, the comparison against the first-generation Graviton SoC and its Cortex-A72 cores should be interesting, so I also went ahead and also included comparison numbers on that platform – these figures should put better context into the massive generational uplift that Arm has achieved.

The performance figures tested here are not on a full vCPU instance of the platforms, but rather on “xlarge” variants with only 4 vCPUs, reason for this was simply we didn’t feel too much like paying 95% more for the computing time while the rest of the cores were sitting idle. This isn’t exactly the most optimal method for testing single-threaded performance though, depending on the platform.

One thing to consider in such a small vCPU instance is that you’re only using a fraction of the hardware platform for yourself, while there’s a possibility that there’s other users on other VMs running on the same platform. Such a setup is called having “noisy neighbours”, essentially meaning you’re co-hosted with other users on the same hardware. I did try to verify the figures by running them a few times, and the numbers were consistent on the Graviton2 and AMD platforms. The Graviton2 is still on preview availability so I don’t expect many users using up Amazon’s current deployments, and the AMD unit seemingly didn’t have issues and looked to remain at 2.9GHz throughout most of the testing. On the Intel Xeon platform however, I did see some larger variations, and I think that was mostly due to noisy neighbours brining down the boost clocks of the system down from its 3.2GHz peak. The published numbers here is the higher result set which should be running at around 3.2GHz.

SPECint2006 Speed Estimated Scores

Starting off with SPECint2006, the Graviton2 and N1 CPU are doing extremely well. It’s showcasing almost double the ST performance across the table compared to the A72 based SoC, and it’s even beating the EPYC 7571 across most benchmarks, slightly lagging behind the Xeon instance in some benchmarks.

The Graviton2 is doing particularly well in the memory tests, and latency sensitive tests like 429.mcf are faring significantly better than what we see on the mobile Cortex-A76 SoCs.

SPECfp2006(C/C++) Speed Estimated Scores

In the C/C++ tests of SPECfp2006 (identical set to what se test on mobile, no Fortran compiler available on those platforms), we see the Graviton2 do even better. The delta to the Cortex-A72 platform is even bigger thanks to the more memory sensitive nature of these tests. Here, the Graviton2 is also a lot closer to the x86 competition, staying neck-in-neck with the AMD and Intel platforms.

SPEC2006 Speed Estimated Total (xlarge)

For the aggregate stores in SPEC2006, the performance uplift compared to the first-gen Graviton is 2x in integer workloads, and 2.2x in FP workloads. Intel is slightly ahead in integer ST performance here, but that gap is reduced to a very thin margin on the FP tests. It’s a great showcase of the Neoverse N1’s IPC capabilities, as the cores are only running at 2.5GHz compared to ~2.9GHz for the AMD system and ~3.2GHz for the Intel system.

Compared to a mobile Cortex-A76 such as in the Kirin 990 (which is the best A76 implementation out there), the resulting IPC is 32% better for the Graviton2 in SPECint2006, and 10% better for SPECfp2006. This goes to show what kind of a massive difference the memory subsystem can have on a system that is otherwise similar in terms of the CPU microarchitecture. We must not forget that the N1 here has the whole 32MB L3 cache available all to itself, even when using a smaller two core vCPU instance.

SPECint2017 Rate-1 Estimated Scores

We’re also covering the SPEC2017 results. In general, the new suite slightly changes up the workloads and, in some cases, increases their complexity, but in SPECint2017, there’s also tests which are laxer compared to their 2006 variants, for example 505.mcf is only using half the memory footprint compared to 429.mcf.

Still, the Graviton2 again here is showcasing some extremely good performance across the board, and is largely mimicking the 2006 results.

SPECfp2017 Rate-1 Estimated Scores

The fp2017 results are definitely a more complex set, but again, the Graviton 2 doesn’t have issues keeping up, although this time around it does more often than not lose out to the x86 parts.

SPEC2017 Rate-1 Estimated Total (xlarge)

In SPECint2017 the Graviton2 is able to showcase a better relative positioning compared to the 2006 tests, just shy of keeping up with the 3.2GHz Cascade Lake system, however in the fp2017 results it’s faring a bit worse than the 2006 system, showcasing a larger margin where it falls behind the competition.

Again, compared to the A1 based Graviton1 instances, the new chip essentially showcases double the single-thread performance, signifying that Arm is now able to compete amongst the big boys in the courtyard.

The results here are a bit shy of what Arm had projected for the N1 platform last year, but the reason for that is that Amazon was quite conservative in terms of the clock frequencies of the Graviton2, as well as only employing 32MB of L3 cache versus the 64MB that Arm had envisioned for a 64-core part. At least on the frequency side, Ampere’s new Altra system running at 3GHz should see scores 20% higher than the figures presented by the Graviton2.

Lastly, let’s again not forget that this isn’t the whole competitive landscape as we don’t have AMD Rome-based instances available to us at this point of time, I’m pretty sure those figures will be a larger leap ahead of the pack presented here.

Compiler Setup, GCC vs LLVM SPEC - Multi-Core Performance Scaling
Comments Locked

96 Comments

View All Comments

  • eastcoast_pete - Tuesday, March 10, 2020 - link

    While I am currently not in the market for such cloud computing services aside from maybe some video processing, I for one welcome the arrival of a competitive non-x86 solution! Can only make life better and cheaper when and if I do. Also, ARM N1 arch lighting a fire under the x86 makers in their easy chairs will keep AMD and Intel on their feet, and that advance will filter down to my future desktops and laptops.
  • eastcoast_pete - Tuesday, March 10, 2020 - link

    Thanks Andrei! Just out of curiosity, that "noisy neighbor" behavior you saw on the Xeon? I know it's mostly speculation, but would you expect this if someone is running AVX512 on neighboring cores? AVX512 is very powerful if applications can make use of it, but things get very toasty fast. Care to speculate?
  • willgart - Tuesday, March 10, 2020 - link

    where are the real life benchmarks???
    video encoding / decoding ?
    database performance ?
    web performance ?
    https encryption ?
    etc...
  • The_Assimilator - Thursday, March 12, 2020 - link

    Agreed 100%. Without figures of actual real-world applications compiled with actual real-world compilers handling actual real-world workloads, this essentially amounts to an advertorial for Amazon, Graviton2 and Arm.
  • Danvelopment - Wednesday, March 11, 2020 - link

    This may sound stupid as I'm just getting into AWS as backup throughput for local servers on my web project that releases April.

    "If you’re an EC2 customer today, and unless you’re tied to x86 for whatever reason, you’d be stupid not to switch over to Graviton2 instances once they become available, as the cost savings will be significant."

    How do you know whether what you're using is Intel, AMD or Graviton(1/2)? (I'm using T2s right now with no weighting and if our release gets hit hard, will give it weight and and increase its capacity).

    As they're not actually doing anything, then I'd have no issue switching over, but can't tell what I'm on.
  • CampGareth - Wednesday, March 11, 2020 - link

    There's a list here: https://aws.amazon.com/ec2/instance-types/

    If you're on T2 instances you're on Intel chips at the moment.
  • Quantumz0d - Wednesday, March 11, 2020 - link

    No real benchmark. Another SPEC Whiteknighting. I see the AT forums Apple CPU thread being getting creamed over this again.

    ARM is a lockdown POS. You can't even buy them in this case. Altera CPU didn't even came to STH for comparision where it had so many cores against x86 parts. You cannot get them running majority of the consumer workload. One can claim Power from IBM has SMT8 and first Gen4 and all but if its not consumer centric it won't generate much of profit.

    Author seems to love ARM for some reason and hate x86. Its been since Apple articles but in real time we saw how iPhone gets decimated in speed comparison against Android Flagships running the stone age Qualcomm. We have seen this ARM dethroning x86 numerous times and failed. I hope this also fails, a non standard CPU leaves all fun out of equation. And needs emulation for consumer use which slows down performance.

    People want to see all the workloads. Not SPEC. Also where is EPYC Rome comparision Nowhere. Soon Milan is going to hit. Glad that AMD is alive. This stupid ARM BGA dumpster should be dead in its infancy.
  • Wilco1 - Wednesday, March 11, 2020 - link

    LOL - someone feels extremely threatened by Arm servers...

    Mission accomplished!
  • anonomouse - Wednesday, March 11, 2020 - link

    Well that was bizarrely incoherent. What workloads would you want to see instead? Nothing else you wrote made any sense or had any facts behind it.
  • Andrei Frumusanu - Wednesday, March 11, 2020 - link

    He's been doing it for the last year or two, ignore it.

Log in

Don't have an account? Sign up now