SPEC - MT Performance (4xlarge 16 vCPU)

The 64-core results were quite interesting and put the Graviton2 in a very competitive performance position, but all this talk about performance scaling varying depending on the loaded core count of the system made me wonder how the EC2 instances would perform at lower vCPU counts.

I fired up the same tests, just this time around at rate-16 to match the number of vCPUs. These are 4xlarge EC2 instances with a corresponding 16 vCPUs, but there’s one large caveat in this comparison that we must keep in mind: the Graviton2 instances very likely have no neighbours at this point in time in the test preview, meaning the performance scaling we’re seeing here is very much a best-case scenario for the Amazon chip. EC2 global capacity floats around 60% active usage, and I imagine Amazon distributes this horizontally across the available sockets in their datacentres. How these performance figures will look in the real world once Graviton2 ramps up in public availability is anybody’s guess.
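
As a quick aside on methodology, the copy count of a SPEC rate run is simply matched to the vCPU count of the instance. Below is a minimal sketch of that sizing logic, assuming SPEC CPU2017’s runcpu harness and its --copies flag; the config file name is a placeholder, and this isn’t necessarily the exact invocation used for this article’s runs:

    import os

    # Size the number of SPEC rate copies to the instance's vCPU count,
    # e.g. 16 copies on a 4xlarge (16 vCPU) instance.
    copies = os.cpu_count()

    # Hypothetical SPEC CPU2017 rate invocation; "graviton2.cfg" is a placeholder
    # config name, and --copies is the harness flag that sets the rate copy count.
    cmd = f"runcpu --config=graviton2.cfg --copies={copies} intrate"
    print(cmd)  # on a 4xlarge this prints a 16-copy intrate run command

The same sizing idea applies to the SPEC CPU2006 runs, just through the older runspec harness.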

The AMD system likely won’t care too much about such scenarios, as its NUMA nature means it’s isolated from noisy neighbours anyhow and we’re just seeing use of a single 8-core chip with its own memory controllers, but the Intel system will possibly have some neighbours doing activity on the same socket and its shared resources. I only did a single test run here; you’d probably need a lot of data to get a representative figure across EC2 usage.

For the Intel m5n instances, using a 4xlarge instance actually means you're only on a single socket this time around, meaning that the scaling towards higher per-thread performance isn't expected to be as pronounced as on the Graviton2 system, as system DRAM bandwidth and L3 are halved compared to the 16xlarge figures on the previous page.

Also, since we’re testing 16 vCPU setups here, we can make an apples-to-apples comparison between the first- and second-generation Graviton systems, which should be fun.

SPECint2006 Rate Estimated Scores (16 vCPU)

The comparison between the two generations of Graviton processors here is also astounding. Memory-intensive workloads favour the newer Graviton2 by at least a factor of 2x, more often 3x to 5x, and even up to 7x in libquantum.

The AMD system, as expected, doesn’t gain much from using fewer cores, as no additional shared resources are freed up on a per-thread basis. The Intel chip fares slightly better per thread, but doesn’t see the same performance scaling (or should I say, reverse scaling) as achieved by the Graviton2.

SPECfp2006(C/C++) Rate Estimated Scores (16 vCPU)

In fp2006, we see more or less the same kind of results.

SPEC2006 Rate-16 Estimated Total (4xlarge)

Overall, in the 16-vCPU rate results the Graviton2 extends the performance advantage it showcased in the 64-core results, ending up with an even bigger margin.

SPECint2017 Rate Estimated Scores (16 vCPU)

SPECfp2017 Rate Estimated Scores (16 vCPU)

SPEC2017 Rate-16 Estimated Total (4xlarge)

The SPEC2017 results again show the same conclusion: the Graviton2 really gains a ton of per-thread performance through the ability to use more of the chip’s L3 cache and 8 memory channels. Whilst in the 64-rate results the Graviton2 and the Xeon were neck-and-neck in fp2017, here the Graviton ends up with a 44% performance advantage.
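
To put some rough numbers on that resource argument, here’s a back-of-the-envelope sketch of the per-thread share of the Graviton2’s shared resources at different active vCPU counts, using the chip’s 32MB of shared L3 and 8 channels of DDR4-3200 (roughly 25.6GB/s per channel) purely as illustrative inputs:

    # Back-of-the-envelope per-thread share of the Graviton2's shared resources.
    # Assumed inputs: 32MB shared L3, 8 channels of DDR4-3200 (~25.6GB/s each).
    L3_MB = 32
    PEAK_DRAM_BW_GBS = 8 * 25.6  # ~204.8GB/s theoretical peak

    for active_vcpus in (64, 16):
        print(f"{active_vcpus} vCPUs loaded: "
              f"{L3_MB / active_vcpus:.2f}MB L3/thread, "
              f"{PEAK_DRAM_BW_GBS / active_vcpus:.1f}GB/s peak DRAM BW/thread")

    # 64 vCPUs loaded: 0.50MB L3/thread, 3.2GB/s peak DRAM BW/thread
    # 16 vCPUs loaded: 2.00MB L3/thread, 12.8GB/s peak DRAM BW/thread

A rate-16 run on an otherwise idle chip thus sees roughly 4x the cache and theoretical bandwidth per thread of a fully loaded rate-64 run, which is exactly the kind of headroom the memory-heavy workloads are exploiting here.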

Again, I can’t put enough emphasis on this: these results are a best-case scenario for the Graviton2’s 4xlarge 16 vCPU figures. Whether production instances are able to achieve such numbers will largely depend on the luck of the draw, i.e. whether you end up alone on the physical hardware or share the chip with neighbours. And even if you do have neighbours, the performance figures will largely depend on what kind of workloads they run alongside your use-cases.

I saw a few articles out there comparing the performance between the m6g instances and the m5 generation instances (Skylake-SP hardware), but most of these tests were done only on medium (1 vCPU) to xlarge (4 vCPU) sizes. When reading such pieces, it’s naturally important to keep in mind the vast scaling advantage the Graviton2 chip has: the smaller your instance is, the more likely you are to have noisy neighbours on the hardware, something that currently just doesn’t happen in the Graviton2’s preview phase.

Comments

  • eastcoast_pete - Tuesday, March 10, 2020 - link

    While I am currently not in the market for such cloud computing services aside from maybe some video processing, I for one welcome the arrival of a competitive non-x86 solution! Can only make life better and cheaper when and if I do. Also, ARM N1 arch lighting a fire under the x86 makers in their easy chairs will keep AMD and Intel on their feet, and that advance will filter down to my future desktops and laptops.
  • eastcoast_pete - Tuesday, March 10, 2020 - link

    Thanks Andrei! Just out of curiosity, that "noisy neighbor" behavior you saw on the Xeon? I know it's mostly speculation, but would you expect this if someone is running AVX512 on neighboring cores? AVX512 is very powerful if applications can make use of it, but things get very toasty fast. Care to speculate?
  • willgart - Tuesday, March 10, 2020 - link

    where are the real life benchmarks???
    video encoding / decoding ?
    database performance ?
    web performance ?
    https encryption ?
    etc...
  • The_Assimilator - Thursday, March 12, 2020 - link

    Agreed 100%. Without figures of actual real-world applications compiled with actual real-world compilers handling actual real-world workloads, this essentially amounts to an advertorial for Amazon, Graviton2 and Arm.
  • Danvelopment - Wednesday, March 11, 2020 - link

    This may sound stupid as I'm just getting into AWS as backup throughput for local servers on my web project that releases in April.

    "If you’re an EC2 customer today, and unless you’re tied to x86 for whatever reason, you’d be stupid not to switch over to Graviton2 instances once they become available, as the cost savings will be significant."

    How do you know whether what you're using is Intel, AMD or Graviton(1/2)? (I'm using T2s right now with no weighting, and if our release gets hit hard, will give it weight and increase its capacity).

    As they're not actually doing anything, then I'd have no issue switching over, but can't tell what I'm on.
  • CampGareth - Wednesday, March 11, 2020 - link

    There's a list here: https://aws.amazon.com/ec2/instance-types/

    If you're on T2 instances you're on Intel chips at the moment.
  • Quantumz0d - Wednesday, March 11, 2020 - link

    No real benchmarks. Another SPEC whiteknighting exercise. I see the AT forums' Apple CPU thread getting creamed over this again.

    ARM is a lockdown POS. You can't even buy them in this case. The Altera CPU didn't even come to STH for comparison, where it had so many cores against x86 parts. You cannot get them running the majority of consumer workloads. One can claim Power from IBM has SMT8 and was first with Gen4 and all, but if it's not consumer-centric it won't generate much profit.

    The author seems to love ARM for some reason and hate x86. It's been like this since the Apple articles, but in real life we saw how the iPhone gets decimated in speed comparisons against Android flagships running the stone-age Qualcomm chips. We have seen this ARM-dethroning-x86 story numerous times, and it failed. I hope this also fails; a non-standard CPU takes all the fun out of the equation, and needs emulation for consumer use, which slows down performance.

    People want to see all the workloads, not SPEC. Also, where is the EPYC Rome comparison? Nowhere. Soon Milan is going to hit. Glad that AMD is alive. This stupid ARM BGA dumpster should be dead in its infancy.
  • Wilco1 - Wednesday, March 11, 2020 - link

    LOL - someone feels extremely threatened by Arm servers...

    Mission accomplished!
  • anonomouse - Wednesday, March 11, 2020 - link

    Well that was bizarrely incoherent. What workloads would you want to see instead? Nothing else you wrote made any sense or had any facts behind it.
  • Andrei Frumusanu - Wednesday, March 11, 2020 - link

    He's been doing it for the last year or two, ignore it.
