Amazon's Arm-based Graviton2 Against AMD and Intel: Comparing Cloud Compute

by Andrei Frumusanu on March 10, 2020 8:30 AM EST

Compiler Setup, GCC vs LLVM

For further performance testing of the systems, we fell back to SPEC2006 and 2017. I wanted to make sure that there are no heated discussions when it comes to the compilation of the test suites, so I carefully investigated the compilers out there, particularly regarding the choice between GCC and LLVM.

Overall, I checked three different compiler setups: a freshly compiled GCC 9.2.0 release; Arm’s Allinea Studio Compiler 20 package, which comes with both Arm’s closed source LLVM and Flang variants as well as a pre-compiled version of GCC 9.2.0; and Marvell’s branch of LLVM and Flang.

We had seen quite a push by Arm for us to consider GCC more closely than LLVM, as Arm had admitted that they’ve spent more time optimising GCC upstream than they have LLVM. Given the much more prevalent use of GCC in cloud and datacentre applications, I did somewhat agree with this, as that’s most likely what you’ll see people use in such environments.

I ran some single-threaded tests across the different compiler setups. The compiler flags were straightforward: just a simple -Ofast flag, plus -march/-mcpu=cortex-a76 or =neoverse-n1 (alias) for the Arm compiler setup.

As always, our SPEC results aren't officially submitted results, and thus we have to label them merely as "estimates" for this article. Furthermore, SPEC2006 has been retired in favour of SPEC2017, but I still wanted to put up the figures for historical context, as well as mobile comparisons.

Graviton2 SPEC - Single Threaded - 2.5GHz

The overall results favour GCC in the SPECint workloads, while LLVM seemingly does better in the FP and memory heavy tests. Between the upstream GCC 9.2.0 and Arm’s precompiled version there’s seemingly no performance difference whatsoever, while there is some minor difference between Marvell’s setup and Arm’s branch of LLVM.

I ended up going forward with a clean compile of GCC 9.2.0 both for the Arm as well as x86 systems – meaning we’re using the exact same compiler for both architectures, just with different compile targets.

For x86, we’re again using the simple -Ofast flag for optimisations, and using the corresponding -march/-mtune targets for the EPYC and Intel platforms, meaning znver1 and skylake-avx512.
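As a rough illustration of the flag choices described above, the per-platform GCC invocations could be assembled like this. This is only a sketch and not the actual build scripts used for the article: the platform labels, the `gcc_command` helper, the file names, and the choice of `-mcpu=neoverse-n1` for the Arm target are assumptions made for the example. Note that GCC spells the first-generation Zen target `znver1`.

```python
import shlex

# Target flags per platform, following the text. The dictionary keys are
# illustrative labels, not names used in the article.
TARGET_FLAGS = {
    # Assumption: -mcpu=neoverse-n1 is used for the Arm system.
    "graviton2": ["-mcpu=neoverse-n1"],
    "epyc": ["-march=znver1", "-mtune=znver1"],
    "xeon": ["-march=skylake-avx512", "-mtune=skylake-avx512"],
}


def gcc_command(platform, source, output="a.out", compiler="gcc"):
    """Build the argument list for one compile: -Ofast plus the target flags."""
    flags = ["-Ofast", *TARGET_FLAGS[platform]]
    return [compiler, *flags, source, "-o", output]


def as_shell(command):
    """Render an argument list as a single shell-quoted string."""
    return shlex.join(command)
```

Calling `as_shell(gcc_command("xeon", "bench.c"))` would give a single command line that can be pasted into a shell. The same compiler and the same optimisation level are used everywhere, and only the target flags change between architectures.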

Overall, it’s a bit odd to see GCC ahead in that many workloads given that LLVM is the primary compiler for billions of Arm devices in the mobile space. Arm has said that they’re trying to put more effort into this compiler, as it’s seemingly lagging behind GCC in terms of some optimisations.

96 Comments

eastcoast_pete - Tuesday, March 10, 2020 - link
While I am currently not in the market for such cloud computing services aside from maybe some video processing, I for one welcome the arrival of a competitive non-x86 solution! Can only make life better and cheaper when and if I do. Also, ARM N1 arch lighting a fire under the x86 makers in their easy chairs will keep AMD and Intel on their feet, and that advance will filter down to my future desktops and laptops.
eastcoast_pete - Tuesday, March 10, 2020 - link
Thanks Andrei! Just out of curiosity, that "noisy neighbor" behavior you saw on the Xeon? I know it's mostly speculation, but would you expect this if someone is running AVX512 on neighboring cores? AVX512 is very powerful if applications can make use of it, but things get very toasty fast. Care to speculate?
willgart - Tuesday, March 10, 2020 - link
where are the real life benchmarks???
video encoding / decoding ?
database performance ?
web performance ?
https encryption ?
etc...
The_Assimilator - Thursday, March 12, 2020 - link
Agreed 100%. Without figures of actual real-world applications compiled with actual real-world compilers handling actual real-world workloads, this essentially amounts to an advertorial for Amazon, Graviton2 and Arm.
Danvelopment - Wednesday, March 11, 2020 - link
This may sound stupid as I'm just getting into AWS as backup throughput for local servers on my web project that releases April.

"If you’re an EC2 customer today, and unless you’re tied to x86 for whatever reason, you’d be stupid not to switch over to Graviton2 instances once they become available, as the cost savings will be significant."

How do you know whether what you're using is Intel, AMD or Graviton(1/2)? (I'm using T2s right now with no weighting and if our release gets hit hard, will give it weight and increase its capacity).

As they're not actually doing anything, then I'd have no issue switching over, but can't tell what I'm on.
CampGareth - Wednesday, March 11, 2020 - link
There's a list here: https://aws.amazon.com/ec2/instance-types/

If you're on T2 instances you're on Intel chips at the moment.
Quantumz0d - Wednesday, March 11, 2020 - link
No real benchmark. Another SPEC Whiteknighting. I see the AT forums Apple CPU thread being getting creamed over this again.

ARM is a lockdown POS. You can't even buy them in this case. Altera CPU didn't even came to STH for comparision where it had so many cores against x86 parts. You cannot get them running majority of the consumer workload. One can claim Power from IBM has SMT8 and first Gen4 and all but if its not consumer centric it won't generate much of profit.

Author seems to love ARM for some reason and hate x86. Its been since Apple articles but in real time we saw how iPhone gets decimated in speed comparison against Android Flagships running the stone age Qualcomm. We have seen this ARM dethroning x86 numerous times and failed. I hope this also fails, a non standard CPU leaves all fun out of equation. And needs emulation for consumer use which slows down performance.

People want to see all the workloads. Not SPEC. Also where is EPYC Rome comparision Nowhere. Soon Milan is going to hit. Glad that AMD is alive. This stupid ARM BGA dumpster should be dead in its infancy.
Wilco1 - Wednesday, March 11, 2020 - link
LOL - someone feels extremely threatened by Arm servers...

Mission accomplished!
anonomouse - Wednesday, March 11, 2020 - link
Well that was bizarrely incoherent. What workloads would you want to see instead? Nothing else you wrote made any sense or had any facts behind it.
Andrei Frumusanu - Wednesday, March 11, 2020 - link
He's been doing it for the last year or two, ignore it.
