Amazon's Arm-based Graviton2 Against AMD and Intel: Comparing Cloud Compute

Author: Andrei Frumusanu

by Andrei Frumusanu on March 10, 2020 8:30 AM EST

Compiler Setup, GCC vs LLVM

For further performance testing of the systems, we fell back to SPEC2006 and SPEC2017. I wanted to make sure that there are no heated discussions about how the test suites were compiled, so I carefully investigated the compilers out there, particularly the choice between GCC and LLVM.

Overall, I checked three different compiler setups: a freshly compiled GCC 9.2.0 release; Arm’s Allinea Studio Compiler 20 package, which comes with both Arm’s closed-source LLVM and Flang variants as well as a pre-compiled version of GCC 9.2.0; and Marvell’s branch of LLVM and Flang.

We had seen quite a push by Arm for us to consider GCC more closely than LLVM, as Arm had admitted that they’ve spent more time optimising GCC upstream than they have LLVM. Given the much more prevalent use of GCC in cloud and datacentre applications, I did somewhat agree with this, as that’s most likely what you’ll see people use in such environments.

I ran some single-threaded tests across the different compiler setups. The compiler flags were straightforward: just a simple -Ofast flag, plus -march/-mcpu=cortex-a76, or =neoverse-n1 (alias) for the Arm compiler setup.

As always, our SPEC results aren't officially submitted results, and thus we have to label them merely as "estimates" for this article. Furthermore, SPEC2006 has been retired in favour of SPEC2017, but I still wanted to put up the figures for historical context, as well as mobile comparisons.

[Chart: Graviton2 SPEC - Single Threaded - 2.5GHz]

The overall results favour GCC in the SPECint workloads, while LLVM does better in the FP and memory-heavy tests. Between the upstream GCC 9.2.0 and Arm’s precompiled version there’s no visible performance difference, while there is some minor difference between Marvell’s setup and Arm’s branch of LLVM.

I ended up going forward with a clean compile of GCC 9.2.0 for both the Arm and the x86 systems – meaning we’re using the exact same compiler for both architectures, just with different compile targets.

For x86, we’re again using the simple -Ofast flag for optimisations, along with the corresponding -march/-mtune targets for the EPYC and Intel platforms, meaning znver1 and skylake-avx512.
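As a quick sanity check on flag sets like the ones described above, a few lines of C can report what the compiler was actually told to target. This is only a minimal sketch and not part of our test setup: the function names are hypothetical, and it relies on macros that GCC and Clang predefine (`__aarch64__`, `__x86_64__`, `__AVX512F__`, `__AVX2__`, `__ARM_NEON`, and `__FAST_MATH__`, which is set when -ffast-math is active, as it is under -Ofast).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers: report what this translation unit was compiled for,
 * based purely on compiler-predefined macros. */

const char *build_arch(void) {
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "other";
#endif
}

/* Widest SIMD extension the compiler was allowed to use for this build. */
const char *build_simd(void) {
#if defined(__AVX512F__)
    return "avx512f";   /* e.g. enabled by -march=skylake-avx512 */
#elif defined(__AVX2__)
    return "avx2";      /* e.g. enabled by -march=znver1 */
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "baseline";
#endif
}

/* 1 if fast-math semantics are in effect (as with -Ofast), else 0. */
int build_fast_math(void) {
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary into buf; returns snprintf's result. */
int describe_build(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "arch=%s simd=%s fast-math=%d",
                    build_arch(), build_simd(), build_fast_math());
}
```

Calling `describe_build` from a small `main` and printing the buffer, once per flag set, shows whether a given -march/-mcpu choice and -Ofast took effect for that binary.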

Overall, it’s a bit odd to see GCC ahead in that many workloads given that LLVM is the primary compiler for billions of Arm devices in the mobile space. Arm has said that they’re trying to put more effort into this compiler, as it seems to lag behind GCC in some optimisations.

96 Comments

Wilco1 - Friday, March 13, 2020 - link
Developing a chip based on a standard Arm core is much cheaper. Arm chip volumes are much higher than Intel and AMD, the costs are spread out over billions of chips.
ksec - Tuesday, March 10, 2020 - link
ARM's licensing is, comparatively speaking, extremely cheap, even for their most expensive N1 core blueprint. The development and production costs fall largely on ARM because of the platform model, so Amazon is really only paying for the cost to fab with TSMC. I would be surprised if those chips cost more than $300, which is at least a few thousand less than Intel or even AMD.

Amazon will have to pay for all the software cost though, making sure all their tools and software run on ARM. That is very expensive in engineering cost, but pays off in the long term.
extide - Friday, March 13, 2020 - link
Actual production cost is going to be more like $50 or so. WAY less than $300.
ksec - Monday, March 30, 2020 - link
The wafer cost alone would be $50+ assuming 100% yield. That is excluding licensing and additional R&D. At their volume I would not be surprised if it stacks up to $300.
FunBunny2 - Tuesday, March 10, 2020 - link
"Vertical integration is powerful."

I find it amusing that compute folks are reinventing the wheel from Henry Ford!! River Rouge.
mrvco - Tuesday, March 10, 2020 - link
It would be interesting to see how the AWS instances compare to performance-competitive Azure instances on a value basis.
kliend - Tuesday, March 10, 2020 - link
Anecdotally, Yes. Amazon is always trying to bring in users for little/no immediate profit.
skaurus - Tuesday, March 10, 2020 - link
At scale, predictability is more important in infrastructure than cost. It may seem that if we have everything we need compiled for Arm, we can just switch over. But these things often look easier in theory than in practice. I'd be wary of moving an existing service to Arm instances, or even of starting a new one when I just want to iterate fast and be sure that the underlying level doesn't have any new surprises.
It will be fine if I have time to experiment, or later, when the dust settles. Right now, I doubt that switching over to these instances once they are available is actually an easy or even a smart decision.
FunBunny2 - Tuesday, March 10, 2020 - link
"It may seem that if we have everything we need compiled for Arm, we can just switch over. But these things often look easier in theory than practice. "

with language compliant compilers, I don't buy that argument. it can certainly be true that RISC-ier processors yield larger binaries and slower performance, but real application failure has to be due to OS mismatches. C is the universal assembler.
mm0zct - Wednesday, March 11, 2020 - link
Beware that in C, struct packing is ABI dependent: if you write out a struct to disk on x86_64 and try to read it back in on AArch64, you might have a bad time unless you use the packed pragma and specified-width types. This is the sort of thing that might get you if you try to migrate between architectures.
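
A minimal C sketch of one way around the struct-layout problem described above: rather than writing the struct's in-memory bytes, encode each field explicitly with fixed-width types and a fixed byte order. The `Record` type, its fields, and the function names are hypothetical, chosen only for illustration; this is an alternative to the packed pragma, not the only option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record. In memory the compiler may insert padding after
 * `kind` and after `flags`; where and how much is up to the ABI. */
typedef struct {
    uint8_t  kind;
    uint32_t id;
    uint16_t flags;
} Record;

/* On-disk size is fixed by us, not by the compiler: 1 + 4 + 2 bytes. */
enum { RECORD_WIRE_SIZE = 7 };

/* Serialise field by field, little-endian, with no padding bytes. */
void record_encode(const Record *r, uint8_t out[RECORD_WIRE_SIZE]) {
    assert(r != NULL && out != NULL);
    out[0] = r->kind;
    out[1] = (uint8_t)(r->id);
    out[2] = (uint8_t)(r->id >> 8);
    out[3] = (uint8_t)(r->id >> 16);
    out[4] = (uint8_t)(r->id >> 24);
    out[5] = (uint8_t)(r->flags);
    out[6] = (uint8_t)(r->flags >> 8);
}

/* Rebuild the struct from the same fixed layout on any architecture. */
void record_decode(const uint8_t in[RECORD_WIRE_SIZE], Record *r) {
    assert(in != NULL && r != NULL);
    r->kind  = in[0];
    r->id    = (uint32_t)in[1]
             | ((uint32_t)in[2] << 8)
             | ((uint32_t)in[3] << 16)
             | ((uint32_t)in[4] << 24);
    r->flags = (uint16_t)(in[5] | (in[6] << 8));
}
```

Because the byte layout is spelled out in code, the file format no longer depends on the compiler's padding rules or on types whose representation differs between ABIs, such as `long double` or the signedness of plain `char`.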

Also, many languages (including C) have hand-optimised math libraries with inline assembler, which might still be using plain-C fallbacks on other architectures. There was a good article discussing the migration to AArch64 at Cloudflare; they particularly encountered issues with Go not being optimised on AArch64 yet: https://blog.cloudflare.com/arm-takes-wing/
