Compiler Setup, GCC vs LLVM

For further performance testing of the systems, we fell back to SPEC2006 and 2017. I wanted to make sure that there’s no heated discussions when it comes to the compilation of the test suites, so carefully investigated the compilers out there, particularly regarding the choice between GCC and LLVM.

Overall, I checked three different compiler setups: A freshly compiled GCC 9.2.0 release, Arm’s Allinea Studio Compiler 20 package which comes with both Arm’s closed source LLVM and Flang variants as well as a pre-compiled version of GCC 9.2.0, and Marvell’s branch of LLVM and Flang.

We had seen quite a push by Arm for us to consider GCC more closely than LLVM, as Arm had admitted that they’ve spent more time upstream optimising GCC than they’ve had for LLVM. Given the much more prevalent use of GCC in cloud and datacentre applications, I did somewhat agree with this given that’s most likely what you’ll see people use in such environments.

I ran some single-threaded tests across the different compiler setups, the compiler flags were straightforward with just a simple -Ofast flag as well as -march/-mcpu=cortex-a76 or =neoverse-n1 (alias) for the Arm compiler setup.

As always, our SPEC results aren't officially submitted results, and thus we have to label them merely as "estimates" for this article. Furthermore, SPEC2006 has been retired in favour of SPEC2017, but I still wanted to put up the figures for historical context, as well as mobile comparisons.


Graviton2 SPEC - Single Threaded - 2.5GHz

The overall results favour GCC in the SPECint workloads, while LLVM seemingly does better in the FP and memory heavy tests. Between the upstream GCC 9.2.0 and Arm’s precompiled version there’s seemingly no performance difference whatsoever, while there is some minor difference between Marvell’s setup and Arm’s branch of LLVM.

I ended up going forward with a clean compile of GCC 9.2.0 both for the Arm as well as x86 systems – meaning we’re using the exact same compiler for both architectures, just with different compile targets.

For x86, we’re again using the simple -Ofast flag for optimisations, and using the corresponding -march/-mtune targets for the EPYC and Intel platforms, meaning zenver1 and skylake-avx512.

Overall, it’s a bit odd to see GCC ahead in that many workloads given that LLVM the is the primary compiler for billions of Arm devices in the mobile space. Arm has said that they’re trying to put more effort into this compiler as seemingly it’s lagging behind GCC in terms of some optimisations.

Memory Subsystem & Latency SPEC - Single Threaded Performance
Comments Locked

96 Comments

View All Comments

  • Wilco1 - Friday, March 13, 2020 - link

    Developing a chip based on a standard Arm core is much cheaper. Arm chip volumes are much higher than Intel and AMD, the costs are spread out over billions of chips.
  • ksec - Tuesday, March 10, 2020 - link

    ARM's licensing comparatively speaking is extremely cheap even for their most expensive N1 Core Blueprint. The development and production cost are largely on ARM's because of the platform model. So Amazon is only really paying for the cost to Fab with TSMC, I would be surprised if those chip cost more than $300. Which is at least a few thousand less than Intel or even AMD.

    Amazon will have to paid for all the software cost though. Making sure all their tools, and software runs on ARM. That is very expensive in engineering cost, but paid off in long term.
  • extide - Friday, March 13, 2020 - link

    Actual production cost is going to be more like $50 or so. WAY less than $300.
  • ksec - Monday, March 30, 2020 - link

    Only the Wafer Cost alone would be $50+ assuming 100% yield. That is excluding licensing and additional R&D. At their volume I would not be surprised it stack up to $300
  • FunBunny2 - Tuesday, March 10, 2020 - link

    "Vertical integration is powerful."

    I find it amusing that compute folks are reinventing the wheel from Henry Ford!! River Rouge.
  • mrvco - Tuesday, March 10, 2020 - link

    It would be interesting to see how the AWS instances compare to performance-competitive Azure instances on a value basis.
  • kliend - Tuesday, March 10, 2020 - link

    Anecdotally, Yes. Amazon is always trying to bring in users for little/no immediate profit.
  • skaurus - Tuesday, March 10, 2020 - link

    At scale, predictability is more important in infrastructure than cost. It may seem that if we have everything we need compiled for Arm, we can just switch over. But these things often look easier in theory than practice. I'd be wary to move existing service to Arm instances, or even starting a new one when I just want to iterate fast and just be sure that underlying level doesn't have any new surprises.
    It will be fine If I have time to experiment, or later, when the dust settles. Right now, I doubt that switching over to these instances once they are available, is actually easy or even smart decision.
  • FunBunny2 - Tuesday, March 10, 2020 - link

    "It may seem that if we have everything we need compiled for Arm, we can just switch over. But these things often look easier in theory than practice. "

    with language compliant compilers, I don't buy that argument. it can certainly be true that RISC-ier processors yield larger binaries and slower performance, but real application failure has to be due to OS mismatches. C is the universal assembler.
  • mm0zct - Wednesday, March 11, 2020 - link

    Beware that in C struct packing is ABI dependent, if you write out a struct to disk on x86_64, and try and read it back in on Aarch64, you might have a bad time unless you use the packed pragma and use specified-width types. This is the sort of thing that might get you if you try to migrate between architectures.

    Also many languages (including C) have hand optimised math libraries with inline assembler, which might still be using plain-C fallbacks on other architectures. There was a good article discussing the migration to Aarch64 at Cloudflare, they particulary encountered issues with go not being optimised on Aarch64 yet https://blog.cloudflare.com/arm-takes-wing/

Log in

Don't have an account? Sign up now