Compiler Setup, GCC vs LLVM

For further performance testing of the systems, we fell back to SPEC2006 and 2017. I wanted to make sure that there’s no heated discussions when it comes to the compilation of the test suites, so carefully investigated the compilers out there, particularly regarding the choice between GCC and LLVM.

Overall, I checked three different compiler setups: A freshly compiled GCC 9.2.0 release, Arm’s Allinea Studio Compiler 20 package which comes with both Arm’s closed source LLVM and Flang variants as well as a pre-compiled version of GCC 9.2.0, and Marvell’s branch of LLVM and Flang.

We had seen quite a push by Arm for us to consider GCC more closely than LLVM, as Arm had admitted that they’ve spent more time upstream optimising GCC than they’ve had for LLVM. Given the much more prevalent use of GCC in cloud and datacentre applications, I did somewhat agree with this given that’s most likely what you’ll see people use in such environments.

I ran some single-threaded tests across the different compiler setups, the compiler flags were straightforward with just a simple -Ofast flag as well as -march/-mcpu=cortex-a76 or =neoverse-n1 (alias) for the Arm compiler setup.

As always, our SPEC results aren't officially submitted results, and thus we have to label them merely as "estimates" for this article. Furthermore, SPEC2006 has been retired in favour of SPEC2017, but I still wanted to put up the figures for historical context, as well as mobile comparisons.


Graviton2 SPEC - Single Threaded - 2.5GHz

The overall results favour GCC in the SPECint workloads, while LLVM seemingly does better in the FP and memory heavy tests. Between the upstream GCC 9.2.0 and Arm’s precompiled version there’s seemingly no performance difference whatsoever, while there is some minor difference between Marvell’s setup and Arm’s branch of LLVM.

I ended up going forward with a clean compile of GCC 9.2.0 both for the Arm as well as x86 systems – meaning we’re using the exact same compiler for both architectures, just with different compile targets.

For x86, we’re again using the simple -Ofast flag for optimisations, and using the corresponding -march/-mtune targets for the EPYC and Intel platforms, meaning zenver1 and skylake-avx512.

Overall, it’s a bit odd to see GCC ahead in that many workloads given that LLVM the is the primary compiler for billions of Arm devices in the mobile space. Arm has said that they’re trying to put more effort into this compiler as seemingly it’s lagging behind GCC in terms of some optimisations.

Memory Subsystem & Latency SPEC - Single Threaded Performance
Comments Locked

96 Comments

View All Comments

  • anonomouse - Tuesday, March 10, 2020 - link

    Will there be more articles on this, covering other workloads than SPEC? You see lots of academic and industry papers talking about how real cloud/hyperscaler/server workloads have deep software stacks with large instruction-side footprints and static branch footprints, whereas SPEC is really... not that. Those workloads tend to have lower IPC on all platforms, and it would be interesting to see how Graviton2 performs on those from the instruction-supply side of things (1 core) as well as how I-side bandwidth scales horizontally with thread counts given the coherent I-Cache.
  • Andrei Frumusanu - Tuesday, March 10, 2020 - link

    Concrete suggestions in terms of workloads too look at and can be reasonably deployed are welcome- we currently don't have a well defined test suite for such things.
  • FunBunny2 - Tuesday, March 10, 2020 - link

    "Concrete suggestions in terms of workloads"

    OLTP on RDBMS?? real one, of course, not MySql. :)
  • Andrei Frumusanu - Tuesday, March 10, 2020 - link

    I mean an actual concrete example of such a structured benchmark, me going around doing random DB operations just opens up more criticism on why we didn't use test framework XYZ.
  • FunBunny2 - Tuesday, March 10, 2020 - link

    here's one: https://hammerdb.com/ don't know, perhaps likely, that you can get the source and compile for any db/OS of interest. didn't say it was simple. :)
  • Andrei Frumusanu - Wednesday, March 11, 2020 - link

    It's just I'm hearing a lot of "we want something specific" without actually specifying anything, me doing some random workload myself that isn't validated in terms of characterisation isn't in my view any better than the well understood nature of SPEC.
  • anonomouse - Wednesday, March 11, 2020 - link

    Have you looked at the benchmarks in GCP PerfKitBenchmarker (https://github.com/GoogleCloudPlatform/PerfKitBenc... It includes benchmark versions of various popular benchmarks including variants of ycsb on different databases, oltp, cloudsuite, hadoop, and a bunch of wrapper infrastructure around running the tests on cloud providers.
  • anonomouse - Wednesday, March 11, 2020 - link

    Okay so maybe the comment system doesn't have well with links:

    https://github.com/GoogleCloudPlatform/PerfKitBenc...
    http://googlecloudplatform.github.io/PerfKitBenchm...
  • yeeeeman - Tuesday, March 10, 2020 - link

    Ok, now imagine this chip with apple custom cores. Even Zen wouldn't stand a chance.
  • HStewart - Tuesday, March 10, 2020 - link

    You can't truly say that. Keep in mind both Apple and Amazon are aim at there own custom environments - things are like different in real world.

Log in

Don't have an account? Sign up now