Compiler Setup, GCC vs LLVM

For further performance testing of the systems, we fell back to SPEC2006 and SPEC2017. I wanted to make sure there would be no heated discussions when it comes to the compilation of the test suites, so I carefully investigated the available compilers, particularly regarding the choice between GCC and LLVM.

Overall, I checked three different compiler setups: a freshly compiled GCC 9.2.0 release; Arm's Allinea Studio 20 compiler package, which comes with Arm's closed-source LLVM and Flang variants as well as a pre-compiled version of GCC 9.2.0; and Marvell's branch of LLVM and Flang.

Arm had pushed us quite hard to consider GCC more closely than LLVM, admitting that they've spent more time optimising GCC upstream than they have LLVM. Given the much more prevalent use of GCC in cloud and datacentre applications, I somewhat agreed with this, as GCC is most likely what you'll see people use in such environments.

I ran some single-threaded tests across the different compiler setups. The compiler flags were straightforward: just a simple -Ofast flag, plus -march/-mcpu=cortex-a76 or =neoverse-n1 (an alias) for the Arm compiler setup.
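
To make this concrete, the invocations boiled down to something along these lines (a minimal sketch rather than the exact SPEC harness commands; the source file name is just a placeholder):

    # Upstream GCC 9.2.0 (and Arm's pre-compiled build), targeting the N1 cores
    gcc -Ofast -mcpu=neoverse-n1 -o benchmark benchmark.c

    # Arm's LLVM-based compiler from the Allinea Studio 20 package
    armclang -Ofast -mcpu=neoverse-n1 -o benchmark benchmark.c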

As always, our SPEC results aren't officially submitted results, and thus we have to label them merely as "estimates" for this article. Furthermore, SPEC2006 has been retired in favour of SPEC2017, but I still wanted to put up the figures for historical context, as well as for mobile comparisons.


[Graph: Graviton2 SPEC - Single Threaded - 2.5GHz]

The overall results favour GCC in the SPECint workloads, while LLVM seemingly does better in the FP and memory-heavy tests. Between the upstream GCC 9.2.0 and Arm's pre-compiled version there's seemingly no performance difference whatsoever, while there are some minor differences between Marvell's LLVM branch and Arm's LLVM compiler.

I ended up going forward with a clean compile of GCC 9.2.0 for both the Arm and x86 systems – meaning we're using the exact same compiler for both architectures, just with different compile targets.
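
For anybody wanting to reproduce the toolchain, a clean GCC 9.2.0 build follows the usual out-of-tree recipe, sketched below; the install prefix here is an arbitrary example rather than the path we actually used:

    tar xf gcc-9.2.0.tar.xz
    cd gcc-9.2.0
    ./contrib/download_prerequisites    # fetches matching GMP, MPFR, MPC and ISL
    mkdir ../gcc-build && cd ../gcc-build
    ../gcc-9.2.0/configure --prefix=/opt/gcc-9.2.0 \
        --enable-languages=c,c++,fortran --disable-multilib
    make -j$(nproc)
    make install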

For x86, we're again using the simple -Ofast flag for optimisations, along with the corresponding -march/-mtune targets for the EPYC and Intel platforms, meaning znver1 and skylake-avx512.
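
Spelled out as compiler invocations, the per-platform tuning amounts to the following sketch (again with a placeholder source file; note that -march already implies the matching -mtune, so the latter is listed only for clarity):

    # AMD EPYC (first-generation Zen)
    gcc -Ofast -march=znver1 -mtune=znver1 -o benchmark benchmark.c

    # Intel Xeon (Skylake-SP)
    gcc -Ofast -march=skylake-avx512 -mtune=skylake-avx512 -o benchmark benchmark.c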

Overall, it’s a bit odd to see GCC ahead in that many workloads given that LLVM is the primary compiler for billions of Arm devices in the mobile space. Arm has said that they’re trying to put more effort into LLVM, as it’s seemingly lagging behind GCC in terms of some optimisations.

Comments

  • Duncan Macdonald - Tuesday, March 10, 2020 - link

    The Apple CPU cores are larger and more power hungry when loaded hard than the CPU cores on the N1. A 64-core chip with the high-performance cores from the Apple A13 would consume far more power than the N1 and would be quite a bit larger. The Apple A13 chip (in the iPhone 11) is suited for intermittent load, not the sustained use that server-type chips such as the N1 have to deal with.
  • arashi - Wednesday, March 11, 2020 - link

    Yikesman
  • edsib1 - Tuesday, March 10, 2020 - link

    You are using an Epyc processor that is nearly 3 years old.

    Surely you should use this year's model (or a 64-core Threadripper if you don't have one)
  • vanilla_gorilla - Wednesday, March 11, 2020 - link

    You should consider reading the article, and then you would know exactly why they are using those CPUs.
  • Kamen Rider Blade - Tuesday, March 10, 2020 - link

    The benchmarks feel incomplete. Why don't you have a 64-core Zen2-based processor in there to compare?

    Even the 64-core Threadripper would be something.

    But not having AMD's latest server-grade CPU in your benchmarks really feels like you're doing a disservice to your readers, especially since we've seen your previous reviews with the Zen 2 64-core monster.
  • Rudde - Wednesday, March 11, 2020 - link

    Read the article! Rome is mentioned over five times. In short, Amazon doesn't offer Rome instances yet and Anandtech will update this article once they do.
  • Sahrin - Tuesday, March 10, 2020 - link

    I may be remembering incorrectly, but doesn't Gen 1 Epyc have the same cache tweaks as Zen+ (ie, Epyc 7001 series is based on Zen+, not Zen)?
  • Rudde - Wednesday, March 11, 2020 - link

    They have the same optimisations as the first-gen Zen APUs, i.e. Ryzen mobile 2xxx. Zen+ is a further developed architecture, albeit without further cache tweaks.
    The cache tweaks in question were meant to be included in the original Zen, but didn't make it in time. As such, one could argue that first-gen Ryzen desktop is not full Zen (1), but a preview.
  • Sahrin - Tuesday, March 10, 2020 - link

    The fact that Amazon refused to grant access to Rome-based instances tells you everything you need to know. Graviton competes with Zen and Xeon, but is absolutely smoked by Zen 2 in both absolute terms and perf/watt.

    It's a shame to see Amazon hide behind marketing bullshit to make its products seem relevant.
  • rahvin - Thursday, March 12, 2020 - link

    Don't be silly. Amazon buys processors in the thousands. There is no way AMD could have supplied enough Rome CPUs to Amazon to load up an instance at each of their locations in the time Rome has been on sale.

    It typically takes about 6 months before Amazon gets instances online, because AMD/Intel aren't going to give Amazon the entire production run for the first 3 months. They've got about 20 data centers, and you'd probably need several hundred per data center to bring an instance up.

    Consider the cost and scale of building that out before you criticize them for not having the latest and greatest released a month ago. Rome hasn't been available to actually purchase for very long, the cloud providers get special models, and AMD still needs to supply everyone else as well.
