Compiling LLVM, NAMD Performance

As we’re trying to rebuild our server test suite piece by piece – and there’s still a lot of work ahead to assemble a good representative set of “real world” workloads – one of the more highly requested benchmarks amongst readers has been a more realistic compilation suite. The Chrome and LLVM codebases were the most requested; I landed on LLVM as it’s fairly easy and straightforward to set up.

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout release/11.x
mkdir ./build
cd ..
# Build out of a ramdisk to take storage I/O out of the equation
mkdir llvm-project-tmpfs
sudo mount -t tmpfs -o size=10G,mode=1777 tmpfs ./llvm-project-tmpfs
cp -r llvm-project/* llvm-project-tmpfs
cd ./llvm-project-tmpfs/build
cmake -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb;compiler-rt;lld" \
  -DCMAKE_BUILD_TYPE=Release ../llvm
# Time only the build itself, not the configuration step
time cmake --build .

We’re using the LLVM 11.0.0 release as the build target version, and we’re compiling Clang, libc++, libc++abi, LLDB, Compiler-RT and LLD using GCC 10.2 (self-compiled). To avoid any concerns about I/O, we’re building on a ramdisk. We measure only the actual build time and don’t include the configuration phase, as in the real world that usually doesn’t happen repeatedly.
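The timing methodology above can be sketched as follows. This is a minimal illustrative harness, not the actual test script: the directory name and the sleep stand-in for the real build command are assumptions.

```shell
#!/bin/sh
# Sketch: configure once (untimed), then measure only the repeated
# build step in wall-clock seconds.
build_dir=./build-demo
mkdir -p "$build_dir"
# (the untimed configure step, e.g. cmake -G Ninja ..., would run here)
start=$(date +%s)
sleep 1   # stand-in for: cmake --build "$build_dir"
end=$(date +%s)
elapsed=$((end - start))
echo "build wall time: ${elapsed} s"
rmdir "$build_dir"
```

In practice only the `cmake --build` invocation is wrapped in the timer, so reconfiguration cost never enters the reported figure.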

LLVM Suite Compile Time

Starting off with the Xeon 8380, we’re looking at large generational improvements for the new Ice Lake SP chip. A 33-35% improvement in compile time, depending on whether we’re looking at 2S or 1S figures, is enough to reposition Intel’s flagship CPU in the rankings by a notable amount, finally no longer lagging as drastically behind the competition.

It’s definitely not sufficient to compete with AMD and Ampere, however, both of which showcase figures that are still 25% and 15% ahead of the Xeon 8380, respectively.

The Xeon 6330 is falling in line with where we benchmarked it in previous tests, just slightly edging out the Xeon 8280 (6258R equivalent), meaning we’re seeing minor ISO-core ISO-power generational improvements (again I have to mention that the 6330 is half the price of a 6258R).

NAMD (Git-2020-12-09) - Apolipoprotein A1

NAMD is a problem-child benchmark due to its recent addition of AVX-512 code, contributed by Intel engineers – which isn’t in itself an issue in my view. The problem is that this is a new algorithm with no relation to the normal code path, which remains less hand-optimised for AVX2; further raising eyebrows, the new path compiles solely with Intel’s ICC and no other compiler. That’s one level too much questionable status as a benchmark: are we benchmarking it as a general HPC-representative workload, or solely for the sake of NAMD and only NAMD performance?

We understand Intel is putting a lot of focus on these kinds of workloads that are hyper-optimised to run extremely well on Intel hardware only, and it’s a valid optimisation path for many use-cases. I’m just questioning how representative it is of the wider market and workloads.

In any case, the GCC binaries of the test on the ApoA1 protein show a significant performance uplift for the Xeon 8380, with a +35.6% gain. Even on this apples-to-apples code path, it still sits well behind the competition, which scales performance much higher thanks to more cores.

Comments

  • mode_13h - Thursday, April 8, 2021 - link

    Please tell me you did this test with an ICC released only a couple years ago, or else I feel embarrassed for you polluting this discussion with such irrelevant facts.
  • Oxford Guy - Sunday, April 11, 2021 - link

    It wasn't that long ago.

    If you want to increase the signal to noise ratio you should post something substantive.

    For instance, if you think ICC no longer produces faster Blender builds, why not post some evidence to that effect?
  • eastcoast_pete - Tuesday, April 6, 2021 - link

    This Xeon generation exists primarily because Intel had to come through and deliver something in 10 nm, after announcing the heck out of it for years. As actual processors, they are not bad as far as Xeons are concerned, but clearly inferior to AMD's current EPYC line, especially on price/performance. Plus, we and the world know that the real update is around the corner within a year: Sapphire Rapids. That one promises a lot of performance uplift, not least by having PCIe 5 and at least the option of directly attached HBM for RAM. Lastly, if Intel had managed to make this line compatible with the older socket (it's not), one could at least have used these Ice Lake Xeons to update Cooper Lake systems via a CPU swap. As it stands, I don't quite see the value proposition, unless you're in an Intel shop and need capacity very badly right now.
  • Limadanilo2022 - Tuesday, April 6, 2021 - link

    Agreed. Both Ice Lake and Rocket Lake are just placeholders to get something out before the real improvements come with Sapphire Rapids and Alder Lake respectively... I'm one of those who say that AMD really needs the competition right now so it doesn't get sloppy and become "2017-2020 Intel". I want to see both competing hard in the years ahead.
  • drothgery - Wednesday, April 7, 2021 - link

    Rocket Lake is a stopgap. Ice Lake (and Ice Lake SP) were just late; they would have been unquestioned market leaders if launched on time and even now mostly just run into problems when the competition is throwing way more cores at the problem.
  • AdrianBc - Wednesday, April 7, 2021 - link

    No, Ice Lake Server cores have a much lower clock frequency and a much smaller L3 cache than Epyc 7xx3, so they are much slower core per core than AMD Milan for any general purpose application, e.g. software compilation.

    The Ice Lake Server cores have twice the number of floating-point multipliers that can be used by AVX-512 programs, so they are faster (despite their clock frequency deficit) for applications that are limited by FP multiplication throughput or that can use other special AVX-512 features, e.g. the instructions useful for machine learning.
  • Oxford Guy - Wednesday, April 7, 2021 - link

    'limited by FP multiplication throughput or that can use other special AVX-512 features, e.g. the instructions useful for machine learning.'

    How do they compare with Power?

    How do they compare with GPUs? (I realize that a GPU is very good at a much more limited palette of work types versus a general-purpose CPU. However... how much overlap there is between a GPU and AVX-512 is something at least non-experts will wonder about.)
  • AdrianBc - Thursday, April 8, 2021 - link

    The best GPUs from NVIDIA and AMD can provide between 3 and 4 times more performance per watt than the best Intel Xeons with AVX-512.

    However most GPUs are usable only in applications where low precision is appropriate, i.e. graphics and machine learning.

    The few GPUs that can be used for applications that need higher precision (e.g. NVIDIA A100 or Radeon Instinct) are extremely expensive, much more so than Xeons or Epycs, and individuals or small businesses have very little chance of being able to buy them.
  • mode_13h - Friday, April 9, 2021 - link

    Please re-check the price list. The top-end A100 does sell for a bit more than the $8K list price of the top Xeon and EPYC, while the MI100 seems to be pretty close. Perf/$ is still wildly in favor of GPUs.

    Unfortunately, if you're only looking at the GPUs' ordinary compute specs, you're missing their real point of differentiation, which is their low-precision tensor performance. That's far beyond what the CPUs can dream of!

    Trust there are good reasons why Intel scrapped Xeon Phi, after flogging it for 2 generations (plus a few prior unreleased iterations), and adopted a pure GPU approach to compute!
  • mode_13h - Thursday, April 8, 2021 - link

    "woulda, coulda, shoulda"

    Ice Lake SP is not even competitive with Rome. So, they missed their market window by quite a lot!
