Compiling LLVM, NAMD Performance

As we’re trying to rebuild our server test suite piece by piece – and there’s still a lot of work ahead before we have a good, representative “real world” set of workloads – one of the most requested additions amongst readers was a more realistic compilation benchmark. The Chrome and LLVM codebases were the most requested targets; I landed on LLVM as it’s fairly straightforward to set up.

# Fetch LLVM and check out the 11.x release branch
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout release/11.x
mkdir ./build
cd ..
# Copy the tree onto a 10GB ramdisk to take storage I/O out of the equation
mkdir llvm-project-tmpfs
sudo mount -t tmpfs -o size=10G,mode=1777 tmpfs ./llvm-project-tmpfs
cp -r llvm-project/* llvm-project-tmpfs
cd ./llvm-project-tmpfs/build
# Configure once, then time only the actual build phase
cmake -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb;compiler-rt;lld" \
  -DCMAKE_BUILD_TYPE=Release ../llvm
time cmake --build .

We’re using the LLVM 11.0.0 release as the build target version, and we’re compiling Clang, libc++, libc++abi, LLDB, Compiler-RT and LLD using GCC 10.2 (self-compiled). To avoid any concerns about I/O, we’re building on a ramdisk. We’re measuring only the actual build time and excluding the configuration phase, as in the real world that step doesn’t happen repeatedly.
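For reference, since we’re using a self-compiled GCC 10.2 rather than the distribution toolchain, the compiler can be pinned explicitly at configure time so CMake doesn’t fall back to the system default. A minimal sketch, assuming the toolchain was installed under /opt/gcc-10.2 (the install prefix and the explicit -j value are illustrative assumptions, not part of our actual setup):

# Point CMake at the self-built GCC 10.2; /opt/gcc-10.2 is an assumed prefix
cmake -G Ninja \
  -DCMAKE_C_COMPILER=/opt/gcc-10.2/bin/gcc \
  -DCMAKE_CXX_COMPILER=/opt/gcc-10.2/bin/g++ \
  -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb;compiler-rt;lld" \
  -DCMAKE_BUILD_TYPE=Release ../llvm
# Ninja parallelises across all hardware threads by default; an explicit
# job count can be passed through if a fixed thread count is desired
time cmake --build . -- -j64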

[Chart: LLVM Suite Compile Time]

Starting off with the Xeon 8380, we’re looking at large generational improvements for the new Ice Lake SP chip. A 33-35% improvement in compile time, depending on whether we’re looking at 2S or 1S figures, is enough to reposition Intel’s flagship CPU in the rankings by a notable amount, finally no longer lagging as drastically behind some of the competition.

It’s still not sufficient to catch up with AMD and Ampere, however, whose parts remain 25% and 15% ahead of the Xeon 8380 respectively.

The Xeon 6330 falls in line with where we benchmarked it in previous tests, just slightly edging out the Xeon 8280 (the equivalent of the 6258R), meaning we’re seeing minor ISO-core, ISO-power generational improvements (again, I have to mention that the 6330 is half the price of a 6258R).

[Chart: NAMD (Git-2020-12-09) - Apolipoprotein A1]

NAMD is a problem-child benchmark due to its recent addition of an AVX-512 code path contributed by Intel engineers – which in itself isn’t an issue in my view. The problem is that this is a new algorithm with no relation to the normal code path, which remains less hand-optimised for AVX2, and, further raising eyebrows, it is solely compatible with Intel’s ICC and no other compiler. That’s one step too far in terms of its questionable status as a benchmark: are we benchmarking it as a general HPC-representative workload, or are we benchmarking it solely for the sake of NAMD, and only NAMD, performance?

We understand Intel is putting a lot of focus on these kinds of workloads that are hyper-optimised to run extremely well on Intel-only hardware, and it’s a valid optimisation path for many use-cases. I’m just questioning how representative it is of the wider market and workloads.
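As a quick sanity check of which vector code path a given binary actually contains, one can disassemble it and count AVX-512 versus AVX2 register references. A rough sketch, assuming a binary named namd2 in the working directory (the binary name here is an illustrative assumption):

# Count zmm (AVX-512) and ymm (AVX/AVX2) register references in the disassembly;
# an AVX2-only build should show essentially no zmm hits, while the ICC
# AVX-512 path should show plenty. "namd2" is an assumed binary name.
objdump -d ./namd2 | grep -c '%zmm'
objdump -d ./namd2 | grep -c '%ymm'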

In any case, the GCC binaries of the test on the ApoA1 protein show a significant performance uplift for the Xeon 8380, with a +35.6% gain. Even on this apples-to-apples code path, it still falls quite a bit behind the competition, which scales performance much higher thanks to higher core counts.

Comments

  • fallaha56 - Tuesday, April 6, 2021 - link

    sure

    but when your 64-core part virtually beats Intel's dual socket 32-core part on performance alone?

    add the energy savings and suddenly it's a 300-400% perf lead
  • Jorgp2 - Tuesday, April 6, 2021 - link

    The fuck?

    You do realize that they put more than CPUs onto servers right?
  • Andrei Frumusanu - Tuesday, April 6, 2021 - link

    We are testing non-production server configurations, all with varying hardware, PSUs, and other setup differences. Socket comparisons remain relatively static between systems.
  • edzieba - Tuesday, April 6, 2021 - link

    Would be interesting to pit the 4309 (or 5315) against the Rocket Lake octacores. Yes, it's a very different platform aimed at a different market, but it would be interesting to see what a hypothetical '10nm Sunny Cove consumer desktop' could have resembled compared to what Rocket Lake's Sunny Cove delivered on 14nm.
  • Jorgp2 - Tuesday, April 6, 2021 - link

    You could also compare it to the 10900x, which is an existing AVX-512 CPU with large L2 caches.
  • Holliday75 - Tuesday, April 6, 2021 - link

    For typical consumer workloads the RL will be better. For typical server workloads the IL will be better. That is the gist of what would be said.
  • ricebunny - Tuesday, April 6, 2021 - link

    These tests are not entirely representative of real world use cases. For open source software, the icc compiler should always be the first choice for Intel chips. The fact that Intel provides such a compiler for free and AMD doesn’t is a perk that you get with owning Intel. It would be foolish not to take advantage of it.
  • Andrei Frumusanu - Tuesday, April 6, 2021 - link

    AMD provides AOCC and there's nothing stopping you from running ICC on AMD either. The relative positioning in that scenario doesn't change, and GCC is the industry standard in that regard in the real world.
  • ricebunny - Tuesday, April 6, 2021 - link

    Thanks for your reply. I was speaking from my experience in HPC: I’ve never compiled code that I intended to run on Intel architectures with anything but icc, except when the environment did not provide me such liberty, which was rare.

    If I were to run the benchmarks, I would build them with the most optimal settings for each architecture using their respective optimizing compilers. I would also make sure that I am using optimized libraries, e.g. Intel MKL and not Open BLAS for Intel architecture, etc.
  • Wilco1 - Tuesday, April 6, 2021 - link

    And I could optimize benchmarks using hand crafted optimal inner loops in assembler. It's possible to double the SPEC score that way. By using such optimized code on a slow CPU, it can *appear* to beat a much faster CPU. And what does that prove exactly? How good one is at cheating?

    If we want to compare different CPUs then the only fair option is to use identical compilers and options like AnandTech does.
