Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small

Name: Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small
Item: Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small
Author: Andrei Frumusanu

by Andrei Frumusanu on April 6, 2021 11:00 AM EST

169 Comments | Add A Comment

169 Comments

SPEC - Per-Core Performance under Load

A metric that is actually more interesting than isolated single-thread performance, is actually per-thread performance in a fully loaded system. This actually is a measurement and benchmark figure that would greatly interest enterprises and customers which are running software or workloads that are possibly licensed on a per-core basis, or simply workloads that require a certain level of per-thread service level agreement in terms of performance.

This has been a strong-point of Intel SKUs for some time now, even when the chips wouldn’t be competitive in terms of total throughput. With the new Ice Lake SPs SKUs now more notably increasing total throughput, it’ll be interesting to see the per-thread breakdown and resulting performance:

SPEC2017 Rate-N Estimated Per-Thread Performance (1S)

Because the total throughput generational performance increase is larger than the core count increase of the parts, this means that per-thread and per-core performance is higher with this generation. The Xeon 8380 is posting +16.3% and +10.4% per-thread performance versus the Xeon 8280 when only using one thread per core.

Interestingly, these figures are less at +8.2 and +7.4% when using both SMT threads per core. Intel has explained such an increase through the better usage of shared microarchitectural structure usage in the new Sunny Cove cores, essentially diminishing the SMT yield by improving 1/T per core performance.

Generally, Intel is extremely competitive in this benchmark metric, and while AMD easily beats them with the frequency-optimised parts, it’s an advantage that should help Intel in the SLA-centric workloads.

SPEC - Single-Threaded Performance SPECjbb MultiJVM - Java Performance

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

169 Comments

View All Comments

mode_13h - Monday, April 12, 2021 - link
With regard specifically to testing AVX-512, perhaps the best method is to include results both with and without it. This serves the dual-role of informing customers of the likely performance for software compiled with more typical options, as well as showing how much further performance is to be gained by using an AVX-512 optimized build.
KurtL - Wednesday, April 7, 2021 - link
GCC the industry standard in real world? Maybe in that part of the world where you live, but not everywhere. It is only true in a part of the world. HPC centres have relied on icc for ages for much of the performance-critical code, though GCC is slowly catching up, at least for C and C++ but not at all for Fortran, an important language in HPC (I just read it made it back in the top-20 of most used languages after falling back to position 34 a year or so ago). In embedded systems and the non-x86-world in general, LLVM derived compilers have long been the norm. Commercial compiler vendors and CPU manufacturers are all moving to LLVM-based compilers or have been there for years already.
Wilco1 - Wednesday, April 7, 2021 - link
Yes GCC is the industry standard for Linux. That's a simple fact, not something you can dispute.

In HPC people are willing to use various compilers to get best performance, so it's certainly not purely ICC. And HPC isn't exclusively Intel or x86 based either. LLVM is increasing in popularity in the wider industry but it still needs to catch up to GCC in performance.
mode_13h - Wednesday, April 7, 2021 - link
GCC is the only supported compiler for building the Linux kernel, although Google is working hard to make it build with LLVM. They seem to believe it's better for security.

From the benchmarks that Phoronix routinely publishes, each has its strengths and weaknesses. I think neither is a clear winner.
Wilco1 - Thursday, April 8, 2021 - link
Plus almost all distros use GCC - there is only one I know that uses LLVM. LLVM is slowly gaining popularity though.

They are fairly close for general code, however recent GCC versions significantly improved vectorization, and that helps SPEC.
Wilco1 - Tuesday, April 6, 2021 - link
ICC and AMD's AOCC are SPEC trick compilers. Neither is used much in the real world since for real code they are typically slower than GCC or LLVM.

Btw are you equally happy if I propose to use a compiler which replaces critical inner loops of the benchmarks with hand-optimized assembler code? It would be foolish not to take advantage of the extra performance you get only on those benchmarks...
ricebunny - Tuesday, April 6, 2021 - link
They are not SPEC tricks. You can use these compilers for any compliant C++ code that you have. In the last 10 years, the only time I didn’t use icc with Intel chips was on systems where I had no control over the sw ecosystem.
Wilco1 - Tuesday, April 6, 2021 - link
They only exist because of SPEC. The latest ICC is now based on LLVM since it was falling further behind on typical code.
ricebunny - Tuesday, April 6, 2021 - link
From my experience icc consistently produced better vectorized code.

Anandtech again didn’t publicize the compiler flags they used to build the benchmark code. By default, gcc will not generate avx512 optimized code.
Wilco1 - Tuesday, April 6, 2021 - link
Maybe compared to old GCC/LLVM versions, but things have changed. There is now little difference between ICC and GCC when running SPEC in terms of vectorized performance. Note the amount of code that can benefit from AVX-512 is absolutely tiny, and the speedups in the real world are smaller than expected (see eg. SIMDJson results with hand-optimized AVX-512).

And please read the article - the setup is clearly explained in every review: "We compile the binaries with GCC 10.2 on their respective platforms, with simple -Ofast optimisation flags and relevant architecture and machine tuning flags (-march/-mtune=Neoverse-n1 ; -march/-mtune=skylake-avx512 ; -march/-mtune=znver2 (for Zen3 as well due to GCC 10.2 not having znver3). "

Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small

SPEC - Per-Core Performance under Load

Post Your Comment

169 Comments

View All Comments

mode_13h - Monday, April 12, 2021 - link

KurtL - Wednesday, April 7, 2021 - link

Wilco1 - Wednesday, April 7, 2021 - link

mode_13h - Wednesday, April 7, 2021 - link

Wilco1 - Thursday, April 8, 2021 - link

Wilco1 - Tuesday, April 6, 2021 - link

ricebunny - Tuesday, April 6, 2021 - link

Wilco1 - Tuesday, April 6, 2021 - link

ricebunny - Tuesday, April 6, 2021 - link

Wilco1 - Tuesday, April 6, 2021 - link

Log in

Don't have an account? Sign up now