SPEC - Single-Threaded Performance

Single-thread performance usually isn't the most important metric for scale-out server workloads, but there are use-cases, such as EDA tools, that are largely bound by it.

Power envelopes usually don't matter much here; the deciding performance factors are simply the boost clocks of the CPUs, the IPC improvements, and the memory latency of the cores.

The one hiccup for the Xeon 8380 this generation is that, although there are plenty of IPC gains over previous microarchitectures, the new SKU only boosts up to 3.4GHz, whereas the 8280 was able to boost up to 4.0GHz, a 15% clock deficit.
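The 15% figure follows directly from the quoted boost clocks; a quick sanity check (the variable names here are just for illustration):

```python
# Boost clocks as quoted in the text (GHz)
xeon_8280_boost = 4.0
xeon_8380_boost = 3.4

# Relative clock deficit of the 8380 versus the 8280
deficit = (xeon_8280_boost - xeon_8380_boost) / xeon_8280_boost
print(f"{deficit:.1%}")  # 15.0%
```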

SPECint2017 Rate-1 Estimated Scores

Even with the clock frequency disadvantage, thanks to the IPC gains, the much improved memory bandwidth, and the much larger L3 cache, the new Ice Lake part manages to beat the Cascade Lake part most of the time, falling behind only in a couple of compute-bound core workloads.

SPECfp2017 Rate-1 Estimated Scores

The floating-point figures are more favourable to the ICX architecture due to the stronger memory performance.

SPEC2017 Rate-1 Estimated Total

Overall, the new Xeon 8380 at least manages to post slight single-threaded performance increases this generation, with larger gains in memory-bound workloads. The 8380 is essentially on par with AMD’s 7763, and loses out to the higher frequency optimised parts.

Intel has a few SKUs which offer slightly higher ST boost clocks of up to 3.7GHz, 300MHz / 8.8% higher than the 8380; however, that part has only 8 cores and just 18MB of cache. Other SKUs offer 3.5-3.6GHz boosts, but again with less cache. So while the ST figures here could improve a bit on those parts, the gains are unlikely to be significant.
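The 8.8% uplift quoted for the higher-clocked SKUs is again just the ratio of the boost clocks; a brief check using the figures from the text:

```python
# Boost clocks as quoted in the text (GHz)
xeon_8380_boost = 3.4
top_st_sku_boost = 3.7  # highest single-thread boost among the Ice Lake SKUs mentioned

# Absolute and relative uplift over the 8380
uplift = (top_st_sku_boost - xeon_8380_boost) / xeon_8380_boost
print(f"+{(top_st_sku_boost - xeon_8380_boost) * 1000:.0f}MHz, {uplift:.1%}")  # +300MHz, 8.8%
```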

Comments (169)

  • mode_13h - Wednesday, April 7, 2021 - link

    Intel, AMD, and ARM all contribute loads of patches to both GCC and LLVM. There's no way either of these compilers can be seen as "underdeveloped".

    And Intel is usually doing compiler work a couple YEARS ahead of each CPU & GPU generation. If anyone is behind, it's AMD.
  • Oxford Guy - Wednesday, April 7, 2021 - link

It's not cheating if the CPU can do that work at that speed.

    It's only cheating if you don't make it clear to readers what kind of benchmark it is (hand-tuned assembly).
  • mode_13h - Thursday, April 8, 2021 - link

    Benchmarks, in articles like this, should strive to be *relevant*. And for that, they ought to focus on representing the performance of the CPUs as the bulk of readers are likely to experience it.

    So, even if using some vendor-supplied compiler with trick settings might not fit your definition of "cheating", that doesn't mean it's a service to the readers. Maybe save that sort of thing for articles that specifically focus on some aspect of the CPU, rather than the *main* review.
  • Oxford Guy - Sunday, April 11, 2021 - link

    There is nothing more relevant than being able to see all facets of a part's performance. This makes it possible to discern its actual performance capability.

    Some think all a CPU comparison needs are gaming benchmarks. There is more to look at than subsets of commercial software. Synthetic benchmarks also are valid data points.
  • mode_13h - Monday, April 12, 2021 - link

    It's kind of like whether an automobile reviewer tests a car with racing tyres and 100-octane fuel. That would show you its maximum capabilities, but it's not how most people are going to experience it. While a racing enthusiast might be interested in knowing this, it's not a good proxy for the experience most people are likely to have with it.

    All I'm proposing is to prioritize accordingly. Yes, we want to know how many lateral g's it can pull on a skid pad, once you remove the limiting factor of the all-season tyres, but that's secondary.
  • Wilco1 - Thursday, April 8, 2021 - link

It's still cheating if you compare highly tuned benchmark scores with untuned scores. If you use it to trick users into believing CPU A is faster than CPU B even though CPU A is really slower, you are basically doing deceptive marketing. Mentioning it in the small print (which nobody reads) does not make it any less cheating.
  • Oxford Guy - Sunday, April 11, 2021 - link

It's cheating to use software that's very unoptimized to claim that that's as much performance as the CPU has.

    For example... let's say we'll just skip all software that has AVX-512 support — on the basis that it's just not worth testing because so many CPUs don't support it.
  • Wilco1 - Sunday, April 11, 2021 - link

Running software that isn't fully optimized is what we do all the time, so that's exactly what we should be benchmarking. The -Ofast option used here is actually more aggressive than typical, since most code is built with -O2. Some browsers even use -Os/-Oz for much of their code!

    AVX-512 and software optimized for AVX-512 is quite rare today, and the results are pretty awful on the latest cores: https://www.phoronix.com/scan.php?page=article&...

    Btw Andrei ran ICC vs GCC: https://twitter.com/andreif7/status/13808945639975...

    ICC is 5% slower than GCC on SPECINT. So there we go.
  • mode_13h - Monday, April 12, 2021 - link

    Not to disagree with you, but always take Phoronix' benchmarks with a grain of salt.

    First, he tested one 14 nm CPU model that only has one AVX-512 unit per core. Ice Lake has 2, and therefore might've shown more benefit.

    Second, PTS is enormous (more than 1 month typical runtime) and I haven't seen Michael being very transparent about his criteria for selecting which benchmarks to feature in his articles. He can easily bias perception through picking benchmarks that respond well or poorly to the feature or product in question.

    There are also some questions raised about his methodology, such as whether he effectively controlled for AVX-512 usage in some packages that contain hand-written asm. However, by looking at the power utilization graphs, I doubt that's an issue in this case. But, if he excluded such packages for that very reason, then it could unintentionally bias the results.
  • Wilco1 - Monday, April 12, 2021 - link

    Completely agree that Phoronix benchmarks are dubious - it's not only the selection but also the lack of analysis of odd results and the incorrect way he does cross-ISA comparisons. It's far better to show a few standard benchmarks with well-known characteristics than a random sample of unknown microbenchmarks.

    Ignoring all that, there are sometimes useful results in all the noise. The power results show that for the selected benchmarks there is really use of AVX-512. Whether this is typical across a wider range of code is indeed the question...
