SPEC - Multi-Threaded Performance

Picking up from the power efficiency discussion, let’s dive directly into the multi-threaded SPEC results. As usual, because these are not officially submitted scores to SPEC, we’re labelling the results as “estimates” as per the SPEC rules and license.

We compile the binaries with GCC 10.2 on their respective platforms, using the simple -Ofast optimisation flag plus the relevant architecture and machine tuning flags: -march/-mtune=neoverse-n1, -march/-mtune=skylake-avx512, and -march/-mtune=znver2 (the latter also used for Zen 3, as GCC 10.2 does not have a znver3 target).

The new Ice Lake SP parts are using the -march/-mtune=icelake-server target. It’s to be noted that I briefly tested the system with the Skylake binaries as well, and the differences were small, within the margin of error.
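
To make the flag sets concrete, here is a minimal sketch of the build invocations as a small Python driver. This is not the actual SPEC harness configuration; the platform labels, source file and output names are placeholders, while the flag values mirror the targets described above.

    import subprocess

    # Per-platform tuning flags as described in the text; the dictionary keys are just labels.
    TARGETS = {
        "altra":        ["-march=neoverse-n1", "-mtune=neoverse-n1"],
        "cascade-lake": ["-march=skylake-avx512", "-mtune=skylake-avx512"],
        "ice-lake":     ["-march=icelake-server", "-mtune=icelake-server"],
        "epyc":         ["-march=znver2", "-mtune=znver2"],  # also used for Zen 3, as GCC 10.2 has no znver3
    }

    def build(platform, sources, output):
        """Compile the given sources with -Ofast plus the platform's tuning flags."""
        cmd = ["gcc", "-Ofast"] + TARGETS[platform] + ["-o", output] + list(sources)
        subprocess.run(cmd, check=True)

    # Example usage (hypothetical file names):
    # build("ice-lake", ["benchmark.c"], "benchmark")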

I’m limiting the detailed comparison data to the flagship SKUs in order to showcase the peak performance of each platform. For that reason it’s not so much an architectural comparison as it is a comparison of the top SKUs.

SPECint2017 Rate-N Estimated Scores (1 Socket)

To no large surprise, the Xeon 8380 is posting very impressive performance gains compared to the Xeon 8280, with large increases across the board for all workloads. The geo-mean increase is +54%, ranging from a low of +40% to a high of +71%.
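
As a reminder of how that aggregate figure relates to the per-workload numbers, the suite score is a geometric mean rather than an arithmetic average, so the +54% is the geometric mean of the individual workload gains. A minimal sketch of the calculation, using hypothetical ratios rather than the measured data:

    from math import prod

    def geomean(ratios):
        """Geometric mean of per-workload improvement ratios."""
        return prod(ratios) ** (1.0 / len(ratios))

    # Hypothetical per-workload gains for illustration (1.40 = +40%, 1.71 = +71%), not the measured results.
    gains = [1.40, 1.45, 1.50, 1.55, 1.58, 1.63, 1.71]
    print("geo-mean gain: +{:.0f}%".format((geomean(gains) - 1) * 100))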

It’s to be noted that while the new Ice Lake system is a major generational boost, it’s nowhere near enough to catch up with the performance of the AMD Milan or Rome, or Ampere’s Altra when it comes to total throughput.

SPECfp2017 Rate-N Estimated Scores (1 Socket)

Looking at the FP suite, we have more workloads that are purely memory performance bound, and the Ice Lake Xeon 8380 again posts significant performance increases compared to its predecessor, with a geo-mean improvement of +53% and a range of +41% to +64%.

In some of the workloads, the new Xeon now catches up and is on par with AMD’s EPYC 7763, as both systems share the same 8-channel DDR4-3200 memory configuration.

In the other workloads that require more CPU compute power, the Xeon doesn’t hold up nearly as well, falling behind the competition by significant margins.

SPECint2017 Base Rate-N Estimated Performance

In the aggregate geomean scores, we again see that the new Xeon 8380 allows Intel to significantly reposition itself in the performance charts. Unfortunately, this is only enough to match the lower core count SKUs from the competition, as AMD and Ampere still hold massive leads – although admittedly the gap isn’t as embarrassing as it was before.

SPECfp2017 Base Rate-N Estimated Performance

In the floating-point suite, the results are a bit more in favour of the Xeon 8380 than in the integer suite, as memory performance contributes a larger share of the total score. It’s still not enough to beat the AMD and Ampere parts, but it’s much more competitive than it was before.

The Xeon 6330 showcases minor performance improvements over the 8280 and its cheaper equivalent, the 6258R, but at least comes in at half the cost – so while the performance isn’t very impressive, the performance per dollar might be more competitive.
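
To frame that trade-off, performance per dollar is simply the estimated suite score divided by the list price. A minimal sketch with placeholder scores and prices, not the measured results or actual list prices:

    def perf_per_dollar(score, price_usd):
        """Estimated suite score per 1000 USD of list price."""
        return score / (price_usd / 1000.0)

    # Placeholder values for illustration only; substitute the real scores and list prices.
    parts = {
        "Xeon 6258R": (200.0, 4000.0),
        "Xeon 6330":  (195.0, 2000.0),
    }
    for name, (score, price) in parts.items():
        print("{}: {:.1f} points per $1000".format(name, perf_per_dollar(score, price)))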

Our performance results match Intel’s own marketing materials when it comes to the generational gains, actually even surpassing Intel’s figures by a few percent.

Looking at Intel’s slide above, you could be extremely enthusiastic about Intel’s new generation, as indeed the performance improvements are very large compared to a Cascade Lake system.

As impressive as those generational numbers are, they really only help to somewhat narrow the Grand Canyon sized competitive performance gap we’ve had to date. The 40-core Xeon 8380 still loses out to a 32-core Milan, and in a performance / price comparison, even a premium 75F3 costs 40% less than the Xeon 8380. Lower SKUs in the Ice Lake line-up would probably fare better in perf/$, however they would also just lower the performance to an even worse competitive positioning.

Comments

  • Oxford Guy - Wednesday, April 7, 2021

    You're arguing apples (latency) and oranges (capability).

    An Apple II has better latency than an Apple Lisa, even though the latter is vastly more powerful in most respects. The sluggishness of the UI was one of the big problems with that system from a consumer point of view. Many self-described power users equated a snappy interface with capability, so they believed their CLI machines (like the IBM PC) were a lot better.
  • GeoffreyA - Wednesday, April 7, 2021

    "today's software and OSes are absurdly slow, and in many cases desktop applications are slower in user-time than their late 1980s counterparts"

    Oh yes. One builds a computer nowadays and it's fast for a year. But then applications, being updated, grow sluggish over time. And it starts to feel like one's old computer again. So what exactly did we gain, I sometimes wonder. Take a simple suite like LibreOffice, which was never fast to begin with. I feel version 7 opens even slower than 6. Firefox was quite all right, but as of 85 or 86, when they introduced some new security feature, it seems to open a lot slower, at least on my computer. At any rate, I do appreciate all the free software.
  • ricebunny - Wednesday, April 7, 2021

    Well said.
  • Frank_M - Thursday, April 8, 2021

    Intel Fortran is vastly faster than GCC.

    How did ricebunny get a free compiler?
  • mode_13h - Thursday, April 8, 2021

    > It's strange to tell people who use the Intel compiler that it's not used much in the real world, as though that carries some substantive point.

    To use the automotive analogy, it's as if a car is being reviewed using 100-octane fuel, even though most people can only get 93 or 91 octane (and many will just use the cheap 87 octane, anyhow).

    The point of these reviews isn't to milk the most performance from the product that's theoretically possible, but rather to inform readers about how they're likely to experience it. THAT is why it's relevant that almost nobody uses ICC in practice.

    And, in fact, BECAUSE so few people are using ICC, Intel puts a lot of work into GCC and LLVM.
  • GeoffreyA - Thursday, April 8, 2021

    I think that a common compiler like GCC should be used (like Andrei is doing), along with a generic x86-64 -march (in the case of Intel/AMD) and generic -mtune. The idea would be to get the CPUs on as equal a footing as possible, even with code that might not be optimal, and reveal relative rather than absolute performance.
  • Wilco1 - Thursday, April 8, 2021

    Using generic (-march=x86-64) means you are building for ancient SSE2... If you want a common baseline then use something like -march=x86-64-v3. You'll then get people claiming that excluding AVX-512 is unfair even though there is little difference on most benchmarks except for higher power consumption ( https://www.phoronix.com/scan.php?page=article&... ).
  • GeoffreyA - Saturday, April 10, 2021

    I think leaving AVX512 out is a good policy.
  • GeoffreyA - Thursday, April 8, 2021

    If I may offer an analogy, I would say: the benchmark is like an exam in school but here we test time to finish the paper (and with the constraint of complete accuracy). Each pupil should be given the identical paper, and that's it.

    Using optimised binaries for different CPUs is a bit like knowing each child's brain beforehand (one has thicker circuitry in Bodman region 10, etc.) and giving each a paper with peculiar layout and formatting but same questions (in essence). Which system is better, who can say, but I'd go with the first.
  • Oxford Guy - Wednesday, April 7, 2021

    Well, whatever tricks were used made Blender faster with the ICC builds I tested — both on AMD's Piledriver and on several Intel releases (Lynnfield and Haswell).
