CPU Tests: SPEC MT Performance - DDR5 Advantage

Multi-threaded performance is where things get very interesting for Alder Lake, as the chip can now combine its 8 P-cores with its 8 E-cores. As we saw, the 8 E-cores are nothing to sneeze at, but another major consideration for MT performance is DDR5. While the ST results didn’t show much change in per-core performance, in MT scenarios where all cores are hammering the memory, having double the memory channels as well as +50% more bandwidth is extremely beneficial for Alder Lake.
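
As a rough sanity check on that +50% figure (assuming DDR4-3200 in dual-channel against the DDR5-4800 used here, both on a 128-bit total bus), the theoretical peak bandwidth works out as:

```python
# Theoretical peak bandwidth: transfer rate (MT/s) x bus width in bytes.
# Both platforms present a 128-bit (16-byte) total bus; DDR5 splits each
# DIMM into two independent 32-bit channels, doubling the channel count.
BUS_BYTES = 16

def peak_gbps(mt_per_s):
    """Peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return mt_per_s * BUS_BYTES / 1000

ddr4 = peak_gbps(3200)          # 51.2 GB/s
ddr5 = peak_gbps(4800)          # 76.8 GB/s
print(ddr4, ddr5, ddr5 / ddr4)  # ratio is 1.5, i.e. +50%
```

Note this is peak bandwidth only; the doubled channel count also helps with parallelism across many cores, which the theoretical figure does not capture.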

SPECint2017 Rate-N Estimated Scores

As we noted, the DDR5 vs DDR4 results showcase a very large performance gap between the two memory technologies in MT scenarios. Running a total of 24 threads, 16 on the SMT-enabled P-cores and 8 on the E-cores, Alder Lake is able to take the performance crown in quite a lot of the workloads. There are still cases where AMD’s 16-core setup with its larger cores is able to perform better, undoubtedly also partly attributable to its 64MB of on-chip cache.

Compared to the 11900K, the new 12900K showcases giant leaps, especially when paired with DDR5.

SPECfp2017 Rate-N Estimated Scores

In the FP suite, the DDR5 advantage in some workloads is even larger, as the results scale beyond the pure theoretical +50% bandwidth improvement. What’s important for performance is not just the theoretical bandwidth, but the actual utilised bandwidth, and again, the doubled-up memory channels of DDR5 are seemingly contributing to extremely large increases here, if the workload can take advantage of them.
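
Achieved bandwidth, as opposed to the theoretical peak, can be probed with a simple STREAM-style triad. This is only a sketch: the array size is a hypothetical choice, and the GB/s it reports depends entirely on the machine it runs on.

```python
import time
import numpy as np

# STREAM-style triad a = b + 3*c, written without temporaries so the
# byte count is known: pass 1 reads c and writes a (2 arrays of traffic),
# pass 2 reads a and b and writes a (3 arrays), for 5 arrays in total.
N = 10_000_000               # ~80 MB per array, larger than on-chip caches
b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty_like(b)

start = time.perf_counter()
np.multiply(c, 3.0, out=a)   # a = 3*c
a += b                       # a = b + 3*c
elapsed = time.perf_counter() - start

bytes_moved = 5 * N * 8      # 5 array passes, 8 bytes per element
print(f"achieved ~{bytes_moved / elapsed / 1e9:.1f} GB/s")
```

Run across many threads at once, a kernel like this is exactly the kind of workload where utilised bandwidth, not core count, becomes the limiter.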

SPEC2017 Rate-N Estimated Total

In the aggregate results, there are very clearly two conclusions, depending on whether you use the chip with DDR5 or DDR4.

With DDR4, Alder Lake, and the 12900K in particular, is able to showcase very solid increases in performance, thanks to the IPC gains of the Golden Cove core, but most importantly also thanks to the extra 8 Gracemont cores, which do carry their own weight. The 12900K falls behind AMD’s 5900X with DDR4, which is fair given that the pricing of the chips here is generally in line with each other.

With DDR5, the 12900K is able to fully stretch its multi-threaded performance legs. In less memory-dependent workloads, the chip battles it out with AMD’s 16-core 5950X, winning some workloads and losing others. In more memory-dependent workloads, the DDR5 advantage is extremely clear, and the 12900K is able to blow past any competition, even slightly edging out the latest Apple M1 Max, released a few weeks ago and notable for its memory bandwidth.

Comments (474)

  • mode_13h - Tuesday, November 9, 2021 - link

    Well, AMD does have V-Cache and Zen 3+ in the queue. But if you want to short them, be my guest!
  • Sivar - Monday, November 8, 2021 - link

    This is an amazingly deep, properly Anandtech review, even ignoring time constraints and the unusual difficulty of this particular launch.
    I bet Ian and Andrei will be catching up on sleep for weeks.
  • xhris4747 - Tuesday, November 9, 2021 - link

    Hi
  • ricebunny - Tuesday, November 9, 2021 - link

It’s disappointing that Anandtech continues to use suboptimal compilers for their platforms. Intel’s Classic Compiler demonstrated 41% better performance than Clang 12.0.0 in the SPECrate 2017 Floating Point suite.
  • mode_13h - Wednesday, November 10, 2021 - link

I think it's fair, though. Most workloads people run aren't built with vendor-supplied compilers; they use the industry standards of gcc, clang, or msvc. And the point of benchmarks is to give you an idea of what the typical user experience would be.
  • ricebunny - Wednesday, November 10, 2021 - link

    But are they not compiling the code for the M1 series chips with a vendor supplied compiler?

    Second, almost all benchmarks in SPECrate 2017 Floating Point are scientific codes, half of which are in Fortran. That’s exactly the target domain of the Intel compiler. I admit, I am out of date with the HPC developments, but back when I was still in the game icc was the most commonly used compiler.
  • mode_13h - Thursday, November 11, 2021 - link

    > are they not compiling the code for the M1 series chips with a vendor supplied compiler?

    It's just a slightly newer version of LLVM than what you'd get on Linux.

    > almost all benchmarks in SPECrate 2017 Floating Point are scientific codes,

    3 are rendering, animation, and image processing. Some of the others could fall more in the category of engineering than scientific, but whatever.

    > half of which are in Fortran.

    Only 3 are pure fortran. Another 4 are some mixture, but we don't know the relative amounts. They could literally link in BLAS or some FFT code for some trivial setup computation, and that would count as including fortran.

    https://www.spec.org/cpu2017/Docs/index.html#intra...

    BTW, you conveniently ignored how only one of the SPECrate 2017 int tests is fortran.
  • mode_13h - Thursday, November 11, 2021 - link

    Oops, I accidentally counted one test that's only SPECspeed.

    So, in SPECrate 2017 fp:

    3 are fortran
    3 are fortran & C/C++
    7 are only C/C++
  • ricebunny - Thursday, November 11, 2021 - link

    Yes, I made the same mistake when counting.

    Without knowing what the Fortran code in the mixed code represents I would not discard it as irrelevant: those tests could very well spend a majority of their time executing Fortran.

    As for the int tests, the advantage of the Intel compiler was even more pronounced: almost 50% over Clang. IMO this is too significant to ignore.

    If I ran these tests, I would provide results from multiple compilers. I would also consult with the CPU vendors regarding the recommended compiler settings. Anandtech refuses to compile code with AVX512 support for non-Alder Lake Intel chips, whereas Intel’s own runs of SPECrate 2017 enable that switch?
  • xray9 - Sunday, November 14, 2021 - link

    > At Intel’s Innovation event last week, we learned that the operating system
    > will de-emphasise any workload that is not in user focus.

    I see this as performance-critical for audio applications, which need near-real-time performance.
    It's already a pain to find good working drivers that don't hold a CPU core for too long and block processes with near-real-time demands.
    And for performance tuning we already use the Windows option to prioritize background processes, which gives the process scheduler a higher, fixed time quantum, so it can work on processes more efficiently and lower the number of context switches.
    And now we get this hybrid design where everything is out of your control and you can only hope and pray that the process scheduling will not be too bad. I am not amused about that, and very skeptical that this will work out well.
