CPU Tests: SPEC MT Performance - P and E-Core Scaling

Update Nov 6th:

We’ve finished our MT breakdown for the platform, investigating the various combinations of cores and memory configurations for Alder Lake and the i9-12900K. We're posting the detailed scores for the DDR5 results, and following up with the aggregate results for DDR4 as well.

The results here solely cover the i9-12900K and various combinations of MT performance, such as 8 E-cores, 8 P-cores with 1T as well as 2T, and the full 24T 8P2T+8E scenario. The tests were run on Linux because it is easier to set affinities to the various cores there. They’re not completely comparable to the WSL results on the previous page, but should be within small margins of error for most tests.
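As a rough illustration of what setting affinities for these four configurations can look like on Linux, here is a minimal Python sketch. It is not the harness used for these results. The logical CPU numbering (P-core SMT siblings as pairs 0-15, E-cores as 16-23) is an assumption and should be checked against /sys/devices/system/cpu/cpu*/topology on the actual machine; the names `cpu_set` and `pin_current_process` are hypothetical.

```python
import os

# ASSUMPTION: hypothetical logical CPU layout, not taken from the article.
# Each P-core exposes two SMT siblings (0,1), (2,3), ... and the E-cores follow.
P_CORE_THREADS = [(2 * i, 2 * i + 1) for i in range(8)]
E_CORES = list(range(16, 24))


def cpu_set(config):
    """Return the set of logical CPUs for one of the tested configurations."""
    first_threads = {first for first, _ in P_CORE_THREADS}
    all_p_threads = {t for pair in P_CORE_THREADS for t in pair}
    table = {
        "8E": set(E_CORES),                        # E-cores only
        "8P1T": first_threads,                     # one thread per P-core
        "8P2T": all_p_threads,                     # both SMT threads per P-core
        "8P2T+8E": all_p_threads | set(E_CORES),   # full 24T
    }
    return table[config]


def pin_current_process(config):
    """Restrict the calling process (pid 0) to the chosen CPUs. Linux only."""
    os.sched_setaffinity(0, cpu_set(config))
```

Child processes inherit the affinity mask, so calling `pin_current_process("8E")` before launching the benchmark copies would keep them on the E-cores under the assumed numbering. The same effect is commonly achieved from a shell with `taskset -c` and an explicit CPU list.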

SPECint2017 Rate-N Estimated Scores (i9-12900K Scaling)

In the integer suite, the E-cores are quite powerful, reaching scores of around 50% of the 8P2T results, or more.

Many of the more core-bound workloads scale well simply from having more cores available, and these are also the workloads that gain the most performance when we add the 8 E-cores on top of the 8P2T results.

Workloads that are more cache-heavy, or rely on memory bandwidth, both shared resources on the chip, don’t scale too well at the top-end of things when adding the 8 E-cores. Most surprising to me was the 502.gcc_r result which barely saw any improvement with the added 8 E-cores.

It is not surprising to see more memory-bound workloads such as 520.omnetpp or 505.mcf not scale with the added E-cores; mcf even sees a performance regression, as the added cores mean more contention on the L3 and memory controllers.

SPECfp2017 Rate-N Estimated Scores (i9-12900K Scaling)

In the FP suite, the E-cores more clearly showcase a lower % of performance relative to the P-cores, and this makes sense given their design. Only a few of the more compute-bound tests, such as 508.namd, 511.povray, or 538.imagick, see larger contributions from the E-cores when they’re added in on top of the P-cores.

The FP suite also has a lot more memory-hungry workloads. When it comes to DRAM bandwidth, having either E-cores or P-cores doesn’t matter much for the workload, as it’s the memory which is the bottleneck. Here, the E-cores are able to achieve very high performance figures relative to the P-cores. 503.bwaves and 519.lbm for example are purely DRAM-bandwidth limited, and using the E-cores in MT scenarios allows for similar performance to the P-cores, however at only 35-40W package power, versus 110-125W for the P-cores result set.
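To put the package power figures in perspective, a small sketch of the arithmetic follows. It assumes, as a simplification of "similar performance", that the E-core and P-core scores are exactly equal in these bandwidth-bound tests, so the performance-per-watt ratio reduces to the ratio of package powers. The function name is hypothetical.

```python
def perf_per_watt_advantage(e_watts, p_watts):
    """Range of the E-core perf/W advantage at equal performance.

    e_watts and p_watts are (low, high) package power ranges in watts.
    ASSUMPTION: identical scores on both core types, so perf/W scales
    purely with the inverse of package power.
    """
    e_low, e_high = e_watts
    p_low, p_high = p_watts
    smallest = p_low / e_high   # least favourable pairing for the E-cores
    largest = p_high / e_low    # most favourable pairing for the E-cores
    return smallest, largest


low, high = perf_per_watt_advantage((35, 40), (110, 125))
print(f"{low:.2f}x to {high:.2f}x")  # prints "2.75x to 3.57x"
```

Under that assumption, the E-core-only runs would be roughly 2.75x to 3.6x more efficient than the P-core runs in these two workloads.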

Some of these workloads also see regressions in performance when adding in more cores or threads, as it just means more memory traffic contention on the chip, as seen in the regressions of the 8P2T+8E and 8P2T results relative to the 8P1T results.

SPEC2017 Rate-N Estimated Total (i9-12900K Scaling)

What’s most interesting here is the scaling of performance and the attribution between the P-cores and the E-cores. Focusing on the DDR5 set, the 8 E-cores are able to provide around 52-55% of the performance of 8 P-cores without SMT, and 47-51% of the P-cores with SMT. At first glance, it could be argued that the 8P+8E setup is somewhat similar to a 12P setup in MT performance. However, the combined performance of both clusters only raises the MT scores by 25% in the integer suite and 5% in the FP suite, as we are hitting near package power limits with just 8P2T, and there are diminishing returns on performance given the shared L3. What the E-cores do seem to allow the system to do is reduce everyday average power usage and increase the efficiency of the socket, as fewer P-cores need to be active at any one time.
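The gap between the "12P equivalent" intuition and the measured uplift can be made explicit with a little arithmetic. The sketch below assumes the 25% and 5% uplifts are measured relative to the 8P2T results, and it applies both ends of the 47-51% range to each suite, since the text does not say which end belongs to which. Function names are hypothetical.

```python
def equivalent_p_cores(e_share_pct, p_cores=8):
    """Naive P-core equivalent if E-core throughput simply added on top."""
    return p_cores * (1 + e_share_pct / 100)


def realized_fraction(actual_uplift_pct, e_share_pct):
    """Fraction of the standalone E-core throughput that shows up as uplift."""
    return actual_uplift_pct / e_share_pct


# 8 E-cores at 52-55% of 8P1T would naively look like about 12 P-cores.
print(equivalent_p_cores(52), equivalent_p_cores(55))

# ASSUMPTION: uplifts are relative to 8P2T, where the E-cores alone reach 47-51%.
for suite, uplift in (("int", 25), ("fp", 5)):
    low = realized_fraction(uplift, 51)
    high = realized_fraction(uplift, 47)
    print(f"{suite}: {low:.0%} to {high:.0%} of the additive ideal")
```

Under these assumptions, roughly half of the E-cores' standalone throughput carries over in the integer suite and only about a tenth in the FP suite, which is consistent with the power limit and shared L3 explanation above.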


  • xhris4747 - Tuesday, November 9, 2021 - link

    They should use pbo it's fair to
  • xhris4747 - Tuesday, November 9, 2021 - link

    Are you using PBO? Some people aren't using PBO, which I think isn't fair because that i9 is OC'd to snot
  • EnglishMike - Thursday, November 4, 2021 - link

    It's not just the gaming world -- it's the entire world except for long-running CPU intensive tasks. Handbrake and blender are valuable benchmarking tools for seeing what a CPU is capable of when pushed to the limit, but the vast majority of users -- even most power users -- don't do that.

    Sure, Intel has more work to do to improve power efficiency in long running CPU intensive workloads, but taking the worst case power usage scenarios distorts the picture as much as you're claiming the reviewers are doing.
  • Wrs - Thursday, November 4, 2021 - link

    Can't calculate efficiency without scores. Also, well known that power scales much faster than performance. The proper way to compare efficiency is really at constant work rate or constant power.
  • blanarahul - Thursday, November 4, 2021 - link

    Sorry sir I can't. You haven't provided me the data for how much time each test took! Would you be so kind as to do that?
  • Netmsm - Thursday, November 4, 2021 - link

    Sorry, this is a direct link to Tom's bench:
    https://cdn.mos.cms.futurecdn.net/if3Lox9ZJBRxjbhr...
    this is for "blender bmw27" in which both 12900k and 5950x finish the job around 80 seconds BUT 12900k sucks power for about 70 percent more than 5950x.

    you can find other benches here:
    https://www.tomshardware.com/news/intel-core-i9-12...

    I'm wondering why Ian hasn't put 12900k nominal TDP in results just like all other CPUs! When 10900k was released with nominal TDP of 125, Ian put that number in every bench while in reality 10900k was consuming up to 254 (according to Ian's review)! When I asked him to put real numbers of power consumption for every test he said I can't because of time and because I've too much to do and because I've no money to pay and delegate such works to an assistant!
    But now we have 12900k with nominal TDP of 241 which seems unpleasant to Ian to put it in front of it in results.
  • Zingam - Friday, November 5, 2021 - link

    Last gen game. How about glquake?

    1 billion computing devices and just a few million game units sold? What does it mean? Gamers are a tiny but vocal minority.
    If they bring this performance at 5W on low and 45W on high then its good for majority of people. This is just a space heater.
  • Gothmoth - Friday, November 5, 2021 - link

    so throwing more cores on a game that can't make use of them is useless thanks for clarifying that.... genius!!

    when a 5600x is producing 144 FPS and a 5950x is producing 150 FPS the 5600x is the clear winner when it comes to efficiency.

    now try to cool the 12900K in a work environment with an air cooler.
    i can cool my threadripper with a noctua aircooler and let it run under full load for hours.

    i am really curious to see how the 12900k will handle that.

    i am not an amd fanboy. i was using anti-consumer intel for a decade before switching to ryzen.
    i would use intel again when it makes sense for me (i need my pc for work not gaming).

    but with this power draw it does not make sense.
  • Wrs - Saturday, November 6, 2021 - link

    The 12900k is fine with a Noctua D15 in a work environment. Doesn't matter if you're hammering it at 95C the whole time, the D15 doesn't get louder. But it's no megachip like a Threadripper. For that on the Intel side you'd wait for Sapphire Rapids or put up with an existing Xeon Gold with 8-32 Ice Lake cores at 10nm.
  • Netmsm - Saturday, November 6, 2021 - link

    How would it be justified to buy Xeon Gold in place of Threadripper and Epyc?!
