CPU Tests: SPEC MT Performance - P and E-Core Scaling

Update Nov 6th:

We’ve finished our MT breakdown for the platform, investigating the various combinations of cores and memory configurations for Alder Lake and the i9-12900K. We're posting the detailed scores for the DDR5 results, along with the aggregate results for DDR4.

The results here cover only the i9-12900K in various MT configurations: 8 E-cores, 8 P-cores with one thread each (1T) and with two threads each (2T), and the full 24-thread 8P2T+8E scenario. These tests were run on Linux because it offers an easier way to set affinities to specific cores, so they're not directly comparable to the WSL results on the previous page; however, they should be within a small margin of error for most tests.
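As an illustration of how such affinity-based configurations can be set up on Linux, the sketch below builds the CPU sets for the four configurations and pins the current process with Python's `os.sched_setaffinity`. It is a minimal sketch, not the harness used for these results. The logical-CPU numbering is an assumption (P-core SMT siblings numbered in adjacent pairs 0-15, E-cores 16-23); the real mapping should be checked with `lscpu -e` or `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The function names are hypothetical.

```python
import os

# ASSUMED logical-CPU layout for an i9-12900K under Linux (verify on real hardware):
# CPUs 0-15 are the 8 P-cores, with SMT siblings numbered in adjacent pairs (2k, 2k+1),
# and CPUs 16-23 are the 8 E-cores.
P_CORES = 8
E_CORES = 8


def p_core_threads(smt: bool) -> set[int]:
    """Logical CPUs for the P-cores: one thread per core (1T) or both threads (2T)."""
    step = 1 if smt else 2
    return set(range(0, 2 * P_CORES, step))


def e_core_threads() -> set[int]:
    """Logical CPUs for the E-cores, which have no SMT."""
    start = 2 * P_CORES
    return set(range(start, start + E_CORES))


def configurations() -> dict[str, set[int]]:
    """CPU sets for the four scaling configurations discussed in the text."""
    return {
        "8E": e_core_threads(),
        "8P1T": p_core_threads(smt=False),
        "8P2T": p_core_threads(smt=True),
        "8P2T+8E": p_core_threads(smt=True) | e_core_threads(),
    }


def pin_current_process(config: str) -> None:
    """Restrict this process to one configuration; children forked later inherit it.

    Linux-only: os.sched_setaffinity is not available on every platform.
    """
    os.sched_setaffinity(0, configurations()[config])


if __name__ == "__main__":
    for name, cpus in configurations().items():
        print(f"{name:8s} {len(cpus):2d} threads: {sorted(cpus)}")
```

Under the same assumed numbering, the shell equivalent for the E-core-only run would be `taskset -c 16-23` followed by the benchmark command.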

SPECint2017 Rate-N Estimated Scores (i9-12900K Scaling)

In the integer suite, the E-cores are quite powerful, reaching around 50% or more of the 8P2T scores.

Many of the more core-bound workloads scale well simply from having more cores available, and these are also the workloads that gain the most when the 8 E-cores are added on top of the 8P2T configuration.

Workloads that are more cache-heavy or rely on memory bandwidth, both of which are shared resources on the chip, don’t scale as well at the top end when the 8 E-cores are added. Most surprising to me was the 502.gcc_r result, which barely improved with the added E-cores.

It’s not surprising that more memory-bound workloads such as 520.omnetpp or 505.mcf don’t scale with the added E-cores; mcf even sees a performance regression, as the additional cores mean more contention on the L3 and memory controllers.

SPECfp2017 Rate-N Estimated Scores (i9-12900K Scaling)

In the FP suite, the E-cores deliver a clearly lower percentage of the P-cores’ performance, which makes sense given their design. Only a few of the more compute-bound tests, such as 508.namd, 511.povray, or 538.imagick, see larger contributions from the E-cores when they’re added on top of the P-cores.

The FP suite also has many more memory-hungry workloads. When a workload is limited by DRAM bandwidth, it doesn’t matter much whether it runs on E-cores or P-cores, as the memory is the bottleneck. Here, the E-cores achieve very high performance relative to the P-cores: 503.bwaves and 519.lbm, for example, are purely DRAM-bandwidth limited, and running them on the E-cores in MT scenarios yields performance similar to the P-cores, but at only 35-40W package power versus 110-125W for the P-core result set.
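The power figures above translate into a rough perf/W comparison. This sketch uses only the quoted package-power ranges; it assumes equal performance by default (the text says "similar"), and the `relative_perf` parameter is a hypothetical knob for scaling the result if the E-cores are somewhat slower. Since efficiency is performance divided by power, the E-core advantage is (E perf / P perf) × (P power / E power).

```python
# Rough perf/W comparison for the DRAM-bandwidth-limited tests (503.bwaves, 519.lbm).
# ASSUMPTION: by default the E-core and P-core runs reach equal performance; the text
# only says "similar", so pass relative_perf < 1.0 if the E-cores are somewhat slower.
E_CORE_POWER_W = (35.0, 40.0)    # package power range quoted for the E-core runs
P_CORE_POWER_W = (110.0, 125.0)  # package power range quoted for the P-core runs


def efficiency_ratio(p_power_w: float, e_power_w: float, relative_perf: float = 1.0) -> float:
    """E-core perf/W divided by P-core perf/W.

    relative_perf is E-core performance as a fraction of P-core performance.
    """
    return relative_perf * p_power_w / e_power_w


# Worst case for the E-cores: lowest P-core power against highest E-core power.
low = efficiency_ratio(P_CORE_POWER_W[0], E_CORE_POWER_W[1])
# Best case for the E-cores: highest P-core power against lowest E-core power.
high = efficiency_ratio(P_CORE_POWER_W[1], E_CORE_POWER_W[0])
print(f"E-core perf/W advantage: {low:.2f}x to {high:.2f}x")
```

With equal performance this puts the E-cores at roughly 2.75x to 3.57x the P-cores' efficiency for these specific bandwidth-bound tests; it says nothing about compute-bound workloads.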

Some of these workloads even regress in performance when more cores or threads are added, as this only increases memory-traffic contention on the chip; this can be seen in the 8P2T+8E and 8P2T results regressing relative to 8P1T.

SPEC2017 Rate-N Estimated Total (i9-12900K Scaling)

What’s most interesting here is the scaling of performance and how it is split between the P-cores and the E-cores. Focusing on the DDR5 set, the 8 E-cores provide around 52-55% of the performance of the 8 P-cores without SMT, and 47-51% of the P-cores with SMT. At first glance, one could argue that the 8P+8E setup is roughly comparable to a 12P setup in MT performance; however, combining both clusters only raises the MT scores over 8P2T by 25% in the integer suite and 5% in the FP suite, as we are already near the package power limits with just 8P2T, and the shared L3 leads to diminishing returns. What the E-cores do seem to allow is a reduction in everyday average power usage and an increase in the socket’s efficiency, as fewer P-cores need to be active at any one time.
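The gap between the naive "12P" reading and the measured uplift can be made concrete with a few lines of arithmetic. This sketch only reuses the DDR5 percentages quoted in the paragraph; expressing results in "P-core equivalents" (one SMT P-core counted as 1/8 of the 8P2T score) is an illustrative simplification, since MT scaling is not linear.

```python
# Napkin math for the "8P+8E vs. 12P" comparison, using the DDR5 ratios quoted in the
# text. ASSUMPTION: one "P-core equivalent" is one SMT P-core, i.e. 1/8 of the 8P2T
# score; this unit is an illustrative choice, not a measured quantity.
P_CORES = 8
E_VS_8P2T = (0.47, 0.51)                   # 8 E-cores as a fraction of the 8P2T score
MEASURED_GAIN = {"int": 0.25, "fp": 0.05}  # 8P2T+8E uplift over 8P2T


def p_core_equivalents(uplift_over_8p2t: float) -> float:
    """Convert an uplift over 8P2T into an equivalent number of SMT P-cores."""
    return P_CORES * (1.0 + uplift_over_8p2t)


# If both clusters scaled perfectly, the E-cores' standalone share would simply add on.
ideal = [p_core_equivalents(r) for r in E_VS_8P2T]
print(f"ideal:        {ideal[0]:.1f} to {ideal[1]:.1f} P-core equivalents")

# What the combined runs actually delivered (power limits, shared L3 and DRAM).
for suite, gain in MEASURED_GAIN.items():
    print(f"measured {suite:3s}: {p_core_equivalents(gain):.1f} P-core equivalents")
```

The ideal case lands at roughly 11.8 to 12.1 P-core equivalents, which is where the "12P" intuition comes from, while the measured uplift corresponds to about 10 in the integer suite and 8.4 in the FP suite.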

474 Comments

  • Kvaern1 - Sunday, November 7, 2021

    Because there are no games which are 'incompatible' with ADL.
  • eastcoast_pete - Sunday, November 7, 2021

    While AL is an interesting CPU (regardless of one's preference), I still think the star of AL is the Gracemont core (E-cores), and I did some very simple-minded, back-of-a-napkin calculations. The top AL has 8 P-cores with multithreading (16 threads) plus 8 E-cores without multithreading (8 threads), for a total of 24 threads. According to the first die shots, one P-core takes up the same die area as 4 E-cores. That leaves me wanting an all-E-core CPU with the same die size as the i9 AL, because it could fit 8x4 = 32 plus the existing 8 Gracemonts, for a total of 40. And the old problem of "Atoms can't do AVX and AVX2" is solved, because now they can! Yes, single-thread performance would be significantly lower, but any workload that can take advantage of many threads should be at least as fast as on the i9. Does anyone here know if Intel is considering that? It wouldn't be the choice for gaming, but for productivity, it might give both the i9 and, possibly, the 5950X a run for their money.
  • mode_13h - Monday, November 8, 2021

    They currently make Atom-branded embedded server CPUs with up to 24 cores. This one launched last year, using Tremont cores:

    https://ark.intel.com/content/www/us/en/ark/produc...

    I think you can expect to see a Gracemont-based refresh, possibly with some new product lines expanding into non-embedded markets.
  • eastcoast_pete - Monday, November 8, 2021

    Yes, those Tremont-based CPUs are intended/sold for 5G cell stations; I hope that Intel doesn't just refresh those with Gracemont, but makes a 32-40 Gracemont-core CPU available for workstations and servers. The one thing that might prevent that is Intel's fear of cannibalizing their Sapphire Rapids sales. However, if I were in their shoes, I'd worry more about upcoming AMD and multi-core ARM server chips, and sell all the CPUs they can.
  • mode_13h - Tuesday, November 9, 2021

    Well, it's a start that Intel is already using these cores in *some* kind of server CPU, no? That suggests they already should have some server-grade RAS features built-in. So, it should be a fairly small step to use them in a high core count CPU to counter the Gravitons and Altras. I think they will, since it should be more competitive in terms of perf/W.

    As for workstations, I think you'll need to find a workstation board with a server CPU socket. I doubt they'll be pushing massive E-core-only CPUs specifically for workstations, since workstation users also tend to care about single-thread performance.
  • anemusek - Sunday, November 7, 2021

    Sorry, but performance isn't everything; a few percent either way in the real world will not restore confidence. Critical flaws, disabled functionality (DX12 on Haswell, for example), unstable instruction features, etc.
    I cannot afford to trust such a company.
  • Dolda2000 - Sunday, November 7, 2021

    I just wanted to add a big Kudos for this article. AnandTech's coverage of the 12900K was by a wide margin the best of any I read or watched, with regard to coverage of the various variables involved, and with the breadth and depth of testing. Thanks for keeping it up!
  • chantzeleong - Monday, November 8, 2021

    I run Power BI and TensorFlow with large datasets. Which Intel CPU do you recommend and why?
  • mode_13h - Tuesday, November 9, 2021

    I don't know about Power BI, but TensorFlow should run best on GPUs. Which CPU to get then depends on how many GPUs you're going to use. If >= 3, then Threadripper. Otherwise, go for Alder Lake or Ryzen 5000 series.

    You'll probably find the best advice among user communities for those specific apps.
  • velanapontinha - Monday, November 8, 2021

    We've seen this before. It is time to short AMD, unfortunately.