CPU Benchmark Performance: E-Core

In this batch of testing, we're focusing primarily on the E-cores. Intel claimed that E-core performance was around the level of its Skylake generation of processors (6th Gen to 10th Gen, depending on which slide you read), and we had to put that to the test. Here we're comparing against the flagship Skylake processor, the Core i7-6700K, which offered 4C/8T at 91 W. We also ran a number of multi-threaded tests to see where the E-cores would line up.

In order to enable E-core only operation, we used affinity masks.
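
As a rough illustration of what such a mask looks like, the sketch below builds one in Python. The logical CPU numbering is an assumption, not something stated in this article: on a Core i9-12900K the P-core threads are commonly enumerated as logical CPUs 0-15 and the E-cores as 16-23, but this should be checked on the system under test. The helper names are hypothetical, and this is not necessarily the tooling used for the results here.

```python
import os


def affinity_mask(cpus):
    """Return an integer bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# ASSUMPTION: typical 12900K enumeration (8 P-cores with two threads each,
# then 8 single-threaded E-cores). Verify on the actual machine.
P_CORE_THREADS = range(0, 16)
E_CORES = range(16, 24)


def pin_current_process(cpus):
    """Pin this process to the given logical CPUs where the OS supports it.

    os.sched_setaffinity is only available on some platforms (e.g. Linux),
    so the function returns False instead of failing elsewhere.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    # Only request CPUs that this process is currently allowed to use.
    target = set(cpus) & os.sched_getaffinity(0)
    if not target:
        return False
    os.sched_setaffinity(0, target)
    return True


if __name__ == "__main__":
    print(hex(affinity_mask(E_CORES)))         # 0xff0000
    print(hex(affinity_mask(P_CORE_THREADS)))  # 0xffff
```

Under the same numbering assumption, the hex value is what a tool such as the Windows `start /affinity` command expects, while Linux's `taskset -c 16-23` takes the CPU list directly.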

Single Threaded

(3-2b) Dwarf Fortress 0.44.12 World Gen 129x129, 550 Yr

(3-3) Dolphin 5.0 Render Test

(4-8a) CineBench R20 Single Thread

(8-1c) Geekbench 5 Single Thread

In these few tests, we can see that the E-core comes close to Skylake at 4.2 GHz. Dropping Skylake down to 3.9 GHz, perhaps something like the i7-6700, would put the two on par.
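
To put a rough number on that gap, the sketch below assumes that single-threaded performance scales linearly with clock speed. That is a simplification (it ignores memory and cache effects), and the function name is hypothetical. Under that assumption, a Skylake core at 3.9 GHz delivers about 93% of its 4.2 GHz score, which would put the E-core roughly 7% behind the 4.2 GHz part in these tests.

```python
def scaled_score(score, from_ghz, to_ghz):
    """Estimate a score at a new clock, ASSUMING linear frequency scaling."""
    return score * to_ghz / from_ghz


# Normalise the 4.2 GHz Skylake result to 1.0 and scale it down to 3.9 GHz.
ratio = scaled_score(1.0, from_ghz=4.2, to_ghz=3.9)
print(f"{ratio:.3f}")  # 0.929
```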

Multi-Thread Tests

(1-1) Agisoft Photoscan 1.3, Complex Test

(2-1) 3D Particle Movement v2.1 (non-AVX)

(2-2) 3D Particle Movement v2.1 (Peak AVX)

(2-5) NAMD ApoA1 Simulation

(2-6) AI Benchmark 0.1.2 Total

(3-1) DigiCortex 1.35 (32k Neuron, 1.8B Synapse)

(4-2) Corona 1.3 Benchmark

(4-3a) Crysis CPU Render at 320x200 Low

(4-5) V-Ray Renderer

(4-8b) CineBench R20 Multi-Thread

(5-1a) Handbrake 1.3.2, 1080p30 H264 to 480p Discord

(5-1b) Handbrake 1.3.2, 1080p30 H264 to 720p YouTube

(5-1c) Handbrake 1.3.2, 1080p30 H264 to 4K60 HEVC

(5-2c) 7-Zip 1900 Combined Score

(5-3) AES Encoding

(5-4) WinRAR 5.90 Test, 3477 files, 1.96 GB

(7-1) Kraken 1.1 Web Test

(7-2) Google Octane 2.0 Web Test

(7-3) Speedometer 2.0 Web Test

(8-1d) Geekbench 5 Multi-Thread

Having a full eight E-cores against Skylake's 4C/8T arrangement helps in a lot of compute-limited scenarios. In more memory-limited workloads, or those with a lot of cross-talk between cores, the E-cores are held back by their cache structure and long core-to-core latencies. Even with DDR5 in tow, the gap between the E-cores and Skylake can be marginal, for example in WinRAR, which tends to benefit from cache and memory bandwidth.


474 Comments


  • mode_13h - Monday, November 15, 2021 - link

    Do you know, for a fact, that the new scheduling policies override the priority-boost you mentioned? I wouldn't assume so, but I'm not saying they don't.

    Maybe I'm optimistic, but I think MS is smart enough to know there are realtime services that don't necessarily have focus and wouldn't break that usage model.
  • ZioTom - Monday, November 29, 2021 - link

    Windows 11 scheduler fails to allocate workloads...
    I noticed that the scheduler parks the cores if the application isn't full screen.
    I did a test on a 12700K with Handbrake: as long as the program window remains in the foreground, all the P-cores and E-cores are loaded at 100%. If I open a browser and use it while the movie is being compressed, the kernel takes the load off the P-cores and runs the video compression only on the E-cores. Absurd behavior, absolutely useless!
  • alpha754293 - Wednesday, January 12, 2022 - link

    I've had my 12900K for a little less than a month now, and here's what I've found from the testing that I've done with the CPU:

    (Hardware notes/specs: Asus Z690 Prime-P D4 motherboard, 4x Crucial 32 GB DDR4-3200 unbuffered, non-ECC RAM (128 GB total), running CentOS 7.7.1908 with the 5.14.15 kernel)

    IF your workload CAN be multithreaded and it can run on BOTH the P cores AND the E cores simultaneously, then there is a potential that you can have better performance than the 5950X. BUT if you CAN'T run your application on both the P cores and the E cores at the same time (which is the case for a number of distributed parallel applications that rely on MPI), then you WON'T be able to realise the performance advantages that having both said P cores and E cores would give you (based on what the benchmark results show).

    And if your program, further, cannot use HyperThreading (which some HPC/CAE programs will actually lock you out of using), then you can be anywhere between 63% and 81% SLOWER than the 5950X (because on the 5950X, even with SMT disabled, you can still run the programme on all 16 physical cores, vs. the 8 P cores on the 12900K).

    Please take note.
  • alceryes - Wednesday, August 24, 2022 - link

    Question.
    Did you use 'affinities' for all the different core tests (P-core only, P+E-core tests)?
