CPU MT Performance: A Real Monster

More interesting than ST performance is MT performance. With 8 performance cores and 2 efficiency cores, this is now the largest iteration of Apple Silicon we’ve seen.

As a prelude to the scores, I wanted to remark on a few things about the previous, smaller M1 chip. The 4+4 setup on the M1 meant that a significant chunk of the MT performance was enabled by the E-cores, with the SPECint score in particular seeing a +33% performance boost versus just the 4 P-cores of the system. Because the new M1 Pro and Max have twice the P-cores but 2 fewer E-cores, assuming just linear scaling, the theoretical peak of the M1 Pro/Max should be +62% over the M1. Of course, the new chips should behave better than linear, due to the better memory subsystem.
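The linear-scaling estimate can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function and constant names are made up for this example, and it assumes each E-core is worth a fixed fraction of a P-core, derived from the +33% SPECint uplift quoted for the M1.

```python
# Linear-scaling estimate of M1 Pro/Max MT throughput relative to the M1.
# Inputs taken from the text: on the M1 (4P + 4E), the 4 E-cores add +33%
# over the 4 P-cores alone; the M1 Pro/Max has 8 P-cores and 2 E-cores.
# Assumption (illustrative): every core contributes a fixed, additive amount.

E_CLUSTER_UPLIFT = 0.33  # uplift of 4 E-cores over 4 P-cores (SPECint)

# 4 E-cores are worth 0.33 * 4 P-cores, so one E-core is worth 0.33 P-cores.
E_CORE_VALUE = E_CLUSTER_UPLIFT * 4 / 4


def linear_throughput(p_cores: int, e_cores: int) -> float:
    """Throughput in units of one P-core under perfectly linear scaling."""
    return p_cores + e_cores * E_CORE_VALUE


m1 = linear_throughput(4, 4)          # 4 + 1.32 = 5.32
m1_pro_max = linear_throughput(8, 2)  # 8 + 0.66 = 8.66
uplift = m1_pro_max / m1 - 1

print(f"Linear-scaling uplift over M1: {uplift:.1%}")
```

Under these assumptions the result is roughly +62.8%, in line with the +62% figure above. Any measured uplift beyond that would point at factors outside core count, such as the memory subsystem.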

In the detailed scores I’m showcasing the full 8+2 scores of the new chips; later we’ll talk about the 8 P-core scores in context. I hadn’t run the MT scores of the new Fortran compiler set on the M1, so some numbers will be missing from the charts.

SPECint2017 Rate-N Estimated Scores

Looking at the data, there are very evident changes to Apple’s performance positioning with the new 10-core CPU. Although, yes, Apple does have 2 additional cores versus the 8-core 11980HK or the 5980HS, Apple’s silicon is far ahead of either competitor in most workloads. Again, to reiterate, we’re comparing the M1 Max against Intel’s best of the best, and also nearly AMD’s best (the 5980HX has a 45W TDP).

The one workload that stood out to me the most was 502.gcc_r, where the M1 Max nearly doubles the M1 score and lands 69% ahead of the 11980HK. We’re seeing similar mind-boggling performance deltas in other workloads; memory-bound tests such as mcf and omnetpp are evidently Apple’s forte. A few of the workloads, mostly the more core-bound or L2-resident ones, show smaller advantages, or sometimes even fall behind AMD’s CPUs.

SPECfp2017 Rate-N Estimated Scores

The fp2017 suite has more workloads that are memory-bound, and it’s here where the M1 Max is absolutely absurd. The workloads that put the most pressure on memory and stress the DRAM the most, such as 503.bwaves, 519.lbm, 549.fotonik3d and 554.roms, all show performance advantages of multiple factors compared to the best Intel and AMD have to offer.

The performance differences here are just insane, and really showcase just how far ahead Apple’s memory subsystem is in its ability to allow the CPUs to scale to such a degree in memory-bound workloads.

Even workloads which are more execution-bound, such as 511.povray or 538.imagick, are, albeit not as dramatically, still very clearly in favour of the M1 Max, achieving significantly better performance at drastically lower power.

We noted how the M1 Max CPUs are not able to fully take advantage of the DRAM bandwidth of the chip. As of writing we haven’t measured the M1 Pro, but we imagine that design won’t score much lower than the M1 Max here. We can’t help but ask ourselves how much better the CPUs would score if the cluster and fabric allowed them to fully utilise the memory.

SPEC2017 Rate-N Estimated Total

In the aggregate scores, there are two sides. On the SPECint suite, the M1 Max lies 37% ahead of the best competition. It’s a very clear win here, and given the power levels and TDPs, the performance-per-watt advantage is clear. The M1 Max is also able to outperform desktop chips such as the 11900K, or AMD’s 5800X.

In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market. It completely demolishes any laptop contender, showcasing 2.2x the performance of the second-best laptop chip. The M1 Max even manages to outperform the 16-core 5950X, a chip whose package power is at 142W, with the rest of the system drawing quite a bit above that. It’s an absolutely absurd comparison and a situation we haven’t seen the likes of.

We also ran the chip with just the 8 performance cores active. As expected, the scores are a little lower, by 7-9%; the 2 E-cores here represent a much smaller percentage of the total MT performance than on the M1.
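To put the two E-core figures on the same footing, the sketch below converts between "uplift from adding the E-cores" and "drop from disabling them". It is only a rough check: the helper names are illustrative, and the per-E-core value of 0.33 P-cores is an assumption taken from the M1's +33% SPECint uplift (4 E-cores over 4 P-cores), applied under linear scaling.

```python
# Relating the E-core uplift on the M1 to the 8P-only drop on the M1 Pro/Max.
# Assumption (illustrative): one E-core is worth 0.33 of a P-core, derived
# from the M1's +33% SPECint uplift of 4 E-cores over 4 P-cores.

E_CORE_VALUE = 0.33


def drop_from_uplift(uplift: float) -> float:
    """Fraction of the full-chip score lost when the E-cores are disabled."""
    return uplift / (1 + uplift)


def uplift_from_drop(drop: float) -> float:
    """Uplift the E-cores add over the P-cores alone, given the drop."""
    return drop / (1 - drop)


# M1: E-cores add +33% over 4 P-cores, so losing them costs about a quarter.
m1_drop = drop_from_uplift(0.33)

# M1 Pro/Max: linear prediction for 2 E-cores on top of 8 P-cores.
predicted_uplift = 2 * E_CORE_VALUE / 8
predicted_drop = drop_from_uplift(predicted_uplift)

# Measured 8P-only drop of 7-9%, expressed as an uplift from the E-cores.
measured_uplift = (uplift_from_drop(0.07), uplift_from_drop(0.09))

print(f"M1 drop without E-cores:       {m1_drop:.1%}")
print(f"Pro/Max predicted drop:        {predicted_drop:.1%}")
print(f"Pro/Max measured E-core boost: "
      f"{measured_uplift[0]:.1%} to {measured_uplift[1]:.1%}")
```

Under these assumptions the E-cores account for roughly a quarter of the M1's full score, while the linear prediction for the 8+2 design is a drop of about 7.6%, which sits inside the measured 7-9% range.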

Apple’s stark advantage in specific workloads here does make us ask how this translates into applications and use-cases. We’ve never seen such a design before, so it’s not exactly clear where things will land, but I think Apple has been rather clear that their focus with these designs is catering to the content creation crowd, the power users who use large productivity applications, be it in video editing, audio mastering, or code compiling. These are all areas where the microarchitectural characteristics of the M1 Pro/Max would shine, and where they likely vastly outperform any other system out there.

493 Comments

  • caribbeanblue - Saturday, October 30, 2021 - link

    Lol, you're just a troll at this point.
  • sharath.naik - Monday, October 25, 2021 - link

    The only reason the M1 falls behind the RTX 3060 is that the games are emulated. If native, the M1 will match the 3080. This is remarkable. Time for others to shift over to the same shared high-bandwidth memory on chip.
  • vlad42 - Monday, October 25, 2021 - link

    Go back and reread the article. Andrei explicitly mentioned that the games were GPU bound, not CPU bound. Here are the relevant quotes:

    Shadow of the Tomb Raider:
    "We have to go to 4K just to help the M1 Max fully stretch its legs. Even then the 16-inch MacBook Pro is well off the 6800M. Though we’re definitely GPU-bound at this point, as reported by both the game itself, and demonstrated by the 2x performance scaling from the M1 Pro to the M1 Max."

    Borderlands 3:
    "The game seems to be GPU-bound at 4K, so it’s not a case of an obvious CPU bottleneck."
  • web2dot0 - Tuesday, October 26, 2021 - link

    I heard otherwise on m1 optimized games like WoW
  • AshlayW - Tuesday, October 26, 2021 - link

    4096 ALU at 1.3 GHz vs 6144 ALU at 1.4-1.5 GHz? What makes you think Apple's GPU is magic sauce?
  • Ppietra - Tuesday, October 26, 2021 - link

    Not going to argue that Apple's GPU is better; however, the number of ALUs and the clock speed don’t tell the whole story.
    Sometimes it can be faster not because it can work more but because it reduces some bottlenecks and because it works in a smarter way (by avoiding doing work that is not necessary for the end result).
  • jospoortvliet - Wednesday, October 27, 2021 - link

    Thing is also that the game devs didn't write their game for and test on these gpus and drivers. Nor did Apple write or optimize their drivers for these games. Both of these can easily make high-double digit differences, so being 50% slower on a fully new platform without any optimizations and running half-emulated code is very promising.
  • varase - Thursday, November 4, 2021 - link

    Apple isn't interested in producing chips - they produce consumer electronics products.

    If they wanted to they could probably trash AMD and Intel by selling their silicon - but customers would expect them to remain static and support their legacy stuff forever.

    When Apple finally decided ARMv7 was unoptimizable, they wrote 32 bit support out of iOS and dropped those logic blocks from their CPUs in something like 2 years. No one else can deprecate and shed baggage so quickly which is how they maintain their pace of innovation.
  • halo37253 - Monday, October 25, 2021 - link

    Apple's GPU isn't magic. It is not going to be any more efficient than what Nvidia or AMD have.

    Clearly an Apple GPU that only uses around 65 watts is going to compete with an Nvidia or AMD GPU that only uses around 65 watts in actual usage.

    Apple clearly has a node advantage at work here. That being said, it is clear that when it comes to actual workloads like games, Apple still has some work to do efficiency-wise, as an Nvidia chip in the same performance/watt range, on an older and less power-efficient node, is still able to do better than Apple's GPU.

    Apple's GPU is a compute champ and great for workloads that the average user will never see. This is why the M1 Pro makes a lot more sense than the M1 Max. The M1 Max seems like it will do fine for light gaming, but the cost of that chip must be crazy. It is a huge chip. Would love to see one in a Mac mini.
  • misan - Monday, October 25, 2021 - link

    Just replace GPU by CPU and you will see how devoid of logic your argument is.

    Apple has much more experience in low-power GPU design. Their silicon is specifically optimized for low-power usage. Why wouldn't it be more efficient than the competitors?

    Besides, Andrei's tests already confirm that your claims are pure speculation without any factual basis. Look at the power usage tests for GFXBench. Almost three times lower power consumption with a better overall result.

    These GPUs are incredible rasterizers. It's just that you look at bad-quality game ports and decide that they reflect the maximum reachable performance. Sure, GFXBench is crap, then look at Wild Life Extreme. That result translates to 20k points. That's on par with the mobile RTX 3070 at 100W.
