Apple's M1 Pro, M1 Max SoCs Investigated: New Performance and Efficiency Heights

Author: Andrei Frumusanu

by Andrei Frumusanu on October 25, 2021 9:00 AM EST


CPU MT Performance: A Real Monster

More interesting than ST performance is MT performance. With 8 performance cores and 2 efficiency cores, this is now the largest iteration of Apple Silicon we’ve seen.

As a prelude to the scores, I wanted to remark on a few things about the previous, smaller M1 chip. The 4+4 setup on the M1 meant that a significant chunk of its MT performance was enabled by the E-cores, with the SPECint score in particular seeing a +33% performance boost versus just the 4 P-cores of the system. Because the new M1 Pro and Max have twice the P-cores but 2 fewer E-cores, assuming linear scaling, the theoretical peak of the M1 Pro/Max should be +62% over the M1. Of course, the new chips should behave better than linear, due to the better memory subsystem.
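The linear-scaling estimate can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes each P-core contributes one unit of throughput and that an E-core is worth about 0.33 of a P-core (derived from the +33% SPECint boost quoted for the M1). The function and variable names are illustrative, not part of any benchmark tooling.

```python
# Back-of-the-envelope linear-scaling estimate (an illustration, not a measurement).
# Assumption: one P-core = 1.0 unit of MT throughput, and the M1's four E-cores
# together add 33% on top of its four P-cores (the SPECint figure quoted in the text).

def linear_mt_estimate(p_cores: int, e_cores: int, e_core_weight: float) -> float:
    """Estimated MT throughput in P-core units, assuming perfectly linear scaling."""
    return p_cores + e_cores * e_core_weight

M1_E_BOOST = 0.33                    # E-cores' uplift over the 4 P-cores on the M1
e_core_weight = M1_E_BOOST * 4 / 4   # 4 E-cores add 0.33 * 4 P-core units, so ~0.33 each

m1 = linear_mt_estimate(4, 4, e_core_weight)          # 4P + 4E
m1_pro_max = linear_mt_estimate(8, 2, e_core_weight)  # 8P + 2E
uplift = m1_pro_max / m1 - 1

print(f"M1: {m1:.2f}  M1 Pro/Max: {m1_pro_max:.2f}  uplift: {uplift:+.0%}")
```

Under these assumptions the uplift lands at roughly 62-63%, in line with the figure above; the measured results can differ because real scaling is not linear.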

In the detailed scores I’m showcasing the full 8+2 scores of the new chips, and later we’ll talk about the 8 P-core scores in context. I hadn’t run the MT scores of the new Fortran compiler set on the M1, so some numbers will be missing from the charts.

SPECint2017 Rate-N Estimated Scores

Looking at the data, there are very evident changes to Apple’s performance positioning with the new 10-core CPU. Although, yes, Apple does have 2 additional cores versus the 8-core 11980HK or the 5980HS, Apple’s silicon is far ahead of either competitor in most workloads. Again, to reiterate, we’re comparing the M1 Max against Intel’s best of the best, and also nearly AMD’s best (the 5980HX has a 45W TDP).

The workload that stood out to me the most was 502.gcc_r, where the M1 Max nearly doubles the M1 score and lands 69% ahead of the 11980HK. We’re seeing similarly mind-boggling performance deltas in other workloads; memory-bound tests such as mcf and omnetpp are evidently Apple’s forte. A few of the workloads, mostly the more core-bound or L2-resident ones, show smaller advantages, or sometimes even fall behind AMD’s CPUs.

SPECfp2017 Rate-N Estimated Scores

The fp2017 suite has more workloads that are memory-bound, and it’s here where the M1 Max is absolutely absurd. The workloads that put the most pressure on memory and stress the DRAM the most, such as 503.bwaves, 519.lbm, 549.fotonik3d and 554.roms, all show performance advantages of multiple factors over the best Intel and AMD have to offer.

The performance differences here are just insane, and really showcase how far ahead Apple’s memory subsystem is in its ability to let the CPUs scale to such a degree in memory-bound workloads.

Even workloads which are more execution-bound, such as 511.povray or 538.imagick, are still very clearly in favour of the M1 Max, albeit not as dramatically, achieving significantly better performance at drastically lower power.

We noted how the M1 Max CPUs are not able to take full advantage of the chip’s DRAM bandwidth. As of writing we haven’t measured the M1 Pro, but I imagine that design would not score much lower than the M1 Max here. We can’t help but ask ourselves how much better the CPUs would score if the cluster and fabric allowed them to fully utilise the memory.

SPEC2017 Rate-N Estimated Total

In the aggregate scores, there are two sides. On the SPECint suite, the M1 Max lies +37% ahead of the best competition; it’s a very clear win here, and given the power levels and TDPs, the performance-per-watt advantage is clear. The M1 Max is also able to outperform desktop chips such as the 11900K, or AMD’s 5800X.

In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market. It completely demolishes any laptop contender, showcasing 2.2x the performance of the second-best laptop chip. The M1 Max even manages to outperform the 16-core 5950X, a chip whose package power is 142W, with the rest of the system drawing quite a bit more on top of that. It’s an absolutely absurd comparison and a situation we haven’t seen the likes of.

We also ran the chip with just the 8 performance cores active. As expected, the scores are a little lower, by 7-9%; the 2 E-cores here represent a much smaller percentage of the total MT performance than on the M1.
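As a rough sanity check on that drop, the E-cores’ share of total throughput can be estimated with the same linear model used earlier for the M1. This is a sketch under stated assumptions: an E-core is taken to be worth about 0.33 of a P-core (from the +33% SPECint boost quoted for the M1’s 4+4 setup), scaling is treated as perfectly linear, and the helper name is purely illustrative.

```python
# Estimated share of MT throughput contributed by the E-cores under a linear model.
# Assumption: one E-core is worth ~0.33 of a P-core, taken from the +33% SPECint
# boost that the M1's four E-cores add over its four P-cores.

def e_core_share(p_cores: int, e_cores: int, e_core_weight: float) -> float:
    """Fraction of total estimated throughput that comes from the E-cores."""
    e_part = e_cores * e_core_weight
    return e_part / (p_cores + e_part)

E_CORE_WEIGHT = 0.33  # hypothetical per-core weight, in P-core units

m1_share = e_core_share(4, 4, E_CORE_WEIGHT)       # M1: 4P + 4E
m1_max_share = e_core_share(8, 2, E_CORE_WEIGHT)   # M1 Pro/Max: 8P + 2E

print(f"M1 E-core share:         {m1_share:.1%}")
print(f"M1 Pro/Max E-core share: {m1_max_share:.1%}")
```

With these assumptions the E-cores account for roughly a quarter of the M1’s estimated throughput but only about 8% on the M1 Pro/Max, which is in the same range as the measured 7-9% difference.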

Apple’s stark advantage in specific workloads here does make us ask how this translates into applications and use-cases. We’ve never seen such a design before, so it’s not exactly clear where things will land, but I think Apple has been rather clear that its focus with these designs is catering to the content creation crowd, the power users who use the large productivity applications, be it in video editing, audio mastering, or code compiling. These are all areas where the microarchitectural characteristics of the M1 Pro/Max would shine, and where they are likely to vastly outperform any other system out there.

vlad42 - Monday, October 25, 2021 - link
And there you go making purely speculative claims without any factual basis about the quality of the ports. I could similarly make absurd claims, such as that every benchmark Intel's CPU loses is just a bad port. Provide documented evidence it is a bad port, as you are the one making that claim (and not bad Apple drivers, thermal throttling because they would not turn on the fans until the chip hit 85C, etc.).

Face it, in the real world benchmarks this article provides, AMD's and Nvidia's GPUs are roughly 50% faster than Apple's M1 Max GPU.

Also, a full node shrink and integrating a dGPU into the SOC would make it much more energy efficient. The node shrink should be obvious and this site has repeatedly demonstrated the significant energy efficiency benefits of integrating discrete components, such as GPUs, into the SOCs.
jospoortvliet - Wednesday, October 27, 2021 - link
Well they are 100% sure bad ports as this gpu didn't exist. The games are written for a different platform, different gpus and different drivers. That they perform far from optimal must be obvious as fsck - driver optimization for specific games and game optimization for specific cards, vendors and even drivers usually make the difference between amd and nvidia - 20-50% between entirely unoptimized (this) and final is not even remotely rare. So yeah this is an absolute worst case. And Aztec Ruins shows the potential when (mildly?) optimized - nearly 3080 levels of performance.
Blastdoor - Monday, October 25, 2021 - link
Apple's GPU isn't magic, but the advantage is real and it's not just the node. Apple has made a design choice to achieve a given performance level through more transistors rather than more Hz. This is true of both their CPU and GPU designs, actually. PC OEMs would rather pay less for a smaller, hotter chip and let their customers eat the electricity costs and inconvenience of shorter battery life and hotter devices. Apple's customers aren't PC OEMs, though, they're real people. And not just any real people, real people with $$ to spend and good taste.
markiz - Tuesday, October 26, 2021 - link
When you say "Apple has made a design choice", who did in fact make that choice? Can it be attributed to an individual?
Also, why is nobody else making this choice? Simply economics, or other reasons?
markiz - Tuesday, October 26, 2021 - link
Apple customers having $$ and taste, at a time where 60% of USA has an iphone can not exactly be true. Every loser these days has an iphone.

I know you were likely being specific in regards to MacBook Pros, so I guess both COULD be true, but it does sound very bad to say it.
michael2k - Monday, October 25, 2021 - link
That would be true if there were an AMD or NVIDIA GPU manufactured on TSMC's N5P node.

Since there isn't, a 65W Apple GPU will perform like a 93W AMD GPU at N7, and slightly higher still for an NVIDIA GPU at Samsung 8nm.

That is probably the biggest reason they're so competitive. At 5nm they can fit far more transistors and clock them far lower than AMD or NVIDIA. In a desktop you can imagine they can clock higher than 1.3GHz to push performance even higher. 2x perf at 2.6GHz, and power usage would only go up from 57W to 114W if there is no need to increase voltage when driving the GPU that fast.
Wrs - Monday, October 25, 2021 - link
All the evidence says M1 Max has more resources and outperforms the RTX 3060 mobile. But throw crappy/Rosetta code at the former and performance can very well turn into a wash. I don't expect that to change as Macs are mainly mobile and AAA gaming doesn't originate on mobile because of the restrictive thermals. It's just that Windows laptops are optimized for the exact same code as the desktops, so they have an easy time outperforming the M1's on games originating on Windows.

When I wanna game seriously, I use a Windows desktop or a console, which outperforms any laptop by the same margin as Windows beats Mac OS/Rosetta in game efficiency. TDP is 250-600w (the consoles are more efficient because of Apple-like integration). Any gaming I'd do on a Windows laptop or an M1 is just casual. There are plenty of games already optimized for M1 btw - they started on iOS. /shrug
Blastdoor - Tuesday, October 26, 2021 - link
As things stand now, the Windows advantage in gaming is huge, no doubt.

But any doubt about Apple's commitment to the Mac must surely be gone now. Apple has invested serious resources in the Mac, from top to bottom. If they've gone to all the work of creating Metal and these killer SOCs, why not take one more step and invest some money+time in getting optimized AAA games available on these machines? At this point, with so many pieces in place, it almost seems silly not to make that effort.
techconc - Monday, October 25, 2021 - link
It's hard to speak about these GPUs for gaming performance when the games you choose to run for your benchmark are Intel native and have to run under emulation. That's not exactly a showcase for native gaming performance.
sean8102 - Tuesday, October 26, 2021 - link
What games could they have used? The only two somewhat demanding ARM native macOS games are WoW, and Baldur's Gate 3.
