Apple's M1 Pro, M1 Max SoCs Investigated: New Performance and Efficiency Heights

Author: Andrei Frumusanu

by Andrei Frumusanu on October 25, 2021 9:00 AM EST


Last week, Apple unveiled its new generation of MacBook Pro laptops, a new range of flagship devices that bring significant updates for the company’s professional and power-user oriented user-base. The new devices particularly differentiate themselves in that they’re now powered by two new entries in Apple’s own silicon line-up, the M1 Pro and the M1 Max. We covered the initial reveal in last week’s overview article of the two new chips, and today we’re getting the first glimpses of the performance we can expect from the new silicon.

The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors

Starting off with the M1 Pro, the smaller sibling of the two, the design appears to be a new implementation of the first generation M1 chip, but this time designed from the ground up to scale to larger sizes and higher performance. The M1 Pro is, in our view, the more interesting of the two designs, as it offers almost everything that power users will deem generationally important in terms of upgrades.

At the heart of the SoC we find a new 10-core CPU setup in an 8+2 configuration, with 8 performance Firestorm cores and 2 efficiency Icestorm cores. We had indicated in our initial coverage that Apple’s new M1 Pro and Max chips appear to be using a similar, if not the same, generation of CPU IP as the M1, rather than moving to the newer generation cores used in the A15. We can seemingly confirm this, as we’re seeing no apparent changes in the cores compared to what we discovered on the M1.

The CPU cores clock up to a peak of 3228MHz, but vary in frequency depending on how many cores are active within a cluster, clocking down to 3132MHz with 2 cores active, and to 3036MHz with 3 or 4 cores active. I say “per cluster” because the 8 performance cores in the M1 Pro and M1 Max consist of two 4-core clusters, each with its own 12MB L2 cache and each able to clock its CPUs independently of the other, so it’s actually possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3228MHz.
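The per-cluster clocking described above can be sketched as a small lookup. This is only an illustration built from the frequencies quoted in this article; the names `PEAK_MHZ_BY_ACTIVE_CORES`, `cluster_peak_mhz` and `soc_peak_clocks` are made up for the example and do not correspond to any Apple interface, and treating an idle cluster as 0MHz is an assumption of the sketch.

```python
# Peak performance-core clock (MHz) by number of active cores in ONE
# 4-core cluster, taken from the figures quoted in the article.
PEAK_MHZ_BY_ACTIVE_CORES = {1: 3228, 2: 3132, 3: 3036, 4: 3036}


def cluster_peak_mhz(active_cores):
    """Peak clock of a single cluster; an idle cluster is reported as 0 (assumption)."""
    if active_cores == 0:
        return 0
    if active_cores not in PEAK_MHZ_BY_ACTIVE_CORES:
        raise ValueError("a performance cluster has between 0 and 4 active cores")
    return PEAK_MHZ_BY_ACTIVE_CORES[active_cores]


def soc_peak_clocks(cluster_a_active, cluster_b_active):
    """Each cluster is clocked independently, so look each one up separately."""
    return cluster_peak_mhz(cluster_a_active), cluster_peak_mhz(cluster_b_active)


# Four busy cores in one cluster, one busy core in the other.
print(soc_peak_clocks(4, 1))
```

With four cores busy in one cluster and a single core busy in the other, the sketch returns 3036MHz for the first and 3228MHz for the second, matching the scenario in the text.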

The two E-cores in the system clock at up to 2064MHz. As opposed to the M1, there are only two of them this time around; however, Apple still gives them their full 4MB of L2 cache, the same as on the M1 and A-derivative chips.

One large feature of both chips is their much-increased memory bandwidth and interfaces – the M1 Pro features 256-bit LPDDR5 memory at 6400MT/s speeds, corresponding to 204GB/s bandwidth. This is significantly higher than the M1 at 68GB/s, and also generally higher than competitor laptop platforms which still rely on 128-bit interfaces.
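The bandwidth figure follows directly from bus width and transfer rate. A minimal sketch of the arithmetic, assuming decimal gigabytes (1 GB = 1000 MB); the function name `peak_bandwidth_gb_s` is hypothetical, and the 128-bit case at the same 6400MT/s is an illustrative comparison rather than a measurement of any specific competitor:

```python
def peak_bandwidth_gb_s(bus_width_bits, transfer_rate_mt_s):
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits / 8          # bus width in bytes
    megabytes_per_second = bytes_per_transfer * transfer_rate_mt_s
    return megabytes_per_second / 1000               # MB/s -> GB/s


# M1 Pro as described in the article: 256-bit LPDDR5 at 6400MT/s.
m1_pro = peak_bandwidth_gb_s(256, 6400)

# Hypothetical 128-bit interface at the same transfer rate, for comparison.
narrow = peak_bandwidth_gb_s(128, 6400)

print(f"256-bit @ 6400MT/s: {m1_pro:.1f} GB/s")
print(f"128-bit @ 6400MT/s: {narrow:.1f} GB/s")
```

The 256-bit case works out to 204.8GB/s, which the article rounds to 204GB/s; halving the bus width at the same transfer rate halves the theoretical peak.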

We’ve been able to identify the “SLC”, or system level cache as we call it, as falling in at 24MB for the M1 Pro and 48MB on the M1 Max. That is a bit smaller than what we initially speculated, but it makes sense given the SRAM die area, and it represents a 50% increase over the per-block SLC on the M1.

The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors

Above the M1 Pro we have Apple’s second new M1 chip, the M1 Max. The M1 Max is essentially identical to the M1 Pro in terms of architecture and in many of its functional blocks – but what sets the Max apart is that Apple has equipped it with much larger GPU and media encode/decode complexes. Overall, Apple has doubled the number of GPU cores and media blocks, giving the M1 Max virtually twice the GPU and media performance.

The GPU and memory interfaces are by far the most differentiated aspects of the chip: instead of a 16-core GPU, Apple doubles things up to a 32-core unit. On the M1 Max that we tested for today, the GPU runs at up to 1296MHz, which is quite fast for what we consider mobile IP, but still significantly slower than what we’ve seen from the conventional PC and console space, where GPUs can now run at up to around 2.5GHz.

Apple also doubles up on the memory interfaces, using a whopping 512-bit wide LPDDR5 memory subsystem – unheard of in an SoC and even rare amongst historical discrete GPU designs. This gives the chip a massive 408GB/s of bandwidth – how this bandwidth is accessible to the various IP blocks on the chip is one of the things we’ll be investigating today.

The memory controller caches are at 48MB in this chip, allowing for theoretically amplified memory bandwidth for various SoC blocks as well as reducing off-chip DRAM traffic, thus also reducing power and energy usage of the chip.

Apple’s die shot of the M1 Max was a bit weird initially in that we weren’t sure if it actually represents physical reality. In particular, on the bottom part of the chip we noted what appears to be a doubled-up NPU, something Apple doesn’t officially disclose. A doubled-up media engine makes sense, as that’s part of the features of the chip; however, until we can get a third-party die shot to confirm that this is indeed what the chip looks like, we’ll refrain from speculating further in this regard.



493 Comments


arglborps - Friday, March 25, 2022 - link
Exactly. In the world of video editing suites Premiere is the slowest, buggiest piece of crap you can think of, not really a great benchmark except for how fast to crash an app.
DaVinci and Final Cut run circles around it.
ikjadoon - Monday, October 25, 2021 - link
AnandTech literally tested the M1 Max on PugetBench Premiere Pro *in this article*. Surprise, surprise 955 points on standard, 868 on extended, thus just 4% slower than a desktop 5950X + desktop RTX 3080.

"biggest problem with the Apple eco system" Huh? Premiere Pro has already been written in Apple Silicon's arm64 for macOS. It's been months now.

>We’ll start with Puget System’s PugetBench for Premiere Pro, which is these days the de facto Premiere Pro benchmark. This test involves multiple playback and video export tests, as well as tests that apply heavily GPU-accelerated and heavily CPU-accelerated effects. So it’s more of an all-around system test than a pure GPU test, though that’s fitting for Premiere Pro giving its enormous system requirements.

You clearly did not read the article and a misinformed "slight" against Apple's SoC performance: "These benchmarks disagree with my narrative, so I need to change the benchmarks quickly now."

I don't get why so many people are addicted to their "Apple SoCs can't be good" narrative that they'll literally ignore:

1) the AnandTech article that benchmarked what they claimed never got benchmarked
2) the flurry of press when Adobe finally ported Premiere Pro to arm64
easp - Monday, October 25, 2021 - link
So if one can't really compare "real-world" benchmarks between platforms how are you so sure that Mac's fall-short?
sirmo - Monday, October 25, 2021 - link
We aren't sure of anything. Why are we even here?
SarahKerrigan - Monday, October 25, 2021 - link
Sure, OEM submissions are mostly nonsense. SPEC is a useful collection of real-world code streams, though. We use it for performance characterization of our new cores, and we have an internal database of results we've run inhouse for other CPUs too (currently including SPARC, Power, ARM, IPF, and x86 types.) Run with reasonable and comparable compiler settings, which Anandtech does, it's absolutely a useful indicator of real world performance, one of the best available.
schujj07 - Monday, October 25, 2021 - link
You are the first person I have talked to in industry that actually uses SPEC. All the other people I know have their own things they run to benchmark.
phr3dly - Monday, October 25, 2021 - link
I'm in the industry. As a mid-sized company we can't afford to buy every platform and test it with our workflow. So I identify the spec scores which tend to correlate to our own flows, and use those to guide our platform evaluation decisions.

Looking at specific spec scores is a reasonable proxy for our own workloads.
0x16a1 - Monday, October 25, 2021 - link
uhhhh.... SPEC in the industry is still used. SPEC2000? Not anymore, and people have mostly moved off of 2006 too onto 2017.

But SPEC as a whole is still a useful aggregate benchmark. What others would you suggest?
sirmo - Monday, October 25, 2021 - link
It's a synthetic benchmark which claims that it isn't. But it very much is. Anything that's closed source and compiled by some 3rd party that can't be verified can be easily gamed.
Tamz_msc - Tuesday, October 26, 2021 - link
LOL more dumb takes. Majority of the benchmarks are licensed to SPEC under open-source licenses.

https://www.spec.org/cpu2017/Docs/licenses.html
