Apple's M1 Pro, M1 Max SoCs Investigated: New Performance and Efficiency Heights

Author: Andrei Frumusanu

by Andrei Frumusanu on October 25, 2021 9:00 AM EST

Power Behaviour: No Real TDP, but Wide Range

Last year when we reviewed the M1 inside the Mac mini, we did some rough power measurements based on the wall-power of the machine. Since then, we learned how to read out Apple’s individual CPU, GPU, NPU and memory controller power figures, as well as total advertised package power. We repeat the exercise here for the 16” MacBook Pro, focusing on chip package power, as well as AC active wall power, meaning device load power, minus idle power.

Apple doesn’t advertise any TDP for the chips in these devices – it’s our understanding that one simply doesn’t exist, and that the only limit on the power draw of the chips and laptops is thermals. As long as temperature is kept in check, the silicon will not throttle or otherwise limit its power draw. Of course, there’s still an actual average power draw figure under different scenarios, which is what we set out to test here:

Apple MacBook Pro 16 M1 Max Power Behaviour

Starting off with device idle, the chip reports a package power of around 200mW when doing nothing but idling on a static screen. This is extremely low compared to competitor designs, and is likely a reason Apple is able to achieve such fantastic battery life. The AC wall power under idle was 7.2W; this was on Apple’s included 140W charger, with the laptop on minimum display brightness – it’s likely the actual DC battery power under this scenario is much lower, but lacking the ability to measure this, it’s the second-best thing we have. One should probably assume a 90% efficiency figure in the AC-to-DC conversion chain from 230V wall to 28V USB-C MagSafe to whatever the internal PMIC usage voltage of the device is.
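To make the conversion-loss assumption concrete, here is a minimal sketch that applies the 90% efficiency figure to the measured idle wall power. The function name `estimate_dc_power` is made up for this illustration, and the result only covers the AC-to-DC chain; it is not a measurement of actual battery power.

```python
def estimate_dc_power(ac_wall_watts: float, efficiency: float = 0.90) -> float:
    """Estimate DC-side power from measured AC wall power.

    `efficiency` is the assumed end-to-end AC-to-DC conversion efficiency
    (0.90 is the rough figure suggested in the text, not a measured value).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return ac_wall_watts * efficiency


# Idle wall power measured on the 16" MacBook Pro with the 140W charger.
idle_dc_watts = estimate_dc_power(7.2)
print(f"Estimated idle DC power: {idle_dc_watts:.2f} W")
```

Under that assumption the 7.2W idle wall figure corresponds to roughly 6.5W on the DC side, which is still far above the 0.2W package figure since it includes the display and the rest of the platform.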

In single-threaded workloads, such as CineBench r23 and SPEC 502.gcc_r, both of which are more mixed in terms of pure computation versus memory demands, we see the chip report 11W package power; however, we’re only measuring an 8.5-8.7W difference at the wall when under use. It’s possible the software is over-reporting things here. The actual CPU cluster is only using around 4-5W under this scenario, and we don’t seem to see much of a difference to the M1 in that regard. The package and active power are higher than what we’ve seen on the M1, which could be explained by the much larger memory resources of the M1 Max. 511.povray is mostly core-bound with little memory traffic; package power is reported lower, although at the wall the difference is again minor.

In multi-threaded scenarios, package power varies from 34 to 43W, and wall active power from 40 to 62W. 503.bwaves stands out as having a larger difference between wall power and reported package power – although Apple’s powermetrics reports a “DRAM” power figure, I think this is just the memory controllers, and that the actual DRAM is not accounted for in the package power figure – the extra wattage that we’re measuring here, because it’s a massive DRAM workload, would be the memory of the M1 Max package.
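As a rough illustration of that gap, the sketch below subtracts reported package power from wall active power for the M1 Max multi-threaded rows of the table in this section. The helper name `unaccounted_power` is made up for this example, and the difference also folds in conversion losses and other platform components, so it should not be read as DRAM power alone.

```python
# (package power W, wall active power W) for the M1 Max multi-threaded rows.
MT_POWER = {
    "CB23 MT": (34.0, 39.7),
    "502 MT": (36.9, 44.8),
    "511 MT": (40.9, 50.8),
    "503 MT": (43.9, 62.3),
}


def unaccounted_power(package_watts: float, wall_active_watts: float) -> float:
    """Wall active power that the reported package figure does not cover."""
    return wall_active_watts - package_watts


gaps = {name: unaccounted_power(pkg, wall) for name, (pkg, wall) in MT_POWER.items()}
largest = max(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name:8s} gap: {gap:.1f} W")
print(f"Largest gap: {largest}")
```

With these figures the 503.bwaves row shows a gap of about 18W, roughly double that of the other multi-threaded workloads.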

On the GPU side, we lack notable workloads, but GFXBench Aztec High Offscreen ends up with a 56.8W package figure and 69.80W wall active figure. The GPU block itself is reported to be running at 43W.

Finally, stressing out both CPU and GPU at the same time, the SoC goes up to 92W package power and 120W wall active power. That’s quite high, and we haven’t tested how long the machine is able to sustain such loads (it’s highly environment dependent), but it very much appears that the chip and platform don’t have any practical power limit, and just use whatever they need as long as temperatures are in check.

	M1 Max MacBook Pro 16"			Intel i9-11980HK MSI GE76 Raider
	Score	Package Power (W)	Wall Power Total - Idle (W)	Score	Package Power (W)	Wall Power Total - Idle (W)
Idle		0.2	7.2 (Total)		1.08	13.5 (Total)
CB23 ST	1529	11.0	8.7	1604	30.0	43.5
CB23 MT	12375	34.0	39.7	12830	82.6	106.5
502 ST	11.9	11.0	9.5	10.7	25.5	24.5
502 MT	74.6	36.9	44.8	46.2	72.6	109.5
511 ST	10.3	5.5	8.0	10.7	17.6	28.5
511 MT	82.7	40.9	50.8	60.1	79.5	106.5
503 ST	57.3	14.5	16.8	44.2	19.5	31.5
503 MT	295.7	43.9	62.3	60.4	58.3	80.5
Aztec High Off	307fps	56.8	69.8	266fps	35 + 144	200.5
Aztec+511MT		92.0	119.8		78 + 142	256.5

Comparing the M1 Max against the competition, we resorted to Intel’s 11980HK on the MSI GE76 Raider. We had also wanted to do a comparison against AMD’s 5980HS; unfortunately, our test machine is dead.

In single-threaded workloads, Apple showcases massive performance and power advantages against Intel’s best CPU. CineBench is one of the rare workloads where Apple’s cores lose out in performance for some reason, but the gap in power usage is even wider here: the M1 Max only uses 8.7W, while the comparable figure on the 11980HK is 43.5W.

In other ST workloads, the M1 Max is further ahead in performance, or at least in a similar range. The performance/W difference here is around 2.5x to 3x in favour of Apple’s silicon.

In multi-threaded tests, the 11980HK is clearly allowed to go to much higher power levels than the M1 Max, reaching package power levels of 80W, for 105-110W active wall power, significantly more than what the MacBook Pro here is drawing. The performance levels of the M1 Max are significantly higher than the Intel chip here, due to the much better scalability of the cores. The perf/W differences here are 4-6x in favour of the M1 Max, all whilst posting significantly better performance, meaning the perf/W at ISO-perf would be even higher than this.
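For readers who want to derive per-workload efficiency ratios themselves, here is a minimal sketch using the scores and wall active power from the table above. Using wall active power as the denominator is an assumption of this example (package power gives different ratios), and the helper names are made up for illustration; individual rows will not necessarily land on the rounded ranges quoted in the text.

```python
# (M1 Max score, M1 Max wall active W, i9-11980HK score, i9-11980HK wall active W)
ROWS = {
    "CB23 ST": (1529, 8.7, 1604, 43.5),
    "CB23 MT": (12375, 39.7, 12830, 106.5),
    "502 ST": (11.9, 9.5, 10.7, 24.5),
    "502 MT": (74.6, 44.8, 46.2, 109.5),
    "511 ST": (10.3, 8.0, 10.7, 28.5),
    "511 MT": (82.7, 50.8, 60.1, 106.5),
    "503 ST": (57.3, 16.8, 44.2, 31.5),
    "503 MT": (295.7, 62.3, 60.4, 80.5),
}


def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark score divided by power draw."""
    return score / watts


def efficiency_ratio(row: tuple) -> float:
    """How many times more score-per-watt the M1 Max delivers than the 11980HK."""
    m1_score, m1_watts, intel_score, intel_watts = row
    return perf_per_watt(m1_score, m1_watts) / perf_per_watt(intel_score, intel_watts)


ratios = {name: efficiency_ratio(row) for name, row in ROWS.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} {ratio:.2f}x")
```

Swapping the wall figures for the package power columns is a one-line change to `ROWS` if a silicon-only comparison is preferred.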

On the GPU side, the GE76 Raider comes with an RTX 3080 mobile. On Aztec High, this uses a total of 200W power for 266fps, while the M1 Max beats it at 307fps with just 70W wall active power. The package powers for the MSI system are reported at 35+144W.

Finally, the Intel CPU and GeForce GPU go up to 256W power draw when used together, also more than double that of the MacBook Pro and its M1 Max SoC.
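The GPU and combined-load comparisons can be checked the same way. This sketch takes the Aztec High and Aztec+511MT rows from the table and computes frames per watt at the wall plus the ratio of combined power draw; the dictionary layout and function name are assumptions of this example.

```python
# Wall active power figures (W) and Aztec High Offscreen frame rates from the table.
M1_MAX = {"fps": 307, "aztec_wall_w": 69.8, "combined_wall_w": 119.8}
GE76_RAIDER = {"fps": 266, "aztec_wall_w": 200.5, "combined_wall_w": 256.5}


def fps_per_watt(system: dict) -> float:
    """Frames per second delivered per watt of wall active power."""
    return system["fps"] / system["aztec_wall_w"]


gpu_efficiency_ratio = fps_per_watt(M1_MAX) / fps_per_watt(GE76_RAIDER)
combined_power_ratio = GE76_RAIDER["combined_wall_w"] / M1_MAX["combined_wall_w"]

print(f"M1 Max fps/W advantage: {gpu_efficiency_ratio:.2f}x")
print(f"GE76 Raider combined power vs MacBook Pro: {combined_power_ratio:.2f}x")
```

On these numbers the M1 Max delivers a little over three times the frames per watt, and the MSI system draws a bit more than twice the power under the combined CPU and GPU load.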

The 11980HK isn’t a very efficient chip, as we noted back in our May review, and AMD’s chips should fare quite a bit better in a comparison; however, Apple Silicon is likely still ahead by extremely comfortable margins.

493 Comments

celeste_P - Tuesday, October 26, 2021 - link
Does anyone know where I can find the policy about translating/reprinting the article? Does AnandTech allow such behavior? What are the policies that one needs to follow?
This article is quite interesting and I want to translate/publish it on a Chinese website to share with a broader range of people
colinstalter - Wednesday, October 27, 2021 - link
Why not just share the URL on the Chinese page? Do people in China not have translator functions built into their web browsers like Chrome does?
celeste_P - Wednesday, October 27, 2021 - link
Of course they do XD
But as you can imagine, the quality of machine translation won't be that great, especially considering all these domain specific terms within this article.
ABR - Tuesday, October 26, 2021 - link
An excellent review.
ajmas - Tuesday, October 26, 2021 - link
Given the number of games already available and running on iOS, I wonder how much work would be involved in making them available on macOS?

As for effective performance, I am eagerly waiting to see what the real world tests reveal, since specs only say so much.
mandirabl - Wednesday, October 27, 2021 - link
As a developer, technically you don't have to do much, just re-compile the game and check another box (for Mac), basically.

The problem is: iOS games are mostly touch-focused, whereas macOS is mouse-first. So they have to check if that translates without changing anything. If it does, it's a matter of a couple of minutes. If it doesn't translate well ... they have a choice to release it anyway or block access on macOS. Yes, developers have to actually decide against releasing their app/game for macOS - if they don't do anything in that regard, the app/game simply shows up in an App Store search on a Mac.
Kevin45 - Tuesday, October 26, 2021 - link
Apple's goal is very simple: If you are going to provide SW tools for Pro users of the MacOS platform, you write to Metal - period.

It IS the most superior way to take advantage of what Apple has laid out to developers and Apple's Pro users absolutely want the HW tools they buy to be max'd out by the developers.

Apple has taken an approach Intel and AMD cannot. Unified memory design aside, Apple has looked at its creative markets and developed sub-cores, which for this Creative focus segment, Apple markets as its "Media Engine" which has hardware h.264 and hardware ProRes compute, which just crush these formats and codecs.

The argument "Yah, but the CPU and GPU cores aren't the most powerful that one can buy." is still. They don't need to be because they have dedicated cores to where the power needs to be. Sure, in a Wintel world, or Linux space, more powerful GPU and CPU cores is all they've got. So when talking those worlds indeed that's the correct argument. Not when talking Apple HW with Apple silicon.

Intel has fought nVIDIA to have their beefier and beefier cores do heavy lifting, while nVIDIA wants the GPU to be the most important play in the mix. Apple has broken out their SoC into many sub-sets to meet the high compute needs of its user base.

Now more than ever, developers that have dragged their feet need to get onboard. As companies continue to show off - such as Apple with FCP, Motion and Compressor optimized apps for the hardware, even DaVinci (niche player but powerful), they put pressure on other players such as sloth-boy Adobe, to get going and truly write for Apple's tools that take advantage of such well thought out HW + SW combo.
richardnpaul - Tuesday, October 26, 2021 - link
The article comes across a bit fanboy. (yes, yes I know that this is usually the case here but I just wanted to say it out loud again). See below for why.

You have covered in depth things like how the increased L3 design between Zen2 and 3 can cause big jumps in performance, and what was missing here was discussion of how the 24/48MB cache between the memory interface impacts performance, especially when using the GPU (last year we saw AMD's designs doing exactly this to improve performance by reducing the impact of calling out to the slow GDDR6 RAM.)

The GPU is nothing special. 10Tflops at 1.3GHz puts it around the same class as a Vega64, a 14nm design, which similarly used RAM packaged on an interposer with the GPU (being 14nm it was big, 5nm makes it much more reasonable). With the buffer cache I'd expect it might perform better, also the CPUs will bump up performance (just look at how much more FPS you get with Zen3 over Zen2 and with Zen3 with vcache it'll be another 15% more on top from exactly the same GPU hardware and that's with the CPU and GPU having to talk over PCI-E).

Also, Apple have made themselves second class gaming citizens with their decision to build Metal and enforce it as the only API (I may be mistaken here but as far as I'm aware the whole reason for MoltenVK is because you have to use Metal on MacOS and developers have introduced this Vulkan to Metal shim to ease porting). Also, as I understand it, you can't connect external dGPUs via Thunderbolt to provide comparisons. Apple's vendor lock-in at its worst (have I mentioned that Apple are their own worst enemy a lot of the time?)

As such the gaming performance doesn't surprise me, this is a technically much slower and inferior GPU to AMD and nVIDIA's current designs on an older process (7nm and 8nm respectively). The cost is that whilst these are faster, they're larger and more power hungry, though a die shrink would bring something like an AMD 6600 based chip into the same ballpark.

Also on the 512bit memory interface I'd probably look at it more like 384bit plus 128bit, which is the GPU plus the usual CPU interfaces. The CPU is always going to contend for some of that 512bit interface, so you're never going to see 512bit for the GPU, on the other hand, you get whatever the cpu doesn't use for free, which is a great bonus of this design, and if the CPU needs more than a 128bit interface can manage it has access to that too if the GPU isn't heavily loaded on the memory interface.

I kind of expect you guys to cover all this though in the article, not have me railing at the lack of it in the comments section.
richardnpaul - Tuesday, October 26, 2021 - link
Oh and you failed to ever mention that the trade-off of the design is that you need to buy all the RAM you'll ever need up front because it's soldered to the SoC package. The reason that we don't normally see such designs is that the trade-off is potentially expensive unsaleable parts. The cost of these laptops is way above the usual and whilst they have some really nice tech this is one of the other downsides of this design (and the 5nm node and the amount of silicon).
OreoCookie - Tuesday, October 26, 2021 - link
Or perhaps Anandtech gave it a glowing review simply because the M1 Max is fast and energy efficient at the same time? In memory intensive benchmarks it was 2-5 x faster than the x86 competition while being more energy efficient. What more do you want?

And the article *was* including a Zen 3 mobile part in its comparison and the M1 Max was faster while consuming less energy. Since the V-Cache version of Zen 3 hasn't been released yet, there are no benchmarks for Anandtech to release as they either haven't been run yet or are under embargo.

Lastly, this article is about some of the low-level capabilities of the hardware, not vendor lock-in or whether Metal is better or worse than Vulkan. They did not even test the ML accelerator or hardware codec bits (which is completely fair).
