CPU MT Performance: A Real Monster

What’s more interesting than ST performance, is MT performance. With 8 performance cores and 2 efficiency cores, this is now the largest iteration of Apple Silicon we’ve seen.

As a prelude into the scores, I wanted to remark some things on the previous smaller M1 chip. The 4+4 setup on the M1 actually resulted that a significant chunk of the MT performance being enabled by the E-cores, with the SPECint score in particular seeing a +33% performance boost versus just the 4 P-cores of the system. Because the new M1 Pro and Max have 2 less E-cores, just assuming linear scaling, the theoretical peak of the M1 Pro/Max should be +62% over the M1. Of course, the new chips should behave better than linear, due to the better memory subsystem.

In the detailed scores I’m showcasing the full 8+2 scores of the new chips, and later we’ll talk about the 8 P scores in context. I hadn’t run the MT scores of the new Fortran compiler set on the M1 and some numbers will be missing from the charts because of that reason.

SPECint2017 Rate-N Estimated Scores

Looking at the data – there’s very evident changes to Apple’s performance positioning with the new 10-core CPU. Although, yes, Apple does have 2 additional cores versus the 8-core 11980HK or the 5980HS, the performance advantages of Apple’s silicon is far ahead of either competitor in most workloads. Again, to reiterate, we’re comparing the M1 Max against Intel’s best of the best, and also nearly AMD’s best (The 5980HX has a 45W TDP).

The one workload standing out to me the most was 502.gcc_r, where the M1 Max nearly doubles the M1 score, and lands in +69% ahead of the 11980HK. We’re seeing similar mind-boggling performance deltas in other workloads, memory bound tests such as mcf and omnetpp are evidently in Apple’s forte. A few of the workloads, mostly more core-bound or L2 resident, have less advantages, or sometimes even fall behind AMD’s CPUs.

SPECfp2017 Rate-N Estimated Scores

The fp2017 suite has more workloads that are more memory-bound, and it’s here where the M1 Max is absolutely absurd. The workloads that put the most memory pressure and stress the DRAM the most, such as 503.bwaves, 519.lbm, 549.fotonik3d and 554.roms, have all multiple factors of performance advantages compared to the best Intel and AMD have to offer.

The performance differences here are just insane, and really showcase just how far ahead Apple’s memory subsystem is in its ability to allow the CPUs to scale to such degree in memory-bound workloads.

Even workloads which are more execution bound, such as 511.porvray or 538.imagick, are – albeit not as dramatically, still very much clearly in favour of the M1 Max, achieving significantly better performance at drastically lower power.

We noted how the M1 Max CPUs are not able to fully take advantage of the DRAM bandwidth of the chip, and as of writing we didn’t measure the M1 Pro, but imagine that design not to score much lower than the M1 Max here. We can’t help but ask ourselves how much better the CPUs would score if the cluster and fabric would allow them to fully utilise the memory.

SPEC2017 Rate-N Estimated Total

In the aggregate scores – there’s two sides. On the SPECint work suite, the M1 Max lies +37% ahead of the best competition, it’s a very clear win here and given the power levels and TDPs, the performance per watt advantages is clear. The M1 Max is also able to outperform desktop chips such as the 11900K, or AMD’s 5800X.

In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market. It completely demolishes any laptop contender, showcasing 2.2x performance of the second-best laptop chip. The M1 Max even manages to outperform the 16-core 5950X – a chip whose package power is at 142W, with rest of system even quite above that. It’s an absolutely absurd comparison and a situation we haven’t seen the likes of.

We also ran the chip with just the 8 performance cores active, as expected, the scores are a little lower at -7-9%, the 2 E-cores here represent a much smaller percentage of the total MT performance than on the M1.

Apple’s stark advantage in specific workloads here do make us ask the question how this translates into application and use-cases. We’ve never seen such a design before, so it’s not exactly clear where things would land, but I think Apple has been rather clear that their focus with these designs is catering to the content creation crowd, the power users who use the large productivity applications, be it in video editing, audio mastering, or code compiling. These are all areas where the microarchitectural characteristics of the M1 Pro/Max would shine and are likely vastly outperform any other system out there.

CPU ST Performance: Not Much Change from M1 GPU Performance: 2-4x For Productivity, Mixed Gaming
POST A COMMENT

487 Comments

View All Comments

  • JfromImaginstuff - Monday, October 25, 2021 - link

    Huh, nice Reply
  • Kangal - Monday, October 25, 2021 - link

    What isn't nice is gaming on macOS.
    We all know how bad emulation is, and whilst Apple seems to have pulled "magic" with their implementation of Metal/Rosetta2's hybrid-translation strong performance.... at the end of the day it isn't enough.

    The M1X is slightly slower than the RTX-3080, at least on-paper and in synthetic benchmarks. This is the sort of hardware that we've been denied for the past 3 years. Should be great. It isn't. When it comes to the actual Gaming Performance, the M1X is slightly slower than the RTX-3060. A massive downgrade.

    The silver lining is that developers will get excited, and we might see some AAA-ports over to the macOS system. Even if it's the top-100 games (non-exclusives), and if they get ported over natively, it should create a shock. We might see designers then developing games for PS5, XSX, OSX and Windows. And maybe SteamOS too. And in such a scenario, we can see native-coded games tapping into the proper M1X hardware, and show impressive performance.

    The same applies for professional programs for content creators.
    Reply
  • at_clucks - Monday, October 25, 2021 - link

    "The silver lining is that developers will get excited, and we might see some AAA-ports over to the macOS"

    I think that's their whole point. Make developers optimize for Mac knowing that gamers would very likely choose to have their performant gaming machine in a Mac format (light, cool, low power) rather than in a hot and heavy DTR format if they had the choice of natively optimized games.
    Reply
  • bernstein - Monday, October 25, 2021 - link

    we now have 3 primary gpu api‘s:
    - directx (xbox, windows)
    - vulkan (ps5, switch, steamos, android)
    - metal (macos, ios & derivates)
    Because they’re all low level & similar, most bigger engines support them all.

    There used to be two for pc, one for mobile and three for consoles. And vastly different ones at that.

    So it will come down to the addressable market and how fast apple evolves the api‘s. Historically windows, with its build once run two decades later has made it much much easier on devs.
    Reply
  • yetanotherhuman - Tuesday, October 26, 2021 - link

    "how fast apple evolves the api‘s"

    That'll be a very slow, given their history. Why they invented another API, I have no idea. Vulkan could easily be universal. It runs on Windows, which you didn't note, with great results.
    Reply
  • Dribble - Tuesday, October 26, 2021 - link

    Vulkan is too low level, it assumes nothing which means you have to right a ton of code to get to the level of Metal which assumes you have an apple device. If metal/dx are like writing in assembly language, for vulkan you start of with just machine code and have to write your own assembler first. Hence it's not really a great language to work with, if you were working with apple then metal is so much nicer. Reply
  • Gracemont - Wednesday, October 27, 2021 - link

    Vulkan is too low level? It’s literally comparable to DX12. Like bruh, if anything the Metal API is even more low level for Apple devices cuz of it being built specifically for Apple devices. Just like how the NVAPI for the Switch is the lowest level API for that system cuz it was specifically tailored for that system, not Vulkan. Reply
  • Ppietra - Wednesday, October 27, 2021 - link

    Gracemont, the Metal API was already being used with Intel and AMD GPUs, so not exactly a measure of "low level" Reply
  • NPPraxis - Tuesday, October 26, 2021 - link

    "Why they invented another API, I have no idea. Vulkan could easily be universal."

    You're misremembering the history. Metal predates Vulkan.

    Apple was basically stuck with OpenGL for a long time, which fell further and further behind as DirectX got lower level and faster. That made all of Apple's devices at a huge gaming handicap.

    Then Apple invented Metal for iOS in 2014 which gave them a huge performance rendering lead on mobile devices.

    They led the Mac languish for a couple years, not even updating the OpenGL version. Macs got worse and worse for games. In 2016, Vulcan came out. People speculated that Apple could adopt it.

    In 2017, Apple released Metal 2 which was included in the new MacOS.

    Basically, Apple had to pick between unifying MacOS (Metal) with iOS or with Linux gaming (Vulkan). Apple has gotten screwed over before by being reliant on open source third parties that fell further and further behind (OpenGL, web browsers before they helped build WebKit, etc) so it's kind of understandable that they went the Metal-on-MacOS direction since they had already built it for iOS.

    I still wish Apple would add support for it (Mac: Metal and Vulkan, Windows: DirectX and Vulkan, Linux: Vulkan only), because it would really help destroy any reason for developers to target DirectX first, but I understand that they really want to push devs to Metal to make porting to iOS easier.
    Reply
  • Eric S - Friday, October 29, 2021 - link

    Everyone has their own graphics stack- Microsoft, Sony, Apple, and Nintendo all have proprietary stacks. Vulcan wants to change that, but that doesn’t solve everything. Developers still need to optimize for differences in GPUs. Apple is looking for full vertical integration which helps to have their own stack. Reply

Log in

Don't have an account? Sign up now