Dominating Mobile Performance

Before we dig deeper into the x86 vs Apple Silicon debate, it would be useful to look in more detail at how the A14 Firestorm cores have improved upon the A13 Lightning cores, as well as to detail the power and power efficiency improvements of the new chip’s 5nm process node.

The process node is actually quite the wildcard in the comparisons here, as the A14 is the first 5nm chipset on the market, closely followed by Huawei’s Kirin 9000 in the Mate 40 series. We happen to have both devices and chips in house for testing, and by contrasting the Kirin 9000 (Cortex-A77 3.13GHz on N5) with the Snapdragon 865+ (Cortex-A77 3.09GHz on N7P) we can somewhat deduce how much of an impact the process node has in terms of power and efficiency, and translate those improvements to the A13 vs A14 comparison.

Starting off with SPECint2006, we don’t see anything very unusual about the A14 scores, save the great improvement in 456.hmmer. Actually, this wasn’t due to a microarchitectural jump, but rather due to new optimisations in the new LLVM version in Xcode 12. It seems that the compiler has employed a loop optimisation similar to the one found in GCC8 onwards. The A13 score had actually improved as well, from 47.79 to 64.87, but I hadn’t run new numbers on the whole suite yet.

For the rest of the workloads, the A14 generally looks like a relatively linear progression from the A13, accounting for the clock frequency increase from 2.66GHz to 3GHz. The overall IPC gains for the suite look to be around 5%, which is a bit less than in Apple’s prior generations, though with a larger than usual clock speed increase.
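As a rough illustration of how a generational score gain splits into a clock component and an IPC component, here is a minimal Python sketch. It assumes performance scales as clock frequency times IPC, which is a simplification; the function names are hypothetical, and the only inputs taken from the text are the 2.66GHz and 3GHz clocks and the roughly 5% IPC figure.

```python
def clock_ratio(f_old_ghz, f_new_ghz):
    """Relative clock frequency increase, e.g. 1.10 means 10% higher clocks."""
    return f_new_ghz / f_old_ghz


def expected_speedup(f_old_ghz, f_new_ghz, ipc_ratio):
    """Speedup under the simplifying assumption: performance = clock x IPC."""
    return clock_ratio(f_old_ghz, f_new_ghz) * ipc_ratio


def implied_ipc_ratio(speedup, f_old_ghz, f_new_ghz):
    """Invert the model: the IPC ratio left over once clocks are factored out."""
    return speedup / clock_ratio(f_old_ghz, f_new_ghz)


if __name__ == "__main__":
    # Clocks and the ~5% IPC gain are the figures quoted in the text.
    a13_ghz, a14_ghz, ipc_gain = 2.66, 3.0, 1.05
    print(f"clock ratio:      {clock_ratio(a13_ghz, a14_ghz):.3f}")
    print(f"expected speedup: {expected_speedup(a13_ghz, a14_ghz, ipc_gain):.3f}")
```

Under this model the clock increase alone accounts for about 13%, and combined with a 5% IPC gain the expected overall uplift is about 18%. Real workloads that are bound by memory latency will not follow the clock linearly, so this is only a first-order estimate.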

Power consumption for the new chip is actually in line with, and sometimes even better than, that of the A13, which means that workload energy efficiency this generation has seen a noticeable improvement even at the peak performance point.

Performance against the contemporary Android and Cortex-core powered SoCs looks to be quite lopsided in favour of Apple. What stands out the most are the memory-intensive, sparse-memory workloads such as 429.mcf and 471.omnetpp, where the Apple design features well over twice the performance, even though all the chips are running similar mobile-grade LPDDR4X/LPDDR5 memory. In our microarchitectural investigations we’ve seen signs of “memory magic” on Apple’s designs, where we believe they might be using some sort of pointer-chase prefetching mechanism.

In SPECfp, the increases of the A14 over the A13 are a little higher than the linear clock frequency increase, as we’re measuring an overall 10-11% IPC uplift here. This isn’t too surprising given the additional fourth FP/SIMD pipeline of the design, whereas the integer side of the core has remained relatively unchanged compared to the A13.

In the overall mobile comparison, we can see that the new A14 has made robust progress in terms of increasing performance over the A13. Compared to the competition, Apple is well ahead of the pack – we’ll have to wait for next year’s Cortex-X1 devices to see the gap narrow again.

What’s also very important to note here is that Apple has achieved all this whilst keeping the power consumption of the new chip flat, or even lowering it, notably reducing energy consumption for the same workloads.

Looking at the Kirin 9000 vs the Snapdragon 865+, we’re seeing a 10% reduction in power at relatively similar performance. Both chips use the same CPU IP, only differing in their process node and implementations. It seems Apple’s A14 here has been able to achieve better figures than just the process node improvement, which is expected given that it’s a new microarchitecture design as well.
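To make the link between power, performance, and workload energy concrete, the following is a small Python sketch. It assumes energy for a fixed workload is average power multiplied by runtime, so the energy ratio is the power ratio divided by the speedup. The function name is hypothetical, and the second example uses an illustrative speedup of 1.18 at flat power rather than a measured A14 figure.

```python
def energy_ratio(power_ratio, speedup):
    """Relative energy for the same workload.

    energy = average power x runtime, and runtime scales as 1 / speedup,
    so new_energy / old_energy = power_ratio / speedup.
    """
    return power_ratio / speedup


if __name__ == "__main__":
    # Kirin 9000 vs Snapdragon 865+: 10% less power at similar performance.
    print(f"process-node example: {energy_ratio(0.90, 1.0):.3f}")
    # Illustrative assumption only: flat power with a 1.18x speedup.
    print(f"flat-power example:   {energy_ratio(1.00, 1.18):.3f}")
```

In the first case energy drops by the same 10% as power. In the second, a chip that holds power flat while finishing the work 1.18x faster would use roughly 15% less energy, which is how an energy improvement can exceed what the process node alone provides.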

One further note concerns the data for the A14’s small efficiency cores. This generation we saw a large microarchitectural boost on the part of these new cores, which are now seeing 35% better performance versus last year’s A13 efficiency cores – all while further reducing energy consumption. I don’t know how the small cores will come into play on Apple’s “Apple Silicon” Mac designs, but they’re certainly still very performant and extremely efficient compared to other contemporary Arm designs.

Lastly, there’s the x86 vs Apple performance comparison. Usually for iPhone reviews I comment on this in this section of the article, but given today’s context and the goals Apple has set for Apple Silicon, let’s investigate that in a whole dedicated section…


644 Comments

  • Spunjji - Friday, November 13, 2020 - link

    @vais - Perhaps I should have clarified that my comment was indeed regarding the single-core results? But I sort-of assumed anybody reading it would have *read the article* and thus was aware of that context. 🤦‍♂️

    We won't get multi-core data until later, for sure, but your attempt to pretend that we therefore have *no idea* what's coming is mere sophistry. I'd advise you adjust your expectations accordingly, as you've already indicated a level of certainty in the outcome (mY i5 WiLl BeAt It) that you are simultaneously arguing nobody else is entitled to. That cognitive dissonance has been noted.
  • vais - Friday, November 13, 2020 - link

    I had missed Graviton, thanks for the link!

    It seems very interesting and is a much more fair comparison as all 3 CPUs have similar TDPs. AMD are still some way from Zen3 based Epyc CPUs, but even if they have better performance than Graviton 2, they still could be far behind in the performance/$.

    As for M1 all I'm saying is that comparing it to other laptop CPUs with similar TDP (of course higher too since it is more efficient) is one thing. But comparing it to the latest desktop CPUs is another story and reality might not reflect the synthetic benchmarks that well.
    If say M2 was positioned for the Mac Pro at 50-60-70W TDP, then comparing it to 5950X would make sense and it really could have better performance - that is all.
  • misan - Thursday, November 12, 2020 - link

    Dedicated silicon for compiling code? Or for doing scientific computation? Or for traversing linked lists? SPEC benchmark suite is extensively documented and the individual benchmark behavior is understood. This is all plain C code that targets the CPU.

    I understand that it might be hard to accept it, but the simple fact is that Apple has made a substantially better chip. They can do much more work per clock than anything else on general-purpose code, which allows them to be fast without needing very high clocks.
  • Spunjji - Thursday, November 12, 2020 - link

    @Coldfriction - the "dedicated silicon" you refer to played no part in any of the tests in this article.
  • Coldfriction - Thursday, November 12, 2020 - link

    You mean all of the memory on package with the CPU doesn't make a difference? That's the "dedicated silicon" sort of thing I'm talking about. How fast was the SNES CPU? It was 3.58 MHz. It took a massively more powerful Intel, AMD, or Cyrix chip to do what the SNES could do. MASSIVELY more powerful. What Apple is doing here is making a console PC. The performance isn't all derived from their CPU cores independent of the rest of the system. Everything is tightly integrated with no flexibility on the user's end. The Cell architecture boasted similar stuff back in the day. Yes, Apple has a strong ARM CPU here, but it's the tight integration that makes it so strong, not the core itself. There's a reason the memory is in the package and non-upgradable. The functions tested in this article may drastically favor the cache system of the M1, but once you go outside of that, you lose a lot.

    It's ALWAYS been the case that custom built systems have outperformed generic computing devices.

    This article doesn't test very many things. It certainly doesn't test demanding workloads that saturate much of the system's capabilities.

    I owned Amigas back in the day. I had access to a variety of computing devices. The IBM compatible PCs were the ugliest slowest machines around, but they succeeded where everyone else's prettier systems failed. Why? Compatibility and ability to swap the software and hardware from different vendors around. They became cheap and maintained by a variety of people due to that. The Apple Lisa I had as a kid blew away my first 386, but then Apple still nearly went bankrupt a decade later.

    Custom build design is great for a very short term solution, but Apple's leash of leather is being swapped out for a leash of steel chain with this move.
  • daveedvdv - Thursday, November 12, 2020 - link

    > You mean all of the memory on package with the CPU doesn't make a difference? That's the "dedicated silicon" sort of thing I'm talking about.

    That's quite the stretch. What exactly is that memory dedicated to? It's not like other manufacturers cannot package main memory with their chips either. It sounds like you're grasping at straws because you don't like the news.
  • Spunjji - Friday, November 13, 2020 - link

    @daveedvdv - I think you nailed it there. There's a lot of that going on in these comments.

    Hell, I don't *like* the news. I'm a Windows guy and I don't buy Apple devices; if it turns out they'll have exclusive access to some of the best-performing mobile silicon on the planet it'll be kind of a bitch. But it is what it is.
  • daveedvdv - Friday, November 13, 2020 - link

    @Spunjji:
    Thanks. And, FWIW, while I'm an Apple eco-system person, I'm under no illusion that others will be able to match the achievement relatively soon. There are lots of players in the ARM ISA world, and they each have some serious talent working for them.
  • Spunjji - Friday, November 13, 2020 - link

    @coldfriction - On-package memory isn't "custom silicon". Either you're talking about "dedicated silicon" - i.e. the accelerators that Apple's chip has and others don't - or you're talking about shared caches and memory interfaces that every other chip out there has / can have.

    The SNES CPU thing is a weird flex - everybody knows that emulation requires more resources than the original system, but the SNES CPU wasn't remarkable in any way. A better example would have been the Amiga's video controllers, but then you'd run straight into what I pointed out, which is that such a comparison is irrelevant to what was actually tested in this article - the CPU architecture (including caches and, in some tests, memory performance).

    You're right that it doesn't test demanding workloads or the entire system - that wasn't the remit of the article. We'll see that stuff when they have actual M1-based systems to test; running those tests on A14 in an iPhone would be worse than useless for estimating M1 performance.
  • magreen - Sunday, November 15, 2020 - link

    Keep it up, Spunjji! Thanks for injecting rational discourse into these comments. I have no horse in this race, but I recognize measured statements and rational arguments based on evidence when I see them.
