Dominating Mobile Performance

Before we dig deeper into the x86 vs Apple Silicon debate, it is useful to look in more detail at how the A14 Firestorm cores have improved upon the A13 Lightning cores, and to detail the power and power-efficiency improvements of the new chip’s 5nm process node.

The process node is actually quite the wildcard in the comparisons here, as the A14 is the first 5nm chipset on the market, closely followed by Huawei’s Kirin 9000 in the Mate 40 series. We happen to have both devices and chips in house for testing, and by contrasting the Kirin 9000 (Cortex-A77 3.13GHz on N5) with the Snapdragon 865+ (Cortex-A77 3.09GHz on N7P) we can roughly deduce how much of an impact the process node has on power and efficiency, and translate those improvements to the A13 vs A14 comparison.

Starting off with SPECint2006, we don’t see anything very unusual about the A14 scores, save the great improvement in 456.hmmer. Actually, this wasn’t due to a microarchitectural jump, but rather due to new optimisations in the new LLVM version in Xcode 12. It seems that the compiler has employed a loop optimisation similar to the one found in GCC8 onwards. The A13 score actually improved from 47.79 to 64.87 as well, but I haven’t yet run new numbers on the whole suite.
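To put the compiler effect in perspective, the two A13 456.hmmer scores quoted above can be turned into a relative gain. This is a minimal sketch: only the two scores come from the text, and the helper name `relative_gain` is made up for illustration.

```python
# A13 456.hmmer scores quoted in the text, before and after the newer toolchain.
A13_HMMER_OLD = 47.79
A13_HMMER_NEW = 64.87


def relative_gain(old: float, new: float) -> float:
    """Fractional improvement of `new` over `old` (0.10 means +10%)."""
    return new / old - 1.0


# Same silicon, different compiler: the whole gain is attributable to software.
compiler_gain = relative_gain(A13_HMMER_OLD, A13_HMMER_NEW)
print(f"456.hmmer gain from the compiler alone: {compiler_gain:.1%}")
```

A gain of roughly a third on unchanged hardware is why this one subtest should not be read as a microarchitectural improvement.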

For the rest of the workloads, the A14 generally looks like a relatively linear progression from the A13, accounting for the clock frequency increase from 2.66GHz to 3GHz. The overall IPC gains for the suite look to be around 5%, which is a bit less than in Apple’s prior generations, though paired with a larger than usual clock speed increase.
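The split between clock and IPC can be sketched with a simple model in which performance is proportional to IPC times frequency. That proportionality is an assumption (memory-bound tests do not scale linearly with clock), and the function names below are illustrative; only the 2.66GHz, 3GHz and roughly 5% figures come from the text.

```python
# Clock frequencies quoted in the text.
A13_CLOCK_GHZ = 2.66
A14_CLOCK_GHZ = 3.0


def clock_ratio(old_ghz: float, new_ghz: float) -> float:
    """Frequency uplift as a ratio (1.10 means +10%)."""
    return new_ghz / old_ghz


def ipc_gain(score_ratio: float, freq_ratio: float) -> float:
    """IPC gain implied by a score ratio, assuming perf = IPC * frequency."""
    return score_ratio / freq_ratio - 1.0


def expected_score_ratio(freq_ratio: float, ipc_gain_fraction: float) -> float:
    """Score ratio expected from a clock uplift combined with an IPC gain."""
    return freq_ratio * (1.0 + ipc_gain_fraction)


freq = clock_ratio(A13_CLOCK_GHZ, A14_CLOCK_GHZ)
total = expected_score_ratio(freq, 0.05)  # ~5% IPC gain, as stated in the text
print(f"clock uplift: {freq - 1:.1%}, implied overall uplift: {total - 1:.1%}")
```

Under this model the clock alone accounts for a little under 13%, so a 5% IPC gain on top implies an overall uplift in the high teens.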

Power consumption for the new chip is actually in line with, and sometimes even better than, that of the A13, which means that workload energy efficiency this generation has seen a noticeable improvement even at the peak performance point.

Performance against the contemporary Android and Cortex-core powered SoCs looks to be quite lopsided in favour of Apple. What stands out the most are the memory-intensive, sparse-memory workloads such as 429.mcf and 471.omnetpp, where the Apple design features well over twice the performance, even though all the chips are running similar mobile-grade LPDDR4X/LPDDR5 memory. In our microarchitectural investigations we’ve seen signs of “memory magic” on Apple’s designs, where we believe they might be using some sort of pointer-chase prefetching mechanism.

In SPECfp, the increases of the A14 over the A13 are a little higher than the linear clock frequency increase, as we’re measuring an overall 10-11% IPC uplift here. This isn’t too surprising given the additional fourth FP/SIMD pipeline of the design, whereas the integer side of the core has remained relatively unchanged compared to the A13.

In the overall mobile comparison, we can see that the new A14 has made robust progress in terms of increasing performance over the A13. Compared to the competition, Apple is well ahead of the pack – we’ll have to wait for next year’s Cortex-X1 devices to see the gap narrow again.

What’s also very important to note here is that Apple has achieved all this whilst keeping the power consumption of the new chip flat, or even lowering it, notably reducing energy consumption for the same workloads.

Looking at the Kirin 9000 vs the Snapdragon 865+, we’re seeing a 10% reduction in power at relatively similar performance. Both chips use the same CPU IP, only differing in their process node and implementations. It seems Apple’s A14 here has been able to achieve better figures than just the process node improvement, which is expected given that it’s a new microarchitecture design as well.
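The link between power, performance and energy here can be sketched as energy = average power × runtime, with runtime inversely proportional to performance. The 10% power reduction at similar performance is from the text; the 1.18× performance ratio in the second case is purely an illustrative assumption, as are the function names.

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy used by a workload: average power multiplied by runtime."""
    return avg_power_w * runtime_s


def energy_change(power_ratio: float, perf_ratio: float) -> float:
    """Fractional change in energy for a fixed workload.

    Runtime scales as 1 / perf_ratio, so energy scales as
    power_ratio / perf_ratio. A negative result means less energy.
    """
    return power_ratio / perf_ratio - 1.0


# Kirin 9000 vs Snapdragon 865+: 10% less power at roughly equal performance.
node_only = energy_change(power_ratio=0.90, perf_ratio=1.00)

# Illustrative assumption only: flat power with a 1.18x performance ratio.
flat_power_faster = energy_change(power_ratio=1.00, perf_ratio=1.18)

print(f"process node alone: {node_only:+.1%}")
print(f"flat power, faster chip: {flat_power_faster:+.1%}")
```

The point of the second case is that a chip holding power flat while finishing the work sooner saves energy even before any reduction in power draw is counted.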

One further note is the data of the A14’s small efficiency cores. This generation we saw a large microarchitectural boost on the part of these new cores, which are now seeing 35% better performance versus last year’s A13 efficiency cores, all while further reducing energy consumption. I don’t know how the small cores will come into play on Apple’s “Apple Silicon” Mac designs, but they’re certainly still very performant and extremely efficient compared to other contemporary Arm designs.

Lastly, there’s the x86 vs Apple performance comparison. Usually for iPhone reviews I comment on this in this section of the article, but given today’s context and the goals Apple has set for Apple Silicon, let’s investigate that in a whole dedicated section…


644 Comments


  • rtharston - Thursday, November 12, 2020 - link

    These benchmarks are all CPU only (with some memory bandwidth too, since the CPU can't do anything without memory...). All these figures are CPU only. Yes, Apple managed to make a CPU that is as fast or faster than Intel and AMD. Just read the article and you'll see how. They have more ALUs. They have more registers (thanks to ARM64). They have larger (and faster) caches across the board. All that adds up to a higher performance CPU.

    You are right about the dedicated silicon being better at other things though, so now imagine how much more performance Apple's devices will have when using these fast CPUs *and* the dedicated silicon to do other things.
  • daveedvdv - Thursday, November 12, 2020 - link

    > My opinion is that using dedicated silicon for a specific task and not generic CPU computing is where almost ALL of the improved performance comes from.

    No. Neither SPEC nor GeekBench would take advantage of that.

    Furthermore, the applications that Apple uses to boast about the CPU (not GPU) performance are things like Clang, Ninja, and CMake, which wouldn't benefit from that either.
  • millfi - Wednesday, June 23, 2021 - link

    Dedicated silicon is very fast and efficient when running specific tasks with a high degree of parallelism, such as ML. But this cannot explain the M1 chip's high IPC recorded in SPECint, because that benchmark runs general tasks written with general CPU instructions. If Apple could offload such a task to a dedicated circuit, that would be magic.
  • melgross - Wednesday, November 11, 2020 - link

    The M1 isn’t that much of an advance.
  • BlackHat - Tuesday, November 10, 2020 - link

    Guys, the main reason why I loved your website is because you dig into marketing footnotes. I don't know if you already read them (apparently not), but they are out, and Apple's claims of "the most powerful chip" or "the most power-efficient" are against their own 2018 MacBook, a 14nm Skylake i7 with LPDDR3, not even against their latest Ice Lake model, let alone Renoir (yes, I know Apple doesn't have Ryzen products). I don't know if I'm missing something, but isn't saying that ARM is destroying x86 going too far? Yes, there is the power efficiency, but is the difference big enough to ignore all the performance lost?
    Greetings.
  • BlackHat - Tuesday, November 10, 2020 - link

    And the single-core claims are based on single-core peak performance in "leadership industry benchmarks", whatever that means, and a combination of JavaScript tests and Speedometer (this last one I heard from you was close to Apple), so anyway, we will wait for benchmarks.
  • Kilnk - Tuesday, November 10, 2020 - link

    https://browser.geekbench.com/processor-benchmarks
    https://browser.geekbench.com/ios-benchmarks#
    https://www.cpu-monkey.com/en/compare_cpu-apple_a1...

    These are the benchmarks they are talking about.
    Single core is faster than anything else in the consumer market. Yes. ANYTHING else.
    Multicore lands just above the 9700KF.
  • BlackHat - Tuesday, November 10, 2020 - link

    The reason why people don't trust Geekbench is that even they confirmed that old Geekbench benchmarks (basically 4 and older) were inaccurate due to their dependence on bigger caches, meaning that CPUs with bigger caches could beat other CPUs in short workloads (something that Geekbench does), but I could be wrong.
  • name99 - Tuesday, November 10, 2020 - link

    You mean Apple "cheat" by adding to their CPUs the pieces that make CPUs run faster, like a larger cache?
    OMG, say it isn't so!

    The pretzels people twist themselves into when they don't want to face reality...

    Just as a guide to the future, look to what really impresses those "skilled in the art" about this CPU. It's explicitly NOT the cache sizes; those are nice but even more impressive are the LSQ sizes, the MLP numbers (not covered here but in an earlier AnandTech piece) and the spec numbers for mcf and omnetpp.
    Understand what those numbers mean and why they are impressive and you'll be competent to judge future CPUs.
  • BlackHat - Tuesday, November 10, 2020 - link

    Talking about twisting, and you twist my comment. What the Geekbench makers themselves said is that, due to their benchmarks being short runs, CPUs with big caches show a big margin (no matter if Apple or another maker; in fact, they showed how Samsung's Exynos Mongoose took advantage of this); for long workloads the benchmark was useless.
