Dominating Mobile Performance

Before we dig deeper into the x86 vs Apple Silicon debate, it would be useful to look in more detail at how the A14 Firestorm cores have improved upon the A13 Lightning cores, as well as to detail the power and power efficiency improvements of the new chip’s 5nm process node.

The process node is actually quite the wildcard in the comparisons here, as the A14 is the first 5nm chipset on the market, closely followed by Huawei’s Kirin 9000 in the Mate 40 series. We happen to have both devices and chips in house for testing, and by contrasting the Kirin 9000 (Cortex-A77 3.13GHz on N5) with the Snapdragon 865+ (Cortex-A77 3.09GHz on N7P) we can somewhat deduce how much of an impact the process node has in terms of power and efficiency, and translate those improvements to the A13 vs A14 comparison.

Starting off with SPECint2006, we don’t see anything very unusual about the A14 scores, save the great improvement in 456.hmmer. Actually, this wasn’t due to a microarchitectural jump, but rather due to new optimisations in the new LLVM version in Xcode 12. It seems here that the compiler has employed a loop optimisation similar to the one found in GCC8 onwards. The A13 score had actually improved from 47.79 to 64.87, but I hadn’t yet run new numbers on the whole suite.

For the rest of the workloads, the A14 generally looks like a relatively linear progression from the A13, accounting for the clock frequency increase from 2.66GHz to 3GHz. The overall IPC gains for the suite look to be around 5%, which is a bit less than in Apple’s prior generations, though paired with a larger than usual clock speed increase.
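As a rough illustration of how the clock and IPC figures above combine, here is a minimal Python sketch. It assumes performance scales as clock frequency times IPC, uses only the rounded numbers quoted in the text (2.66GHz, 3GHz, around 5% IPC), and the function names are made up for this example rather than taken from any tool.

```python
def clock_gain(old_ghz, new_ghz):
    """Fractional clock frequency increase, e.g. 0.128 for +12.8%."""
    return new_ghz / old_ghz - 1.0


def combined_uplift(old_ghz, new_ghz, ipc_gain):
    """Overall performance uplift, assuming performance = clock * IPC."""
    return (new_ghz / old_ghz) * (1.0 + ipc_gain) - 1.0


def ipc_gain_from_scores(old_score, new_score, old_ghz, new_ghz):
    """Back out the IPC gain from two benchmark scores and their clocks."""
    return (new_score / old_score) / (new_ghz / old_ghz) - 1.0


if __name__ == "__main__":
    # Rounded figures from the text: 2.66GHz -> 3GHz, roughly 5% IPC.
    print(f"clock gain:      {clock_gain(2.66, 3.0):.1%}")
    print(f"combined uplift: {combined_uplift(2.66, 3.0, 0.05):.1%}")
```

With those inputs the clock alone accounts for roughly 12.8%, and clock plus IPC for roughly 18.4%. The same helper run in reverse (`ipc_gain_from_scores`) is how an IPC figure can be estimated from a pair of measured scores.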

Power consumption for the new chip is actually in line with, and sometimes even better than, that of the A13, which means that workload energy efficiency this generation has seen a noticeable improvement even at the peak performance point.
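To make the link between flat power and better energy efficiency explicit, the following Python sketch assumes that energy for a fixed workload is average power multiplied by runtime, and that runtime is inversely proportional to performance. The 1.18x performance ratio used in the example is purely an illustrative input, not a measured figure.

```python
def relative_energy(power_ratio, perf_ratio):
    """Energy of the new chip relative to the old one for the same workload.

    power_ratio: new average power / old average power
    perf_ratio:  new performance / old performance
    Assumes energy = power * runtime and runtime = work / performance.
    """
    if power_ratio < 0 or perf_ratio <= 0:
        raise ValueError("power_ratio must be >= 0 and perf_ratio > 0")
    runtime_ratio = 1.0 / perf_ratio
    return power_ratio * runtime_ratio


if __name__ == "__main__":
    # Illustrative only: flat power with a hypothetical 1.18x speedup.
    print(f"relative energy: {relative_energy(1.0, 1.18):.3f}")
```

In other words, a chip that draws the same power but finishes the workload sooner spends less energy on it, which is why flat power at higher performance counts as an efficiency gain.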

Performance against the contemporary Android and Cortex-core powered SoCs looks to be quite lopsided in favour of Apple. What stands out the most are the memory-intensive workloads characterised by sparse memory accesses, such as 429.mcf and 471.omnetpp, where the Apple design features well over twice the performance, even though all the chips are running similar mobile-grade LPDDR4X/LPDDR5 memory. In our microarchitectural investigations we’ve seen signs of “memory magic” on Apple’s designs, and we believe they might be using some sort of pointer-chase prefetching mechanism.

In SPECfp, the increases of the A14 over the A13 are a little higher than the linear clock frequency increase, as we’re measuring an overall 10-11% IPC uplift here. This isn’t too surprising given the additional fourth FP/SIMD pipeline of the design, whereas the integer side of the core has remained relatively unchanged compared to the A13.

In the overall mobile comparison, we can see that the new A14 has made robust progress in terms of increasing performance over the A13. Compared to the competition, Apple is well ahead of the pack – we’ll have to wait for next year’s Cortex-X1 devices to see the gap narrow again.

What’s also very important to note here is that Apple has achieved all this whilst keeping the power consumption of the new chip flat or even lowering it, notably reducing energy consumption for the same workloads.

Looking at the Kirin 9000 vs the Snapdragon 865+, we’re seeing a 10% reduction in power at relatively similar performance. Both chips use the same CPU IP, differing only in their process node and implementations. It seems Apple’s A14 has been able to achieve better figures than the process node improvement alone would give, which is expected given that it’s a new microarchitecture design as well.
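For a sense of scale, the 10% power reduction at similar performance can be turned into a performance-per-watt figure. This is a back-of-the-envelope Python sketch that treats the two chips as delivering identical performance, which is a simplifying assumption; the function name is hypothetical.

```python
def perf_per_watt_gain(power_reduction, perf_ratio=1.0):
    """Fractional perf/W improvement for a given power reduction.

    power_reduction: fraction of power saved, e.g. 0.10 for 10% less power
    perf_ratio:      new performance / old performance (1.0 = identical)
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    new_power_ratio = 1.0 - power_reduction
    return perf_ratio / new_power_ratio - 1.0


if __name__ == "__main__":
    # 10% less power at equal performance, as in the N7P vs N5 comparison.
    print(f"perf/W gain: {perf_per_watt_gain(0.10):.1%}")
```

Under that assumption, a 10% power reduction corresponds to roughly an 11% gain in performance per watt attributable to the process node and implementation.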

One further note concerns the data for the A14’s small efficiency cores. This generation we saw a large microarchitectural boost in these new cores, which are now seeing 35% better performance versus last year’s A13 efficiency cores, all while further reducing energy consumption. I don’t know how the small cores will come into play on Apple’s “Apple Silicon” Mac designs, but they’re certainly still very performant and extremely efficient compared to other contemporary Arm designs.

Lastly, there’s the x86 vs Apple performance comparison. Usually for iPhone reviews I comment on this in this section of the article, but given today’s context and the goals Apple has set for Apple Silicon, let’s investigate that in a whole dedicated section…

644 Comments

  • vais - Thursday, November 12, 2020 - link

    A great article up until the benchmarking and comparing to x86 part. Then it turned into something reeking of paid promotion piece.
    Below are some quotes I want to focus the discussion on:

    "x86 CPUs today still only feature a 4-wide decoder designs (Intel is 1+4) that is seemingly limited from going wider at this point in time due to the ISA’s inherent variable instruction length nature, making designing decoders that are able to deal with aspect of the architecture more difficult compared to the ARM ISA’s fixed-length instructions"
    - This implies wider decoder is always a better thing, even when comparing not only different architectures, but architectures using different instruction sets. How was this conclusion reached?

    "On the ARM side of things, Samsung’s designs had been 6-wide from the M3 onwards, whilst Arm’s own Cortex cores had been steadily going wider with each generation, currently 4-wide in currently available silicon"
    - So Samsung’s Exynos is 6-wide - does that make it better than Snapdragon (which should be 4-wide)? Even better, does anyone in their right mind think it performs close to any modern x86 CPU, let alone an enthusiast grade desktop chip?

    "To not surprise, this is also again deeper than any other microarchitecture on the market. Interesting comparisons are AMD’s Zen3 at 44/64 loads & stores, and Intel’s Sunny Cove at 128/72. "
    - Again this assumes higher loads & stores is automagically better. Isn't Zen3 better than its Intel counterparts across the board, despite the significantly worse loads & stores?

    "AMD also wouldn’t be looking good if not for the recently released Zen3 design."
    - What is the logic here? The competition is lucky they released a better product before Apple? How unfair that Apple have to compete with the latest (Zen3) instead of the previous generation - then their amazing architecture would have really shone bright!

    "The fact that Apple is able to achieve this in a total device power consumption of 5W including the SoC, DRAM, and regulators, versus +21W (1185G7) and 49W (5950X) package power figures, without DRAM or regulation, is absolutely mind-blowing."
    - I am specifically interested where the 49W for 5950X come from. AMD's specs list the TDP at 105W, so where is this draw of only 49W, for an enthusiast desktop processor, coming from?
  • thunng8 - Thursday, November 12, 2020 - link

    It is obvious that the power figure comes from running the SPEC benchmark. SPEC is single-threaded, so the Ryzen package is using 49W when turbo boosting to 5.0GHz on the single core to achieve the score on the chart, while the A14 using the exact same criteria uses 5W.
  • vais - Thursday, November 12, 2020 - link

    How is it obvious? Such things as "this benchmark is single threaded" must be stated clearly, not rely on everyone looking at the benchmarks knowing it. Same about the power.
  • thunng8 - Friday, November 13, 2020 - link

    The fact that it is single-threaded is in the text of the review.
  • name99 - Friday, November 13, 2020 - link

    If you don't know the nature of SPEC benchmarks, then perhaps you should be using your ears/eyes more and your mouth less? You don't barge into a conversation you admit to knowing nothing about and start telling all the gathered experts that they are wrong!
  • mandirabl - Thursday, November 12, 2020 - link

    Pretty cool, I came from this video https://www.youtube.com/watch?v=xUkDku_Qt5c and the analogy is awesome.
  • atomek - Thursday, November 12, 2020 - link

    If Apple plays it well, this is the dusk of the x86 era. They'll just need to open their M1 to OEMs/builders, so people could actually make gaming desktops on their platform. And that would be the end of AMD/Intel (or they will quickly (2-5 years) release an ARM CPU, which would be very problematic for them). I wouldn't mind moving away from x86, but only if Apple opens their ARM platform to enthusiasts/gamers and doesn't lock it to MacOS.
  • dodoei - Thursday, November 12, 2020 - link

    The reason for the great performance could very well be that it’s locked to the MacOS
  • Zerrohero - Friday, November 13, 2020 - link

    Apple has spent billions to develop their own chips to differentiate from the others and to achieve iPad/iPhone like vertical integration with their own software.

    Why would they sell them to anyone?

    It seems that lots of people do not understand why Apple is doing this: to build better *Apple* products.

    There is nothing wrong with that, even if PC folks refuse to accept it. Every company strives to do better stuff.
  • corinthos - Thursday, November 12, 2020 - link

    Cheers to all of those who purchased Threadrippers and hi-end Intel Extreme processors plus the latest 3080/3090 gpus for video editing, only to be crushed by M1 with iGPU due to its more current and superior hardware decoders.
