Dominating Mobile Performance

Before we dig deeper into the x86 vs Apple Silicon debate, it would be useful to look in more detail at how the A14’s Firestorm cores have improved upon the A13’s Lightning cores, as well as at the power and power-efficiency improvements of the new chip’s 5nm process node.

The process node is actually quite the wildcard in these comparisons, as the A14 is the first 5nm chipset on the market, closely followed by Huawei’s Kirin 9000 in the Mate 40 series. We happen to have both devices and chips in house for testing, and by contrasting the Kirin 9000 (Cortex-A77 at 3.13GHz on N5) with the Snapdragon 865+ (Cortex-A77 at 3.09GHz on N7P), we can roughly deduce how much of an impact the process node has on power and efficiency, and translate those improvements to the A13 vs A14 comparison.

Starting off with SPECint2006, we don’t see anything very unusual about the A14 scores, save for the great improvement in 456.hmmer. This wasn’t due to a microarchitectural jump, but rather to new optimisations in the newer LLVM version shipped with Xcode 12. It seems the compiler has employed a loop optimisation similar to the one found in GCC 8 onwards. The A13’s score in this test also improved with the new compiler, from 47.79 to 64.87, but I hadn’t yet re-run the whole suite with it.

For the rest of the workloads, the A14 generally looks like a relatively linear progression from the A13, accounting for the clock frequency increase from 2.66GHz to 3GHz. The overall IPC gain for the suite looks to be around 5%, which is a bit less than in Apple’s prior generations, though it comes with a larger than usual clock speed increase.
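To make the arithmetic behind these figures explicit: to a first approximation, performance scales as IPC × clock frequency, so a speedup can be split into a frequency part and an IPC part. The sketch below assumes that simple multiplicative model, which ignores memory-bound behaviour that doesn’t scale with core clock; the function names are illustrative, and the only inputs taken from the text are the 2.66GHz and 3GHz clocks and the ~5% integer IPC gain.

```python
def frequency_ratio(new_ghz: float, old_ghz: float) -> float:
    """Speedup expected from clock frequency alone (assumes perfect scaling)."""
    return new_ghz / old_ghz


def total_speedup(ipc_gain: float, new_ghz: float, old_ghz: float) -> float:
    """Combined speedup under the simple model: perf ~ IPC * frequency."""
    return (1.0 + ipc_gain) * frequency_ratio(new_ghz, old_ghz)


def implied_ipc_gain(measured_speedup: float, new_ghz: float, old_ghz: float) -> float:
    """Back out the IPC gain from a measured speedup by dividing out the clock ratio."""
    return measured_speedup / frequency_ratio(new_ghz, old_ghz) - 1.0


# A13 Lightning at 2.66GHz vs A14 Firestorm at 3.0GHz, as quoted in the text.
clock = frequency_ratio(3.0, 2.66)        # ~1.128, i.e. ~12.8% from clocks alone
specint = total_speedup(0.05, 3.0, 2.66)  # ~1.184 with a ~5% IPC gain on top
print(f"clock-only: {clock:.3f}x, with 5% IPC: {specint:.3f}x")
```

Run in reverse, implied_ipc_gain divides a measured suite-level speedup by the 3.0/2.66 clock ratio, which is one way to separate an IPC uplift such as the ~10-11% seen in SPECfp from the frequency bump.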

Power consumption of the new chip is in line with, and sometimes even lower than, that of the A13, which means that workload energy efficiency has seen a noticeable improvement this generation, even at the peak performance point.
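The reason flat power translates into better energy efficiency is that the energy consumed by a workload is its average power multiplied by its runtime: a faster chip drawing the same power finishes sooner and spends fewer joules. A minimal sketch follows; the wattages, runtimes and the 18% speedup are hypothetical placeholders, not measured A13 or A14 figures.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run: average power and wall-clock runtime (hypothetical values)."""
    avg_power_w: float
    runtime_s: float

    @property
    def energy_j(self) -> float:
        # Energy (joules) = average power (watts) x time (seconds).
        return self.avg_power_w * self.runtime_s


def energy_saving(old: Run, new: Run) -> float:
    """Fractional energy reduction of `new` relative to `old` for the same workload."""
    return 1.0 - new.energy_j / old.energy_j


# Hypothetical: identical average power, new chip 18% faster (runtime / 1.18).
old_chip = Run(avg_power_w=4.0, runtime_s=100.0)
new_chip = Run(avg_power_w=4.0, runtime_s=100.0 / 1.18)
print(f"energy saving at equal power: {energy_saving(old_chip, new_chip):.1%}")
```

If power also drops slightly, as it sometimes does on the A14, the saving grows further, since both factors of the product shrink.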

Performance against contemporary Android SoCs powered by Arm Cortex cores looks quite lopsided in Apple’s favour. What stands out most are the memory-intensive workloads characterised by sparse memory access, such as 429.mcf and 471.omnetpp, where the Apple design delivers well over twice the performance, even though all the chips run on similar mobile-grade LPDDR4X/LPDDR5 memory. In our microarchitectural investigations we’ve seen signs of “memory magic” in Apple’s designs, and we believe they may be using some sort of pointer-chase prefetching mechanism.
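To illustrate why workloads like 429.mcf and 471.omnetpp are so hard on the memory subsystem: much of their time goes into following pointers, where the address of the next load is only known once the previous load has returned, so a conventional stride prefetcher has no regular pattern to lock onto. The sketch below builds such a dependent chain (a random single-cycle permutation via Sattolo’s algorithm) and walks it. It only demonstrates the access pattern; it is not a model of Apple’s prefetcher, whose mechanism remains our speculation, and Python is far too slow to expose real cache-miss latency. All function names are hypothetical.

```python
import random


def build_chain(n: int, seed: int = 0) -> list[int]:
    """Return nxt where nxt[i] is the index to visit after i.

    Sattolo's algorithm turns the identity permutation into a random
    single cycle, so following nxt from any start visits all n slots once
    before returning, in an order with no fixed stride.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i): excluding i guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list[int], start: int, steps: int) -> int:
    """Follow the chain for `steps` hops and return the final index."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]  # serial dependency: the next address comes from this load
    return idx


def visit_order(nxt: list[int], start: int) -> list[int]:
    """Indices in the order a pointer chase touches them, until it loops back."""
    order = [start]
    idx = nxt[start]
    while idx != start:
        order.append(idx)
        idx = nxt[idx]
    return order


chain = build_chain(16, seed=42)
print(visit_order(chain, 0))  # a scrambled sequence covering all 16 slots
```

In a native microbenchmark the same chain would be laid out over a buffer much larger than the caches, so that every hop risks a DRAM round trip that only a prefetcher able to follow pointers could hide.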

In SPECfp, the A14’s gains over the A13 are a little higher than the linear clock frequency increase would account for: we’re measuring an overall 10-11% IPC uplift here. This isn’t too surprising given the design’s additional fourth FP/SIMD pipeline, whereas the integer side of the core has remained relatively unchanged compared to the A13.

In the overall mobile comparison, we can see that the new A14 has made robust progress in terms of increasing performance over the A13. Compared to the competition, Apple is well ahead of the pack – we’ll have to wait for next year’s Cortex-X1 devices to see the gap narrow again.

What’s also very important to note is that Apple has achieved all this whilst keeping power consumption flat or even lowering it, notably reducing the energy consumed for the same workloads.

Looking at the Kirin 9000 vs the Snapdragon 865+, we see a 10% reduction in power at relatively similar performance. Both chips use the same CPU IP and differ only in their process node and implementation. Apple’s A14 seems to have achieved better figures than the process node improvement alone would provide, which is expected given that it’s also a new microarchitecture design.
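The process-node comparison can be turned into a rough attribution. At iso-performance, a 10% power reduction corresponds to about 11% better performance per watt (1/0.9). Assuming that process and design gains compound multiplicatively, and that the Kirin 9000 vs Snapdragon 865+ delta transfers to Apple’s chips (both simplifications), whatever efficiency gain remains after dividing out the process factor can be credited to the new microarchitecture and implementation. The A14-over-A13 efficiency figure below is a hypothetical placeholder, not a measurement from this review, and the function names are illustrative.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Fractional perf/W improvement from new/old performance and power ratios."""
    return perf_ratio / power_ratio - 1.0


def residual_gain(total_gain: float, process_gain: float) -> float:
    """Gain left once the process contribution is divided out (multiplicative model)."""
    return (1.0 + total_gain) / (1.0 + process_gain) - 1.0


# Kirin 9000 (N5) vs Snapdragon 865+ (N7P): same Cortex-A77 IP,
# ~10% lower power at roughly equal performance, per the text.
process = perf_per_watt_gain(perf_ratio=1.0, power_ratio=0.90)  # ~0.111

# Hypothetical A14-vs-A13 perf/W gain, for illustration only.
a14_total = 0.25
design = residual_gain(a14_total, process)
print(f"process: {process:.1%}, design/implementation: {design:.1%}")
```

The multiplicative model is chosen because efficiency factors from independent sources stack as ratios; simply subtracting percentages would overstate the design’s share.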

One further note concerns the A14’s small efficiency cores. This generation saw a large microarchitectural boost for these cores, which now deliver 35% better performance than last year’s A13 efficiency cores, all while further reducing energy consumption. I don’t know how the small cores will come into play in Apple’s “Apple Silicon” Mac designs, but they’re certainly still very performant and extremely efficient compared to other contemporary Arm designs.

Lastly, there’s the x86 vs Apple performance comparison. In iPhone reviews I usually comment on this in this section of the article, but given today’s context and the goals Apple has set for Apple Silicon, let’s investigate it in a dedicated section of its own…


644 Comments


  • grayson_carr - Wednesday, November 11, 2020

    The article explains that a wider CPU design would be much more difficult for x86. You should probably read it.
  • vais - Thursday, November 12, 2020

    A wider instruction decoder doesn't always mean better performance. And remember, you are comparing two entirely different instruction sets. As mentioned in the article, x86 has variable-length instructions, so one instruction can be decoded into multiple micro-operations.

    Anyway, take a look at this quote from the article specifically:

    "On the ARM side of things, Samsung’s designs had been 6-wide from the M3 onwards, whilst Arm’s own Cortex cores had been steadily going wider with each generation, currently 4-wide in currently available silicon"

    So Samsung's core is 6-wide, while the Cortex cores are 4-wide. Isn't Snapdragon using Cortex cores, which are "only" 4-wide? So how the hell does it outperform the "superior" architecture?
    Apple's A14 is 30-40% faster than Exynos and Snapdragon (let's say). And how do Exynos and Snapdragon compare to desktop x86 CPUs? They are light-years behind, and this is normal.
  • rtharston - Thursday, November 12, 2020

    Hear hear. Armv8 (ARM64) is a newer, cleaner ISA with different design decisions, including purposefully breaking compatibility with ARM32 in many ways* (hence why iOS and macOS dropped 32-bit app support a while back). That means Apple has an easier time making some things bigger, because they are simpler than in x86.

    *Yes, you can run 32-bit ARM apps on ARM64 chips if they include the necessary support, but it is separate hardware and instructions. Apple decided to rip out that extra hardware a few generations of A chips back, freeing up space and complexity for other things. x86 hardware still supports running code meant for the 8086, which was released more than 40 years ago, and that adds a lot of complexity.
  • misan - Thursday, November 12, 2020

    I really don't know what more evidence you need. You have been shown various common CPU algorithms running with comparable performance on Apple's phone chip and on AMD/Intel desktop chips. It's literally there in the article you are commenting on. If you don't find this evidence convincing, how can you be sure that Zen 3 is a fast CPU? That statement is based on exactly the same kind of evidence.
  • vais - Friday, November 13, 2020

    @misan - the problem is that this "evidence" is circumstantial at best and does not reflect the actual performance of a normally running A14 vs a normally running (all cores, at full power) AMD 5950X.
    It does show the architecture is good, but the claims about a 5W chip somehow slamming a 105W, 16-core Zen 3 CPU into the ground are nothing but hilarious. Otherwise Crysis would long ago have been ported to the magically powerful iPhone...
  • Spunjji - Thursday, November 12, 2020

    @Coldfriction - You seem to be getting confused. Custom silicon does indeed punch above its weight, but none of the "small scale tests" done in this article will take advantage of any of it.

    SPEC results don't translate readily to application performance, but they do serve as excellent ways to compare CPU architectures; whichever way you slice it this architecture is demonstrably impressive.
  • Mgradon - Thursday, November 12, 2020

    Agree fully - for the MacBook Air it might be the right thing to do, if the new OS works well. For the MacBook Pro I am not sure; I would like to see what it can and cannot do.
  • daveedvdv - Thursday, November 12, 2020

    > No doubt what they have is efficient. But their claims are out of this world high. If it was simply a matter of making a wider CPU design, AMD or Intel would have done exactly that years ago. If it were simply a matter of making a larger L2 cache, AMD or Intel would have done that years ago.

    Of course. So that's not what accounts for most of Apple's lead. For example, a significant part of Apple's advantage is power management: PA Semi held critical patents in that area, and I believe they're still in effect.
  • techconc - Wednesday, November 11, 2020

    "...there's essentially no way that Apple made a general computing CPU that is faster than Intel or AMD..."

    That's exactly what they've done. They've also done so with a far more power efficient solution. It's funny how people can deny reality even while seeing the results in articles like this. Take the blinders off and see reality.
  • Coldfriction - Wednesday, November 11, 2020

    What results? The benchmarks here are extremely limited in scope. That's not "generic computing".
