Conclusion & End Remarks

Today’s investigation into the new A15 only scratches the surface of what Apple has to offer in the new generation iPhone 13 series devices. While we’re still working on the full device review, we’ve gotten a good glimpse of what the new silicon is able to achieve, and what to expect from the new devices in terms of performance.

On the CPU side of things, Apple’s vague initial presentation of the A15’s improvements could have signaled either a disappointing generation or simply a less visible shift towards power efficiency rather than pure performance. In our extensive testing, we’re elated to see that efficiency was indeed the main focus this year: the new performance cores showcase adequate performance improvements while at the same time reducing power consumption and significantly improving energy efficiency.

The efficiency cores of the A15 have also seen massive gains, though here Apple has mostly invested them back into performance: the new cores showcase absolute performance improvements of +23-28%, something that isn’t easily identified by popular benchmarks. This large performance increase further helps the SoC improve energy efficiency, and our initial battery life figures for the new iPhone 13 series show that the chip plays a very large part in the vastly longer battery life of the new devices.

On the GPU side, Apple’s peak performance improvements are off the charts, thanks to a combination of a new, larger GPU, a new architecture, and a larger system cache that helps both performance and efficiency.

Apple’s iPhone component design seems to be limiting the SoC from achieving even better results, especially in the newer Pro models. Even so, Apple remains far ahead of the competition in terms of performance and efficiency.

Overall, while the A15 isn’t the brute-force iteration we’ve become used to from Apple in recent years, it still comes with substantial generational gains that make it a notably better SoC than the A14. In the end, it seems like Apple’s SoC team has executed well after all.

204 Comments

  • name99 - Wednesday, October 6, 2021 - link

    v8 based.
    Essentially ARMv8.5 minus BTI.
  • name99 - Wednesday, October 6, 2021 - link

    https://community.arm.com/developer/ip-products/pr...
    lists what's new in 8.5

    Wiki still says A15 is essentially 8.4, but A14 is generally described as above, eg
    https://twitter.com/never_released/status/13610248...

    On the other hand, no-one has seen evidence of MTE usage in iOS (either iOS14 or 15). Which may reflect non-presence, or that compiler support isn't yet there?

    Mostly 8.5 is technical stuff that would be hard to test.
    One possibility would be the random number instructions. Maybe we'll get clarification of these over the next month?
  • name99 - Wednesday, October 6, 2021 - link

    We can see a little more detail here:
    https://github.com/llvm/llvm-project/blob/main/llv...

    We see that, among other things, A14 added
    - cache clean to deep persistence (basically instructions to support non-volatile-ram...)
    - security stuff to invalidate predictions
    - speculation barrier
    - and a few other (uninteresting to me) 8.5 security features

    Interestingly it also claims, on the performance side, to have added to A14 over A13, fusion of literal generation instructions, something I did not see when I tried to test for it -- presumably you have to get the order of the literal instructions correct, and I used the incorrect order in my quick tests?
    Along with claims of a number of other instruction fusion patterns that I want to test at some point!

    This was added in late Jan 2021, which suggests we won't see the equivalent for A15 until beginning of next year :-(
  • OreoCookie - Thursday, October 7, 2021 - link

    My understanding is that ARM v9 essentially mandates parts of the ARM v8.x ISA that were optional and introduces SVE2. If I read your posts correctly (thanks for doing the checking, much appreciated), then it seems that Apple has implemented the “first half” of ARM v9 anyway and the only notable omission is SVE2.

    SVE2 sure sounds like a nice-to-have, but like you wrote the compiler will play a crucial role here. I reckon a proper implementation will eat up quite a bit of die area, and if you are not going to use it, what is the point?
  • RoyceTrentRolls - Wednesday, October 6, 2021 - link

    Hear me out:

    A13 - 14 cycles 8MB 2 cores/big cluster
    M1 - 16 cycles 12MB 4 cores
    M1X/M2? - 18 cycles 16MB 8 cores

    🤪
  • name99 - Wednesday, October 6, 2021 - link

    It's a reasonable hypothesis BUT a big problem with cores sharing an L2 is that they all have to sit on the same frequency plane. (They can have different voltages, which matters if one of them is eg engaged in heavy NEON work, while another is doing light integer work; but they must share frequencies.)

    This may be considered less of a problem for the target machines?
    Alternatively you just accept that life ain't perfect and provide two clusters of 4core+(?12?16MB L2)?
  • OreoCookie - Thursday, October 7, 2021 - link

    With 2+4 core SoCs, I don’t think this is that big of an issue, though. Of course, it gets trickier once you scale up to more than 8 performance cores, but we will have to see what Apple’s solution is here anyway (8-core chiplets perhaps?).

    Overall, though, it seems that massively increasing caches is a common trend, AMD has been going in that direction (including their Zen 3 with additional cache slated for later this year) and IBM will be using massive caches on their new CPUs that will power their Z15 mainframes. The drawbacks are pretty clear, but the potential upside is, too.
  • mixmaxmix - Thursday, October 7, 2021 - link

    battery life test result please
  • mixmaxmix - Saturday, October 9, 2021 - link

    please
  • Raqia - Thursday, October 7, 2021 - link

    Die shot now available:

    https://semianalysis.com/apple-a15-die-shot-and-an...

    More caches all around, and the GPU doubles the number of FP32 ALUs without adding much die area.
