From Mobile to Mac: What to Expect?

To date, our performance comparisons for Apple’s chipsets have always been in the context of iPhone reviews, with the juxtaposition to x86 designs being a rather small footnote within those articles. Today’s Apple Silicon launch event completely changes the narrative of what we portray in terms of performance, setting aside the typical apples vs oranges comparisons people usually argue with.

We currently do not have Apple Silicon devices and likely won’t get our hands on them for another few weeks, but we do have the A14, and expect the new Mac chips to be strongly based on the microarchitecture we’re seeing employed in the iPhone designs. Of course, we’re still comparing a phone chip versus a high-end laptop and even a high-end desktop chip, but given the performance numbers, that’s also exactly the point we’re trying to make here, setting the stage as the bare minimum of what Apple could achieve with their new Apple Silicon Mac chips.

SPECint2006 Speed Estimated Scores

The performance numbers of the A14 on this chart are relatively mind-boggling. If I were to release this data with the label of the A14 hidden, one would guess that the data-points came from some other x86 SKU from either AMD or Intel. The fact that the A14 currently competes with the very best top-performance designs that the x86 vendors have on the market today is just an astonishing feat.

Looking into the detailed scores, what again amazes me is the fact that the A14 not only keeps up, but actually beats both these competitors in memory-latency sensitive workloads such as 429.mcf and 471.omnetpp, even though they either have the same memory (i7-1185G7 with LPDDR4X-4266), or desktop-grade memory (5950X with DDR4-3200).

Again, disregard the 456.hmmer score advantage of the A14; that’s largely due to compiler discrepancies. Subtract 33% from that score for a more apt comparison figure.

SPECfp2006(C/C++) Speed Estimated Scores

Even in SPECfp, which is even more dominated by memory-heavy workloads, the A14 not only keeps up, but beats the Intel CPU design more often than not. AMD also wouldn’t be looking good if not for the recently released Zen3 design.

SPEC2006 Speed Estimated Total

In the overall SPEC2006 chart, the A14 is performing absolutely fantastically, taking the lead in absolute performance and only falling short of AMD’s recent Ryzen 5000 series.

The fact that Apple is able to achieve this in a total device power consumption of 5W including the SoC, DRAM, and regulators, versus +21W (1185G7) and 49W (5950X) package power figures, without DRAM or regulation, is absolutely mind-blowing.
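To put those power figures in proportion, here is a minimal Python sketch that divides the quoted x86 package figures by the A14’s 5W device figure. The names used (`A14_DEVICE_WATTS`, `X86_PACKAGE_WATTS`, `power_ratio`) are illustrative assumptions, not anything from our test tooling. Since the x86 numbers exclude DRAM and regulation while the A14 number includes them, the ratios are best read as rough lower bounds rather than a measured efficiency comparison.

```python
# Power figures as quoted in the text above.
# The A14 number is total device power (SoC + DRAM + regulators).
A14_DEVICE_WATTS = 5.0

# The x86 numbers are package power only (no DRAM, no regulation).
# The 1185G7 figure is quoted as "+21W", so 21.0 is treated as a floor here.
X86_PACKAGE_WATTS = {
    "i7-1185G7": 21.0,
    "5950X": 49.0,
}


def power_ratio(package_watts: float, device_watts: float = A14_DEVICE_WATTS) -> float:
    """Return how many times more power the x86 package draws than the A14 device."""
    return package_watts / device_watts


ratios = {name: power_ratio(watts) for name, watts in X86_PACKAGE_WATTS.items()}

for name, ratio in ratios.items():
    print(f"{name}: at least {ratio:.1f}x the A14's total device power")
```

Under these assumptions the Intel part draws a bit over four times the A14’s power and the AMD part nearly ten times, before accounting for the DRAM and regulation that the x86 figures leave out.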

GeekBench 5 - Single Threaded

There’s been a lot of criticism about more common benchmark suites such as GeekBench, but frankly I've found these concerns or arguments to be quite unfounded. The only factual difference between workloads in SPEC and workloads in GB5 is that the latter has fewer outlier tests which are memory-heavy, meaning it’s more of a CPU benchmark whereas SPEC tends more towards CPU+DRAM.

The fact that Apple does well in both workloads is evidence that they have an extremely well-balanced microarchitecture, and that Apple Silicon will be able to scale up to “desktop workloads” in terms of performance without much issue.

Where the Performance Trajectory Finally Intersects

During the release of the A7, people were pretty dismissive of the fact that Apple had called their microarchitecture a desktop-class design. People were also very dismissive when we described the A11 and A12 as reaching near-desktop-level performance figures a few years back. Today marks an important moment in time for the industry, as Apple’s A14 is now clearly able to showcase performance that’s beyond the best that Intel can offer. It’s been a performance trajectory that’s been steadily executing and progressing for years:

Whilst in the past 5 years Intel has managed to increase their best single-thread performance by about 28%, Apple has managed to improve their designs by 198%, or 2.98x (let’s call it 3x) the performance of the Apple A9 of late 2015.
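The relationship between the percentage gains and the multipliers quoted above can be sketched in a few lines of Python. The function names are illustrative assumptions. The annualised rate is also an added simplification: it assumes the gain compounded smoothly over the five years, which the real generation-to-generation steps did not necessarily do.

```python
def percent_gain_to_multiplier(percent_gain: float) -> float:
    """Convert a percentage improvement into a performance multiplier.

    A 198% improvement means the new score is 1 + 1.98 = 2.98x the old one.
    """
    return 1.0 + percent_gain / 100.0


def annualized_rate(multiplier: float, years: float) -> float:
    """Compound annual growth rate implied by a total multiplier over a period.

    Assumes smooth compounding, which is a simplification.
    """
    return multiplier ** (1.0 / years) - 1.0


# Gains quoted in the text for the past 5 years.
YEARS = 5.0
apple_multiplier = percent_gain_to_multiplier(198.0)
intel_multiplier = percent_gain_to_multiplier(28.0)

apple_yearly = annualized_rate(apple_multiplier, YEARS)
intel_yearly = annualized_rate(intel_multiplier, YEARS)

print(f"Apple: {apple_multiplier:.2f}x total, about {apple_yearly:.1%} per year")
print(f"Intel: {intel_multiplier:.2f}x total, about {intel_yearly:.1%} per year")
```

Read this way, a 198% gain is a 2.98x multiplier, which under the smooth-compounding assumption works out to roughly a quarter more performance each year, against roughly a twentieth more per year for Intel’s 28%.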

Apple’s performance trajectory and unquestioned execution over these years is what has made Apple Silicon a reality today. Anybody looking at the absurdity of that graph will realise that there simply was no other choice but for Apple to ditch Intel and x86 in favour of their own in-house microarchitecture; staying the course would have meant stagnation and worse consumer products.

Today’s announcements only covered Apple’s laptop-class Apple Silicon. Whilst we don’t know the details at time of writing as to what Apple will be presenting, Apple’s enormous power efficiency advantage means that the new chip will be able to offer vastly increased battery life and/or vastly increased performance compared to the current Intel MacBook line-up.

Apple has claimed that they will completely transition their whole consumer line-up to Apple Silicon within two years, which is an indicator that we’ll be seeing a high-TDP many-core design to power a future Mac Pro. If the company is able to continue on their current performance trajectory, it will look extremely impressive.


644 Comments


  • hecksagon - Tuesday, November 10, 2020 - link

    There is also the issue of the benchmarks not being long enough to cause any significant throttling. This is the reason Apple mobile devices are so strong in this benchmark. Their CPUs provide very strong peak performance that slows down as the device gets heat soaked. That's why it looks like an iPhone can compete with an i7 laptop according to this benchmark.
  • misan - Wednesday, November 11, 2020 - link

    Apple delivers performance comparable to that of the best 5.0 GHz x86 chips while running at 3 GHz and drawing under 5 watts. Your argument does not make any sense logically. Yes, there will be throttling, but at the same cooling performance and consuming the same power Apple chips will always be faster. In fact, their lead on x86 chips will increase when the CPUs are throttled, since Intel will need to drop the clocks significantly; Apple doesn't.
  • hecksagon - Tuesday, November 10, 2020 - link

    No, he is saying that Geekbench's weighting of cache-bound workloads does not represent reality.
  • techconc - Wednesday, November 11, 2020 - link

    GB5 scores are inline with Spec results, so there is no merit to the claim that they don't match reality.
  • chlamchowder - Wednesday, November 11, 2020 - link

    The large lsq/other ooo resource queues and high MLP numbers are there to cover for the very slow L3 cache. With 39ns latency on A13 and similar looking figures here, you're looking at over 100 cycles to get to L3. That's worse than Bulldozer's L3, which was considered pretty bad.
  • name99 - Wednesday, November 11, 2020 - link

    Why not try to *understand* Apple's architecture rather than concentrating on criticism?
    (a) Apple's design is *their* design, it is not a copy of AMD or Intel's design
    (b) Apple's design is optimized for the SoC as a whole, not just the CPU.

    The L3 on Apple SoC's does not fulfill the role of a traditional L3, that is why Apple calls it an SLC (System Level Cache). For traditional CPU caching, Apple has a large L2 (8MiB A14, 12MiB M1).

    The role of the L3 is PRIMARILY
    - to save power (everything, especially on the GPU side, that can be kept there rather than in DRAM is a power advantage)
    - to communicate between different elements of the SoC.
    The fact that the SLC can act as a large (slow, but still faster than DRAM) L3 is just a bonus, it is not the design target.

    Why did Apple keep pushing the UMA theme at their event? The stupid think it's Apple claiming that they are first with UMA; but Apple never said that. The point is that UMA is part of what enables Apple's massive cross-SoC accelerator interaction; while the SLC is what makes that interaction fast and low power. How many accelerators do you think are on the A14/M1? We don't know -- what we do know is that there are 42 on the A12.
    42 accelerators! Did you have a clue that it was anything close to that?
    Sure, you know the big picture, things like ISP, GPU and NPU working together for computational photography, but there is so much more. And they can all interact together and efficiently via SLC.

    https://arxiv.org/pdf/1907.02064v1.pdf
    discusses all this, along with pointing out just how important it is to have fast low energy communication between the accelerators.
  • techconc - Wednesday, November 11, 2020 - link

    Why are we still arguing about the validity of Geekbench? The article even states the following from their own testing...
    "There’s been a lot of criticism about more common benchmark suites such as GeekBench, but frankly I've found these concerns or arguments to be quite unfounded."
  • BlackHat - Wednesday, November 11, 2020 - link

    Because the creators of the benchmark themselves admitted that their old versions were somehow inaccurate.
  • Spunjji - Thursday, November 12, 2020 - link

    Just as well we're not really discussing those here, then 😬
  • hecksagon - Tuesday, November 10, 2020 - link

    Too bad the links are all for Geekbench. This is about as far from a real-world benchmark as you can get.
