The 2018 Apple iPad Pro (11-Inch) Review: Doubling Down On Performance
by Brett Howse & Andrei Frumusanu on December 4, 2018 10:00 AM EST
Powering iPad Pro: A12X
Section by Andrei Frumusanu
Apple’s new A12X SoC continues the tradition of representing an up-scaled version of the newest generation phone SoCs, in this case the A12. We’ve covered the A12 extensively in our review of the iPhone XS and XS Max, including more extensive coverage of the microarchitectural characteristics of Apple’s newest generation Vortex and Tempest cores.
Microarchitecturally, the A12X looks to largely share the same generation of IP blocks as the A12, with the only big difference being that the number of units has been increased in all major aspects of the SoC. For the first time Apple employs a total of 8 CPU cores, a figure achieved by doubling the number of big Vortex cores from the A12’s dual-core setup to a quad-core setup. On the GPU side of things, the core count increases from 4 cores in the A12 to a 7-core setup in the A12X. Most important for GPU performance though, and a key aspect of the A-X series from Apple, is that the company has doubled the memory interface width from 64-bit to 128-bit.
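To put the wider memory interface in perspective, theoretical peak bandwidth scales linearly with bus width at a fixed transfer rate. The snippet below is a minimal sketch of that arithmetic. The transfer rate used (`ASSUMED_TRANSFERS_PER_SEC`) is a placeholder assumption for illustration and is not a figure stated in this review; only the 64-bit and 128-bit widths come from the text.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_sec):
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_sec / 1e9


# Assumption for illustration only: the review does not state the memory data rate.
ASSUMED_TRANSFERS_PER_SEC = 4266e6

a12_bw = peak_bandwidth_gbs(64, ASSUMED_TRANSFERS_PER_SEC)    # 64-bit interface
a12x_bw = peak_bandwidth_gbs(128, ASSUMED_TRANSFERS_PER_SEC)  # 128-bit interface

# Whatever the real data rate is, the ratio depends only on the bus width.
bandwidth_ratio = a12x_bw / a12_bw

if __name__ == "__main__":
    print(f"64-bit:  {a12_bw:.1f} GB/s (under the assumed data rate)")
    print(f"128-bit: {a12x_bw:.1f} GB/s (under the assumed data rate)")
    print(f"ratio:   {bandwidth_ratio:.1f}x")
```

The absolute numbers depend entirely on the assumed data rate; the 2x ratio is the part that follows from the interface width alone.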
In terms of CPU frequencies, Apple largely follows the same scaling pattern as on the A12, with the only difference being that there are of course two more big cores at play. The A12X can run up to 8 threads at once, one per core, without having to resort to time-slicing them.
Table: Maximum Frequency vs Loaded Threads (Per-Core Maximum MHz)
On a single CPU core, the A12X clocks up to 2500MHz, identical to the A12. The similarities continue with two active big cores, where the peak frequency is limited to 2380MHz. Finally, when three or four Vortex cores are active, the peak frequency on the A12X is limited to 2325MHz. The small Tempest cores see the same frequency scaling as on the A12, with one core clocking up to 1587MHz, two to three cores at 1562MHz and finally 1538MHz when all four cores are active.
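The frequency limits quoted above can be captured in a small lookup table. This is only an illustrative sketch: the function and table names are made up for the example, and the aggregate figure is a crude sum of clocks, not a measured performance number.

```python
# Peak per-core clocks (MHz) by number of active cores in a cluster,
# transcribed from the figures quoted in the text.
VORTEX_MAX_MHZ = {1: 2500, 2: 2380, 3: 2325, 4: 2325}
TEMPEST_MAX_MHZ = {1: 1587, 2: 1562, 3: 1562, 4: 1538}

_CLUSTERS = {"vortex": VORTEX_MAX_MHZ, "tempest": TEMPEST_MAX_MHZ}


def peak_mhz(cluster, active_cores):
    """Return the per-core peak clock for a cluster with `active_cores` cores loaded."""
    table = _CLUSTERS[cluster]
    if active_cores not in table:
        raise ValueError(f"{cluster} cluster has 1 to 4 cores, got {active_cores}")
    return table[active_cores]


def aggregate_mhz(cluster, active_cores):
    """Sum of per-core peak clocks: a rough ceiling on multi-core scaling, not a benchmark."""
    return peak_mhz(cluster, active_cores) * active_cores


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, peak_mhz("vortex", n), peak_mhz("tempest", n))
```

Comparing `aggregate_mhz("vortex", 4)` against four times the single-core clock shows how much headroom the multi-core frequency cap gives up: 9300MHz in aggregate versus a hypothetical 10000MHz if all four cores could hold 2500MHz.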
Another way of scaling the performance of an SoC is not only to increase the number of computation blocks, but also to change aspects such as the amount of cache available to the various blocks. Let’s see if the A12X differs from the A12 in this regard:
Analysing the memory latency behaviour of the A12X and comparing it to the A12, it looks like Apple hasn’t really changed the cache configuration. The L1 caches are still at 128KB, but that was to be expected. What is more interesting is that the L2 cache also seems to be identical in size to the A12’s.
The A12 L2 cache was quite unique in its behaviour in that it is seemingly 8MB in physical size on the SoC die, yet there looked to be a sort of logical partitioning such that a single core only has access to 6MB at fast access latencies. Furthermore, in the latency graph we also see the separation of the L2 cache into two banks, with the second half of the 6MB region being slightly slower for full random access patterns.
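Latency curves of this kind are commonly produced with a pointer-chasing test, where each load depends on the result of the previous one and the chain visits a buffer in random order. The sketch below is not the tool used for this review; it only illustrates how such a full random access chain can be built, using Sattolo's algorithm so that the chain forms a single cycle through every slot. The function names are hypothetical. Python's interpreter overhead swamps cache latency, so an actual measurement would use the same chain layout in a compiled language and sweep the buffer size.

```python
import random


def build_chase_chain(n, seed=0):
    """Return a list `nxt` where nxt[i] is the slot to visit after slot i.

    Sattolo's algorithm produces a permutation consisting of one single cycle,
    so following the chain touches all n slots before it repeats.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for `steps` dependent hops and return the final slot."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]  # each hop depends on the previous one
    return idx


if __name__ == "__main__":
    chain = build_chase_chain(1 << 16)
    print(chase(chain, len(chain)))  # a full lap returns to the start slot
```

Because every hop depends on the previous load, the time per hop approximates access latency rather than bandwidth, which is what makes steps in the curve line up with cache capacities.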
Overall there’s a bit of a conundrum here, as we can’t really reach an accurate conclusion: Has Apple maintained the L2 cache at 8MB, or do we simply not see more of it because of a continuation of this logical partitioning? While we don’t have access to a die shot of the A12X to confirm this, I still think this is simply a case of Apple maintaining 8MB of L2 and sharing it among the four Vortex cores.
One of the A12’s big improvements was the new system level cache coming in at 8MB and employing a new microarchitecture. It looks like this part of the memory cache hierarchy has also remained unchanged as we just see minor differences in the latency behaviour past 8MB (Past the L2).
We’ve extensively covered the SPEC2006 performance of the A12 in our review of the iPhone XS/XS Max and analysed the microarchitectural characteristics of the new Vortex CPU as observed in the performance results of the various SPEC workloads.
We continue the same exercise on the A12X: in general we shouldn’t expect large variations from the iPhone results, with the only big difference between the two SoCs being the increased memory bandwidth available to the A12X.
In SPECint2006, the A12X’s performance is in line with the A12, with the scores largely within the margin of error, although the A12X ends up a percentage point or two faster than the A12 in some benchmarks.
The big outlier here is 462.libquantum: The benchmark workload here is characterised as heavily vectorised and extremely memory bandwidth intensive. Here the 21% increase could very well be attributed to the doubled memory bandwidth available to the A12X.
In SPECfp2006, a larger number of workloads are characterised by their memory and latency intensive nature. Here, the advantages of the A12X are more visible as we’re seeing an average 7% increase, ranging from 3% as the lowest increase in 444.namd up to more significant 14% and 12% increases in 433.milc and 470.lbm.
The overall SPEC2006 scores for the A12X are 45.95 in SPECint2006 and 58.78 in the C/C++ workloads of SPECfp2006 – representing 2% and 7% increases over the A12.
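As a quick sanity check on these figures, the quoted percentage gains can be turned back into approximate A12 baselines. This is a back-of-the-envelope sketch: the helper name is hypothetical, and because the percentages in the text are rounded to whole numbers, the implied baselines are only approximate and are not a substitute for the published A12 scores.

```python
def implied_baseline(new_score, pct_increase):
    """Baseline score implied by `new_score` being `pct_increase` percent higher."""
    return new_score / (1 + pct_increase / 100)


# A12X scores and rounded gains as quoted in the text.
A12X_SPECINT = 45.95
A12X_SPECFP = 58.78

implied_a12_int = implied_baseline(A12X_SPECINT, 2)
implied_a12_fp = implied_baseline(A12X_SPECFP, 7)

if __name__ == "__main__":
    print(f"Implied A12 SPECint2006: ~{implied_a12_int:.1f}")
    print(f"Implied A12 SPECfp2006:  ~{implied_a12_fp:.1f}")
```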
For the sake of comparison against the various existing mobile SoCs, here’s again a large graph overview comparison containing all relevant recent Arm microarchitectures.
Unfortunately for this review we weren’t able to probe the power and efficiency of the A12X – however I’m expecting figures not all too different to what we’ve published on the A12, meaning single-core active system power should be in the 3.6-4.3W range. Naturally this should be a bit higher on the A12X due to the doubled memory controller interfaces.
Overall, the A12X is a beast of a SoC and very much follows the performance of the A12. The CPU on the A12X seems pretty much the same as on the A12, with slight performance increases in workloads that more heavily stress the memory subsystem, where the doubled memory bandwidth helps the A12X rise above the A12.