SoC Analysis: CPU Performance

Now that we’ve had a chance to take a look at A9X’s design and a bit on the difference between the x86 and ARM ISAs, let’s take a look at A9X’s performance at a lower level.

From a CPU perspective A9X is just a higher clocked implementation of the dual-core Twister CPU design we first saw on A9 last year. As a result the fundamentals of the CPU architecture have not changed relative to A9. However A9X relative to A8X drops down from three CPU cores to two, so among the factors we’ll want to look at is how Apple has been impacted by dropping down to two faster cores.

We’ll start things off with Geekbench, 3, which gives us a fairly low-level look at CPU performance.

Geekbench 3 - Integer Performance
  A9X A8X % Advantage
AES ST
1.17 GB/s
0.98 GB/s
19%
AES MT
2.85 GB/s
3.16 GB/s
-10%
Twofish ST
120.7 MB/s
64.0 MB/s
89%
Twofish MT
228.3 MB/s
182.7 MB/s
25%
SHA1 ST
1.03 GB/s
0.53 GB/s
94%
SHA1 MT
1.95 GB/s
1.48 GB/s
32%
SHA2 ST
205.8 MB/s
119.1 MB/s
73%
SHA2 MT
395.5 MB/s
330.6 MB/s
20%
BZip2Comp ST
8.95 MB/s
5.71 MB/s
57%
BZip2Comp MT
17.0 MB/s
16.6 MB/s
2%
Bzip2Decomp ST
14.7 MB/s
8.98 MB/s
64%
Bzip2Decomp MT
28.1 MB/s
25.2 MB/s
12%
JPG Comp ST
33.7 MP/s
20.6 MP/s
64%
JPG Comp MT
64.4 MP/s
60.8 MP/s
6%
JPG Decomp ST
89.2 MP/s
53.0 MP/s
68%
JPG Decomp MT
166.5 MP/s
153.9 MP/s
8%
PNG Comp ST
2.11 MP/s
1.35 MP/s
56%
PNG Comp MT
4.04 MP/s
3.82 MP/s
6%
PNG Decomp ST
31.5 MP/s
18.7 MP/s
68%
PNG Decomp MT
56.9 MP/s
56.3 MP/s
1%
Sobel ST
138.3 MP/s
82.5 MP/s
68%
Sobel MT
258.7 MP/s
225.6 MP/s
15%
Lua ST
3.25 MB/s
1.68 MB/s
93%
Lua MT
6.02 MB/s
4.60 MB/s
31%
Dijkstra ST
10.1 Mpairs/s
6.70 Mpairs/s
51%
Dijkstra MT
17.6 Mpairs/s
16.0 Mpairs/s
10%

The interesting thing about Geekbench is that as a result of being a lower-level test the bulk of its tests scale up well with CPU core counts, as the benchmark can just spawn more threads. Consequently I wasn’t entirely sure what to expect here, as this presents the tri-core A8X with a much better than average scaling opportunity, making it especially harsh on the A9X.

But what the results show us is that even by dropping back down to two CPU cores, A9X does very well overall. The single-threaded results are greatly improved, with A9X offering better than a 50% single-threaded perf gain in the majority of the sub-tests. Meanwhile even with the multi-threaded tests, A9X only loses once, on AES. Otherwise two higher clocked Twister cores are beating three lower clocked Typhoon cores by anywhere between a few percent up to 32%. In this sense Geekbench is something of a worst-case scenario, as real-world software rarely benefits from additional cores this well (this being part of the reason why A8 and A9 did so well relative to quad Cortex-A57 designs), so it’s promising to see that even in this worst-case scenario A9X can deliver meaningful performance gains over A8X.

Geekbench 3 - Floating Point Performance
  A9X A8X % Advantage
BlackScholes ST
14.9 Mnodes/s
8.52 Mnodes/s
75%
BlackScholes MT
28.2 Mnodes/s
24.9 Mnodes/s
13%
Mandelbrot ST
2.23 GFLOPS
1.27 GFLOPS
76%
Mandelbrot MT
4.27 GFLOPS
3.66 GFLOPS
17%
Sharpen Filter ST
2.10 GFLOPS
1.08 GFLOPS
94%
Sharpen Filter MT
4.01 GFLOPS
3.12 GFLOPS
29%
Blur Filter ST
2.68 GFLOPS
1.53 GFLOPS
75%
Blur Filter MT
5.08 GFLOPS
4.47 GFLOPS
14%
SGEMM ST
6.77 GFLOPS
4.12 GFLOPS
64%
SGEMM MT
12.7 GFLOPS
11.6 GFLOPS
9%
DGEMM ST
3.32 GFLOPS
2.02 GFLOPS
64%
DGEMM MT
6.21 GFLOPS
5.61 GFLOPS
11%
SFFT ST
3.52 GFLOPS
1.92 GFLOPS
83%
SFFT MT
6.67 GFLOPS
5.40 GFLOPS
24%
DFFT ST
3.21 GFLOPS
1.80 GFLOPS
78%
DFFT MT
6.02 GFLOPS
5.11 GFLOPS
18%
N-Body ST
1.41 Mpairs/s
0.78 Mpairs/s
81%
N-Body MT
2.69 Mpairs/s
2.34 Mpairs/s
15%
Ray Trace ST
4.99 MP/s
2.96 MP/s
69%
Ray Trace MT
9.56 MP/s
8.64 MP/s
11%

The story with Geekbench 3 floating point performance is much the same. Performance never regresses, even in multi-threaded workloads. In lightly threaded floating point workloads A9X is going to walk all over A8X, and in multi-threaded workloads we’re still looking at anywhere between a 9% and a 29% performance gain. This goes to show just how powerful Twister is relative to Typhoon, especially with A9X’s much higher clockspeeds factored in. And it lends a lot of support to Apple’s ongoing design philosophy of favoring a smaller number of high performance (and now higher-clocked) cores.

SPEC CPU 2006

Moving on, our other lower-level benchmark for this review is SPECint2006. Developed by the Standard Performance Evaluation Corporation, SPECint2006 is the integer component of their larger SPEC CPU 2006 benchmark. As was the case with SPEC CPU 2000 before it, SPEC CPU 2006 is designed by a committee of technology firms to offer a consistent and meaningful cross-platform benchmark that can compare systems of different performance levels and architectures. Among cross-platform benchmarks SPEC CPU is generally held in high regard, and while it is but one collection of benchmarks and like all benchmarks should not be taken as the be-all end-all of benchmarks on its own, it provides us with a very important look at CPU performance that we otherwise cannot get.

SPECint2006 is the successor to the SPECint2000 test we’ve been using periodically for the last couple of years now. Initially released in 2006, SPECint2006 is still SPEC’s current-generation CPU integer benchmark. We’ve wanted to switch to SPECint2006 for some time now, but have been held back by the overall low performance of tablet SoCs, which lacked the speed and memory to run SPECint2006 and to do so in a reasonable amount of time. However now thanks to the greater performance and greater memory of A9X, we’re finally able to run SPEC’s current-generation CPU benchmark on a tablet.

SPECint2006 is composed of 12 sub-benchmarks, testing a wide variety of scenarios from video compression to PERL execution to AI. This is a non-graphical benchmark and I believe it’s reasonable to argue that the benchmark set itself leans towards server high performance computing/workstation use cases, but with that said even if it’s not a perfect fit for tablet use cases it offers a lot of real-world tests that give us a good variety of different workloads to benchmark CPUs with. SPECint2006 scores are in turn reported as a ratio, measuring how many times faster a tested system is against the SPEC reference system, a 1997 Sun Ultrasparc Ultra Enterprise 2 server, which is based around a 296 MHz UltraSPARC II CPU.

CINT2006 (Integer Component of SPEC CPU2006):
Benchmark Language Application Area Description
400.perlbench
Programming Language  Derived from Perl V5.8.7. The workload includes SpamAssassin, MHonArc (an email indexer), and specdiff (SPEC's tool that checks benchmark outputs).
401.bzip2
Compression  Julian Seward's bzip2 version 1.0.3, modified to do most work in memory, rather than doing I/O.
403.gcc
C Compiler  Based on gcc Version 3.2, generates code for Opteron.
429.mcf
Combinatorial Optimization  Vehicle scheduling. Uses a network simplex algorithm (which is also used in commercial products) to schedule public transport.
445.gobmk
Artificial Intelligence: Go  Plays the game of Go, a simply described but deeply complex game.
456.hmmer
Search Gene Sequence  Protein sequence analysis using profile hidden Markov models (profile HMMs)
458.sjeng
Artificial Intelligence: chess  A highly-ranked chess program that also plays several chess variants.
462.libquantum
C
Physics / Quantum Computing Simulates a quantum computer, running Shor's polynomial-time factorization algorithm.
464.h264ref
Video Compression  A reference implementation of H.264/AVC, encodes a videostream using 2 parameter sets. The H.264/AVC standard is expected to replace MPEG2
471.omnetpp
C++ 
Discrete Event Simulation  Uses the OMNet++ discrete event simulator to model a large Ethernet campus network.
473.astar
C++ 
Path-finding Algorithms  Pathfinding library for 2D maps, including the well known A* algorithm.
483.xalancbmk
C++ 
XML Processing  A modified version of Xalan-C++, which transforms XML documents to other document types.

Although designed as a CPU-intensive benchmark, it’s important to note that SPECint2006 is officially labeled as “stressing a system's processor, memory subsystem and compiler.” The memory subsystem aspect is fairly self-explanatory – it’s difficult to test a CPU without testing the memory as well except in the cases of trivial workloads that can fit in a CPU’s caches – however the compiler aspect calls for special attention. As SPECint2006 is a cross-platform benchmark in the truest sense of the word, it’s impossible to offer a single binary for all platforms – especially platforms that had yet to be designed in 2006 such as ARMv8 – and, simply put, the moment you begin compiling benchmarks for different systems using different compilers, the performance of the compiler becomes a factor of benchmark performance as well.

As a result, and unlike many of the other benchmarks we run here, it’s important to note that compilers play a big part in SPECint2006 performance, and this is by design. Compiler authors can and do optimize for SPEC CPU, with the ultimate goal of giving the tested CPU the best chance to achieve the best possible performance in this benchmark; the compiler should not hold back the CPU. However in turn, all results must be validated, so overly aggressive compilers that generate bad code will be caught and failed. The end result is that in a cross-platform scenario with different binaries, SPECint2006 isn’t quite as apples-to-apples as our more traditional benchmarks, but it offers us a unique look at cross-platform CPU performance.

For our testing we’re using optimized binaries generated for Apple’s A8X/A9X SoCs and Intel’s Broadwell/Skylake processors respectively. The following compiler flags were used.

Apple ARMv8: XCode 7 (LLVM), -Ofast

Intel x86: Intel C++ Compiler 16, -xCORE-AVX2 -ipo -mdynamic-no-pic -O3 -no-prec-div -fp-model fast=2 -m32 -opt-prefetch -ansi-alias -stdlib=libstdc++

Finally, of SPECint2006’s 12 sub-benchmarks, our current harness is only able to run 10 of them on the iPad Pro at this time, as 473.astar and 483.xalancbmk are failing on the iPad. So the following is not a complete run of SPECint2006, and for the purposes of SPEC CPU are officially classified as performance estimates.

To start things off, let’s look at the Apple-to-Apple comparison, pitting A9X against A8X.

SPECint_base2006 - Estimated Scores - A9X vs. A8X
  A9X A8X A9X vs. A8X %
400.perlbench
25.0
14.1
78%
401.bzip2
17.6
11.5
54%
403.gcc
20.5
12.4
65%
429.mcf
18.7
N/A
N/A
445.gobmk
23.4
13.0
80%
456.hmmer
25.1
14.1
79%
458.sjeng
23.6
13.6
73%
462.libquantum
74.6
49.2
52%
464.h264ref
41.3
24.0
72%
471.omnetpp
10.3
8.0
29%

Unsurprisingly, A9X is leaps and bounds ahead here. The smallest gain is with 471.omnetpp, a discrete event simulator, where A9X holds a 29% lead. Otherwise A9X takes a significant lead, beating A8X by upwards of 80% in 445.gobmk, a Go (board game) AI benchmark.

Calling back to our iPhone 6s review for a moment, A9X has a much larger advantage vs. A8X with SPECint2006 as compared to A9 vs. A8 on SPECint2000. A good deal of this has to do with A9X’s significant clockspeed bump versus A8X, but at the same time this also illustrates how the newer SPECint2006 rates A9X and Twister even more highly than A8X/Typhoon. As we’ve seen time and time again, Twister is a much faster CPU core than the already fast Typhoon, and this is a big part of why Apple continues to top our ARM benchmarks.

Last but certainly not least however is our main event, A9X versus Intel’s Core M CPUs. As we’re finally able to run SPECint2006 on an Apple SoC, this is the first chance we’ve had to compare Apple and Intel CPUs using SPEC, so it’s exciting to finally be able to make this comparison.

At the same time this comparison not just for academic curiosity; as Apple has significantly improved their CPU design with every generation and has quickly moved to newer manufacturing processes, they have been closing the architecture and manufacturing gap with Intel. Twister and Skylake are fairly similar designs, both implementing a wide execution pipeline with a focus on achieving a high IPC, and in this latest generation of devices, coupling that with a fairly high 2GHz+ clockspeed. Over the years Apple and Intel have approached this problem from different angles – Apple built up from phones to tablets while Intel built down from desktops to tablets – but the end result is that the two have ended up in a similar place in terms of basic architecture design goals. Meanwhile from a manufacturing standpoint Intel is arguably still roughly a generation ahead with their 14nm FinFET process – naming aside, their transistors are smaller than TSMC’s 16nm FinFET – so Apple is the underdog from this point of view.

The burning question is of course is whether Apple’s CPU designs are catching up to the performance of Intel’s Core lineup, thanks to the continual iteration of architecture and manufacturing on the Apple side, versus the slower rate of growth we’ve seen over the last few generations with Intel’s Core lineup. The iPad Pro in turn finally gives us the opportunity to try to answer that question, as the faster SoC coupled with a form factor and TDP closer to regular Core M devices gives us the most apples-to-apples comparison yet.

To that end we have assembled a smorgasbord of Core M devices to compare to the iPad Pro and A9X SoC. Perhaps the most apple-to-apple comparison is the iPad Pro versus the 2015 MacBook; though approaching a year old, this is still Apple’s current generation MacBook, with our base model incorporating an older Broadwell-based Core M-5Y31. Also from the Broadwell generation we have an ASUS Transformer Book T300 Chi, which uses a high-end Core M-5Y71, to showcase the performance of Intel’s highest clocked Core M processors. Finally, from the latest Skylake generation we have the ASUS ZenBook UX305CA, which incorporates Intel’s base-tier Core m3-6Y30 CPU.

Finally, it should be noted that to keep testing as close as possible, all of these devices are passively cooled, and that as a result all of these devices are also TDP/heat throttling though much of the SPECint2006 benchmark. Ultimately what we’re measuring here is not the peak performance of each system, but rather its sustained performance under the TDP limitations of their respective designs. If unrestricted, undoubtedly all of these devices would score higher.

SPECint_base2006 - Estimated Scores - A9X vs. Intel Broadwell/Skylake
  A9X Core M-5Y31
(2015 MacBook)
Core M-5Y71
(Asus T300 Chi)
Core m3-6Y30
(Asus UX305CA)
A9X vs MacBook %
Base/Turbo Freq 2.26GHz 0.9/2.4GHz 1.2/2.9GHz 0.9/2.2GHz  
400.perlbench
25.0
21.7
28.5
24.4
15%
401.bzip2
17.6
14.6
19.6
15.3
21%
403.gcc
20.5
22.8
31.1
28.2
-10%
429.mcf
18.7
35.9
46.7
38
-48%
445.gobmk
23.4
16.9
23.7
18
38%
456.hmmer
25.1
43.9
61.9
48.1
-43%
458.sjeng
23.6
19.2
26.1
19.3
23%
462.libquantum
74.6
292
476
409
-74%
464.h264ref
41.3
38.4
49.7
37.3
8%
471.omnetpp
10.3
16.3
23.7
20.6
-37%

As this is a fairly dense lineup I’m not going to call out every figure, but let’s focus on a few key areas. First, on A9X versus the Core M-5Y31 (MacBook), the advantage flips between each device as each test hits upon different strengths and weaknesses of each CPU’s architecture. Overall each device wins half of the benchmarks, however the Core M powered MacBook wins by a larger average margin. In other words, the iPad Pro is competitive with the MacBook depending on the test, however on average it ends up trailing in performance.

Relative to the MacBook, the iPad Pro does best in 445.gobmk, the Go benchmark, while its largest deficit is with 462.libquantum. The latter is a particularly interesting case as the benchmark is very easy to vectorize, giving us perhaps our best look at the vector performance of Twister versus Broadwell, and how well their respective compilers can actually vectorize it. The end result has the Intel platforms solidly in the lead here, hinting that Intel still has better vector performance at this time.

Shifting gears to the Asus ZenBook UX305CA and its newer Skylake based Core m3-6Y30, to little surprise Skylake closes the gap with A9X in the benchmarks where Core M was losing, and pulls further ahead in the benchmarks where it was winning. Despite this the two systems split the number of wins at 5 each, but in the cases where the ZenBook is winning it’s very clearly winning. Overall Skylake sees some decent performance improvements relative to the Broadwell CPU in our MacBook – with the exact gains depending on the test – allowing it to widen the gap compared to the A9X. Overall A9X is still competitive in specific scenarios, but on average it definitely trails the Skylake Core m3.

Finally, going back to Broadwell we have the ASUS Transformer Book T300 Chi, which incorporates a high-end Core M-5Y71 processor. This is still officially a 4.5W TDP processor, and as a result this essentially measures Broadwell Core M’s best case performance. With a maximum CPU clockspeed of 2.9GHz as compared to the slower low-end Skylake and Broadwell CPUs, the T300 Chi unsurprisingly beats the iPad Pro in every single benchmark. At best the two are neck-and-neck with Apple’s best benchmark, 445.gobmk, but otherwise it’s a clear and very significant lead for Intel’s fastest Broadwell Core M processor.

In the end, what to take away from this depends on how you want to read the results and what you believe the most important CPU comparison is. As Apple doesn’t use multiple bins/clockspeeds of A9X processors, this muddles the comparison some since there’s a significant difference in performance between Intel’s fastest and slowest Core M processors, and at the same time Intel’s official list prices put every CPU except the top-bin Core m7-6Y75 at the same price of $281.

Ultimately I think it’s reasonable to say that Intel’s Core M processors hold a CPU performance edge over iPad Pro and the A9X SoC. Against Intel’s slowest chips A9X is competitive, but as it stands A9X can’t keep up with the faster chips. However by the same metric there’s no question that Apple is closing the gap; A9X can compete with both Broadwell and Skylake Core M processors, and that’s something Apple couldn’t claim even a generation ago. That it’s only against the likes of Core m3 means that Apple still has a way to go, particularly as A9X still loses by more than it wins, but it’s significant progress in a short period of time. And I’ll wager that it’s closer than Intel would like to be, especially if Apple puts A9X into a cheaper iPad Air in the future.

SoC Analysis: On x86 vs ARMv8 System Performance
Comments Locked

408 Comments

View All Comments

  • willis936 - Monday, January 25, 2016 - link

    I am very interested in energy per calculation comparisons between the A9X and the Core M. Yes Core M will beat out the A9X from a power perspective but are both within the same power budget? If so then Intel has done some impressive work.
  • Constructor - Monday, January 25, 2016 - link

    That's not even cut and dried. The Anandtech performance comparison leaves quite a number of question marks. It looks a lot as if some of the tests were written originally so autovectorization would work with known desktop compilers but LLVM for iOS just didn't catch on to it.

    The drastic swings between the various tests are not very plausible otherwise.

    Which makes that comparison utterly useless if that's the case. And that the testers didn't even bother to check the generated code is highly disappointing.
  • Icecreamfarmer - Monday, January 25, 2016 - link

    I just registered to post this but I have a question?
    How so cant you draw diagonal lines with a surface 3?
    I just tried it several times with and without ruler but they are flawless?

    Could you explain?
  • VictorBd - Tuesday, January 26, 2016 - link

    Surface diagonals: The MS N-Trig pen tech manifests a subtle but distinct anomaly when drawing slow diagonal lines in that the lines waver a bit. If you search on this you can see it demonstrated. It is a genuine defect in the current tech. For my use case it is not a concern. I use the pen extensively for interview and meeting note taking (and for light sketching for fun).

    For my own purposes, the SP4 provides the most compelling overall device available on the market at this time: the power, form factor, desktop docking, OS and apps, ports, and pen when taken all together cannot be matched. It is my primary device every day all day.

    At night when I just want to consume web, video, or music, I use an iPad Air 2. Perfect for that. I bought and returned an iPad Pro. I could never try to do production work on it. And it's price and bulk are not worth the beauty of its screen. So I'm keeping the Air for casual consumption. But for work its the SP4 (with a Toshiba dynaPad as a light backup).
  • Constructor - Tuesday, January 26, 2016 - link

    There also seem to be problems properly following the pen near the edges of the screen, even requiring calibration by the user, apparently.

    The Apple Pencil has neither of those problems. It works very precisely and consistently in any direction and right up to the edges.
  • VictorBd - Tuesday, January 26, 2016 - link

    I initially had pen issues at the edge with the SP4, but it was completely resolved for me by a pen calibration reset. The only thing left is the subtle diagonal - which does not impact me.

    I also note that the iPad's palm rejection isn't perfect. It allowed my palm to make marks on the screen in OneNote, and it will register finger input as drawing from your "non pen hand" as well in some apps. And right now there's no way to switch off touch input while using the pen so you can grab it however you want. (Another IOS "protected garden" limitation.)
  • Constructor - Tuesday, January 26, 2016 - link

    Nope. The Procreate app, for instance, ignores my fingers completely for any drawing tools but I can still simultaneously draw with the Pencil and operate the UI with my fingers (such as the opacity and size sliders, or the two- and three-finger undo/redogestures.

    That bit about the "protected garden" is pure rubbish – iOS provides separate APIs for the Pencil and apps already make use of that.

    By the way: Palm rejection (in apps where you can't disable finger touch drawing on the canvas) can be trained to some degree. if you're setting your palm clearly on the glass with a larger area touching the surface, it works best. Avoid just light touches with a knuckle of your pinkie finger, for instance (which is when palm rejection can't distinguish it from an intended finger touch), but actually fully rest your hand on the glass for drawing and trust palm rejection to filter that out.
  • VictorBd - Tuesday, January 26, 2016 - link

    Glad to hear that Procreate has done it right. Users will benefit greatly if other apps follow suit. Until they do (and many likely will not)

    On my other point, I think it unlikely that Apple will either provide or allow others to provide (in the controlled garden of the app store) a utility that toggles the touch input off and on while using the pen. If you haven't used a pen tablet with this feature it may not be obvious at first. But many of those who do discover and use it find it to be a "game changer." All of a sudden your tablet can be handled like a physical piece of paper without any concern for unintended touch inputs. It is the first thing I install on a pen tablet. If iPad Pro had it the experience for me would greatly improve. But I predict that Apple won't allow it. But there is much to the iPad Pro to love no doubt.
  • VictorBd - Tuesday, January 26, 2016 - link

    EDIT: "Until they do.... I don't think Apple will allow the touch toggle ability....."
  • Constructor - Tuesday, January 26, 2016 - link

    Your theory is completely wrong. Apple doesn't "disallow" anything!

    Any app can distinguish between passive finger touch and active Pencil dtection at their own discretion. The APIs already provide that distinction, and I have no idea where you get that idea from that Apple would have any interest to interfere with that.

    Again: In Procreate I can simultaneously draw with the Pencil and during the same time move the size slider with a finger while the Pencil keeps drawing – there is no "toggling" of any kind.

    It is purely on the application to decide how to treat fingers on the one hand (ahem) and the Pencil on the other – and both are clearly distinguishable at the same time!

Log in

Don't have an account? Sign up now