SoC Analysis: CPU Performance

Now that we’ve looked at A9X’s design and touched on the differences between the x86 and ARM ISAs, let’s take a look at A9X’s performance at a lower level.

From a CPU perspective A9X is just a higher clocked implementation of the dual-core Twister CPU design we first saw on A9 last year. As a result the fundamentals of the CPU architecture have not changed relative to A9. However A9X relative to A8X drops down from three CPU cores to two, so among the factors we’ll want to look at is how Apple has been impacted by dropping down to two faster cores.

We’ll start things off with Geekbench 3, which gives us a fairly low-level look at CPU performance.

Geekbench 3 - Integer Performance
| Test | A9X | A8X | % Advantage |
|---|---|---|---|
| AES ST | 1.17 GB/s | 0.98 GB/s | 19% |
| AES MT | 2.85 GB/s | 3.16 GB/s | -10% |
| Twofish ST | 120.7 MB/s | 64.0 MB/s | 89% |
| Twofish MT | 228.3 MB/s | 182.7 MB/s | 25% |
| SHA1 ST | 1.03 GB/s | 0.53 GB/s | 94% |
| SHA1 MT | 1.95 GB/s | 1.48 GB/s | 32% |
| SHA2 ST | 205.8 MB/s | 119.1 MB/s | 73% |
| SHA2 MT | 395.5 MB/s | 330.6 MB/s | 20% |
| BZip2Comp ST | 8.95 MB/s | 5.71 MB/s | 57% |
| BZip2Comp MT | 17.0 MB/s | 16.6 MB/s | 2% |
| BZip2Decomp ST | 14.7 MB/s | 8.98 MB/s | 64% |
| BZip2Decomp MT | 28.1 MB/s | 25.2 MB/s | 12% |
| JPG Comp ST | 33.7 MP/s | 20.6 MP/s | 64% |
| JPG Comp MT | 64.4 MP/s | 60.8 MP/s | 6% |
| JPG Decomp ST | 89.2 MP/s | 53.0 MP/s | 68% |
| JPG Decomp MT | 166.5 MP/s | 153.9 MP/s | 8% |
| PNG Comp ST | 2.11 MP/s | 1.35 MP/s | 56% |
| PNG Comp MT | 4.04 MP/s | 3.82 MP/s | 6% |
| PNG Decomp ST | 31.5 MP/s | 18.7 MP/s | 68% |
| PNG Decomp MT | 56.9 MP/s | 56.3 MP/s | 1% |
| Sobel ST | 138.3 MP/s | 82.5 MP/s | 68% |
| Sobel MT | 258.7 MP/s | 225.6 MP/s | 15% |
| Lua ST | 3.25 MB/s | 1.68 MB/s | 93% |
| Lua MT | 6.02 MB/s | 4.60 MB/s | 31% |
| Dijkstra ST | 10.1 Mpairs/s | 6.70 Mpairs/s | 51% |
| Dijkstra MT | 17.6 Mpairs/s | 16.0 Mpairs/s | 10% |


The interesting thing about Geekbench is that as a result of being a lower-level test the bulk of its tests scale up well with CPU core counts, as the benchmark can just spawn more threads. Consequently I wasn’t entirely sure what to expect here, as this presents the tri-core A8X with a much better than average scaling opportunity, making it especially harsh on the A9X.

But what the results show us is that even by dropping back down to two CPU cores, A9X does very well overall. The single-threaded results are greatly improved, with A9X offering better than a 50% single-threaded perf gain in the majority of the sub-tests. Meanwhile even with the multi-threaded tests, A9X only loses once, on AES. Otherwise two higher clocked Twister cores are beating three lower clocked Typhoon cores by anywhere between a few percent up to 32%. In this sense Geekbench is something of a worst-case scenario, as real-world software rarely benefits from additional cores this well (this being part of the reason why A8 and A9 did so well relative to quad Cortex-A57 designs), so it’s promising to see that even in this worst-case scenario A9X can deliver meaningful performance gains over A8X.
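As a rough illustration of how the % Advantage column and the core-scaling argument can be checked, the sketch below recomputes the advantage for a few rows of the integer table and derives the multi-threaded over single-threaded speedup for each chip. The function names are illustrative only, and because the inputs are the rounded table values, results can differ from the published column by a point.

```python
# A few rows copied from the integer table above; units cancel in the ratios.
# Each entry: (A9X ST, A9X MT, A8X ST, A8X MT)
SCORES = {
    "Twofish": (120.7, 228.3, 64.0, 182.7),
    "SHA1": (1.03, 1.95, 0.53, 1.48),
    "Lua": (3.25, 6.02, 1.68, 4.60),
    "Dijkstra": (10.1, 17.6, 6.70, 16.0),
}


def advantage_pct(a9x, a8x):
    """Percentage by which the A9X score exceeds the A8X score."""
    return (a9x / a8x - 1.0) * 100.0


def mt_scaling(st, mt):
    """How many times faster the multi-threaded run is than the single-threaded run."""
    return mt / st


for name, (a9x_st, a9x_mt, a8x_st, a8x_mt) in SCORES.items():
    print(
        f"{name:9s} ST adv {advantage_pct(a9x_st, a8x_st):5.1f}%  "
        f"MT adv {advantage_pct(a9x_mt, a8x_mt):5.1f}%  "
        f"scaling A9X {mt_scaling(a9x_st, a9x_mt):.2f}x (2 cores)  "
        f"A8X {mt_scaling(a8x_st, a8x_mt):.2f}x (3 cores)"
    )
```

In these four rows the two Twister cores scale by roughly 1.7x to 1.9x and the three Typhoon cores by roughly 2.4x to 2.9x, so A9X’s multi-threaded lead comes from its much larger single-threaded gain rather than from better scaling.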

Geekbench 3 - Floating Point Performance
| Test | A9X | A8X | % Advantage |
|---|---|---|---|
| BlackScholes ST | 14.9 Mnodes/s | 8.52 Mnodes/s | 75% |
| BlackScholes MT | 28.2 Mnodes/s | 24.9 Mnodes/s | 13% |
| Mandelbrot ST | 2.23 GFLOPS | 1.27 GFLOPS | 76% |
| Mandelbrot MT | 4.27 GFLOPS | 3.66 GFLOPS | 17% |
| Sharpen Filter ST | 2.10 GFLOPS | 1.08 GFLOPS | 94% |
| Sharpen Filter MT | 4.01 GFLOPS | 3.12 GFLOPS | 29% |
| Blur Filter ST | 2.68 GFLOPS | 1.53 GFLOPS | 75% |
| Blur Filter MT | 5.08 GFLOPS | 4.47 GFLOPS | 14% |
| SGEMM ST | 6.77 GFLOPS | 4.12 GFLOPS | 64% |
| SGEMM MT | 12.7 GFLOPS | 11.6 GFLOPS | 9% |
| DGEMM ST | 3.32 GFLOPS | 2.02 GFLOPS | 64% |
| DGEMM MT | 6.21 GFLOPS | 5.61 GFLOPS | 11% |
| SFFT ST | 3.52 GFLOPS | 1.92 GFLOPS | 83% |
| SFFT MT | 6.67 GFLOPS | 5.40 GFLOPS | 24% |
| DFFT ST | 3.21 GFLOPS | 1.80 GFLOPS | 78% |
| DFFT MT | 6.02 GFLOPS | 5.11 GFLOPS | 18% |
| N-Body ST | 1.41 Mpairs/s | 0.78 Mpairs/s | 81% |
| N-Body MT | 2.69 Mpairs/s | 2.34 Mpairs/s | 15% |
| Ray Trace ST | 4.99 MP/s | 2.96 MP/s | 69% |
| Ray Trace MT | 9.56 MP/s | 8.64 MP/s | 11% |

The story with Geekbench 3 floating point performance is much the same. Performance never regresses, even in multi-threaded workloads. In lightly threaded floating point workloads A9X is going to walk all over A8X, and in multi-threaded workloads we’re still looking at anywhere between a 9% and a 29% performance gain. This goes to show just how powerful Twister is relative to Typhoon, especially with A9X’s much higher clockspeeds factored in. And it lends a lot of support to Apple’s ongoing design philosophy of favoring a smaller number of high performance (and now higher-clocked) cores.

SPEC CPU 2006

Moving on, our other lower-level benchmark for this review is SPECint2006. Developed by the Standard Performance Evaluation Corporation, SPECint2006 is the integer component of their larger SPEC CPU 2006 benchmark. As was the case with SPEC CPU 2000 before it, SPEC CPU 2006 is designed by a committee of technology firms to offer a consistent and meaningful cross-platform benchmark that can compare systems of different performance levels and architectures. Among cross-platform benchmarks SPEC CPU is generally held in high regard, and while it is but one collection of benchmarks and like all benchmarks should not be taken as the be-all end-all of benchmarks on its own, it provides us with a very important look at CPU performance that we otherwise cannot get.

SPECint2006 is the successor to the SPECint2000 test we’ve been using periodically for the last couple of years now. Initially released in 2006, SPECint2006 is still SPEC’s current-generation CPU integer benchmark. We’ve wanted to switch to SPECint2006 for some time now, but have been held back by the overall low performance of tablet SoCs, which lacked the speed and memory to run SPECint2006 and to do so in a reasonable amount of time. However now thanks to the greater performance and greater memory of A9X, we’re finally able to run SPEC’s current-generation CPU benchmark on a tablet.

SPECint2006 is composed of 12 sub-benchmarks, testing a wide variety of scenarios from video compression to Perl execution to AI. This is a non-graphical benchmark, and I believe it’s reasonable to argue that the benchmark set itself leans towards server, high performance computing, and workstation use cases. With that said, even if it’s not a perfect fit for tablet use cases, it offers a lot of real-world tests that give us a good variety of different workloads to benchmark CPUs with. SPECint2006 scores are in turn reported as a ratio, measuring how many times faster a tested system is than the SPEC reference system, a 1997 Sun Ultra Enterprise 2 server based around a 296 MHz UltraSPARC II CPU.
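To make the ratio scoring concrete, here is a minimal sketch of how a SPEC-style score is formed: each sub-benchmark’s ratio is the reference machine’s run time divided by the tested system’s run time, and an overall score is the geometric mean of those ratios. The benchmark names and run times below are made-up placeholders, not SPEC’s published reference times and not measurements from this review.

```python
import math


def spec_ratio(reference_seconds, measured_seconds):
    """How many times faster the tested system is than the reference machine."""
    return reference_seconds / measured_seconds


def geometric_mean(values):
    """Geometric mean, the averaging SPEC uses to combine per-benchmark ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) run times in seconds, for illustration only.
runs = {
    "bench_a": (9000.0, 450.0),
    "bench_b": (8000.0, 500.0),
    "bench_c": (12000.0, 400.0),
}

ratios = {name: spec_ratio(ref, measured) for name, (ref, measured) in runs.items()}
overall = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}")
print(f"overall (geometric mean): {overall:.2f}")
```

The geometric mean keeps any single sub-benchmark with an unusually high ratio from dominating the overall figure the way it would in an arithmetic mean.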

CINT2006 (Integer Component of SPEC CPU2006):
| Benchmark | Language | Application Area | Description |
|---|---|---|---|
| 400.perlbench | C | Programming Language | Derived from Perl V5.8.7. The workload includes SpamAssassin, MHonArc (an email indexer), and specdiff (SPEC's tool that checks benchmark outputs). |
| 401.bzip2 | C | Compression | Julian Seward's bzip2 version 1.0.3, modified to do most work in memory, rather than doing I/O. |
| 403.gcc | C | C Compiler | Based on gcc Version 3.2, generates code for Opteron. |
| 429.mcf | C | Combinatorial Optimization | Vehicle scheduling. Uses a network simplex algorithm (which is also used in commercial products) to schedule public transport. |
| 445.gobmk | C | Artificial Intelligence: Go | Plays the game of Go, a simply described but deeply complex game. |
| 456.hmmer | C | Search Gene Sequence | Protein sequence analysis using profile hidden Markov models (profile HMMs). |
| 458.sjeng | C | Artificial Intelligence: Chess | A highly-ranked chess program that also plays several chess variants. |
| 462.libquantum | C | Physics / Quantum Computing | Simulates a quantum computer, running Shor's polynomial-time factorization algorithm. |
| 464.h264ref | C | Video Compression | A reference implementation of H.264/AVC, encodes a videostream using 2 parameter sets. The H.264/AVC standard is expected to replace MPEG2. |
| 471.omnetpp | C++ | Discrete Event Simulation | Uses the OMNet++ discrete event simulator to model a large Ethernet campus network. |
| 473.astar | C++ | Path-finding Algorithms | Pathfinding library for 2D maps, including the well known A* algorithm. |
| 483.xalancbmk | C++ | XML Processing | A modified version of Xalan-C++, which transforms XML documents to other document types. |

Although designed as a CPU-intensive benchmark, it’s important to note that SPECint2006 is officially labeled as “stressing a system's processor, memory subsystem and compiler.” The memory subsystem aspect is fairly self-explanatory – it’s difficult to test a CPU without testing the memory as well except in the cases of trivial workloads that can fit in a CPU’s caches – however the compiler aspect calls for special attention. As SPECint2006 is a cross-platform benchmark in the truest sense of the word, it’s impossible to offer a single binary for all platforms – especially platforms that had yet to be designed in 2006 such as ARMv8 – and, simply put, the moment you begin compiling benchmarks for different systems using different compilers, the performance of the compiler becomes a factor of benchmark performance as well.

As a result, and unlike many of the other benchmarks we run here, it’s important to note that compilers play a big part in SPECint2006 performance, and this is by design. Compiler authors can and do optimize for SPEC CPU, with the ultimate goal of giving the tested CPU the best chance to achieve the best possible performance in this benchmark; the compiler should not hold back the CPU. However in turn, all results must be validated, so overly aggressive compilers that generate bad code will be caught and failed. The end result is that in a cross-platform scenario with different binaries, SPECint2006 isn’t quite as apples-to-apples as our more traditional benchmarks, but it offers us a unique look at cross-platform CPU performance.

For our testing we’re using optimized binaries generated for Apple’s A8X/A9X SoCs and Intel’s Broadwell/Skylake processors respectively. The following compiler flags were used.

Apple ARMv8: Xcode 7 (LLVM), -Ofast

Intel x86: Intel C++ Compiler 16, -xCORE-AVX2 -ipo -mdynamic-no-pic -O3 -no-prec-div -fp-model fast=2 -m32 -opt-prefetch -ansi-alias -stdlib=libstdc++

Finally, of SPECint2006’s 12 sub-benchmarks, our current harness is only able to run 10 of them on the iPad Pro at this time, as 473.astar and 483.xalancbmk are failing on the iPad. So the following is not a complete run of SPECint2006, and for the purposes of SPEC CPU these results are officially classified as performance estimates.

To start things off, let’s look at the Apple-to-Apple comparison, pitting A9X against A8X.

SPECint_base2006 - Estimated Scores - A9X vs. A8X
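| Benchmark | A9X | A8X | A9X vs. A8X % |
|---|---|---|---|
| 400.perlbench | 25.0 | 14.1 | 78% |
| 401.bzip2 | 17.6 | 11.5 | 54% |
| 403.gcc | 20.5 | 12.4 | 65% |
| 429.mcf | 18.7 | N/A | N/A |
| 445.gobmk | 23.4 | 13.0 | 80% |
| 456.hmmer | 25.1 | 14.1 | 79% |
| 458.sjeng | 23.6 | 13.6 | 73% |
| 462.libquantum | 74.6 | 49.2 | 52% |
| 464.h264ref | 41.3 | 24.0 | 72% |
| 471.omnetpp | 10.3 | 8.0 | 29% |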

Unsurprisingly, A9X is leaps and bounds ahead here. The smallest gain is with 471.omnetpp, a discrete event simulator, where A9X holds a 29% lead. Otherwise A9X takes a significant lead, beating A8X by upwards of 80% in 445.gobmk, a Go (board game) AI benchmark.

Calling back to our iPhone 6s review for a moment, A9X has a much larger advantage vs. A8X with SPECint2006 as compared to A9 vs. A8 on SPECint2000. A good deal of this has to do with A9X’s significant clockspeed bump versus A8X, but at the same time this also illustrates how the newer SPECint2006 rates A9X and Twister even more highly than A8X/Typhoon. As we’ve seen time and time again, Twister is a much faster CPU core than the already fast Typhoon, and this is a big part of why Apple continues to top our ARM benchmarks.
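One rough way to separate the clockspeed bump from the architectural gain is to divide each score by its chip’s clock before comparing. The sketch below does this for a few rows of the table above. It uses the 2.26GHz figure listed for A9X in the frequency row later in this section, and it assumes a nominal 1.5GHz clock for A8X, a figure that is not stated in this section. The function name is illustrative only.

```python
A9X_GHZ = 2.26  # listed for A9X in the Base/Turbo Freq row later in this section
A8X_GHZ = 1.5   # assumption: nominal A8X clock, not stated in this section

# (A9X, A8X) estimated SPECint_base2006 scores copied from the table above.
SCORES = {
    "400.perlbench": (25.0, 14.1),
    "445.gobmk": (23.4, 13.0),
    "462.libquantum": (74.6, 49.2),
    "471.omnetpp": (10.3, 8.0),
}


def per_clock_ratio(a9x_score, a8x_score):
    """A9X-to-A8X score ratio after dividing each score by its nominal clock."""
    return (a9x_score / A9X_GHZ) / (a8x_score / A8X_GHZ)


clock_ratio = A9X_GHZ / A8X_GHZ
print(f"clock ratio: {clock_ratio:.2f}x")
for name, (a9x, a8x) in SCORES.items():
    print(f"{name}: raw {a9x / a8x:.2f}x, per clock {per_clock_ratio(a9x, a8x):.2f}x")
```

A per-clock ratio above 1.0 points to a gain beyond clockspeed alone, while a value below 1.0 suggests the test is limited by something other than the CPU core’s frequency. Sustained clocks can differ from nominal ones, so this is only a rough picture.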

Last but certainly not least however is our main event, A9X versus Intel’s Core M CPUs. As we’re finally able to run SPECint2006 on an Apple SoC, this is the first chance we’ve had to compare Apple and Intel CPUs using SPEC, so it’s exciting to finally be able to make this comparison.

At the same time this comparison is not just for academic curiosity; as Apple has significantly improved their CPU design with every generation and has quickly moved to newer manufacturing processes, they have been closing the architecture and manufacturing gap with Intel. Twister and Skylake are fairly similar designs, both implementing a wide execution pipeline with a focus on achieving a high IPC, and in this latest generation of devices, coupling that with a fairly high 2GHz+ clockspeed. Over the years Apple and Intel have approached this problem from different angles – Apple built up from phones to tablets while Intel built down from desktops to tablets – but the end result is that the two have ended up in a similar place in terms of basic architecture design goals. Meanwhile from a manufacturing standpoint Intel is arguably still roughly a generation ahead with their 14nm FinFET process – naming aside, their transistors are smaller than TSMC’s 16nm FinFET – so Apple is the underdog from this point of view.

The burning question is of course whether Apple’s CPU designs are catching up to the performance of Intel’s Core lineup, thanks to the continual iteration of architecture and manufacturing on the Apple side, versus the slower rate of growth we’ve seen over the last few generations with Intel’s Core lineup. The iPad Pro in turn finally gives us the opportunity to try to answer that question, as the faster SoC coupled with a form factor and TDP closer to regular Core M devices gives us the most apples-to-apples comparison yet.

To that end we have assembled a smorgasbord of Core M devices to compare to the iPad Pro and A9X SoC. Perhaps the most apple-to-apple comparison is the iPad Pro versus the 2015 MacBook; though approaching a year old, this is still Apple’s current generation MacBook, with our base model incorporating an older Broadwell-based Core M-5Y31. Also from the Broadwell generation we have an ASUS Transformer Book T300 Chi, which uses a high-end Core M-5Y71, to showcase the performance of Intel’s highest clocked Core M processors. Finally, from the latest Skylake generation we have the ASUS ZenBook UX305CA, which incorporates Intel’s base-tier Core m3-6Y30 CPU.

Finally, it should be noted that to keep testing as close as possible, all of these devices are passively cooled, and that as a result all of these devices are also TDP/heat throttling through much of the SPECint2006 benchmark. Ultimately what we’re measuring here is not the peak performance of each system, but rather its sustained performance under the TDP limitations of their respective designs. If unrestricted, undoubtedly all of these devices would score higher.

SPECint_base2006 - Estimated Scores - A9X vs. Intel Broadwell/Skylake
| Benchmark | A9X | Core M-5Y31 (2015 MacBook) | Core M-5Y71 (Asus T300 Chi) | Core m3-6Y30 (Asus UX305CA) | A9X vs MacBook % |
|---|---|---|---|---|---|
| Base/Turbo Freq | 2.26GHz | 0.9/2.4GHz | 1.2/2.9GHz | 0.9/2.2GHz | |
| 400.perlbench | 25.0 | 21.7 | 28.5 | 24.4 | 15% |
| 401.bzip2 | 17.6 | 14.6 | 19.6 | 15.3 | 21% |
| 403.gcc | 20.5 | 22.8 | 31.1 | 28.2 | -10% |
| 429.mcf | 18.7 | 35.9 | 46.7 | 38 | -48% |
| 445.gobmk | 23.4 | 16.9 | 23.7 | 18 | 38% |
| 456.hmmer | 25.1 | 43.9 | 61.9 | 48.1 | -43% |
| 458.sjeng | 23.6 | 19.2 | 26.1 | 19.3 | 23% |
| 462.libquantum | 74.6 | 292 | 476 | 409 | -74% |
| 464.h264ref | 41.3 | 38.4 | 49.7 | 37.3 | 8% |
| 471.omnetpp | 10.3 | 16.3 | 23.7 | 20.6 | -37% |

As this is a fairly dense lineup I’m not going to call out every figure, but let’s focus on a few key areas. First, on A9X versus the Core M-5Y31 (MacBook), the advantage flips between each device as each test hits upon different strengths and weaknesses of each CPU’s architecture. Overall each device wins half of the benchmarks, however the Core M powered MacBook wins by a larger average margin. In other words, the iPad Pro is competitive with the MacBook depending on the test, however on average it ends up trailing in performance.
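The "even split, but larger average margin" observation can be checked with a short tally. This sketch counts wins per side and takes the geometric mean of the A9X-to-MacBook score ratios. Using the geometric mean here is my own choice for illustration, borrowed from how SPEC aggregates ratios, and with two sub-benchmarks missing it should not be read as an official SPEC score.

```python
import math

# (A9X, Core M-5Y31) estimated scores copied from the table above.
SCORES = {
    "400.perlbench": (25.0, 21.7),
    "401.bzip2": (17.6, 14.6),
    "403.gcc": (20.5, 22.8),
    "429.mcf": (18.7, 35.9),
    "445.gobmk": (23.4, 16.9),
    "456.hmmer": (25.1, 43.9),
    "458.sjeng": (23.6, 19.2),
    "462.libquantum": (74.6, 292.0),
    "464.h264ref": (41.3, 38.4),
    "471.omnetpp": (10.3, 16.3),
}

# Ratio above 1.0 means A9X scored higher on that sub-benchmark.
ratios = {name: a9x / core_m for name, (a9x, core_m) in SCORES.items()}

a9x_wins = sum(1 for r in ratios.values() if r > 1.0)
core_m_wins = len(ratios) - a9x_wins
geomean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))

print(f"A9X wins: {a9x_wins}, Core M-5Y31 wins: {core_m_wins}")
print(f"geometric mean of A9X / Core M-5Y31 ratios: {geomean:.2f}")
```

The win count comes out even, while the geometric mean lands below 1.0, which matches the reading that the MacBook’s wins are larger than the iPad Pro’s.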

Relative to the MacBook, the iPad Pro does best in 445.gobmk, the Go benchmark, while its largest deficit is with 462.libquantum. The latter is a particularly interesting case as the benchmark is very easy to vectorize, giving us perhaps our best look at the vector performance of Twister versus Broadwell, and how well their respective compilers can actually vectorize it. The end result has the Intel platforms solidly in the lead here, hinting that Intel still has better vector performance at this time.

Shifting gears to the Asus ZenBook UX305CA and its newer Skylake based Core m3-6Y30, to little surprise Skylake closes the gap with A9X in the benchmarks where Core M was losing, and pulls further ahead in the benchmarks where it was winning. Despite this the two systems split the number of wins at 5 each, but in the cases where the ZenBook is winning it’s very clearly winning. Skylake sees some decent performance improvements relative to the Broadwell CPU in our MacBook – with the exact gains depending on the test – allowing it to widen the gap compared to the A9X. Overall A9X is still competitive in specific scenarios, but on average it definitely trails the Skylake Core m3.

Finally, going back to Broadwell we have the ASUS Transformer Book T300 Chi, which incorporates a high-end Core M-5Y71 processor. This is still officially a 4.5W TDP processor, and as a result this essentially measures Broadwell Core M’s best case performance. With a maximum CPU clockspeed of 2.9GHz as compared to the slower low-end Skylake and Broadwell CPUs, the T300 Chi unsurprisingly beats the iPad Pro in every single benchmark. At best the two are neck-and-neck with Apple’s best benchmark, 445.gobmk, but otherwise it’s a clear and very significant lead for Intel’s fastest Broadwell Core M processor.

In the end, what to take away from this depends on how you want to read the results and what you believe the most important CPU comparison is. As Apple doesn’t use multiple bins/clockspeeds of A9X processors, this muddles the comparison some since there’s a significant difference in performance between Intel’s fastest and slowest Core M processors, and at the same time Intel’s official list prices put every CPU except the top-bin Core m7-6Y75 at the same price of $281.

Ultimately I think it’s reasonable to say that Intel’s Core M processors hold a CPU performance edge over iPad Pro and the A9X SoC. Against Intel’s slowest chips A9X is competitive, but as it stands A9X can’t keep up with the faster chips. However by the same metric there’s no question that Apple is closing the gap; A9X can compete with both Broadwell and Skylake Core M processors, and that’s something Apple couldn’t claim even a generation ago. That it’s only against the likes of Core m3 means that Apple still has a way to go, particularly as A9X still loses by more than it wins, but it’s significant progress in a short period of time. And I’ll wager that it’s closer than Intel would like to be, especially if Apple puts A9X into a cheaper iPad Air in the future.

408 Comments

  • lilmoe - Friday, January 22, 2016 - link

    ok......
  • Sc0rp - Friday, January 22, 2016 - link

    Well, I have to disagree with you on one thing here. I don't think Apple has any blame here when it comes to software. iOS9 is faaaaaaaar more powerful and capable than Mac OS 8 and 9 that I used to run on my power PC's back in the late 90's. Those computers were certainly productive. There's nothing on a software level that's really stopping developers from making productive software for the iPad Pro or even the Air. There is an interface challenge, much as there was an interface challenge when GUI's first came out. As I recall, people lambasted GUI's and mouses as being toys and not for serious work back then. The endless whining over the iPad Pro is just a reverberation of that. People don't like change and they don't like things that rub against their doctrine. But, consider this... While many adults actually have some difficulty adapting to this new computing paradigm, youngsters adapt to it like a fish to water.

    I think it is a wild boast to call an iPad Pro a 'useless toy'. I certainly have made a ton of use of mine. Of course, I'm an artist so there's that. Not to mention that my iPads have been my primary communication hub for the last five years.
  • Jumangi - Friday, January 22, 2016 - link

    iOS blows as an actual productivity system. It is made for smartphones first (Apple's cash cow) and everything else second. Put a version of Mac OSX on this and you have something. Right now this is an expensive artist's toy.
  • strangis - Friday, January 22, 2016 - link

    > While many adults actually have some difficulty adapting to this new computing paradigm, youngsters adapt to it like a fish to water.

    That's why I, as someone of the Commodore Vic 20 era, have to show relatives and clients 25 years younger than me how to use their phones, tablets and computers every week. Regardless of age, some people get it, some don't.

    Similarly, I've never seen the value of an iPad Pro when, as an artist, I need to finish in Photoshop or After Effects. The creative tools available on the iPad Pro are limiting for those of us used to more, and considering its price, better to buy something that will get the job done.
  • Murloc - Saturday, January 23, 2016 - link

    I have no doubt people will only use tablets once they're able to interact with the interface with their brains.
  • Relic74 - Saturday, February 27, 2016 - link

    Yea but at least Mac OS had a proper file-system, allowed its users to select their own default apps, apps didn't require APIs in order to talk to the system, all applications used the same resolution, when a new feature was added to the system every app was able to utilize it immediately and didn't require its developer to update their apps, the user was able to customize their desktop and even the UI, supported widgets, applications were windowed and ran desktop software. Actually, I take it back, Mac OS's UI was a lot more powerful, the system not so much, which is reversed in iOS, the UI isn't very powerful, it's actually pretty vanilla, though its BSD underpinnings are extremely powerful. If I was able to access the BSD system, I would dump iOS's UI in a heartbeat and install an X desktop environment like Gnome 3, which actually works fairly well as a tablet OS. Then maybe the iPad Pro would actually be a Pro device. I'm running Arch Linux on a Xiaomi MiPad 2, love it.
  • NEDM64 - Friday, January 22, 2016 - link

    Dude!

    If you were in the 80's, you'd be advocating text user interfaces instead of graphical user interfaces.

    If you were in the 70's, you'd be advocating separate terminals connected to computers, as opposed to "all-in-ones" or "intelligent terminals" like the Apple II, Commodore PET, TRS-80.

    Opinions like yours, with due respect, don't matter, because people like you, already have their rigs in place, and aren't in the market.

    Apple's market position is for people that want the next thing, not the same ol' thing…
  • RafaelHerschel - Saturday, January 23, 2016 - link

    Apparently the next thing is a larger iPad. I'm going to be bold and predict the next next thing. It's going to be a slightly thinner version of the larger iPad. Awesome.
  • Murloc - Saturday, January 23, 2016 - link

    you aren't understanding lilmoe's posts.

    You can spend millions developing software for a superpowerful tablet.

    You will still never be able to fit Photoshop's whole interface and abundance of options and menus into the tablet in a way that the user is easily able to reach them, without scrolling through pages of big buttons.

    At the end of the day, you'll get a crippled version of photoshop and the user will have to get on a traditional computer (a WORKstation, not because it's more powerful, not because software houses invest more in it, but because it has human interaction devices and a big screen that enable humans to get work done faster) to get stuff done.

    Tablets are mostly content consumption products exactly because of the limited interfaces. They have the advantage of portability and ease of use, you just open apps while on the couch, and that's why they master content consumptions better than say laptops.
  • Constructor - Saturday, January 23, 2016 - link

    It's by now become a quasi-religious belief system for some that "mobile devices cannot ever be used for any professional purposes whatsoever!".

    At the same time more and more people (and businesses!) don't care about such beliefs in the slightest and simply use those devices very much professionally and in many cases with more success and higher productivity than they'd had with conventional computers.

    Part of the reason is that agility and flexibility often beats feature count, all the more so since professional workflows very often just can't afford to even consider most of the myriad theoretical options some desktop programs offer. Heck, most professional uses actually don't need much more than a browser interface anyway!

    Yes, there are some uses for which desktop or mainframe computers will be the only really viable option. But what you and many others didn't seem to have noticed is that those domains have been shrinking rapidly over the last decade(s).
