SoC Analysis: CPU Performance

Now that we’ve had a chance to look at A9X’s design and at some of the differences between the x86 and ARM ISAs, let’s take a look at A9X’s performance at a lower level.

From a CPU perspective A9X is just a higher clocked implementation of the dual-core Twister CPU design we first saw on A9 last year. As a result the fundamentals of the CPU architecture have not changed relative to A9. However, relative to A8X, A9X drops from three CPU cores to two, so one of the things we’ll want to look at is how performance is affected by moving to two faster cores.

We’ll start things off with Geekbench 3, which gives us a fairly low-level look at CPU performance.

Geekbench 3 - Integer Performance
Sub-test         A9X              A8X              % Advantage
AES ST           1.17 GB/s        0.98 GB/s        19%
AES MT           2.85 GB/s        3.16 GB/s        -10%
Twofish ST       120.7 MB/s       64.0 MB/s        89%
Twofish MT       228.3 MB/s       182.7 MB/s       25%
SHA1 ST          1.03 GB/s        0.53 GB/s        94%
SHA1 MT          1.95 GB/s        1.48 GB/s        32%
SHA2 ST          205.8 MB/s       119.1 MB/s       73%
SHA2 MT          395.5 MB/s       330.6 MB/s       20%
BZip2Comp ST     8.95 MB/s        5.71 MB/s        57%
BZip2Comp MT     17.0 MB/s        16.6 MB/s        2%
BZip2Decomp ST   14.7 MB/s        8.98 MB/s        64%
BZip2Decomp MT   28.1 MB/s        25.2 MB/s        12%
JPG Comp ST      33.7 MP/s        20.6 MP/s        64%
JPG Comp MT      64.4 MP/s        60.8 MP/s        6%
JPG Decomp ST    89.2 MP/s        53.0 MP/s        68%
JPG Decomp MT    166.5 MP/s       153.9 MP/s       8%
PNG Comp ST      2.11 MP/s        1.35 MP/s        56%
PNG Comp MT      4.04 MP/s        3.82 MP/s        6%
PNG Decomp ST    31.5 MP/s        18.7 MP/s        68%
PNG Decomp MT    56.9 MP/s        56.3 MP/s        1%
Sobel ST         138.3 MP/s       82.5 MP/s        68%
Sobel MT         258.7 MP/s       225.6 MP/s       15%
Lua ST           3.25 MB/s        1.68 MB/s        93%
Lua MT           6.02 MB/s        4.60 MB/s        31%
Dijkstra ST      10.1 Mpairs/s    6.70 Mpairs/s    51%
Dijkstra MT      17.6 Mpairs/s    16.0 Mpairs/s    10%

The interesting thing about Geekbench is that, because it is a lower-level test, the bulk of its sub-tests scale well with CPU core count, as the benchmark can simply spawn more threads. Consequently I wasn’t entirely sure what to expect here, as this gives the tri-core A8X a much better than average opportunity to scale, making Geekbench an especially harsh test for the dual-core A9X.

But what the results show us is that even after dropping back down to two CPU cores, A9X does very well overall. The single-threaded results are greatly improved, with A9X offering a better than 50% single-threaded performance gain in every sub-test except AES (19%). Meanwhile, even in the multi-threaded tests A9X only loses once, on AES. Otherwise two higher clocked Twister cores beat three lower clocked Typhoon cores by anywhere from 1% to 32%. In this sense Geekbench is something of a worst-case scenario, as real-world software rarely benefits from additional cores this well (this being part of the reason why A8 and A9 did so well relative to quad Cortex-A57 designs), so it’s promising to see that even in this worst-case scenario A9X can deliver meaningful performance gains over A8X.
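To make the single-threaded versus multi-threaded trade-off concrete, the short Python sketch below recomputes the "% Advantage" column for three integer sub-tests and also derives each chip's multi-threaded scaling factor (MT score divided by ST score). The scores are copied from the table above; the dictionary layout and the helper names `advantage_pct` and `mt_scaling` are illustrative only.

```python
"""Sketch: recompute Geekbench 3 "% Advantage" figures and MT scaling.

Scores are copied from the integer table; helper names are hypothetical.
"""

# sub-test: (A9X ST, A9X MT, A8X ST, A8X MT), in the sub-test's own units
GB3_INT = {
    "Sobel": (138.3, 258.7, 82.5, 225.6),   # MP/s
    "Lua": (3.25, 6.02, 1.68, 4.60),        # MB/s
    "Dijkstra": (10.1, 17.6, 6.70, 16.0),   # Mpairs/s
}


def advantage_pct(a9x: float, a8x: float) -> int:
    """Percent by which A9X leads (positive) or trails (negative) A8X."""
    return round((a9x / a8x - 1.0) * 100)


def mt_scaling(st: float, mt: float) -> float:
    """Multi-threaded score expressed as a multiple of the single-threaded score."""
    return mt / st


for name, (a9x_st, a9x_mt, a8x_st, a8x_mt) in GB3_INT.items():
    print(
        f"{name:9s} ST {advantage_pct(a9x_st, a8x_st):+d}%, "
        f"MT {advantage_pct(a9x_mt, a8x_mt):+d}%, "
        f"scaling A9X {mt_scaling(a9x_st, a9x_mt):.2f}x on 2 cores, "
        f"A8X {mt_scaling(a8x_st, a8x_mt):.2f}x on 3 cores"
    )
```

In these three sub-tests the third Typhoon core lets A8X scale further than A9X can, yet A9X's much higher per-core throughput still keeps it ahead in the multi-threaded results.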

Geekbench 3 - Floating Point Performance
Sub-test           A9X             A8X             % Advantage
BlackScholes ST    14.9 Mnodes/s   8.52 Mnodes/s   75%
BlackScholes MT    28.2 Mnodes/s   24.9 Mnodes/s   13%
Mandelbrot ST      2.23 GFLOPS     1.27 GFLOPS     76%
Mandelbrot MT      4.27 GFLOPS     3.66 GFLOPS     17%
Sharpen Filter ST  2.10 GFLOPS     1.08 GFLOPS     94%
Sharpen Filter MT  4.01 GFLOPS     3.12 GFLOPS     29%
Blur Filter ST     2.68 GFLOPS     1.53 GFLOPS     75%
Blur Filter MT     5.08 GFLOPS     4.47 GFLOPS     14%
SGEMM ST           6.77 GFLOPS     4.12 GFLOPS     64%
SGEMM MT           12.7 GFLOPS     11.6 GFLOPS     9%
DGEMM ST           3.32 GFLOPS     2.02 GFLOPS     64%
DGEMM MT           6.21 GFLOPS     5.61 GFLOPS     11%
SFFT ST            3.52 GFLOPS     1.92 GFLOPS     83%
SFFT MT            6.67 GFLOPS     5.40 GFLOPS     24%
DFFT ST            3.21 GFLOPS     1.80 GFLOPS     78%
DFFT MT            6.02 GFLOPS     5.11 GFLOPS     18%
N-Body ST          1.41 Mpairs/s   0.78 Mpairs/s   81%
N-Body MT          2.69 Mpairs/s   2.34 Mpairs/s   15%
Ray Trace ST       4.99 MP/s       2.96 MP/s       69%
Ray Trace MT       9.56 MP/s       8.64 MP/s       11%

The story with Geekbench 3 floating point performance is much the same. Performance never regresses, even in multi-threaded workloads. In lightly threaded floating point workloads A9X is going to walk all over A8X, and in multi-threaded workloads we’re still looking at anywhere between a 9% and a 29% performance gain. This goes to show just how powerful Twister is relative to Typhoon, especially with A9X’s much higher clockspeeds factored in. And it lends a lot of support to Apple’s ongoing design philosophy of favoring a smaller number of high performance (and now higher-clocked) cores.
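A single summary number for the multi-threaded floating point results can be obtained with a geometric mean of the A9X/A8X ratios, the conventional way to average performance ratios. The sketch below weights the ten sub-tests equally; that weighting is our simplification for illustration and is not how Geekbench computes its own composite scores.

```python
from math import prod

# A9X and A8X multi-threaded scores copied from the floating point table
FP_MT = {
    "BlackScholes": (28.2, 24.9),
    "Mandelbrot": (4.27, 3.66),
    "Sharpen Filter": (4.01, 3.12),
    "Blur Filter": (5.08, 4.47),
    "SGEMM": (12.7, 11.6),
    "DGEMM": (6.21, 5.61),
    "SFFT": (6.67, 5.40),
    "DFFT": (6.02, 5.11),
    "N-Body": (2.69, 2.34),
    "Ray Trace": (9.56, 8.64),
}

# Per-test speedup of A9X over A8X, then an equal-weight geometric mean
ratios = {name: a9x / a8x for name, (a9x, a8x) in FP_MT.items()}
geomean = prod(ratios.values()) ** (1 / len(ratios))

worst = min(ratios, key=ratios.get)
best = max(ratios, key=ratios.get)
print(f"smallest gain: {worst} {ratios[worst] - 1:+.0%}")
print(f"largest gain:  {best} {ratios[best] - 1:+.0%}")
print(f"geometric mean gain: {geomean - 1:+.0%}")
```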

SPEC CPU 2006

Moving on, our other lower-level benchmark for this review is SPECint2006. Developed by the Standard Performance Evaluation Corporation, SPECint2006 is the integer component of their larger SPEC CPU 2006 benchmark. As was the case with SPEC CPU 2000 before it, SPEC CPU 2006 is designed by a committee of technology firms to offer a consistent and meaningful cross-platform benchmark that can compare systems of different performance levels and architectures. Among cross-platform benchmarks SPEC CPU is generally held in high regard, and while it is but one collection of benchmarks and, like any benchmark, should not be taken as the be-all and end-all on its own, it provides us with a very important look at CPU performance that we otherwise cannot get.

SPECint2006 is the successor to the SPECint2000 test we’ve been using periodically for the last couple of years. Initially released in 2006, SPECint2006 is still SPEC’s current-generation CPU integer benchmark. We’ve wanted to switch to SPECint2006 for some time now, but have been held back by the overall low performance of tablet SoCs, which lacked the memory to run SPECint2006 and the speed to do so in a reasonable amount of time. Now, however, thanks to the greater performance and memory capacity of A9X, we’re finally able to run SPEC’s current-generation CPU benchmark on a tablet.

SPECint2006 is composed of 12 sub-benchmarks, testing a wide variety of scenarios from video compression to Perl execution to AI. This is a non-graphical benchmark, and I believe it’s reasonable to argue that the benchmark set itself leans towards server, high performance computing, and workstation use cases. That said, even if it’s not a perfect fit for tablet use cases, it offers a lot of real-world tests that give us a good variety of different workloads to benchmark CPUs with. SPECint2006 scores are in turn reported as a ratio, measuring how many times faster a tested system is than the SPEC reference system, a 1997 Sun Ultra Enterprise 2 server built around a 296MHz UltraSPARC II CPU.
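SPEC's scoring arithmetic is simple. For each sub-benchmark, the ratio is the reference machine's run time divided by the tested system's run time, so a ratio of 25 means 25 times faster than the UltraSPARC II reference; the overall SPECint2006 figure is the geometric mean of the per-benchmark ratios. The run times in the example below are hypothetical and chosen only to illustrate the calculation.

```python
from math import prod


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """How many times faster the tested system is than the reference system."""
    return reference_seconds / measured_seconds


def spec_composite(ratios: list[float]) -> float:
    """Geometric mean of per-benchmark ratios, as used for SPEC's overall score."""
    return prod(ratios) ** (1 / len(ratios))


# Hypothetical run times: the reference takes 10,000 s, the tested system 400 s
print(spec_ratio(10_000, 400))       # 25.0
print(spec_composite([2.0, 8.0]))    # 4.0
```

The geometric mean keeps one outsized ratio, such as a very high 462.libquantum result, from dominating the composite the way it would in an arithmetic mean.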

CINT2006 (Integer Component of SPEC CPU2006):
Benchmark         Language  Application Area                Description
400.perlbench     C         Programming Language            Derived from Perl V5.8.7. The workload includes SpamAssassin, MHonArc (an email indexer), and specdiff (SPEC's tool that checks benchmark outputs).
401.bzip2         C         Compression                     Julian Seward's bzip2 version 1.0.3, modified to do most work in memory, rather than doing I/O.
403.gcc           C         C Compiler                      Based on gcc Version 3.2, generates code for Opteron.
429.mcf           C         Combinatorial Optimization      Vehicle scheduling. Uses a network simplex algorithm (which is also used in commercial products) to schedule public transport.
445.gobmk         C         Artificial Intelligence: Go     Plays the game of Go, a simply described but deeply complex game.
456.hmmer         C         Search Gene Sequence            Protein sequence analysis using profile hidden Markov models (profile HMMs).
458.sjeng         C         Artificial Intelligence: Chess  A highly-ranked chess program that also plays several chess variants.
462.libquantum    C         Physics / Quantum Computing     Simulates a quantum computer, running Shor's polynomial-time factorization algorithm.
464.h264ref       C         Video Compression               A reference implementation of H.264/AVC; encodes a videostream using 2 parameter sets. The H.264/AVC standard is expected to replace MPEG2.
471.omnetpp       C++       Discrete Event Simulation       Uses the OMNet++ discrete event simulator to model a large Ethernet campus network.
473.astar         C++       Path-finding Algorithms         Pathfinding library for 2D maps, including the well known A* algorithm.
483.xalancbmk     C++       XML Processing                  A modified version of Xalan-C++, which transforms XML documents to other document types.

Although designed as a CPU-intensive benchmark, it’s important to note that SPECint2006 is officially labeled as “stressing a system's processor, memory subsystem and compiler.” The memory subsystem aspect is fairly self-explanatory (it’s difficult to test a CPU without testing the memory as well, except with trivial workloads that fit in a CPU’s caches); however, the compiler aspect calls for special attention. As SPECint2006 is a cross-platform benchmark in the truest sense of the word, it’s impossible to offer a single binary for all platforms, especially platforms that had yet to be designed in 2006 such as ARMv8. Simply put, the moment you begin compiling benchmarks for different systems using different compilers, the performance of the compiler becomes a factor in benchmark performance as well.

As a result, and unlike many of the other benchmarks we run here, compilers play a big part in SPECint2006 performance, and this is by design. Compiler authors can and do optimize for SPEC CPU, with the goal of giving the tested CPU the best possible chance in this benchmark; the compiler should not hold back the CPU. In turn, however, all results must be validated, so overly aggressive compilers that generate incorrect code will be caught and failed. The end result is that in a cross-platform scenario with different binaries, SPECint2006 isn’t quite as apples-to-apples as our more traditional benchmarks, but it offers us a unique look at cross-platform CPU performance.
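SPEC's own validation relies on specdiff, the output-checking tool mentioned in the 400.perlbench entry above, together with reference outputs shipped with each benchmark. As a rough illustration of the idea, and not a reproduction of specdiff, the hypothetical function `outputs_match` below compares a run's output with the expected output token by token: text must match exactly, while numbers may differ within a relative or absolute tolerance. The tolerance defaults are arbitrary values chosen for the example.

```python
import math


def outputs_match(expected: str, actual: str,
                  reltol: float = 1e-6, abstol: float = 1e-9) -> bool:
    """Hypothetical, simplified output check (not SPEC's specdiff).

    Text tokens must match exactly; numeric tokens may differ by a small
    relative or absolute tolerance, so tiny floating point differences
    pass while a genuinely wrong answer fails.
    """
    expected_tokens = expected.split()
    actual_tokens = actual.split()
    if len(expected_tokens) != len(actual_tokens):
        return False
    for exp_tok, act_tok in zip(expected_tokens, actual_tokens):
        try:
            exp_val, act_val = float(exp_tok), float(act_tok)
        except ValueError:
            # At least one token is not a number: require an exact text match
            if exp_tok != act_tok:
                return False
            continue
        if not math.isclose(exp_val, act_val, rel_tol=reltol, abs_tol=abstol):
            return False
    return True


print(outputs_match("checksum 3.14159265", "checksum 3.14159266"))  # True
print(outputs_match("checksum 3.14159265", "checksum 3.15"))        # False
```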

For our testing we’re using optimized binaries generated for Apple’s A8X/A9X SoCs and Intel’s Broadwell/Skylake processors respectively. The following compiler flags were used.

Apple ARMv8: Xcode 7 (LLVM), -Ofast

Intel x86: Intel C++ Compiler 16, -xCORE-AVX2 -ipo -mdynamic-no-pic -O3 -no-prec-div -fp-model fast=2 -m32 -opt-prefetch -ansi-alias -stdlib=libstdc++

Finally, of SPECint2006’s 12 sub-benchmarks, our current harness is only able to run 10 of them on the iPad Pro at this time, as 473.astar and 483.xalancbmk are failing on the iPad. So the following is not a complete run of SPECint2006, and for the purposes of SPEC CPU the results are officially classified as performance estimates.

To start things off, let’s look at the Apple-to-Apple comparison, pitting A9X against A8X.

SPECint_base2006 - Estimated Scores - A9X vs. A8X
Benchmark         A9X     A8X     A9X vs. A8X %
400.perlbench     25.0    14.1    78%
401.bzip2         17.6    11.5    54%
403.gcc           20.5    12.4    65%
429.mcf           18.7    N/A     N/A
445.gobmk         23.4    13.0    80%
456.hmmer         25.1    14.1    79%
458.sjeng         23.6    13.6    73%
462.libquantum    74.6    49.2    52%
464.h264ref       41.3    24.0    72%
471.omnetpp       10.3    8.0     29%

Unsurprisingly, A9X is leaps and bounds ahead here. The smallest gain is with 471.omnetpp, a discrete event simulator, where A9X holds a 29% lead. Otherwise A9X takes a significant lead in every test with an A8X result (429.mcf has none), beating A8X by as much as 80% in 445.gobmk, a Go (board game) AI benchmark.

Calling back to our iPhone 6s review for a moment, A9X has a much larger advantage over A8X in SPECint2006 than A9 had over A8 in SPECint2000. A good deal of this has to do with A9X’s significant clockspeed bump versus A8X, but at the same time it also illustrates that the newer SPECint2006 rates Twister even more favorably relative to Typhoon. As we’ve seen time and time again, Twister is a much faster CPU core than the already fast Typhoon, and this is a big part of why Apple continues to top our ARM benchmarks.

Last but certainly not least, however, is our main event: A9X versus Intel’s Core M CPUs. As we’re finally able to run SPECint2006 on an Apple SoC, this is the first chance we’ve had to compare Apple and Intel CPUs using SPEC, which makes it an especially exciting comparison.

At the same time, this comparison is not just for academic curiosity. As Apple has significantly improved their CPU design with every generation and has quickly moved to newer manufacturing processes, they have been closing the architecture and manufacturing gap with Intel. Twister and Skylake are fairly similar designs, both implementing a wide execution pipeline with a focus on achieving a high IPC, and in this latest generation of devices both couple that with a fairly high 2GHz+ clockspeed. Over the years Apple and Intel have approached this problem from different angles (Apple built up from phones to tablets, while Intel built down from desktops to tablets), but the two have ended up in a similar place in terms of basic architecture design goals. Meanwhile, from a manufacturing standpoint Intel is arguably still roughly a generation ahead with their 14nm FinFET process (naming aside, their transistors are smaller than those of TSMC’s 16nm FinFET process), so Apple is the underdog from this point of view.

The burning question, of course, is whether Apple’s CPU designs are catching up to the performance of Intel’s Core lineup, thanks to the continual iteration of architecture and manufacturing on the Apple side versus the slower rate of growth we’ve seen over the last few generations of Intel’s Core lineup. The iPad Pro finally gives us the opportunity to try to answer that question, as its faster SoC, coupled with a form factor and TDP closer to regular Core M devices, gives us the most apples-to-apples comparison yet.

To that end we have assembled a smorgasbord of Core M devices to compare to the iPad Pro and A9X SoC. Perhaps the most apples-to-apples comparison is the iPad Pro versus the 2015 MacBook; though approaching a year old, this is still Apple’s current-generation MacBook, with our base model incorporating an older Broadwell-based Core M-5Y31. Also from the Broadwell generation we have an ASUS Transformer Book T300 Chi, which uses a high-end Core M-5Y71, to showcase the performance of Intel’s highest clocked Core M processors. And from the latest Skylake generation we have the ASUS ZenBook UX305CA, which incorporates Intel’s base-tier Core m3-6Y30 CPU.

Finally, it should be noted that to keep testing as close as possible, all of these devices are passively cooled, and as a result all of them are also TDP/heat throttling through much of the SPECint2006 benchmark. Ultimately, what we’re measuring here is not the peak performance of each system, but rather its sustained performance under the TDP limitations of its respective design. If unrestricted, all of these devices would undoubtedly score higher.

SPECint_base2006 - Estimated Scores - A9X vs. Intel Broadwell/Skylake
Benchmark         A9X       Core M-5Y31     Core M-5Y71      Core m3-6Y30    A9X vs. MacBook %
                            (2015 MacBook)  (Asus T300 Chi)  (Asus UX305CA)
Base/Turbo Freq   2.26GHz   0.9/2.4GHz      1.2/2.9GHz       0.9/2.2GHz
400.perlbench     25.0      21.7            28.5             24.4            15%
401.bzip2         17.6      14.6            19.6             15.3            21%
403.gcc           20.5      22.8            31.1             28.2            -10%
429.mcf           18.7      35.9            46.7             38              -48%
445.gobmk         23.4      16.9            23.7             18              38%
456.hmmer         25.1      43.9            61.9             48.1            -43%
458.sjeng         23.6      19.2            26.1             19.3            23%
462.libquantum    74.6      292             476              409             -74%
464.h264ref       41.3      38.4            49.7             37.3            8%
471.omnetpp       10.3      16.3            23.7             20.6            -37%

As this is a fairly dense lineup I’m not going to call out every figure, but let’s focus on a few key areas. First, on A9X versus the Core M-5Y31 (MacBook), the advantage flips between each device as each test hits upon different strengths and weaknesses of each CPU’s architecture. Overall each device wins half of the benchmarks, however the Core M powered MacBook wins by a larger average margin. In other words, the iPad Pro is competitive with the MacBook depending on the test, however on average it ends up trailing in performance.
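The win counts and margins can be checked directly from the table. The sketch below tallies wins for A9X versus the Core M-5Y31 MacBook, averages each side's margin of victory (winner's score divided by loser's, minus one, which is our own choice of metric), and adds an equal-weight geometric mean of the A9X/MacBook ratios. Since only 10 of the 12 sub-benchmarks were run, that geometric mean is an informal estimate, not a SPECint_base2006 score.

```python
from math import prod

# (A9X, Core M-5Y31 MacBook) estimated scores copied from the table above
SCORES = {
    "400.perlbench": (25.0, 21.7),
    "401.bzip2": (17.6, 14.6),
    "403.gcc": (20.5, 22.8),
    "429.mcf": (18.7, 35.9),
    "445.gobmk": (23.4, 16.9),
    "456.hmmer": (25.1, 43.9),
    "458.sjeng": (23.6, 19.2),
    "462.libquantum": (74.6, 292.0),
    "464.h264ref": (41.3, 38.4),
    "471.omnetpp": (10.3, 16.3),
}

# Margin of victory for each test, grouped by which system won it
a9x_margins = [a / m - 1 for a, m in SCORES.values() if a > m]
mac_margins = [m / a - 1 for a, m in SCORES.values() if m > a]

# Informal 10-test geometric mean of A9X/MacBook ratios (below 1.0 = A9X trails)
geomean = prod(a / m for a, m in SCORES.values()) ** (1 / len(SCORES))

print(f"A9X wins {len(a9x_margins)}, average margin "
      f"{sum(a9x_margins) / len(a9x_margins):.0%}")
print(f"MacBook wins {len(mac_margins)}, average margin "
      f"{sum(mac_margins) / len(mac_margins):.0%}")
print(f"geometric mean of A9X/MacBook ratios: {geomean:.2f}")
```

462.libquantum, where the MacBook is nearly four times faster, has an outsized influence on both the average margin and the geometric mean.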

Relative to the MacBook, the iPad Pro does best in 445.gobmk, the Go benchmark, while its largest deficit is with 462.libquantum. The latter is a particularly interesting case as the benchmark is very easy to vectorize, giving us perhaps our best look at the vector performance of Twister versus Broadwell, and how well their respective compilers can actually vectorize it. The end result has the Intel platforms solidly in the lead here, hinting that Intel still has better vector performance at this time.

Shifting gears to the Asus ZenBook UX305CA and its newer Skylake-based Core m3-6Y30, to little surprise Skylake closes the gap with A9X in most of the benchmarks where the Broadwell Core M was losing (464.h264ref is the exception), and pulls further ahead in the benchmarks where it was winning. Despite this the two systems split the wins at 5 each, but where the ZenBook wins, it wins very clearly. Skylake sees some decent performance improvements relative to the Broadwell CPU in our MacBook, with the exact gains depending on the test, allowing it to widen the gap with A9X. A9X is still competitive in specific scenarios, but on average it definitely trails the Skylake Core m3.
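The per-test Skylake-over-Broadwell change can be computed from the same table by comparing the Core m3-6Y30 (UX305CA) against the Core M-5Y31 (MacBook). This is a quick illustrative calculation; since both systems throttle under sustained load, the differences reflect each design's thermal limits as well as its CPU architecture.

```python
# (Core M-5Y31 MacBook, Core m3-6Y30 UX305CA) estimated scores from the table above
BROADWELL_VS_SKYLAKE = {
    "400.perlbench": (21.7, 24.4),
    "401.bzip2": (14.6, 15.3),
    "403.gcc": (22.8, 28.2),
    "429.mcf": (35.9, 38.0),
    "445.gobmk": (16.9, 18.0),
    "456.hmmer": (43.9, 48.1),
    "458.sjeng": (19.2, 19.3),
    "462.libquantum": (292.0, 409.0),
    "464.h264ref": (38.4, 37.3),
    "471.omnetpp": (16.3, 20.6),
}

# Relative change going from Broadwell to Skylake for each sub-benchmark
changes = {name: skl / bdw - 1 for name, (bdw, skl) in BROADWELL_VS_SKYLAKE.items()}

for name, change in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {change:+.1%}")

regressions = [name for name, change in changes.items() if change < 0]
print("regressions:", regressions)
```

The spread runs from roughly a 40% gain in 462.libquantum to a slight regression in 464.h264ref.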

Finally, going back to Broadwell, we have the ASUS Transformer Book T300 Chi, which incorporates a high-end Core M-5Y71 processor. This is still officially a 4.5W TDP processor, so it essentially represents Broadwell Core M’s best-case performance. With a maximum CPU clockspeed of 2.9GHz, versus 2.4GHz and 2.2GHz for the lower-end Broadwell and Skylake chips, the T300 Chi unsurprisingly beats the iPad Pro in every single benchmark. At best the two are neck-and-neck in Apple’s strongest benchmark, 445.gobmk, but otherwise it’s a clear and very significant lead for Intel’s fastest Broadwell Core M processor.

In the end, what to take away from this depends on how you want to read the results and what you believe the most important CPU comparison is. As Apple doesn’t use multiple bins/clockspeeds of A9X processors, this muddles the comparison some since there’s a significant difference in performance between Intel’s fastest and slowest Core M processors, and at the same time Intel’s official list prices put every CPU except the top-bin Core m7-6Y75 at the same price of $281.

Ultimately I think it’s reasonable to say that Intel’s Core M processors hold a CPU performance edge over iPad Pro and the A9X SoC. Against Intel’s slowest chips A9X is competitive, but as it stands A9X can’t keep up with the faster chips. However by the same metric there’s no question that Apple is closing the gap; A9X can compete with both Broadwell and Skylake Core M processors, and that’s something Apple couldn’t claim even a generation ago. That it’s only against the likes of Core m3 means that Apple still has a way to go, particularly as A9X still loses by more than it wins, but it’s significant progress in a short period of time. And I’ll wager that it’s closer than Intel would like to be, especially if Apple puts A9X into a cheaper iPad Air in the future.
