The Apple iPad Air 2 Review
by Joshua Ho on November 7, 2014 9:30 AM EST
Apple’s A8X SoC: Bigger and Badder
Over the years Apple has gone back and forth on their SoC designs for the full size iPad. In some cases Apple will use their phone SoC – which was the case as far back as the very first iPad – and in other cases they’ll produce a new SoC just for the iPad. Neither strategy is intrinsically right or wrong, but it does mean that it’s anyone’s guess what Apple will do until they announce it.
Most recently, for the A7 generation of products Apple opted to use the A7 SoC for both the iPhone 5S and iPad Air 1. This ended up being the first time in a couple of generations that Apple didn’t mint an iPad-only SoC, and while we’ll gladly take more power, overall this seemed to work out for Apple. The iPad Air was among the most powerful tablets of 2013 (and still holds up well in 2014), showing that even in this highly competitive landscape Apple doesn’t necessarily need to build a dedicated tablet SoC to deliver top-notch performance.
Apple's A8X SoC (Image Courtesy iFixit)
Nonetheless, in keeping with their unpredictable nature, for 2014 Apple has once more changed course and gone back to building a tablet SoC for the iPad Air 2. Named A8X, this latest SoC is, like Apple’s past tablet SoCs, designed to be a bigger and badder version of Apple’s A8 smartphone SoC, taking the A8 design and building it larger for better performance.
A8X’s design is something we’ve spent quite some time mulling over, and while we haven’t found every answer we’d like to have, at this point we have a solid idea of what Apple has been up to. Unfortunately the chip disassembly and analysis experts Chipworks have not released a die shot for A8X, so we aren’t going to be able to do visual identification of the chip, but there are still quite a few aspects we can uncover from Apple’s published statements and from benchmarking.
|Apple SoC Comparison|
| |A8X|A8|A7|A6X|
|CPU|3x "Enhanced Cyclone"|2x "Enhanced Cyclone"|2x Cyclone|2x Swift|
|CPU Clockspeed|1.5GHz|1.4GHz|1.4GHz (iPad)|1.3GHz|
|GPU|Apple/PVR GXA6850|PVR GX6450|PVR G6430|PVR SGX554 MP4|
|Memory Bus Width|128-bit|64-bit|64-bit|128-bit|
|Manufacturing Process|TSMC(?) 20nm|TSMC 20nm|Samsung 28nm|Samsung 32nm|
First and foremost, A8X is quite large. The lack of a Chipworks die shot means that we can’t pin down an exact die size, but the math behind the numbers doesn’t leave too much wiggle room. Apple has stated that A8X features 3 billion transistors, versus roughly 2 billion transistors in the 89mm² A8. Die size is a far more complex subject than just doing a linear extrapolation of transistor count – SRAM and logic have different densities, and even then some types of logic pack better or worse than others – but we expect that A8X’s die size will be somewhere north of A6X’s 123mm². Anything approaching A5X’s 165mm² is very unlikely, especially for a 20nm product this early, but this still means that A8X is almost certainly Apple’s second largest SoC to date.
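To make the die size reasoning concrete, here is a minimal sketch of the naive linear extrapolation the paragraph warns about. It assumes A8X has the same average transistor density as A8, which is a simplification; the function names are ours and purely illustrative, and only the transistor counts and die areas come from the article.

```python
# Figures quoted in the article.
A8_TRANSISTORS = 2.0e9    # "roughly 2 billion"
A8_DIE_MM2 = 89.0
A8X_TRANSISTORS = 3.0e9   # Apple's stated figure
A6X_DIE_MM2 = 123.0
A5X_DIE_MM2 = 165.0


def density_per_mm2(transistors: float, die_mm2: float) -> float:
    """Average transistor density across the whole die."""
    return transistors / die_mm2


def linear_die_estimate(transistors: float, density: float) -> float:
    """Die area if a chip matched a reference chip's average density.

    Assumption: uniform density. Real dies mix SRAM and logic, which pack
    differently, so treat the result as a ballpark figure only.
    """
    return transistors / density


a8_density = density_per_mm2(A8_TRANSISTORS, A8_DIE_MM2)
a8x_estimate = linear_die_estimate(A8X_TRANSISTORS, a8_density)

print(f"A8 average density: {a8_density / 1e6:.1f}M transistors/mm^2")
print(f"A8X linear estimate: {a8x_estimate:.1f} mm^2")
print(f"Between A6X and A5X: {A6X_DIE_MM2 < a8x_estimate < A5X_DIE_MM2}")
```

Under that assumption the estimate lands between A6X and A5X, which is consistent with the expectation above, though the real figure could differ noticeably depending on how much of the extra transistor budget is SRAM versus logic.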
The fact that Apple used A7 in the previous iPad Air means that, on top of already being a serious step up in transistor count versus A8, the gap compared to the iPad Air 1 is even larger. A7 occupied 102mm² and packed more than 1 billion transistors, so compared to Apple’s previous tablet Apple has come very close to tripling their transistor count within 1 generation. We’ll see the full impact of these changes later on, but it goes without saying that Apple has aimed for a much larger boost in performance between iPad generations this year than they did between iPhone generations (and the iPhone 6 was no slouch).
CPU: 3x Enhanced Cyclone
Perhaps the biggest question since Apple’s keynote has been what Apple has spent those extra billion transistors on. While the GPU picture was relatively clear from the start, we’ve known that a larger GPU alone could not explain such a large transistor increase and that there must be more going on. Sure enough, for the first time since going with a dual-core design with the A5 in 2011, Apple has added another CPU core to their design with A8X.
Previous X SoCs have always focused on graphics and memory, so the addition of another CPU was unexpected. “Enhanced Cyclone” is still at the top of its class for both IPC and overall single-threaded performance, and for the last couple of years now this has been Apple’s strongest hand in the CPU competitive landscape. Even when dual-core Apple SoCs have fallen behind in multi-threaded tests it has rarely been by much, so with a 1.5GHz clockspeed even a pair of CPU cores would offer quite a bit of performance.
|Geekbench 3 Scores|
| |Single-Core|Multi-Core|
|iPad Air 2|1798|4468|
The trick with CPU cores – and why this was such an unexpected change – is that they’re only as good as your ability to use them. The move to dual core CPUs made a great deal of sense even early on, because while most software was (and still is) single threaded, a second core allows background operating system activities to take place without infringing on the performance of the active application. With the OS itself almost never needing more than a core’s worth of performance (and almost always much less), a third core is solidly aimed at app developers and giving them more resources for multi-threaded apps.
As far as built-in apps go, right off the bat Safari should be able to put a third core to good use. Otherwise the benefits will be on a case-by-case basis. Most applications are not multi-threaded or are not able to balance their workloads over multiple cores very well, so outside of Safari the biggest gains are likely to be found in video games and to a lesser extent anything doing heavy image manipulation. The fact that Apple just launched the Metal API should greatly help in this respect, as the low-level nature of the API makes dispatching GPU work from multiple CPU threads far more effective than it was under OpenGL ES. I don’t expect the third core was added just for Metal or game developers, but certainly they stand to be one of the big winners initially.
Apple’s own performance estimate for the A8X CPU is 40% faster than the A7 CPU. This is less than the theoretical gains from the additional core (never mind architectural and clockspeed improvements) and looking at the big picture that’s a fair estimate of the type of performance gains to expect. In lightly threaded workloads this is going to be lower, and in heavily threaded workloads we’re potentially looking at quite a bit more.
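One way to see why the gain from a third core depends so heavily on the workload is Amdahl's law. The sketch below is our own illustration rather than anything Apple or the benchmarks report: `parallel_fraction` is a hypothetical share of a workload that scales across cores, and the model ignores clockspeed and architectural changes. It also assumes the two Geekbench 3 figures quoted earlier for the iPad Air 2 are the single-core and multi-core scores.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def third_core_gain(parallel_fraction: float) -> float:
    """Relative gain from moving from two cores to three, all else equal."""
    return amdahl_speedup(parallel_fraction, 3) / amdahl_speedup(parallel_fraction, 2)


for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: 2 -> 3 cores gives {third_core_gain(p):.3f}x")

# Assumption: 1798 is the single-core score and 4468 the multi-core score.
GEEKBENCH_SINGLE = 1798
GEEKBENCH_MULTI = 4468
observed_scaling = GEEKBENCH_MULTI / GEEKBENCH_SINGLE
print(f"Observed multi-core scaling: {observed_scaling:.2f}x on 3 cores")
```

A purely single-threaded task gains nothing from the extra core, while a perfectly parallel one tops out at 1.5x, which is why lightly threaded workloads should see less than Apple's headline figure and heavily threaded ones potentially more.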
Ultimately the inclusion of a third core signals that Apple has reached a point where they’re dissatisfied with the overall performance of two CPU cores and are ready to move on to more. Because not every task will benefit from multiple cores Apple still needs to improve on single-threaded performance every generation – a task that will become harder and harder. But with Apple already winning the single-threaded performance race, the relatively small size of an additional core means they can afford to go a bit wider without blowing their die size or power budgets, and this appears to be exactly what they’ve done. Apple has until now resisted going with large numbers of CPU cores – unlike virtually every other player in the ARM SoC space – and while I don’t think this is vindication for vendors that have tried to push so many (weaker) cores so soon, it does show that a larger core count has its place; that after you’ve gone deep you can still go wide. I suspect in due time we’re going to see the iPhone go the same route – though probably not in A9 – and for the time being a third core, like a larger memory bus, will be a further advantage reserved for the iPad.
GPU: Apple-Modified Imagination PowerVR "GXA6850"
Update: This article has been changed since initial publication. Please see Apple A8X’s GPU - GXA6850, Even Better Than I Thought
Unlike Apple’s CPU choice, Apple’s GPU choice is far more conventional. Ever since Apple went Retina with the iPad 3 and needed to drive more pixels the company has outfitted the X class SoCs with faster GPUs, and A8X is no exception.
With 8 clearly visible GPU clusters on the A8X die shot, it appears that Apple has taken the GX6450 design from A8 and created a new design from it, culminating in an 8 cluster Series6XT design. Officially this design has no public designation – while it’s based on an Imagination design it is not an official Imagination design, and of course Apple doesn’t reveal codenames – but for the sake of simplicity we are calling it the GXA6850.
Compared to the A7 SoC in the iPad Air 1, Apple is touting A8X as having 2.5x the GPU performance of A7. This comes from a combination of the larger GPU configuration, the higher efficiency of the PowerVR Series 6XT architecture, and what we expect is a mild increase in the GPU clockspeed. Getting exactly 2.5x is going to be very situational since it relies on exploiting those aforementioned architectural improvements, but even in cases where that cannot happen the GXA6850 in A8X should be much faster than A7 or even A8.
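As a rough illustration of how the 2.5x claim breaks down, the sketch below divides out the cluster count increase to see what is left for architecture and clockspeed. It assumes performance scales linearly with cluster count and takes A7's G6430 as a four-cluster design; the helper name is ours, and the split between efficiency and clockspeed is not something the article's data can resolve.

```python
CLAIMED_GPU_SPEEDUP = 2.5  # Apple's A8X vs. A7 claim
A7_CLUSTERS = 4            # G6430, a four-cluster design
A8X_CLUSTERS = 8           # GXA6850


def residual_factor(claimed: float, old_clusters: int, new_clusters: int) -> float:
    """Speedup left over after accounting for the wider GPU.

    Assumption: throughput scales linearly with cluster count.
    """
    cluster_scaling = new_clusters / old_clusters
    return claimed / cluster_scaling


residual = residual_factor(CLAIMED_GPU_SPEEDUP, A7_CLUSTERS, A8X_CLUSTERS)
print(f"Left for Series 6XT efficiency and clockspeed: {residual:.2f}x")
```

Under those assumptions, roughly a quarter more performance has to come from the architectural improvements and any clockspeed bump combined, which is why hitting the full 2.5x depends on the workload.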
The only real unexpected part of this is that Apple went with a larger eight-cluster configuration despite retaining the same 2048x1536 resolution display. With the iPad 3 and 4, a larger GPU made perfect sense since the iPad had a much higher resolution than the iPhone. Now in the last year Apple has shown that a four-cluster GPU configuration is capable of driving this display, and with the resolution bump on the iPhone 6 the iPad no longer has the large resolution advantage it once did.
Ultimately from a performance standpoint an eight-cluster configuration is greatly appreciated, but comparing iPad Air 2 to iPhone 6 Plus, the iPad Air 2 has nowhere near twice as many pixels as the iPhone 6 Plus. So the iPad Air 2 is “overweight” on GPU performance on a per-pixel basis versus its closest phone counterpart, offering roughly 30% better performance per pixel. Apple certainly has gaming ambitions with the iPad Air 2, and this will definitely help with that.
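The per-pixel figure can be reproduced with some quick arithmetic. This sketch assumes the eight-cluster GPU delivers twice the throughput of the four-cluster GPU in A8 (that is, equal clockspeeds) and uses the iPhone 6 Plus's 1920x1080 panel resolution, which is not quoted in this section; the helper names are ours.

```python
IPAD_AIR_2_RES = (2048, 1536)     # from the article
IPHONE_6_PLUS_RES = (1920, 1080)  # panel resolution; not quoted in this section
GPU_THROUGHPUT_RATIO = 8 / 4      # assumption: clusters scale linearly at equal clocks


def pixel_count(resolution: tuple[int, int]) -> int:
    """Total pixels for a (width, height) resolution."""
    width, height = resolution
    return width * height


pixel_ratio = pixel_count(IPAD_AIR_2_RES) / pixel_count(IPHONE_6_PLUS_RES)
per_pixel_advantage = GPU_THROUGHPUT_RATIO / pixel_ratio

print(f"iPad Air 2 has {pixel_ratio:.2f}x the pixels of the iPhone 6 Plus")
print(f"GPU performance per pixel: {per_pixel_advantage:.2f}x")
```

With about 1.5x the pixels and 2x the GPU, the tablet comes out a little over 30% ahead per pixel under these assumptions.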
Memory: 2GB LPDDR3, 128-bit Memory Bus
The third piece of the A8X puzzle is the memory configuration. In keeping with Apple’s traditional designs for X class SoCs and the iPad in general, Apple has moved to off-package memory on the iPad Air 2. In place of A8’s POP 1GB of LPDDR3, the iPad Air 2 takes on two LPDDR3 modules, for a total of 2GB of RAM. This is the first time Apple has shipped an iOS device with 2GB of RAM – having unexpectedly stuck to 1GB for the iPhone – which should go a long way towards alleviating any memory pressure from the combination of the Retina display and 64-bit applications.
A8X w/2GB RAM (Image Courtesy iFixit)
iFixit’s teardown has revealed a pair of Elpida modules, though like the iPhone Apple is most likely multi-sourcing here. These modules are rated for the same LPDDR3-1600 speeds as those found on the POP RAM on A8, so Apple hasn’t touched the clockspeed here at all. Instead they have merely doubled up on the RAM and widened the memory bus to match.
Given that Apple is now using an eight-cluster PowerVR GPU configuration, the return of a 128-bit memory bus was to be expected. Even feeding a four-cluster configuration is no doubt memory bandwidth limited at times, and an eight-cluster configuration would only make that much worse. The good news here is that by doubling their available memory bandwidth Apple should have quite a bit of memory bandwidth to play with – 25.6GB/sec to be precise – which admittedly still isn’t a lot by PC or console standards, but it’s huge for an SoC. More importantly, this keeps up with the doubled GPU performance, as the GXA6850 should have no problem consuming that much bandwidth. In the end the 128-bit memory bus has long been one of the traits of the X series SoCs, and like in previous iPads it should deliver quite a bit of performance for Apple.
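The 25.6GB/sec figure follows directly from the bus width and the LPDDR3-1600 rating. The sketch below shows the arithmetic; the function name is ours, and the result is a theoretical peak rather than something a real workload will sustain.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_mt_s: int) -> float:
    """Theoretical peak bandwidth in GB/sec (1 GB = 10^9 bytes).

    bus_width_bits: total memory bus width in bits
    transfers_mt_s: data rate in megatransfers per second (1600 for LPDDR3-1600)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mt_s * 1e6 / 1e9


a8_bandwidth = peak_bandwidth_gb_s(64, 1600)    # A8: 64-bit bus
a8x_bandwidth = peak_bandwidth_gb_s(128, 1600)  # A8X: 128-bit bus

print(f"A8  (64-bit):  {a8_bandwidth:.1f} GB/sec")
print(f"A8X (128-bit): {a8x_bandwidth:.1f} GB/sec")
```

Because the memory speed is unchanged from A8, all of the extra bandwidth comes from doubling the bus width.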
Meanwhile as this is the first X class SoC from Apple since the introduction of the A7, it’s interesting to note that the additional memory bandwidth from the 128-bit memory bus is available to the CPU as well as the GPU. On A6X and A5X the CPU cores were not able to access the full memory bandwidth, as half of it was reserved for the GPU. However, A7 introduced the 4MB L3 cache, which is shared by the GPU and the CPU, which means bandwidth segregation is no longer practical. As a result our bandwidth numbers come close to the theoretical bandwidth available to A8X, though it looks like a single core would have a hard time consuming quite that much bandwidth.
Speaking of the L3 cache, we can confirm that it’s unchanged for A8X. It remains at 4MB, just as it was for A8 and A7. Though as we can see from our latency test, we don't start hitting the 4MB of L3 cache for another 1MB, as the larger L2 cache is able to hold our transfers for longer.
Wrapping things up, the increases in the memory bus, GPU cluster count, and CPU core count should account for almost all of A8X’s roughly 1 billion transistor increase. Adding so much GPU and CPU horsepower doesn't come cheap from a die space perspective, but it makes for an extremely potent processor in A8X.