A7 SoC Explained

I’m still surprised by the amount of confusion around Apple’s CPU cores, so that’s where I’ll start. I’ve already outlined how ARM’s business model works, but in short there are two basic types of licenses ARM will bestow upon its partners: processor and architecture. The former involves implementing an ARM designed CPU core, while the latter is the creation of an ARM ISA (Instruction Set Architecture) compatible CPU core.

NVIDIA and Samsung, up to this point, have gone the processor license route. They take ARM designed cores (e.g. Cortex A9, Cortex A15, Cortex A7) and integrate them into custom SoCs. In NVIDIA’s case the CPU cores are paired with NVIDIA’s own GPU, while Samsung licenses GPU designs from ARM and Imagination Technologies. Apple previously leveraged its ARM processor license as well. Until last year’s A6 SoC, all Apple SoCs leveraged CPU cores designed by and licensed from ARM.

With the A6 SoC however, Apple joined the ranks of Qualcomm with leveraging an ARM architecture license. At the heart of the A6 were a pair of Apple designed CPU cores that implemented the ARMv7-A ISA. I came to know these cores by their leaked codename: Swift.

At its introduction, Swift proved to be one of the best designs on the market. An excellent combination of performance and power consumption, the Swift based A6 SoC improved power efficiency over the previous Cortex A9 based design. Swift also proved to be competitive with the best from Qualcomm at the time. Since then however, Qualcomm has released two evolutions of its CPU core (Krait 300 and Krait 400), and pretty much regained performance leadership over Apple. Being on a yearly release cadence, this is Apple’s only attempt to take back the crown for the next 12 months.

Following tradition, Apple replaces its A6 SoC with a new generation: A7.

With only a week to test battery life, performance, wireless and cameras on two phones, in addition to actually using them as intended, there wasn’t a ton of time to go ridiculously deep into the new SoC’s architecture. Here’s what I’ve been able to piece together thus far.

First off, based on conversations with as many people in the know as possible, as well as just making an educated guess, it’s probably pretty safe to say that the A7 SoC is built on Samsung’s 28nm HK+MG process. It’s too early for 20nm at reasonable yields, and Apple isn’t ready to move some (not all) of its operations to TSMC.

The jump from 32nm to 28nm results in peak theoretical scaling of 76.5% (the same design on 28nm can be no smaller than 76.5% of the die area at 32nm). In reality, nothing ever scales perfectly so we’re probably talking about 80 - 85% tops. Either way that’s a good amount of room for new features.

At its launch event Apple officially announced both die size for the A7 (102mm^2) as well as transistor count (over 1 billion). Don’t underestimate the magnitude of both of these disclosures. The technical folks at Cupertino are clearly winning some battle to talk more about their designs and not less. We’re not yet at the point where I’m getting pretty diagrams and a deep dive, but it’s clear that Apple is beginning to open up more (and it’s awesome).

Apple has never previously disclosed transistor count. I also don’t know if this “over 1 billion” figure is based on a schematic or layout transistor count. The only additional detail I have is that Apple is claiming a near doubling of transistors compared to the A6. Looking at die sizes and taking into account scaling from the process node shift, there’s clearly a more fundamental change to the chip’s design. It is possible to optimize a design (and transistors) for area, which seems to be what has happened here.

The CPU cores are, once again, a custom design by Apple. These aren’t Cortex A57 derivatives (still too early for that), but rather some evolution of Apple’s own Swift architecture. I’ll dive into specifics of what I’ve been able to find in a moment. To answer the first question on everyone’s mind, I believe there are two of these cores on the A7. Before I explain how I arrived at this conclusion, let’s first talk about cores and clock speeds.

I always thought the transition from 2 to 4 cores happened quicker in mobile than I had expected. Thankfully there are some well threaded apps that have been able to take advantage of more than two cores and power gating keeps the negative impact of the additional cores down to a minimum. As we saw in our Moto X review however, two faster cores are still better for most uses than four cores running at lower frequencies. NVIDIA forced everyone’s hand in moving to 4 cores earlier than they would’ve liked, and now you pretty much can’t get away with shipping anything less than that in an Android handset. Even Motorola felt necessary to obfuscate core count with its X8 mobile computing system. Markets like China seem to also demand more cores over better ones, which is why we see such a proliferation of quad-core Cortex A5/A7 designs. Apple has traditionally been sensible in this regard, even dating back to core count decisions in its Macs. I remembering reviewing an old iMac and pitting it against a Dell XPS One at the time. This was in the pre-power gating/turbo days. Dell went the route of more cores, while Apple chose for fewer, faster ones. It also put the CPU savings into a better GPU. You can guess which system ended out ahead.

In such a thermally constrained environment, going quad-core only makes sense if you can properly power gate/turbo up when some cores are idle. I have yet to see any mobile SoC vendor (with the exception of Intel with Bay Trail) do this properly, so until we hit that point the optimal target is likely two cores. You only need to look back at the evolution of the PC to come to the same conclusion. Before the arrival of Nehalem and Lynnfield, you always had to make a tradeoff between fewer faster cores and more of them. Gaming systems (and most users) tended to opt for the former, while those doing heavy multitasking went with the latter. Once we got architectures with good turbo, the 2 vs 4 discussion became one of cost and nothing more. I expect we’ll follow the same path in mobile.

Then there’s the frequency discussion. Brian and I have long been hinting at the sort of ridiculous frequency/voltage combinations mobile SoC vendors have been shipping at for nothing more than marketing purposes. I remember ARM telling me the ideal target for a Cortex A15 core in a smartphone was 1.2GHz. Samsung’s Exynos 5410 stuck four Cortex A15s in a phone with a max clock of 1.6GHz. The 5420 increases that to 1.7GHz. The problem with frequency scaling alone is that it typically comes at the price of higher voltage. There’s a quadratic relationship between voltage and power consumption, so it’s quite possibly one of the worst ways to get more performance. Brian even tweeted an image showing the frequency/voltage curve for a high-end mobile SoC. Note the huge increase in voltage required to deliver what amounts to another 100MHz in frequency.

The combination of both of these things gives us a basis for why Apple settled on two Swift cores running at 1.3GHz in the A6, and it’s also why the A7 comes with two cores running at the same max frequency. Interestingly enough, this is the same max non-turbo frequency Intel settled at for Bay Trail. Given a faster process (and turbo), I would expect to see Apple push higher frequencies but without those things, remaining conservative makes sense. I verified frequency through a combination of reporting tools and benchmarks. While it’s possible that I’m wrong, everything I’ve run on the device (both public and not) points to a 1.3GHz max frequency.

Verifying core count is a bit easier. Many benchmarks report core count, I also have some internal tools that do the same - all agreed on the same 2 cores/2 threads conclusion. Geekbench 3 breaks out both single and multithreaded performance results. I checked with the developer to ensure that the number of threads isn’t hard coded. The benchmark queries the max number of logical CPUs before spawning that number of threads. Looking at the ratio of single to multithreaded performance on the iPhone 5s, it’s safe to say that we’re dealing with a dual-core part:

Geekbench 3 Single vs. Multithreaded Performance - Apple A7
  Integer FP
Single Threaded 1471 1339
Multi Threaded 2872 2659
A7 Advantage 1.97x 1.99x
Peak Theoretical 2C Advantage 2.00x 2.00x

Now the question is, what’s changed in these cores?

 

Introduction, Hardware & Cases After Swift Comes Cyclone
POST A COMMENT

464 Comments

View All Comments

  • BrooksT - Wednesday, September 18, 2013 - link

    Nobody will disagree because you've completely destroyed your credibility by insulting the credibility, integrity, and competence of the reviewer, the site, and Apple because the evidence doesn't conform to your speculations and bias. You are not to be taken seriously, and at this point I think everyone sees that.

    Post evidence of this conspiracy or STFU.
    Reply
  • ddriver - Thursday, September 19, 2013 - link

    How a whiff of reality for you - my credibility is and has not been on the line on this one. You don't know who I am, you don't know my credentials. This is not the case for Anand, even if I am right he is not in the position to admit to compiling the review in a manner that creates an unrealistically good presentation of a product, because unlike for me, that would be a huge credibility calamity for him. If anything, his responses are very "political" carefully dancing around the pivot points of my concerns. While his response did partially bring light to a few of my concerns, my key points remain valid - the article continues to not compare A7 with ARMv7 head to head in the sole native CPU benchmark present in the article, "CPU performance" was not renamed to JS performance or moved to browser performance or something like that. See, just because he didn't agree with my points and admit to being biased does not mean I am wrong and that is not the case, considering he is not in the position to do that. I didn't really expect anything more or less than the same "carefully dancing" answer as the article itself, my main motivation was to show him that not all AT readers are incapable of reading between the lines, for the sake of future articles, I did not expect that he would make any revision to the article at hand. Honesty is for those who have nothing to lose, and while his credibility is no the line, my isn't, make the conclusions, if you can ;) Reply
  • CyberAngel - Thursday, September 19, 2013 - link

    Don't worry! I believe you...conditionally!
    I put it this way: I greatly doubt that the tests would reveal any points that are less than favorable the Apple. ANY company would do the same: promote the best parts and highlight the strength of the product.
    Reply
  • akdj - Thursday, September 19, 2013 - link

    "You don't know who I am, you don't know my credentials."
    I'm not sure anyone here is interested---you've already made clear you're a conspiracy theorist, that you believe Apple is paying off reviewers, that you disrespect folks MUCH more intelligent than yourself when it comes to chip architecture...and that your "main motivation was (Is) to show him that not all AT readers are incapable of reading between the lines". You've shown NO one ANYthing substantiated. You continue to argue baseless facts and accuse respected individuals and groups/teams of intelligent members of being bias towards Apple. Nothing in this review supports your claims---NOTHING! And, as I pointed out earlier---even the biggest anti-apple sites are applauding Apple's efforts with this SoC effort.
    You're in the minority---and to be so vain that we would care about who you are and what your credentials are is silly. It sounds to me like you're a 17 year old with a decent vocabulary and not enough paper in the pocket to pick up an iPhone 5s for yourself. But...what do I know. I don't know you, your credentials...or how you lean politically, nor do I care.
    IMO---you're an insult to the entire Anand crew. I'm not sure why I continue to read your responses, they're all the same, just worded differently. Again...you're in the (extreme) minority. You're certainly not an engineer, chip designer, app developer or technological guru---if you were, you would understand the feat Apple has achieved with this SoC architecture.
    J
    Reply
  • Nurenthapa - Friday, September 20, 2013 - link

    I've been enjoying reading this in China, but you, sir, are really annoying me with your sniveling drivel. You have an axe to grind and simply won't shut up. Hope you disappear from this forum. BTW, I use a HTC One and iPad 2, and occasionally my old original 2007 iPhone. I love IOS and iPhones, but won't be buying one until they come out with a somewhat bigger screen. Reply
  • oryades - Wednesday, September 18, 2013 - link

    Intel, now Apple, the same featured reviews. Reply
  • edward kuebler - Wednesday, September 18, 2013 - link

    We are talking about 64 bits too much. The story is new instruction set in ARMv8. Instead complicating the hardware for backwards compatibility (e.g. look at x86 still supporting 16bit code) they wrote a new instruction set faster and less energy demanding. There is still ARMv7 compatibility, but the 64bit mode is independent. And the thing is, once you redesign your architecture, why not go 64bit? what´s the point of staying 32 bit? Moving more data is both slower and faster. More and wider registers help compiler optimizations and media decoding. I didn't get all this “cunning deceitful conspiracy” feeling you talk about. Staying in 32 bit land, *that* would keep me guessing. Reply
  • Anand Lal Shimpi - Wednesday, September 18, 2013 - link

    Our browser based suite (stressing js/HTML5 and other browser based workloads) remains unchanged from all of the other mobile SoC reviews we've done. There's no way of getting around the software differences on these mobile devices as you buy hardware+software together. Unfortunately it's still our best option for non-GPU cross platform comparisons, there just aren't many good cross platform CPU tests.

    I called out the inclusion of hardware accelerated AES/SHA when referencing those tests, there were no attempts to hide that fact. The fact remains that those algorithms will see a speedup on ARMv8 hardware because of those instructions. Note this is no different than when we run the TrueCrypt benchmarks on AES-NI enabled processors vs. those that don't have it (e.g. http://images.anandtech.com/graphs/graph5626/44765...

    Apple provided absolutely zero guidelines on how the review was to be conducted. The only stipulations were around making sure we didn't disclose the fact that we had devices. In fact, most manufacturers don't - at least not with us. Whenever there are any stipulations presented, we always disclose them on the site (e.g. see our early look at Trinity desktop performance).

    Krait implements ARMv7, so that's 64-bit wide registers for its NEON units. It expanded the width of the execution units, but the registers themselves have to adhere to the ARMv7 ISA.

    I think we explained why 64-bit makes sense (doing so at the last minute doesn't make sense, immediate SIMD/Crypto perf increases today, and helps build up the ecosystem), and even highlighted cases where a performance degradation does happen (see: Dijkstra tests). Keep in mind that iOS has always erred on the side of being more thrifty with RAM to begin with. I would like to see more but I don't know how necessary it is today.

    Take care,
    Anand
    Reply
  • ddriver - Wednesday, September 18, 2013 - link

    Anand, maybe you should hire a developer to write native cross platform benchmark tools. This is the only way to avoid all caveats like sponsored exclusive optimizations, different implementations, eliminate unrealistic low footprint synthetics, "selective compilers" (*cough Intel*) and whatnot. Considering the amount of reviews you are doing and the fact that C/C++ compilers have caught up with ARM for a long time, this is nothing hard and something that entirely makes sense, especially relative to using different JS engine implementations to measure CPU performance. JS should go in the "browser" department, not CPU performance.

    According to wikipedia, Krait implements 128bit SIMD, so maybe that is a mistake on wikipedia's behalf?

    I still think encryption results belong in their own chart, and have no place in a chart that is supposed to be indicative of the integer performance delta between 32 and 64bit execution modes. Even with the clarification you made, it creates an unrealistic impression, not to mention some people skimp over the text and only look at the numbers. Encryption is encryption, integer performance is integer performance. Why mix the two (except for the reason I already mentioned and you deny)?

    I wish you'd reflected a bit on the marketing aspect of the transition to 64, considering how much apple is riding it this time around. No one argues 64bit is good and more performance is good, but this brings up the issue of the particular implementation, e.g. a fast chip with only a single gigabyte of ram, and how will that play out with an actual performance demanding real world application.

    Thanks for addressing my concerns.
    Reply
  • Wilco1 - Wednesday, September 18, 2013 - link

    ARMv7 has 32 64-bit SIMD registers but they can also be used as 16 128-bit SIMD registers. Modern CPUs like Cortex-A15 and Krait support many 128-bit SIMD operations in a single cycle, but not all operations are supported (such as double precision FP). ARMv8 has 32 128-bit SIMD registers and supports SIMD of 2 64-bit doubles. Reply

Log in

Don't have an account? Sign up now