A7 SoC Explained

I’m still surprised by the amount of confusion around Apple’s CPU cores, so that’s where I’ll start. I’ve already outlined how ARM’s business model works, but in short there are two basic types of licenses ARM will bestow upon its partners: processor and architecture. The former involves implementing an ARM-designed CPU core, while the latter allows the creation of a custom CPU core compatible with an ARM ISA (Instruction Set Architecture).

NVIDIA and Samsung, up to this point, have gone the processor license route. They take ARM-designed cores (e.g. Cortex A9, Cortex A15, Cortex A7) and integrate them into custom SoCs. In NVIDIA’s case the CPU cores are paired with NVIDIA’s own GPU, while Samsung licenses GPU designs from ARM and Imagination Technologies. Apple previously went this route as well: until last year’s A6 SoC, all Apple SoCs used CPU cores designed by and licensed from ARM.

With the A6 SoC, however, Apple joined Qualcomm in leveraging an ARM architecture license. At the heart of the A6 were a pair of Apple-designed CPU cores implementing the ARMv7-A ISA. I came to know these cores by their leaked codename: Swift.

At its introduction, Swift proved to be one of the best designs on the market. An excellent combination of performance and power consumption, the Swift-based A6 SoC improved power efficiency over the previous Cortex A9 based design, and proved competitive with the best from Qualcomm at the time. Since then, however, Qualcomm has released two evolutions of its CPU core (Krait 300 and Krait 400) and pretty much regained performance leadership over Apple. With Apple on a yearly release cadence, this new chip is its only shot at taking back the crown for the next 12 months.

Following tradition, Apple replaces its A6 SoC with a new generation: A7.

With only a week to test battery life, performance, wireless and cameras on two phones, in addition to actually using them as intended, there wasn’t a ton of time to go ridiculously deep into the new SoC’s architecture. Here’s what I’ve been able to piece together thus far.

First off, based on conversations with as many people in the know as possible, as well as just making an educated guess, it’s probably pretty safe to say that the A7 SoC is built on Samsung’s 28nm HK+MG process. It’s too early for 20nm at reasonable yields, and Apple isn’t ready to move some (not all) of its operations to TSMC.

The jump from 32nm to 28nm results in peak theoretical area scaling of 76.5% (the same design at 28nm can be no smaller than 76.5% of its die area at 32nm). In reality nothing ever scales perfectly, so we’re probably talking about 80-85% at best. Either way, that’s a good amount of room for new features.
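The math is simple enough to sanity check yourself. A quick sketch (Python, purely illustrative):

```python
# Linear dimensions scale by 28/32, so area scales by the square of that ratio.
ideal_area_scaling = (28 / 32) ** 2
print(f"Ideal 32nm -> 28nm area scaling: {ideal_area_scaling:.2%}")  # 76.56%
```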

At its launch event Apple officially announced both the A7’s die size (102mm²) and its transistor count (over 1 billion). Don’t underestimate the magnitude of these disclosures. The technical folks at Cupertino are clearly winning some battle to talk more about their designs, not less. We’re not yet at the point where I’m getting pretty diagrams and a deep dive, but it’s clear that Apple is beginning to open up more (and it’s awesome).

Apple has never previously disclosed transistor count. I also don’t know whether this “over 1 billion” figure is a schematic or layout transistor count. The only additional detail I have is that Apple is claiming a near doubling of transistors compared to the A6. Looking at die sizes and taking into account scaling from the process node shift, there’s clearly a more fundamental change to the chip’s design: it is possible to optimize a design (and its transistors) for area, and that seems to be what has happened here.
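As a rough sanity check on that claim (with the caveat that the A6’s ~97mm² die size at 32nm comes from third-party teardowns, not from Apple), the numbers do point to a denser, area-optimized layout rather than a straight shrink:

```python
# Back-of-the-envelope density check. Apple disclosed the A7 figures
# (102mm², over 1B transistors); the A6 die size (~97mm² at 32nm) is an
# assumption based on third-party teardowns, and the A6 transistor count
# was never published.
a6_area = 97.0                      # mm² at 32nm (assumed, not official)
a7_area = 102.0                     # mm² at 28nm (official)
ideal_scale = (28 / 32) ** 2        # ~0.766, per the scaling math above

a6_shrunk = a6_area * ideal_scale   # what a perfect A6 shrink would occupy
print(f"Perfectly shrunk A6: ~{a6_shrunk:.0f}mm²")           # ~74mm²
print(f"A7 vs. shrunk A6: {a7_area / a6_shrunk:.2f}x area")  # ~1.37x

# Roughly 2x the transistors in ~1.37x the area implies a substantially
# denser layout than a simple process port would deliver.
```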

The CPU cores are, once again, a custom design by Apple. These aren’t Cortex A57 derivatives (still too early for that), but rather some evolution of Apple’s own Swift architecture. I’ll dive into specifics of what I’ve been able to find in a moment. To answer the first question on everyone’s mind, I believe there are two of these cores on the A7. Before I explain how I arrived at this conclusion, let’s first talk about cores and clock speeds.

The transition from 2 to 4 cores happened quicker in mobile than I had expected. Thankfully there are some well-threaded apps that can take advantage of more than two cores, and power gating keeps the negative impact of the additional cores to a minimum. As we saw in our Moto X review, however, two faster cores are still better for most uses than four cores running at lower frequencies. NVIDIA forced everyone’s hand in moving to 4 cores earlier than they would’ve liked, and now you pretty much can’t get away with shipping anything less than that in an Android handset. Even Motorola felt it necessary to obfuscate core count with its X8 mobile computing system. Markets like China also seem to demand more cores over better ones, which is why we see such a proliferation of quad-core Cortex A5/A7 designs.

Apple has traditionally been sensible in this regard, even dating back to core count decisions in its Macs. I remember reviewing an old iMac and pitting it against a Dell XPS One at the time. This was in the pre-power-gating/turbo days. Dell went the route of more cores, while Apple opted for fewer, faster ones and put the CPU savings into a better GPU. You can guess which system ended up ahead.

In such a thermally constrained environment, going quad-core only makes sense if you can properly power gate/turbo up when some cores are idle. I have yet to see any mobile SoC vendor (with the exception of Intel with Bay Trail) do this properly, so until we hit that point the optimal target is likely two cores. You only need to look back at the evolution of the PC to come to the same conclusion. Before the arrival of Nehalem and Lynnfield, you always had to make a tradeoff between fewer faster cores and more of them. Gaming systems (and most users) tended to opt for the former, while those doing heavy multitasking went with the latter. Once we got architectures with good turbo, the 2 vs 4 discussion became one of cost and nothing more. I expect we’ll follow the same path in mobile.

Then there’s the frequency discussion. Brian and I have long been hinting at the sort of ridiculous frequency/voltage combinations mobile SoC vendors have been shipping at for nothing more than marketing purposes. I remember ARM telling me the ideal target for a Cortex A15 core in a smartphone was 1.2GHz. Samsung’s Exynos 5410 stuck four Cortex A15s in a phone with a max clock of 1.6GHz. The 5420 increases that to 1.7GHz. The problem with frequency scaling alone is that it typically comes at the price of higher voltage. There’s a quadratic relationship between voltage and power consumption, so it’s quite possibly one of the worst ways to get more performance. Brian even tweeted an image showing the frequency/voltage curve for a high-end mobile SoC. Note the huge increase in voltage required to deliver what amounts to another 100MHz in frequency.
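To put rough numbers on why that trade is so bad, recall that dynamic power scales on the order of C·f·V². A quick sketch with invented voltage/frequency points (illustrative only, not measured SoC values):

```python
# Dynamic power: P ~ C * f * V^2 (capacitance, frequency, voltage squared).
# The operating points below are invented purely for illustration.
def dynamic_power(c, f_ghz, v):
    return c * f_ghz * v ** 2

base = dynamic_power(1.0, 1.6, 0.90)   # hypothetical 1.6GHz operating point
push = dynamic_power(1.0, 1.7, 1.05)   # +100MHz, but needing far more voltage

print(f"Frequency gain: {1.7 / 1.6 - 1:.1%}")    # +6.2%
print(f"Power increase: {push / base - 1:.1%}")  # +44.6%
```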

The combination of both of these things gives us a basis for why Apple settled on two Swift cores running at 1.3GHz in the A6, and it’s also why the A7 comes with two cores running at the same max frequency. Interestingly enough, this is the same max non-turbo frequency Intel settled at for Bay Trail. Given a faster process (and turbo), I would expect to see Apple push higher frequencies but without those things, remaining conservative makes sense. I verified frequency through a combination of reporting tools and benchmarks. While it’s possible that I’m wrong, everything I’ve run on the device (both public and not) points to a 1.3GHz max frequency.

Verifying core count is a bit easier. Many benchmarks report core count, and I have some internal tools that do the same; all agree on the same 2-core/2-thread conclusion. Geekbench 3 breaks out both single and multithreaded performance results. I checked with the developer to ensure that the number of threads isn’t hard coded: the benchmark queries the max number of logical CPUs before spawning that number of threads (a minimal sketch of that approach follows the table below). Looking at the ratio of single to multithreaded performance on the iPhone 5s, it’s safe to say that we’re dealing with a dual-core part:

Geekbench 3 Single vs. Multithreaded Performance - Apple A7

                                 Integer    FP
Single Threaded                  1471       1339
Multi Threaded                   2872       2659
A7 Advantage                     1.95x      1.99x
Peak Theoretical 2C Advantage    2.00x      2.00x
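The methodology is easy to replicate. Here’s a minimal sketch of the same idea (Python for brevity; Geekbench itself is native code, and the workload below is purely illustrative):

```python
# Query the logical CPU count, run one copy of a CPU-bound workload per
# CPU, and compare against a single-threaded baseline. Processes are used
# so the Python interpreter doesn't serialize the parallel work.
import os
import time
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    cpus = os.cpu_count() or 1   # the max-logical-CPUs query described above
    work = 2_000_000

    start = time.perf_counter()
    crunch(work)
    single = time.perf_counter() - start

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=cpus) as pool:
        list(pool.map(crunch, [work] * cpus))
    multi = time.perf_counter() - start

    # N copies of the work finishing in roughly the single-threaded time
    # implies N real cores; a ratio near 2.0x points to a dual-core part.
    print(f"{cpus} logical CPUs, MT/ST scaling: {cpus * single / multi:.2f}x")
```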

Now the question is, what’s changed in these cores?

 

Comments

  • Dug - Wednesday, September 18, 2013 - link

    "maybe you should hire a developer to write native cross platform benchmark tools"
    WHY? It is not going to make any difference. Developers aren't writing native cross platform programs. If they can take advantage of anything that's in the system, then show it off.
    That would be like telling car manufacturers to redesign a hybrid to gas only to compare with all the other gas only cars.
  • ddriver - Wednesday, September 18, 2013 - link

    "Developers aren't writing native cross platform programs"

    Maybe it is about time you crawl from under the rock you are living under... Any application even remotely concerned with performance and efficiency pretty much mandates a native implementation. It would be incredibly stupid not to, considering the "closest to native" language, Java, is like 2-3 times slower and uses 10-20 times as much memory.
  • Dug - Wednesday, September 18, 2013 - link

    Exactly my point! "Native cross platform": each cross-platform solution can only support a subset of the functionality included in each native platform.

    It doesn't get you anywhere to produce a native cross platform benchmark tool.

    Again you have to resort to names and snide comments because you are wrong.
  • ddriver - Wednesday, September 18, 2013 - link

    What you talk about is I/O, events and stuff like that. When it comes to pure number crunching, the same code can execute perfectly well on every platform it is compiled against. Actually, some modern frameworks go even further than that and provide ample abstractions. For example, the same GUI application can run on Windows, Linux, MacOS, iOS and Android, apart from a few other minor platforms.
  • Anand Lal Shimpi - Wednesday, September 18, 2013 - link

    Ultimately the benchmarking problem is being fixed, just not on the time scale that we want it to. I figured we'd be better off by now, and in many ways we are (WebXPRT, Browsermark are both steps in the right direction, we have more native tools under Android now) but part of the problem is there was a long period of uncertainty around what OSes would prevail. Now that question is finally being answered and we're seeing some real investment in benchmarks. Trust me, I tried to do a lot behind the scenes over the past 4 years (some of which Brian and I did recently) but this stuff takes time. I remember going through this in the early days of the PC industry too though, I know how it all ends - it'll just take a little time to get there.

    Actually I think 128-bit registers might've been optional on v7.

    The only reason encryption results are in that table is because that's how Geekbench groups them. There's no nefarious purpose there (note that this is how we've always reported Geekbench results, as they're reported in the test itself).

    In my experience with the 5s I haven't noticed any performance regressions compared to the 5/5c. I'm not saying they don't exist and I'll continue to hunt, it's just that they aren't there now. I believe I established the reasoning for why you'd want to do this early, and again we're talking about at most 12 months before they should start the move to 64-bit anyway. Apple tends to like its ISA transitions to be as quick and painless as possible, and moving early to ARMv8 makes a lot of sense in that light. Sure, they're reaping the marketing benefits of having a feature that no one else does, but what company doesn't do that?

    I don't believe the move to 64-bit with Cyclone was driven first and foremost by marketing. Keep in mind that this architecture was designed when a bunch of certain ex-AMDers were over there too...

    Take care,
    Anand
  • BrooksT - Wednesday, September 18, 2013 - link

    Why would Anand write cross-platform benchmarks that have no connection to real world usage? Especially when you then complain that the 64 bit coverage isn't real world enough?
  • ddriver - Wednesday, September 18, 2013 - link

    For starters, putting the encryption results in their own graph, like every other review before this one, and a side-by-side comparison of Geekbench ST/MT scores for the A7 and competing v7 chips would be a good start toward a more objective and less biased article.

    And I know I am asking a lot, but an edit feature in the comment section is long overdue...
  • TheBretz - Wednesday, September 18, 2013 - link

    For what it's worth, this is NOT a case of LITERALLY comparing "Apples" and "Oranges" - it is a case of comparing "Apple" and many other manufacturers, but there was no fruit involved in the comparison, only smartphones and tablets.
  • ddriver - Wednesday, September 18, 2013 - link

    Apples to oranges is a figure of speech; it has nothing to do with the company Apple... It concerns comparing incomparable objects, which is the case with the completely different JS implementations on iOS and Android.
  • Arbee - Wednesday, September 18, 2013 - link

    Please name any case when AT's benchmarks and reviews have been proven to be biased or inaccurate. There's a reason the writers at other sites consider AT the gold standard for solid technical commentary (Engadget, Gizmodo, and the Verge all regularly credit AT on technical stories). As far as bias, have you *heard* Brian cooing about practically wanting to marry the Nexus 5? ;-)

    I think what actually happened here is that apparently Apple engineers listen to the AT podcast, because aside from 802.11ac and the screen size the 5S is designed almost perfectly to AT's well-known and often-stated specifications. It hits all of Anand's chip architecture geekery hot buttons in a way that Samsung's mashups of off-the-shelf parts never will, and they used Brian's exact line "Bigger pixels means better pictures" in the presentation. And naturally, if someone gives you what you want, you're likely to be happy with it. This is why people have Amazon gift lists ;-)

    Krait's 128 bit SIMD definitely helps, but it won't match true v8 architecture designs. I've written commercially shipping ARM assembly, and there's a *lot* of cruft in the older ISA that v8 cleans right up. And it lets compilers generate *much* more favorable code. I'll be surprised if the next Snapdragons aren't at least 32-bit v8. Qualcomm has been pretty forward-looking aside from their refusal to cooperate with the open-source community (Freedreno FTW).

    As far as 64 bit on less than 4 GB of RAM, it enables applications to more freely operate on files in NAND without taking up huge amounts of RAM (via mmap(), which the Linux kernel in Android of course also has). Apps like Loopy HD and MultiTrack DAW (not to mention Apple's own iMovie and GarageBand) will definitely be able to take advantage.
