After Swift Comes Cyclone Oscar

I was fortunate enough to receive a tip last time that pointed me at some LLVM documentation calling out Apple’s Swift core by name. When I scrubbed through those same docs again, it seemed my leak had been plugged. Luckily, while watching the iPhone 5s boot, I came across a unique string referencing Oscar.

I can’t find any other references to Oscar online, in LLVM documentation or anywhere else of value. I also didn’t see Oscar references on prior iPhones, only on the 5s. I’d heard that this new core wasn’t called Swift, a sign of just how different it was. Obviously Apple isn’t going to tell me what it’s called, so I’m going with Oscar unless someone tells me otherwise.

Update: Oscar is a CPU core inside the M7 motion coprocessor; Cyclone is the name of the Swift replacement.

Cyclone is likely a beefier Swift core (or at least Swift-inspired) rather than a new design from the ground up. That would mean a 3-wide front end and somewhere in the range of 5 to 7 execution ports. Given the performance levels we’ve been seeing, the design is also likely capable of out-of-order execution.

Cyclone is a 64-bit ARMv8 core, not some Apple-designed ISA. With Cyclone, Apple beats not only all other smartphone makers to ARMv8 but also key ARM server partners. I’ll talk about the whole 64-bit aspect of this next, but needless to say, this is a big deal.

The move to ARMv8 brings some performance enhancements of its own: more registers, a cleaner ISA, improved SIMD extensions and performance, and cryptographic acceleration are all on the menu for the new core.

Pipeline depth likely remains similar (maybe slightly longer), as frequencies haven’t gone up at all (1.3GHz). The A7 doesn’t support any thermally driven CPU (or GPU) frequency boost.

The most visible change to Apple’s first ARMv8 core is a doubling of the L1 cache size: from 32KB/32KB (instruction/data) to 64KB/64KB. Along with the larger L1 cache comes an increase in access latency (from 2 clocks to 3 clocks, from what I can tell), but the higher hit rate likely makes up for the added latency. L1 caches this large are quite common in AMD architectures but unheard of in ultra-mobile cores. A larger L1 cache does a good job of keeping the machine fed, which implies a larger, more capable core.
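The latency-versus-hit-rate trade-off can be made concrete with the textbook average memory access time formula, AMAT = hit time + miss rate × miss penalty. The sketch below is a back-of-the-envelope illustration only: the 2- and 3-cycle hit times come from the measurements above, but the miss rates and the 40-cycle miss penalty are hypothetical placeholders, not measured A6 or A7 figures.

```python
def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time (cycles) for one cache level: hit + miss_rate * penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


def break_even_drop(extra_hit_cycles: float, miss_penalty_cycles: float) -> float:
    """Miss-rate reduction needed before slower hits pay for themselves."""
    return extra_hit_cycles / miss_penalty_cycles


# HYPOTHETICAL inputs for illustration; only the 2- vs 3-cycle hit times
# come from the article. Miss rates and penalty are made-up placeholders.
MISS_PENALTY = 40.0  # assumed average cost of an L1 miss, in cycles

small_l1 = amat(hit_cycles=2, miss_rate=0.08, miss_penalty_cycles=MISS_PENALTY)
large_l1 = amat(hit_cycles=3, miss_rate=0.05, miss_penalty_cycles=MISS_PENALTY)

print(f"smaller, 2-cycle L1: {small_l1:.1f} cycles per access")
print(f"larger, 3-cycle L1:  {large_l1:.1f} cycles per access")
print(f"miss rate must fall by at least {break_even_drop(1, MISS_PENALTY):.1%}")
```

With these made-up inputs the larger cache comes out about 0.2 cycles per access ahead; had the miss rate fallen by less than 2.5 percentage points, the 3-cycle L1 would have lost.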

The L2 cache remains unchanged in size at 1MB, shared between both CPU cores. L2 access latency improves tremendously with the new architecture; in some cases I measured L2 latency at half of what I saw with Swift.

The A7’s memory controller sees big improvements as well. I measured 20% lower main memory latency on the A7 compared to the A6. Branch prediction and memory prefetchers are both significantly better on the A7.

On top of all this, I noticed large increases in peak memory bandwidth. I confirmed these findings with a combination of custom tools and publicly available benchmarks. A quick look at Geekbench 3 (prior to the ARMv8 patch) gives a conservative estimate of the memory bandwidth improvements:

Geekbench 3.0.0 Memory Bandwidth Comparison (1 thread)
                   Stream Copy   Stream Scale   Stream Add   Stream Triad
Apple A7 1.3GHz    5.24 GB/s     5.21 GB/s      5.74 GB/s    5.71 GB/s
Apple A6 1.3GHz    4.93 GB/s     3.77 GB/s      3.63 GB/s    3.62 GB/s
A7 Advantage       6%            38%            58%          57%

Running the same Stream code, we see anywhere from a 6% to a nearly 60% improvement in memory bandwidth. I’m not entirely sure how Geekbench implemented Stream, or whether we’re actually testing other execution paths in addition to (or instead of) memory bandwidth. One custom piece of code I used to measure memory bandwidth showed nearly a 2x increase in peak bandwidth. That may be overstating things a bit, but needless to say, this new architecture has a vastly improved cache and memory interface.
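The "A7 Advantage" row is simply the ratio of the two results minus one. The minimal sketch below recomputes it from the single-thread Stream numbers in the table, using exact fractions to sidestep floating-point rounding surprises. One observation from the arithmetic: the published whole numbers match truncation rather than rounding (Triad works out to roughly 57.7%, listed as 57%).

```python
from fractions import Fraction

# Single-thread Geekbench 3.0.0 Stream results from the table, in GB/s.
A7 = {"Copy": "5.24", "Scale": "5.21", "Add": "5.74", "Triad": "5.71"}
A6 = {"Copy": "4.93", "Scale": "3.77", "Add": "3.63", "Triad": "3.62"}


def advantage_pct(new: str, old: str) -> Fraction:
    """Exact percentage improvement of `new` over `old`: (new / old - 1) * 100."""
    return (Fraction(new) / Fraction(old) - 1) * 100


for kernel in A7:
    pct = advantage_pct(A7[kernel], A6[kernel])
    # int() truncates toward zero, which reproduces the table's whole numbers.
    print(f"Stream {kernel}: {float(pct):.1f}% (table: {int(pct)}%)")
```

The same function applies unchanged to any pair of scores, such as the compute results below.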

Looking at low-level Geekbench 3 results (again, prior to the ARMv8 patch), we get a good feel for just how much the CPU cores have improved.

Geekbench 3.0.0 Compute Performance
                   Integer (ST)   Integer (MT)   FP (ST)   FP (MT)
Apple A7 1.3GHz    1065           2095           983       1955
Apple A6 1.3GHz    750            1472           588       1165
A7 Advantage       42%            42%            67%       67%

Integer performance is up 42%, while floating point performance is up 67%. Again, this is without 64-bit code or any of the other enhancements that come with ARMv8. Memory bandwidth improves by 35% across all Geekbench tests. I confirmed with Apple that the A7 has a 64-bit wide memory interface, and we're likely talking about LPDDR3 memory this time around, so there's probably some frequency uplift there as well.
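A 64-bit interface moves 8 bytes per transfer, so theoretical peak bandwidth is the bus width in bytes multiplied by the transfer rate. In the sketch below, the speed grades are assumptions chosen purely for illustration (the A7's actual memory data rate is not stated here); the point is how a faster memory grade on the same 64-bit bus raises the ceiling.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, megatransfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal: 1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * megatransfers_per_s * 1e6 / 1e9


# ASSUMED speed grades for illustration only; not confirmed A6/A7 parts.
BUS_WIDTH_BITS = 64  # the interface width Apple confirmed
for grade, rate in [("LPDDR2-1066", 1066), ("LPDDR3-1600", 1600)]:
    gbs = peak_bandwidth_gb_s(BUS_WIDTH_BITS, rate)
    print(f"{grade} on a {BUS_WIDTH_BITS}-bit bus: {gbs:.1f} GB/s")
```

These are ceilings, not expected results; the single-thread Stream numbers in the earlier table sit well below either figure.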

The result is something Apple refers to as desktop-class CPU performance. I’ll get to evaluating those claims in a moment, but first, let’s talk about the other big part of the A7 story: the move to a 64-bit ISA.

464 Comments

  • ClarkGoble - Wednesday, September 18, 2013

    On OS X most apps are 64-bit. Developers I've talked with say you get a 20%-30% speed increase by going 64-bit. Oddly, Apple's iWork apps are among the few on my system that are still 32-bit (and that'll probably change next month). With regard to iOS 7, I worry that they didn't increase the RAM but will, when multitasking, have to load both 32-bit and 64-bit frameworks into RAM at the same time. I assume they have a way to do this well, but extra memory would have made it less painful (although it might have hurt battery life).
  • DeciusStrabo - Wednesday, September 18, 2013

    Now, now, that's not really true any more. Taking my Windows 8 machine here, about 2/3 of the programs and background processes currently running are 64-bit, 1/3 32-bit. On Mac OS it is more like 90% 64-bit, 10% 32-bit.
  • name99 - Thursday, September 19, 2013

    You would get more useful answers if you asked decent questions. What does "bloat your program by 25%" mean?
    - 25% larger CODE footprint?
    - 25% larger ACTIVE CODE footprint?
    - 25% larger DATA footprint?
    - 25% larger ACTIVE DATA footprint?
    - 25% larger shipped binary?
    The last (shipped binary) is what most people seem to mean when they talk about bloat. It's also the one for which the claim is closest to bullshit because most of what takes up space in a binary is data assets --- images, translated strings, that sort of thing. Even duplicating the code resources to include both 64 and 32 bit code will, for most commercial apps, add only negligible size to the shipping binary.
  • Devfarce - Tuesday, September 17, 2013

    The performance of the A7 chip sounds amazing. Similar performance to the original 11" MBA is pretty incredible. It makes me realize that the 2007 Merom 1.8 GHz Core 2 Duo in my laptop, running Win7 32-bit (again!!!!), is within striking distance of the iPhone 5s. I don't even want to think about GPU or memory performance; I'm sure that ship sailed long ago with the GMA X3100.
  • tipoo - Tuesday, September 17, 2013

    Closing in on or maybe surpassing Intel HD2500 now at least, I think. HD4000 is still a bit away, probably within striking range of A7X.
  • dylan522p - Tuesday, September 17, 2013

    Hopefully HD6000 is really good. They are doing a big design change then.
  • Krysto - Wednesday, September 18, 2013

    Intel will be focusing mostly on power consumption from now on, not performance, even on the GPU side. Although I'm sure they'll try to be misleading again, by showing off the "high-end PC version" of their new GPU, to make everyone think that's what they're getting in their laptops (even though they're not), just like they did with Haswell.
  • Mondozai - Wednesday, September 18, 2013

    You have no clue, Krysto.
  • Devfarce - Wednesday, September 18, 2013

    I wouldn't say Intel is misleading on performance; however, very few companies demand the parts with the biggest GPU the way Apple does. People just don't demand the parts with the big GPUs, although they should. That's why Intel currently sells mostly HD 4400 in the Windows Haswell chips on the market.

    But back to the iPhone, this is truly incredible even if people don't want to believe it.
  • akdj - Thursday, September 19, 2013

    Not sure you know what you're talking about. The 5000 & 51(2?)00 iGPUs are incredible, especially when you take into account the efficiency and power increase of the Haswell architecture compared with the HD 4000 in Ivy Bridge. I think Apple's demand here is a big motivation for Intel to continue innovating with their iGPUs...regardless of what the other Ultrabook OEMs are demanding. They just don't have the pull...or the 'balls' to stand up to Intel. I also think Intel has impressed themselves with the performance gains from the HD 3000 -> HD 4000 -> HD 4600/5000/5100 transitions. As they progress and close the gap for the normal consumer who enjoys gaming and video editing (not the GPU guru who's demanding the latest SLI nVidia setup)...when directly compared with discrete cards, they'll enjoy a big win. Already the Ultrabook sales are being subsidized by Intel...to the tune of $300,000,000. I think they're motivated, and Apple absolutely IS using the high-power GPUs, not the 4600 all others have chosen. The 5000s are already in the new MBA. The rMBP refresh is close, and my bet is they'll be using the high-end iGPU in the 13/15" rMBP updates, hopefully still maintaining the discrete option on the 15"...but as performance increases in the portable laptop sector....I'm not so sure most consumers wouldn't value all-day battery life over an extra 10fps in the latest FPS ;). The 13" MBA is already getting 10-12 hours of battery life on Haswell with the HD 5000, and it's able to play triple-A games at decent frame rates, albeit not on the 'ultimate' settings with anti-aliasing. For those interested, they'll augment their all-day laptop with a gaming console. I think the whole big beige desktop's days are numbered. We'll see. While I don't disagree that Intel tends to embellish their performance...in this case, they're going in the right direction. Too much competition...including from the ultra-low-voltage SoC developers making such massive inroads (this review is all the proof you need).
