Apple's Swift: Pipeline Depth & Memory Latency

Section by Anand Shimpi

For the first time since the iPhone's introduction in 2007, Apple is shipping a smartphone with a CPU clock frequency greater than 1GHz. The Cortex A8 in the iPhone 3GS hit 600MHz, while the iPhone 4 took it to 800MHz. With the iPhone 4S, Apple chose to maintain the same 800MHz operating frequency as it moved to dual-Cortex A9s. True to its name, Swift runs at a maximum frequency of 1.3GHz as implemented in the iPhone 5's A6 SoC. Note that it's quite likely the 4th generation iPad will implement an even higher clocked version (1.5GHz being an obvious target).

Clock speed alone doesn't tell us everything we need to know about performance. Deeper pipelines can easily boost clock speed but come with steep penalties for mispredicted branches. ARM's Cortex A8 featured a 13 stage pipeline, while the Cortex A9 moved down to only 8 stages and maintained similar clock speeds. Reducing pipeline depth without sacrificing clock speed contributed greatly to the Cortex A9's tangible increase in performance. The Cortex A15 moves to a fairly deep 15 stage pipeline, while Krait is a bit more conservative at 11 stages. Intel's Atom has the deepest pipeline (ironically enough) at 16 stages.

To find out where Swift falls in all of this I wrote two different codepaths. The first featured an easily predictable branch that should almost always be taken. The second codepath featured a fairly unpredictable branch. Branch predictors work by looking at branch history - branches with predictable history should be, well, easy to predict while the opposite is true for branches with a more varied past. This time I measured latency in clocks for the main code loop:

Branch Prediction Code
              Apple A3 (Cortex A8 @ 600MHz)   Apple A5 (2 x Cortex A9 @ 800MHz)   Apple A6 (2 x Swift @ 1300MHz)
Easy Branch   14 clocks                       9 clocks                            12 clocks
Hard Branch   70 clocks                       48 clocks                           73 clocks
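
To make the easy vs. hard distinction concrete, here is a toy model of history-based prediction. It is only a sketch: it assumes a single 2-bit saturating counter, which is a textbook scheme and not a description of the predictor in Swift or in either Cortex core. The function name and the "not taken on every 100th iteration" pattern are made up for illustration. A real predictor with pattern history may handle a strict alternation far better than this single counter does.

```python
def count_mispredicts(outcomes, state=2):
    """Run one 2-bit saturating counter over a sequence of branch outcomes.

    state ranges from 0 to 3; values >= 2 predict "taken".
    Returns the number of mispredicted branches.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


N = 1000
# Easy: taken except on every 100th iteration (hypothetical pattern).
easy = [i % 100 != 99 for i in range(N)]
# Hard: taken on even values of an incremented variable.
hard = [i % 2 == 0 for i in range(N)]

easy_misses = count_mispredicts(easy)
hard_misses = count_mispredicts(hard)
print(f"easy: {easy_misses}/{N} mispredicted")
print(f"hard: {hard_misses}/{N} mispredicted")
```

In this toy model the easy pattern is missed only on its rare not-taken iterations, while the alternating pattern is missed half the time, which is the kind of gap the two codepaths are meant to expose.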

The hard branch involves more compares and some division (I'm basically branching on odd vs. even values of an incremented variable) so the loop takes much longer to execute, but note the dramatic increase in cycle count between the Cortex A9 and Swift/Cortex A8. If I'm understanding this data correctly it looks like the mispredict penalty for Swift is around 50% longer than for ARM's Cortex A9, and very close to the Cortex A8. Based on this data I would peg Swift's pipeline depth at around 12 stages, very similar to Qualcomm's Krait and just shy of ARM's Cortex A8.
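
The arithmetic behind that estimate can be reproduced from the table. The sketch below uses only the clock counts and frequencies listed above; treating the hard-minus-easy difference as a stand-in for mispredict cost is an assumption, since the hard loop also does extra compares and division.

```python
# (easy_clocks, hard_clocks, clock_mhz) copied from the table above.
results = {
    "Cortex A8": (14, 70, 600),
    "Cortex A9": (9, 48, 800),
    "Swift": (12, 73, 1300),
}

# Extra clocks the hard loop costs over the easy loop on each core.
delta = {cpu: hard - easy for cpu, (easy, hard, _) in results.items()}

# Wall-clock time per hard-loop iteration in nanoseconds (clocks / MHz * 1000).
hard_ns = {cpu: hard / mhz * 1000 for cpu, (_, hard, mhz) in results.items()}

for cpu in results:
    ratio = delta[cpu] / delta["Cortex A9"]
    print(f"{cpu}: +{delta[cpu]} clocks ({ratio:.2f}x A9), "
          f"{hard_ns[cpu]:.1f} ns per hard iteration")
```

Swift's extra cost works out to 61 clocks against 39 for the Cortex A9 and 56 for the Cortex A8. Converted to time at each chip's clock speed, Swift still completes a hard iteration slightly faster than the Cortex A9 despite the higher cycle count.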

Note that despite the significant increase in pipeline depth Apple appears to have been able to keep IPC, at worst, constant (remember back to our scaled Geekbench scores - Swift never lost to a 1.3GHz Cortex A9). The obvious explanation there is a significant improvement in branch prediction accuracy, which any good chip designer would focus on when increasing pipeline depth like this. Very good work on Apple's part.
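
A quick back-of-the-envelope model shows why prediction accuracy is the obvious lever. This sketch assumes a simple additive cost model (base cost plus miss rate times penalty), and every number in it is hypothetical: the 8 and 12 clock penalties are loosely pegged to the stage counts discussed above, and the 6% miss rate is invented purely for illustration.

```python
def branch_cost(base_clocks, miss_rate, penalty_clocks):
    """Average clocks per branch: base cost plus expected mispredict cost."""
    return base_clocks + miss_rate * penalty_clocks


# Hypothetical values, not measurements.
old_penalty = 8       # clocks lost per mispredict on the shallower pipeline
new_penalty = 12      # 50% longer penalty on the deeper pipeline
old_miss_rate = 0.06  # assumed miss rate on the shallower pipeline

# Miss rate at which the deeper pipeline pays the same expected cost.
break_even = old_miss_rate * old_penalty / new_penalty

print(f"break-even miss rate: {break_even:.3f}")
print(f"old cost: {branch_cost(1, old_miss_rate, old_penalty):.2f} clocks")
print(f"new cost: {branch_cost(1, break_even, new_penalty):.2f} clocks")
```

Under these assumptions a 50% longer penalty has to be offset by cutting mispredicts by a third just to break even, so anything better than that shows up as an IPC gain.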

The remaining aspect of Swift that we have yet to quantify is memory latency. From our iPhone 5 performance preview we already know there's a tremendous increase in memory bandwidth to the CPU cores, but as the external memory interface remains 64 bits wide, all of the changes must be internal to the cache and memory controllers. I went back to Nirdhar's iOS test vehicle and wrote some new code, this time to access a large data array whose size I could vary. I created an array of a fixed size and summed the numbers stored in it. I then increased the array size and measured the relationship between array size and code latency. With enough data points I should get a good idea of cache and memory latency for Swift compared to Apple's implementation of the Cortex A8 and A9.
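
The shape of that test can be sketched in a few lines. This is not the code that ran on the phones: it is an illustrative Python version with made-up names (`ns_per_element`, `sizes_kb`), and interpreter overhead will mask most cache effects, so a faithful version would need to be native code. It only shows the methodology of sweeping array size and timing a sum over the array.

```python
import time
from array import array


def ns_per_element(size_bytes, repeats=5):
    """Sum an integer array of roughly size_bytes; return best ns per element."""
    count = size_bytes // array("i").itemsize
    data = array("i", range(count))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        sum(data)  # walk the whole array once, adding every value
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best / count * 1e9


# Array sizes chosen to straddle typical L1 and L2 cache capacities.
sizes_kb = [4, 8, 16, 32, 64, 128, 256, 512, 1024]
sweep = {kb: ns_per_element(kb * 1024) for kb in sizes_kb}

for kb, ns in sweep.items():
    print(f"{kb:5d} KB: {ns:.2f} ns per element")
```

Plotting time per element against array size is what exposes the steps: latency stays flat while the array fits in a cache level and rises once it spills into the next one.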

At relatively small data structure sizes Swift appears to be a bit quicker than the Cortex A8/A9, but there's near convergence around 4 - 16KB. Take a look at what happens once we grow beyond the 32KB L1 data cache of these chips. Swift manages around half the latency for running this code as the Cortex A9 (the Cortex A8 has a 256KB L2 cache so its latency shoots up much sooner). Even at very large array sizes Swift's latency is improved substantially. Note that this data is substantiated by all of the other iOS memory benchmarks we've seen. A quick look at Geekbench's memory and stream tests shows huge improvements in bandwidth utilization.

Couple the dedicated load/store port with a much lower latency memory subsystem and you get 2.5 - 3.2x the memory performance of the iPhone 4S. It's the changes to the memory subsystem that really enable Swift's performance.

 


276 Comments


  • Sufo - Tuesday, October 16, 2012 - link

    Agreed. If his goal is to fly the flag for apple (who clearly need no flag flying - look at their stock prices, but i digress...), and discredit its detractors, he's doing an awful job. But then again, I do detect a whiff of troll.
  • Spunjji - Friday, October 19, 2012 - link

    Word.
  • doobydoo - Saturday, October 20, 2012 - link

    Bragging? About being an engineer?

    LOL
  • dagamer34 - Tuesday, October 16, 2012 - link

    If you wanted a "should I upgrade to this phone" review, there are hundreds of those reviews online. But AnandTech is pretty much the only place where you get a definitive review worth reading 5 years from now. They leave no stone unturned.
  • Arbee - Tuesday, October 16, 2012 - link

    Agreed. "Should I upgrade" is covered by literally dozens of newspapers, TV shows, and websites (Engadget, The Verge, Gizmodo, All Things D just to name 4). AT is the home of the 15+ page deep dive, and they do it just as well for Androids and Windows Phones.

    Also, I'm completely positive that if you sent Brian a GS3 with the iPhone 5's camera he'd write about it in exactly the same way. 2 weeks ago DPReview covered the iPhone 5's camera in a very similar way (including the same suggestions on how to avoid the problem, and a demonstration of inducing similar artifacts on the iPhone 4S and a couple of Android handsets). Optics is not a soft science, there is no room for fanboyism.
  • rarson - Wednesday, October 17, 2012 - link

    I totally disagree. He brings up a completely valid point because Anandtech usually separates the reviews from the in-depth tech examinations. There's absolutely no need for the review to be 20 pages when most people are looking for benchmarks and hands-on impressions. Considering the fact that going this in-depth made the review late, it makes no sense at all.

    At least half of this information in this article doesn't even fall under the category of a review.
  • darkcrayon - Tuesday, October 16, 2012 - link

    I think this type of review (hell, the site in general) is directed at people that want the maximum amount of compiled nitty gritty techy details... Notice his review was weeks after the larger more general consumer oriented sites. I think anyone wanting to know whether they should upgrade, that isn't interested in the technical details of the A6, would be better served reading those reviews anyway.

    Anand has said in previous reviews that he felt that iOS was intended to be more of an "appliance" OS. It's a pretty apt comparison of the two actually. That focus is why you can side load and more easily put custom software on Android, and also why you'll need anti-malware software for it before long as well. The point of an appliance is to have a reliable, consistent device that you spend more time using than tweaking.
  • daar - Wednesday, October 17, 2012 - link

    Point taken, darkcrayon.

    I prefer AT's reviews because they do a thorough and unbiased job at detailing/benchmarking and comparing different products. The suggestion was that the info about the SoC be split on its own. If Intel released a new chip, call it i9, and the first sample was from an Alienware notebook, I would simply be suggesting that the technical info about the chip have its own post and not be combined with the review of the notebook is all.

    I find it a bit strange that people are suggesting to go to other websites when I made the comment of comparisons to other products, and quite unlike most posts in AT reviews. If I make a comment about a few ATI features not being compared with Nvidia's, I would have been surprised to have people to tell me to go visit Tom's Hardware or the like.

    Not to say there wasn't any comparisons, but rather in contrast to say, for example, the One X review where Brian made the comment of how the construction of the device felt better than the GS3. It felt like punches were being pulled in this review is all.
  • phillyry - Sunday, October 21, 2012 - link

    Anand,

    I would like to know, however, how an Android device serves more as an all purpose device than an iPhone.

    Did you mean because of its customisable skins or because it can do some things that an iPhone cannot - presumably because of Apple's strong hold ('death grip') on the OS?

    This is pretty important to me because I am near the end of the term of my agreement and am in the market for a new 'phone'. I've considered W8P for precisely this reason but am waiting to see if they flop or not. I've always thought of Android as pretty darn similar to iOS but with slightly different interfaces and less user restrictions.

    Is there some other factor that makes an Android any more like a pocket computer, like the future x86 W8P phones will presumably eventually be, and less like an iPhone than I have imagined?
  • phillyry - Sunday, October 21, 2012 - link

    I also took notice of it when Anand referred to the iPhone as an appliance. Your remark saying, "The point of an appliance is to have a reliable, consistent device that you spend more time using than tweaking" would be comforting but I don't think that that's quite how Anand meant it. I was actually quite put off by the term because I think that he meant that the iPhone is made to be more of a tag along device that goes with your other Macs and plays a support role rather than a stand alone device. He pretty much says as much.

    Like I said, I found this a bit off putting but I think he's just saying how he sees it in terms of the respective companies' product lines and agendas. It actually makes a fair bit of sense. I found that when I got an iPhone it made me want an iPad. And then when I got an iPad it made me want a MacBook. Call it what you will but I remember thinking that they should be able to make it so that I can do everything I need to on an iPad but distinctly felt like I really needed a MacBook to really do all that I wanted. It could be argued, along the lines of Anand's original comment, that this is Apple's approach / business model.

    It also points to a distinction between Apple and the other big player that no one in this forum is talking about - Microsoft. Windows 8 appears to be meant to be the exact opposite of this approach. Instead of one device for each purpose it's one device for all purposes. It will be interesting to see if Microsoft's approach with Windows 8 will turn things around or simply flop, at least on the handheld device side of things.
