Custom Code to Understand a Custom Core

Section by Anand Shimpi

All Computer Engineers at NCSU had to take mandatory programming courses. Given that my dad is a Computer Science professor, I always had exposure to programming, but I never considered it my strong suit. Perhaps my gravitating towards hardware was a form of passive rebellion. Either way, I knew that in order to really understand Swift, I'd have to do some coding on my own. The only problem? I have zero experience writing Objective-C code for iOS, and not enough time to go through a crash course.

I had code that I wanted to time/execute in C, but I needed it ported to a format that I could easily run/monitor on an iPhone. I enlisted the help of a talented developer friend who graduated around the same time I did from NCSU, Nirdhar Khazanie. Nirdhar has been working on mobile development for years now, and he quickly made the garbled C code I wanted to run into something that executed beautifully on the iPhone. He gave me a framework where I could vary instructions as well as data set sizes, which made this next set of experiments possible. It's always helpful to know a good programmer.

So what did Nirdhar's app let me do? Let's start at the beginning. ARM's Cortex A9 has two independent integer ALUs; does Swift have more? To find out, I created a loop of independent integer adds. The variables are all independent of one another, which should allow for some great instruction level parallelism. The code loops many times, which should make for some easily predictable branches. My code is hardly optimal, but I did keep track of how many millions of adds were executed per second. I also reported how long each iteration of the loop took, on average.
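
As a rough illustration of this kind of test (not the code actually run for this article), a loop of independent integer adds might look like the sketch below. The function name `time_independent_adds`, the `add_result` struct, and the choice of six adds per iteration are all assumptions made for the example. It uses the standard C `clock()` call, which measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define ADDS_PER_ITER 6 /* assumed; the article does not state the count */

typedef struct {
    double seconds;         /* CPU time for the whole run */
    double madds_per_sec;   /* millions of adds per second (0 if too fast to time) */
    double clocks_per_iter; /* average CPU clocks per loop iteration */
    uint32_t checksum;      /* sum of the accumulators, so the work is observable */
} add_result;

/* Hypothetical helper: runs `iterations` passes of six independent adds. */
add_result time_independent_adds(uint64_t iterations, double cpu_mhz)
{
    assert(iterations > 0 && cpu_mhz > 0.0);

    /* Read the increment at run time so it is not a compile-time constant.
     * An optimizing compiler may still collapse the loop into a multiply,
     * so inspect the generated assembly before trusting the numbers. */
    volatile uint32_t seed = 1;
    uint32_t inc = seed;
    uint32_t a = 0, b = 1, c = 2, d = 3, e = 4, f = 5;

    clock_t start = clock();
    for (uint64_t i = 0; i < iterations; i++) {
        /* No add depends on another accumulator's result. */
        a += inc; b += inc; c += inc;
        d += inc; e += inc; f += inc;
    }
    clock_t stop = clock();

    add_result r = {0};
    r.seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    r.checksum = a + b + c + d + e + f;
    if (r.seconds > 0.0) {
        r.madds_per_sec = (double)iterations * ADDS_PER_ITER / r.seconds / 1e6;
        r.clocks_per_iter = r.seconds * cpu_mhz * 1e6 / (double)iterations;
    }
    return r;
}
```

A caller would pass a large iteration count and the core clock in MHz, for example `time_independent_adds(100000000, 1300.0)`, then read `madds_per_sec` and `clocks_per_iter` from the result.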

Integer Add Code
Test | Apple A5 (2 x Cortex A9 @ 800MHz) | Apple A5 Scaled (2 x Cortex A9 @ 1300MHz) | Apple A6 (2 x Swift @ 1300MHz) | Swift / A9 Perf Advantage @ 1300MHz
Integer Add Test | 207 MIPS | 336 MIPS | 369 MIPS | 9.8%
Integer Add Latency in Clocks | 23 clocks | - | 21 clocks | -

The code here should be fairly bound by the integer execution path. We're showing a 9.8% increase in performance. Average latency is improved slightly by 2 clocks, but we're not seeing the sort of ILP increase that would come from having a third ALU that can easily be populated. The slight improvement in performance here could be due to a number of things. A quick look at some of Apple's own documentation confirms what we've seen here: Swift has two integer ALUs and can issue 3 operations per cycle (implying a 3-wide decoder as well). I don't know if the third decoder is responsible for the slight gains in performance here or not.
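
The scaled A5 column and the advantage figure in the table above follow from simple arithmetic, sketched here. The helper names `scale_to_clock` and `advantage_percent` are made up for the example, and the scaling assumes performance rises perfectly linearly with clock speed, which real hardware will not quite do.

```c
#include <assert.h>

/* Projects a score measured at from_mhz to to_mhz, assuming linear scaling. */
double scale_to_clock(double score, double from_mhz, double to_mhz)
{
    assert(from_mhz > 0.0);
    return score * to_mhz / from_mhz;
}

/* Percentage by which new_score exceeds baseline. */
double advantage_percent(double new_score, double baseline)
{
    assert(baseline > 0.0);
    return (new_score / baseline - 1.0) * 100.0;
}
```

With the integer add numbers, `scale_to_clock(207, 800, 1300)` gives roughly 336 MIPS, and `advantage_percent(369, 336)` gives roughly 9.8%.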

What about floating point performance? ARM's Cortex A9 only has a single issue port for FP operations, which seriously hampers FP performance. Here I modified the code from earlier to do a bunch of single and double precision FP multiplies:

FP Multiply Code
Test | Apple A5 (2 x Cortex A9 @ 800MHz) | Apple A5 Scaled (2 x Cortex A9 @ 1300MHz) | Apple A6 (2 x Swift @ 1300MHz) | Swift / A9 Perf Advantage @ 1300MHz
FP Mul Test (single precision) | 94 MFLOPS | 153 MFLOPS | 143 MFLOPS | -7%
FP Mul Test (double precision) | 87 MFLOPS | 141 MFLOPS | 315 MFLOPS | 123%

There's actually a slight regression in single precision FP multiply performance, likely because performance wouldn't scale perfectly linearly from 800MHz to 1.3GHz, which makes the scaled A5 figure optimistic. Notice what happens when we double the size of our FP multiplies, though: performance goes up on Swift but remains roughly unchanged on the Cortex A9. Given the support for ARM's VFPv4 extensions, Apple likely has a second FP unit in Swift that can help with FMAs or improve double precision FP performance. It's also possible that Swift is a 128-bit wide NEON machine and my DP test compiles down to NEON code, which enjoys the benefits of a wider engine. I ran the same test with FP adds and didn't notice any changes to the data above.
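
For illustration only, here is one way a single versus double precision multiply test could be structured. This is a sketch, not the code behind the numbers above. The names `time_fp_mul_single`, `time_fp_mul_double`, and `mul_result`, as well as the four multiplies per iteration, are assumptions. The multiplier is 1.0 read at run time, so the values neither overflow nor shrink into denormals during a long run.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define MULS_PER_ITER 4 /* assumed count of independent multiplies per iteration */

typedef struct {
    double seconds;  /* CPU time for the run */
    double mflops;   /* millions of multiplies per second (0 if too fast to time) */
    double checksum; /* sum of the accumulators, so the work is observable */
} mul_result;

static mul_result finish(clock_t start, clock_t stop, uint64_t iterations, double checksum)
{
    mul_result r = {0};
    r.seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    r.checksum = checksum;
    if (r.seconds > 0.0)
        r.mflops = (double)iterations * MULS_PER_ITER / r.seconds / 1e6;
    return r;
}

mul_result time_fp_mul_single(uint64_t iterations)
{
    assert(iterations > 0);
    volatile float seed = 1.0f; /* read at run time, not a compile-time constant */
    float k = seed;
    float a = 1.0f, b = 2.0f, c = 3.0f, d = 4.0f;

    clock_t start = clock();
    for (uint64_t i = 0; i < iterations; i++) {
        a *= k; b *= k; c *= k; d *= k; /* four independent dependency chains */
    }
    clock_t stop = clock();
    return finish(start, stop, iterations, (double)a + b + c + d);
}

mul_result time_fp_mul_double(uint64_t iterations)
{
    assert(iterations > 0);
    volatile double seed = 1.0;
    double k = seed;
    double a = 1.0, b = 2.0, c = 3.0, d = 4.0;

    clock_t start = clock();
    for (uint64_t i = 0; i < iterations; i++) {
        a *= k; b *= k; c *= k; d *= k;
    }
    clock_t stop = clock();
    return finish(start, stop, iterations, a + b + c + d);
}
```

Whether the compiler emits scalar VFP or NEON instructions for either loop depends on the toolchain and flags, so checking the generated assembly is the only way to know which engine is being measured.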

Sanity Check with Linpack & Passmark

Section by Anand Shimpi

Not completely trusting my own code, I wanted some additional data points to help understand the Swift architecture. I first turned to the iOS port of Linpack and graphed FP performance vs. problem size:

Even though I ran the benchmark for hundreds of iterations at each data point, the curves didn't come out as smooth as I would've liked them to. Regardless, there's a clear trend. Swift maintains a huge performance advantage even at small problem sizes, which supports the theory of having two ports to dedicated FP hardware. There's also a much smaller relative drop in performance when going out to main memory. If you do the math on the original unscaled 4S scores you get the following data:

Linpack Throughput: Cycles per Operation
Problem Size | Apple Swift @ 1300MHz (iPhone 5) | ARM Cortex A9 @ 800MHz (iPhone 4S)
~300KB Problem Size | 1.45 cycles | 3.55 cycles
~8MB Problem Size | 2.08 cycles | 6.75 cycles
Increase | 43% | 90%

Swift is simply able to hide memory latency better than the Cortex A9. Concurrent FP/memory operations seem to do very well on Swift...
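
The conversion behind a cycles-per-operation figure is presumably clock speed divided by throughput; the sketch below shows that arithmetic. Treat the formula as an assumption about how the table was derived. The helper names `cycles_per_op` and `percent_increase` are invented for the example.

```c
#include <assert.h>

/* Cycles spent per floating point operation.
 * MHz is millions of cycles per second and MFLOPS is millions of
 * operations per second, so the ratio is cycles per operation. */
double cycles_per_op(double clock_mhz, double mflops)
{
    assert(clock_mhz > 0.0 && mflops > 0.0);
    return clock_mhz / mflops;
}

/* Relative growth from one cycles-per-op figure to another, in percent. */
double percent_increase(double from, double to)
{
    assert(from > 0.0);
    return (to / from - 1.0) * 100.0;
}
```

Applied to the table, `percent_increase(1.45, 2.08)` comes out near 43% for Swift and `percent_increase(3.55, 6.75)` near 90% for the Cortex A9.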

As the last sanity check I used Passmark, another general purpose iOS microbenchmark.

Passmark CPU Performance
Test | Apple A5 (2 x Cortex A9 @ 800MHz) | Apple A5 Scaled (2 x Cortex A9 @ 1300MHz) | Apple A6 (2 x Swift @ 1300MHz) | Swift / A9 Perf Advantage @ 1300MHz
Integer | 257 | 418 | 614 | 47.0%
FP | 230 | 374 | 813 | 118%
Primality | 54 | 87 | 183 | 109%
String qsort | 1065 | 1730 | 2126 | 22.8%
Encryption | 38.1 | 61.9 | 93.5 | 51.0%
Compression | 1.18 | 1.92 | 2.26 | 17.9%

The integer math test uses a large dataset and performs a number of add, subtract, multiply and divide operations on the values. The dataset measures 240KB per core, which is enough to stress the L2 cache of these processors. Note the 47% increase in performance over a scaled Cortex A9.
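
Passmark's source isn't shown here, so the following is only a guess at the general shape of such a test: a buffer of about 240KB walked with a mix of add, subtract, multiply and divide. The fill pattern, the way operations are combined, and the names `fill_dataset` and `integer_mix_pass` are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATASET_BYTES (240 * 1024) /* per-core size quoted in the text */
#define DATASET_WORDS (DATASET_BYTES / sizeof(int32_t))

/* Fills the buffer with small nonzero values so division is always defined
 * and the products cannot overflow. */
void fill_dataset(int32_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] = (int32_t)(i % 97) + 1;
}

/* One pass over the data: combine each value with its neighbor using
 * +, -, * and /, folding everything into an unsigned checksum. */
uint32_t integer_mix_pass(const int32_t *data, size_t n)
{
    assert(data != NULL && n >= 2);
    uint32_t acc = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        int32_t x = data[i], y = data[i + 1];
        acc += (uint32_t)(x + y);
        acc -= (uint32_t)(x - y);
        acc += (uint32_t)(x * y);
        acc += (uint32_t)(x / y);
    }
    return acc;
}
```

A real benchmark would allocate `DATASET_WORDS` elements, call `fill_dataset` once, then time many calls to `integer_mix_pass` so the working set is streamed through the cache hierarchy repeatedly.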

The FP test is identical to the integer test (including size) but it works on 32- and 64-bit floating point values. The much larger performance increase here, despite the otherwise identical workload, lends credibility to the theory that there are multiple FP pipelines in Swift.

The Primality benchmark is branch heavy and features a lot of FP math and compares. Once again we see huge scaling compared to the Cortex A9.

The qsort test features integer math and is very branch heavy. The memory footprint of the test is around 5MB, but the gains here aren't as large as we've seen elsewhere. It's possible that Swift features a much larger branch mispredict penalty than the A9.

The Encryption test works on a very small dataset that can easily fit in the L1 cache but is very heavy on the math. Performance scales very well here, almost mirroring the integer benchmark results.

Finally, the compression test shows us the smallest gains once you take into account Swift's higher operating frequency. There's not much more to conclude here, other than that we won't always see greater-than-generational scaling from Swift over the previous Cortex A9.
