iPhone Performance Across Generations

 

We did this in the iPhone 5 review, so I thought I'd continue the trend here. For those users who have no desire to leave iOS and are looking to find the best time to upgrade, these charts offer a unique historical look at iPhone performance over the generations. I included almost all iPhone revisions here, the sole exception being the iPhone 3G, which I couldn't seem to find.
 
All of the devices were updated to the latest supported version of iOS. That's iOS 7 for the iPhone 4 and later, iOS 6.1.3 for the iPhone 3GS and iOS 3.1.3 for the original iPhone.
 
At its keynote, Apple talked about the iPhone 5s offering up to 41x the CPU performance of the original iPhone. Looking at SunSpider, however, we get a very different story:

iPhone Generations - SunSpider 1.0

SunSpider performance improved by a factor of 100 compared to the original iPhone. You could cut that in half if the original iPhone could run iOS 4. Needless to say, Apple's CPU performance estimates aren't unreasonable. We've come a long way since the days when ARM11 cores were good enough.

Even compared to a relatively modern phone like the iPhone 4, the jump to a 5s is huge. The gap isn't quite at the level of an order of magnitude, but it's quickly approaching it. Using the single core iPhone 4 under iOS 7 just feels incredibly slow. Starting with the 4S things get a lot better, but I'd say the iPhone 4 is at the point now where it's starting to feel too slow even for normal consumers (at least with iOS 7 installed).

iPhone Generations - Browsermark 2.0

Browsermark 2.0 gives us a good indication of less CPU bound performance gains. Here we see over a 5x increase in performance compared to the original iPhone, and an 83% increase compared to the iPhone 4.

I wanted to have a closer look at raw CPU performance so I turned to Geekbench 3. Unfortunately Geekbench 3 won't run on anything older than iOS 6, so the original iPhone bows out of this test.

iPhone Generations - Geekbench 3 (Single Threaded)

Single threaded performance scaled by roughly 9x from the 3GS to the iPhone 5s. The improvement since the iPhone 4/4S days is around 6.5x. Single threaded performance often influences snappiness and UI speed/feel, so it's definitely an important vector to scale across.

iPhone Generations - Geekbench 3 (Multi Threaded)

Take into account multithreaded performance and the increase over the 3GS is even bigger, almost 17x now.

The only 3D test I could get to reliably run across all of the platforms (outside the original iPhone) was Basemark X. Again I had issues getting Basemark X running in offscreen mode on iOS 7 so all of the tests here are run at each device's native resolution. In the case of the 3GS to 4 transition, that means a performance regression as the 3GS had a much lower display resolution to deal with.

iPhone Generations - Basemark X (Onscreen)

Apple has scaled GPU performance pretty much in line with CPU performance over the years. The 5s scores 15x the frame rate of the iPhone 4, at a higher resolution too.

iPhone 5s vs. Bay Trail

I couldn't help but run Intel's current favorite mobile benchmark on the iPhone 5s. WebXPRT by Principled Technologies is a collection of browser-based benchmarks that use HTML5 and JavaScript to simulate a number of workloads (photo editing, face detection, stocks dashboard and offline notes).

iPhone 5s vs. Bay Trail - WebXPRT (Chrome/Mobile Safari)

Granted we're comparing across platforms/browsers here, but the 5s as a platform does extremely well in Intel's favorite benchmark. The 5c by comparison performs a lot more like what we'd expect from a smartphone platform. The iPhone 5s is in a league of its own here. While I don't expect performance equalling the Atom Z3770 across the board, the fact that Apple is getting this close (with two fewer cores at that) is a testament to the work done in Cupertino.

At its launch event Apple claimed the A7 offered desktop class CPU performance. If it really is performance competitive with Bay Trail, I think that statement is a fair one to make. We're not talking about Haswell or even Ivy Bridge levels of desktop performance, but rather something close to mobile Core 2 Duo class. I've broken down the subtests in the table below:

WebXPRT Performance (time in ms, lower is better)

Chrome/Mobile Safari                      | Photo Effects | Face Detection | Stocks   | Offline Notes
Apple iPhone 5s (Apple A7 1.3GHz)         | 878.9 ms      | 1831.4 ms      | 436.1 ms | 604.6 ms
Intel Bay Trail FFRD (Atom Z3770 1.46GHz) | 693.5 ms      | 1557.0 ms      | 542.9 ms | 737.3 ms
AMD A4-5000 (1.5GHz)                      | 411.2 ms      | 2349.5 ms      | 719.1 ms | 880.7 ms
Apple iPhone 5c (Apple A6 1.3GHz)         | 1987.6 ms     | 4119.6 ms      | 763.6 ms | 1747.6 ms

It's not a clean sweep for the iPhone 5s, but keep in mind that we are comparing to the best AMD and Intel have to offer in this space. I suspect part of why this is close is because both of those companies have been holding back a bit (there's no rush to build the fastest low margin parts), but it doesn't change reality.
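
To put the subtests in relative terms, here is a minimal Python sketch that divides each iPhone 5s time by the corresponding Bay Trail time and picks the fastest platform per subtest. The numbers are copied from the WebXPRT table; the data layout and the function names (`relative_time`, `fastest_per_test`) are illustrative assumptions and not part of WebXPRT itself.

```python
# WebXPRT subtest times in milliseconds (lower is better), copied from the table.
TESTS = ("Photo Effects", "Face Detection", "Stocks", "Offline Notes")

TIMES_MS = {
    "iPhone 5s (A7)": (878.9, 1831.4, 436.1, 604.6),
    "Bay Trail FFRD (Atom Z3770)": (693.5, 1557.0, 542.9, 737.3),
    "AMD A4-5000": (411.2, 2349.5, 719.1, 880.7),
    "iPhone 5c (A6)": (1987.6, 4119.6, 763.6, 1747.6),
}


def relative_time(device, baseline):
    """Return device time / baseline time per subtest (below 1.0 means the device is faster)."""
    return {
        test: d / b
        for test, d, b in zip(TESTS, TIMES_MS[device], TIMES_MS[baseline])
    }


def fastest_per_test():
    """Return the platform with the lowest time for each subtest."""
    return {
        test: min(TIMES_MS, key=lambda name: TIMES_MS[name][i])
        for i, test in enumerate(TESTS)
    }


if __name__ == "__main__":
    ratios = relative_time("iPhone 5s (A7)", "Bay Trail FFRD (Atom Z3770)")
    for test, ratio in ratios.items():
        print(f"{test}: 5s takes {ratio:.2f}x the Bay Trail time")
    for test, name in fastest_per_test().items():
        print(f"{test}: fastest is {name}")
```

With these figures the 5s takes about 1.27x the Bay Trail time in Photo Effects and about 0.80x in Stocks, which is consistent with the split result described above.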

 


464 Comments


  • MatthiasP - Tuesday, September 17, 2013 - link

    Wow, first real review on the web AND deep as always, a very nice job from Anand. :)
  • sfaerew - Wednesday, September 18, 2013 - link

    Are the benchmarks (GFXBench 2.7, 3DMark, Basemark X, etc.) AArch64 versions?
    There is a 30~40% performance gap between v32geekbench and v64geekbench.
    INT(ST)1471 vs 1065.
    FP(ST)1339 vs 983
  • Wilco1 - Wednesday, September 18, 2013 - link

    And Bay Trail Geekbench at 2.4GHz: 1063 (INT), 866 (FP)

    So A7 has beaten BT already by a huge margin despite BT not even being for sale yet...
  • TraderHorn - Wednesday, September 18, 2013 - link

    You're comparing 64bit A7 vs 32bit BT. The 32bit #s are dead even. It'll be interesting to see if BT gets a similar performance boost when Win8 64bit versions are released in 1h 2014.
  • Wilco1 - Wednesday, September 18, 2013 - link

    BT's 32-bit result includes hardware accelerated AES, which skews its score (without it, its score is ~936). The 64-bit A7 result does also use hardware acceleration, so it is more comparable.

    Yes BT will get a speedup from 64-bit as well, but it won't be nearly as much as A7 gets: its 32-bit result already has the AES acceleration, and x64 isn't nearly as different from x86 as A64 is from A32.

    However the interesting thing is not that even in 32-bit A7 wins by a good margin, but that it wins despite running at almost half the frequency of Bay Trail... Forget about Bay Trail, this is Haswell territory - the MacBook Air with the 15W 3.3GHz i7-4650U scores 3024 INT and 3003 FP.

    Now imagine a quad core tablet/laptop version of the A7 running at 2GHz on TSMC 20nm next year.
  • smartypnt4 - Wednesday, September 18, 2013 - link

    Why does the frequency matter? If the TDP of the chips is similar (Bay Trail was tested and verified by Anand as using 2.5W at the SoC level under load), who gives a flip about the frequency?

    If Apple wanted to double the frequency of the chip, they'd need something on the order of 4x the amount of power it already consumes (assuming a back-of-the-napkin quadratic relationship, which is approximately correct), putting it at ~6-8W or so at full load. That's assuming such a scaling could even be done, which is unlikely given that Apple built the thing to run at 1.3GHz max. You can't just say "oh, I want these to switch faster, so let's up the voltage." There's more that goes in to the ability to scale voltage than just the process node you're on.

    Now, I will agree that this does prove that if Apple really wanted to, they could build something to compete with Haswell in terms of raw throughput. Next year's A8 or whatever probably will compete directly with Haswell in raw theoretical integer and FP throughput, if Apple manages to double performance again. That's not a given since they had to use ~50% more transistors to get a performance doubling from the A6 to the A7, and building a 1.5B transistor chip is nontrivial since yields are inversely proportional to the number of transistors you're using.

    Next year will be really interesting, though. What with Apple's next stuff, Broadwell, the first A57 designs, Airmont, and whatever Qualcomm puts out (haven't seen anything on that, which is odd for Qualcomm.)
  • Wilco1 - Wednesday, September 18, 2013 - link

    Frequency & process matter. Current phones use about 2W at max load without the screen (see recent Nexus 7 test), so the claimed 2.5W just for BT is way too much for a phone. That means (as you explained) it must run at a lower frequency and voltage to get into phones - my guess is we won't see anything faster than the Z3740 with a max clock of 1.8GHz. Therefore the A7 will extend its lead even further.

    According to TSMC 20nm will give a 30% frequency boost at the same power. So I'd expect that a 2GHz A7 would be possible on 20nm using only 35% more power. That means the A7 would get 75% more performance at a small cost in power consumption. This is without adding any extra transistors.

    Add some tweaks (like faster memory) and such a 2GHz A7 would be similar in performance to the 15W Haswell in MacBook Air. So my point is that with a die shrink and a slight increase in power they already have a Haswell competitor.
  • smartypnt4 - Wednesday, September 18, 2013 - link

    Frequency and process matter in that they affect power consumption. If Intel can get Bay Trail to do 2.4GHz on something like 1.0V, then the power should be fine. Current Haswell stuff tops out its voltage around 1.1V or so in laptops (if memory serves), so that's not unreasonable.

    All of this assumes Geekbench is valid for comparing HSW on Win8 to ARMv8/Cyclone on iOS, which I have serious reservations about attempting to do.

    The other issue I have is this: you're talking about a 50% clock boost giving a 100% increase in performance if we look at the Geekbench scores. That's simply not possible. Had you said "raise the clock to 1.6-1.7GHz and give it 4 cores," I'd be right behind you in a 2x theoretical performance increase. But a 50% clock boost will never yield a 100% increase with the same core, even if you change the memory controller.

    Also, somehow your math doesn't add up for power... Are you hypothesizing that a 2GHz A7 (with 75% of the performance of Haswell 15W, not the same - as per Geekbench) can pull 2.6W while Haswell needs 15W to run that test? Granted, Haswell integrates things that the A7 doesn't. Namely, more advanced I/O (PCIe, SATA, USB, etc.), and the PCH. Using very fuzzy math, you can claim all of that uses 1/2 the power of the chip.

    That brings Haswell's power for compute down to 7-8W, more or less. And you're going to tell me that Apple has figured out how to get 75% of the performance of a 7W part in 2.6W, and Intel hasn't? Both companies have ~100k employees. One is working on a ton of different stuff, and one makes processors, basically exclusively (SSDs and WiFi stuff too, but processors is their main drive). You're telling me that a (relatively) small cadre of guys at Apple have figured out how to do it, and Intel hasn't done it yet on a part that costs ~6x as much after trying to get deep into the mobile space for years. I find that very hard to believe.

    Even with the 14nm shrink next year, you're talking about a 30% power savings for Intel's stuff. That brings the 15W total down to 10.5W, and the (again, super, ridiculously fuzzy) computing power to ~5-6W. On a full node smaller than what Apple has access to. And you're saying they'd hypothetically compete in throughput with a 2.6W part. I'm not sure I believe that.

    Then again, I suppose theoretical bandwidth could be competitive. That's simply a factor of your peak IPC, not your average IPC while the device is running. I don't know enough about the low level architecture of the A7 (no one does), so I'll just leave it here I guess.

    I'm gonna go now... I'm starting to reason in circles.
  • Wilco1 - Wednesday, September 18, 2013 - link

    The sort of "simple" tweaks I was thinking of are: an improved memory controller and prefetcher, doubling of L2, larger branch predictor tables. Assuming a 30% gain due to those tweaks, the result is a 100% speedup at 2GHz (1.3 to 2.0 GHz is a 54% speedup, so you get 1.54 * 1.3 = 2.0x perf). The 30% gain due to tweaks is pure speculation of course, however NVidia claims 15-30% IPC gain for similar tweaks in Tegra 4i, so it's not entirely implausible. As you say a much simpler alternative would be just to double the cores, but then your single threaded performance is still well below that of Haswell.

    You can certainly argue some reduction in the 15W TDP of Haswell due to IO, however with Turbo it will try to use most of that 15W if it can (the Air goes up to 3.3GHz after all).

    Yes I am saying that a relative newcomer like Apple can compete with Intel. Intel may be large, but they are not infallible, after all they made the P4, Itanium and Atom. A key reason AMD cited for moving into ARM servers was that designing an ARM CPU takes far less effort than an equivalent performing x86 one. So the ISA does still matter despite some claiming it no longer does.
  • smartypnt4 - Wednesday, September 18, 2013 - link

    My point wasn't that Apple can't compete; far from it. If anything, the A7 shows they can compete for the most part. However, what you suggest is that Apple could theoretically have the same performance as Intel on a process a full node larger, at half the power.

    I have no illusions that Intel is infallible. Stuff like Larrabee and the underwhelming GPU in Bay Trail prove that they aren't. I just seriously doubt that Apple could beat Intel at its own game. Specifically, in CPU performance, which is an area it's dominated for years. It's possible, but I find it relatively unlikely, especially this early in Apple's lifetime as a chip designer.

    On a different note, after looking at the Geekbench results more, I feel like it's improperly weighted. The massive performance improvement in AES and SHA encryption may be skewing the overall result... I need to dig more in to Geekbench before coming to an actual conclusion. I'm also still not convinced that comparing cross-platform results is actually valid. I'd like to believe it is, but I've always had reservations about it.
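
The back-of-the-napkin figures traded in the thread above can be checked with a short Python sketch. It only restates the commenters' own assumptions: a quadratic power/frequency relationship, a speculative 30% gain from microarchitectural tweaks on top of a 1.3GHz to 2.0GHz clock bump, and a 30% power saving applied to a 15W part. The function names are hypothetical and none of this models real silicon.

```python
def power_multiplier(freq_ratio, exponent=2.0):
    """Power scaling under the thread's assumed quadratic power/frequency model."""
    return freq_ratio ** exponent


def combined_speedup(old_ghz, new_ghz, ipc_gain):
    """Clock speedup multiplied by an assumed per-clock (IPC) gain."""
    return (new_ghz / old_ghz) * (1.0 + ipc_gain)


def power_after_savings(watts, savings):
    """Power remaining after a fractional saving, e.g. 0.30 for 30%."""
    return watts * (1.0 - savings)


if __name__ == "__main__":
    # Doubling frequency under the assumed quadratic model.
    print(f"power multiplier: {power_multiplier(2.0):.1f}x")
    # 1.3GHz -> 2.0GHz with a speculative 30% gain from tweaks.
    print(f"combined speedup: {combined_speedup(1.3, 2.0, 0.30):.2f}x")
    # 15W part with a 30% power saving from a node shrink.
    print(f"power after shrink: {power_after_savings(15.0, 0.30):.1f}W")
```

Under those assumptions, doubling the clock costs about 4x the power, the clock bump plus tweaks lands at roughly 2x performance, and the 15W part drops to about 10.5W, matching the numbers quoted in the comments.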
