An Update on Apple’s A7: It's Better Than I Thought

When I reviewed the iPhone 5s I didn’t have much time to do the sort of in-depth investigation into Cyclone (Apple’s 64-bit custom ARMv8 core) that I did with Swift (Apple’s custom ARMv7 core from the A6) the year before. I had heard rumors that Cyclone was substantially wider than its predecessor, but I didn’t have any proof beyond hearsay, so I left it out of the article. Instead I surmised in the 5s review that the A7 was likely an evolved Swift core rather than a brand new design; after all, what sense would it make to design a new CPU core and then do it all over again for the next one? It turns out I was quite wrong.

Armed with a bit of custom code and a bunch of low level tests I think I have a far better idea of what Apple’s A7 and Cyclone cores look like now than I did a month ago. I’m still toying with the idea of doing a much deeper investigation into A7, but I wanted to share some of my findings here.

The first task is to understand the width of the machine. With Swift I got lucky in that Apple had left a bunch of public LLVM documentation uncensored, referring to Swift’s 3-wide design. It turns out that although the design might be capable of decoding, issuing and retiring up to three instructions per clock, in most cases it behaved like a 2-wide machine. Mix FP and integer code and you’re looking at a machine that’s more like 1.5 instructions wide. Obviously Swift did very well in the market and its competitors at the time, including Qualcomm’s Krait 300, were similarly capable.

With Cyclone, Apple is in a completely different league. As far as I can tell, Cyclone’s peak issue width is 6 instructions. That’s at least 2x the width of Swift and Krait, and more than 3x at best, depending on instruction mix. Limitations on co-issuing FP and integer math have also been lifted: you can run up to four integer adds and two FP adds in parallel. You can also perform up to two loads or stores per clock.
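Taking the Swift figures quoted above at face value (3-wide on paper, roughly 2-wide in most code and about 1.5-wide on mixed FP and integer code), the peak-width ratios behind that range work out as below. Krait is only described as similarly capable, so only the Swift numbers are used, and these are ratios of peak issue width, not of delivered performance.

```latex
\frac{6}{3} = 2\times \ \text{(nominal Swift width)}, \qquad
\frac{6}{2} = 3\times \ \text{(typical Swift behaviour)}, \qquad
\frac{6}{1.5} = 4\times \ \text{(mixed FP/integer code)}
```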

I don’t yet have a good understanding of the number of execution ports and how they’re mapped, but at this point Cyclone appears to be the widest ARM core we’ve ever seen. I’m talking wider than Qualcomm’s Krait 400 and even ARM’s Cortex A15.

I did have some low level analysis in the 5s review, where I pointed out the A7’s significantly reduced memory latency and increased bandwidth. It turns out that I was missing a big part of the story back then as well…

A Large System Wide Cache

In our iPhone 5s review I pointed out that the A7 now featured more GPU compute power than the 4th generation iPad. For a device driving less than a quarter of the iPad’s pixels, the A7’s GPU meant either that Apple had an application that needed tons of GPU performance or that it planned on using the A7 in other, higher resolution devices. I speculated it would be the latter, and it turns out that’s indeed the case. For the first time since the iPad 2, Apple is sharing common silicon across the iPhone 5s, iPad Air and iPad mini with Retina Display.
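For reference, here is the pixel arithmetic behind that comparison, using the iPhone 5s’s 1136 x 640 panel and the iPad’s 2048 x 1536 panel:

```latex
\frac{1136 \times 640}{2048 \times 1536} = \frac{727\,040}{3\,145\,728} \approx 0.231 \approx \frac{1}{4.33}
```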

As Brian found in his investigation after last week’s iPad event, all three devices use the exact same silicon with the exact same internal model number: S5L8960X. There are no extra cores, no change in GPU configuration and, most significantly, no increase in memory bandwidth.

Previously both the A5X and A6X featured a 128-bit wide memory interface, with half of it seemingly reserved exclusively for GPU use. The non-X parts, by comparison, only had a 64-bit wide memory interface. The assumption was that a move to such a high resolution display demanded a substantial increase in memory bandwidth. With the A7, Apple takes a step back in memory interface width. Is the narrower interface enough to hamper the performance of the iPad Air with its 2048 x 1536 display?
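Peak DRAM bandwidth scales linearly with interface width, which is why going from 128 to 64 bits matters. Below is a minimal sketch of the standard calculation; the function name is hypothetical, and the 1600 MT/s data rate used in the example is an assumed, illustrative figure, not a confirmed spec for any of these SoCs.

```c
/* Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes):
 * bytes moved per transfer (bus width / 8) times transfers per second.
 * Sustained bandwidth in practice is always lower than this figure. */
double peak_bandwidth_gbs(unsigned bus_width_bits, double mega_transfers_per_s)
{
    double bytes_per_transfer = bus_width_bits / 8.0;
    double transfers_per_s = mega_transfers_per_s * 1e6;
    return bytes_per_transfer * transfers_per_s / 1e9;
}
```

With the assumed 1600 MT/s, `peak_bandwidth_gbs(64, 1600.0)` gives 12.8 GB/s and `peak_bandwidth_gbs(128, 1600.0)` gives 25.6 GB/s: at equal data rates, the narrower interface has exactly half the peak bandwidth.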

The numbers alone tell us the answer is no. In all available graphics benchmarks the iPad Air delivers better performance at its native resolution than the outgoing 4th generation iPad (as you’ll soon see). Many of these benchmarks are bound more by GPU compute than by memory bandwidth, a side effect of the relative lack of memory bandwidth on modern day mobile platforms. Across the board, though, I couldn’t find a situation where anything was smoother on the iPad 4 than on the iPad Air.
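One common way to reason about whether a workload is limited by compute or by memory bandwidth is the roofline model. It is a general model, not something applied directly here, and the symbols are generic: P_peak is peak compute throughput, B is memory bandwidth, and I is arithmetic intensity (operations performed per byte moved to or from memory).

```latex
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; I \cdot B\right),
\qquad \text{compute-bound when } I > \frac{P_{\text{peak}}}{B}
```

A benchmark with high arithmetic intensity sits on the flat, compute-limited part of the roof, so a narrower memory interface barely moves its score.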

There’s another part of this story, one I missed in my original A7 analysis. When Chipworks posted a shot of the A7 die, many of you correctly identified what appeared to be a 4MB SRAM on the die itself. It’s highlighted on the right in the floorplan diagram below:


A7 Floorplan, Courtesy Chipworks

While I originally assumed that this SRAM might be reserved for use by the ISP, it turns out that it can do a lot more than that. If we look at memory latency (from the perspective of a single CPU core) vs. transfer size on A7, we notice a very interesting phenomenon between 1MB and 4MB.
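As an illustration only (this is not the code used for these measurements), latency-versus-transfer-size curves are commonly produced with a pointer-chasing loop: each load’s address depends on the previous load, so the time per step approximates load-to-use latency for a working set of the given size. The function names `build_chain` and `chase_ns` are hypothetical, and POSIX `clock_gettime` is assumed to be available.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Build a single random cycle over n slots: next[i] is the slot visited
 * after slot i. Random order keeps hardware prefetchers from hiding latency. */
size_t *build_chain(size_t n, uint64_t seed)
{
    if (n == 0) return NULL;
    size_t *next = malloc(n * sizeof *next);
    size_t *order = malloc(n * sizeof *order);
    if (!next || !order) { free(next); free(order); return NULL; }

    for (size_t i = 0; i < n; i++) order[i] = i;

    uint64_t s = seed ? seed : 1;            /* xorshift64 state must be nonzero */
    for (size_t i = n - 1; i > 0; i--) {     /* Fisher-Yates shuffle */
        s ^= s << 13; s ^= s >> 7; s ^= s << 17;
        size_t j = (size_t)(s % (i + 1));
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    /* Link the shuffled order into one cycle that visits every slot. */
    for (size_t i = 0; i < n; i++) next[order[i]] = order[(i + 1) % n];

    free(order);
    return next;
}

/* Average nanoseconds per dependent load for a working set of `bytes`.
 * Slots are sizeof(size_t) bytes, so several share a cache line; a more
 * careful test would pad each slot to a full line. Returns -1.0 on error. */
double chase_ns(size_t bytes, size_t steps)
{
    size_t n = bytes / sizeof(size_t);
    if (n < 2 || steps == 0) return -1.0;
    size_t *next = build_chain(n, 42);
    if (!next) return -1.0;

    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < steps; i++) p = next[p];  /* serial dependency chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = p;   /* keeps the loop from being optimized away */
    (void)sink;
    free(next);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Sweeping `bytes` from a few KB to tens of MB and plotting the result gives a step-shaped curve, with each plateau corresponding to a level of the memory hierarchy; the 1MB to 4MB region is where the on-die SRAM would show up on A7.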

That SRAM is indeed some sort of cache before you get to main memory. It’s not the fastest thing in the world, but it’s appreciably quicker than going all the way out to main memory. Available bandwidth is also pretty good.

We’re only looking at bandwidth seen by a single CPU core, but even then we’re talking about 10GB/s. Lookups in this third level cache don’t happen in parallel with main memory requests, so unfortunately the cache’s latency adds to worst case memory latency (a tradeoff of speed vs. power).
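A simplified way to see why serial lookup costs latency: for a request that misses in the new cache, a serial design pays the cache lookup time before the DRAM access starts, while a parallel design overlaps the two. Here t_L3 and t_DRAM are generic latencies for the cache lookup and the main memory access (with t_L3 < t_DRAM, since the cache is quicker than main memory); the earlier cache levels add the same amount to both cases and are left out.

```latex
t_{\text{miss}}^{\text{serial}} = t_{\text{L3}} + t_{\text{DRAM}},
\qquad
t_{\text{miss}}^{\text{parallel}} \approx \max\left(t_{\text{L3}},\, t_{\text{DRAM}}\right) = t_{\text{DRAM}}
```

The parallel approach hides the lookup time but starts a DRAM access even when the cache would have hit, which costs power; that is the speed vs. power tradeoff.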

I don’t yet have the tools needed to measure the impact of this on-die memory on GPU accesses, but at a minimum it’ll help free up more of the memory interface for use by the GPU. It’s more likely that some graphics requests are cached here as well, with bandwidth allocated intelligently depending on what type of application you’re running.

That’s the other aspect of what makes A7 so very interesting. This is the first Apple SoC that’s able to deliver good amounts of memory bandwidth to all of its on-chip consumers. A single CPU core can use up to 8GB/s of bandwidth. I’m still vetting other SoCs, but so far I haven’t come across anything in the ARM camp that can compete with what Apple has built here. Only Intel is competitive.
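Single-core bandwidth figures like these are commonly measured by streaming through a buffer much larger than the caches and dividing bytes read by elapsed time. The sketch below is a hypothetical, minimal version (`read_bandwidth_gbs` is a made-up name, and POSIX `clock_gettime` is assumed). A scalar loop like this may not saturate the memory system, so treat its result as a lower bound rather than a reproduction of the numbers above.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Read a `bytes`-sized buffer `passes` times and return GB/s (1e9 bytes/s).
 * To measure main memory rather than cache, pick `bytes` well beyond the
 * largest cache (for A7, well beyond 4MB). Returns -1.0 on bad input. */
double read_bandwidth_gbs(size_t bytes, int passes)
{
    size_t n = bytes / sizeof(uint64_t);
    if (n == 0 || passes <= 0) return -1.0;
    uint64_t *buf = malloc(n * sizeof *buf);
    if (!buf) return -1.0;
    for (size_t i = 0; i < n; i++) buf[i] = i;  /* touch every page before timing */

    struct timespec t0, t1;
    uint64_t sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++) sum += buf[i];  /* sequential reads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile uint64_t sink = sum;   /* keeps the reads from being optimized away */
    (void)sink;
    free(buf);

    double s = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return (double)n * sizeof(uint64_t) * passes / s / 1e9;
}
```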

 

443 Comments

  • over9k - Tuesday, October 29, 2013

    Two paragraphs in and this is already better than all the other "reviews" out there.
  • Beautyspin - Tuesday, October 29, 2013

    You should not really call Anandtech's reviews of Apple products reviews. They are homages paid to their shrine. This is a ritual with them.
  • Drumsticks - Tuesday, October 29, 2013

    I always hear people complaining about bias toward Apple products here and elsewhere. But what exactly is the reason for that? The majority of the review is seriously objective - you can't argue with the fact that Apple has some of the best performance in the game right now, and the best display to boot. The only thing rivaling it is probably the higher clocked Z3770, while Qualcomm will probably pass Apple's GPU early next year.

    As far as subjectivity goes, even if you don't like the design, the materials are solid. And it manages to be lighter than every other ten inch tablet on the market (and thinner) without sacrificing battery life. The only subjective things I could possibly see are maybe the sound quality and the OS itself, both of which he criticized a few times. Where does the bias come in?
  • Fleeb - Tuesday, October 29, 2013

    "and the best display to boot"

    We have yet to see the Kindle HDX review, but it is lighter, packs more pixels and has a 100% RGB gamut.
  • darwinosx - Wednesday, October 30, 2013

    It's lighter because it is cheap plastic. It is also a far more limited device. Really laughable to think it compares to an Air.
  • dsumanik - Wednesday, October 30, 2013

    Read this review with a grain of salt. Anand Lal Shimpi is heavily vested in apple stock, doing everything he can to boost the dismal situation.

    Thinner bezels and light weight do not hide the fact that functionally, this iPad is the same as the previous 2 generations.

    Sent from my ipad3, which will be upgraded when apple actually updates the product line.

    Here are some basic ideas, Mr. Cook:

    Wireless charging
    Fingerprint scanner
    Thunderbolt sync or usb3
    Haptic feedback
    NFC
  • John2k13 - Wednesday, October 30, 2013

    You know what's disgusting about your comment, and those similar to yours? That you basically accuse the author of being a liar, a shill, and completely lacking in integrity, without a shred of evidence. I read the entire 10 page review, and it was incredibly detailed, precise, and well-written, something that would be obvious to most sane, rational, objective people.

    "Anand Lal Shimpi is heavily vested in apple stock, doing everything he can to boost the dismal situation."

    First of all, what "dismal situation"? Apple stock is up around $130 from a few months ago, or almost a third. Hardly "dismal". Also, do you think a single review from a website visited primarily by tech geeks is going to have any fucking effect on the stock? I mean, are you for real? Don't assume the author holds the same amount of ignorant stupidity that you apparently do, to think for a second this review would have a snowball's chance in hell of affecting the stock. You clearly know nothing about how the financial market works.

    "Thinner bezels and light weight do not hide the fact that functionally, this iPad is the same as the previous 2 generations."

    Functionality on a tablet is primarily based on software, and the iPad has 475,000+ optimized apps which are getting more powerful all the time. The hardware simply enables better software. A tablet is basically a blank slate for the software, and better hardware helps in enabling better software. Every single aspect of this iPad is improved, so yes, it is more "functional". That list you made, though, is pretty ridiculous, and obviously a desperate attempt to list anything you can think of that the iPad doesn't have and pretend it's significant.

    Wireless charging: why? How does this make the device more functional?
    Fingerprint scanner: Wow, brilliant "idea". You probably mocked Touch ID when it appeared on the 5S. Again, this would be nice to have I guess, but its absence in no way impedes the "functionality" of the tablet.
    Thunderbolt sync or usb3: I have no idea what "thunderbolt sync" means, and it's pretty ridiculous you're harping on a USB3 port. It will never happen, nor should it.
    Haptic feedback: Utterly useless gimmick, but hey, why not, right?
    NFC: I have NFC on my Nexus 4, and not ONCE have I even run into an opportunity or a reason to use it. But yeah, I'm sure you honestly think it's needed or useful on an iPad. Again, another meaningless bullet point you were desperate to add mindlessly.

    Next time you want to baselessly accuse an author of being a liar, a shill, a sellout, and having no honesty or integrity, try to make a coherent post that actually contains some intelligent, well thought out information. Otherwise, by attacking the author you just embarrass yourself as you did now. Grow up.
  • ABR - Wednesday, October 30, 2013

    Actually, Thunderbolt sync is one of the changes I'm really waiting for. Have you ever tried restoring even a 16GB iPad over USB? Slow agony. I can't even imagine what someone with a 64 or 128GB model must go through. Even ordinary everyday syncs are far slower than they could or should be.
  • Howard Ellacott - Wednesday, October 30, 2013

    You clearly don't realise what thunderbolt is, which is why that's such a stupid suggestion. Yes, faster syncs would be amazing, and restoring a 64gb iPhone is a right pain, but thunderbolt isn't the way.
  • Kristian Vättö - Wednesday, October 30, 2013

    USB 2.0 isn't the real bottleneck there; NAND is. Most eMMC solutions can't even saturate the USB 2.0 link with sequential writes, so Thunderbolt or USB 3.0 would do absolutely nothing.
