
  • III-V - Friday, June 30, 2017 - link

    That's a beefy L2
  • Kevin G - Friday, June 30, 2017 - link

    Helpful to keep the memory bus powered down more often, not explicitly for raw performance gains (though it helps there too). 8 MB on a mobile SoC is still a lot of cache.
  • thesandbenders - Saturday, July 01, 2017 - link

    The RAM is DDR; if you power it down, you lose your data. A larger L2 does let you run the RAM at a lower clock speed with less perceived performance impact for the user. A lower clock speed will generally lower power consumption.
  • Nokiya Cheruhone - Tuesday, July 04, 2017 - link

    I see you have no clue about how DDR-SDRAM works.
  • thesandbenders - Tuesday, July 04, 2017 - link

    I didn't realize "powered down" was commonly used to refer to idle and low-power states; I stand corrected.
  • name99 - Friday, June 30, 2017 - link

    Not really. It only seems large compared to recent Intel designs, which have focused on having a large L3 and a small (but low-latency, high-throughput) L2.
    Compare with Penryn, for example, in 2007, which gave 3MiB to each core. Apple is giving 2.67MiB to each core --- basically the same sort of capacity.

    The main thing to take away, I think, is that the exact details of a cache system (even at the most basic level of the sizes and the inclusivity) don't have a single correct answer --- the space of "good design" is fairly voluminous, and it doesn't take much of a change in exactly what you're trying to optimize for to shift the design in a way that looks substantial, but is still only a percent or so different in performance.
  • Santoval - Sunday, July 02, 2017 - link

    I believe he implicitly (yet very obviously) meant "... for a mobile SoC". Your comparison is from an entirely different product category, so it really makes no sense.
  • Eug - Friday, June 30, 2017 - link

    First! 10 nm FF
  • StevoLincolnite - Friday, June 30, 2017 - link

    It is not a real 10nm process.
  • RPE33 - Friday, June 30, 2017 - link

    Fake news confirmed by StevoLincolnite!!!
  • kfishy - Friday, June 30, 2017 - link

    According to the article it is a full node scaling, which is pretty impressive these days.
  • StevoLincolnite - Friday, June 30, 2017 - link

    TSMC is using a 14nm BEOL for its 10nm process.

    It might be a "full node" scaling from its 16nm FF process, which actually used a 20nm BEOL. But a true 10nm process it is not.


    I'll assume RPE33 is a sarcastic troll.
  • Morawka - Saturday, July 01, 2017 - link

    Welcome to chip fab 101. Even Intel's 14nm BEOL uses a larger metal interconnect. None of them are proper.
  • Santoval - Sunday, July 02, 2017 - link

    Actually no BEOL is, or can possibly be, 10 or 14nm for a 10 or 14nm process, because if you made the copper wires of every BEOL layer that thin, you would drive up their resistance and they would simply fail from the heat. A BEOL has multiple layers, and the real problem is not with the upper layers (where the copper wires get progressively bigger) but with the lowest one or two layers that interface with the FEOL part (aka the transistors).

    Only these one or two bottom layers need to be very thin, because they interface with the multitude of tiny transistors, and only these layers can potentially be almost as small as the lithography process (the part Intel calls "14nm BEOL", which is actually only the bottom BEOL layer). These one or two layers are also the weakest part of a CPU, because their very thin copper wires have very high resistance. The bottom BEOL layer can also be viewed as the top BEOL layer, depending on how you look at a CPU stack, but it's always the one that directly interfaces with the FEOL segment.
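The resistance argument above follows directly from R = ρL/A: shrink a wire's cross-section and its resistance climbs in proportion. A toy calculation (the dimensions are illustrative, not real process numbers):

```javascript
// Toy illustration: wire resistance R = rho * L / A. Halving both width
// and height of a copper interconnect quarters the cross-section and
// quadruples the resistance, which is why the lowest, thinnest BEOL
// layers suffer most as they scale with the transistors.
const RHO_COPPER = 1.68e-8; // ohm*m, bulk copper (thin wires are worse still)

function wireResistance(lengthM, widthM, heightM) {
  return (RHO_COPPER * lengthM) / (widthM * heightM);
}

const coarse = wireResistance(1e-6, 40e-9, 40e-9); // upper-layer-ish wire
const fine = wireResistance(1e-6, 20e-9, 20e-9);   // bottom-layer-ish wire
console.log(fine / coarse); // ~4: same length, roughly 4x the resistance
```

In practice it's even worse than this sketch suggests, since copper's effective resistivity rises in very thin wires due to surface and grain-boundary scattering.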
  • EasyListening - Monday, July 03, 2017 - link

    Does this have something to do with delivering electricity throughout the chip? Maybe the wires are larger on purpose to allow for higher voltages on the BEOL layers. (?)
  • omf - Friday, June 30, 2017 - link

    They certainly spent some of the power budget in those new iPads on higher screen refresh rates.
  • name99 - Friday, June 30, 2017 - link

    Unclear. The screen refresh rate is adaptive. It may well be a net win under almost all circumstances. Most of the time when you're just reading something the refresh rate can be lower, likewise for movies; it only has to kick in for UI (animations+tracking).

    My experience (which is of course only anecdotal, not scientific) is that my iPad Pro 12.9" is drawing down battery substantially slower than my iPad Air, even though I mostly use it for reading technical PDFs.
  • kfishy - Friday, June 30, 2017 - link

    The 12.9" Pro also has a substantially larger battery, which would also help when reading static documents. Reply
  • jjj - Friday, June 30, 2017 - link

    So much ink on Apple going 10nm for the iPad first, when the simple fact is that the timing of this node allowed it, that's all.
    I do find it interesting that a medium-volume SoC like this one gets coverage, while when TechInsights took a look at the SD835 there was silence, and that SoC is many times more relevant.
  • ws3 - Friday, June 30, 2017 - link

    The SD835 is far less interesting technologically.
    Its single-core performance is so far behind Intel and Apple that it is uninteresting.
    It runs 8 cores at a time to reach reasonable peak scores in multicore benchmarks, but that 8-core score is mostly irrelevant to end users, as they won't be running software that takes full advantage of all 8 cores simultaneously.
  • Spunjji - Friday, June 30, 2017 - link

    ...except for the browser, which is a huge part of what people use and takes full advantage of as many cores as you throw at it.
  • WinterCharm - Friday, June 30, 2017 - link

    People spend more time in apps than they do on the web browser of a mobile device.
  • Solandri - Friday, June 30, 2017 - link

    That's actually the problem with mobile devices. You have to install a hundred apps to do the same thing you can do with a single browser on a PC. Because every website out there tries to get you to install their app instead of use their website (probably so they can harvest your data and track everything you do). Heck, some of the forum websites I visit on my phone or tablet spam me with a popup to install their app. If I did that for every site I visit, I'd need 200+ GB of storage on my phone.

    Can you imagine how horrible the Internet would be if each website you visited required you to install a new program to access the site? That's what mobile is like. Programs/apps are for when you're doing stuff locally. Browsers and remote desktops are for when you're accessing data remotely.
  • melgross - Friday, June 30, 2017 - link

    That’s not true either. Where are you getting this from?
  • lefty2 - Friday, June 30, 2017 - link

    Yeah, but app developers need to make a living!
  • RPE33 - Friday, June 30, 2017 - link

    I'm sorry, but what you said is complete tripe.
  • MonkeyPaw - Friday, June 30, 2017 - link

    Nonsense. You can use the website for pretty much everything, but it's nice to have the app for the places you visit often. For me, I've installed the eBay and Amazon apps, but I just use the websites for things like PayPal or B&H. Sometimes the apps are a better option, like when notifications matter.
  • asendra - Friday, June 30, 2017 - link

    Browsers may be the best single argument for faster cores, because JavaScript is not multi-threaded.
  • melgross - Friday, June 30, 2017 - link

    Browsers don’t use that many cores.
  • Ppietra - Friday, June 30, 2017 - link

    The browser rarely makes use of all 8 cores. There aren’t that many browser processes that are multithreaded, and to take full advantage of 8 cores you really need a good multithreaded process. Browsers do run more than one process at a time, but the main processes are single-threaded and there aren’t that many concurrent processes happening.
  • blackcrayon - Friday, June 30, 2017 - link

    The browser uses all 8 cores? Then you would think it would outperform Safari on the new iPads. Unless the browser code is also extremely inefficient.
  • DarrenR - Friday, June 30, 2017 - link

    JavaScript is far and away the biggest CPU drain in any browser, and it's single-threaded. Due to the nature of JS and browsers, it's also generally blocking, meaning nothing else the browser wants to do can happen until the JS has been processed, including other JS. Single-core performance is by far the biggest indicator of browser speed...
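The blocking behavior described here can be seen in a few lines of Node-runnable JS (a minimal sketch, not a browser benchmark): even a timer due at 0 ms cannot fire while synchronous work holds the one JS thread.

```javascript
// Minimal sketch of why synchronous JS is "blocking": even a timer due
// at 0 ms cannot run until the single JS thread yields, so heavy
// synchronous work stalls everything else the page wants to do.
let timerFired = false;
setTimeout(() => { timerFired = true; }, 0);

// Simulate heavy synchronous work (a big JSON parse, layout thrash, etc.)
const busyUntil = Date.now() + 100;
while (Date.now() < busyUntil) { /* spin: the event loop cannot advance */ }

// Despite 100 ms having elapsed, the 0 ms timer still has not run:
console.log(timerFired); // false
```

This is why a faster single core improves page responsiveness more than extra cores do: the main-thread work can only be shortened, not spread out.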
  • name99 - Friday, June 30, 2017 - link

    Unclear. What do you define as "the browser" (ie what sort of workload), and what evidence do you have that this workload is substantially accelerated by having 8 cores available?
    It is true that JS CAN be written to use threads; but it's also true that large parts of the browser experience (basic layout, CSS, DOM manipulation, running the JIT and networking on a separate thread, etc) run out of steam beyond two CPUs.

    It is instructive to compare the browser benchmark results for the iPad Pro vs the iPhone 7. We have essentially the same micro-architecture and frequency. The iPad has a larger L2, but the iPhone has an exclusive L3, so not THAT different overall. The iPad has a wider memory bus, so same DRAM latency but twice the bandwidth. And, most important, the iPad has three cores, the iPhone two.

    You can see the results here:
    Basically, across all three browser benchmarks, the iPad results are not THAT much larger than the iPhone's --- the sort of improvement you'd expect from the caches and memory subsystem, but not 50% extra from an extra core.
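That observation is what Amdahl's law predicts. A toy sketch (the 30% parallel fraction is an illustrative assumption, not a measured browser number):

```javascript
// Amdahl's law: if a fraction p of the work can use extra cores, the
// speedup with n cores is 1 / ((1 - p) + p / n). The numbers below are
// illustrative, not measured browser workloads.
function speedup(p, n) {
  return 1 / ((1 - p) + p / n);
}

// Suppose only 30% of a page load parallelizes (the JS main thread,
// layout, etc. stay serial). Going from 2 cores to 3 barely helps:
const twoCores = speedup(0.3, 2);   // ~1.18x over one core
const threeCores = speedup(0.3, 3); // ~1.25x over one core
console.log(((threeCores / twoCores - 1) * 100).toFixed(1) + "% extra from the third core");
```

With a mostly serial workload, the third core buys only a few percent, which matches the iPad-vs-iPhone benchmark gap described above.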
  • techconc - Friday, June 30, 2017 - link

    JavaScript is run in the browser, and it largely leverages only a single core. That's part of the reason why iOS devices trounce anything on Android in JavaScript benchmarks.
  • kfishy - Friday, June 30, 2017 - link

    Can you imagine the throttling if browsers use 8 cores all the time...
  • StrangerGuy - Saturday, July 01, 2017 - link

    Besides, Apple is already beating every ARM design with one hand tied behind their back, with just 3 cores, and the cores collectively take only a small piece of the 96.4mm2 of real estate. If they dropped the kid gloves and threw 8 of them in there, I bet Apple would even give desktop Ryzen a run for its MT money.
  • nikaldro - Friday, June 30, 2017 - link

    This is BS. Mobile apps have become very well threaded in recent years.
  • Ppietra - Friday, June 30, 2017 - link

    What you usually get are apps that have more than one process, to separate rendering from app logic, something a 2-core CPU will do just as well. 8-core multithreaded processes are something you will only find in a few image, video and audio processing apps, and some games, and it needs good code.
  • serendip - Saturday, July 01, 2017 - link

    What about background multitasking? With a few gigs of RAM, it's conceivable that a rendering process could run at the same time a web page was being loaded.

    Does iOS run more on the timeslicing multitasking model, with a fast CPU constantly switching between tasks, or is it more like the Android throw-more-cores model?
  • Ppietra - Saturday, July 01, 2017 - link

    What you are describing is what I described: apps have more than one process, to separate rendering from app logic. A 2-core CPU can do that just as well, and if the cores are faster you will probably see better performance.
    If you want to talk about running another app in the background, or system background tasks, I would think there is a small responsiveness advantage to having more than 2 cores in a smartphone, but there isn't much need for 8 cores there either. And since there isn't a public benchmark comparing background app multitasking performance between the iPhone and Android phones, it is uncertain which CPU solution has the advantage.
    iOS uses every resource it has available; it has basically the same multicore and multiprocessor support as macOS. Apple-designed SoCs have 2 to 6 cores (3 cores available per app).
  • Nokiya Cheruhone - Tuesday, July 04, 2017 - link

    People still don't understand that background multitasking (implemented in a way that makes it possible to run user code in the background without restrictions) is useless on mobile devices. Use the APIs for background tasks (like fetching new info or doing push notifs) instead. Thanks for the troll.
  • metafor - Friday, July 07, 2017 - link

    The OS itself runs many processes in the background though. In theory, with a small enough and efficient enough core, it makes sense to run that on a core that's separate from the large performance core. This lets the performance core somewhat dedicate itself to a heavy javascript process without context switching.
  • kfishy - Friday, June 30, 2017 - link

    Snapdragon 835 was manufactured by Samsung, as the article states this is the first TSMC 10nm SoC shipping in actual devices.
  • pav1 - Friday, June 30, 2017 - link

    Moral of the story: wait till the A11 to see more. The iPad Pro flies... so be happy with what you have.
  • gigathlete - Friday, June 30, 2017 - link

    Thanks for this article Ryan, hopefully you guys will be able to give us a performance preview of this A10X. Seems like a true beast.
  • lefty2 - Friday, June 30, 2017 - link

    "One of the more intriguing mysteries in the Apple ecosystem has been the question over what process the company would use for the A10X SoC"
    Hardly a mystery, though; there were several rumours that it was on 10nm.
  • melgross - Friday, June 30, 2017 - link

    That’s why it was a mystery. They were rumors.
  • MonkeyPaw - Friday, June 30, 2017 - link

    Perhaps the reason Apple went to 10nm with this SOC was to be able to use it in the next-generation iPhone as well? If we're looking at them launching the 7S and an ultra-premium anniversary edition, they might be planning to use this SOC for one of those 2 models.
  • Anticipate - Friday, June 30, 2017 - link

    I don’t understand. If the chip has the same amount of GPU cores as the A9X, and they are the same cores, and they are clocked similarly, how is the GPU benchmarking so much faster than the A9X?
  • tipoo - Friday, June 30, 2017 - link

    10nm allows higher clocks at the same power.
  • melgross - Friday, June 30, 2017 - link

    The clock is only about 5% higher for a 40% improvement in performance.
  • kfishy - Friday, June 30, 2017 - link

    Might be much faster memory performance, mobile GPUs nowadays are pretty bandwidth hungry.
  • StrangerGuy - Friday, June 30, 2017 - link

    Because Apple's SoC design team has been far and away the best in the entire industry. They also just killed ImgTec in GPUs; the rumors are that next on the Apple custom-design chopping block are Qualcomm and Dialog, for baseband and power management ICs respectively.
  • melgross - Friday, June 30, 2017 - link

    It’s a later series.
  • Ppietra - Friday, June 30, 2017 - link

    They aren’t the same GPU cores as the A9X’s. It should use a GPU core similar to the A10’s, which was a tweaked version of the previous A9 GPU.
    We don’t know if they have a similar clock speed; the numbers shown are for the CPU, not the GPU.
  • blackcrayon - Friday, June 30, 2017 - link

    Does it say the cores are clocked similarly though? I thought they could have a different relative clock speed to the overall SoC clock. Also, are we sure they are the exact same cores? Or just that there are the same number of them and they have no new "features" from Apple's software standpoint.
  • Nokiya Cheruhone - Tuesday, July 04, 2017 - link

    Apple doesn't use the same GPU; its design (now done in-house) is evolving constantly.
  • tipoo - Friday, June 30, 2017 - link

    Soo, any chance of a deep dive? Merged into the A10 one?
  • melgross - Friday, June 30, 2017 - link

    What’s really interesting here is that such a major shrink gives Apple a chance to add a lot more to the chip. I imagine that the soon-to-appear A11 is taking advantage of the same process. Since it’s going to be the second chip using the 10nm process, possibly Apple will feel that they can advance it even more.

    Generally, we find that the next-generation phone SoC from them has GPU performance about equal to the previous generation’s iPad GPU, which had double the cores.

    It will be interesting to see whether the A11 has CPU performance exceeding the A10X’s, and GPU performance at almost the same level. Of course, it’s not likely to have 3/3 CPU cores - or will it?
  • Kevin G - Friday, June 30, 2017 - link

    This makes me wonder what they have in store for the A11 in the next iPhone due later this year. I think this sets up the expectation that Apple will use 10 nm there as well. I'd still expect a dual big + dual little design. The change may be that Apple could enable all four cores simultaneously under heavy load. More cache, as we've seen on the A10X, is probably a given; I'd guess 6 MB. The GPU side is where I'd see the big changes happening for the A11, with a new cluster design. I don't think they'll have their custom GPU ready by then, but Apple has been known to surprise. I see Apple adopting the latest PowerVR design and increasing the cluster count.
  • name99 - Friday, June 30, 2017 - link

    "I'd still expect a dual big + dual little design."

    This is not a useful way to look at it; it reflects ARM thinking, not Apple thinking.
    Apple, as far as we can tell, does not design or think of these as "dual big" and "dual little"; they think of them as a "flexi-core" that consists of a big and a little core very tightly coupled. The difference is that the unit of construction is the "big+little" pair, not clusters of big and clusters of little.

    We appear to know that switching between a big and its companion little is done by HW. (Apple talks about a "HW performance controller" doing this job). It also seems to be the case that the two can't run independently (big and little running simultaneously) though it's not clear if this is a HW limitation, an OS limitation, or just a policy decision (Apple experimented and could find no circumstances under which it really made sense).

    If I had to bet, my betting would be that as we move forward the big and little cores will become ever closer, ever more like two sides of a single "flexicore", so perhaps even moving to sharing L1 cache for example. We'll see...

    (ARM has STARTED down this path with DynamIQ --- at least now big and little cores can have a tighter association rather than being forced into separate clusters using separate L2s. Not clear yet if DynamIQ allows for HW to control the toggling between big and little rather than software.)
  • Kevin G - Friday, June 30, 2017 - link

    I'm not disagreeing with you, but there isn't much terminology to quickly describe that arrangement. It is an implementation distinctly different from what ARM is doing, but they both do the same thing at a high level.

    Sharing the L1 cache would be nice, as swapping between the two designs wouldn't have to move data to warm the caches. However, I can see it being difficult to keep L1 latencies low in such a scenario. Perhaps just a shared L1 data cache and dedicated L1 instruction caches?
  • kfishy - Friday, June 30, 2017 - link

    Heck, since it's on the same silicon and using the same process you can theoretically even share the registers and just swap the big/little pipelines and execution units.
  • name99 - Friday, June 30, 2017 - link

    One problem is that you want the little core not just to have a simpler micro-architecture but ALSO to be built of slower transistors. That reality would seem to constrain how aggressively you can push sharing.

    But there are academic designs (the University of North Carolina at Chapel Hill has done a lot of work on this) that share almost everything and do the big/little transition by shutting down parts of the big microarchitecture. They use counters to predict regions of code that will not benefit from the wide micro-architecture (maybe lots of misses to memory, maybe lots of sequentially dependent instructions, maybe lots of hard-to-predict branches), and switch between the wide and narrower configs roughly every thousand instructions. In theory these give substantial energy savings at a performance loss of 3-5% (which you can easily make up, and more, just by cranking the frequency higher).
    But I'm guessing it will be some time before the commercial world gets there! Let's see if they're at least headed that way by watching whether Apple's next config pulls the two CPUs tighter together.
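The counter-driven policy described above might look something like this sketch (all names and thresholds here are made up for illustration; real designs sample hardware performance counters, not JS objects):

```javascript
// Toy model of a counter-driven wide/narrow decision: roughly every
// thousand instructions, sample counters and decide whether the wide
// micro-architecture is worth its energy cost. Thresholds are invented.
function chooseConfig(counters) {
  const { memMissesPerKiloInst, depChainFraction } = counters;
  // Code dominated by memory misses or long dependency chains gains
  // little from a wide machine, so drop to the narrow, cheaper config.
  if (memMissesPerKiloInst > 20 || depChainFraction > 0.6) {
    return "narrow";
  }
  return "wide";
}

console.log(chooseConfig({ memMissesPerKiloInst: 35, depChainFraction: 0.2 })); // narrow
console.log(chooseConfig({ memMissesPerKiloInst: 2, depChainFraction: 0.1 }));  // wide
```

The interesting part in the academic work is that the decision happens in hardware at a granularity of ~1000 instructions, far finer than any OS scheduler could manage.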
  • kfishy - Sunday, July 02, 2017 - link

    Oh yeah, slower less leaky transistors definitely help, but even just switching fewer transistors with a simpler pipeline would yield non-trivial power savings.
  • iwod - Friday, June 30, 2017 - link

    Actually this got me thinking: maybe there won't be an A11, and Apple will use the same A10X for the iPhone 8. Since 10nm is a short node, it is only a stepping stone to 7nm. Maybe the innovations (an Apple-made GPU, a new CPU architecture) will only come next year?
  • Nullify - Friday, June 30, 2017 - link

    Now where are Samsung and Qualcomm with their higher-end SoCs for tablet use? It seems ridiculous to use the same one as your phone when a tablet is where you want the extra power.
  • Araa - Friday, June 30, 2017 - link

    The thing is they don't have anything better than what they put in their phones.
  • blackcrayon - Friday, June 30, 2017 - link

    Seems like Apple is the only one with the profit margin and tablet sales to justify developing customized higher end chips. For Samsung their high end phone chips are "good enough". Of course Apple does this too but only in their lower end tablets at this point (lone exception was the iPad Air).
  • 1_rick - Friday, June 30, 2017 - link

    What the heck is a pipecleaner, in this context?
  • artk2219 - Friday, June 30, 2017 - link

    It cleans up the fabrication process. You will take a higher loss on the production of these chips because the process isn't completely mature, but it paves the way for better yields on your more profitable or more numerous later products. It cleans out the gunk in the pipes.
  • 1_rick - Friday, June 30, 2017 - link

    "It cleans out the gunk in the pipes."

    Ok, that makes sense. Would've been nice to have had the term explained--I don't think I've ever seen it used here before, although I admittedly don't read every article.
  • name99 - Friday, June 30, 2017 - link

    That's because you don't know enough about the internet. Same thing happens there.

    Once a month VZW, Global Crossing, China Telecom and so on, all the big ISPs, pour pipe cleaner into the internet to clean out the pipes. That's why you get occasional hiccups in the speed.
    Has to be done carefully and synchronized around the whole world so that the cleaner poured into China, for example, can flow out in time and doesn't collide with the cleaner poured into the US.
  • Notmyusualid - Friday, June 30, 2017 - link

    @ name99

  • Icehawk - Friday, June 30, 2017 - link

    I hope this increases battery life significantly. I have the 9.7 Pro and its battery life is much worse than the prior iPads I've owned.
  • blackcrayon - Friday, June 30, 2017 - link

    Probably any savings are eaten up by this sweet 120Hz (when it needs it) screen.
  • Ej24 - Friday, June 30, 2017 - link

    Holy crap that memory bandwidth is nuts. Why can't we have that on desktops?!
  • SydneyBlue120d - Saturday, July 01, 2017 - link

    Is the CPU 64bit only?
  • NetMage - Sunday, July 02, 2017 - link

    iPad Pro 10.5 runs iOS 10 so obviously not.
  • darkich - Monday, July 03, 2017 - link

    The crazy thing is, the iPad Pro uses 50% less power than the Surface Pro 5, while crushing it in raw performance benchmarks.
  • MrJBlacked - Monday, July 24, 2017 - link

    10nm gets me all tingly
