Conclusion & Thoughts

The Cortex A76 presents itself a solid generational improvement for Arm. We’ve been waiting on a larger CPU microarchitecture for several years now, and while the A76 isn’t quite a performance monster to compete with Apple’s cores, it shows how important it is to have a balanced microarchitecture. This year all eyes were on Samsung and the M3 core, and unfortunately the performance increase came at a great cost of power and efficiency which ended up making the end-product rather uncompetitive. The A76 drives performance up but on every step of the way it still deeply focused on power efficiency which means we’ll get to see the best of both worlds in end products.

In general Arm promises a 35% performance improvement which is a significant generational uplift. Together with the fact that the A76 is targeted to be employed in 7nm designs is also a boost to the projected product.

I’m having some reservations in terms of the performance targets and if vendors will indeed release the SoC with quad-core clock rates of up to 3GHz – based on what I’ve heard from vendors that seems like a rather very optimistic target. Even then, a reduced clock frequency still brings significant benefits, and it’s especially on the efficiency side where Arm should be lauded for continuing to place great focus on.

Whether my projections are correct or not is something we’ll have to see in actual products, but fact is that we *will* see significant efficiency benefits in the next generation of SoCs which should bring both an notable performance improvement as well as battery life improvement to the user. Arm’s focus here on the user experience seems to be exemplary and I hope vendors will be able to implement the core based on Arm’s guidance and reach the targeted metrics.

The Cortex A76 is said to have already come back in working silicon at two partners and we’ll very likely see it shipping in commercial products by the end of the year. I won’t be beating around the bush here as Huawei and HiSilicon’s product cycle schedule makes it obvious that they’re likely one of the launch partners for the product. Qualcomm has also doubled down on using Arm cores in the mobile space so we should also be seeing the next generation Snapdragon SoCs employ the A76. Among the big players, it’s Samsung LSI which is going to have a tough time – the A76 doesn’t seem to greatly outperform the M3, so at least in theory, the M4’s focus will need to be solely on power efficiency. Then again Arm is very open about their design goals; half the area and half the power at similar performance is something that’s going to be hard to compete against.

The Cortex A76 is said to be the baseline microarchitecture on which Arm will iterate over the next 2 generations at least. Arm has been able to execute their yearly beat roadmap on time for 5 generations now and with yearly 20-25% CAGR it’s going to be a very interesting next couple of years as the mobile space is very quickly approaching the performance of desktop CPUs.

Cortex A76 - Performance & Power Projections
Comments Locked

123 Comments

View All Comments

  • serendip - Thursday, May 31, 2018 - link

    Does anyone actually use the full performance of the A11 or A12 in daily tasks? To me, it's pointless to have a power hungry and fast core just for benchmarks. Just make a slightly slower core with less power usage for quick bursts like app loading or Web page rendering, while much slower and more efficient cores handle the usual workload.
  • jOHEI - Thursday, May 31, 2018 - link

    ARM's objectives is to make CPU's that go into a cluster of 4+ another 4 small ones.
    What Apple has does is Making bigger cores >2 times the size of an ARM core and have 1.5x the performance of the Said core. That same CPU is made for very high Power consumption at maximum load and Apple tweaks the ammount of time it stays in those high clocks. Thus its easier to make a laptop chip-tablet and phone. Because you just reuse the same CPU for all of them, maybe add a few cores to the laptop version and tweak the power settings and its relatively easy.
  • jOHEI - Thursday, May 31, 2018 - link

    Forgot to mention that Apple goes for 2 core clusters, not 4. So they must have significantly better single core performance to matchup against the Conpetition.
  • name99 - Friday, June 1, 2018 - link

    Just to correct that you are somewhat living in the past with your numbers.

    ARM no longer cares about 4-sized clusters; that was an artifact of big.LITTLE (and one of the constraints that limited that architecture's performance). The successor to big.LITTLE, brand-named dynamIQ, does not do things in blocks of 4 anymore.

    Likewise Apple first released 3 CPUs in a SoC with the A8X. The A10X likewise has three CPUs. It's entirely likely (though no-one knows for sure) that the A11X (or A12X if the 11X is skipped) will have 4 large cores.
  • BurntMyBacon - Friday, June 1, 2018 - link

    Due to ARM's licensing model, they have every incentive to push designs that cater to more cores. They have little incentive to push single threaded performance any more than necessary as this would result in few cores being licensed due to space and power constraints. I'm not fully convinced that the whole big.LITTLE (and derivative) philosophy was the best way to go either. It could be that it got close enough to what advanced power management could do with the benefit of providing a convincing case for ARM CPU designers to use double the cores or more. When Intel was still in the market, they demonstrated that a dual core chip with clock gating, power gating, power monitoring, dynamic voltage and frequency scaling, and other advanced power management features could provide superior single thread performance and comparable multithread performance in a similar power envelop to competing ARM designs with double the cores (all while burdened with the inefficient x86 decoder). Apple also had good success employing a similar philosophy until their A10 design. Though, it is not necessarily causal, it is interesting to note that they've had more trouble keeping within their thermal and power constraints on their latest A11 big.LITTLE design.

    Note: I don't have any issue with asymmetric / heterogeneous CPUs. I'm just not convinced that they are adequate replacements for good power management built into the cores. DynamIQ does seem to be a push in the right direction allowing simultaneous usage of all cores, providing hooks for accelerators, and providing fine grained dynamic voltage and frequency scaling. This makes a lot of sense when you can assign tasks to processors (or accelerators) with significantly better proficiency for the task in question. Switching processors for no other reason than it is lower power, however, just sounds like the design team had no incentive to further optimize their power management on the high performance core.
  • name99 - Friday, June 1, 2018 - link

    Again a correction.
    Apple's problems with the A10 and A11 are NOT problems of power management; they are problems of CURRENT DRAW. Power management on the chips works just fine (and better than ever; high performance throttling tends to occur less with each successive generation, and it used to be possible to force reboot an iPhone if you got it hot enough, now that seems impossible because of better power management).

    Current draw, on the other hand, is not something the SoCs were designed to track. And so when an aging battery is no longer able to provide max current draw (when everything on the SoC is lined up just wrong) then not enough current IS provided, and the system reboots.
    This is definitely a flaw in the phone as a whole, but it's a system-wide flaw, and you can imagine how it happened. The SoC was designed assuming a certain current drive because no-one thought about aging batteries, because no-one (in Apple or outside) had hit the problem before.

    I expect the A12 will have the same PMU that, today, monitors temperatures everywhere to make sure they remain within bounds, ALSO tracking a variety of proxies for current draw, and will be capable of throttling performance gradually in the face of extreme current draw, just like performance is throttled gradually in the face of extreme temperature.
  • eastcoast_pete - Friday, June 1, 2018 - link

    Different design and use philosophies. Apple's mobile chips are designed to be able to deliver short bursts of very high processing power (opening a complex webpage, switching between apps), and throttle back to Okay fast during the remainder. That requires apps and OS to be tightly controlled and behave really well - one bad app that doesn't behave and keeps driving the CPU hard for longer periods and your phone would get hot (thermal throttling) , plus your battery would run down in a jiffy. For ARM & Co on Android/Linux, it makes more sense to have smaller, less powerful cores, manage energy consumption through other means (BigLittle etc), and increase performance by increading the number of cores/threads. Basically, if you really want to upscale the performance of stock ARM designs for a laptop or similar, you could dump the "little" cores and go for an octacore or decacore BIG, so all A76 cores. Might be interesting if somebody tries it.
  • serendip - Friday, June 1, 2018 - link

    It's not so simple - small A55 cores seem to work better in a quad or hexacore config, whereas A75s are best left in a dual core config at most because their perf/watt is poor. No point having a phone that's crazy fast but overheats and runs out of battery quickly.

    Apple's use of powerful but power-hungry cores could also affect the longevity of older phones. Older batteries might not be able to supply enough power for a big core running at full speed.
  • BurntMyBacon - Friday, June 1, 2018 - link

    The fact that Apple is able to use an even larger and more power hungry core and a (marginally?) smaller battery should tell you that it is doable. Though, you are correct in saying it's not simple. The fact of the matter is, Apple has implemented much better power management features than ARM to allow for their cores to run at higher peak loads while needed and then being able to throttle down to lower power draw very quickly. ARM simply didn't design the A75 to do low power processing. The A75 is designed to rely on the A55 for low power processing as this provides an incentive to sell more core licenses.
  • BillBear - Friday, June 1, 2018 - link

    Traditionally, Apple builds a big ass core and clocks it low.

    It wasn't until FinFET made it to mobile chips that they started clocking higher.

    >Apple has always played it conservative with clockspeeds in their CPU designs – favoring wide CPUs that don’t need to (or don’t like to) clock higher – so an increase like this is a notable event given the power costs that traditionally come with higher clockspeeds. Based on the underlying manufacturing technology this looks like Apple is cashing in their FinFET dividend, taking advantage of the reduction in operating voltages in order to ratchet up the CPU frequency. This makes a great deal of sense for Apple (architectural improvements only get harder), but at the same time given that Apple is reaching the far edge of the performance curve I suspect this may be the last time we see a 25%+ clockspeed increase in a single generation with an Apple SoC.

    https://www.anandtech.com/show/9686/the-apple-ipho...

    Qualcomm has been building small cores and vendors have been clocking them (with corresponding voltage increases) high.

    Remember all the Android vendors getting caught red handed changing clockspeeds when they detected benchmarks running?

Log in

Don't have an account? Sign up now