Conclusion & Thoughts

The Cortex A76 presents itself as a solid generational improvement for Arm. We’ve been waiting on a larger CPU microarchitecture for several years now, and while the A76 isn’t quite a performance monster to compete with Apple’s cores, it shows how important it is to have a balanced microarchitecture. This year all eyes were on Samsung and the M3 core, and unfortunately its performance increase came at a great cost in power and efficiency, which ended up making the end product rather uncompetitive. The A76 drives performance up, but at every step of the way it remains deeply focused on power efficiency, which means we’ll get to see the best of both worlds in end products.

In general Arm promises a 35% performance improvement, which is a significant generational uplift. The fact that the A76 is targeted at 7nm designs gives a further boost to the projected products.

I have some reservations about the performance targets and whether vendors will indeed release SoCs with quad-core clock rates of up to 3GHz; based on what I’ve heard from vendors, that seems like a very optimistic target. Even at a reduced clock frequency the A76 still brings significant benefits, and it’s especially on the efficiency side that Arm should be lauded for continuing to place great focus.

Whether my projections are correct or not is something we’ll have to see in actual products, but the fact is that we *will* see significant efficiency benefits in the next generation of SoCs, which should bring both a notable performance improvement and a battery life improvement to the user. Arm’s focus here on the user experience seems to be exemplary and I hope vendors will be able to implement the core based on Arm’s guidance and reach the targeted metrics.

The Cortex A76 is said to have already come back in working silicon at two partners and we’ll very likely see it shipping in commercial products by the end of the year. I won’t be beating around the bush here as Huawei and HiSilicon’s product cycle schedule makes it obvious that they’re likely one of the launch partners for the product. Qualcomm has also doubled down on using Arm cores in the mobile space so we should also be seeing the next generation Snapdragon SoCs employ the A76. Among the big players, it’s Samsung LSI which is going to have a tough time – the A76 doesn’t seem to greatly outperform the M3, so at least in theory, the M4’s focus will need to be solely on power efficiency. Then again Arm is very open about their design goals; half the area and half the power at similar performance is something that’s going to be hard to compete against.

The Cortex A76 is said to be the baseline microarchitecture on which Arm will iterate over at least the next 2 generations. Arm has been able to execute its yearly roadmap on time for 5 generations now, and with a 20-25% CAGR it’s going to be a very interesting next couple of years as the mobile space is very quickly approaching the performance of desktop CPUs.
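To put the 20-25% yearly figure in perspective, here is a minimal sketch of how such a rate would compound over the next few generations. It assumes the gain applies evenly each year and takes the A76 as a baseline of 1.0; the function name `compound_uplift` and the three-generation horizon are illustrative choices, not figures from Arm.

```python
def compound_uplift(annual_gain: float, generations: int) -> float:
    """Cumulative performance multiplier after `generations` yearly steps.

    Assumes the same fractional gain every generation (an idealization).
    """
    return (1.0 + annual_gain) ** generations


if __name__ == "__main__":
    # 20% and 25% bracket the CAGR range quoted in the text.
    for gain in (0.20, 0.25):
        for gens in (1, 2, 3):
            factor = compound_uplift(gain, gens)
            print(f"{gain:.0%} per year over {gens} generation(s): {factor:.2f}x")
```

Under these assumptions, two further generations would land at roughly 1.4x to 1.6x the A76 baseline.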


123 Comments


  • iwod - Friday, June 1, 2018 - link

    Even if Apple moved A11 from 10nm to 7nm, and runs at 3GHz, there will still be a huge gap in performance. Let alone they will have A12 and 7nm shipping in a few months' time. Compare this to A76, which I don't think will come in 2018.

    So there is still roughly a 3 years gap between ARM and Apple in IPC or Single thread performance.
  • Lolimaster - Friday, June 1, 2018 - link

    And why do you care about IPC, when 99.99% of all smartphone users:

    -Use the phone as a glorified clock
    -A tool for showing off (even with the cancer "dynamic" profile on Samsung AMOLED powered devices, they don't know the "basic" calibrated profile exists)
    -Twitter, Facebook, Instagram, WhatsApp

    Where is your need for performance? Unless you buy a phone to run AnTuTu/Geekbench every time you pull it out of your pocket.

    The biggest improvement in phone performance was the jump from slow/high latency eMMC to NVMe-like NAND (Apple), UFS for Samsung and the others.
  • serendip - Friday, June 1, 2018 - link

    Spot on. I've got a SD650 and a SD625 phone, one with A72 big cores and the other with only A53 cores, and for web browsing and chatting they're almost indistinguishable. The 625 device also has much better battery life.
  • darwiniandude - Friday, June 1, 2018 - link

    Of course a faster device can accomplish a task faster and drop back to idle power efficiency to aid battery life. Depends on many factors, but running at (hypothetical) 20 units of performance per second over 5 seconds (total 100) then dropping back to idle might be preferable to 10 units of performance per second over 10 seconds.
    Also, remember Apple’s devices do much on device, the Kinect-like FaceID for one, and unlike Google Photos where images are scanned for content in the cloud (this picture contains a bridge, and a dog) iOS devices scan their libraries on device when on charge.
  • name99 - Friday, June 1, 2018 - link

    That's like saying Intel shouldn't bother with performance any more because 99.99% of PCs run Facebook in the web browser, email, and Word.

    (a) Apple sells delight, and part of delight in your phone is NEVER waiting. If you want to save money, buy a cheaper phone and wait, but part of Apple's value proposition is that, for the money you spend, you reduce the friction of constant short waits. (Compare, eg, how much faster the phone felt when 1st gen TouchID was replaced with the faster 2nd TouchID. Same thing now with FaceID; it works and works well. But it will feel even smoother when the current half second delay is dropped to a tenth of a second [or whatever].)

    (b) Apple chips also go into iPads. And people use iPads (and sometimes iPhones) for more than you claim --- for various artistic tasks (manipulating video and photos, drawing with very fancy [ie high CPU] "brushes" and effects, creating music, etc). One of the reasons these jobs are done on iPads (and sometimes Surfaces) and not Android is because they need a decent CPU.

    (c) Ambition. BECAUSE Apple has a decent CPU, they can put that CPU into their desktops. And, soon enough, also into their data centers...
  • serendip - Friday, June 1, 2018 - link

    I'm curious about all this because I'm an iPad user. No iPhones though. Even an old iPad Mini is smoother than top Android tablets today.

    Does the CPU spike up to maximum speed quickly when loading apps or PDFs, then very quickly throttle down to minimum? I don't know how Apple make their UI so smooth while also having good battery life.
  • varase - Saturday, June 2, 2018 - link

    Smooth is the iPhone X.

    When you touch the screen, touch tracking boosts to 120Hz, even though they can only run the OLED screen at 60Hz.

    As for PDFs, MacOS (and as a consequence iOS) uses non-computational postscript as their graphics framework ... and PDF is essentially journaled postscript (like a PICT was journaled QuickDraw).

    As for throttling down: yeah, when you've completed your computationally expensive task you throttle down to save power.
  • YaleZhang - Friday, June 1, 2018 - link

    Reducing latency of floating point instructions from 3 cycles to 2 seems quite an accomplishment. For Intel, it's been >= 3 cycles (http://www.agner.org/optimize/instruction_tables.p...

    Skylake: 4 cycles / 4.3 GHz = 0.93 ns
    A76: 2 cycles / 3 GHz = 0.66 ns

    Skylake latency increased to 4 probably to achieve a higher clock, but if A76 can do it in 3, then Skylake should also be able to do it (3 cycles / 4.3 GHz) = 0.70 ns.
    How did ARM do this?
  • tipoo - Tuesday, September 4, 2018 - link

    Lower max clocks, shorter pipeline maybe?
  • Quantumz0d - Friday, June 1, 2018 - link

    Hilarious commenters. Apple's SoC ? Again ? I guess people need to think about how bad their Power envelope is. Their A11 gets beaten by 835 in consistency, dropping to 60% of clocks lol. And the battery killing SoC yes the battery capacity is less on iPhones. But Apple's R&D and the chips costs are very high vs the ARM. Not to forget how 845s GPU performance slaps and drowns that Custom *cough cough *Imagination* IP derived GPU core.

    They rely on single-thread performance because of power and optimization: it targets one OS and one HW ecosystem ruled and locked by Apple only, whereas ARM-derived designs or Qcomm are robust for supporting a wider hardware pool and can even run Windows OS.
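The cycle-to-nanosecond comparison in YaleZhang's comment above can be reproduced with a short sketch. It uses only the cycle counts and clock speeds quoted in that comment (4.3GHz for Skylake, and the 3GHz A76 target, which is a projection rather than a shipping clock); the helper name `latency_ns` is hypothetical.

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Convert an instruction latency in cycles to nanoseconds.

    One cycle at f GHz lasts 1/f ns, so latency = cycles / f.
    """
    return cycles / clock_ghz


if __name__ == "__main__":
    # (label, cycles, clock in GHz), all taken from the comment.
    cases = [
        ("Skylake, 4 cycles @ 4.3 GHz", 4, 4.3),
        ("A76, 2 cycles @ 3 GHz (target clock)", 2, 3.0),
        ("Skylake, hypothetical 3 cycles @ 4.3 GHz", 3, 4.3),
    ]
    for label, cycles, ghz in cases:
        print(f"{label}: {latency_ns(cycles, ghz):.2f} ns")
```

Rounded to two decimals, the A76 figure comes out at 0.67 ns rather than the 0.66 ns quoted in the comment, which appears to have been truncated.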
