Performance Targets: 20-35% Better IPC

The Cortex-A77 saw some interesting microarchitectural changes that promise to increase performance. The question that remains is where exactly the targeted performance gains will end up.

In terms of published performance improvements, Arm opted to stay with SPEC2006, SPEC2017, GeekBench4 and LMBench memory bandwidth. Our focus here will be on SPEC2006, as it’s still the most relevant benchmark of the set for mobile.

On SPECint2006, the A77 promises around a 23% IPC increase, while on SPECfp2006 Arm claims a more staggering 35% boost. The 23% increase for integer workloads was more or less in line with what we expected of the CPU core; however, the 30-35% increase for FP workloads, I must admit, came as quite a surprise, particularly since we haven’t seen any significant changes to the FP execution units of the core. One possible explanation is that SPEC’s FP test suite is more memory intensive than the integer suite, so the Cortex-A77’s various microarchitectural improvements would be more visible in these workloads.

Last year I made performance and efficiency projections for the A76 at two frequency points, and they turned out to be quite close to where the Kirin 980 and Snapdragon 855 eventually landed. For the Cortex-A77, things should be a lot more straightforward to project, as we won’t see major process node changes in the next generation of 7nm SoCs.

Baselining on the current results of the Kirin 980, I simply extrapolated performance based on the published IPC increases for a theoretical 2.6GHz Cortex-A77 SoC. It’s to be noted that although Arm this year again talks about 3GHz target frequencies for the A77, I’m not expecting vendors to quite reach this frequency in upcoming SoCs, thus the 2.6GHz projection.
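As a rough illustration of that extrapolation, the sketch below scales a baseline score by the published IPC uplift and by the ratio of clock frequencies. The function name, the normalized baseline of 100, and the 2.6GHz baseline clock for the Kirin 980's fastest A76 cores are assumptions made for the example; they are not figures from Arm's slides, and the real projection may have been done differently.

```python
def project_score(baseline_score, ipc_gain, baseline_ghz, target_ghz):
    """Scale a baseline score by an IPC gain and a clock-frequency ratio.

    Assumes performance scales linearly with both IPC and frequency,
    which ignores memory-bound effects at higher clocks.
    """
    return baseline_score * (1.0 + ipc_gain) * (target_ghz / baseline_ghz)


# Hypothetical baseline: the A76 result normalized to 100 at an assumed 2.6GHz.
BASELINE_SCORE = 100.0
BASELINE_GHZ = 2.6
TARGET_GHZ = 2.6  # the theoretical A77 clock used in the projection

# Published IPC uplifts: about 23% for SPECint2006, up to 35% for SPECfp2006.
int_projection = project_score(BASELINE_SCORE, 0.23, BASELINE_GHZ, TARGET_GHZ)
fp_projection = project_score(BASELINE_SCORE, 0.35, BASELINE_GHZ, TARGET_GHZ)

print(f"Projected SPECint2006 (relative): {int_projection:.1f}")
print(f"Projected SPECfp2006 (relative):  {fp_projection:.1f}")
```

Because the baseline and target clocks are equal here, the projected gain reduces to the IPC uplift alone; passing a higher `target_ghz`, such as Arm's 3GHz target, would scale the result further.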

In terms of performance, the integer suite would see some solid improvement, however the floating-point results are a lot more interesting. If correct, the A77 would exceed the FP performance of Apple’s A11 and make for quite a big generational push even though we’re not expecting big process node improvements. It’s to be noted though that the A77 will have to compete with Apple’s A13 later this year as well as next-gen M5 cores from Samsung.

Arm promises that the energy efficiency of the A77 will remain the same as that of current-gen A76 SoCs. Thus at peak performance, both CPU cores would use the same amount of energy to complete a given workload. The increased performance of the A77 would however have one drawback: increased power usage, rising linearly with the performance figures. This increased power usage would seemingly reach levels where running more than two cores at peak frequency becomes problematic in a mobile SoC. Luckily, most vendors have moved on from four full-speed big cores to 2+2 or 3+1 designs where there are only one or two high-power big cores.
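The relationship between equal energy and higher power can be made explicit with a short sketch. Average power is energy divided by time, so if a faster core finishes the same workload with the same energy in less time, its power rises by the same factor as its performance. The function name and the `energy_ratio` parameter are hypothetical and used only for illustration.

```python
def relative_power(perf_gain, energy_ratio=1.0):
    """Return average power relative to the baseline core.

    power = energy / time. A core that is (1 + perf_gain) times faster
    finishes the workload in 1 / (1 + perf_gain) of the time, so for a
    given energy_ratio (1.0 means equal energy per workload) the power
    ratio is energy_ratio * (1 + perf_gain).
    """
    relative_time = 1.0 / (1.0 + perf_gain)
    return energy_ratio / relative_time


# Equal energy per workload, using the published IPC uplifts at equal clocks.
for name, gain in (("SPECint2006", 0.23), ("SPECfp2006", 0.35)):
    print(f"{name}: {relative_power(gain):.2f}x the baseline power")
```

Under these assumptions the power increase tracks the performance increase one for one, which is why peak multi-core operation becomes the limiting case in a thermally constrained phone.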

It’s to be noted that although we’re talking about big cores here, the A77 is said to be only 17% bigger than the A76, which is still significantly smaller than the next best microarchitecture from the competition.

End Remarks

Overall, today’s Cortex-A77 announcement isn’t quite as big a change as what we saw last year with the A76, nor is it as big a change as today’s announcement of Arm’s new Valhall GPU architecture and G77 GPU IP.

However, what Arm managed to achieve with the A77 is a continued execution of their roadmap, which is extremely important in the competitive landscape. The A76 delivered on all of Arm’s promises and ended up being an extremely performant core, all while remaining astonishingly efficient and holding a clear density lead over the competition. Arm’s major clients are still heavily focused on having the best PPA in their products, and Arm delivers on that front.

The one big surprise about the A77 is that its floating point performance boost of 30-35% is quite a lot higher than I had expected of the core. In the mobile space, web-browsing is the killer-app that happens to be floating point heavy, so I’m looking forward to seeing how future SoCs with the A77 will perform.

But even in the integer workloads, a 20-25% IPC gain is an absolutely marvellous improvement, and we do trust Arm to be able to maintain the energy efficiency of the A76. Power will go up slightly, but I think the industry has shown that mobile devices today can handle at least two higher-power cores properly, so future SoCs should continue with big+middle+little CPU configurations.

Upcoming A77 SoCs from vendors are expected to still be on 7nm. Qualcomm and HiSilicon are the two obvious lead customers that would adopt the core, and I’m expecting similar timeframes to last generation’s chipsets. For now, Arm is delivering on their promised 20-25% yearly CAGR, and we believe this will continue for the next few generations.


  • Ryan Smith - Tuesday, May 28, 2019 - link

    Beg your pardon?

    And that's just in the last 36 hours.
  • Raqia - Tuesday, May 28, 2019 - link

    Another interesting development in the big AX CPUs is that they've moved from a more complex cache hierarchy in the A10 to a 2 level hierarchy with a much bigger L2 since the A11 that had better bandwidth and latency; L1's were also further boosted in size and bandwidth in the A12. This likely accounts for the continuation of growth in single threaded benchmark scores but seems to indicate that the CPU complex is oriented toward client type workloads.

    ARM has gone full steam ahead with more multi-processing oriented cache designs with some SoCs sporting a further layer of L4 cache and server designs sporting sophisticated un-cores. Their ambitions seem rather different than Apple's and this year's A77's will likely be implemented into servers designs sometime soon.

    Apple's 3-wide OoOE little cores continue to be even more impressive than their big cores, and hold their own against the A73 in performance with much higher efficiency. One wonders if the 2-wide A73 or even the A75 could be tweaked and underclocked to be the "little" in future designs. It certainly fits the bill in terms of die area.
  • peevee - Tuesday, May 28, 2019 - link

    "The results is that the Kirin 980 as well as the Snapdragon 855 both represented major jumps over their predecessors. Qualcomm has proclaimed a 45% leap in CPU performance compared to the previous generation Snapdragon 855 with Cortex-A76 cores, the biggest generational leap ever."

  • peevee - Tuesday, May 28, 2019 - link

    "In the A77’s case the structure is 1.5K entries big, which if one would assume macro-ops having a similar 32-bit density as Arm instructions, would equate to about 48KB."

    You mean Kb, right? And of course this assumption is nonsense.
  • peevee - Tuesday, May 28, 2019 - link

    "web-browsing is the killer-app that happens to be floating point heavy"

    Why? Because ECMAScript has just one number type?
    I suspect WebAssembly would eliminate this problem.
  • ballsystemlord - Tuesday, May 28, 2019 - link

    Spelling and grammar corrections:

    "Having less capacity would take reduce the hit-rate more significantly, while going for a larger cache would have diminishing returns."
    Extra word "take":
    "Having less capacity would reduce the hit-rate more significantly, while going for a larger cache would have diminishing returns."

    "...and again this imbalance with a more "fat" front-end bandwidth allows the core to hide to quickly hide branch bubbles and pipeline flushes."
    More extra words "to hide":
    "...and again this imbalance with a more "fat" front-end bandwidth allows the core to quickly hide branch bubbles and pipeline flushes."
  • sireangelus - Tuesday, May 28, 2019 - link

    Is there any news or rumors regarding the successor of the Cortex-A55? Not even just work on reducing power consumption?
  • tuxRoller - Tuesday, May 28, 2019 - link

    "The combination of the brand-new microarchitecture alongside the major improvements that the 7nm TSMC process node has brought some of the biggest performance and efficiency jumps we’ve ever seen in the industry."

    Or, to paraphrase many a cynical AT commenter: same old incremental improvement, nothing exciting... where's my mr fusion?!!!
  • AshlayW - Tuesday, May 28, 2019 - link

    Can someone tell me how this stacks up to a high-performance X86 core, like Zen or Skylake please? If ARM is so powerful and efficient why are they not developing Desktop CPUs? Is it just because the software ecosystem is dominated by proprietary X86?
  • Wilco1 - Wednesday, May 29, 2019 - link

    The IPC is higher than the latest x86 cores. There are Arm server CPUs which are competitive with Skylake and beat it on HPC applications in super computers. Currently you can buy desktops based on ThunderX2 and Ampere, see .
