Final Thoughts

ARM has certainly been busy, refreshing several key technologies for the next generation of SoCs. DynamIQ might not be as flashy as a new CPU, but as a replacement for big.LITTLE it’s every bit as important. It will be interesting to see how ARM’s partners utilize its flexibility. Will we continue to see the same 4+4 combination of big and little cores at the high end and 8 little cores in the low end to midrange? Or will we see new 7+1 or 3+1 combinations with a single A75 surrounded by A55s? Currently only the A75/A55 are compatible with DynamIQ, and the new CPUs cannot be mixed with older cores using big.LITTLE. This means we will not see the A35 used in mobile outside of MediaTek’s Helio X30.

DynamIQ is an upgrade to big.LITTLE in other ways too. Placing both the big and little cores inside the same cluster brings several benefits: making the L2 caches local to each CPU and adding an optional L3 cache improves overall memory performance, thread migration latency is reduced, and CPUs can be powered up/down more quickly, which could lead to better battery life.

The A55’s extra performance is a welcome change. This should yield tangible improvements to the user experience in mobile applications, certainly for devices that use A55 cores exclusively. Even devices with A75 cores should still see some benefit considering how threads spend most of their time running on the little cores.

ARM already pushed throughput through the A53’s 2-wide in-order core about as far as it could. Given the power and area targets for A53/A55, going wider or out of order is not possible at this stage. Instead, ARM focused on improving the memory system, reducing latency and improving utilization of the in-order core by keeping it fed with data. The increased performance comes with a small bump in power, but overall efficiency is better.

For the A75, the move to 3-wide decode, improvements throughout the cache hierarchy, and tweaks to improve its out-of-order capability should yield clear performance gains over the A73 in both integer and floating-point workloads. At the same frequency, the A72 actually performs better than the A73 in some situations. I expect this will not be the case with the A75.

According to ARM’s numbers, the A75’s performance gains help it maintain the same efficiency as the A73, but power consumption is higher, which concerns me a little. ARM has an implementation team optimizing its reference design, so its power numbers are effectively a target for SoC vendors. Because of pressure to reduce time to market, vendors do not always have the same amount of time to optimize their designs, resulting in higher power consumption and lower efficiency. Hopefully, vendors will put in the effort to match or get close to ARM’s numbers.
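To make the efficiency concern concrete, here is a minimal sketch that treats efficiency as performance divided by power, so equal efficiency means power rises in step with performance. The 20% performance gain and the 10% efficiency shortfall are hypothetical placeholders chosen for illustration, not ARM's figures, and `relative_power` is an illustrative helper name.

```python
def relative_power(perf_ratio: float, efficiency_ratio: float) -> float:
    """Return new power relative to old, given efficiency = performance / power.

    perf_ratio:       new performance / old performance
    efficiency_ratio: new efficiency / old efficiency
    """
    if perf_ratio <= 0 or efficiency_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / efficiency_ratio


# Hypothetical inputs for illustration only (not ARM's published numbers).
same_efficiency = relative_power(perf_ratio=1.20, efficiency_ratio=1.00)
missed_target = relative_power(perf_ratio=1.20, efficiency_ratio=0.90)

print(f"Equal efficiency: {same_efficiency:.2f}x power")
print(f"Efficiency 10% below target: {missed_target:.2f}x power")
```

Under these assumed numbers, a vendor that matches the reference efficiency pays for the extra performance one-for-one in power, while a vendor that falls short of it pays more, which is why the gap between ARM's numbers and shipping silicon matters.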

ARM’s primary goal for A72 was reducing power, for A73 it was improving power efficiency, and for A75 it’s improving performance. What will be the goal for the next core, which will be coming from ARM’s Austin team that produced the A72? Will it look similar to A75, or will there be a significant shift in philosophy like we saw with A72 to A73? There is communication and cross-pollination of ideas between teams, so there are sure to be some similarities, especially with the execution pipes. The biggest changes should be in the front end, and I would not be surprised to see an extra ALU pipe with the move to 7nm.

If all goes according to plan, we should see the first SoCs using DynamIQ and the A75/A55 in Q1 2018 (maybe Q4 2017) on 10nm.


104 Comments


  • Samus - Monday, May 29, 2017 - link

    Ok, point taken, 4 year gap. Even more unacceptable. But you continue to imply the architecture is hitting a wall because of the inherently in-order nature of RISC, so I ask you: why have Samsung and Apple continued to have great success deviating from ARM's reference designs, while Qualcomm has been married to them and paying the performance price (specifically looking at you, 808)?
  • Death666Angel - Monday, May 29, 2017 - link

    "die-constrained design" He gave the answer right there, at least as far as Apple is concerned. ARM is imposing a certain, small, die size for the A53/55 cores, that limits their potential performance a lot. Apple doesn't care about die size since their margins are high enough. The Apple A5X was as large as the Intel Ivy Bridge 4C die. Both were around in 2012, though Intel had the process node advantage. Still insane numbers. And Samsung is also very vertically integrated, they don't have to worry as much about die size as Qualcomm since they own the manufacturing as well. And QC doesn't have to play the performance game, since their choke hold on the modem technology allows them to still have plenty of design wins for the moment.
  • Wilco1 - Monday, May 29, 2017 - link

    It's a 3 year gap. And Samsung still uses Cortex-A53 alongside Mongoose.
  • tipoo - Monday, May 29, 2017 - link

    Not RISC constrained to in-order, nothing to do with RISC, rather that in-order is a design choice for this particular model for power and die size.

    A75, Hurricane, etc are of course massively out of order ARM/RISC designs.
  • name99 - Wednesday, May 31, 2017 - link

    Zephyr is NOT a massive OoO design. Probably 2-wide in order. We don't know its performance, but it certainly saves power (compared to A9) while not seeming to slow down the phone.

    ARM seems to hurt itself by an insistence on these TINY designs. (Just like Intel on the other side hurts itself by an insistence on designs that are first server targeted). Apple wins partially by not trying to be everything to everyone...
  • helvete - Wednesday, August 30, 2017 - link

    Ending up being nothing for nobody? /s
  • aryonoco - Monday, May 29, 2017 - link

    There is nothing inherently in-order about RISC. Various ARM designs such as Cortex A57, A72, A73 and A75 are out of order.

    Samsung has not really deviated from ARM's reference design by much, they still use the Cortex A53 in various SoCs including their latest and greatest Exynos 8895. And that 8895 is also falling behind Snapdragon 835, which is a standard A73 implementation for all intents and purposes (Qualcomm's marketing notwithstanding).

    Apple, well, they can afford to dedicate the die area to a big core. No one else can.
  • ZeDestructor - Tuesday, May 30, 2017 - link

    Everyone can go for the giant-die approach, but thanks to how marketing works, people will buy 8 A53s well before they even consider 4 A53 + 1 Cyclone-sized core.
  • name99 - Wednesday, May 31, 2017 - link

    Ding ding ding. We have a winner
  • Wilco1 - Monday, May 29, 2017 - link

    It was late 2014, and the first designs should appear late this year (just like Cortex-A73 was announced last year and appeared the same year). That is a 3 year gap, not 4 years.

    Note Cortex-A53 scaled quite well, from 1.3GHz in Exynos 5433 to 1.8GHz in Kirin (and goes up to 2.5GHz in Helio P25). The big cores scaled via yearly new micro-architectures, so the big/little performance ratio has remained similar.

    > Fact is, there is only so much you can do with an in-order die-constrained design.

    And more importantly not only keeping but actually improving power efficiency. It could go much faster if it were allowed a similar power budget as the big cores.
