Final Thoughts

ARM has certainly been busy, refreshing several key technologies for the next generation of SoCs. DynamIQ might not be as flashy as a new CPU, but as a replacement for big.LITTLE it’s every bit as important. It will be interesting to see how ARM’s partners utilize its flexibility. Will we continue to see the same 4+4 combination of big and little cores at the high end and 8 little cores in the low end to midrange? Or will we see new 7+1 or 3+1 combinations with a single A75 surrounded by A55s? Currently only the A75/A55 are compatible with DynamIQ, and the new CPUs cannot be mixed with older cores using big.LITTLE. This means we will not see the A35 used in mobile outside of MediaTek’s Helio X30.

DynamIQ is an upgrade to big.LITTLE in other ways too. Placing both the big and little cores inside the same cluster brings several benefits: making the L2 caches local to each CPU and adding an optional L3 cache improves overall memory performance, thread migration latency is reduced, and CPUs can be powered up/down more quickly, which could lead to better battery life.

The A55’s extra performance is a welcome change. This should yield tangible improvements to the user experience in mobile applications, certainly for devices that use A55 cores exclusively. Even devices with A75 cores should still see some benefit considering how threads spend most of their time running on the little cores.

ARM already pushed throughput through the A53’s 2-wide in-order core about as far as it could. Given the power and area targets for the A53/A55, going wider or out of order is not possible at this stage. Instead, ARM focused on improving the memory system, reducing latency and improving utilization of the in-order core by keeping it fed with data. The increased performance comes with a small bump in power, but overall efficiency is better.

For the A75, the move to 3-wide decode, improvements throughout the cache hierarchy, and tweaks to improve its out-of-order capability should yield clear performance gains over the A73 in both integer and floating-point workloads. At the same frequency, the A72 actually performs better than the A73 in some situations. I expect this will not be the case with the A75.

According to ARM’s numbers, the A75’s performance gains help it maintain the same efficiency as the A73, but power consumption is higher, which concerns me a little. ARM has an implementation team optimizing its reference design, so its power numbers are more of a target for SoC vendors than a guarantee. Because of pressure to reduce time to market, vendors do not always have the same amount of time to optimize their designs, resulting in higher power consumption and lower efficiency. Hopefully, vendors will put in the effort to match or get close to ARM’s numbers.

ARM’s primary goal for the A72 was reducing power, for the A73 it was improving power efficiency, and for the A75 it's improving performance. What will be the goal for the next core, which will be coming from ARM’s Austin team that produced the A72? Will it look similar to the A75, or will there be a significant shift in philosophy like we saw from the A72 to the A73? There is communication and cross-pollination of ideas between teams, so there are sure to be some similarities, especially with the execution pipes. The biggest changes should be in the front end, and I would not be surprised to see an extra ALU pipe with the move to 7nm.

If all goes according to plan, we should see the first SoCs using DynamIQ and the A75/A55 in Q1 2018 (maybe Q4 2017) on 10nm.

104 Comments

  • alpha64 - Monday, May 29, 2017

    Or, perhaps, is this talking about Max and Min configurations of the DSU itself, and not the core (which is not clear in the sentence either)?
  • jjj - Monday, May 29, 2017

    Got to be DSU max and min but can't be with the L3$ as 4MB L3$ is huge.
  • hMunster - Monday, May 29, 2017

    In what mobile usage scenario is having 7 small cores and 1 big one an advantage over having just 3 small cores and 1 big one?
  • Meteor2 - Monday, May 29, 2017

    Are you trolling...?

    Actual answer: Android is always doing loads of things at once which don't need doing particularly quickly (i.e., aren't at the direct behest of the user), but do need doing efficiently. The throughput needed is more than three small cores can provide.
  • aryonoco - Monday, May 29, 2017

    This website has itself proven that Android does use 8 cores and these extra cores do bring improvements in overall experience.

    Personally, I'm hopeful that SoC vendors actually do follow ARM here and kill off 8 little cores in favour of a 1+7 design. Would translate into a huge single threaded improvement for end users.
  • Eden-K121D - Monday, May 29, 2017

    Better to have 2 + 6 or even 2+4
  • phoenix_rizzen - Monday, May 29, 2017

    Yeah, a 2+6 A75/55 arrangement would be neat to see. 4+4 seems like overkill in a phone, but could be useful in a high-res tablet or Chromebook.

    Wonder what the TDP would be for an 8-core cluster of just A75s. :) Chromebook or laptop?
  • aryonoco - Monday, May 29, 2017

    Well, ARM says the TDP of A75 is from 750mW per core to 2W per core based on clock speed. Obviously an SoC has other components as well, and cache and the interconnects use power as well, not to mention the GPU. So depending on how you clock it and how much cache it has and what GPU it has, I'd say that an 8-core A75 SoC on a 10nm process would have a TDP of somewhere between 12W and 25W.

    And as much as I'd like to see such an 8-core A75 SoC, this is pure fantasy. The volume of Chromebooks is too low to design bespoke SoCs for them; they'll just use whatever is designed for phones.
  • aryonoco - Monday, May 29, 2017

    Sure, I would prefer a 2+4 to a 1+7 configuration as well. I've always thought that 2+4 is the sweet spot for big.LITTLE.

    But 2+4 is going to be a lot bigger than 1+7. These LITTLE cores are extremely tiny; the A53 is about 1mm² on a 16nm fab process. Which is why the mid-range of the market has gone for 8-core A53s in the last couple of years.

    2+4 is a lot bigger so it's not even a consideration for this end of the market. 2+4 could have been done with big.LITTLE but the fact that there were so few 2+4 SoCs tells you that the market just didn't think it was worth it. 1+7 on the other hand was not possible with big.LITTLE, but is possible with DynamIQ, and according to ARM is only a little bigger than 8 small cores, so I'm hoping that the market sees value in that for the midrange.
  • 0iron - Monday, May 29, 2017

    But die size will be bigger. The reason ARM introduced 1+7 is that its die size is 13% bigger than 8xA53, while 8xA55 is 10% bigger. So it's only about 3% more area than 8xA55 but provides 2x the single-core performance.
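
The area comparison in the last comment can be checked with a quick calculation. The following is a minimal Python sketch that takes the commenter's percentages at face value (they are not confirmed figures) and uses hypothetical constant names. It shows that the "3%" is a difference in percentage points against the 8xA53 baseline; measured against 8xA55 itself, the increase works out slightly lower.

```python
# Relative die areas as quoted in the comment, normalized to 8x A53 = 1.00.
# Assumption: these are the commenter's numbers, not verified against ARM's data.
AREA_8X_A53 = 1.00
AREA_8X_A55 = 1.10  # "10% more" than 8x A53
AREA_1P7 = 1.13     # 1x A75 + 7x A55, "13% bigger" than 8x A53


def relative_increase(new: float, old: float) -> float:
    """Return the fractional area increase of `new` over `old`."""
    return new / old - 1.0


# Increase of the 1+7 layout measured against 8x A55 rather than 8x A53.
delta_vs_a55 = relative_increase(AREA_1P7, AREA_8X_A55)
print(f"1+7 vs 8x A55: {delta_vs_a55:.1%} more area")
```

Under these assumed figures the 1+7 layout is roughly 2.7% larger than 8xA55, which rounds to the commenter's 3%.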
