Final Thoughts

ARM has certainly been busy, refreshing several key technologies for the next generation of SoCs. DynamIQ might not be as flashy as a new CPU, but as a replacement for big.LITTLE it’s every bit as important. It will be interesting to see how ARM’s partners utilize its flexibility. Will we continue to see the same 4+4 combination of big and little cores at the high end and 8 little cores in the low end to midrange? Or will we see new 7+1 or 3+1 combinations with a single A75 surrounded by A55s? Currently only the A75/A55 are compatible with DynamIQ, and the new CPUs cannot be mixed with older cores using big.LITTLE. This means we will not see the A35 used in mobile outside of MediaTek’s Helio X30.

DynamIQ is an upgrade to big.LITTLE in other ways too. Placing both the big and little cores inside the same cluster brings several benefits: making the L2 caches local to each CPU and adding an optional L3 cache improves overall memory performance, thread migration latency is reduced, and CPUs can be powered up and down more quickly, which could lead to better battery life.

The A55’s extra performance is a welcome change. This should yield tangible improvements to the user experience in mobile applications, certainly for devices that use A55 cores exclusively. Even devices with A75 cores should still see some benefit considering how threads spend most of their time running on the little cores.

ARM already pushed throughput through the A53’s 2-wide in-order core about as far as it could. Given the power and area targets for the A53/A55, going wider or out of order is not possible at this stage. Instead, ARM focused on improving the memory system, reducing latency and improving utilization of the in-order core by keeping it fed with data. The increased performance comes with a small bump in power, but overall efficiency is better.

For the A75, the move to 3-wide decode, improvements throughout the cache hierarchy, and tweaks to improve its out-of-order capability should yield clear performance gains over the A73 in both integer and floating-point workloads. At the same frequency, the A72 actually performs better than the A73 in some situations. I expect this will not be the case with the A75.

According to ARM’s numbers, the A75’s performance gains help it maintain the same efficiency as the A73, but power consumption is higher, which concerns me a little. ARM has an implementation team optimizing its reference design, so its power numbers are effectively a target for SoC vendors. Because of pressure to reduce time to market, vendors do not always have the same amount of time to optimize their designs, resulting in higher power consumption and lower efficiency. Hopefully, vendors will put in the effort to match or get close to ARM’s numbers.

ARM’s primary goal for the A72 was reducing power, for the A73 it was improving power efficiency, and for the A75 it’s improving performance. What will be the goal for the next core, which will be coming from ARM’s Austin team that produced the A72? Will it look similar to the A75, or will there be a significant shift in philosophy like we saw from the A72 to the A73? There is communication and cross-pollination of ideas between teams, so there are sure to be some similarities, especially with the execution pipes. The biggest changes should be in the front end, and I would not be surprised to see an extra ALU pipe with the move to 7nm.

If all goes according to plan, we should see the first SoCs using DynamIQ and the A75/A55 in Q1 2018 (maybe Q4 2017) on 10nm.


104 Comments

  • Krysto - Monday, May 29, 2017 - link

    I don't think these chips will ship by late 2018. ARM typically announces its chips 2 years before they are shipped. To be shipped in early 2018, there would have to already be a Cortex A75 tapeout, which I don't think is the case. In 2019, Samsung likely intends to release Galaxy S10 with 7nm chips, so I'm going to assume it will be the A75.
  • jjj - Monday, May 29, 2017 - link

    This is aimed at 10nm and the cycle that starts in early 2018 or before. So SD845 at MWC 2018 and maybe Huawei does it again and has something this year.
    The slides mention 10nm but not 7nm, the article notes repeatedly late 2017-early 2018.

    A73 was announced a year ago and Huawei had Kirin 960 last year, Qualcomm in first half of 2017.
    This is an unveiling for the public not ARM's partners.
    Also do remember that ARM has a new big core every year now.

    As for Samsung, they'll likely stick with their own core next year, and it remains to be seen what ARM has for 7nm.
    It appears that the Austin team got an extra year to work on the next core and that could be a hint that the core aimed at 7nm is an entirely new design.
  • aryonoco - Monday, May 29, 2017 - link

    You have obviously not read this article then.

    These IPs will be seen in SoCs in late this year/early next year.
  • nandnandnand - Monday, May 29, 2017 - link

    15 W TDP you say... maybe 8x 2 Watt A75s crammed into one laptop?
  • jjj - Monday, May 29, 2017 - link

    Do note that the numbers ARM quotes are for just the core, no cache, interconnect, IO, GPU.
    A SoC with 4x2W would use quite a bit more power than just 8W.
  • Krysto - Monday, May 29, 2017 - link

    Sadly, I think DynamIQ doesn't mean that chip makers will use a single 8-core cluster, but that they will use both DynamIQ and big.LITTLE in configurations like 2+8 or 4+8, or even 8+8, mainly for marketing reasons. So the performance flexibility won't change much.
  • phoenix_rizzen - Monday, May 29, 2017 - link

    DynamIQ and big.LITTLE are not compatible, you can't mix and match. You either use older cores (A72/73 + A53) with big.LITTLE, or you use newer cores (A75+A55) with DynamIQ.

    DynamIQ, IIUC, allows for multiple clusters, so you could get 8+8, 4+8, 2+8 and similar configurations. I doubt anyone would do that in a smartphone; but the Chinese OEMs seem obsessed with core counts, so they may do something weird (like the Helio tri-cluster setup).
  • twotwotwo - Monday, May 29, 2017 - link

    What is branch prediction used for in the *in-order* A55? Is it just to try to prefetch the right instructions into L1I? Or can you do some speculative stuff (e.g. decode the expected next instruction) and still be called in-order?
  • Wilco1 - Monday, May 29, 2017 - link

    In-order doesn't imply no speculation. Instructions after a branch start executing speculatively but cannot complete until the branch direction is determined.

    Branch prediction is as important for an in-order core as it is for an OoO core. Without branch prediction every branch would take 8+ cycles rather than < 0.1 cycles on average.
  • alpha64 - Monday, May 29, 2017 - link

    I am a bit confused by this statement on the second page:

    "Together with L3 cache and other control logic, the DSU is about the same area as an A55 core in its max configuration or half the area of an A55 in its min configuration."

    Is this backwards (half A55 in max, or ~A55 in min), or is the second A55 supposed to be an A75?
