Final Thoughts

ARM has certainly been busy, refreshing several key technologies for the next generation of SoCs. DynamIQ might not be as flashy as a new CPU, but as a replacement for big.LITTLE it’s every bit as important. It will be interesting to see how ARM’s partners utilize its flexibility. Will we continue to see the same 4+4 combination of big and little cores at the high end and 8 little cores in the low end to midrange? Or will we see new 7+1 or 3+1 combinations with a single A75 surrounded by A55s? Currently only the A75/A55 are compatible with DynamIQ, and the new CPUs cannot be mixed with older cores using big.LITTLE. This means we will not see the A35 used in mobile outside of MediaTek’s Helio X30.

DynamIQ is an upgrade to bL in other ways too. Placing both the big and little cores inside the same cluster brings several benefits: making the L2 caches local to each CPU and adding an optional L3 cache improves overall memory performance, thread migration latency is reduced, and CPUs can be powered up/down more quickly, which could lead to better battery life.

The A55’s extra performance is a welcome change. This should yield tangible improvements to the user experience in mobile applications, certainly for devices that use A55 cores exclusively. Even devices with A75 cores should still see some benefit considering how threads spend most of their time running on the little cores.

ARM already pushed throughput through the A53’s 2-wide in-order core about as far as it could. Given the power and area targets for A53/A55, going wider or out of order are not possible at this stage. Instead, ARM focused on improving the memory system, reducing latency and improving utilization of the in-order core by keeping it fed with data. The increased performance comes with a small bump in power, but overall efficiency is better.

For the A75, the move to 3-wide decode, improvements throughout the cache hierarchy, and tweaks to improve its out-of-order capability should yield clear performance gains over the A73 in both integer and floating-point workloads. At the same frequency, the A72 actually performs better than A73 in some situations. I expect this will not be the case with A75.

According to ARM’s numbers, the A75’s performance gains help it maintain the same efficiency as the A73, but power consumption is higher, which concerns me a little. ARM has an implementation team optimizing its reference design, so its power numbers are sort of a target for SoC vendors. Because of pressure to reduce time to market, vendors do not always have the same amount of time to optimize their designs, resulting in higher power consumption and lower efficiency. Hopefully, vendors put in the effort to match or get close to ARM’s numbers.

ARM’s primary goal for A72 was reducing power, for A73 it was improving power efficiency, and for A75 it's improving performance. What will be the goal for the next core, which will be coming from ARM’s Austin team that produced the A72? Will it look similar to A75, or will there be a significant shift in philosophy like we saw with A72 to A73? There is communication and cross pollination of ideas between teams so there's sure to be some similarities, especially with the execution pipes. The biggest changes should be in the front end, and I would not be surprised to see an extra ALU pipe with the move to 7nm.

If all goes according to plan, we should see the first SoCs using DynamIQ and the A75/A55 in Q1 2018 (maybe Q4 2017) on 10nm.

Cortex-A55 Microarchitecture
Comments Locked

104 Comments

View All Comments

  • lilmoe - Monday, May 29, 2017 - link

    All of ARM's hope of having any significant, mainstream, server market share was demolished with the announcement of Naples.
  • Wilco1 - Monday, May 29, 2017 - link

    Check https://community.arm.com/processors/b/blog/posts/... - at the end there is a whole section on how much faster Cortex-A75 is compared to Cortex-A72 on SPECrate in big server designs. And Cortex-A55 also has all the RAS features required for servers.
  • lilmoe - Monday, May 29, 2017 - link

    It's about damn time they're addressing the little cores. We might not get anything better until they go fully OoO on both clusters with 7nm and beyond.
  • Meteor2 - Monday, May 29, 2017 - link

    In-order is more efficient, and that's important for 'workhorse' cores, whatever the node. Why waste energy when you don't need to? Like A53, A55 is about doing enough work quick enough, not doing as much as possible as fast as possible. (I think A35 was too weedy for life outside of IoT.)

    Plus I suspect that the ARM ISA is suited to in-order.
  • lilmoe - Monday, May 29, 2017 - link

    Energy consumption goes down with each process node advancement. The A53 is spending most of its compute time well over 1.2ghz on flagship SoCs. At 1.5ghz and beyond, it starts wasting more energy than the big cluster at similar performance points. Your point is absolutely correct in an "ideal workload", but the real world is far from that with higher resolution screens and all the other crazy peripherals OEMs employ. We'll have to wait and see the performance energy curve first before determining the sweet spot of the a55 vs where most of the actual workload resides.
  • Meteor2 - Friday, June 2, 2017 - link

    Haven't seen an energy/performance curve on Anandtech for ages :(
  • nandnandnand - Monday, May 29, 2017 - link

    Low tier tech news outlets are saying the A75 has 50% more performance and that the A55 is 2.5x more power efficient:

    Compare: https://venturebeat.com/2017/05/28/arm-wants-to-bo... which acknowledges "in some use cases"
    To: http://www.zdnet.com/article/arm-launches-new-cort... which just says "up to"

    50% figure comes from comparing A73 at 2.4 GHz to A75 at 3.0 GHz: https://www.theregister.co.uk/2017/05/29/arm_cpus_...
  • jjj - Monday, May 29, 2017 - link

    The 2.5x claim was on an ARM slide but it included process shrinks.
    The 50% claim seems to be at 2W per core not 3GHz but even then it clearly isn't in SPEC,maybe in Geekbench it gets close to that gain at 2W.
  • Krysto - Monday, May 29, 2017 - link

    I wouldn't be surprised if we do see some configurations at 7nm like that (3GHz). ARM has always tended to give pretty unrealistic clock speed peak performance for its cores, which no chip maker actually ended up implementing, but I think this time it may be different due to Chromebooks and Windows 10 on ARM, where a 15W TDP or even higher is perfectly fine, as long as you also get good performance/W (compared to Intel/AMD).

    However, chip makers and OEMs will also have to take into account Windows 10 on ARM emulation, so they'll need to keep those chips at least 10% lower TDP/power consumption than the Intel/AMD competitors at those levels of performance.
  • jjj - Monday, May 29, 2017 - link

    A75 is not aimed at 7nm, that's where the next core comes in.

Log in

Don't have an account? Sign up now