Final Thoughts

ARM has certainly been busy, refreshing several key technologies for the next generation of SoCs. DynamIQ might not be as flashy as a new CPU, but as a replacement for big.LITTLE it’s every bit as important. It will be interesting to see how ARM’s partners utilize its flexibility. Will we continue to see the same 4+4 combination of big and little cores at the high end and 8 little cores in the low end to midrange? Or will we see new 7+1 or 3+1 combinations with a single A75 surrounded by A55s? Currently only the A75/A55 are compatible with DynamIQ, and the new CPUs cannot be mixed with older cores using big.LITTLE. This means we will not see the A35 used in mobile outside of MediaTek’s Helio X30.

DynamIQ is an upgrade to big.LITTLE in other ways too. Placing both the big and little cores inside the same cluster brings several benefits: making the L2 caches local to each CPU and adding an optional L3 cache improves overall memory performance, thread migration latency is reduced, and CPUs can be powered up/down more quickly, which could lead to better battery life.

The A55’s extra performance is a welcome change. This should yield tangible improvements to the user experience in mobile applications, certainly for devices that use A55 cores exclusively. Even devices with A75 cores should still see some benefit considering how threads spend most of their time running on the little cores.

ARM already pushed throughput through the A53’s 2-wide in-order core about as far as it could. Given the power and area targets for the A53/A55, going wider or out of order is not possible at this stage. Instead, ARM focused on improving the memory system, reducing latency and improving utilization of the in-order core by keeping it fed with data. The increased performance comes with a small bump in power, but overall efficiency is better.

For the A75, the move to 3-wide decode, improvements throughout the cache hierarchy, and tweaks to improve its out-of-order capability should yield clear performance gains over the A73 in both integer and floating-point workloads. At the same frequency, the A72 actually performs better than the A73 in some situations. I expect this will not be the case with the A75.

According to ARM’s numbers, the A75’s performance gains help it maintain the same efficiency as the A73, but power consumption is higher, which concerns me a little. ARM has an implementation team optimizing its reference design, so its power numbers are sort of a target for SoC vendors. Because of pressure to reduce time to market, vendors do not always have the same amount of time to optimize their designs, resulting in higher power consumption and lower efficiency. Hopefully, vendors put in the effort to match or get close to ARM’s numbers.

ARM’s primary goal for the A72 was reducing power, for the A73 it was improving power efficiency, and for the A75 it’s improving performance. What will be the goal for the next core, which will be coming from ARM’s Austin team that produced the A72? Will it look similar to the A75, or will there be a significant shift in philosophy like we saw from the A72 to the A73? There is communication and cross-pollination of ideas between teams, so there are sure to be some similarities, especially with the execution pipes. The biggest changes should be in the front end, and I would not be surprised to see an extra ALU pipe with the move to 7nm.

If all goes according to plan, we should see the first SoCs using DynamIQ and the A75/A55 in Q1 2018 (maybe Q4 2017) on 10nm.

104 Comments

  • melgross - Wednesday, May 31, 2017 - link

    No, we don't know if they're fake. TSMC stated, months ago, that they were delivering 10nm parts to their largest customers, which one would presume, is Apple.

    And my statement stands. If the best the u35 can do is just over 2,000, then these parts are slightly over twice as fast. And if the claim for the multiprocessing score is right, then that's well over the score for any 4 core ARM chip from anyone else.

    "Traditionally", these scores that leak out, whether real or not, are remarkably close to what's tested after Apple's product does come out, often being somewhat lower than the "real" scores.
  • jjj - Monday, May 29, 2017 - link

    Apple's ST perf is marketing for folks like you, nothing more.
    ST perf is too high for mobile even with A75 and ST perf is not what matters. We would all be better off with less ST and higher efficiency.
    Sadly people like you are pushing the industry into pushing ST for no reason.
    Apple's core is huge compared to ARM's core and for what, ST perf you don't need, lower MT and efficiency.
  • aryonoco - Monday, May 29, 2017 - link

    I'm curious to know what you are basing this on.

    From where I stand, the great majority of time on mobiles is spent either on the web or in games. JavaScript is still very much single-threaded, so higher ST performance directly results in a better web experience.

    Why do you think that "I don't need ST perf"?

    Note: I don't have a single iOS device, though I'd have loved to have an Android device with A10 inside.
  • melgross - Wednesday, May 31, 2017 - link

    Nonsense! If this were a Qualcomm or Samsung chip, you wouldn't be saying that, and we both know that. While I don't know what other chips use, Apple's is about 3 watts, which is likely about what the others are. But Apple manages to get far better performance. That's never a bad thing.

    I don't think you understand what smartphones are being used for.
  • melgross - Monday, May 29, 2017 - link

    While we don't know if the benchmarks that have been listed for the new A11 from Apple are real, though they seem to be what we would expect, individual cores are hitting over 4,500 and slightly under 9,000 multicore, with both cores.

    With everything I've read here, I'm still not sure what we would expect from these parts. The highest performing ARM used on Android seems to be well below 2,000 per core, with almost 7,000 for 4 core multicore.

    So, what's to expect here? And how much of this advantage is coming from the process shrink, rather than from core improvements?
  • tipoo - Monday, May 29, 2017 - link

    Yeah that's what I'm wondering, how much is IPC improvement and how much is just clocking it higher on a new node.
  • Wardrive86 - Monday, May 29, 2017 - link

    Shouldn't that be 2x 128-bit NEON/FPU pipelines for the A75? If not, that's a max of 4 flops per clock and lower than the cores it is replacing.
  • serendip - Monday, May 29, 2017 - link

    I hope chip vendors don't push 8x A55 designs for the midrange because they're only good for the low end. Having so many similar cores is pointless because Android rarely uses all 8 cores.

    I'd rather see more 2+4 or 4+4 designs with the A55 and A75, especially something like the old Snapdragon 650/652 with the latest cores and processes. I'm looking to upgrade my Mi Max a year from now and the relevant chips should appear by then. On the other hand, with constant driver updates, this phone could last for a few years still.
  • Wilco1 - Tuesday, May 30, 2017 - link

    A Cortex-A55 at 2.5GHz (same as Helio P25) would get close to ST performance of Galaxy S6 (and match MT perf). That was top-end 2 years ago... So while I agree 1+7 or 2+6 would be much better than 8x A55, I don't think you could call an S6 a low-end phone even in 2018!
  • serendip - Tuesday, May 30, 2017 - link

    The Helios with their decacore design couldn't beat the real world speed and battery life of a Snapdragon 65x. It's foolish to run an A55 at 2.5 GHz when an A75 at lower speed uses similar amounts of power while being much faster. At one point, you move the load from the donkey and put it on a race horse :)
