Cortex-A520: LITTLE Core with Big Improvements

The third of the Armv9.2 cores is the Cortex-A520, which is little in design, but Arm promises big improvements over previous generations, particularly on power efficiency.

Addressing the biggest question right off the bat: no, Cortex-A520 is not an out-of-order core design. True to Arm's little core design ethos, it's still an in-order core – and in fact, Arm has even removed an ALU in the process.

Arm's smallest core for this generation is a new core in effect, but it is still more of a refinement of the Cortex-A510 than a completely new design. It has the lowest power-to-area ratio of all three announced Cortex Armv9.2 cores. The most significant differences come through optimizations on power, with Arm claiming that the Cortex-A520 is 22% more energy efficient than the previous Cortex-A510 core at iso-process and iso-frequency. The little core in Arm's TCS23 catalog is primarily designed for performing low-intensity and background operational tasks, which takes these loads off bigger cores such as the Cortex-A720/Cortex-X4 to allow better power efficiency overall within the cluster.

Many of Arm's efficiency gains come from small, microarchitectural level changes, mostly around how it implements data prefetch and branch prediction. On the whole, not much has been changed to the little core, but the small changes that have been made have all been about improving efficiency.

One of the non-architectural areas of improvement has been introducing the new QARMA3 Pointer Authentication Code (PAC) algorithm, which Arm claims to reduce PAC's overhead to under 1%. QARMA3 is a cryptographic-based technique designed to ensure the pointers' integrity is correct and accurate. It also provides a secure and efficient way to avoid tampering with the necessary underlying code so that any authorized modifications or tampering if the pointers are eliminated adds a layer of hardware-level security. Arm is not only leveraging QARMA3 PAC to boost security and integrity, but it also allows them to squeeze out additional levels of efficiency, if compared to using PAC with older algorithms.

Much like when Arm announced its armv9 architecture in 2021, the small Cortex-A520 cores can be merged in pairs to share pipelines and improve efficiency. Adopting a pairing of the smaller Cortex-A520 cores can enhance efficiency by combining them through relevant pipelines such as SVE, NEON, and FP. More so in the case of SVE2, which does require a larger area footprint than other executions and makes sense to pair two smaller cores than have just one on its own. However, it is entirely plausible and possible for SoC vendors to use single-core option implementations on their designs if they wish to do so.

Sometimes less is more, and in the case of the Cortex-A520, Arm has removed the third ALU pipeline, which it originally added to the Cortex-A5x DNA with the Cortex-A510. Arm's ideology behind this is it saves power in issue logic and improves forwarding results within the overall complexity of the pipeline. In practice, Arm has figured out how to recover enough of the lost performance through other improvements that they are opting to eat the hit from removing the ALU in order to minimize core size and maximize efficiency.

Ultimately, Arm is looking at a big-picture trade-off as well; reducing the power consumption of the Cortex-A520 frees up energy that can be allocated to the other cores, such as the Cortex-A720 and even the Cortex-X4 where applicable. This makes Armv9.2 IP versatile and scalable, making small savings where it can deliver the savings in other areas where and when it is needed.

Using SPEC2006_int_rate_1copy as its performance metric to judge performance and efficiency, generation on generation (and at iso-process and iso-frequency), Arm is claiming the Cortex-A520 delivers 8% more performance than the Cortex-A510 at similar levels of power consumption. Alternatively, at iso-performance, Cortex-A520 can deliver a significant 22% power savings.

While seemingly small, it can add up in the grand scheme of things, especially across a four-core complex of Cortex-A520 cores. Although there's always a diminishing level of returns in terms of increasing core count when it comes to performance, having lower-powered and more efficient cores typically creates more power for other areas to tap into, such as the big Cortex-X4 core, which requires more grunt to boost those intensive and burst reliant workloads.

Cortex A720: Middle Core, Big on Efficiency New DSU-120: More L3 Cache, Doubling Down on Efficiency
Comments Locked

52 Comments

View All Comments

  • tipoo - Sunday, May 28, 2023 - link

    6 years after iOS went 64 bit only. I'm guessing the cores have also been 64 bit only there for a while?
  • goatfajitas - Monday, May 29, 2023 - link

    IOS is an OS from one company that is made for a few specific products from that one company. You cant evenly compare an open platform to a narrow closed market like that.
  • iAPX - Monday, May 29, 2023 - link

    Yes Apple SoC seems to be 64bit-only for years, that simplify their own design and gives more efficiency.

    As 64bit ARM ISA as nothing in common with 32bit ARM ISA, contrary to the x86 and AMD64, they basically started with a blank page, profiting from experience of various preceding 64bit ISA, I feel it was the right way to go.
  • dotjaz - Monday, May 29, 2023 - link

    No it's not, Apple is not allowed to modify ARM ISA. If it's ARMv8 compliant, it CANNOT possibly be 64bit only.
  • Doug_S - Monday, May 29, 2023 - link

    ARMv8 makes execution of AArch32 optional. Apple may have been responsible for that as they were involved in the spec of ARMv8 and AArch64 - they would have known they'd want to drop 32 bit code as soon as it was practical.
  • dotjaz - Tuesday, May 30, 2023 - link

    That's factually UNTRUE, Aarch32 execution is mandatory in **hardware implementation**, Aarch64 **OS** can choose not to execute Aarch32 codes
  • Doug_S - Tuesday, May 30, 2023 - link

    Sorry but you are wrong, ARMv8 specifically makes support for AArch32 optional for hardware implementations.
  • Jaybird99 - Monday, May 29, 2023 - link

    Apple is a founding partner with an architectural license. They can change anything they wish on the CPU design, then have it fabricated. I thought this was known because of the wildly different core design from Apple. They take the ISA they pick and choose and add/delete what they need. They actually help ARM in the long run as seeing how Apple uses 64bit and finds solutions to their issues, because as stated above 64bit was blank slate for ARM. I'm very fairly certain of this, but if you know something I don't? (I might not..)
  • Doug_S - Monday, May 29, 2023 - link

    An architectural license allows them to implement the ISA, but they can't delete things from it. They are able to add things to it (i.e. TSO, their AMX instructions, etc.) but it still has to pass ARM's conformance tests to show it is capable of running ARM code.

    They were able to "delete" AArch32 because ARMv8 allows that. ARMv9 goes further and makes AArch32 a special license addition or something like that - basically Aarch32 is deprecated with ARMv9 and will probably go away entirely with ARMv10.
  • dotjaz - Tuesday, May 30, 2023 - link

    No, they were not able to "delete" AArch32. They can disallow AArch32 codes execution in their OS just like Google Pixel 7-series, they cannot remove the support from hardware.

    And Apple did not add anything to ARM ISA. AMX is masked as a co-processor only available through frameworks, it doesn't directly execute any code other than a "firmware".

    TSO is not an instruction. It's a **mode**. It pertains to HOW the CPU reorders L/S queue. It has nothing to do with the ISA.

Log in

Don't have an account? Sign up now