Cortex-A520: LITTLE Core with Big Improvements

The third of the Armv9.2 cores is the Cortex-A520. It is a little core by design, but Arm promises big improvements over previous generations, particularly in power efficiency.

Addressing the biggest question right off the bat: no, Cortex-A520 is not an out-of-order core design. True to Arm's little core design ethos, it's still an in-order core – and in fact, Arm has even removed an ALU in the process.

Arm's smallest core for this generation is nominally new, but it is more a refinement of the Cortex-A510 than a completely new design. It has the lowest power-to-area ratio of all three announced Armv9.2 Cortex cores. The most significant differences come from power optimizations, with Arm claiming that the Cortex-A520 is 22% more energy efficient than the previous Cortex-A510 core at iso-process and iso-frequency. The little core in Arm's TCS23 catalog is primarily designed for low-intensity and background tasks, taking those loads off bigger cores such as the Cortex-A720 and Cortex-X4 and improving power efficiency across the cluster.

Many of Arm's efficiency gains come from small, microarchitectural-level changes, mostly in how the core implements data prefetch and branch prediction. On the whole, not much has changed in the little core, but the small changes that have been made are all aimed at improving efficiency.

One of the non-architectural areas of improvement is the introduction of the new QARMA3 Pointer Authentication Code (PAC) algorithm, which Arm claims reduces PAC's overhead to under 1%. QARMA3 is a cryptographic algorithm used to sign pointers so that their integrity can be verified before they are used. Any unauthorized modification of, or tampering with, a signed pointer is detected, which adds a layer of hardware-level security. Arm is not only using QARMA3 PAC to boost security and integrity; compared with running PAC on older algorithms, it also lets the company squeeze out additional efficiency.
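To make the sign-then-verify idea concrete, here is a minimal software sketch of pointer authentication in general. It is not QARMA3 and not how the hardware is implemented: it substitutes a truncated HMAC-SHA256 for the cipher, assumes a 48-bit virtual address with the signature packed into the upper 16 bits, and raises an exception on failure where real hardware would instead produce a pointer that faults when used. The names `sign_pointer`, `authenticate_pointer`, `VA_BITS`, and the `modifier` argument are illustrative only.

```python
import hashlib
import hmac

VA_BITS = 48                     # assumed virtual-address width (illustrative)
PAC_BITS = 64 - VA_BITS          # unused upper bits that can hold the signature
ADDR_MASK = (1 << VA_BITS) - 1


def _pac(addr: int, modifier: int, key: bytes) -> int:
    """Truncated keyed MAC over (address, modifier). Stand-in for QARMA3."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # Keep only as many bits as fit in the unused upper part of the pointer.
    return int.from_bytes(digest[:8], "little") >> (64 - PAC_BITS)


def sign_pointer(ptr: int, modifier: int, key: bytes) -> int:
    """Return the pointer with its signature stored in the upper bits."""
    addr = ptr & ADDR_MASK
    return (_pac(addr, modifier, key) << VA_BITS) | addr


def authenticate_pointer(signed: int, modifier: int, key: bytes) -> int:
    """Recompute the signature; return the plain address only if it matches."""
    addr = signed & ADDR_MASK
    if (signed >> VA_BITS) != _pac(addr, modifier, key):
        raise ValueError("pointer authentication failed")
    return addr
```

A pointer signed with one key and modifier (for example, a return address signed against the current stack pointer) authenticates only under the same key and modifier, so an attacker who overwrites it without knowing the key will almost always be caught. The cost of computing the signature on every sign and authenticate operation is the overhead that a cheaper algorithm is meant to reduce.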

Much like when Arm announced its Armv9 architecture in 2021, the small Cortex-A520 cores can be merged in pairs to share pipelines and improve efficiency. In a paired configuration, two Cortex-A520 cores share pipelines such as SVE, NEON, and FP. This matters most for SVE2, which requires a larger area footprint than the other execution units, so it makes more sense to share it between two small cores than to dedicate it to one. However, SoC vendors remain free to use single-core implementations in their designs if they wish.

Sometimes less is more, and in the case of the Cortex-A520, Arm has removed the third ALU pipeline, which it originally added to the Cortex-A5x DNA with the Cortex-A510. Arm's rationale is that this saves power in the issue logic and in result forwarding, reducing the overall complexity of the pipeline. In practice, Arm has recovered enough of the lost performance through other improvements that it is willing to eat the hit from removing the ALU in order to minimize core size and maximize efficiency.

Ultimately, Arm is looking at a big-picture trade-off as well: reducing the power consumption of the Cortex-A520 frees up energy that can be allocated to the other cores, such as the Cortex-A720 and even the Cortex-X4 where applicable. This makes the Armv9.2 IP versatile and scalable, with small savings in one place freeing up power for other areas where and when it is needed.

Using SPEC2006_int_rate_1copy as its yardstick for generation-on-generation performance and efficiency (at iso-process and iso-frequency), Arm claims the Cortex-A520 delivers 8% more performance than the Cortex-A510 at similar levels of power consumption. Alternatively, at iso-performance, the Cortex-A520 can deliver a significant 22% power saving.

While these gains seem small, they add up in the grand scheme of things, especially across a four-core complex of Cortex-A520 cores. Although adding cores always brings diminishing performance returns, lower-powered and more efficient cores leave more of the power budget for other areas to tap into, such as the big Cortex-X4 core, which needs the headroom for intensive, burst-reliant workloads.
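As a rough back-of-the-envelope check, the 8% and 22% figures Arm quotes can be turned into performance-per-watt ratios and a per-cluster saving. The sketch below uses only those two claimed numbers, normalized to a Cortex-A510 core at 1.0 performance and 1.0 power. The four-core estimate assumes every core sits at the same iso-performance operating point and that the savings scale linearly, which real workloads will not do exactly; the function and variable names are illustrative.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Performance-per-watt of the new core relative to the old one."""
    return perf_ratio / power_ratio


# Claimed operating points, normalized to Cortex-A510 = 1.0 perf at 1.0 power.
iso_power_gain = perf_per_watt_gain(perf_ratio=1.08, power_ratio=1.00)
iso_perf_gain = perf_per_watt_gain(perf_ratio=1.00, power_ratio=0.78)

# Power freed by a four-core complex at iso-performance, expressed in units of
# one Cortex-A510 core's power (assumes linear scaling across cores).
CORES = 4
freed_power = CORES * (1.00 - 0.78)

print(f"perf/W gain at iso-power:       {iso_power_gain:.2f}x")
print(f"perf/W gain at iso-performance: {iso_perf_gain:.2f}x")
print(f"power freed by {CORES} cores:         {freed_power:.2f} A510-core units")
```

Under these assumptions, a 22% power saving at the same performance works out to roughly 1.28x performance per watt, and four such cores free up a little under one Cortex-A510 core's worth of power for the rest of the cluster.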


52 Comments


  • Doug_S - Tuesday, May 30, 2023 - link

    Yes TSO is a mode, which requires a setting IN THE ISA to be able to enable it. That setting does not exist on ARM CPUs, only on Apple Silicon implementations.

    abr2 found what I didn't have time to look for in the ARMv8 architecture reference manual proving your ridiculous claim that ARMv8 required AArch32 support was wrong. Now you're picking on nits trying to twist my words as if I was claiming TSO is an instruction. Give it up you are wrong, everyone knows it, go away quietly instead of making yourself look like even a bigger fool.
  • dotjaz - Tuesday, May 30, 2023 - link

    And your understanding of ARMv9 is abysmal at best. ARMv9-A made Aarch32 EL0 optional, it wasn't possible in ARMv8-A. There is no special license or "something like that".
  • Chelgrian - Tuesday, May 30, 2023 - link

    It has been possible and architecturally permissible since ARMv8.0 to create an AArch64-only implementation. If AArch32 is not supported at a particular exception level then setting the M[4] bit in the SPSR and executing an ERET instruction to that level will produce an illegal exception return exception. Combined with designing the system to only reset into AArch64 at the highest implemented exception level, this gives you an AArch64-only design.

    This is tangentially referred to in rule R-tytwb in section D1.3.4 of revision J.a of the ARM Architecture Reference Manual.

    A conformant ARMv8.x implementation can (but is not mandated to) implement AArch32 at any exception level.

    A conformant ARMv9.x implementation may only implement AArch32 at EL0. This is documented in section 3.1 of revision J.a of the ARM Architecture Reference Manual.

    There are even documented ARMv8.1 processors out there which are AArch64-only, for example the Cavium ThunderX2:

    https://en.wikichip.org/wiki/cavium/thunderx2

    "Only the 64-bit AArch64 execution state is support. No 32-bit AArch32 support."
  • abr2 - Tuesday, May 30, 2023 - link

    From:
    Arm® Architecture Reference Manual
    Armv8, for Armv8-A architecture profile
    [2021 version]

    D1.20.2 Support for Exception levels and Execution states
    Subject to the interprocessing rules defined in Interprocessing on page D1-2525, an implementation of the Arm architecture could support:
    • AArch64 state only.
    • AArch64 and AArch32 states.
    • AArch32 state only.
  • techconc - Thursday, June 8, 2023 - link

    @dotjaz - You don’t know what you’re talking about. The Apple A7 chip supported both A32 and A64 instruction set. By the A11 (in 2017), Apple dropped A32 instruction set and was 64bit only.
  • dotjaz - Tuesday, May 30, 2023 - link

    > I'm very fairly certain of this, but if you know something I don't? (I might not..)

    You are clearly wrong, no ARM licensees can alter ARM ISA in any way. That's the foundation of ARM's licensing terms. And that's the sole reason Apple's AMX extension is masked as undocumented "co-processor" not available to anyone. Even if you knew nothing about the fundamental licensing terms, you should be able to figure that out because of this.
  • name99 - Monday, May 29, 2023 - link

    Jesus. The levels of delusion that are required to write a comment like this.
    You really think that
    (a) ARM is going to make a big deal about Apple being, in some legalistic sense, "non-compliant" AND
    (b) that Apple gives a fsck?

    Exactly who do you think gets hurt if Apple are not allowed to call APPLE SILICON (note that branding...) Arm Compliant?
  • Wereweeb - Tuesday, May 30, 2023 - link

    Lmao apple fanboys still as hilarious and ignorant as always
  • Silver5urfer - Sunday, May 28, 2023 - link

    So much of this nonsensical 64Bit bs. Esp in the name of security, News Flash - Qualcomm EDL mode exists and thankfully it helps the folks to unlock their Bootloaders.

    The whole 64Bit thing killed the passion on Android. Google just enforces it brutally by n-1 where n being the latest API SDK, thus making all the old apps go obsolete. Windows and x86 excels massively just because of this, Apple did it because they always want to control everything which they do, and the stupid Google just copies them in hoping to make same but they killed all fun on android now, the UI is so boring garbage and the whole Filesystem nerfs - Scoped Storage, lack of proper SD Card app support and a ton of other APIs blacklisted. Limited the scope of foreground and background apps utilizing the hardware of a phone.

    What's the use of the ARM processor devices, when your latest and greatest X4 ARM phone will be outdated in 1 year and goes to dumpster after 2-3 years max. Non Removable, non serviceable, no longevity of the OS / HW / Software. Locked like chastity belt for the User tinkering when the core OS, the Kernel runs Linux. A big L to consumers and all that Environment jabber is literally just a worthless cacophony. Literally you have latest V30 class Micro SDs and SD Association even had PCIe / NVMe SSD class but since not a single $1000-$2000 Android phone pushes forward for a real computer in pocket, its rather a spybox and a mere 2FA device with some Navigation, Social Media, Camera attached.

    All this ARM tech is only useful if your device Software API can open it up properly and used a proper pocket computer. But that ship has sailed. All that X4 processing power and multi core non homogeneous compute wasted on basic consumables.
  • rpg1966 - Monday, May 29, 2023 - link

    Could you explain how the UI is affected by the bitness of the OS?
