big.LITTLE: Heterogeneous ARM MP

The Cortex A15 is going to be a significant step forward in performance for ARM architectures. ARM hopes it will be enough to actually begin to threaten the low end of the x86 space, which gives you an idea of just how powerful these cores are going to be. The A15 will also find its way into smartphones and tablets, ultimately replacing the Cortex A9s used by high-end devices today. 

For heavy workloads, the Cortex A15 is expected to be more power efficient than the A9. The core may draw more instantaneous power, but it will do so for a shorter period of time thus allowing the CPU(s) to get to sleep quicker and reducing average power.

As ARM has often argued (particularly against Intel) however, these big out-of-order microprocessor architectures are inefficient at dealing with lightweight mobile workloads. In particular, things like background tasks running on your phone while it’s locked in your pocket simply don’t demand the performance of a Cortex A15. ARM further argues that the power consumed by an A15 running these tasks, even though only for a short period of time, is greater than it would be on a much simpler in-order architecture. This is where the A7 comes into play.

Although the Cortex A7 is fully capable of being used on its own (and it most definitely will be), ARM’s partners are free to integrate Cortex A7 cores alongside Cortex A15 cores in a big.LITTLE (or little.BIG?) configuration. 

Since the A7 and A15 are equally capable of executing the same ARM instruction set, any applications running on one core can just as easily be migrated to run on the other. In the example above there are a pair of A15s and a pair of A7s on a single SoC. In this particular configuration, the OS only believes there are two cores in the machine. ARM’s own power management firmware determines which core cluster to activate depending on performance states requested by the OS. If the OS wants a high performance state, ARM returns the A15 cores at a high p-state. If it wants a low performance state, the chip will put the A15s to sleep and schedule everything on the A7s. Cache coherency is guaranteed via the CCI-400 interconnect, so any data invalidated by one core cluster will be reflected in the other cluster’s cache. ARM claims it can switch between core clusters in this configuration in as quick as 20 microseconds.

If everything works the way ARM has described it, a big.LITTLE configuration should be perfectly transparent to the OS (similar to what NVIDIA is promising with Kal-el). ARM did add that SoC vendors are free to expose all cores to the OS if they would like, although doing so would obviously require OS awareness of the different core types.

Core Configurations, Process Technology & Final Words

ARM’s Cortex A7 will be available in 1 - 4 core configurations, both as the primary CPU in an SoC as well as in a big.LITTLE configuration alongside some A15s. ARM expects that we will see some 40nm A7 designs as early as the end of next year for use in low end smartphones (~$100). Most smartphone configurations, even at these price points will likely use dual-core A7 implementations. It’s only in emerging markets that ARM is expecting to see single core Cortex A7 smartphone devices. This is pretty big news as it means that even value smartphones will be dual-core by 2013.

Costs will keep the A7 on 40nm for a while although the cores will be offered at 28nm for integration into A15 designs as well as for even higher performance/lower power implementations.

I have to say that I’m pretty excited about the Cortex A7 announcement across the board. It looks like this core will not only enable much better performance at the value end of the device spectrum but it should bring battery life improvements at the high end as well. Chip architects have argued for years that we were going to see heterogeneous computing as the next phase in the evolution of microprocessors, it’s fascinating to see that we may get the first consumer application of it in ultra mobile devices.

Architecture
POST A COMMENT

76 Comments

View All Comments

  • Snowstorms - Thursday, October 20, 2011 - link

    these guys are so close to competing for desktop x86 space, they are already down to 28nm and are using the exact same ASML as Intel and AMD Reply
  • Pessimism - Thursday, October 20, 2011 - link

    Does anyone else find their naming conventions infuriating?
    11->7->8->9->15 HUH?
    plus they label their instruction sets with an A and a similar but different number... and the other licensees of ARM have their own gimpy naming conventions with one for the base core and one for the chip and so on and so forth...
    Reply
  • iwod - Thursday, October 20, 2011 - link

    11 is ARM 11 which is different to Cortex Series

    If we inflate number based on timing they A7 would have been named A16.

    A8 is replaced by A9 and will be subsequently by A15.

    A7 is more like an replacement for the ultra low power A5. It just happen to be even more powerful then A8 therefore they used A8 as comparison.
    Reply
  • mczak - Thursday, October 20, 2011 - link

    I don't think A15 is really a replacement of A9.
    Each member of the Cortex-A family seems to have its place, except the A8 (which is the oldest and obsolete).
    A5: very small, low power - though once you include larger L1 caches and NEON it seems A7 would be a better fit as it's hardly smaller anymore.
    A7: small, low power, probably highest perf/power and perf/area of the whole family. Unlike A5 l1 cache sizes are fixed and NEON always included, and with all features of the A15 (including virtualization for instance).
    A8: a dud, unlike all others not MP capable and with nonpipelined FPU. Worst efficiency of the family (by a large margin) in perf/area and perf/power. I don't think there's any reason at all why you'd want to use this in a new design. In some areas it might be faster than A7 (I think NEON might be twice as fast).
    A9: similar size to A8 but quite a bit more advanced (out-of-order) and with higher efficiency.
    A15: quite a beast compared to A9, much more complex and faster, but much bigger - the first to target low-power servers too. Efficiency might be similar to A9, not sure.

    Of course, this completely ignores the timeframe - A8 was the only option for quite some time, and apart from that only A9 has made it to devices yet (I think we should see A5 soon enough - MSM7227A has Cortex-A5 and possibly quite a few low-end smartphones might use it).
    Reply
  • ET - Thursday, October 20, 2011 - link

    I do see the A15 as a replacement for the A9. The high end was A9 and will be A15. The mid range was A8 and will be A7. Both will offer significant performance increases. A9 will survive a little longer (as will the A8, probably), but I don't think it has a real place between the A15 and A7.

    As for the names, ARM Cortex family names reflects core complexity or size, far as I understand, not how new the core is.
    Reply
  • C300fans - Thursday, October 20, 2011 - link

    Intel has his ACE, atom E6xx, which has already been proved to be much more power efficient than ARM. For example, Sony PRS T1 is using intel 1Ghz processor(Atom E640) in his e-book reader runing on Andorid. Reply
  • mczak - Thursday, October 20, 2011 - link

    Good point though there should be really a rather large difference in performance (and in chip size) between a A7 and a A15, hence I think there's still some room for A9 (which should still be a fair bit faster than A7 after all) - say for a low-end smartphone in 2013. I guess though this will rather depend on the licensing cost differences (afaik the less complex designs are cheaper) between A7/A9/A15. Reply
  • iwod - Thursday, October 20, 2011 - link

    I dont think the replacement are intentional. it is the market overlaping when one product's power and effcieny leap forward.

    It is like a Pentium SandyBridge was never meant to replace the good old Core 2 Duo, but it is just SandyBridge being so much better and cheaper to happens to replaced it.
    Reply
  • nofumble62 - Thursday, October 20, 2011 - link

    is that ARM cannot boost core performance without sucking up more power. The power versus performance chart shows a straight line. So they have to employ this trick to maintain power efficiency, while getting enough horse power for difficult tasks.

    Believe Intel has done this sort of thing already. What else is new?

    What's next for ARM? double the core count?
    Reply
  • iwod - Thursday, October 20, 2011 - link

    No one can boost performance without sucking more power. But ARM still have many more things to do with IPC. And as a matter of fact, a Quad Core Cortex A15 @ 2.5Ghz is very capable, and faster then a Core 2 Duo. Reply

Log in

Don't have an account? Sign up now