Yesterday AMD revealed that in 2014 it would begin production of its first ARMv8-based 64-bit Opteron CPUs. At the time we didn't know what core AMD would use. Today ARM helped fill in that blank for us with two new 64-bit core announcements: the ARM Cortex-A57 and Cortex-A53.

You may have heard of ARM's Cortex-A57 under the codename Atlas, while the A53 was referred to internally as Apollo. The two are 64-bit successors to the Cortex A15 and A7, respectively. Similar to their 32-bit counterparts, the A57 and A53 can be used independently or in a big.LITTLE configuration. As a recap, big.LITTLE uses a combination of big (read: power hungry, high performance) and little (read: low power, lower performance) ARM cores on a single SoC.

By ensuring that both the big and little cores support the same ISA, the OS can dynamically swap the cores in and out of the scheduling pool depending on the workload. For example, when playing a game or browsing the web on a smartphone, a pair of A57s could be active, delivering great performance at a high power penalty. On the other hand, while just navigating through your phone's UI or checking email a pair of A53s could deliver adequate performance while saving a lot of power. A hypothetical SoC with two Cortex A57s and two Cortex A53s would still only appear to the OS as a dual-core system, but it would alternate between performance levels depending on workload.
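
To make the switching idea concrete, here is a minimal sketch of how a cluster-migration policy along these lines might pick between the two pairs of cores. It is only an illustration: the `ClusterSwitcher` class, the `simulate` helper and both thresholds are hypothetical, and real big.LITTLE policies are implemented by the SoC vendor and the OS, not by anything ARM described here.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
UP_THRESHOLD = 0.80    # move to the big (A57) pair above this load
DOWN_THRESHOLD = 0.30  # move back to the little (A53) pair below this load


@dataclass
class ClusterSwitcher:
    """Tracks which pair of cores is currently in the scheduling pool."""
    active: str = "little"  # "little" (A53 pair) or "big" (A57 pair)

    def update(self, load: float) -> str:
        """Pick the active cluster for one load sample in [0.0, 1.0]."""
        if self.active == "little" and load > UP_THRESHOLD:
            self.active = "big"
        elif self.active == "big" and load < DOWN_THRESHOLD:
            self.active = "little"
        # Between the two thresholds the current cluster is kept,
        # which avoids bouncing back and forth on small load changes.
        return self.active


def simulate(loads):
    """Return the active cluster after each load sample."""
    switcher = ClusterSwitcher()
    return [switcher.update(load) for load in loads]


# UI navigation, then a game, then a lighter phase, then idle again.
print(simulate([0.1, 0.9, 0.5, 0.2]))
```

Either way the OS only ever sees two cores; the sketch just decides which physical pair backs them at a given moment.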

ARM's Cortex A57

Architecturally, the Cortex A57 is much like a tweaked Cortex A15 with 64-bit support. The CPU is still a 3-wide/3-issue machine with a 15+ stage pipeline. ARM has increased the width of the NEON execution units in the Cortex A57 (128 bits wide now?) as well as enabled support for IEEE-754 double-precision floating point. There have been some other minor pipeline enhancements as well. The end result is an increase of up to 20-30% in performance over the Cortex A15 while running 32-bit code. Running 64-bit code you'll see an additional performance advantage, as the 64-bit register file is far simpler than the 32-bit RF.

The Cortex A57 will support configurations of up to (and beyond) 16 cores for use in server environments. Based on ARM's presentation it looks like groups of four A57 cores will share a single L2 cache.
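
As a small illustration of what that grouping would mean for a 16-core part, the sketch below maps core numbers to shared-L2 clusters. The four-cores-per-L2 figure is just the reading of ARM's presentation given above, and the function names and core numbering are hypothetical.

```python
# Assumption: four A57 cores share one L2, as ARM's slides seem to suggest.
CORES_PER_CLUSTER = 4


def cluster_of(core_id):
    """Return the index of the L2-sharing cluster a core belongs to."""
    return core_id // CORES_PER_CLUSTER


def l2_groups(total_cores):
    """Group core ids by the L2 cache they would share."""
    groups = {}
    for core in range(total_cores):
        groups.setdefault(cluster_of(core), []).append(core)
    return groups


for cluster, cores in l2_groups(16).items():
    print(f"L2 #{cluster}: cores {cores}")
```

Under that assumption a 16-core server configuration would be built from four clusters, each with its own L2.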

ARM's Cortex A53

Similarly, the Cortex A53 is a tweaked version of the Cortex A7 with 64-bit support. ARM didn't provide as many details here other than to confirm that we're still looking at a simple, in-order architecture with an 8-stage pipeline. The A53 can be used in server environments as well since it's ISA compatible with the A57.

ARM claims that on the same process node (32nm) the Cortex A53 is able to deliver the same performance as a Cortex A9 but at roughly 60% of the die area. The performance claims apply to both integer and floating point workloads. ARM tells me that it simply reduced a lot of the buffering and data structure sizes while finding more efficient ways to improve performance. From looking at Apple's Swift it's very obvious that a lot can be done simply by improving the memory interface of ARM's Cortex A9. It's possible that ARM addressed that shortcoming while balancing out the gains by removing other performance enhancing elements of the core.
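
As a quick back-of-the-envelope check on what the area claim implies, the sketch below turns "same performance at roughly 60% of the die area" into a performance-per-area ratio. The function name is made up for illustration, and the inputs are ARM's claimed figures, not measurements.

```python
def perf_per_area_gain(relative_perf, relative_area):
    """Performance-per-area ratio versus a baseline core.

    Both arguments are relative to the baseline, so the Cortex A9
    itself would be (1.0, 1.0).
    """
    return relative_perf / relative_area


# ARM's claim: A9-class performance at roughly 60% of the A9's die area.
gain = perf_per_area_gain(relative_perf=1.0, relative_area=0.60)
print(f"{gain:.2f}x performance per unit area vs. Cortex A9")  # prints 1.67x ...
```

In other words, if the claim holds, the A53 would deliver about two thirds more performance per square millimeter than an A9 on the same node.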

Both CPU cores are able to run 32-bit and 64-bit ARM code, as well as a mix of both so long as the OS is 64-bit.

Completed Cortex A57 and A53 core designs will be delivered to partners (including AMD and Samsung) by the middle of next year. Silicon based on these cores should be ready by late 2013/early 2014, with production following 6 to 12 months after that. AMD claimed it would have an ARMv8-based Opteron in production in 2014, which seems possible (although aggressive) based on what ARM told me.

ARM expects the first designs to appear at 28nm and 20nm. There's an obvious path to 14nm as well.

It's interesting to note ARM's commitment to big.LITTLE as a strategy for pushing mobile SoC performance forward. I'm curious to see how the first A15/A7 designs work out. It's also good to see ARM not letting up on pushing its architectures forward.




  • Wilco1 - Tuesday, October 30, 2012 - link

    No, most use ARM's cores. Only Qualcomm, Marvell and now Apple build their own cores. But even they do use ARM designs, for example various Snapdragons use Cortex-A5, Apple used ARM11, Cortex-A8 and A9.
  • wsw1982 - Wednesday, October 31, 2012 - link

    The current Atom "Medfield", with a 5-year-old architecture on 32nm, beats Krait, a new architecture on 28nm, in power efficiency, die area and lots of benchmarks. And Qualcomm declared that Krait is up to 1.5 times better than the A9 in IPC, the same as ARM's claim for the A15. I don't see any advantage for ARM. A year ago, ARM declared that the A9 is better than Atom in performance and that Atom cannot beat ARM in power efficiency. That either means they are bluffing or naive (it's not a 10-year prediction with a lot of unknown factors), so I don't take their claims seriously unless I see the results. But from what I see now, Atom vs Krait, where ARM is newer and on a smaller silicon node, or the battery life of the A15, I am not positive about ARM.
  • Wilco1 - Wednesday, October 31, 2012 - link

    For independent performance benchmarks, check out Geekbench for Z2460 Medfield compared with Galaxy Note 2:

    You can see how a Cortex-A9 beats the 2GHz Z2460 on every single threaded integer and floating point benchmark by a large margin, except for LU decomposition. Atom is obliterated in multithreaded results, even hyperthreading can't save it.

    There is a reason Anand only ever uses Javascript benchmarks, especially Sunspider, and not Geekbench results. Sunspider is the worst benchmark imaginable as it is tiny, single threaded and easily gamed using software optimizations.

    Don't you think it is odd that Sunspider is the only benchmark where Medfield seems competitive?
  • andrewaggb - Wednesday, October 31, 2012 - link

    Too bad they are running very different versions of Android. It probably doesn't matter much for the cpu intensive benchmarks provided they are in native code, but I don't know that for sure. Otherwise dalvik improvements could play into it as well.

    Really interested in benchmarks on the atom vs tegra 3 in the surface. assuming we get pcmark, 3dmark, etc for winrt sometime soon... unreal, anything....
  • andrewaggb - Wednesday, October 31, 2012 - link

    This is a bit better. It clearly shows the current atom outperformed by the new cortex a15 by a significant margin. It's still a lot of browser benchmarks, but since it's a chromebook and they aren't able to do anything else...

    However power-wise, you'll notice that arm is better, but the power increase in watts from idle to load is about the same on each platform.

    Anyways, silvermont had better be way faster at the same or lower power or they might as well not bother.
  • Wilco1 - Wednesday, October 31, 2012 - link

    Yes the A15 does well on Javascript, but it will do even better on native code. Hopefully there will be a Nexus 10 review soon which shows Geekbench scores, not just Javascript.

    As for power, looking at the increase over idle power doesn't work - the issue with the older Atoms is that idle power is too high. Average power is typically close to idle power, which is why low idle power matters the most.

    Note also that the A15 does 50% more work while still using less power. It will be interesting to see what Silvermont can do, but it will have to compete with 20nm 2+GHz A15s in late 2013/early 2014.
  • wsw1982 - Thursday, November 01, 2012 - link

    Yes, the Samsung Exynos 5 is faster than Medfield. No doubt about that. But you are comparing a 6+ W ARM chip with a 2+ W Atom; why not compare Atom to the 200+ W Tesla? The result could be even more exciting :) By the way, the A15 is, javaspider-wise, 2 times worse than Medfield in performance/watt... Of course, this comparison is not fair; my point is that this is an apples-to-oranges comparison.
  • Wilco1 - Thursday, November 01, 2012 - link

    You meant 8.5W Atom, right? That's the official TDP of the N570 used in the Chromebook (though note how it uses over 12W at load).

    The power results with screen off were 8.32W vs 11.4W. And the A15 is 46% faster on Kraken. So overall the A15 is exactly 2 times as power efficient.

    So I have no idea how you came to the exact opposite conclusion, but your calculation is wrong.
  • wsw1982 - Thursday, November 01, 2012 - link

    No, I said Medfield in my post, which is in the 2+ W range (my best guess from checking the load and idle power consumption of the RAZR i during web surfing in the iPhone 5 review). The N570 is a two- to three-year-old Atom, and of course is not comparable with a modern ARM. Medfield runs javaspider 1.5 times slower than the Samsung A15, while using around 1/3 of the power. But I said it's not fair to compare them, because they address different markets: smartphones and tablets for Atom, netbooks for the Samsung A15. I will compare when a smartphone-targeted A15 comes out (using 3 times less power, and God knows how much slower than the A15 in the Chromebook; it could be as fast as Krait or Apple's Swift, or use the same SoC but need to be charged every 3 hours :) ), or when there is an Atom SoC (I mean an SoC as new as the Samsung A15) for netbooks. For now it's only fair to compare Krait with the smartphone Atom, as they are the SoCs that target the same market.
  • Wilco1 - Thursday, November 01, 2012 - link

    Yes it is true that Medfield uses less power than the N570, but mobile phone variants of the A15 will use less power too. Just downclocking a little can easily make a factor 2 difference. Better binning will be possible as A15 production volumes ramp up. Then there is big/little with A7.

    So as you say it is not possible to compare the Chromebook result with Medfield, we have to wait until A15-based phones appear.
