The ARM Diaries, Part 2: Understanding the Cortex A12

Author: Anand Lal Shimpi

by Anand Lal Shimpi on July 17, 2013 12:30 PM EST



Performance Expectations & Final Words

ARM’s Cortex A9 was first released to licensees back in 2009, with design work beginning on the core years before that. To say that the smartphone market has changed tremendously over the past several years would be an understatement. Many of the assumptions that were true at the time of the Cortex A9’s development are no longer the case. There’s far more NEON/FP code in use on mobile platforms, a higher frequency of memory accesses, and much heavier performance demands in general. While the Cortex A9 was a good design for its time, its weaknesses on the FP and memory fronts needed addressing. Thankfully, Cortex A12 modernizes the segment.

Although ARM referred to Cortex A9 as an out-of-order design, in reality it supported out-of-order integer execution with in-order FP and memory operations. ARM’s Cortex A12 moves to an almost completely OoO design. All aspects of the design have been improved as well. Although the Cortex A9 is expected to continue to ramp in frequency over the next year as designs transition to 28nm HPM and beyond, Cortex A12 should deliver much better performance in a more energy-efficient manner.

At the same frequency (looking just at IPC), ARM expects roughly a 40% uplift in performance over Cortex A9. The power efficiency and area implications are more interesting. ARM claims that on the same process node as a Cortex A9, a Cortex A12 design should be able to deliver the same or better power efficiency. The design achieves improved power efficiency by throwing more die area at the problem; ARM expects a Cortex A12 implementation to be up to 40% larger than a Cortex A9. Just like the increasing performance of the Cortex A15 line of microarchitectures necessitates development of the Cortex A9/A12 line, the increasing size of this line drives up demand for the Cortex A7/A53 family below it.
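
To put those two 40% figures side by side, here is a minimal C sketch of the arithmetic. It assumes the simple model that performance scales as IPC times clock frequency, and it treats ARM's "roughly 40%" IPC uplift and "up to 40%" area increase as exact ratios of 1.40. The function names are made up for illustration and are not from ARM.

```c
/* Relative performance of a new core versus a baseline, modeled as
 * (IPC ratio) x (clock frequency ratio). This is a simplification:
 * real workloads also depend on memory behavior, which it ignores. */
double relative_performance(double ipc_ratio, double freq_ratio)
{
    return ipc_ratio * freq_ratio;
}

/* Performance per unit of die area, relative to the baseline core.
 * Both arguments are ratios against the baseline (1.0 = unchanged). */
double relative_perf_per_area(double perf_ratio, double area_ratio)
{
    return perf_ratio / area_ratio;
}
```

Under these assumptions, `relative_performance(1.40, 1.0)` gives 1.40 at equal clocks, and `relative_perf_per_area(1.40, 1.40)` comes out to about 1.0. In other words, at the upper bound of ARM's area estimate the Cortex A12 would deliver roughly the same performance per unit of area as a Cortex A9, and anything smaller than that upper bound would come out ahead.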

ARM’s unique business model allows for the extreme targeting and customization of its microprocessor IP portfolio. If one of its cores gets too large (or power hungry), there’s always a smaller/more energy efficient option downstream.

The Cortex A12 IP has been finalized as of a couple of weeks ago and is now available to licensees for integration. The first designs will likely ship in silicon in a bit over a year, with the first devices implementing Cortex A12 showing up in late 2014 or early 2015. Whether or not the design will be too late once it arrives is the biggest unknown. Qualcomm’s Krait 300 core should provide the smartphone market with an alternative solution, but the question is whether or not the mobile world will need a Cortex A12 when it shows up. We always like to say that there are no bad products, just bad pricing. A more aggressively priced alternative to a Snapdragon 600 class SoC may entice some customers. Until then, the latest revision to the Cortex A9 core (r4) is expected to carry the torch for ARM. ARM also tells us that we might see more power optimized implementations of Cortex A15 in the interim as well.


65 Comments


Krysto - Saturday, July 20, 2013
The difference is that those ARM chips do take full advantage of the maximum core speed. Say you load a web page - any web page. It WILL activate the maximum clock speed - whereas the Turbo Boost in Atom doesn't activate all the time.

If we're talking about receiving notifications and such, then obviously the ARM processors won't go to 2 GHz either, but that's not really what we're talking about here, is it? We're talking about what happens when you're doing normal heavy stuff (web browsing, apps, games).
jeffkibuule - Monday, July 22, 2013
That's the problem I have with performance benchmarks on cell phones. At some point thermal throttling kicks in because you're draining the battery a ton running your CPUs at full tilt. IPC improvements will be felt far more than clock speed ramping. If you ever look at CPU-Z on Android, you'll notice that a Snapdragon 600 with 4 cores clocked at 1.7GHz tries its hardest to downclock to 1 core at 384MHz. Even just scrolling up and down the monitoring screen pumps up the CPU speed to 1134MHz and turns on a second core as well. Peak performance is nice, but ideally should rarely be utilized.
Krysto - Saturday, July 20, 2013
No, I meant it's a problem because Atom chips look like they are "competitive" in benchmarks, when in reality they have HALF the performance. That's what I was saying. It's a problem for US, not Intel. Intel wins by being misleading.
felixyang - Thursday, July 18, 2013
Intel didn't mislead you. In SLM's review, they have a very clear description of Turbo. Copied here:
Previous Atom based mobile SoCs had a very crude version of Intel’s Turbo Boost. The CPU would expose all of its available P-states to the OS and as it became thermally limited, Intel would clamp the max P-state it would expose to the OS. Everything was OS-driven and previous designs weren’t able to capitalize on unused thermal budget elsewhere in the SoC to drive up frequency in active parts of chip. ........ this is also how a lot of the present day ARM architectures work as well. At best, they vary what operating states they expose to the OS and clamp max frequency depending on thermals.
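
The OS-driven scheme described in the quoted passage can be sketched roughly as follows in C. Everything specific here is an assumption made for illustration: the P-state table, the 70-degree threshold, the 10-degree steps, and the function names are all hypothetical and do not come from Intel, ARM, or any real driver.

```c
#include <stddef.h>

/* Hypothetical P-state table in MHz, fastest first. Illustrative values only. */
static const unsigned pstates_mhz[] = { 2000, 1600, 1200, 800 };
#define NUM_PSTATES (sizeof pstates_mhz / sizeof pstates_mhz[0])

/* Index of the fastest P-state still exposed to the OS at a given
 * temperature. Made-up policy: at or below 70 C everything is exposed;
 * each started 10 C step above that hides one more top state, but the
 * slowest state always stays available. */
size_t max_exposed_pstate(int temp_c)
{
    size_t hidden;

    if (temp_c <= 70)
        return 0;
    hidden = (size_t)((temp_c - 70 + 9) / 10);  /* round up to whole steps */
    if (hidden > NUM_PSTATES - 1)
        hidden = NUM_PSTATES - 1;
    return hidden;
}

/* The OS asks for a frequency; the platform clamps it to the current
 * thermal ceiling. */
unsigned clamp_request_mhz(unsigned requested_mhz, int temp_c)
{
    unsigned ceiling = pstates_mhz[max_exposed_pstate(temp_c)];

    return requested_mhz > ceiling ? ceiling : requested_mhz;
}
```

With these made-up numbers, a 2000MHz request is granted in full at 60 C but clamped to 1200MHz at 85 C. Note that the clamp only ever lowers the ceiling. Nothing in this model spends unused thermal budget from elsewhere in the SoC to raise frequency, which is the limitation the quoted passage points out.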
opwernby - Thursday, July 18, 2013
That's not cheating: it's what compilers are supposed to do. For example, if you write, "for (i=0; i<1000; i++);" a good optimizing compiler will analyze the loop, realize that it does nothing, resolve it to "i=1000;" and compile that. I believe the first use of this type of aggressive compiler technology was seen in Sun's C compiler for whatever version of Solaris it was that ran on the SPARC chips back in the '80s. The fact that the ARM compilers didn't do this speaks more about the expected performance of the chipset than anything else: you can build hardware to be as fast as you like, but if the compilers can't keep up, you might as well be running your code on a Commodore PET.
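
The transformation described in this comment can be written out as two C functions that return the same value. This is only a sketch of the idea: the function names are made up, and whether a particular compiler actually folds the loop depends on the compiler and its optimization settings, which the comment does not specify.

```c
/* The loop from the comment. The body is empty, so the only
 * observable result is the final value of i. */
int count_with_loop(void)
{
    int i;

    for (i = 0; i < 1000; i++)
        ;  /* intentionally empty */
    return i;
}

/* What an optimizing compiler may effectively reduce the function to:
 * the loop is gone and only its end result remains. */
int count_folded(void)
{
    return 1000;
}
```

Both functions return 1000, so a benchmark that times the first one is measuring very little once the compiler has seen through it.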
opwernby - Thursday, July 18, 2013
Speaking of the Sun thing: I distinctly remember that the then-current version of the Sun "pizza-box"-style workstation appeared in benchmarks to be 100 times faster than the IBM PC-RT (another RISC architecture competing with Sun's platform) even though, on paper, the PC-RT was running on faster hardware: analysis of the benchmarks' compiled code revealed that Sun's compiler had effectively edited out the loops as I described above. Result: the PC-RT died off very quickly.
FunBunny2 - Friday, July 19, 2013
The PC-RT didn't last long, but the processor (in its children) lives on as the RS-6000/PPC/iSeries/Z
Wilco1 - Thursday, July 18, 2013
It's certainly cheating. If you followed the whole thing, it was not just about ICC optimizing much of the benchmark away. The particular optimization was added recently to ICC - it was a lot more complex than an empty loop, and it only optimized a very specific loop by a huge factor (so specific that if you compiled all open source code it would likely only apply to the benchmark and nothing else). For some odd reason AnTuTu then secretly switched to that ICC version despite ICC not being a standard Android compiler. Finally, it turned out the settings for ARM were non-optimal, using an older GCC version with pretty much all loop optimizations disabled. Intel and ABI Research then started making false claims about how fast Atom was compared to the Galaxy S4 based on the parts of AnTuTu that were broken (without actually mentioning AnTuTu).

Giving one side such a huge unfair advantage is called cheating. As a result AnTuTu will now stop using ICC.
jwcalla - Thursday, July 18, 2013
This is why benchmarks have to be taken with a healthy dose of skepticism.

First, if the benchmark program isn't open source, right off the bat it's worthless. If you can't see the code, you can't trust it.

Second, if the program isn't compiled with the same compiler and the same compiler options, the results are crap. You're not getting a valid comparison of the hardware itself.

It's kind of ridiculous seeing many of the journalists out there who took this sensational headline and ran with it without even questioning its legitimacy.
Wilco1 - Wednesday, July 17, 2013
The IPC comparison for integer code goes like:

Silverthorne < A7 < A9 < A9R4 < Silvermont < A12 < Bobcat < A15 < Jaguar

This is based on fair comparisons using Geekbench and so doesn't reflect what some marketing departments claim or what cheated benchmarks (i.e. AnTuTu) appear to show.
