Performance Expectations & Final Words

ARM’s Cortex A9 was first released to licensees back in 2009, with design work beginning on the core years before that. To say that the smartphone market has changed tremendously over the past several years would be an understatement. Many of the assumptions that were true at the time of the Cortex A9’s development are no longer the case. There’s far more NEON/FP code in use on mobile platforms, higher frequency of memory accesses and much heavier performance demands in general. While the Cortex A9 was a good design for its time, its weaknesses on the FP and memory fronts needed addressing. Thankfully, Cortex A12 modernizes the segment.

Although ARM referred to Cortex A9 as an out-of-order design, in reality it supported out-of-order integer execution with in-order FP and memory operations. ARM’s Cortex A12 moves to an almost completely OoO design. All aspects of the design have been improved as well. Although the Cortex A9 is expected to continue to ramp in frequency over the next year as designs transition to 28nm HPM and beyond, Cortex A12 should deliver much better performance in an more energy efficient manner.

At the same frequency (looking just at IPC), ARM expects roughly a 40% uplift in performance over Cortex A9. The power efficiency and area implications are more interesting. ARM claims that on the same process node as a Cortex A9, a Cortex A12 design should be able to deliver the same or better power efficiency. The design achieves improved power efficiency by throwing more die area at the problem; ARM expects a Cortex A12 implementation to be up to 40% larger than a Cortex A9. Just like the increasing performance of the Cortex A15 line of microarchitectures necessitates development of the Cortex A9/A12 line, the increasing size of this line drives up demand for the Cortex A7/A53 family below it.

ARM’s unique business model allows for the extreme targeting and customization of its microprocessor IP portfolio. If one of its cores gets too large (or power hungry), there’s always a smaller/more energy efficient option downstream.

The Cortex A12 IP has been finalized as of a couple of weeks ago and is now available to licensees for integration. The first designs will likely ship in silicon in a bit over a year, with the first devices implementing Cortex A12 showing up in late 2014 or early 2015. Whether or not the design will be too late once it arrives is the biggest unknown. Qualcomm’s Krait 300 core should provide the smartphone market with an alternative solution, but the question is whether or not the mobile world will need a Cortex A12 when it shows up. We always like to say that there are no bad products, just bad pricing. A more aggressively priced alternative to a Snapdragon 600 class SoC may entice some customers. Until then, the latest revision to the Cortex A9 core (r4) is expected to carry the torch for ARM. ARM also tells us that we might see more power optimized implementations of Cortex A15 in the interim as well.

Back End Improvements


View All Comments

  • JDG1980 - Wednesday, July 17, 2013 - link

    So, roughly speaking, how does ARM IPC compare to x86? Obviously it's not going to be as high as on modern big-core desktop x86 parts like SB/IB/Haswell, but how does it compare to Atom (both the current generation and the new one in the pipeline) and Bobcat/Kabini? Reply
  • nathanddrews - Wednesday, July 17, 2013 - link

    I think that was covered before:
  • coder543 - Wednesday, July 17, 2013 - link

    The performance expectations (which relate to IPC) were misguided though. Intel's compiler was, essentially, cheating by skipping entire sections of the benchmark. Reply
  • DanNeely - Wednesday, July 17, 2013 - link

    If I'm interpreting the labels on that graph correctly ("K900 at 1.0"); they've under clocked the atom by 50% (from 2ghz to 1ghz) from what it normally operates at; which would flip the results back to Intel winning the majority of the tests. Reply
  • Krysto - Wednesday, July 17, 2013 - link

    No, that's actually the *real* clock speed of Atom, which Intel misleadingly calls a "2 Ghz" core. The 2 Ghz speed is the turbo-boost speed, just like for laptops Haswell will really be clocked at 1.3 Ghz (same performance level as 1.5 Ghz IVB), and it goes up to 2.3 Ghz, or whatever, with Turbo-Boost.

    The problem is that I think benchmarks do use the Turbo-Boost speed fully, which means Atom will do very well in benchmarks, while that may not be the case in real day to day life, where the phone might never activate the Turbo-Boost speed, and just use the slower "real clock speed" all the time.
  • name99 - Wednesday, July 17, 2013 - link

    The fact of Turbo-Boost is a problem for lazy benchmarking, it is NOT a problem for Intel.

    Why do you want a high speed core in your phone? There is a population that wants to run aggressive games, or transcode video or whatever, and these people care about sustained performance. But they're in the dramatic minority. For most users, the value of a high speed core is that it makes the phone more zippy, meaning that operations are fast when they need to be fast, after which the phone can go back to sleeping. The user-level "speed" of a phone is measured by how fast it draws a (single) PDF page, or renders a (single) complex web page, or launches an app, not by how it performs over any task that takes longer than a second. In such a world, if Turbo-Boost allows the app to sprint for a second, then go back to low-power mode, the user is very happy with that behavior.

    The only "problem" with this strategy, for Intel, is that it is obvious and will be copied by everyone. Intel is there first and most aggressively for historical and process reasons, but there's no reason they will remain the only player.
    (It's also quite likely that competitors will adopt the ideas of Turbo-Boost, just never call it that. After all the problem to be solved for phones is different from the REAL Turbo-Boost problem. Turbo-Boost comes from a world where you run the chip as hot as it can go --- till it just about to overheat. If an ARM core has no danger of actually overheating, then the design space is different. Now it's simply "we'll rate the core for 2GHz, but at that speed it uses up 10 nJ/op, so as far as possible we'll try to run it a 1GHz (using up 2 nJ/op) or better yet 100MHz (using up 10 pJ/op) [all numbers made up, but you get the point].)
  • Wilco1 - Wednesday, July 17, 2013 - link

    All CPUs typically run at a far lower frequency than the maximum - I'm sure nobody believes eg. Krait 800 runs all 4 cores at 2.3GHz all the time. So if you call a specific frequency the "base" then anything faster than that is automatically a turbo boost. In that sense Intel's turbo boost is largely marketing, a way to claim a low TDP by setting the base frequency arbitrarily low, and allowing to go well over that TDP for a certain amount of time at a much higher boost frequency. Reply
  • inighthawki - Wednesday, July 17, 2013 - link

    My 3.5/3.9GHz advertised i7 4770K runs at 800Mhz at idle :) Reply
  • TomWomack - Thursday, July 18, 2013 - link

    My desktop Haswell with Intel's retail cooler runs all four cores 24/7 turbo-boosted - I'm not quite sure what to, it reports 3401MHz in Linux but that's because /proc/cpuinfo asks ACPI which isn't fully compatible with turbo-boost. The machine draws 100W from the mains while doing so, which (given that it idles at 28W) is entirely consistent with the 84W TDP.

    And, indeed, it runs at 800MHz at idle; and I suspect often slower than that, but /proc/cpuinfo doesn't report C-states
  • RicDavis - Friday, July 19, 2013 - link

    Intel's turbostat has proven very useful for getting good reporting of CPU clock speeds under Linux. with the -v option it also displays the maximum speeds that CPU will run at as the no of active cores varies. Recommended. i7z is another option, but I've seen it do a bad job of showing which cores are active when hyperthreading is enabled. Reply

Log in

Don't have an account? Sign up now