Back End Improvements

The front end of the Cortex A12 is a bit more efficient than the Cortex A9’s, but the bulk of the performance gains come from improvements to the execution side of the core. As with the Cortex A15, ARM introduced multiple independent issue queues ahead of the functional units. It’s important to get the nomenclature right here. Instructions are decoded into micro-ops, the renamed micro-ops are dispatched into the issue queues, and micro-ops are then issued from the issue queues once their operands are available. Everything up to the issue queues is handled in order, while issue can happen out of order in the Cortex A12 (in most cases, more on this later).
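
To make the dispatch versus issue distinction concrete, here is a minimal Python sketch of a single issue queue. It is a toy model under stated assumptions, not ARM's design: the MicroOp fields, the simulate function, the queue size, the issue width and the latencies are all hypothetical, and registers are assumed to be already renamed so that each destination is written only once.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str
    dest: str          # destination register (assumed already renamed)
    srcs: tuple        # source registers that must be ready before issue
    latency: int = 1   # hypothetical execution latency in cycles


def simulate(program, ready_regs, queue_size=8, issue_width=1, max_cycles=1000):
    """Return a list of (cycle, name) pairs in the order micro-ops issue."""
    ready = set(ready_regs)
    pending = list(program)   # not yet dispatched, kept in program order
    queue = []                # the issue queue, oldest entry first
    in_flight = []            # (finish_cycle, dest) for issued micro-ops
    issued = []
    cycle = 0
    while pending or queue or in_flight:
        if cycle > max_cycles:
            raise RuntimeError("some micro-op never became ready")
        # Writeback: results that have finished make their register ready.
        ready.update(d for fin, d in in_flight if fin <= cycle)
        in_flight = [(fin, d) for fin, d in in_flight if fin > cycle]
        # Dispatch: strictly in program order, until the queue is full.
        while pending and len(queue) < queue_size:
            queue.append(pending.pop(0))
        # Issue: oldest first, but only micro-ops whose operands are ready,
        # so a younger micro-op can pass an older one that is still waiting.
        picked = [op for op in queue if all(s in ready for s in op.srcs)]
        for op in picked[:issue_width]:
            queue.remove(op)
            in_flight.append((cycle + op.latency, op.dest))
            issued.append((cycle, op.name))
        cycle += 1
    return issued


program = [
    MicroOp("load", dest="r1", srcs=("r0",), latency=3),
    MicroOp("add", dest="r2", srcs=("r1", "r1")),   # waits for the load
    MicroOp("mul", dest="r3", srcs=("r4", "r5")),   # independent of both
]
order = simulate(program, ready_regs={"r0", "r4", "r5"})
print(order)
```

In this toy run the independent mul issues ahead of the older add, which is still waiting on the load, even though all three micro-ops entered the queue in program order.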

Whereas the Cortex A9 had a single issue queue ahead of all functional units, the Cortex A12 moves to three independent issue queues. The A9’s issue queue could hold 4 decoded instructions, while each issue queue in the Cortex A12 is larger than that. The move to larger, independent issue queues should by itself help increase IPC.

The three issue queues are as follows: one for integer, one for FP/NEON and one for loads and stores. ARM provided bits and pieces of an architectural block diagram for the Cortex A12, and I reconstructed one as best I could. The blue blocks indicate in-order components of the design, while the pink/salmon blocks are out-of-order.
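
The split into three queues can be sketched as a simple in-order dispatcher. This is only an illustration: the op-class labels, the QUEUE_FOR_CLASS table, the dispatch function and the capacity value are assumptions made up for the example (the article only says each queue holds more than 4 entries).

```python
from collections import deque

# Hypothetical mapping from micro-op class to one of the three issue queues.
QUEUE_FOR_CLASS = {
    "int": "integer",
    "fp": "fp_neon",
    "neon": "fp_neon",
    "load": "load_store",
    "store": "load_store",
}


def dispatch(micro_ops, capacity=8):
    """Route (name, op_class) pairs, in program order, into three queues.

    Returns the queues and the number of micro-ops dispatched. Dispatch is
    in order, so the first micro-op that finds its queue full stalls
    everything behind it. The capacity is an assumed value.
    """
    queues = {name: deque() for name in ("integer", "fp_neon", "load_store")}
    dispatched = 0
    for name, op_class in micro_ops:
        target = queues[QUEUE_FOR_CLASS[op_class]]
        if len(target) >= capacity:
            break  # in-order dispatch: nothing younger may go around a stall
        target.append(name)
        dispatched += 1
    return queues, dispatched


ops = [("add", "int"), ("vmul", "neon"), ("ldr", "load"),
       ("sub", "int"), ("str", "store")]
queues, count = dispatch(ops)
print({name: list(q) for name, q in queues.items()}, count)
```

With a single shared queue, one long-waiting micro-op occupies an entry that any other type could have used. With separate queues, a backlog of loads, say, does not take entries away from integer or FP/NEON work.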


Cortex A12 retains the two integer pipelines of the Cortex A9, but adds support for hardware integer divides. The A7 and A15 support them as well, while other A-series architectures generally lacked hardware integer divide. The rest of the integer execution capabilities are unchanged.

The FP/NEON units are vastly improved on the Cortex A12. When the Cortex A9 was first introduced, NEON code was rarely used, which even led NVIDIA to drop NEON support altogether in Tegra 2. Times changed quickly, and NEON code is now widely used in Android and mobile applications.

The Cortex A12 design retains separate physical register files for integer and FP operations, but the RFs are larger than in Cortex A9.

Although Cortex A9 was considered an out-of-order microarchitecture, all FP and NEON instructions were executed in-order. With Cortex A12, ARM moves to a fully out-of-order architecture, at least as far as non-memory-ops are concerned. The FP/NEON issue queue now dual-issues into two FP/NEON pipes, both of which operate fully out-of-order. The FP/NEON pipes are also more tightly coupled, allowing for quicker data movement between FP and Integer units.

The improvements to the FP/NEON side are expected to show up quite nicely in benchmarks. ARM shared performance data using an FFmpeg workload on simulated Cortex A9 and Cortex A12 designs at the same frequency with the same number of cores (one).

A 48% increase in NEON performance isn’t unexpected at all given the magnitude of improvements to this part of the execution engine.

The final issue queue feeds the two load-store pipelines and their two AGUs, once again a doubling of what was present in the Cortex A9 design. Each pipeline is equally capable (load/store agnostic) and mostly out-of-order: there are limits on what can be reordered when there are address dependencies between loads. By comparison, the load/store pipe in Cortex A9 was fully in-order.
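
The "mostly out-of-order" qualifier can be illustrated with a small Python sketch. ARM has not spelled out the exact reordering rule, so the one used here is an assumption chosen for illustration: a memory op may go ahead of older, still-pending memory ops only if none of them targets the same address. The can_issue_early function and the tuple layout are hypothetical.

```python
def can_issue_early(index, pending):
    """Decide whether the memory op at `index` may bypass all older ones.

    `pending` is a list of (kind, address) tuples in program order, none of
    which has issued yet. Assumed rule, not ARM's documented behaviour: the
    op may go early only if no older pending op uses the same address.
    """
    _, addr = pending[index]
    return all(older_addr != addr for _, older_addr in pending[:index])


pending = [
    ("store", 0x100),
    ("load", 0x200),   # different address: free to pass the store
    ("load", 0x100),   # same address as the store: must wait behind it
]
for i, (kind, addr) in enumerate(pending):
    print(kind, hex(addr), can_issue_early(i, pending))
```

A fully in-order pipe, as in the Cortex A9, would correspond to letting only the oldest pending op issue regardless of addresses.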

65 Comments

  • crypticsaga - Wednesday, July 17, 2013

    Is it just me or have the ARM architecture reviews become more interesting than the Intel or AMD ones?
  • WhitneyLand - Wednesday, July 17, 2013

    Your tastes are changing. I used to read power supply and case reviews, now I usually don't even peek at the summary. I guess you could say some people are interested in where the action is, and while Intel is making bold moves (I still read Intel articles) there are still many things that we already know from previous reviews. ARM is still novel to us.
  • jeffkibuule - Monday, July 22, 2013

    They are constantly evolving. Intel isn't experiencing hyper growth in performance anymore, instead optimizing for power which is far less sexy.
  • lilmoe - Wednesday, July 17, 2013

    Cortex A12 is too late to the game. Too frustratingly so.
  • name99 - Wednesday, July 17, 2013

    What game is that? ARM is in the business of making money. It will sell hundreds of millions of these to the developing world, and make money on each one.
    If you want to view low-power CPUs as porn, not a business, you should be spending your time watching Apple (and to a lesser extent, Qualcomm and, maybe one day, Intel) not ARM.
  • lilmoe - Wednesday, July 17, 2013

    I'm strictly speaking business, not porn, thank you. To me, Cortex A12 is addressing the same problem that Krait and Swift cores addressed (specifically in memory bandwidth), where competitors are clearly >2 years ahead by the time of availability. Look how "successful" A15 vendors are (/s) while Qualcomm is taking a huge share of the pie.

    big.LITTLE is proving too hard to implement. Samsung has succeeded in providing Apple with their needs when other fabs failed, all while having difficulties with their implementation of big.LITTLE. Samsung even ditched the CHEAPER licence of ARM's Mali GPU in favor of IT's solution. There's clearly a problem somewhere. Yes, Cortex A15 is faster, but "average" performance of Krait 200 compared with big.LITTLE (a15/a7) is VERY comparable. However, in heavy workloads, Cortex A15 consumes significantly more power.

    ARM has this "view" on how the market "should" be heading, while the market is heading in a clearly different power-envelope/performance direction. Reason? Android. Cortex A9 is not powerful enough for Android, and A15 consumes too much power. I'm a big believer in power efficiency, but ARM seriously need to revise their power envelope charts. Cortex A15 should have been a target of the 20/22nm process, NOT 28nm. That's how demand is working now. Cortex A12 SHOULD HAVE BEEN prioritized over Cortex A15 on 28nm. OEMs (including Samsung) are preferring Snapdragons over Exynos 5 and Tegra 4 even on more power tolerable devices like tablets.

    That said, even their high-performance Cortex A15 is seriously threatened by Krait 300 and Silvermont cores in power efficiency at relative performance. And by the time A57 is implemented, where do you think the competition will be?

    Someday Intel? Dude, for $400, you can either get a Windows RT ARM tablet, or a FULL Windows 8 tablet running Saltwell (and Silvermont in the very near future), which one would you pick? Android tablet you say? Guess what Samsung is doing now with their Tab 3 10.1.

    Developing countries? Don't worry about those, Krait 200 cores will be too darn cheap in 2 years when A12 is ready to ship. Oh, they also have modem integration......

    The business world works VERY differently from the world enthusiasts live in...
  • aryonoco - Wednesday, July 17, 2013

    You make very valid points, but you have a very developed-world-centric point of view.

    Yes A12 is too late, it is a response to Swift and Krait, and it is over two years too late. We'll have to wait a while still of course to see how it stacks up against its competition in 2015, but I agree with you in that it should have been prioritized, and it's late.

    What you are missing however is that there is a huge swath of the market where ARM doesn't really have a competition. Swift and Krait are non-entities there, they are too expensive for your average Chinese OEM like Zopo or THL or Cube or FAEA (or many dozen others). These phones and tablets are now ruling China, India and south-east Asia, and they are all using Mediatek Cortex A7s in the phones, and Rockchip or Allwinner Cortex A9s in their tablets and Android sticks. These are huge markets, we are talking over 3.5 Billion people who live in these countries, and yes Samsung and Apple sell their phones there as well, but they are tiny (especially Apple). Something like 37% of ARM's Cortex A chips were produced by Mediatek alone in 2012, and no one in the developed world has even heard of them.

    Sure Qualcomm is trying to play in this market, with their confusingly named Snapdragon S4 Play (which is just a Cortex A5) but Mediatek beats it hands down both in performance and price. Modem integration you say? It just isn't a factor. They all have single band HSPA modems on 2100 MHz, or at best dual band 850/2100 or 900/2100 and that's all that these markets care about. Oh and dual SIM, they really care about their dual SIM. No one cares about Qualcomm's fancy pants LTE modems, there are no networks to use them on!

    Will you ever see Cortex A12 in a flagship Samsung or Apple or HTC product? Probably not, but that doesn't really matter. You'll see it in hundreds of millions of products that will line up on AliExpress and Taobao and that's where the vast bulk of the market is going to be.
  • parim - Thursday, July 18, 2013

    I completely agree with all your points; the sub-$200 Android phone market in India is dominated by MediaTek.

    All of India is on 2100 MHz HSPA.

    Qualcomm/NVIDIA need to reduce their prices by a lot to avoid being completely washed away by the MediaTek wave.
  • Qwertilot - Thursday, July 18, 2013

    The other thing to remember is that ARM aren't in competition with Qualcomm, Apple etc. You could even reasonably argue that the existence of Krait meant they didn't need to cover that niche as urgently as the newer markets that the A15/57 etc can be aimed at.

    Folk like MediaTek must be a massive long-term worry for everyone in the SoC market (and Intel trying to break into it) - at some point (reasonably soon?) the bulk of people even in the developed world might well decide that they've got 'enough' performance and then all those very cheap chips will be waiting to destroy margins. ARM are of course set up not to mind.
  • aryonoco - Thursday, July 18, 2013

    Of course you are right, ARM doesn't really view Krait and Swift as its competition... but just because ARM is happy to license its instruction set doesn't mean they like Krait and Swift. ARM gets a higher initial revenue when it licenses out its instruction set, but after that, the percentage that it gets on shipping Krait and Swift cores is far lower than what it gets from a straight implementation of one of its Cortex A designs... in ARM's ideal world, everyone would be implementing straight from its designs and giving it the higher percentage fee.

    And yes, MediaTek is a massive worry to all the big guys. There is also a price war happening between Chinese companies (Rockchip and Allwinner) and MediaTek (which is Taiwanese) right now, prices of the highest-end products have come down by over a third since the start of the year... you can now buy MT's highest-end 1.5 GHz quad-core A7 chip for $8... this is just crazy!

    Of course ARM is sitting comfortable at the top, but the one company that can give ARM some headache is Imagination. They've been beating ARM in the embedded GPU space for a good while now, and now that they have a CPU architecture of their own... this is going to get very interesting. In the 90s, MIPS had a far better IPC than ARM, so if Imagination really is set to revive MIPS and get aggressive on price, it will be very interesting to watch, especially with over 95% of Android apps working just fine on MIPS. Pity no one is paying attention to that battle, AnandTech didn't even cover the launch of Warrior.
