Back End Improvements

The front end of the Cortex A12 is a bit more efficient than that of the Cortex A9, but the bulk of the performance gains come from improvements to the execution side of the core. As in the Cortex A15, ARM introduced multiple independent issue queues ahead of the functional units. It’s important to get the nomenclature right here: instructions are decoded into micro-ops, renamed instructions are dispatched into the issue queues, and micro-ops are then issued from the issue queues once their operands are available. Everything up to the issue queues is handled in order, while issue can happen out of order in the Cortex A12 (in most cases, more on this later).
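To make the dispatch/issue distinction concrete, here is a minimal sketch of a single issue queue, not a model of ARM's actual hardware. The `MicroOp` type, the `simulate` function, the latencies and the `issue_width` parameter are all assumptions made for illustration; the default `queue_size=4` simply echoes the 4-entry A9 queue mentioned in this article, since ARM's A12 queue sizes aren't given here. Micro-ops enter the queue in program order, but leave it as soon as their source operands are ready.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str
    dest: str       # renamed destination register (assumed written exactly once)
    srcs: tuple     # source registers
    latency: int    # cycles until the result is available (illustrative values)


def simulate(program, queue_size=4, issue_width=2):
    """Return micro-op names in the order they issue (hypothetical model)."""
    produced = {op.dest for op in program}  # registers written by some micro-op
    pending = deque(program)                # renamed micro-ops, in program order
    queue = []                              # the issue queue
    ready_at = {}                           # register -> cycle its value is ready
    issued = []
    cycle = 0

    def is_ready(reg):
        # Registers no micro-op writes are treated as already available.
        if reg not in produced:
            return True
        return reg in ready_at and ready_at[reg] <= cycle

    while pending or queue:
        # In-order dispatch: fill free queue slots from the head of the program.
        while pending and len(queue) < queue_size:
            queue.append(pending.popleft())
        # Out-of-order issue: any queued micro-op with ready operands may go.
        picked = [op for op in queue
                  if all(is_ready(s) for s in op.srcs)][:issue_width]
        for op in picked:
            queue.remove(op)
            ready_at[op.dest] = cycle + op.latency
            issued.append(op.name)
        cycle += 1
    return issued


program = [
    MicroOp("ldr", "r1", ("r0",), 3),  # slow load
    MicroOp("add", "r2", ("r1",), 1),  # waits on the load result
    MicroOp("mul", "r3", ("r4",), 1),  # independent of both
]
print(simulate(program, issue_width=1))  # ['ldr', 'mul', 'add']
```

The independent `mul` issues ahead of the older `add`, which is stuck waiting on the load. That reordering is what a strictly in-order pipe cannot do.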

Whereas the Cortex A9 had a single issue queue ahead of all functional units, the Cortex A12 moves to three independent issue queues. The A9’s issue queue could hold 4 decoded instructions, while each issue queue in Cortex A12 is larger than that. The move to larger, independent issue queues should by itself help increase IPC.

The three issue queues are as follows: one for integer, one for FP/NEON and one for loads and stores. ARM provided bits and pieces of an architectural block diagram for the Cortex A12. I reconstructed one as best as I could below. The blue blocks indicate in-order components of the design, while the pink/salmon blocks are out-of-order. You can toggle between the A12 and A9 diagrams to see how things have changed.
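As a rough illustration of that split, the sketch below steers a stream of micro-ops into three queues by type. The queue names, the `dispatch` function and the small mnemonic table are assumptions chosen for the example; they are not taken from ARM's documentation.

```python
# Hypothetical mapping from a few ARM mnemonics to the queue that would hold them.
QUEUE_FOR_KIND = {
    "add": "integer", "sdiv": "integer",
    "vadd": "fp_neon", "vmul": "fp_neon",
    "ldr": "load_store", "str": "load_store",
}


def dispatch(micro_ops):
    """Steer micro-ops, in program order, into one of three issue queues."""
    queues = {"integer": [], "fp_neon": [], "load_store": []}
    for op in micro_ops:
        queues[QUEUE_FOR_KIND[op]].append(op)
    return queues


print(dispatch(["ldr", "vadd", "add", "str", "sdiv"]))
```

Each queue keeps its own entries in program order, so a stalled load no longer occupies a slot that an integer or NEON micro-op could have used.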


Cortex A12 retains the two integer pipelines of the Cortex A9, but adds support for hardware integer divides. The A7 and A15 have this as well, while other A-series architectures generally lacked it. The rest of the integer execution capabilities are unchanged.

The FP/NEON units are vastly improved on the Cortex A12. When the Cortex A9 was first introduced, NEON code was rarely used, which even led NVIDIA to drop NEON support altogether in Tegra 2. Times changed quickly, and NEON code is now widely used in Android and mobile applications.

The Cortex A12 design retains separate physical register files for integer and FP operations, but the RFs are larger than in Cortex A9.

Although Cortex A9 was considered an out-of-order microarchitecture, all FP and NEON instructions were executed in-order. With Cortex A12, ARM moves to a fully out-of-order architecture, at least as far as non-memory-ops are concerned. The FP/NEON issue queue now dual-issues into two FP/NEON pipes, both of which operate fully out-of-order. The FP/NEON pipes are also more tightly coupled, allowing for quicker data movement between FP and Integer units.

The improvements to the FP/NEON side are expected to show up quite nicely in benchmarks. ARM shared performance data using an FFMPEG workload on simulated Cortex A9 and Cortex A12 designs at the same frequency with the same number of cores (1):

A 48% increase in NEON performance isn’t unexpected at all given the magnitude of improvements to this part of the execution engine.

The final issue queue feeds the two load-store pipelines with two AGUs, once again a doubling from what was present in the Cortex A9 design. Each pipeline is equally capable (load/store agnostic) and mostly out-of-order (limits on what you can re-order if there are address dependencies between loads). By comparison, the load/store pipe in Cortex A9 was fully in-order.
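The re-ordering limit can be pictured with a deliberately conservative rule: a younger memory operation may go ahead of older queued ones only if none of them touches the same address. This is an assumed simplification for illustration, not ARM's documented policy, and the `can_issue_early` helper and the tuple layout are hypothetical.

```python
def can_issue_early(queue, index):
    """True if the memory op at `index` may bypass every older op still queued.

    `queue` is a list of (kind, address) tuples in program order.
    Assumed rule: any older op to the same address blocks re-ordering.
    """
    _, addr = queue[index]
    return all(older_addr != addr for _, older_addr in queue[:index])


queue = [("load", 0x100), ("store", 0x200), ("load", 0x100)]
print(can_issue_early(queue, 1))  # True: no older op touches 0x200
print(can_issue_early(queue, 2))  # False: an older load uses 0x100
```

A fully in-order load/store pipe, like the A9's, behaves as if this check always returned False for anything but the oldest entry.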


65 Comments


  • wumpus - Thursday, July 18, 2013 - link

    Most of the analysis of MIPS implies that it has a chance at the embedded world, but not a prayer where the chips listed in this article play. I would assume that Imagination has a long term plan to break into this market, but it will take some sort of extreme cost/power/performance advantage to convince anyone to give up an architecture. There is a reason that ARM64 is still a dominant architecture, and it has nothing to do with any inherent superiority of the instruction set (indeed, it is a disaster and an unholy kludge. While most of the time "the backward stuff" might be irrelevant to modern computing, it still takes up area, has to be validated, and still has warts that have to be dealt with every design). Changing architectures isn't taken lightly (see how wonderfully windows RT is doing).
  • Qwertilot - Thursday, July 18, 2013 - link

    Isn't the business model the major reason for the architecture getting so dominant? Given how cheaply/freely they license the architecture, you need a really strong motivation and/or massive scale to consider using anything else. Limits ARM's size of course.
  • Mondozai - Friday, July 19, 2013 - link

    "There is a reason that ARM64 is still a dominant architecture, and it has nothing to do with any inherent superiority of the instruction set"

    Sorry that's just lazy.
    ARM is where it is because no competitor has managed an alternative that is sufficiently competitive with their architecture.

    Legacy isn't an issue. If ARM was going irrelevant, the switch would occur, and very fast too.
  • fteoath64 - Friday, July 19, 2013 - link

    Qualcomm is not worried right now because it is busy serving higher-priced solutions and it can hardly supply those customers! Besides, Qualcomm has access to next-gen process in volumes that can crush MediaTek, Actions, HiSilicon etc. by dropping prices in the next-gen offerings. That market will run out for MediaTek within 12 months, so Qualcomm is playing the game right for now. All ARM licensees can do whatever they want as long as they stay within their contractual obligations. These Chinese licensees better be careful lest they get cut off from the license and have to seek alternative architectures (meaning none, since going Intel is suicide!). The idea of MediaTek and others taking the "lower-tier" of the market for 3-4 quarters is enough for all to feast on the market. They do not want Intel to come in, cream everything off and leave nothing for the partners to live on. There is strategy and turf protection; these are no dumb companies, having made it this far. They know most of the tricks and can outwit the or else they will die. You should know the difference between Chinese vendors and western vendors: the Chinese manufacturers are happy to sell a wholesaler a phone for $110 when it cost them $80 to make, i.e. a $30 profit. The wholesaler turns around and sells it for $180, making $40-50 each after all the distribution costs etc. Western companies will sell this unit for $300 retail! The difference is greed, even when the manufacturer provided goodwill in the factory price. The reason factories put a limited profit on each unit is to move the volumes, because they know full well what the market price is going to be. They do not want to inflate it further.
  • lilmoe - Thursday, July 18, 2013 - link

    I'm sure you have valid points too, but you're not getting my side of the story. MediaTek's solution is pretty solid, yes, but it's strictly Cortex A9. Consumer demand, even in developing countries, is growing, and the need for faster chips is growing as well. By 2015, what makes you think that Krait won't be competitive in price with higher performance? Qualcomm needs only to re-badge their already developed chips with higher clock speeds.
    Profit margins are way higher in developed countries; it's just a matter of time before even current flagship devices are slashed in price (with a slight change in design and chassis) and they're ready to take on the cheapest Chinese OEMs have to offer...

    Anyway, this doesn't change the fact that ARM made a mistake in its priorities. Cortex A15 (big.LITTLE) should have been targeted for 20/22nm and smaller processes. Priority should have been for Cortex A12. And yes, flagships (hero phones) could have made use of that core if it was ready by this time, especially since it could have competed really well with Krait in both power efficiency and performance (most probably beating it if ARM's claims are to be believed in that regard).
  • Qwertilot - Thursday, July 18, 2013 - link

    The world, and A15 in particular, isn't just mobile phones :) The Samsung Chromebook, early versions of the micro server hardware, and Shield aren't all massive volume, but they've started the process of getting ARM chips accepted in a bunch of device classes where they just didn't previously exist.
    (Phone wise it did end up powering a lot of S4s.)

    You can see that could easily be more attractive/important as a priority for ARM than picking fights with existing licensees over high end mobile phones.

    If Qualcomm (and Intel too, of course) end up directly taking on the cheapest that the Chinese OEMs have to offer, they've basically already lost, as there's then so little profit left. They have to somehow make the case for more expensive but also more powerful/efficient chips. At some point that'll get hard.
  • lilmoe - Thursday, July 18, 2013 - link

    True, it isn't only for mobile devices (smartphones and tablets), but those make the absolute majority of demand. ARM has too much competition to face in the server world, a world that already has tons of existing x86 code. By the time they're ready to seriously fight in the server world, 14nm would be the norm.

    Again, my argument isn't about which chip goes where, or who should compete where, it's about timing and priorities of architecture designs, which ARM clearly screwed up on. At least that's how I see it.
  • Mondozai - Friday, July 19, 2013 - link

    lilmoe wrote:

    "By 2015, what makes you think that Krait wont be competitive in price with higher performance? Qualcomm needs only to re-badge their already developed chips with higher clock speeds."

    And what makes you think that the competition will stand still until that time?
    They could get better SoCs based on the A12 by that time, to name just one of many possibilities.
  • lilmoe - Friday, July 19, 2013 - link

    They WILL stand still because Cortex A12 isn't ready to ship yet. It'll be a good 2 years before it's "cheaply" manufactured by the likes of MediaTek.
    By the way, Krait-powered Nokia Lumias are going for around or under the $200 mark right now. People are forgetting that Android isn't the only contender in the market. OS market share will most probably look very different in 2015.
    There are many dynamics running the low power processor market.
  • Wilco1 - Wednesday, July 17, 2013 - link

    It's only late if you consider it a direct competitor of Silvermont. However A9R4 as used by Tegra 4i seems perfectly timed.
