The A12 Vortex CPU µarch

When talking about the Vortex microarchitecture, we first need to talk about exactly what kind of frequencies we’re seeing on Apple’s new SoC. Over the last few generations Apple has been steadily raising frequencies of its big cores, all while also raising the microarchitecture’s IPC. I did a quick test of the frequency behaviour of the A12 versus the A11, and came up with the following table:

Maximum Frequency vs Loaded Threads
Per-Core Maximum MHz ("-" = no thread loaded on that core)

Apple A11       1      2      3      4      5      6
Big 1        2380   2325   2083   2083   2083   2083
Big 2           -   2325   2083   2083   2083   2083
Little 1        -      -   1694   1587   1587   1587
Little 2        -      -      -   1587   1587   1587
Little 3        -      -      -      -   1587   1587
Little 4        -      -      -      -      -   1587

Apple A12       1      2      3      4      5      6
Big 1        2500   2380   2380   2380   2380   2380
Big 2           -   2380   2380   2380   2380   2380
Little 1        -      -   1587   1562   1562   1538
Little 2        -      -      -   1562   1562   1538
Little 3        -      -      -      -   1562   1538
Little 4        -      -      -      -      -   1538

The maximum frequency of both the A11 and the A12 is actually a single-thread boost clock: 2380MHz for the A11’s Monsoon cores and 2500MHz for the new Vortex cores in the A12. This is just a 5% boost in frequency in ST applications. When adding a second big thread, the A11 and A12 clock down to 2325MHz and 2380MHz respectively. It’s when we are also concurrently running threads on the small cores that the two SoCs diverge: while the A11 clocks down further to 2083MHz, the A12 retains the same 2380MHz until it hits thermal limits and eventually throttles down.
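The gap between the two chips under load is easier to see as a percentage per thread count. The following is a minimal sketch that only reuses the "Big 1" rows of the table above; the function names are illustrative and not part of any test tool used for this article.

```python
# Big-core clocks in MHz for 1..6 loaded threads,
# copied from the "Big 1" rows of the frequency table.
A11_BIG1_MHZ = [2380, 2325, 2083, 2083, 2083, 2083]
A12_BIG1_MHZ = [2500, 2380, 2380, 2380, 2380, 2380]


def uplift_percent(new_mhz: float, old_mhz: float) -> float:
    """Relative clock increase of new over old, in percent."""
    return (new_mhz / old_mhz - 1.0) * 100.0


def big_core_uplifts() -> list:
    """A12-over-A11 big-core clock uplift for 1..6 loaded threads, rounded to 0.1%."""
    return [round(uplift_percent(a12, a11), 1)
            for a11, a12 in zip(A11_BIG1_MHZ, A12_BIG1_MHZ)]


if __name__ == "__main__":
    for threads, pct in enumerate(big_core_uplifts(), start=1):
        print(f"{threads} loaded thread(s): +{pct}%")
```

With one thread the uplift is about 5%, but from three loaded threads onward the A12's big cores hold roughly a 14% clock advantage, at least until thermal throttling sets in.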

On the small core side of things, the new Tempest cores are actually clocked more conservatively than their Mistral predecessors. When the system had just one small core running on the A11, it would boost up to 1694MHz. This behaviour is now gone on the A12, and the maximum clock is 1587MHz. The frequency reduces slightly further, down to 1538MHz, when all four small cores are fully loaded.

Much improved memory latency

As mentioned on the previous page, it’s evident that Apple has put a significant amount of work into the cache hierarchy as well as the memory subsystem of the A12. Going back to a linear latency graph, we see the following behaviours for full random latencies, for both big and small cores:

The Vortex cores have only a 5% boost in frequency over the Monsoon cores, yet the absolute L2 memory latency has improved by 29%, from ~11.5ns down to ~8.8ns. This means the new Vortex cores’ L2 cache now completes its operations in significantly fewer cycles. On the Tempest side, the L2 cycle latency seems to have remained the same, but again there’s been a large change in terms of the L2 partitioning and power management, allowing access to a larger chunk of the physical L2.
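To put the cycle claim into numbers, the quoted latencies can be converted to core clock cycles. This is a rough sketch that assumes the big cores were running at their single-thread boost clocks (2380MHz and 2500MHz) during the latency test, which the text does not state explicitly, and it uses the rounded ~11.5ns and ~8.8ns figures.

```python
def ns_to_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a latency in nanoseconds to core clock cycles at the given clock."""
    return latency_ns * clock_mhz / 1000.0


# L2 latencies and single-thread boost clocks quoted in the text.
# Assumption: the big cores ran at their single-thread boost clock during the test.
MONSOON_L2_CYCLES = ns_to_cycles(11.5, 2380)  # A11 big core
VORTEX_L2_CYCLES = ns_to_cycles(8.8, 2500)    # A12 big core

if __name__ == "__main__":
    print(f"Monsoon L2: ~{MONSOON_L2_CYCLES:.1f} cycles")
    print(f"Vortex  L2: ~{VORTEX_L2_CYCLES:.1f} cycles")
```

Under those assumptions the L2 goes from roughly 27 cycles on Monsoon to roughly 22 cycles on Vortex, so the improvement comes mostly from the cache design rather than from the clock bump.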

I only ran the test up to a depth of 64MB, and it’s evident that the latency curves don’t flatten out yet in this data set, but it’s visible that latency to DRAM has seen some improvements. The larger difference in DRAM access latency on the Tempest cores could be explained by a raising of the maximum memory controller DVFS frequency when just the small cores are active; their performance will look better when there’s also a thread running on the big cores.
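The article does not describe its test harness, but a common way to measure "full random" latency at a given test depth is a pointer chase: the buffer is linked into one random cycle so that every load depends on the previous one and the prefetchers cannot guess the next address. The sketch below illustrates only the access pattern. All names are hypothetical, and timings taken in Python are dominated by interpreter overhead, so a real measurement would use C or assembly.

```python
import random
import time


def build_random_cycle(n: int, seed: int = 0) -> list:
    """Return a permutation that links all n slots into one cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list, steps: int) -> int:
    """Follow the chain for `steps` dependent loads and return the final index."""
    idx = 0
    for _ in range(steps):
        idx = nxt[idx]  # the next address is only known once this load completes
    return idx


def time_per_access_ns(n: int, steps: int = 200_000) -> float:
    """Average wall-clock time per chained access, in nanoseconds."""
    nxt = build_random_cycle(n)
    start = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - start) / steps * 1e9


if __name__ == "__main__":
    for n in (1 << 10, 1 << 16, 1 << 20):
        print(f"{n:>8} slots: {time_per_access_ns(n):.1f} ns per access")
```

Sweeping the number of slots is what produces a latency curve over test depth; a curve that has not flattened at the largest depth, as with the 64MB limit here, means the chain does not yet miss every cache level on every access.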

The system cache of the A12 has seen some dramatic changes in its behaviour. While bandwidth in this part of the cache hierarchy has seen a reduction compared to the A11, the latency has been much improved. One significant effect here can be attributed either to the L2 prefetcher or, as I also see as a possibility, to prefetchers on the system cache side: both the latency performance and the number of streaming prefetchers have gone up.

Instruction throughput and latency

Backend Execution Throughput and Latency
                              Cortex-A75       Cortex-A76   Exynos-M3           Monsoon | Vortex
Instruction                   Exec      Lat    Exec  Lat    Exec         Lat    Exec     Lat
Integer Arithmetic (ADD)      2         1      3     1      4            1      6        1
Integer Multiply 32b (MUL)    1         3      1     2      2            3      2        4
Integer Multiply 64b (MUL)    1         3      1     2      1 (2x 0.5)   4      2        4
Integer Division 32b (SDIV)   0.25      12     0.2   < 12   1/12 - 1     < 12   0.2      10 | 8
Integer Division 64b (SDIV)   0.25      12     0.2   < 12   1/21 - 1     < 21   0.2      10 | 8
Move (MOV)                    2         1      3     1      3            1      3        1
Shift ops (LSL)               2         1      3     1      3            1      6        1
Load instructions             2         4      2     4      2            4      2
Store instructions            2         1      2     1      1            1      2
FP Arithmetic (FADD)          2         3      2     2      3            2      3        3
FP Multiply (FMUL)            2         3      2     3      3            4      3        4
Multiply Accumulate (MLA)     2         5      2     4      3            4      3        4
FP Division (S-form)          0.2-0.33  6-10   0.66  7      >0.16        12     0.5 | 1  10 | 8
FP Load                       2         5      2     5      2            5
FP Store                      2         1-N    2     2      2            1
Vector Arithmetic             2         3      2     2      3            1      3        2
Vector Multiply               1         4      1     4      1            3      3        3
Vector Multiply Accumulate    1         4      1     4      1            3      3        3
Vector FP Arithmetic          2         3      2     2      3            2      3        3
Vector FP Multiply            2         3      2     3      1            3      3        4
Vector Chained MAC (VMLA)     2         6      2     5      3            5      3        3
Vector FP Fused MAC (VFMA)    2         5      2     4      3            4      3        3

To compare the backend characteristics of Vortex, we’ve tested the instruction throughput. The backend performance is determined by the number of execution units, while the latency is dictated by the quality of their design.
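The interplay between the two columns can be sketched with a simple pipeline model. It assumes fully pipelined units that each accept one new instruction per cycle, which is a simplification and not something the table itself confirms. Throughput is reached only when enough independent instructions are in flight (throughput times latency), while a dependent chain is bound by latency alone. The helper names are illustrative.

```python
import math

# (throughput in instructions/cycle, latency in cycles) for Vortex,
# taken from the Monsoon | Vortex columns of the table above.
VORTEX = {
    "ADD": (6, 1),
    "FADD": (3, 3),
    "FMUL": (3, 4),
}


def independent_ops_needed(throughput: int, latency: int) -> int:
    """Independent instructions that must be in flight to keep every unit busy."""
    return throughput * latency


def cycles_for(n_ops: int, throughput: int, latency: int, dependent: bool) -> int:
    """Cycles to finish n_ops under an idealised, fully pipelined model (assumption)."""
    if dependent:
        # Each op waits for the previous result, so only latency matters.
        return n_ops * latency
    # Independent ops issue `throughput` per cycle; the last group then needs `latency` cycles.
    return math.ceil(n_ops / throughput) - 1 + latency


if __name__ == "__main__":
    for name, (tput, lat) in VORTEX.items():
        chain = cycles_for(12, tput, lat, dependent=True)
        parallel = cycles_for(12, tput, lat, dependent=False)
        print(f"{name}: 12 dependent ops = {chain} cycles, "
              f"12 independent ops = {parallel} cycles, "
              f"{independent_ops_needed(tput, lat)} ops in flight to saturate")
```

This is also how such tables are typically measured: a long dependent chain exposes latency, and a long run of independent instructions exposes throughput.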

The Vortex core looks pretty much the same as its predecessor Monsoon (A11), with the exception that we’re seemingly looking at new division units, as the execution latency has been cut by 2 cycles, from 10 to 8, on both the integer and FP side. On the FP side the division throughput has also doubled.

Monsoon (A11) was a major microarchitectural update in terms of the mid-core and backend. It’s there that Apple shifted the microarchitecture from Hurricane’s (A10) 6-wide decode to a 7-wide decode. The most significant change in the backend was the addition of two integer ALU units, upping them from 4 to 6 units.

Monsoon (A11) and Vortex (A12) are extremely wide machines. With 6 integer execution pipelines (two of which are complex units), two load/store units, two branch ports, and three FP/vector pipelines, this gives an estimated 13 execution ports, far wider than Arm’s upcoming Cortex A76 and also wider than Samsung’s M3. In fact, assuming we're not looking at an atypical shared port situation, Apple’s microarchitecture seems to far surpass anything else in terms of width, including desktop CPUs.
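The 13-port estimate is simply the sum of the execution resources listed above. As a small sketch (the dictionary layout is illustrative, and the counts are the article's estimates rather than confirmed figures from Apple), it can be tallied next to the peak integer ADD throughput from the table:

```python
# Execution resources listed in the text for Monsoon (A11) and Vortex (A12).
APPLE_BIG_CORE_PORTS = {
    "integer ALU (two of them complex)": 6,
    "load/store": 2,
    "branch": 2,
    "FP/vector": 3,
}

# Peak integer ADD throughput per cycle, from the "Exec" column of the table.
ADD_PER_CYCLE = {
    "Cortex-A75": 2,
    "Cortex-A76": 3,
    "Exynos-M3": 4,
    "Monsoon/Vortex": 6,
}


def total_ports(ports: dict) -> int:
    """Sum the per-category port counts into one estimated width figure."""
    return sum(ports.values())


if __name__ == "__main__":
    print(f"Estimated execution ports: {total_ports(APPLE_BIG_CORE_PORTS)}")
    for core, adds in ADD_PER_CYCLE.items():
        print(f"{core}: {adds} ADD/cycle")
```

The total only holds if no two of these pipelines share an issue port, which is the caveat the text raises about an atypical shared port situation.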


253 Comments


  • FreckledTrout - Friday, October 5, 2018

    Pretty typical with any high end products. The top 10% pave the way for the rest to have these products at a reasonable price a few years later. You can get an iPhone 7 pretty cheap now.
  • MonkeyPaw - Friday, October 5, 2018

    It’s still cheaper than my first PC, a 486SX2 running at 50MHz. RAM and hard drives were still measured in megabytes, and the internet made noise before connecting and tied up your phone line when you used it. There has also been about 20 years of inflation. Flagship smartphones are expensive, but they sure do a lot. That doesn’t mean I’m buying one, but we’ve come a long way in my hopefully-less-than-half-a-lifetime.
  • keith3000 - Friday, October 5, 2018

    OMG! Exactly what I was thinking as I read this review on my $225 T-Mobile Rev VL Plus. I may not be able to afford such a technological marvel as the iPhone XS MAX, but I bet I get anywhere from 80-to-90% of the overall functionality for one-fifth the price. I've bought many premium smart phones over the years, starting with the HTC EVO 4G LTE many years ago, followed by Samsung Galaxy S3, then the S4, and even the gigantic Asus Zenfone 3 Ultra. Each phone was better than the one before, and yet each was a major disappointment to me for various reasons which I won't go into here. Suffice to say that the ever increasing cost of each phone raised my expectations about what they should be able to do, and thus contributed to my sense of disappointment when they failed to live up to the hype. So when the Zenfone 3 crapped out on me after less than a year of use and I saw this cheap Rev VL Plus, I decided to stop wasting so much money on these overpriced devices and buy something that wouldn't leave me feeling robbed or cheated if it didn't turn out to be the "next best thing". Now, after almost a year of use, I feel like it was a good decision. And if something better comes along in a few months at a similar price point, I can buy it without feeling remorse for having wasted so much money on a phone that didn't last very long. So all you 10-percenters - go ahead and throw away $1,200 on a phone. I'm quite content to have a 2nd rate phone and save a thousand dollars.
  • ws3 - Sunday, October 7, 2018

    You say you spent $225 on your phone less than (but almost) a year ago and then say that you would be willing to replace it immediately if some other phone interested you. So you are apparently willing to spend around $225 for one year of ownership of a phone.

    By this metric you should be willing to spend $1000 on a phone provided you keep it for 4 years or more.

    Now it may be the case that you don’t want to keep any phone for four years, and so the iPhone X[S] is not for you. But here I am with a four-year-old iPhone 6+ that still works great (thanks to iOS 12). I similarly expect the iPhone X[S] to be good for four years at least, so, although I am not a “10%er”, I am seriously considering purchasing one.

    It’s simply a fallacy to assert that only the wealthy would be interested in the latest iPhone models.
  • FunBunny2 - Sunday, October 7, 2018

    "Now it may be the case that you don’t want to keep any phone for four years, and so the iPhone X[S] is not for you. But here I am with a four-year-old iPhone 6+ that still works great (thanks to iOS 12)."

    ergo, Apple's problem. unfulfilled TAM for iPhones is disappearing faster than kegs at a Georgetown Prep gathering. keeping one longer than a cycle is a real problem for them. they will figure out a way to stop such disloyalty.
  • ex2bot - Sunday, October 7, 2018

    They’ll find a way, like supporting the 5S and later with iOS 12. /s
  • icalic - Friday, October 5, 2018

    Hi Andrei Frumusanu,
    Thanks for extraordinary review of iPhone Xs!

    On page one you list the A12 GPU as a 4-core "G11P" @ >~1.1GHz. I have several questions.
    1. How do you estimate that clock speed?
    2. If you know the clock speed, can you estimate the FP32 and FP16 GFLOPS of the A12 GPU?
  • syxbit - Friday, October 5, 2018

    Great review of the SoC.
    Please, when you review the Pixel 3, or (in 2019), updated Snapdragons, hold them to this bar.
    I get really frustrated when I see your (or other) reviews complimenting the latest Snapdragon even though they're miles behind the Ax.
    As an Android user, I find it very unfortunate that to get my OS of choice I must get inferior hardware.
  • edzieba - Friday, October 5, 2018

    Phone reviews are a review of the phone, not just the SoC.
  • syxbit - Friday, October 5, 2018

    I know that, but the SoC is the area where Apple are completely dominant.
