Implementation Choices & Customers

Naturally, the Cortex-X1 is expected to be quite a bit bigger than a Cortex-A78, but not dramatically so. Arm does warn, though, that for mobile designs it’s extremely unlikely that we’ll see implementations with more than two X1 cores. The company here is essentially embracing the industry trend of going for a three-tier core hierarchy, and with the introduction of the A78 and X1, they’re allowing customers to build such systems with much more flexibility and more differentiation than the frequency and process library differentiation we’ve been seeing on today’s “mid” and performance cores.

There are still going to be customers who are cost-averse or who simply don't take part in the “Cortex-X Program”, and who might avoid the X1 and just go with A78 cores. The comparison Arm is making here is against an equivalent A77 setup, and the A78 cores would indeed bring a good amount of area savings, all while improving performance.

Cortex-X1 implementers would very likely go for a hybrid cluster implementation with X1, A78 and A55 cores in a DSU. Arm here depicts Qualcomm’s favorite 1+3+4 configuration, and it's a logical setup that we’d expect to see in a future Snapdragon chip.

Today’s announcement of the Arm cores also came with an unusual quote from Samsung LSI:

“Samsung and Arm have a strong technology partnership and we are very excited to see the new direction Arm is taking with Cortex-X Custom program, enabling innovation in the Android ecosystem for next-gen user experiences.”

- Joonseok Kim, vice president of SoC design team at Samsung Electronics

It’s extremely rare to hear Samsung talk about a new Arm IP like this during a launch, and I think it’s pretty safe to say that this is very much an indirect confirmation that they’re a licensee of the X1 cores. In which case, we’ll be seeing the core in the next generation of flagship Exynos chipsets. Looking back at what happened with Samsung’s custom CPU design team last year, as well as the lackluster performance of their custom cores, the very existence of the X1 probably further sealed the fate of their custom core efforts. The only remaining questions for me are whether they’ll go for a 1+3+4 or a 2+2+4 setup, and whether Samsung’s 5nm will showcase better competitiveness compared to their lagging 7nm node.

Meanwhile HiSilicon, being in the middle of political turmoil, probably won't get to produce an X1 chip; plus, the vendor has a tendency not to always use the latest CPU IPs anyhow. MediaTek would be the last candidate licensee for the X1 – but here I’m also relatively uncertain whether the company’s cost-oriented mantra actually fits well with the X1’s philosophy of going all out on area, with the likelihood that it’s also more expensive to license.

First Impressions - Arm Finally Going For Pure Performance

Today’s reveal of the Cortex-A78 and Cortex-X1 brought both the expected and the unexpected. I've had relatively modest expectations of the A78, as for years we had been told it would be the smallest upgrade amongst the new Austin family of Arm CPU microarchitectures. The A76 and A77 were after all both big leaps in performance and IPC. What I didn’t expect was for Arm to really focus on maximizing the PPA of the design, with efficiency being a first-class citizen in terms of design priorities. In that sense, the A78’s performance improvements might be a little tame compared to previous generations, but seemingly it’s still going to be an excellent core that is going to continue Arm's recent strides in outstandingly efficient computing.

Meanwhile the Cortex-X1 is a big change for Arm. And that change has less to do with the technology of the cores, and more with the business decisions that it now opens up for the company, although both are intertwined. For years many people were wondering why the company didn't design a core that could more closely compete with what Apple had built. In my view, one of the reasons for that was that Arm has always been constrained by the need to create a “one core fits all” design that could fit all of their customers’ needs – and not just the few flagship SoC designs.

The Cortex-X program here effectively unshackles Arm from these business limitations, and it allows the company to provide the best of both worlds. As a result, the A78 continues the company’s bread & butter design philosophy of power-performance-area leadership, whilst the X1 and its successors can now aim for the stars in terms of performance, without such strict area usage or power consumption limitations.

In this regard, the X1 seems really, really impressive. The 30% IPC improvement over the A77 is astounding and not something I had expected from the company this generation. The company has been incessantly beating the drum of their annual projected 20-25% improvements in performance – a pace which is currently well beyond what the competition has been able to achieve. These most recent projected performance figures are getting crazy close to the best of what we’re seeing from the x86 players out there right now. That’s exciting for Arm, and should be worrying for the competition.


192 Comments


  • Andrei Frumusanu - Wednesday, June 3, 2020 - link

    > The choice to switch to LLVM was stupid at the time, but even more so today since GCC has since moved further ahead of LLVM...

    GCC's problem is its license. Neither Apple nor Google would be able to integrate it into an IDE like Xcode or Android Studio. In the grand scheme of things, going LLVM is a much better choice, even if it's slower than GCC.
  • ksec - Tuesday, May 26, 2020 - link

    The 40-45% figure assumes X-1 could run at 3Ghz within its TDP budget.

    And even with that in mind, the figures Anandtech put up show it is still behind the A13.

    Not bad for the rest of the ARM ecosystem. But still not quite there yet.
  • MarcGP - Tuesday, May 26, 2020 - link

    Behind the A13? You missed the estimation chart, which shows the X1 reaching A13 performance (a bit lower in integer performance and a bit higher in floating point performance) at a much lower power consumption.
  • ksec - Wednesday, May 27, 2020 - link

    Behind in IPC. The chart puts the X1 on a 5nm node with a 15% clock speed increase against a 7nm node A13 with a non-sustainable 2.63 GHz clock.

    Also worth noting this is 7nm+, not 7nm EUV, from TSMC. So if you put the node aside, those numbers would likely still put it under the A13.
  • dotjaz - Tuesday, May 26, 2020 - link

    You understand INCORRECTLY. 30% is for the same frequency and 20% is the same power. you DID read it wrong.
  • dotjaz - Tuesday, May 26, 2020 - link

    With the same baseline, A77@2.6GHz, then A78@3GHz is +20%, X1@3GHz is +50%
  • ZolaIII - Wednesday, May 27, 2020 - link

    Nope, you are wrong. First of all, given a constant power delta for something which goes into a phone, the A78 will be a rather significant improvement over the A77, with the same performance at half the power budget. The A77 already had a lead over Apple's big cores regarding the performance/W metric, and this means more than a brute force approach. Yes, Apple's big cores are superior, but on something that has the power budget of a laptop. On the other hand, the X1 is a direct take on those Apple cores, and it should be up to 2x faster than the A78 in tasks which are optimised and utilise FP SIMDs, basically SMP tasks. This is more relevant to server tasks and not so much to the mobile space. Still, I would like to see more advanced SIMD blocks and their inclusion on smaller cores with SMT, as SIMDs are hard to feed optimally and front end expansion is therefore a must, but it can be done in a more elegant manner, like for instance MIPS did with VMT. ARM desperately needs a power efficient basic OoO core, a successor of the A73 if you like, with DynamiQ integration as an A55 replacement. There is an A65AE, but we haven't seen any implementation of it in any space so far.
  • Santoval - Friday, May 29, 2020 - link

    It is not even an apples for apples comparison, since A78 has +20% *sustained* performance over A77, while X1 has +30% *peak* performance. Therefore the sustained performance lead of X1 over A77 might be in the +25% ballpark. Is a mere extra 5 - 10% performance over A78 really worth a 30% larger die area and quite higher TDP? Unless Arm can increase the performance lead of X1 over A78 at least another 20% I don't see the former being an attractive (or even a sane) licence and purchasing option.
  • ChrisGX - Monday, July 6, 2020 - link

    The X1 exhibits 22% performance advantage over the A78 when process and frequency are controlled factors. So, yes, X1 performance is 1.22xA78. The performance improvement of the A78 over the A77 however includes a process node and frequency change, 20% all up. So, the performance of the X1 is: (A77 * 1.2) * 1.22 or 1.46xA77.
  • ChrisGX - Monday, July 6, 2020 - link

    Please note Andrei seems to have made assumptions something like this in his calculations with A77 SPECspeed/performance at 2.6GHz being something in the order of 32 (which seems reasonable).
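
The arithmetic running through the last few comments can be checked with a short Python sketch. It only chains the ratios quoted in the thread (A78 at +20% over an A77 at 2.6GHz, X1 at 1.22x the A78 at the same process and frequency, and an A77 baseline score of roughly 32). The variable and function names are made up for illustration, and the iso-frequency step assumes performance scales linearly with clock speed, which is a simplification rather than anything Arm has stated.

```python
# Figures quoted in the comment thread; all names here are hypothetical.
A77_FREQ_GHZ = 2.6          # baseline A77 clock
NEW_FREQ_GHZ = 3.0          # projected A78 / X1 clock
A78_OVER_A77 = 1.20         # A78 uplift, includes node and frequency change
X1_OVER_A78 = 1.22          # X1 uplift at the same process and frequency
A77_BASELINE_SCORE = 32.0   # rough A77 SPECspeed estimate from the thread


def relative_performance():
    """Chain the quoted ratios, with the A77 normalised to 1.0."""
    a78 = A78_OVER_A77
    x1 = a78 * X1_OVER_A78
    return {"A77": 1.0, "A78": a78, "X1": x1}


def iso_frequency_gain(relative, old_ghz, new_ghz):
    """Strip out the clock increase, assuming performance scales
    linearly with frequency (an assumption, not a measured fact)."""
    return relative / (new_ghz / old_ghz)


rel = relative_performance()
for core, ratio in rel.items():
    score = ratio * A77_BASELINE_SCORE
    per_clock = iso_frequency_gain(
        ratio, A77_FREQ_GHZ, NEW_FREQ_GHZ if core != "A77" else A77_FREQ_GHZ
    )
    print(f"{core}: {ratio:.3f}x A77, score ~{score:.1f}, per-clock {per_clock:.3f}x")
```

Under these assumptions the X1 lands at about 1.46x the A77 overall, and at roughly 1.27x per clock once the 2.6GHz to 3GHz increase is factored out, which is in the same region as the 30% IPC figure from the article.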
