Implementation Choices & Customers

Naturally, the Cortex-X1 is expected to be quite a bit bigger than a Cortex-A78, but not dramatically so. Arm does warn, though, that for mobile designs it’s extremely unlikely that we’ll see implementations with more than two X1 cores. The company here is essentially embracing the industry trend of going for a three-tier core hierarchy, and with the introduction of the A78 and X1, they’re allowing customers to build such systems with much more flexibility and more differentiation than the frequency and process library differentiation we’ve been seeing on today’s “mid” and performance cores.

There are still going to be customers who may be cost-averse or simply not take part in the “Cortex-X Program”, and who might avoid the X1 and just go with A78 cores. The comparison Arm is making here is against an equivalent A77 setup, and the A78 cores would indeed bring a good amount of area savings all while improving performance.

Cortex-X1 implementers would very likely go for a hybrid cluster implementation with X1, A78 and A55 cores in a DSU. Arm here depicts Qualcomm’s favorite 1+3+4 configuration, and it's a logical setup that we’d expect to see in a future Snapdragon chip.

Today’s announcement of the Arm cores also came with an unusual quote from Samsung LSI:

“Samsung and Arm have a strong technology partnership and we are very excited to see the new direction Arm is taking with Cortex-X Custom program, enabling innovation in the Android ecosystem for next-gen user experiences.”

- Joonseok Kim, vice president of SoC design team at Samsung Electronics

It’s extremely rare to hear Samsung talk about a new Arm IP like this during a launch, and I think it’s pretty safe to say that this is very much an indirect confirmation that they’re a licensee of the X1 cores. In which case, we’ll be seeing the core in the next generation of flagship Exynos chipsets. Looking back at what happened with Samsung’s custom CPU design team last year, as well as the lackluster performance of their custom cores, the very existence of the X1 probably further sealed the fate of their custom core efforts. The only remaining questions for me are whether they’ll go for a 1+3+4 or a 2+2+4 setup, and whether Samsung’s 5nm will showcase better competitiveness compared to their lagging 7nm node.

Meanwhile HiSilicon, being in the middle of political turmoil, probably won't get to produce an X1 chip; plus, the vendor has a tendency to not always use the latest CPU IPs anyhow. MediaTek would be the last candidate licensee for the X1 – but here I’m also relatively uncertain whether the company’s cost-oriented mantra actually fits well with the X1’s philosophy of going all out on area, with the likelihood that it’s also more expensive to license.

First Impressions - Arm Finally Going For Pure Performance

Today’s reveal of the Cortex-A78 and Cortex-X1 brought both the expected and the unexpected. I've had relatively modest expectations of the A78, as for years we had been told it would be the smallest upgrade amongst the new Austin family of Arm CPU microarchitectures. The A76 and A77 were after all both big leaps in performance and IPC. What I didn’t expect was for Arm to really focus on maximizing the PPA of the design, with efficiency being a first-class citizen in terms of design priorities. In that sense, the A78’s performance improvements might be a little tame compared to previous generations, but seemingly it’s still going to be an excellent core that is going to continue Arm's recent strides in outstandingly efficient computing.

Meanwhile the Cortex-X1 is a big change for Arm. And that change has less to do with the technology of the cores, and more with the business decisions that it now opens up for the company, although both are intertwined. For years many people were wondering why the company didn't design a core that could more closely compete with what Apple had built. In my view, one of the reasons for that was that Arm has always been constrained by the need to create a “one core fits all” design that could fit all of their customers’ needs – and not just the few flagship SoC designs.

The Cortex-X program here effectively unshackles Arm from these business limitations, and it allows the company to provide the best of both worlds. As a result, the A78 continues the company’s bread & butter design philosophy of power-performance-area leadership, whilst the X1 and its successors can now aim for the stars in terms of performance, without such strict area usage or power consumption limitations.

In this regard, the X1 seems really, really impressive. The 30% IPC improvement over the A77 is astounding and not something I had expected from the company this generation. The company has been incessantly beating the drum of their annual projected 20-25% improvements in performance – a pace which is currently well beyond what the competition has been able to achieve. These most recent projected performance figures are getting crazy close to the best of what we’ve been seeing from the x86 players out there right now. That’s exciting for Arm, and should be worrying for the competition.


192 Comments


  • DanNeely - Tuesday, May 26, 2020 - link

    Having gotten to page 4 in the article, the explanation is that ARM's slides as used on the first page suck. The 20% from A77-A78 is +7% architecture, and +13% 5nm instead of 7nm. The 30% from A77-X1 is entirely architecture; that in turn implies that upcoming X1 chips should be about 40-45% faster than current A77 ones.

    AIUI It's still going to be falling short of what Apple's doing (and not just because the A55 little cores are getting really dated); but is a badly needed narrowing of the gap.
  • ichaya - Tuesday, May 26, 2020 - link

    40-45% on appropriately less die area than Apple and you've got something competitive at least.
  • DanNeely - Tuesday, May 26, 2020 - link

    in terms of engineering prowess certainly; but not in terms of letting Samsung/etc finally design smartphones and tablets that are as fast as their rivals from Apple. Assuming the product plays out in retail, in another 2 or 3 years when I look to replace my S10 I'll probably get something with an X core in it; but I really hope that they'll widen the performance uplift vs their more general purpose cores by then.
  • Raqia - Tuesday, May 26, 2020 - link

    The existing A77's are already very impressive in terms of PPA, I would consider them as impressive as Apple's big cores taken as a whole. (The small cores are another story since the major uplift from the A13...) A lot of Android vendors value area in particular since they integrate modems on die whereas Apple does not; this drives a lot of value and cost savings for customers.
  • CiccioB - Tuesday, May 26, 2020 - link

    If Samsung really wants to create a phone/tablet as fast as an Apple one it should first concentrate more on SW optimizations. Apple puts a lot of effort into that. It's not only a question of who makes the bigger core.
    See the comparison of Samsung crappy phones with other Android ones using much more optimized and less bloated versions of the OS. They are good for benchmarking with all those cores and MHz (and tricks on turbo speed for benchmark apps), but in real life Samsung phones are slower than they could be due to poor optimization.
  • Wilco1 - Tuesday, May 26, 2020 - link

    Bingo! Adding a big core that does well in benchmarks is not a good solution. Improving browser performance with software optimization can be far more effective.
  • armchair_architect - Wednesday, May 27, 2020 - link

    @Wilco1 I am afraid you only see the SW part of the equation here.
    Again, X1 is not only good in benchmarks; being wide helps in that you can achieve the same performance as last-gen by running at vastly lower frequency and voltage.
    Thus all use cases that do not require max peak perf enjoy a huge power saving.
  • Wilco1 - Thursday, May 28, 2020 - link

    You can't brute-force your way to performance or efficiency. If you can improve performance via software optimization, you take it any time over a faster core that gives the same gain but needs more power to run the unoptimized software.

    It's as simple as that.
  • armchair_architect - Thursday, May 28, 2020 - link

    Obviously you would ideally need both SW optimization and faster CPUs.
    But again, power will not always be higher and higher power != higher energy usage.
  • Wilco1 - Thursday, May 28, 2020 - link

    Absolutely. But the biggest issue in the Android world is software optimization and tuning, not CPU performance. Improving that would easily add up to a new CPU generation. The choice to switch to LLVM was stupid at the time, but even more so today since GCC has since moved further ahead of LLVM...

    Note all the evidence points to using smaller cores to improve power efficiency. You can see this on the perf/W estimates for SPEC - the A78 is almost twice as efficient as A13 while achieving 74% of the performance.
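
The arithmetic behind two points raised in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for the example, the gains are assumed to be independent and to compound multiplicatively, the 13% process gain is assumed to carry over unchanged to an X1 on the same node, and "almost twice as efficient" is rounded to a factor of 2.0. Compounding 7% and 13% gives roughly 21%, in line with the 20% figure, while compounding 30% and 13% gives roughly 47%, a bit above the 40-45% quoted above (simply adding the two gives 43%).

```python
def combine_gains(*gains_pct):
    """Compound independent percentage gains multiplicatively."""
    factor = 1.0
    for gain in gains_pct:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


def relative_energy(perf_ratio, power_ratio):
    """Energy for a fixed task relative to a baseline core.

    Energy = power * time, and run time scales as 1 / performance.
    """
    return power_ratio / perf_ratio


# Figures quoted in the thread: +7% architecture and +13% process for the A78.
a78_uplift = combine_gains(7, 13)

# Assumption: the X1's 30% architectural gain stacks on the same 13% node gain.
x1_uplift = combine_gains(30, 13)

# Hypothetical core (numbers invented for illustration): 30% faster at 20%
# more power. Power is higher, yet energy per task comes out lower.
wide_core_energy = relative_energy(perf_ratio=1.30, power_ratio=1.20)

# Perf/W claim from the thread: 74% of the performance at about 2x perf/W.
# The 2.0 is a rounded stand-in for "almost twice".
a78_power_vs_a13 = 0.74 / 2.0
a78_energy_vs_a13 = relative_energy(0.74, a78_power_vs_a13)

print(f"A78 over A77: +{a78_uplift:.1f}%")
print(f"X1 over A77 (compounded): +{x1_uplift:.1f}%")
print(f"Hypothetical wide core, energy per task: {wide_core_energy:.2f}x")
print(f"A78 vs A13, power: {a78_power_vs_a13:.2f}x, energy per task: {a78_energy_vs_a13:.2f}x")
```

The last two results show why both sides of the argument can be right: perf/W is the inverse of energy per unit of work, so a core can draw more power and still use less energy if it finishes sooner, and a smaller core can use far less energy if the task can tolerate the lower performance.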
