Implementations Choices & Customers

Naturally, the Cortex-X1 is expected to be quite bigger than a Cortex-A78, but not dramatically more. Arm does warn though that for mobile designs it’s extremely unlikely that we’ll see implementations with more than two X1 cores. The company here is essentially embracing the industry trend of going for a three tier core hierarchy, and with the introduction of the A78 and X1, they’re allowing customers to build such systems with much more flexibility and more differentiation than the frequency and process library differentiation we’ve been seeing on today’s “mid” and performance cores.

There’s still going to be customers who may be cost averse or simply not take part in the “Cortex-X Program”, who might just avoid the X1 and just go with A78 cores. The comparison Arm is making here is against an equivalent A77 setup, and the A78 cores would indeed bring a good amount of area savings all while improving performance.

Cortex-X1 implementers would very likely go for a hybrid cluster implementation with X1, A78 and A55 cores in a DSU. Arm here depicts Qualcomm’s favorite 1+3+4 configuration, and it's a logical setup that we’d expect to see in a future Snapdragon chip.

Today’s announcement of the Arm cores also came with an unusual quote from Samsung LSI:

“Samsung and Arm have a strong technology partnership and we are very excited to see the new direction Arm is taking with Cortex-X Custom program, enabling innovation in the Android ecosystem for next-gen user experiences.”

- Joonseok Kim, vice president of SoC design team at Samsung Electronics

It’s extremely rare to hear Samsung talk about a new Arm IP like this during a launch, and I think it’s pretty safe to say that this is very much an indirect confirmation that they’re a licensee of the X1 cores. In which case, we’ll be seeing the core in the next generation of flagship Exynos chipsets. Looking back at what happened with Samsung’s custom CPU design team last year as well as their lackluster performance of their custom cores, the very existence of the X1 probably further sealed the fate for their custom core efforts. The only remaining questions for me is whether they’ll go for a 1+3+4, or a 2+2+4 setup, and if Samsung’s 5nm will showcase better competitiveness compared to their lagging 7nm node.

Meanwhile HiSilicon, being in the middle of political turmoil, probably won't get to produce an X1 chip; plus the vendor has a tendency not always use the latest CPU IPs anyhow. MediaTek would be the last candidate licensee for the X1 – but here I’m also relatively uncertain if the company’s cost-oriented mantra actually fits well with the X1’s philosophy of going all out on area, with the likelihood that it’s also more expensive to license.

First Impressions - Arm Finally Going For Pure Performance

Today’s reveal of the Cortex-A78 and Cortex-X1 brought both the expected and the unexpected. I've had relatively modest expectations of the A78, as for years we had been told it would be the smallest upgrade amongst the new Austin family of Arm CPU microarchitectures. The A76 and A77 were after all both big leaps in performance and IPC. What I didn’t expect was for Arm to really focus on maximizing the PPA of the design, with efficiency being a first-class citizen in terms of design priorities. In that sense, the A78’s performance improvements might be a little tame compared to previous generations, but seemingly it’s still going to be an excellent core that is going to continue Arm's recent strides in outstandingly efficient computing.

Meanwhile the Cortex-X1 is a big change for Arm. And that change has less to do with the technology of the cores, and more with the business decisions that it now opens up for the company, although both are intertwined. For years many people were wondering why the company didn't design a core that could more closely compete with what Apple had built. In my view, one of the reasons for that was that Arm has always been constrained by the need to create a “one core fits all” design that could fit all of their customers’ needs – and not just the few flagship SoC designs.

The Cortex-X program here effectively unshackles Arm from these business limitations, and it allows the company to provide the best of both worlds. As a result, the A78 continues the company’s bread & butter design philosophy of power-performance-area leadership, whilst the X1 and its successors can now aim for the stars in terms of performance, without such strict area usage or power consumption limitations.

In this regard, the X1 seems really, really impressive. The 30% IPC improvement over the A77 is astounding and not something I had expected from the company this generation. The company has been incessantly beating the drum of their annual projected 20-25% improvements in performance – a pace which is currently well beyond what the competition has been able to achieve. These most recent projected performance figures are getting crazy close to the best that what we’ve seeing from the x86 players out there right now. That’s exciting for Arm, and should be worrying for the competition.

Performance & Power Projections: Best of Both Worlds
Comments Locked

192 Comments

View All Comments

  • Kurosaki - Thursday, June 25, 2020 - link

    It's the future!
  • MarcGP - Tuesday, May 26, 2020 - link

    You don't make any sense. You say at the same time :

    1) It's not a problem of the SOCs performance but of the poor emulation
    2) The emulation software is not the issue but the weak SOCs

    Make up your mind, it's one or the other, but it can't be both.
  • armchair_architect - Tuesday, May 26, 2020 - link

    This looks pretty exciting!
    What if vendors will go for 2+4 configs (like Apple) with 2 X1 + 4 A78?
    Apple has showed that this configuration is really good!
    That would be a killer combinations as the littles are practically useless in real scenarios and a slow implementation of a A78 could very well cover the low part of the cluster DVFS curve, for idle.
    A55 is super old and doesnt offer any useful performance, I suspect they are only there for the low-power scenarios.
    But again, Apple has showed that its out-of-order little cores can be super efficient when implemented at low frequency (I think they run at something like 1.7GHz peak frequency).
    I didnt read much information on X1 power, but yes it will for sure be less power efficient than A78 when both of them are running flat out at 3GHz.
But, I highly suspect that (like A78 vs A77) on the whole DVFS curve, X1 can be lower power than A78 in the performance regions in which they overlap.
    It is simply a matter of being wider and slower, this makes you more efficient. That is the Apple way, wide and slow.
    They have the example of iso-performance metric, X1 will need much lower frequencies and voltages to reach a middle-of-the-road performance point (something like ~35 SpecInt, given that projection is 47 SpecInt flat-out).
This could easily offset the intrinsic iso-frequency power deficit that the X1 brings.
  • spaceship9876 - Tuesday, May 26, 2020 - link

    I'm surprised they are still using Arm v8.2, that was released in Jan 2016.
  • Kamen Rider Blade - Tuesday, May 26, 2020 - link

    I concur.

    In September 2019, ARMv8.6-A was introduced.

    https://en.wikipedia.org/wiki/ARM_architecture#ARM...

    There's also these new instructions to add in the future:

    In May 2019, ARM announced their upcoming Scalable Vector Extension 2 (SVE2) and Transactional Memory Extension (TME)
  • eastcoast_pete - Tuesday, May 26, 2020 - link

    I might be mistaken, but aren't those wide SVEs a co-development with Fujitsu? ARM might simply not have a blanket license to use that jointly developed tech. I am rooting for wider availability of those for my own reasons (video encoding runs much faster if it can use wide extensions). And there is no such thing as too much oomph for working with videos.
  • SarahKerrigan - Tuesday, May 26, 2020 - link

    It's just taking a while for SVE to get in. Future ARM Ltd cores are likely to have it; future server chips from Hisilicon are roadmapped to as well (although at this point, all bets are off due to the ongoing difficulties between the American state and Huawei.)
  • GC2:CS - Tuesday, May 26, 2020 - link

    So Apple is finally defeated now ?
    A11 was 25% A12 15% and A13 just 20% faster than last gen. So A10 is still quite competitive today.

    This is higher gain than Apple for a third year in a row.
    The question is, how much of that is won by going to the 5nm process ? I heard it is quite advanced compared to 7nm.
  • tipoo - Tuesday, May 26, 2020 - link

    30% faster than the A77 will bring X1 closer to, but probably still under the A13. And A14 will be out by then. Wouldn't call Apple defeated yet, it's easier to make larger gains when you were at half their per core performance a few years ago...
  • MarcGP - Wednesday, May 27, 2020 - link

    30% faster in IPC gains alone (iso-process and frequency). The X1 SOCs will be much faster than that 30% over the current A77 SOCs, much closer to 50% than 30%.

Log in

Don't have an account? Sign up now