Arm Unveils Client CPU Performance Roadmap Through 2020 - Taking Intel Head On
by Andrei Frumusanu on August 16, 2018 9:05 AM EST
Today’s announcement is an unusual one for Arm: it is the company’s first-ever public forward-looking CPU IP roadmap, detailing performance and power projections for the next two generations through to 2020.
Back in May we extensively covered Arm’s next generation Cortex A76 CPU IP and how it’s meant to be a game-changer in terms of providing one of the biggest generational performance jumps in the company’s recent history. The narrative in particular focused on how the A76 now brought real competition and viable alternatives to the x86 market and in particular how it would be able to offer performance equivalent to Intel’s best mobile offerings, at much lower power.
Arm sees always-connected devices with 5G connectivity as a prime opportunity for a shift in the laptop market. Qualcomm’s recent Snapdragon 835 and Snapdragon 850 platforms were the first attempts in trying to establish this new slice for Arm-based PCs.
Today’s roadmap now publicly discloses the codenames of the next two generations of CPU cores following the A76 – Deimos and Hercules. Both future cores are based on the new A76 micro-architecture and will introduce respective evolutionary refinements and incremental updates for the Austin cores.
The A76 is a 2018 product, and we should be hearing more about the first commercial devices on 7nm towards the end of the year and in the coming months. Deimos is its 2019 successor, aimed at more widespread 7nm adoption. Hercules is said to be the next iteration of the microarchitecture for 2020 products and the first 5nm implementations. This is as far as Arm is willing to project into the future for today’s disclosure, as the Sophia team is working on the next big microarchitecture push, which I suspect will be the successor to Hercules in 2021.
Part of today’s announcement is Arm’s reiteration of the performance and power goals of the A76 against competing platforms from Intel. The measurement metric today was the performance of a SPECint2006 Speed run under Linux, compiled with GCC7. The power metrics represent the whole SoC “TDP”, meaning CPU, interconnect and memory controllers: essentially the active platform power, much in the same way we’ve been representing smartphone power in recent mobile deep-dive articles.
Here a Cortex A76 based system running at up to 3GHz is said to match the single-thread performance of an Intel Core i5-7300U running at its maximum 3.5GHz turbo operating speed, all while doing it within a TDP of less than 5W, versus “15W” for the Intel system. I’m not too happy with the power presentation done here by Arm as we kind of have an apples-and-oranges comparison; the Arm estimates here are meant to represent actual power consumption under the single-threaded SPEC workload while the Intel figures are the official TDP figures of the SKU – which obviously don’t directly apply to this scenario.
We didn’t have internal data to verify Arm’s claims as of the publishing of this article, but the 15W Intel figure is naturally on the high side, given that this is just the official TDP representing multi-threaded workloads. A very quick test of CB15 ST power as reported by MSR registers on a 7200U at 3.1GHz measured 9.3W package+DRAM power, while an 8250U at 3.35GHz came in at 11W. I haven’t correlated SPEC power on x86 to date, but I’m expecting it on average to be less than CB15. Even if the 15W figure for the 7300U is correct, and I’m expecting something more in the range of 9-11W, Arm might be using one of Intel’s notably less efficient performance points when doing the comparison for these SKUs. Of course this doesn’t invalidate the data, as efficiency for the A76 at those frequencies would also not be optimal; it’s just something to keep in mind.
It’s also interesting to see Arm scale back on the performance comparison, as they’re using a 3GHz A76 as the comparison data-point; this is in contrast to the 3.3GHz maximum 5W performance point presented during TechDay. I had tried to estimate the A76’s power in mobile form-factors based on the different metrics Arm disclosed and arrived at an estimated 2.3W at 3GHz. Naturally Arm says “less than 5W” and they could be erring on the safe side of not over-promising, but if it had been *that* much lower, as in my estimate, we would have maybe seen even more aggressive marketing figures. In the end, until we get the first A76 devices in our hands, we won’t know for sure what the exact figures will be and where on the efficiency curve Arm’s projected 3GHz performance figures will end up.
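To make the apples-and-oranges concern concrete, the short Python sketch below computes how the implied power ratio changes depending on which figures are paired. It only uses the numbers quoted above; the function and variable names are made up for illustration, and it assumes equal performance across all pairings, which Arm only claims for the i5-7300U (the 7200U and 8250U measurements are CB15 rather than SPEC and come from different SKUs).

```python
# Illustrative only: every power figure below is quoted in the article text.
# Assumption: all parts deliver the same single-thread performance, so the
# power ratio doubles as a rough efficiency ratio.

A76_CLAIMED_MAX_W = 5.0   # Arm's "less than 5W" upper bound at 3GHz
A76_ESTIMATE_W = 2.3      # the author's earlier estimate at 3GHz

INTEL_POWER_W = {
    "official TDP": 15.0,
    "7200U CB15 ST (measured)": 9.3,
    "8250U CB15 ST (measured)": 11.0,
}


def power_ratio(intel_watts, arm_watts):
    """How many times more power the Intel part draws than the Arm part."""
    return intel_watts / arm_watts


def ratio_table(arm_watts):
    """Power ratio for each Intel figure against a given Arm power figure."""
    return {
        label: round(power_ratio(watts, arm_watts), 2)
        for label, watts in INTEL_POWER_W.items()
    }


if __name__ == "__main__":
    for arm_watts in (A76_CLAIMED_MAX_W, A76_ESTIMATE_W):
        print(f"A76 at {arm_watts}W:")
        for label, ratio in ratio_table(arm_watts).items():
            print(f"  {label}: {ratio}x")
```

With Arm’s own pairing (15W versus 5W) the gap is 3x; pairing the measured 9.3-11W figures with the same 5W ceiling brings it down to roughly 1.9-2.2x, while the 2.3W estimate would push it back above 4x.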
The last slide worth discussing covers the performance projections for Deimos and Hercules. Here Arm’s taking a direct stab at Intel’s lack of significant progress over the last few years and reiterating its confidence in its ability to sustain high CAGR (compound annual growth rate) performance figures for the next generations.
Again, at TechDay we quoted figures of 20-25%, while today’s announcement contained a more conservative figure of “>=15%”, likely better representing a seemingly larger 20% projected boost for Deimos as well as what seems to be a 10% gain for the 5nm follow-up Hercules. Taking into account the relative positioning of the data-points in this chart, I did some quick correlation and it matches my initial estimated performance figure for a 3GHz A76 of ~26 SPECint2006. Deimos and Hercules would come in at figures of ~31 and ~34 points.
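As a rough sanity check on these numbers, the following Python sketch compounds the ~26 point A76 estimate by the apparent 20% and 10% steps and derives the resulting compound annual growth rate. The helper names are hypothetical, and the inputs are the chart-derived estimates from the text rather than figures published by Arm.

```python
def project_scores(base_score, yearly_gains):
    """Compound a base score by a sequence of fractional yearly gains."""
    scores = [base_score]
    for gain in yearly_gains:
        scores.append(scores[-1] * (1.0 + gain))
    return scores


def cagr(start, end, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


# Assumed inputs: ~26 SPECint2006 for a 3GHz A76 (2018), then +20% for
# Deimos (2019) and +10% for Hercules (2020), as read off the chart.
SCORES = project_scores(26.0, [0.20, 0.10])

if __name__ == "__main__":
    for name, score in zip(("A76", "Deimos", "Hercules"), SCORES):
        print(f"{name}: ~{score:.1f}")
    print(f"CAGR 2018-2020: {cagr(SCORES[0], SCORES[-1], 2):.1%}")
```

Under these assumptions the two-year growth rate works out to just under 15% per year, roughly in line with the “>=15%” headline figure.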
Ultimately, today’s announcement is a marketing exercise attempting to emphasise Arm’s performance and power commitments over the next few generations, trying to showcase that it has the strategy and technology in place to make the Arm laptop market a real growth opportunity. If and how this pans out is something that we won’t find out until later in the year at the earliest, with the first actual A76 based large form-factor designs not arriving until at least sometime in 2019. We’re eagerly awaiting the first A76 based mobile designs in the months to come, and a first hands-on evaluation of the new microarchitecture family.
Wilco1 - Thursday, August 16, 2018
All comparisons in the article are single threaded, as shown in the footnotes. For multithreaded scenarios Arm chips already win on performance just by having more, smaller cores on a more advanced process (e.g. Centriq). For the target market 8 cores (4+4) will be typical.
HStewart - Thursday, August 16, 2018
Talk is cheap - let's see some actual real numbers - real tests - of which ARM shows none - just a set of made up graphs with no actual proof.
Wilco1 - Thursday, August 16, 2018
The graph already shows SPECINT scores, what more do you want, Geekbench?
ZolaIII - Friday, August 17, 2018
I don't doubt ARM manages to outperform X86 in integer workloads, however those don't scale well in SMP (actually they scale horribly). SMP scales acceptably well only with FP.
This actually explains the performance boost we were all wondering about here compared to ARM's earlier projections, as this is INTEGER only.
Wilco1 - Friday, August 17, 2018
Improving integer performance is much harder than floating point. It's also more important for typical mobile and laptop use since integer performance is directly related to the user experience.
Big Arm SMP CPUs do well on FP already, for example ThunderX2 beats high-end Xeons on OpenFoam. No surprise then there are several Arm-based supercomputers being built.
ZolaIII - Friday, August 17, 2018
Not really. We saw almost no improvement in VFP units, regardless of architecture, in the last decade. SIMD areas did see significant advancements, but there is still a problem of feeding them efficiently (latency). ARM is still way behind Intel regarding SIMD. In the end each & every program initialises on the primary instruction set, which is integer, & then switches to FP; that's why we correlate it to user experience, aka snappiness. Of course we were talking about performance per core all of this time. As the SIMD area is actually relatively small compared to the rest of the core, it's much better to pair a larger SIMD with a smaller & more power efficient core; for example, six A55 will have 3x the NEON FP performance of an A76 while using approximately the same power.
Wilco1 - Friday, August 17, 2018
That's not true at all. Floating point performance on Arm has increased dramatically in the last decade, and that's mostly scalar FP, not SIMD. Aside from HPC, wide SIMD doesn't make sense since it's big and power hungry.
TheJian - Thursday, August 16, 2018
Blah blah, wake me when you put out an 85W-120W SoC that directly takes on Intel mainstream. Full of all the trimmings, NV 1080 Ti, SSD, 16GB mem, etc. Then you REALLY are pursuing them and their money. After that go server, use consumer to hone your skills on desktop watt/heat/pipeline levels etc. Bring on the ANDROID/LINUX/SteamOS tri-bootable boxes :)
Wilco1 - Thursday, August 16, 2018
Ever heard of Thunder-X2 or Centriq? Those are not just taking on Xeon but beating Xeon.
HStewart - Thursday, August 16, 2018
Talk is cheap, where are the actual benchmarks?