Hot Chips 2018: Samsung’s Exynos-M3 CPU Architecture Deep Dive

Author: Andrei Frumusanu

by Andrei Frumusanu on August 20, 2018 1:00 PM EST


Samsung's Future Strategy & Conclusion

Lastly, Samsung talked more about the project’s timeline and how things were put into motion. As we discussed in the introduction, planning for the M3 kicked off in 2Q14, with RTL start in 1Q15 following the completion of the M1. During this time, Samsung changed its cadence and planning, forking off a subset of features originally planned for the M3 and putting them into the M2 in 3Q15. The original M3 plans were then revised in the first quarter of 2016 for a bigger microarchitectural push for performance.

The RTL was handed off to the SoC team in 1Q17 for the first EVT0 tape-out of the Exynos 9810. Note that the actual production silicon is EVT1, which taped out sometime in the middle of 2017. Finally, the Exynos 9810 saw commercial availability in March 2018.

The M3 is said to have been quite a sweat-inducing effort for the design team, which had to go through what seems to have been a major replanning of the project while under extreme time pressure to hit a hard deadline and make it into the next-generation product.

It seems Samsung still left a lot of improvements on the table that didn’t make it into the M3 due to time constraints. The cache hierarchy in particular seems to be one of the weaker parts of the microarchitecture, and Samsung admits it isn’t quite happy with it. It was one of the features that pushed the design team hard in order to make it out the door in time.

One aspect that Samsung didn’t talk about, and wasn’t willing to, is the physical implementation. As Hot Chips is a microarchitecture forum, the disclosures were kept to the µarch of the M3. As we’ve seen in the past, a single microarchitecture can end up with quite different performance and power characteristics when implemented differently by vendors. Taking this into account, it’s hard to separate these intertwined aspects of a piece of silicon when measuring the end product.

The M3 seems like an overall solid microarchitecture, one that feels a lot more like what we see in desktop-grade products. It also feels like Samsung took a more straightforward approach to extracting performance from the µarch: in many aspects it’s simply a much bigger beast than what we see from Arm, which also explains the M3’s quite large silicon size.

When evaluating the efficiency of a piece of IP, looking at the higher-level microarchitecture is not enough: the actual electrical engineering of the transistor structures and the details of their design choices can easily outweigh any apparent higher-level characteristics. Here we’re largely out of our depth, and no vendor will ever really disclose such detail, not to mention that it would be vastly out of scope for a public readership.

The final slide contains probably the most revealing disclosure, giving us a glimpse of Samsung’s future strategy: the SARC design team is said to now be on a strong annual release cadence with continued improvements every year. Indeed, when I was comparing the M3 and A76 and asking about some of the differing design choices and specifications, Samsung didn’t shy away from reminding me that the real competition for Arm’s new core will be next year’s new Exynos M4, not the M3.

We’ve only seen two generational improvements released to date, but with IPC gains of 20% and 59% for the M2 and M3, Samsung has a short but very robust track record. Just days ago Arm publicly announced its performance core roadmap through to 2020, revealing the A76 successors Deimos and Hercules and promising ~15% and 10% generational gains. The M3 already seems to match or exceed the A76 in projected performance (at least in SPEC2006), so depending on the power efficiency of the M4, we might finally see Samsung’s custom designs pay off with a competitive advantage.
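
To put those generational figures side by side, here is a minimal sketch of how such gains stack up. It assumes each percentage is relative to the immediately preceding core and that the gains compound multiplicatively; the function name `compound_gain` is purely illustrative. Keep in mind that Samsung's figures are IPC gains while Arm's are projected performance gains, so the two results are not directly comparable.

```python
from math import prod  # available in Python 3.8+


def compound_gain(step_gains):
    """Return the cumulative multiplier for successive fractional gains.

    Assumption: each gain is relative to the previous generation,
    so the steps multiply rather than add.
    """
    return prod(1.0 + g for g in step_gains)


# Figures quoted in the article text.
samsung_ipc = compound_gain([0.20, 0.59])  # M1 -> M2 -> M3 (IPC)
arm_perf = compound_gain([0.15, 0.10])     # A76 -> Deimos -> Hercules (performance)

print(f"M3 IPC relative to M1: about {samsung_ipc:.2f}x")
print(f"Hercules performance relative to A76: about {arm_perf:.2f}x")
```

Under these assumptions the M3 lands at roughly 1.9x the IPC of the M1, while Arm's roadmap implies a bit under 1.3x the A76's performance by the time Hercules arrives.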

Overall, we thank Samsung for making microarchitectural disclosures such as the one seen today; beyond Arm’s own products, they are quite a rare event in this ever so secretive industry. Here’s to hoping S.LSI and SARC resolve the weaknesses of the Exynos 9810 and M3 and are pushing hard to make next year’s SoC a larger success. We’re definitely looking forward to getting our hands on it!

45 Comments


eastcoast_pete - Monday, August 20, 2018 - link
Thanks Andrei, I get that the CPU design teams are not in charge of the software. Still, I imagine that as a member of the CPU design team, I would have had some very unkind words for the software guys (and gals) who made quite a mess and made the CPU look bad. Regarding the apparently pretty strict division between even low-level software and hardware at Samsung: Do you think that is part of the problem? Even the best micro-arch can only work as well as the software that runs it allows for. Don't micro-arch + low-level software teams usually work closely together starting at the design stage? How is that handled at Intel, AMD, Qualcomm, Nvidia?
Wardrive86 - Monday, August 20, 2018 - link
The flops you stated are double precision? 12 SP Flops/clock
Wardrive86 - Monday, August 20, 2018 - link
Is there only one 128 bit NEON unit in the M3?
Andrei Frumusanu - Tuesday, August 21, 2018 - link
All of them are 128b. It's single precision Flops.
Wardrive86 - Tuesday, August 21, 2018 - link
Thank you for your response. I suppose I should have asked are there 3 128bit (6 64 bit ALU) NEON units? Is the FPU VFPv5?
Wardrive86 - Tuesday, August 21, 2018 - link
Ah NVM didn't see the SIMD blocks below the FMAC blocks, my bad. Should be able to Vector FMA right up to 24 SP flops/clock in theory/never in actual workloads. What a beast!!
Trifrost - Tuesday, August 21, 2018 - link
NEON is a 128 bit SIMD viewed as 2x64 bit ALUs. It looks like 3x64 bit ALUs if you compare to the M1 block diagram. Max 12 flops if that is true
bobcov - Tuesday, August 21, 2018 - link
This article desperately needs an editor. Could not take it seriously enough to finish reading it. "Productised?" Really? What's next, "seriousity?"
Andrei Frumusanu - Tuesday, August 21, 2018 - link
That's literally the term taken out of the presentation, furthermore;

https://dictionary.cambridge.org/dictionary/englis...
https://en.oxforddictionaries.com/definition/produ...
overzealot - Tuesday, August 21, 2018 - link
Great article, as always. Heavy on the technical aspects, just like we like it.
He's not wrong about the fact that it would benefit from an editor, though. You'd get some easy wins by passing it through a grammar checker if there's no-one available to proof read your articles.
Also, if the page used a font where you can differentiate between lower case L and capital i (l/I) it would make a lot of terms easier to parse.

While I was reading I made a list of text replacements that would improve readability.
The list is way too large for a comment field, so I'm sending it via email.
