AMD 3rd Gen EPYC Milan Review: A Peak vs Per Core Performance Balance

Name: AMD 3rd Gen EPYC Milan Review: A Peak vs Per Core Performance Balance
Item: AMD 3rd Gen EPYC Milan Review: A Peak vs Per Core Performance Balance

by Dr. Ian Cutress & Andrei Frumusanu on March 15, 2021 11:00 AM EST

120 Comments | Add A Comment

120 Comments

Disclaimer June 25^th: The benchmark figures in this review have been superseded by our second follow-up Milan review article, where we observe improved performance figures on a production platform compared to AMD’s reference system in this piece.

Conclusion: AMD's Return to High Performance Compute

On the crest of this launch, AMD has showcased that it can supply enterprise processors to the market again. After the decade or more of its Opteron brand being successful, and then fading away, the EPYC product lines have delineated a clear roadmap from AMD to re-enter this space. Back in the launch of the first generation EPYC, in June 2017, AMD promised an ambitious three year roadmap involving significant performance improvements and a return to the high-performance x86 compute space, culminating in today’s launch.

The goal throughout that time was to bring customers back into the fold – to show them that AMD has ambitious roadmaps and that the company can execute and deliver, while offering a competitive market. As a result, AMD’s lead OEM partners are now doing sizeable volume, over 10% market share, and AMD is scoring big wins in major computing contracts such as two thirds of the US exascale systems, such as Frontier and El Capitan. Frontier, as we learned in our interview with AMD’s Forrest Norrod, is using a custom EPYC Milan based processor called ‘Trento’, while El Capitan will be designed with the next generation EPYC after Milan, called Genoa.

Two Sides of a Coin

Milan is really an evolution and iteration of the design principles that made Rome, with the new chip being defined by its use of the newer Zen3 microarchitecture and chiplet design, including larger characteristic changes such as the new unified 32MB L3 cache shared amongst 8 cores in a single CCX/CCD. Where we see the direct results of these new improvements is in great uplifts in single-threaded and per-thread performance, with figures routinely reaching +20-25% in a wide variety of workloads. The new Milan parts have cores that better take advantage of the larger caches, and higher boost frequencies across the whole stack means that per-core performance has seen big gains.

Particularly new chips such as the EPYC 75F3 with 32 cores and 4 GHz boost are offering very unique differentiation compared to anything else in the market right now, and AMD is sure to gain a lot of success in use-cases which either are limited by per-core software licensing or have service-level-agreements and require higher per-core performance than delivered by the higher density core SKUs.

Where things aren’t quite as positive is in the generational peak performance metrics under full load of all cores. The problem here seems to be generational regressions on the power consumption of the 'un-core' parts of Milan, i.e everything that isn't the core – meaning most likely the new faster IOD, or possibly the new L3 cache design, is increasing the base power. This means idle power is higher, and power available to the cores (at full load) falls behind, decreasing socket efficiency compared to Rome. So, while AMD has invested into doing a smaller redesign of the IOD in Milan to achieve better latencies and higher memory performance, it has come at a cost of socket efficiency and performance for some of the parts. There’s no real silver lining here to the situation, and it's easily Milan’s glass jaw that hinders it from achieving even better performance.

For the future, if Genoa is able to ditch the 14nm IOD in favour of a more modern process node, and employ advanced packaging technologies such as X3D, and more efficient power management, even a 50 W reduction in power on the part of the un-core parts would actually signify a +50% increase in the power envelope available for the cores, as well as help AMD enable lower total power offerings below 155 W on the latest generation core.

AMD Retains x86 Performance Leadership

From a competitive standpoint, Milan continues to strengthen and maintain a very stark one-sided performance advantage against its biggest competitor, Intel. Rome had already offered more raw socket performance than the best Intel had to offer at the time, and the gap is currently quite large as Intel has not updated in that time. Intel has stated that its Ice Lake Xeon-SP family will come sometime soon, however unless Intel manages to close the core count gap, then AMD looks to be in very good shape.

Meanwhile, as AMD is focused on Intel, the Arm competition has also entered the market with force through 2020, and designs such as the Ampere Altra are able to outperform the new top Milan SKUs in many throughput-bound workloads. AMD still has very clear advantages, such as much superior memory performance through huge caches, or vastly superior per-thread performance with specialised dedicated SKUs. Still, it leaves AMD in a spot as they can’t claim to be the outright performance leader under every scenario, and offers another generational target to consider as it develops future cores.

AMD sets its own bar quite high with Milan - by aggressively emphasising its performance gains in the middle of the product stack, the general enterprise market will look on these parts very favorably. There is always room for improvement, but if AMD equip themselves with a good IO update next generation, EPYC could stand to gain better-than-generational performance in the future. But as it stands, the product is a very solid offering in light of the competition in the market.

Compiling LLVM, NAMD Performance

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

120 Comments

View All Comments

Oxford Guy - Tuesday, April 6, 2021 - link
PSP, as far as I know.
Linustechtips12#6900xt - Monday, March 15, 2021 - link
I understand that "zen" architecture is for x86 but with modifications could it be transplanted to the ARM instruction set, as i see it, it definitely could so the real question is when will the transition really start i think around the theoretical zen 5th gen or 6th gen, theres gonna be a lot of arm around here especially with apple. and yes it will defenitly start wiht servers it always does.
Gomez Addams - Monday, March 15, 2021 - link
There are really two things at work : the instruction set of the processor and its topology. AMD has been improving both quite a bit. The instruction set enhancements won't transfer quite so well to ARM but the topology certainly can. Since ARM processors are much smaller, they could probably work in chiplets with possibly 32 cores in each or maybe 16 cores and 4-way SMT. That could make for a very impressive server processor. Four chiplets would give 64 cores and 256 threads. Yikes!
rahvin - Monday, March 15, 2021 - link
So much wrong.
mode_13h - Monday, March 15, 2021 - link
There are pieces of it that can be reused (on the same manufacturing node, at least), but making a truly-competitive ARM chip is probably going to involve some serious tinkering with the pipeline stages & architecture. And there are significant parts of an x86 chip that you'd have to throw out and redo, most notably the instruction decoder.

In all, it's a different core that you're talking about. Not like CPU vs. GPU level of difference, but it's a lot more than just cosmetics.
coder543 - Monday, March 15, 2021 - link
"For this launch, both the 16-core F and 24-core F have the same TDP, so the only reason I can think of for AMD to have a higher price on the 16-core processor is that it only has 2 cores per chiplet active, rather than three? Perhaps it is easier to bin a processor with an even number of cores active."

If I were to speculate, I would strongly guess that the actual reason is licensing. AMD knows that more people are going to want the 16 core CPUs in order to fit into certain brackets of software licensing, so AMD charges more for those to maximize profit and availability of the 16 core parts. For those customers, moving to a 24 core processor would probably mean paying *significantly* more for whatever software they're licensing.
SarahKerrigan - Monday, March 15, 2021 - link
Yep.

Intel sold quad-core Xeon E7's for impressively high prices for a similar reason.
Mikewind Dale - Monday, March 15, 2021 - link
Why couldn't you run a 16 core software license on a 24 core CPU? I run a 4 core licensed version of Stata MP on an 8 core Ryzen just fine.
Ithaqua - Monday, March 15, 2021 - link
Compliance and lawsuits.
You have to pay for all the cores you use for some software.

Yes if you're only running 4 cores on your 8 core Ryzen then your fine but Stata MP is using all 8, there could be a lawsuit.

Now for you I'm sure they wouldn't care. For a larger firm with 10,000+ machines, then that's going to be a big lawsuit.
arashi - Wednesday, March 17, 2021 - link
Some licenses charge for ALL cores, regardless of how many cores you would actually be using.

AMD 3rd Gen EPYC Milan Review: A Peak vs Per Core Performance Balance

Conclusion: AMD's Return to High Performance Compute

Two Sides of a Coin

AMD Retains x86 Performance Leadership

Post Your Comment

120 Comments

View All Comments

Oxford Guy - Tuesday, April 6, 2021 - link

Linustechtips12#6900xt - Monday, March 15, 2021 - link

Gomez Addams - Monday, March 15, 2021 - link

rahvin - Monday, March 15, 2021 - link

mode_13h - Monday, March 15, 2021 - link

coder543 - Monday, March 15, 2021 - link

SarahKerrigan - Monday, March 15, 2021 - link

Mikewind Dale - Monday, March 15, 2021 - link

Ithaqua - Monday, March 15, 2021 - link

arashi - Wednesday, March 17, 2021 - link

Log in

Don't have an account? Sign up now