AMD Rome Second Generation EPYC Review: 2x 64-core Benchmarked
by Johan De Gelas on August 7, 2019 7:00 PM EST
Rome CPUs: Core Counts and Frequencies
There has been little doubt that, on paper, Rome and the EPYC 7002 family will be competitive with Intel's Xeon Scalable when it comes to performance or performance per watt. As always, it comes down to pairing up the parts that actually compete with each other. With Rome, AMD is once again attacking performance per dollar, as well as peak performance and performance per watt.
EPYC 7000 nomenclature
The naming of the CPUs is kept consistent with the previous generation.
- EPYC = Brand
- 7 = 7000 Series
- 25-74 = Dual Digit Number indicative of stack positioning / performance (non-linear)
- 1/2 = Generation
- P = Single Socket, not present in Dual Socket
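As a rough illustration of the scheme above, a model number can be decoded mechanically. The sketch below is only one reading of the list: the `EpycModel` type and the `parse_epyc_model` helper are hypothetical names, and no range check is applied to the two-digit positioning number.

```python
import re
from typing import NamedTuple


class EpycModel(NamedTuple):
    """Decoded fields of an EPYC 7000-series model number (illustrative only)."""
    series: int          # e.g. 7000
    position: int        # two-digit stack positioning, non-linear
    generation: int      # 1 or 2 in the lists shown in this article
    single_socket: bool  # True when the name carries the "P" suffix


# Optional "EPYC " prefix, then series digit, two positioning digits,
# one generation digit, and an optional "P" suffix.
_MODEL_RE = re.compile(r"^(?:EPYC\s+)?(\d)(\d{2})(\d)(P?)$")


def parse_epyc_model(name: str) -> EpycModel:
    """Split a name such as 'EPYC 7742' or '7502P' into its parts."""
    match = _MODEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not an EPYC 7000-style model name: {name!r}")
    series, position, generation, suffix = match.groups()
    return EpycModel(
        series=int(series) * 1000,
        position=int(position),
        generation=int(generation),
        single_socket=(suffix == "P"),
    )


if __name__ == "__main__":
    for name in ("EPYC 7742", "EPYC 7601", "7502P"):
        print(name, "->", parse_epyc_model(name))
```

The positioning number is deliberately kept as an opaque integer, since the article notes that it is non-linear and only indicates stack position.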
AMD is introducing 19 total CPUs to the Rome family, 13 of which are aimed at the dual socket market. All CPUs have 128 PCIe 4.0 lanes available for add-in cards, and all CPUs support up to 4 TiB of DDR4-3200.
AMD EPYC 7001 & 7002 Processors (2P)
CPU | Cores / Threads | Base (GHz) | Max (GHz) | L3* | TDP | Price |
EPYC 7742 | 64 / 128 | 2.25 | 3.40 | 256 MB | 225 W | $6950 |
EPYC 7702 | 64 / 128 | 2.00 | 3.35 | 256 MB | 200 W | $6450 |
EPYC 7642 | 48 / 96 | 2.30 | 3.20 | 256 MB | 225 W | $4775 |
EPYC 7552 | 48 / 96 | 2.20 | 3.30 | 192 MB | 200 W | $4025 |
EPYC 7542 | 32 / 64 | 2.90 | 3.40 | 128 MB | 225 W | $3400 |
EPYC 7502 | 32 / 64 | 2.50 | 3.35 | 128 MB | 200 W | $2600 |
EPYC 7452 | 32 / 64 | 2.35 | 3.35 | 128 MB | 155 W | $2025 |
EPYC 7402 | 24 / 48 | 2.80 | 3.35 | 128 MB | 155 W | $1783 |
EPYC 7352 | 24 / 48 | 2.30 | 3.20 | 128 MB | 180 W | $1350 |
EPYC 7302 | 16 / 32 | 3.00 | 3.30 | 128 MB | 155 W | $978 |
EPYC 7282 | 16 / 32 | 2.80 | 3.20 | 64 MB | 120 W | $650 |
EPYC 7272 | 12 / 24 | 2.90 | 3.20 | 64 MB | 155 W | $625 |
EPYC 7262 | 8 / 16 | 3.20 | 3.40 | 128 MB | 120 W | $575 |
EPYC 7252 | 8 / 16 | 3.10 | 3.20 | 64 MB | 120 W | $475 |
Select EPYC 7001 Naples CPUs
EPYC 7601 | 32 / 64 | 2.20 | 3.20 | 64 MB | 180 W | $4200 |
EPYC 7551 | 32 / 64 | 2.00 | 3.00 | 64 MB | 180 W | >$3400 |
EPYC 7501 | 32 / 64 | 2.00 | 3.00 | 64 MB | 155 W | $3400 |
EPYC 7451 | 24 / 48 | 2.30 | 3.20 | 64 MB | 180 W | $2400 |
EPYC 7371 | 16 / 32 | 3.10 | 3.80 | 64 MB | 200 W | $1550 |
EPYC 7251 | 8 / 16 | 2.10 | 2.90 | 32 MB | 120 W | $475 |
Special CPUs worth noting listed in bold
* We are awaiting full L3 cache information
The top part is the EPYC 7742, which is the CPU we were provided for this comparison. It is the most expensive non-custom AMD CPU ever. We will discuss whether the price is a bargain or merely suitable after we have done some benchmarking.
But one thing is for sure: AMD is definitely improving the performance per dollar. The real star is the 7502, as it offers 32 Zen 2 cores at 2.50/3.35 GHz for $2600. This means that you get higher clocks, better cores, twice the L3, and just as many cores as the 7601 had. In other words, the 7502 is better in every way, yet compared to the 7601 it comes with an impressive discount of nearly 40% ($2600 vs $4200).
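The discount quoted above is simple arithmetic on the two list prices. A minimal sketch, using only the prices and core counts from the table (the helper names are made up for illustration):

```python
def discount(new_price: float, old_price: float) -> float:
    """Fractional price reduction of new_price relative to old_price."""
    return 1.0 - new_price / old_price


def price_per_core(price: float, cores: int) -> float:
    """List price divided by core count, in dollars per core."""
    return price / cores


# List prices and core counts as given in the 2P table.
epyc_7502 = {"cores": 32, "price": 2600}
epyc_7601 = {"cores": 32, "price": 4200}

saving = discount(epyc_7502["price"], epyc_7601["price"])
print(f"7502 vs 7601 list-price discount: {saving:.1%}")  # prints 38.1%

for name, sku in (("EPYC 7502", epyc_7502), ("EPYC 7601", epyc_7601)):
    print(f"{name}: ${price_per_core(sku['price'], sku['cores']):.2f} per core")
```

Since both parts have 32 cores, the per-core price drops by the same fraction as the list price.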
There is more to it. Unlike Intel's market segmentation strategy, which makes the life of enterprise infrastructure people more complicated than it should be, AMD does not blow fuses on cheaper SKUs to create artificial 'value' for buying more expensive SKUs. The cheapest 8-core 7252 has all 128 PCIe 4.0 lanes, supports up to 4 TB per socket, runs Infinity Fabric at the same speed, and includes the same virtualization and security features as the top product.
Comparison to Intel
In the table below we have put together a basic comparison with part of Intel's SKU list. Given that Intel is dominant in the market, prospective buyers must get a significant price advantage or a significantly lower TCO before they switch to AMD.
Intel Second Gen Xeon Scalable (Cascade Lake) vs. AMD Second Gen EPYC ("Rome")
Xeon | Cores | Freq (GHz, base/max) | TDP (W) | Price | EPYC | Cores | Freq (GHz, base/max) | TDP (W) | Price |
Xeon Platinum 8200 vs. Rome
8280M | 28 | 2.7/4.0 | 205 | $13012 | 7742 | 64 | 2.25/3.40 | 225 | $6950 |
8280 | 28 | 2.7/4.0 | 205 | $10009 | | | | | |
8276M | 28 | 2.2/4.0 | 165 | $11722 | 7742 | 64 | 2.25/3.40 | 225 | $6950 |
8270 | 26 | 2.7/4.0 | 205 | $7405 | | | | | |
8268 | 24 | 2.9/3.9 | 205 | $6302 | | | | | |
8260M | 24 | 2.4/3.9 | 165 | $7705 | 7702 | 64 | 2.00/3.35 | 200 | $6450 |
8260 | 24 | 2.4/3.9 | 165 | $4702 | 7552 | 48 | 2.20/3.30 | 200 | $4025 |
8253 | 16 | 2.2/3.0 | 165 | $3115 | 7502 | 32 | 2.50/3.35 | 200 | $2600 |
Xeon Gold 6200 vs. Rome
6252 | 24 | 2.1/3.7 | 150 | $3665 | | | | | |
6248 | 20 | 2.5/3.9 | 150 | $3072 | | | | | |
6242 | 16 | 2.8/3.9 | 150 | $2529 | 7452 | 32 | 2.35/3.35 | 155 | $2025 |
6238 | 22 | 2.1/3.7 | 140 | $2612 | 7402 | 24 | 2.80/3.35 | 155 | $1783 |
6226 | 12 | 2.8/3.7 | 125 | $1776 | | | | | |
Xeon Silver 4200 vs. Rome
4216 | 16 | 2.1/3.2 | 100 | $1002 | 7282 | 16 | 2.80/3.20 | 120 | $650 |
4214 | 2x12 | 2.2/3.2 | 2x85 | 2x$694 | 7402P | 24 | 2.80/3.35 | 180 | $1250 |
In our comparison, we've also ignored the fact that AMD supports up to 4 TB per socket and has 128 PCIe 4.0 lanes, beating Intel on both fronts. While the number of people that will buy 256 GB DIMMs is minimal at best, within the error margin of the market, to us it is simply ridiculous that Intel expects enterprise users to cough up another few thousand dollars per CPU for a model that supports 2 TB, while you get that for free from AMD.
On paper, especially in the high-end, Intel is completely outclassed. A 28-core Xeon 8276M has a list price of ~$12k, while AMD charges "only" $7k for more than twice as many cores. The only advantages Intel keeps are a slightly higher single threaded clock (4 GHz) and AVX-512 support. You could argue that the TDP is lower, but that has to be measured, and frankly there is a good chance that one 64-core part (at 2.25-3.2 GHz) is able to keep up with two Intel Xeon 8276 CPUs (2x28 cores at 2.2-2.8 GHz), while offering much lower power consumption (single socket board vs dual socket board, 225W vs 2x165W).
AMD is even more generous in the mid-range. The EPYC 7552 offers twice the number of cores at comparable base clocks for less money than the Xeon Platinum 8260, which is arguably one of the more popular Xeon Platinum CPUs. The EPYC 7452 likewise doubles the core count of the Xeon Gold 6242 while still costing less. It is only at the very low end that the differences get smaller.
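To make the on-paper comparison concrete, the list prices in the table can be reduced to dollars per core. The sketch below uses only figures from the table; the `Sku` class and `per_core_price_ratio` helper are hypothetical names, and TDP is a rated value rather than measured power, so the last lines are only a paper estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    name: str
    cores: int
    tdp_w: int
    price_usd: int

    def usd_per_core(self) -> float:
        """List price divided by core count."""
        return self.price_usd / self.cores


# (Intel part, AMD part) pairings taken from the comparison table.
PAIRS = [
    (Sku("Xeon Platinum 8276M", 28, 165, 11722), Sku("EPYC 7742", 64, 225, 6950)),
    (Sku("Xeon Platinum 8260", 24, 165, 4702), Sku("EPYC 7552", 48, 200, 4025)),
    (Sku("Xeon Gold 6242", 16, 150, 2529), Sku("EPYC 7452", 32, 155, 2025)),
]


def per_core_price_ratio(intel: Sku, amd: Sku) -> float:
    """How many times more Intel charges per core than AMD, at list price."""
    return intel.usd_per_core() / amd.usd_per_core()


for intel, amd in PAIRS:
    print(
        f"{intel.name}: ${intel.usd_per_core():.0f}/core vs "
        f"{amd.name}: ${amd.usd_per_core():.0f}/core "
        f"(ratio {per_core_price_ratio(intel, amd):.2f}x)"
    )

# Paper comparison of two 165 W 28-core Xeons against one EPYC 7742.
xeon, epyc = PAIRS[0]
print(f"2x Xeon: {2 * xeon.cores} cores, {2 * xeon.tdp_w} W rated TDP")
print(f"1x EPYC: {epyc.cores} cores, {epyc.tdp_w} W rated TDP")
```

List price per core ignores per-core performance, licensing, and platform cost, so it is a starting point for the TCO discussion rather than a conclusion.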
Single Socket
For single socket systems, AMD will offer the five processors below. These processors mirror the specifications of their 2P counterparts, but have a P in the name and slightly different pricing.
AMD EPYC Processors (1P)
CPU | Cores / Threads | Base (GHz) | Max (GHz) | L3 | TDP | Price |
EPYC 7702P | 64 / 128 | 2.00 | 3.35 | 256 MB | 200 W | $4425 |
EPYC 7502P | 32 / 64 | 2.50 | 3.35 | 128 MB | 200 W | $2300 |
EPYC 7402P | 24 / 48 | 2.80 | 3.35 | 128 MB | 200 W | $1250 |
EPYC 7302P | 16 / 32 | 3.00 | 3.30 | 128 MB | 155 W* | $825 |
EPYC 7232P | 8 / 16 | 3.10 | 3.20 | 32 MB | 120 W | $450 |
*170W TDP mode also available
This table also makes clear how much extra frequency AMD extracted out of the 7 nm TSMC process. The sixteen-core EPYC 7302P runs at 3.0 GHz with all cores active, while the EPYC 7351 was limited to 2.4 GHz at the same 155W TDP.
Again, the EPYC 7502P looks like one of the best deals of the server CPU market. This SKU can offer a lot of advantages compared to current dual socket servers. First, it offers very potent single thread performance (3.35 GHz boost) and a very high 2.5 GHz when all cores are used, even when running AVX2 code. Secondly, a single socket server has a lower BOM and lower power consumption (200W) compared to a dual 16-core system. Lastly, it supports up to 1-2 TB realistically (64-128 GB DIMMs) and has ample I/O bandwidth with 128 PCIe 4.0 lanes.
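The capacity figures above follow from DIMM size times slot count. The sketch below assumes eight memory channels per socket with two DIMMs per channel; that slot count is not stated in this section and is labeled as an assumption in the code, and capacities are converted with 1 TB = 1024 GB.

```python
MEMORY_CHANNELS = 8    # assumption: eight DDR4 channels per socket
DIMMS_PER_CHANNEL = 2  # assumption: two DIMMs populated per channel


def socket_capacity_gb(
    dimm_size_gb: int,
    channels: int = MEMORY_CHANNELS,
    dimms_per_channel: int = DIMMS_PER_CHANNEL,
) -> int:
    """Total memory per socket in GB when every slot holds the same DIMM size."""
    return dimm_size_gb * channels * dimms_per_channel


for dimm in (64, 128, 256):
    total = socket_capacity_gb(dimm)
    print(f"{dimm} GB DIMMs -> {total} GB ({total / 1024:g} TB) per socket")
```

Under these assumptions, 64 GB and 128 GB DIMMs give the 1-2 TB range quoted above, and only 256 GB DIMMs reach the 4 TB maximum.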
180 Comments
AnonCPU - Friday, August 9, 2019 - link
The gain in hmmer on EPYC with GCC8 is not due to the TAGE predictor.
Hmmer gains a lot on EPYC only because of GCC8.
The vectorizer has been improved in GCC8 and hmmer gets vectorized heavily while it was not the case for GCC7. The same run on an Intel machine would have shown the same kind of improvement.
JohanAnandtech - Sunday, August 11, 2019 - link
Thanks, do you have a source for that? Interested in learning more!
AnonCPU - Monday, August 12, 2019 - link
That should be due to the improvements on loop distribution:
https://gcc.gnu.org/gcc-8/changes.html
"The classic loop nest optimization pass -ftree-loop-distribution has been improved and enabled by default at -O3 and above. It supports loop nest distribution in some restricted scenarios;"
There are also some references here in what was missing for hmmer vectorization in GCC some years ago:
https://gcc.gnu.org/ml/gcc/2017-03/msg00012.html
And a page where you can see that LLVM was missing (at least in 2015) a good loop distribution algo useful for hmmer:
https://www.phoronix.com/scan.php?page=news_item&a...
AnonCPU - Monday, August 12, 2019 - link
And more:
https://community.arm.com/developer/tools-software...
just4U - Friday, August 9, 2019 - link
I guess the question to ask now is can they churn these puppies out like no tomorrow? Is the demand there? What about other Hardware? Motherboards and the like..
Do they have 100 000 of these ready to go? The window of opportunity for AMD is always fleeting.. and if they're going to capitalize on this they need to be able to put the product out there.
name99 - Friday, August 9, 2019 - link
No obvious reason why not. The chiplets are not large and TSMC ships 200 million Apple chips a year on essentially the same process. So yields should be there.
Manufacturing the chiplet assembly also doesn't look any different from the Naples assembly (details differ, yes, but no new envelopes being pushed: no much higher frequency signals or denser traces -- the flip side to that is that there's scope there for some optimization come Milan...)
So it seems like there is nothing to obviously hold them back...
fallaha56 - Saturday, August 10, 2019 - link
Perhaps Hyperthreading should be off on the Intel systems to better reflect eg Google’s reality / proper security standards now we know Intel isn’t secure?
Targon - Monday, August 12, 2019 - link
That is why Google is going to be buying many Epyc based servers going forward. Mitigations do not mean a problem has been fixed.
imaskar - Wednesday, August 14, 2019 - link
Why do you think AWS, GCP, Azure, etc. mitigated the vulnerabilities? They only patched Meltdown at most. All other things are too costly and hard to execute. They just don't care so much for your data. To lose 2x cloud capacity for that? No way. And for security conscious serious customers they offer private clusters, so your workloads run on separate servers.
ballsystemlord - Saturday, August 10, 2019 - link
Spelling and grammar errors:
"This happened in almost every OS, and in some cases we saw reports that system administrators and others had to do quite a bit optimization work to get the best performance out of the EPYC 7001 series."
Missing "of":
"This happened in almost every OS, and in some cases we saw reports that system administrators and others had to do quite a bit of optimization work to get the best performance out of the EPYC 7001 series."
"...to us it is simply is ridiculous that Intel expect enterprise users to cough up another few thousand dollars per CPU for a model that supports 2 TB,..."
Excess "is" and missing "s":
"...to us it is simply ridiculous that Intel expects enterprise users to cough up another few thousand dollars per CPU for a model that supports 2 TB,..."
"Although the 225W TDP CPUs needs extra heatspipes and heatsinks, there are still running on air cooling..."
Excess "s" and incorrect "there",
"Although the 225W TDP CPUs need extra heatspipes and heatsinks, they're still running on air cooling..."
"The Intel L3-cache keeps latency consistingy low as long as you stay within the L3-cache."
"consistently" not "consistingy":
"The Intel L3-cache keeps latency consistently low as long as you stay within the L3-cache."
"For example keeping a large part of the index in the cache improve performance..."
Missing comma and missing "s" (you might also consider making cache plural, but you seem to be talking strictly about the L3):
"For example, keeping a large part of the index in the cache improves performance..."
"That is a real thing is shown by the fact that Intel states that the OLTP hammerDB runs 60% faster on a 28-core Intel Xeon 8280 than on EPYC 7601."
Missing "it":
"That it is a real thing is shown by the fact that Intel states that the OLTP hammerDB runs 60% faster on a 28-core Intel Xeon 8280 than on EPYC 7601."
In general, the beginning of the sentence appears quite poorly worded. How about:
"That L3 cache latency is a matter for concern is shown by the fact that Intel states that the OLTP hammerDB runs 60% faster on a 28-core Intel Xeon 8280 than on EPYC 7601."
"In NPS4, the NUMA domains are reported to software in such a way as it chiplets always access the near (2 channels) DRAM."
Missing "s":
"In NPS4, the NUMA domains are reported to software in such a way as its chiplets always access the near (2 channels) DRAM."
"The fact that the EPYC 7002 has higher DRAM bandwidth is clearly visible."
Wrong numbers (maybe you meant the series?):
"The fact that the EPYC 7742 has higher DRAM bandwidth is clearly visible."
"...but show very significant improvements on EPYC 7002."
Wrong numbers (maybe you meant the series?):
"...but show very significant improvements on EPYC 7742."
"Using older garbage collector because they happen to better at Specjbb"
Badly worded.
"Using an older garbage collector because it happens to be better at Specjbb"
"For those with little time: at the high end with socketed x86 CPUs, AMD offers you up to 50 to 100% higher performance while offering a 40% lower price."
"Up to" requires 1 metric, not 2. Try:
"For those with little time: at the high end with socketed x86 CPUs, AMD offers you from 50 up to 100% higher performance while offering a 40% lower price."