AMD Rome Second Generation EPYC Review: 2x 64-core Benchmarked

Author: Johan De Gelas

by Johan De Gelas on August 7, 2019 7:00 PM EST


Single-Thread SPEC CPU2006 Estimates

While SPEC CPU2006 may have been superseded by SPEC CPU2017, we have built up a lot of experience with it. Considering the trouble we experienced with our datacenter infrastructure, it was our best first-round option for raw performance analysis.

Single-threaded performance remains very important, especially in maintenance and setup situations: running a massive bash script, trying out a very complex SQL query, or configuring new software. There are lots of times when a user simply does not use all the cores.

Even though SPEC CPU2006 is more HPC- and workstation-oriented, it contains a good variety of integer workloads. We believe we should mimic how performance-critical software is actually compiled instead of trying to achieve the highest scores. To that end, we:

use 64-bit gcc: by far the most used compiler on Linux for integer workloads, and a good all-round compiler that does not try to "break" benchmarks (libquantum...) or favor a certain architecture
use gcc versions 7.4 and 8.3: the standard compilers of Ubuntu 18.04 LTS and 19.04, respectively
use -Ofast -fno-strict-aliasing optimization: a good balance between performance and keeping things simple
add "-std=gnu89" to the portability settings, because some tests will not compile otherwise

The ultimate objective is to measure performance in applications that are not aggressively optimized, where, for some reason (as is frequently the case), a multi-thread-unfriendly task keeps us waiting. The disadvantage is that there are still quite a few situations where gcc generates suboptimal code, which causes quite a stir when our results are compared to ICC or AOCC results, as those compilers are tuned to look for specific optimizations in SPEC code.

First, the single-threaded results. Note that thanks to turbo technology, all CPUs run at higher clock speeds than their base clock speed.

The Xeon E5-2699 v4 ("Broadwell") can boost up to 3.6 GHz. Note: these are older results, compiled with GCC 5.4.
The Xeon 8176 ("Skylake-SP") can boost up to 3.8 GHz.
The EPYC 7601 ("Naples") can boost up to 3.2 GHz.
The EPYC 7742 ("Rome") can boost up to 3.4 GHz. Results are compiled with GCC 7.4 and 8.3.

Unfortunately, we could not test the Intel Xeon 8280 in time for this article. However, the Xeon 8280 should deliver very similar results; the main difference is that its turbo clock is 5% higher (4 GHz vs 3.8 GHz). We therefore expect its results to be 3-5% higher than those of the Xeon 8176.
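The arithmetic behind that estimate can be sketched in a few lines of Python. This is a rough check under a loudly optimistic assumption: that a score scales perfectly with the maximum turbo clock. The result is therefore a ceiling, not a prediction; memory-bound subtests gain less from a higher core clock, which is consistent with expecting 3-5% rather than the full clock uplift. The helper name `projected_upper_bound` is hypothetical.

```python
# Max turbo clocks (GHz) quoted in the text above.
XEON_8176_TURBO_GHZ = 3.8
XEON_8280_TURBO_GHZ = 4.0

# Clock uplift of the 8280 over the 8176, in percent (about 5.3%).
clock_gain_pct = (XEON_8280_TURBO_GHZ / XEON_8176_TURBO_GHZ - 1) * 100


def projected_upper_bound(score_8176: float) -> float:
    """Hypothetical helper: scale a Xeon 8176 score by the turbo-clock ratio.

    Assumes perfect clock scaling, so treat the result as a ceiling.
    """
    return score_8176 * XEON_8280_TURBO_GHZ / XEON_8176_TURBO_GHZ


if __name__ == "__main__":
    # 400.perlbench scored 46.4 on the Xeon 8176 in the results table.
    print(f"clock gain: {clock_gain_pct:.1f}%")
    print(f"perlbench ceiling: {projected_upper_bound(46.4):.1f}")
```

For perlbench this caps a hypothetical Xeon 8280 at roughly 48.8, versus 46.4 measured on the Xeon 8176.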

As per SPEC licensing rules, as these results have not been officially submitted to the SPEC database, we have to declare them as Estimated Results.

Subtest	Application Type	Xeon E5-2699 v4	EPYC 7601	Xeon 8176	EPYC 7742	EPYC 7742
Frequency		3.6 GHz	3.2 GHz	3.8 GHz	3.4 GHz	3.4 GHz
Compiler		gcc 5.4	gcc 7.4	gcc 7.4	gcc 7.4	gcc 8.3
400.perlbench	Spam filter	43.4	31.1	46.4	41.3	43.7
401.bzip2	Compression	23.9	24.0	27.0	26.7	27.2
403.gcc	Compiling	23.7	35.1	31.0	42.3	42.6
429.mcf	Vehicle scheduling	44.6	40.1	40.6	39.5	39.6
445.gobmk	Game AI	28.7	24.3	27.7	32.8	32.7
456.hmmer	Protein seq.	32.3	27.9	35.6	30.3	60.5
458.sjeng	Chess	33.0	23.8	32.8	27.7	27.6
462.libquantum	Quantum sim	97.3	69.2	86.4	72.7	72.3
464.h264ref	Video encoding	58.0	50.3	64.7	62.2	60.4
471.omnetpp	Network sim	44.5	23.0	37.9	23.0	23.0
473.astar	Pathfinding	26.1	19.5	24.7	25.4	25.4
483.xalancbmk	XML processing	64.9	35.4	63.7	48.0	47.8
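Because each CPU runs at a different turbo clock, dividing a score by the maximum turbo frequency gives a very rough per-clock comparison. The Python sketch below does this for a few subtests compiled with gcc 7.4; the Broadwell column is left out because it was compiled with gcc 5.4. It assumes each chip sustains its maximum single-core turbo for the whole run, which is not guaranteed, and a score per GHz is not the same thing as IPC, since DRAM latency does not scale with the core clock. The names `MAX_TURBO_GHZ`, `SCORES` and `score_per_ghz` are illustrative.

```python
# Max turbo clocks and gcc 7.4 scores taken from the table above.
MAX_TURBO_GHZ = {"EPYC 7601": 3.2, "Xeon 8176": 3.8, "EPYC 7742": 3.4}

SCORES = {
    "403.gcc":       {"EPYC 7601": 35.1, "Xeon 8176": 31.0, "EPYC 7742": 42.3},
    "445.gobmk":     {"EPYC 7601": 24.3, "Xeon 8176": 27.7, "EPYC 7742": 32.8},
    "471.omnetpp":   {"EPYC 7601": 23.0, "Xeon 8176": 37.9, "EPYC 7742": 23.0},
    "483.xalancbmk": {"EPYC 7601": 35.4, "Xeon 8176": 63.7, "EPYC 7742": 48.0},
}


def score_per_ghz(subtest: str) -> dict[str, float]:
    """Divide each CPU's score by its max turbo clock (a rough per-clock figure)."""
    return {cpu: score / MAX_TURBO_GHZ[cpu] for cpu, score in SCORES[subtest].items()}


if __name__ == "__main__":
    for subtest in SCORES:
        row = score_per_ghz(subtest)
        print(subtest, {cpu: round(value, 2) for cpu, value in row.items()})
```

On these numbers, the EPYC 7742 does more work per GHz than the Xeon 8176 in 403.gcc and 445.gobmk, while the Xeon stays ahead per clock in 471.omnetpp and 483.xalancbmk.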

Analyzing SPEC CPU results is always complicated, as the scores reflect both the code the compiler produces and the CPU architecture.

Subtest	Application Type	EPYC 7742 (2nd gen) vs 7601 (1st gen)	EPYC 7742 vs Xeon 8176 (Skylake-SP)	EPYC 7742: gcc 8.3 vs 7.4
400.perlbench	Spam filter	+33%	-11%	+6%
401.bzip2	Compression	+11%	-1%	+2%
403.gcc	Compiling	+21%	+28%	+1%
429.mcf	Vehicle scheduling	-1%	-3%	0%
445.gobmk	Game AI	+35%	+18%	+0%
456.hmmer	Protein seq. analyses	+9%	-15%	+100%
458.sjeng	Chess	+16%	-16%	-1%
462.libquantum	Quantum sim	+5%	-16%	-1%
464.h264ref	Video encoding	+24%	-4%	-3%
471.omnetpp	Network sim	+0%	-39%	0%
473.astar	Pathfinding	+30%	+3%	0%
483.xalancbmk	XML processing	+36%	-25%	0%
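The generation-over-generation column can be reproduced from the first table with a short Python sketch: each entry is the EPYC 7742 score divided by the EPYC 7601 score (both compiled with gcc 7.4), minus one, rounded to a whole percent. The dictionary and function names are illustrative, not part of any SPEC tooling.

```python
# gcc 7.4 scores for both EPYC generations, copied from the first table.
EPYC_7601 = {
    "400.perlbench": 31.1, "401.bzip2": 24.0, "403.gcc": 35.1, "429.mcf": 40.1,
    "445.gobmk": 24.3, "456.hmmer": 27.9, "458.sjeng": 23.8, "462.libquantum": 69.2,
    "464.h264ref": 50.3, "471.omnetpp": 23.0, "473.astar": 19.5, "483.xalancbmk": 35.4,
}
EPYC_7742 = {
    "400.perlbench": 41.3, "401.bzip2": 26.7, "403.gcc": 42.3, "429.mcf": 39.5,
    "445.gobmk": 32.8, "456.hmmer": 30.3, "458.sjeng": 27.7, "462.libquantum": 72.7,
    "464.h264ref": 62.2, "471.omnetpp": 23.0, "473.astar": 25.4, "483.xalancbmk": 48.0,
}


def pct_change(new: float, old: float) -> int:
    """Relative change of `new` over `old`, rounded to a whole percent."""
    return round((new / old - 1) * 100)


def gen_on_gen() -> dict[str, int]:
    """Percent change of the EPYC 7742 over the EPYC 7601 for every subtest."""
    return {name: pct_change(EPYC_7742[name], EPYC_7601[name]) for name in EPYC_7601}


if __name__ == "__main__":
    for name, delta in gen_on_gen().items():
        print(f"{name:16s} {delta:+d}%")
```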

First of all, the most interesting data point is that gcc 8 seems to generate much better code for the EPYC processors in some subtests: hmmer doubles its score, and perlbench gains 6%. We repeated the single-threaded test three times, and the rate numbers confirm it: the results are very consistent.

hmmer is one of the more branch-intensive benchmarks. The two other workloads where branch prediction has a larger impact (a somewhat higher percentage of branch misses), gobmk and sjeng, perform consistently better on the second-generation EPYC (+35% and +16%) with its new TAGE predictor.
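To illustrate why a predictor that correlates with longer global history helps branchy code, here is a toy TAGE-style predictor in Python. It is a teaching sketch under heavy assumptions, not a description of AMD's Zen 2 predictor: the four history lengths (2, 4, 8, 16), table sizes, hash functions and counter widths are arbitrary choices, and it omits usefulness counters and alternate predictions. The core idea is kept: a bimodal base table plus tagged tables indexed with geometrically growing history lengths, where the longest matching table provides the prediction and a misprediction allocates an entry in a longer-history table.

```python
class ToyTAGE:
    """Toy TAGE-style predictor (NOT AMD's design): a bimodal base table plus
    tagged tables indexed with geometrically growing global-history lengths."""

    def __init__(self, hist_lengths=(2, 4, 8, 16), index_bits=10, tag_bits=8):
        self.hist_lengths = tuple(hist_lengths)
        self.size = 1 << index_bits
        self.tag_mask = (1 << tag_bits) - 1
        self.base = [2] * self.size  # 2-bit counters: >= 2 means taken
        # Tagged entries are [tag, 3-bit counter]; None means empty.
        self.tables = [[None] * self.size for _ in self.hist_lengths]
        self.hist_mask = (1 << max(self.hist_lengths, default=0)) - 1
        self.history = 0  # global history, newest outcome in bit 0

    def _hist(self, i):
        return self.history & ((1 << self.hist_lengths[i]) - 1)

    def _index(self, pc, i):
        h = self._hist(i)
        return (pc ^ h ^ (h >> 3) ^ (i * 0x9E37)) % self.size

    def _tag(self, pc, i):
        h = self._hist(i)
        return (pc ^ (h * 31) ^ (h >> 5)) & self.tag_mask

    def _provider(self, pc):
        """Longest-history table with a matching tag, or (None, None)."""
        for i in reversed(range(len(self.tables))):
            entry = self.tables[i][self._index(pc, i)]
            if entry is not None and entry[0] == self._tag(pc, i):
                return i, entry
        return None, None

    def predict(self, pc):
        _, entry = self._provider(pc)
        if entry is not None:
            return entry[1] >= 4  # 3-bit counter: >= 4 means taken
        return self.base[pc % self.size] >= 2

    def update(self, pc, taken):
        i, entry = self._provider(pc)
        predicted = self.predict(pc)
        if entry is not None:
            entry[1] = min(7, entry[1] + 1) if taken else max(0, entry[1] - 1)
        else:
            b = pc % self.size
            self.base[b] = min(3, self.base[b] + 1) if taken else max(0, self.base[b] - 1)
        if predicted != taken:
            # Allocate in the next longer-history table. Simplification: without
            # usefulness counters, an existing entry is simply overwritten.
            j = 0 if i is None else i + 1
            if j < len(self.tables):
                self.tables[j][self._index(pc, j)] = [self._tag(pc, j), 4 if taken else 3]
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


def accuracy(predictor, pattern, pc=0x40, reps=2000):
    """Fraction of correct predictions over the second half of a repeated pattern."""
    outcomes = [c == "T" for c in pattern] * reps
    correct = []
    for taken in outcomes:
        correct.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    tail = correct[len(correct) // 2:]  # skip the warm-up half
    return sum(tail) / len(tail)


if __name__ == "__main__":
    for pattern in ("TN", "TTTN"):
        print(pattern, "bimodal:", accuracy(ToyTAGE(hist_lengths=()), pattern),
              "toy TAGE:", accuracy(ToyTAGE(), pattern))
```

On a branch that alternates (TN) or one that is taken three times and then not taken (TTTN), the bimodal counter alone reaches only 50% and 75% accuracy, while the tagged tables learn both patterns completely once warmed up.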

Why the low-IPC omnetpp ("network sim") does not show any improvement is a mystery to us; we expected the larger L3 cache to help. However, this test loves very large caches, and as a result the Intel Xeons, with 38.5 to 55 MB of L3, have the advantage.

The video encoding benchmark "h264ref" also relies somewhat on the L3 cache, but it relies much more on DRAM bandwidth. The higher DRAM bandwidth of the EPYC 7002 is clearly visible: h264ref scores 24% higher than on the EPYC 7601.

The pointer-chasing benchmarks, XML processing (xalancbmk) and pathfinding (astar), performed poorly on the previous EPYC generation compared to the Xeons, but show very significant improvements on the EPYC 7002: +36% and +30%, respectively.
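Pointer chasing means that each load address comes from the previous load, so the CPU cannot start the next memory access until the current one completes. The Python sketch below shows that dependency pattern in its simplest form: a next-index array arranged as one random cycle (built with Sattolo's algorithm), similar in spirit to how memory-latency microbenchmarks defeat stride prefetchers. It is an illustration only, not code from xalancbmk or astar, and interpreted Python will not expose cache or DRAM latency the way a compiled loop would. The function names are hypothetical.

```python
import random


def make_pointer_chain(n: int, seed: int = 0) -> list[int]:
    """Return next_idx such that i -> next_idx[i] visits all n slots in a
    single random cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees one cycle of length n
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def chase(next_idx: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain: every index depends on the previous load's result."""
    i = start
    for _ in range(steps):
        i = next_idx[i]
    return i


if __name__ == "__main__":
    chain = make_pointer_chain(1 << 16, seed=42)
    # After exactly n steps the walk is back at the start.
    print(chase(chain, len(chain)))
```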


sing_electric - Thursday, August 8, 2019
Not just Netburst - remember, Intel's plans were ORIGINALLY for Itanium to migrate down through the stack, ending up in consumer machines. Two massively costly mistakes when it came to planning the future of CPUs. Honestly, I hope Intel properly compensated the team behind the P6, an architecture so good that it was essentially brought back, years after its release, after those two failures.

OTOH, it's kind of amazing that AMD survived the Bulldozer years, since their margin for error is much smaller than Intel's. Good thing they bought ATI, since I'm not sure the company survives without the money they made from graphics cards and consoles...
JohanAnandtech - Thursday, August 8, 2019
Thank you for the kudos and sympathy. It was indeed hot! At 39°C/102°F, the server was off.

I agree - I too admire the no-nonsense leadership of Lisa Su: focus, careful execution and a customer-centric approach.
WaltC - Thursday, August 8, 2019
AMD has proven once again that Intel can be beaten, and soundly, too...;) The myth of the indestructible Intel is forever shattered, and Intel's CPU architectures are so old they creak and are riddled with holes, imo. Where would Intel have put us, if there'd been no AMD? You like RDRAM, you like Itanium, just for starters? You like paying through the nose? That's where Intel wanted to go in its never-ending quest to monopolize the market! AMD stopped all of that by offering an alternative path the market was happy to take--a path that didn't involve emulators and tossing out your software library just to give Intel a closed bus! Intel licensed AMD's x86-64, among other things--and they flourished when AMD dropped the ball. I chalk all that up to AMD going through a succession of horrible CEOs--people who literally had no clue! Remember the guy who ran AMD for a while who concluded it made sense for AMD to sell Intel servers...!? Man, I thought AMD was probably done! There's just no substitute for first-class management at the top--Su was the beginning of the AMD renaissance! Finally! As a chip manufacturer, Intel will either learn how to exist in a competitive market or the company over time will simply fade away. I often get the feeling that Intel these days is more interested in the financial services markets than in the computer hardware markets. While Intel was busy milking its older architectures and raking in the dough, AMD was busy becoming a real competitor once again! What a difference the vision at the top, or the lack of it, makes.
aryonoco - Thursday, August 8, 2019
That dude was Rory Read, and while the SeaMicro acquisition didn't work out, he did some great work and restructured AMD and in many ways saved the company while dealing with the Bulldozer disaster.

Rory stabilized the company's finances by lowering costs by over 30% and created the semi-custom division that enabled AMD to win the contracts for both the Xbox and PS4, creating a stable stream of revenue. Of course, Rory's greatest accomplishment was hiring Lisa Su and then grooming her to become the CEO.

Rory was a transitional CEO and he did exactly what was required of him. If there is a CEO that should be blamed for AMD's woes, it's Dirk Meyer.
aryonoco - Thursday, August 8, 2019
Forgot to mention, Rory also hired Jim Keller to design K12, and in effect he started the project that would later become Zen.

Of course Lisa deserves all the glory from then on. She has been an exceptional leader, bringing focus and excelling at execution, things that AMD always traditionally lacked.
tamalero - Sunday, August 11, 2019
I'd blame Hector Ruiz first.
It was his crown to lose during the Athlon 64 era, and he simply didn't have anything to show for it, making the Athlon 64 core architecture a one-hit wonder for more than a decade.
MarcusTaz - Wednesday, August 7, 2019
Another site's article that starts with an F stated that Rome runs hot and uses 1.4 volts, above TSMC's recommended 1.3 volts. Did you need to run 1.4 volts for these tests?
evernessince - Wednesday, August 7, 2019
Well, first, that 1.3 V figure is from TSMC's mobile-focused 7nm LPP node. Zen 2 is made on the high-performance 7nm node, not the mobile-focused LPP. Whatever publication you read didn't do their homework. TSMC has not published information on their high-performance node, and I think it rather arrogant to give AMD an F based on an assumption. As if AMD engineers are stupid enough to put dangerous voltages through their CPUs that would result in a company-sinking lawsuit. It makes zero sense.

FYI, all AMD 3000 series processors go up to 1.4 V stock. Given that these are server processors, they will run hot. After all, more cores = more heat. It's the exact same situation for Intel server processors. The only difference here is that AMD is providing 50 - 100% more performance in the same or less power consumption at 40% less cost.
DigitalFreak - Thursday, August 8, 2019
You reading Fudzilla?
Kevin G - Wednesday, August 7, 2019
AMD is back. They have the performance crown again and have decided to lap the competition with what can be described as an embarrassing price/performance comparison to Intel. The only thing they need to do is be able to meet demand.

One thing I wish they had done is add quad-socket support. Due to the topology necessary, intersocket bandwidth would be a concern at higher core counts, but if you just need lots of memory, those low-end 8-core chips would have been fine (think memcache or bulk NVMe storage).

With the topology improvements, I also would have liked AMD to try something creative: a quad chip + low-clocked/low-voltage Vega 20 in the same package, all linked together via Infinity Fabric. That would be something stunning for HPC compute. I do see AMD releasing some GPU in a server socket at some point for this market, as things have been aligning in this direction for some time.

Supporting something like CCIX or OpenCAPI also would have been nice. A nod toward my previous point, even enabling Infinity Fabric to Vega 20 compute cards instead of PCIe 4.0 would have been yet another big step for AMD as that'd permit full coherency between the two chips without additional overhead.

I think it would be foolish to ignore AVX-512 for Zen 3, even if the hardware they run it on continues to use 256-bit-wide SIMD units. ISA parity is important even if it doesn't inherently show much of a performance gain (though considering the clock speed drops seen in Skylake-SP, if AMD could support AVX-512 at the clocks it is able to sustain with AVX2 on Zen 2, it might pull off an overall throughput win).

With regard to Intel, they have Cooper Lake due later this year. If Intel were wise, they'd use that as a means to realign their pricing structure and ditch the memory capacity premium. Everything else Intel can do in the short term is flex their strong packaging techniques and push integrated accelerators: on-package fabric, FPGAs, Optane DIMMs, etc. Intel can occupy several lucrative niches in important, growing fields with what they have in-house right now, but they need to get them to market and at competitive prices. Otherwise, it is AMD's game for the next 12 to 15 months until Ice Lake-SP arrives to bring back the competitive landscape. It isn't even certain that Intel can score a clean win either, as Zen 3-based chips may start to arrive in the same time frame.
