Sizing Up Servers: Intel's Skylake-SP Xeon versus AMD's EPYC 7000 - The Server CPU Battle of the Decade?
by Johan De Gelas & Ian Cutress on July 11, 2017 12:15 PM EST
Multi-core SPEC CPU2006
For the record, we do not believe that the SPEC CPU "Rate" metric has much value for estimating server CPU performance. Most applications do not run lots of completely separate processes in parallel; there is at least some interaction between the threads. But since the benchmark below caused so much discussion, we wanted to satisfy the curiosity of our readers.
Does the EPYC 7601 really have 47% more raw integer power? Let us find out. Please note, though, that you are looking at officially invalid base SPEC rate runs, as we still have to figure out how to tell the SPEC software that our "invalid" flag "-Ofast" is not invalid at all. We did run the required 3 iterations, though.
Subtest | Application type | Xeon E5-2699 v4 @ 2.8 GHz | Xeon 8176 @ 2.8 GHz | EPYC 7601 @ 2.7 GHz | EPYC vs Broadwell EP | EPYC vs Skylake SP |
400.perlbench | Spam filter | 1470 | 1980 | 2020 | +37% | +2% |
401.bzip2 | Compression | 860 | 1120 | 1280 | +49% | +14% |
403.gcc | Compiling | 960 | 1300 | 1400 | +46% | +8% |
429.mcf | Vehicle scheduling | 752 | 927 | 837 | +11% | -10% |
445.gobmk | Game AI | 1220 | 1500 | 1780 | +46% | +19% |
456.hmmer | Protein seq. analyses | 1220 | 1580 | 1700 | +39% | +8% |
458.sjeng | Chess | 1290 | 1570 | 1820 | +41% | +16% |
462.libquantum | Quantum sim | 545 | 870 | 1060 | +94% | +22% |
464.h264ref | Video encoding | 1790 | 2670 | 2680 | +50% | +0% |
471.omnetpp | Network sim | 625 | 756 | 705 (*) | +13% | -7% |
473.astar | Pathfinding | 749 | 976 | 1080 | +44% | +11% |
483.xalancbmk | XML processing | 1120 | 1310 | 1240 | +11% | -5% |
(*) We had to run 471.omnetpp with 64 threads on EPYC: when running at 128 threads, it gave errors. Once solved, we expect performance to be 10-20% higher.
Ok, first a disclaimer. The SPECint rate test is likely unrealistic. If you start up 88 to 128 instances, you create a massive bandwidth bottleneck and a consistent CPU load of 100%, neither of which is very realistic in most integer applications. There is no synchronization going on, so this is really the ideal case for a processor such as the AMD EPYC 7601. The rate test estimates more or less the peak integer crunching power available, ignoring many subtle scaling problems that most integer applications have.
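To make the "no synchronization" point concrete, here is a minimal sketch in Python of what a rate-style run amounts to: start N completely independent copies of a workload and time how long it takes until all of them finish. This is not the SPEC harness. The workload `busy_integer_work`, the helper names, and the scoring function are all hypothetical, and we simply assume a throughput score of the form copies × reference time / elapsed time.

```python
import multiprocessing as mp
import time


def busy_integer_work(iterations: int) -> int:
    """Hypothetical stand-in workload: a tight integer loop with no shared state."""
    total = 0
    for i in range(iterations):
        total = (total * 31 + i) % 1_000_003
    return total


def run_copies(copies: int, iterations: int) -> float:
    """Start `copies` independent processes; return wall-clock seconds until all finish."""
    procs = [
        mp.Process(target=busy_integer_work, args=(iterations,))
        for _ in range(copies)
    ]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # the only "coordination" is waiting for every copy to exit
    return time.perf_counter() - start


def rate_score(copies: int, reference_seconds: float, elapsed_seconds: float) -> float:
    """Assumed throughput-style score: more copies or less elapsed time raises it."""
    return copies * reference_seconds / elapsed_seconds


if __name__ == "__main__":
    copies = 4  # a real rate run would use one copy per hardware thread
    elapsed = run_copies(copies, 200_000)
    # The reference time of 1.0 s is an arbitrary placeholder, not a SPEC value.
    print(f"{copies} copies in {elapsed:.3f} s, score {rate_score(copies, 1.0, elapsed):.1f}")
```

Because the copies never talk to each other, the score scales with core count until memory bandwidth runs out, which is exactly why this kind of test flatters a high-core-count part.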
Nevertheless, AMD's claim was not far-fetched. On average, and using a "neutral" compiler with reasonable compiler settings, the EPYC 7601 has about 40% more "raw" integer processing power than the Xeon E5-2699 v4 (42% if you take into account that our omnetpp score will be higher once we fix the 128-instance issue), and it is even about 6% faster than the Xeon 8176. Don't expect those numbers to be reached in most real integer applications, though. Still, it shows how much progress AMD has made.
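The averages quoted above can be rechecked from the table with a few lines of Python. This is only a sketch: we assume a plain arithmetic mean of the per-subtest gains, and the 20% omnetpp uplift is a hypothetical value taken from the upper end of the 10-20% range in the footnote. The names `SCORES`, `pct_gain`, `average_gains` and `with_omnetpp_uplift` are ours.

```python
from statistics import mean

# (Xeon E5-2699 v4, Xeon 8176, EPYC 7601) scores copied from the table above.
SCORES = {
    "400.perlbench": (1470, 1980, 2020),
    "401.bzip2": (860, 1120, 1280),
    "403.gcc": (960, 1300, 1400),
    "429.mcf": (752, 927, 837),
    "445.gobmk": (1220, 1500, 1780),
    "456.hmmer": (1220, 1580, 1700),
    "458.sjeng": (1290, 1570, 1820),
    "462.libquantum": (545, 870, 1060),
    "464.h264ref": (1790, 2670, 2680),
    "471.omnetpp": (625, 756, 705),
    "473.astar": (749, 976, 1080),
    "483.xalancbmk": (1120, 1310, 1240),
}


def pct_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old` (negative if it is lower)."""
    return (new / old - 1.0) * 100.0


def average_gains(scores: dict) -> tuple:
    """Arithmetic mean of EPYC's per-subtest gain over Broadwell-EP and Skylake-SP."""
    vs_broadwell = mean(pct_gain(epyc, bdw) for bdw, _skl, epyc in scores.values())
    vs_skylake = mean(pct_gain(epyc, skl) for _bdw, skl, epyc in scores.values())
    return vs_broadwell, vs_skylake


def with_omnetpp_uplift(scores: dict, uplift: float) -> dict:
    """Copy of `scores` with the EPYC omnetpp result raised by a hypothetical fraction."""
    adjusted = dict(scores)
    bdw, skl, epyc = adjusted["471.omnetpp"]
    adjusted["471.omnetpp"] = (bdw, skl, epyc * (1.0 + uplift))
    return adjusted


if __name__ == "__main__":
    vs_bdw, vs_skl = average_gains(SCORES)
    print(f"EPYC vs Broadwell-EP: {vs_bdw:+.1f}%, vs Skylake-SP: {vs_skl:+.1f}%")
    adj_bdw, _ = average_gains(with_omnetpp_uplift(SCORES, 0.20))
    print(f"With a 20% omnetpp uplift, vs Broadwell-EP: {adj_bdw:+.1f}%")
```

Run as-is, the two averages land close to the roughly 40% and 6% figures in the text, and the adjusted Broadwell-EP figure comes out near 42%.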
219 Comments
psychobriggsy - Tuesday, July 11, 2017 - link
Indeed it is a ridiculous comment, and puts the earlier crying about the older Ubuntu and GCC into context - just an Intel Fanboy.
In fact Intel's core architecture is older, and GCC has been tweaked a lot for it over the years - a slightly old GCC might not get the best out of Skylake, but it will get a lot. Zen is a new core, and GCC has only recently got optimisations for it.
EasyListening - Wednesday, July 12, 2017 - link
I thought he was joking, but I didn't find it funny. So dumb.... makes me sad.
blublub - Tuesday, July 11, 2017 - link
I kinda miss Infinity Fabric on my Haswell CPU and it seems to only have on die - so why is that missing on Haswell when Ryzen is an exact copy?
blublub - Tuesday, July 11, 2017 - link
You actually sound similar to JuanRGA at SA
Kevin G - Wednesday, July 12, 2017 - link
@CajunArson The cache hierarchy is radically different between these designs as well as the port arrangement for dispatch. Scheduling on Ryzen is split between execution resources whereas Intel favors a unified approach.
bill.rookard - Tuesday, July 11, 2017 - link
Well, that is something that could be figured out if they (anandtech) had more time with the servers. Remember, they only had a week with the AMD system, and much like many of the games and such, optimizing is a matter of run test, measure, examine results, tweak settings, rinse and repeat. Considering one of the tests took 4 hours to run, having only a week to do this testing means much of the optimization is probably left out.
They went with a 'generic' set of relative optimizations in the interest of time, and these are the (very interesting) results.
CoachAub - Wednesday, July 12, 2017 - link
Benchmarks just need to be run on as level a field as possible. Intel has controlled the market so long, software leans their way. Who was optimizing for Opteron chips in 2016-17? ;)
theeldest - Tuesday, July 11, 2017 - link
The compiler used isn't meant to be the most optimized, but instead it's trying to be representative of actual customer workloads.
Most customer applications in normal datacenters (not google, aws, azure, etc) are running binaries that are many years behind on optimizations.
So, yes, they can get better performance. But using those optimizations is not representative of the market they're trying to show numbers for.
CajunArson - Tuesday, July 11, 2017 - link
That might make a tiny bit of sense if most of the benchmarks run were real-world workloads and not C-Ray or POV-Ray.
The most real-world benchmark in the whole setup was the database benchmark.
coder543 - Tuesday, July 11, 2017 - link
The one benchmark that favors Intel is the "most real-world"? Absolutely, I want AnandTech to do further testing, but your comments do not sound unbiased.