Multi-core SPEC CPU2006

For the record, we do not believe that the SPEC CPU "Rate" metric has much value for estimating server CPU performance. Most applications do not run lots of completely separate processes in parallel; there is at least some interaction between the threads. But since the benchmark below caused so much discussion, we wanted to satisfy the curiosity of our readers. 

Does the EPYC 7601 really have 47% more raw integer power? Let us find out. Please note, though, that you are looking at officially invalid base SPEC rate runs: we still have to figure out how to tell the SPEC software that our "invalid" flag "-Ofast" is not invalid at all. We did run the required 3 iterations, though.

| Subtest | Application type | Xeon E5-2699 v4 @ 2.8 GHz | Xeon 8176 @ 2.8 GHz | EPYC 7601 @ 2.7 GHz | EPYC vs Broadwell EP | EPYC vs Skylake SP |
|---|---|---|---|---|---|---|
| 400.perlbench | Spam filter | 1470 | 1980 | 2020 | +37% | +2% |
| 401.bzip2 | Compression | 860 | 1120 | 1280 | +49% | +14% |
| 403.gcc | Compiling | 960 | 1300 | 1400 | +46% | +8% |
| 429.mcf | Vehicle scheduling | 752 | 927 | 837 | +11% | -10% |
| 445.gobmk | Game AI | 1220 | 1500 | 1780 | +46% | +19% |
| 456.hmmer | Protein seq. analyses | 1220 | 1580 | 1700 | +39% | +8% |
| 458.sjeng | Chess | 1290 | 1570 | 1820 | +41% | +16% |
| 462.libquantum | Quantum sim | 545 | 870 | 1060 | +94% | +22% |
| 464.h264ref | Video encoding | 1790 | 2670 | 2680 | +50% | +0% |
| 471.omnetpp | Network sim | 625 | 756 | 705 (*) | +13% | -7% |
| 473.astar | Pathfinding | 749 | 976 | 1080 | +44% | +11% |
| 483.xalancbmk | XML processing | 1120 | 1310 | 1240 | +11% | -5% |

(*) We had to run 471.omnetpp with 64 threads on EPYC: when running at 128 threads, it gave errors. Once that is solved, we expect performance to be 10-20% higher.

OK, first a disclaimer. The SPECint rate test is likely unrealistic. If you start up 88 to 128 instances, you create a massive bandwidth bottleneck and a consistent CPU load of 100%, neither of which is very realistic for most integer applications. There is no synchronization going on, so this is really the ideal case for a processor such as the AMD EPYC 7601. The rate test estimates, more or less, the peak integer crunching power available, ignoring the many subtle scaling problems that most integer applications have.
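To make the "no synchronization" point concrete, here is a minimal sketch of how a rate-style score behaves. It assumes the usual SPEC CPU2006 definition as we understand it: each subtest's rate is the number of copies times the reference time divided by the elapsed time, and the overall score is the geometric mean of the subtest rates. The function names and all the timings below are hypothetical and only illustrate the arithmetic; they are not measurements from our runs.

```python
import math


def rate_ratio(copies, reference_seconds, elapsed_seconds):
    """Throughput ratio for one subtest: copies * reference time / elapsed time."""
    return copies * reference_seconds / elapsed_seconds


def overall_rate(ratios):
    """Geometric mean of the per-subtest throughput ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical subtest: 9000 s reference time, copies finish in 600 s.
    # Because the copies never wait on each other, doubling the copy count
    # doubles the score as long as the elapsed time stays the same.
    for copies in (64, 128):
        print(copies, "copies ->", rate_ratio(copies, 9000, 600))

    # Hypothetical overall score from three made-up subtest ratios.
    print("overall:", round(overall_rate([1000.0, 1500.0, 2000.0]), 1))
```

In a real application, locks, shared data and inter-thread communication would stretch the elapsed time as the copy count grows, which is exactly the effect this metric does not capture.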

Nevertheless, AMD's claim was not farfetched. On average, and using a "neutral" compiler with reasonable compiler settings, the EPYC 7601 has about 40% more "raw" integer processing power than the Xeon E5-2699 v4 (42% if you take into account that our omnetpp score will be higher once we have fixed the 128-instance issue), and is even about 6% faster than the Xeon 8176. Don't expect those numbers to be reached in most real integer applications, though. Still, it shows how much progress AMD has made.
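For readers who want to check the averages, the sketch below recomputes them from the scores in the table. The helper names are ours and purely illustrative. The `omnetpp_uplift` parameter is an assumption that models the 10-20% gain we expect for 471.omnetpp once the 128-instance run works. Both the arithmetic and the geometric mean are shown, since SPEC itself aggregates with a geometric mean.

```python
import math

# Scores transcribed from the table above:
# (subtest, Xeon E5-2699 v4, Xeon 8176, EPYC 7601)
SCORES = [
    ("400.perlbench", 1470, 1980, 2020),
    ("401.bzip2", 860, 1120, 1280),
    ("403.gcc", 960, 1300, 1400),
    ("429.mcf", 752, 927, 837),
    ("445.gobmk", 1220, 1500, 1780),
    ("456.hmmer", 1220, 1580, 1700),
    ("458.sjeng", 1290, 1570, 1820),
    ("462.libquantum", 545, 870, 1060),
    ("464.h264ref", 1790, 2670, 2680),
    ("471.omnetpp", 625, 756, 705),
    ("473.astar", 749, 976, 1080),
    ("483.xalancbmk", 1120, 1310, 1240),
]


def ratios(baseline, omnetpp_uplift=1.0):
    """EPYC score divided by the baseline score, one ratio per subtest.

    baseline is "broadwell" or "skylake". omnetpp_uplift scales the EPYC
    471.omnetpp score (assumption: models the expected 128-instance fix).
    """
    col = {"broadwell": 1, "skylake": 2}[baseline]
    out = []
    for row in SCORES:
        epyc = row[3] * (omnetpp_uplift if row[0] == "471.omnetpp" else 1.0)
        out.append(epyc / row[col])
    return out


def arithmetic_gain(rs):
    """Average advantage in percent, using the arithmetic mean of the ratios."""
    return (sum(rs) / len(rs) - 1.0) * 100.0


def geometric_gain(rs):
    """Average advantage in percent, using the geometric mean of the ratios."""
    return (math.exp(sum(math.log(r) for r in rs) / len(rs)) - 1.0) * 100.0


if __name__ == "__main__":
    cases = [
        ("EPYC vs Broadwell EP", ratios("broadwell")),
        ("EPYC vs Broadwell EP, omnetpp +20%", ratios("broadwell", 1.2)),
        ("EPYC vs Skylake SP", ratios("skylake")),
    ]
    for label, rs in cases:
        print(f"{label}: arithmetic {arithmetic_gain(rs):+.1f}%, "
              f"geometric {geometric_gain(rs):+.1f}%")
```

The arithmetic means land close to the 40%, 42% and 6% figures quoted above; the geometric means come out slightly lower, because the large 462.libquantum gain weighs less in a geometric mean.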

219 Comments

  • JohanAnandtech - Friday, July 21, 2017 - link

    Thanks! It was a challenge, and we will update this article later on, when better kernel support is available.
  • serendip - Tuesday, July 11, 2017 - link

    What idiot marketroid thought it was cool to have a huge list of SKUs and gimped "precious metals" branding? I'd like to see Epyc kicking Xeon butt simply because AMD has much more sensible product lists and there's not much gimping going on.
  • ParanoidFactoid - Tuesday, July 11, 2017 - link

    Reading through this, the takeaway seems thus. Epyc has latency concerns in communicating between CCX blocks, though this is true of all NUMA systems. If your application is latency sensitive, you either want a kernel that can dynamically migrate threads to keep them close to their memory channel - with an exposed API so applications can request migration. (Linux could easily do this, good luck convincing MS). OR, you take the hit. OR, you buy a monolithic die Intel solution for much more capital outlay. Further, the takeaway on Intel is, they have the better technology. But their market segmentation strategy is so confusing, and so limiting, it's near impossible to determine best cost/performance for your application. So you wind up spending more than expected anyway. AMD is much more open and clear about what they can and can't do. Intel expects to make their money by obfuscating as part of their marketing strategy. Finally, Intel can go 8 socket, so if you need that - say, high core low latency securities trading - they're the only game in town. Sun, Silicon Graphics, and IBM have all ceded that market.
  • msroadkill612 - Wednesday, July 12, 2017 - link

    "it's near impossible to determine best cost/performance for your application. So you wind up spending more than expected anyway. AMD is much more open and clear about what they can and can't do. Intel expects to make their money by obfuscating as part of their marketing strategy.

    Finally, Intel can go 8 socket, so if you need that - say, high core low latency securities trading - they're the only game in town. Sun, Silicon Graphics, and IBM have all ceded that market."

    & given time is money, & intel wastes customers' time, then intel is expensive.

    Those guys will go intel anyway, but just sayin, there is already talk of a 48 core zen cpu, making 98 cores on a mere 2p mobo.

    As i have posted b4, if wall street starts liking gpu compute for prompter answers, amdS monster apuS will be unanswerable.
  • nils_ - Wednesday, July 19, 2017 - link

    98 cores on a 2p mobo isn't quite right if you keep in mind that the 32 core versions already constitute a 4 CPU system, unless AMD somehow manages to get more cores on a single die.
  • nils_ - Wednesday, July 19, 2017 - link

    Good analysis, although Sun and IBM are still coming out with new CPUs and at least with IBM there is renewed interest in the POWER ecosystem.
  • eek2121 - Wednesday, July 12, 2017 - link

    , but rather AMD's spanking new EPYC server CPU. Both CPUs are without a doubt very different: micro architecture, ISA extentions, <snip>

    Should be extensions.
  • intelemployee2012 - Wednesday, July 12, 2017 - link

    After looking at the number of people who really do not fully understand the entire architecture and workloads and thinking that AMD Naples is superior because it has more cores, pci lanes etc is surprising.
    AMD made a 32 core server by gluing four 8core desktop dies whereas Intel has a single die balanced datacenter specific architecture which offers more perf if you make the entire Rack comparison. It's not the no of cores its the entire Rack which matters.
    Intel cores are superior to AMD's, so a 28 core xeon is equal to ~40 cores if you compare against a Ryzen core, so this whole 28core vs 32core is a marketing trick. Everyone thinks Intel is expensive but if you go by performance per dollar Intel has a cheaper option at every price point to match Naples without compromising perf/dollar.
    To be honest with so many Fabs, don't you think Intel is capable of gluing desktop dies to create a 32core, 64core or even 128core server (if it wants to) if that's the implementation style it needs to adopt like AMD?
    The problem these days is layman looks at just numbers but that's not how you compare.
  • sharath.naik - Wednesday, July 12, 2017 - link

    Agree, most who look at these numbers will walk away thinking AMD is doing well with EPYC. The article points out the approach to testing and also states the performance challenges with EPYC, which can be missed by those reading this review without the prior review on the older Xeons. For example the Big data test, I bet the newbies will walk away thinking EPYC beats the older XEONS E5 v4, as that's what the graphs show, without ever looking back at the numbers for a single 22 core Xeon e5 v4. So yes, a few back links in the article will be helpful.
  • warreo - Wednesday, July 12, 2017 - link

    Not a fanboi of either company, but care to elaborate more? I checked the original Xeon E5 v4 review. It shows that a single Xeon E5 v4 performs about 10% slower than a dual setup. Extrapolating that here, that means the single Xeon E5 v4 setup would be right around 4.5 jobs per day, which would make it roughly 50% slower than the dual Epyc and Xeon 8176.

    Sure, you could argue perf/dollar is better against a dual Epyc setup...but one could make the same argument against Intel's Skylake Xeons? I also wouldn't expect the performance to scale linearly anyway. Please let me know what I'm missing.
