SMT Integer Performance With SPEC CPU2006

Next, to measure the performance impact of simultaneous multithreading (SMT), we run two threads on the same core. This way we can evaluate how well each core handles SMT.

| Subtest | Application type | Xeon E5-2690 @ 3.8 GHz | Xeon E5-2690 v3 @ 3.5 GHz | Xeon E5-2699 v4 @ 3.6 GHz | EPYC 7601 @ 3.2 GHz | Xeon 8176 @ 3.8 GHz |
|---|---|---|---|---|---|---|
| 400.perlbench | Spam filter | 39.8 | 43.9 | 47.2 | 40.6 | 55.2 |
| 401.bzip2 | Compression | 32.6 | 32.3 | 32.8 | 33.9 | 34.8 |
| 403.gcc | Compiling | 40.7 | 43.8 | 32.5 | 41.6 | 32.1 |
| 429.mcf | Vehicle scheduling | 44.7 | 51.3 | 55.8 | 44.2 | 56.6 |
| 445.gobmk | Game AI | 36.6 | 35.9 | 38.1 | 36.4 | 39.4 |
| 456.hmmer | Protein seq. analyses | 32.5 | 34.1 | 40.9 | 34.9 | 44.3 |
| 458.sjeng | Chess | 36.4 | 36.9 | 39.5 | 36 | 41.9 |
| 462.libquantum | Quantum sim | 75 | 73.4 | 89 | 89.2 | 91.7 |
| 464.h264ref | Video encoding | 52.4 | 58.2 | 58.5 | 56.1 | 75.3 |
| 471.omnetpp | Network sim | 25.4 | 30.4 | 48.5 | 26.6 | 42.1 |
| 473.astar | Pathfinding | 31.4 | 33.6 | 36.6 | 29 | 37.5 |
| 483.xalancbmk | XML processing | 43.7 | 53.7 | 78.2 | 37.8 | 78 |

Expressed as a percentage of the single-threaded results, the same scores show how much performance we gained from enabling SMT:

| Subtest | Application type | Xeon E5-2699 v4 @ 3.6 GHz | EPYC 7601 @ 3.2 GHz | Xeon 8176 @ 3.8 GHz |
|---|---|---|---|---|
| 400.perlbench | Spam filter | 109% | 131% | 110% |
| 401.bzip2 | Compression | 137% | 141% | 128% |
| 403.gcc | Compiling | 137% | 119% | 131% |
| 429.mcf | Vehicle scheduling | 125% | 110% | 131% |
| 445.gobmk | Game AI | 125% | 150% | 127% |
| 456.hmmer | Protein seq. analyses | 127% | 125% | 125% |
| 458.sjeng | Chess | 120% | 151% | 125% |
| 462.libquantum | Quantum sim | 91% | 129% | 90% |
| 464.h264ref | Video encoding | 101% | 112% | 112% |
| 471.omnetpp | Network sim | 109% | 116% | 103% |
| 473.astar | Pathfinding | 140% | 149% | 137% |
| 483.xalancbmk | XML processing | 120% | 107% | 116% |
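
To make the relationship between the two tables concrete, here is a minimal sketch that backs out the single-threaded score implied by an SMT score and its percentage. The function name is hypothetical, and because the published percentages are rounded, the result is only an approximation of the actual single-threaded figure.

```python
def implied_single_thread_score(smt_score, smt_percent):
    """Back out the single-threaded score implied by an SMT score and its
    percentage of the single-threaded result (hypothetical helper)."""
    return smt_score / (smt_percent / 100)

# 400.perlbench on the EPYC 7601: 40.6 with two threads, listed as 131%
# of the single-threaded result. The percentage is rounded, so this is
# an estimate only.
print(round(implied_single_thread_score(40.6, 131), 1))
```

Read the other way around, each percentage in the table is the two-thread score divided by the single-threaded score, multiplied by 100.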

On average, both Xeons pick up about 20% due to SMT (Hyper-Threading). The EPYC 7601 improves even more, with a 28% boost on average. There are many possible explanations for this, but two are the most likely. First, where AMD's single-threaded IPC is very low because the core is waiting on the high latency of a more distant L3 cache (>8 MB), a second thread makes sure that the CPU resources are put to better use (as in compression and the network sim). Second, we saw that the AMD core is capable of extracting more memory bandwidth in lightly threaded scenarios. This might help in the benchmarks that stress the DRAM (like video encoding and the quantum sim).
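
The averages quoted above can be checked against the percentage table with a short script. This is a sketch that assumes a plain arithmetic mean over the twelve subtests; the article does not say which kind of average was used, and the function and variable names are illustrative.

```python
from statistics import mean

# SMT throughput as a percentage of single-threaded throughput,
# copied from the percentage table (one value per SPEC CPU2006 subtest).
smt_scaling = {
    "Xeon E5-2699 v4": [109, 137, 137, 125, 125, 127, 120, 91, 101, 109, 140, 120],
    "EPYC 7601": [131, 141, 119, 110, 150, 125, 151, 129, 112, 116, 149, 107],
    "Xeon 8176": [110, 128, 131, 131, 127, 125, 125, 90, 112, 103, 137, 116],
}

def average_smt_gain(percentages):
    """Arithmetic-mean gain over single-threaded, in percentage points.
    Assumption: the article's averages are arithmetic means."""
    return mean(percentages) - 100

for cpu, values in smt_scaling.items():
    print(f"{cpu}: {average_smt_gain(values):+.1f}%")
```

Under that assumption the two Xeons land at roughly 20% and the EPYC 7601 at roughly 28%, in line with the figures in the text.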

Nevertheless, kudos to the AMD engineers. Their first SMT implementation is very well done and offers a tangible throughput increase. 


219 Comments


  • msroadkill612 - Wednesday, July 12, 2017 - link

    It looks interesting. Do you have a point?

    Are you saying they have a place in this EPYC debate? Using cheaper DDR3 RAM on EPYC?
  • yuhong - Friday, July 14, 2017 - link

    "We were told from Intel that ‘only 0.5% of the market actually uses those quad ranked and LR DRAMs’, "
  • intelemployee2012 - Wednesday, July 12, 2017 - link

    what kind of a forum and website is this? we can't delete the account, cannot edit a comment for fixing typos, cannot edit username, cannot contact an admin if we need to report something. Will never use these websites from now on.
  • Ryan Smith - Wednesday, July 12, 2017 - link

    "what kind of a forum and website is this?"

    The basic kind. It's not meant to be a replacement for forums, but rather a way to comment on the article. Deleting/editing comments is specifically not supported to prevent people from pulling Reddit-style shenanigans. The idea is that you post once, and you post something meaningful.

    As for any other issues you may have, you are welcome to contact me directly.
  • Ranger1065 - Thursday, July 13, 2017 - link

    That's a relief :)
  • iwod - Wednesday, July 12, 2017 - link

    I can't believe what I just read. While I knew Zen was good for desktop, I expected the battle to be in Intel's favour on the server, since Intel has had years to tune and work on those workloads. But instead, we have a much CHEAPER AMD CPU that performs better, the same, or slightly worse in several cases, using much LOWER energy during the workload, while using a less advanced 14nm node compared to Intel!

    And NO word on stability problems from running these tests on AMD. This is like Athlon 64 all over again!
  • pSupaNova - Wednesday, July 12, 2017 - link

    Yes it is.

    But this time much worse for Intel with their manufacturing lead shrinking along with their workforce.
  • Shankar1962 - Wednesday, July 12, 2017 - link

    The competition has spoiled the naming convention: Intel's 14nm equals the competition's 7nm or 10nm.
    Intel publicly challenged everyone to revisit the metrics and no one responded.
    Can we discuss the yield, density and scaling metrics? Intel used to maintain a 2-year lead and has now grown that to a 3-4 year lead.
    Because it's a vertically integrated company, it looks like Intel vs. the rest of the world, and yet their revenue and profits grow year over year.
  • iwod - Thursday, July 13, 2017 - link

    Grew to 3 - 4 years? Intel is shipping 10nm early next year in some laptop segment, TSMC is shipping 7nm Apple SoC in 200M yearly unit quantity starting next September.

    If anything, the gap has now shrunk from 2-3 years to 1-1.5 years.
  • Shankar1962 - Thursday, July 13, 2017 - link

    Yeah, 1-1.5 years if we cheat the metrics when comparing.
    2-3 years if we look at the metrics accurately.
    A process node shrink is compared by metrics like yield, cost, scaling, density, etc.
    7nm, 10nm, etc. is just a name.
