Single-Threaded Integer Performance: SPEC CPU2006

Even though SPEC CPU2006 is more HPC and workstation oriented, it contains a good variety of integer workloads. Running SPEC CPU2006 is a good way to evaluate single-threaded (or single-core) performance. The main problem is that the submitted results are "overengineered", which makes fair comparisons very hard.

For that reason, we wanted to keep the settings as "real world" as possible. So we used:

  • 64-bit gcc 5.2.1: the most used compiler on Linux, a good all-round compiler that does not try to "break" benchmarks (libquantum...)
  • -Ofast: compiler optimization that many developers may use
  • -fno-strict-aliasing: necessary to compile some of the subtests
  • base run: every subtest is compiled in the same way.

The ultimate objective is to measure performance in applications where for some reason – as is frequently the case – a "multi-thread unfriendly" task keeps us waiting.

Here is the raw data. Perlbench failed to compile on Ubuntu 15.10, so we skipped it. Still, we are proud to present the very first SPEC CPU2006 benchmarks on Little Endian POWER8.

On the IBM server, numactl was used to physically bind the 2, 4, or 8 copies of SPEC CPU to the first 2, 4, or 8 threads of the first core. On the Intel server, the 2 copy benchmark was bound to the first core.
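The exact numactl invocation is not given here, so the following is only a minimal sketch of how such a binding could be built. It assumes a Linux system that exposes the hardware threads of core 0 in the sysfs file `thread_siblings_list`, and it uses numactl's `--physcpubind` option. The helper names (`parse_cpu_list`, `first_core_threads`, `numactl_prefix`) are hypothetical and chosen for illustration.

```python
from pathlib import Path

# Assumption: standard Linux sysfs location listing the hardware threads of core 0.
SIBLINGS_FILE = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-7" or "0,44" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def first_core_threads(copies, siblings_text=None):
    """Return the first `copies` hardware threads of the first core."""
    if siblings_text is None:
        siblings_text = Path(SIBLINGS_FILE).read_text()
    siblings = parse_cpu_list(siblings_text)
    if copies > len(siblings):
        raise ValueError("core 0 has only %d hardware threads" % len(siblings))
    return siblings[:copies]


def numactl_prefix(copies, siblings_text=None):
    """Build the numactl arguments that pin a run to those threads."""
    cpus = first_core_threads(copies, siblings_text)
    return ["numactl", "--physcpubind=" + ",".join(str(c) for c in cpus)]
```

For example, `numactl_prefix(4, "0-7")` yields `['numactl', '--physcpubind=0,1,2,3']`; the benchmark command itself (not shown) would be appended to that list. Reading the sibling list instead of hard-coding `0,1` matters because logical CPU numbering differs between platforms.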

SPEC CPU2006 integer scores. POWER8 = IBM POWER8, 10 cores at 3.5 GHz; Xeon = Xeon E5-2699 v4, 2.2-3.6 GHz.

| Subtest        | Application type      | POWER8 single thread | POWER8 SMT-2 | POWER8 SMT-4 | POWER8 SMT-8 | Xeon | Xeon (+HT) |
|----------------|-----------------------|----------------------|--------------|--------------|--------------|------|------------|
| 400.perlbench  | Spam filter           | N/A                  | N/A          | N/A          | N/A          | 32.2 | 36.6       |
| 401.bzip2      | Compress              | 17.5                 | 26.9         | 33.7         | 35.2         | 19.2 | 25.3       |
| 403.gcc        | Compiling             | 32.1                 | 44.6         | 56.6         | 61.5         | 28.9 | 33.3       |
| 429.mcf        | Vehicle scheduling    | 47.1                 | 50           | 64.1         | 73.5         | 39   | 43.9       |
| 445.gobmk      | Game AI               | 20.2                 | 31.3         | 41.4         | 43.1         | 22.4 | 27.7       |
| 456.hmmer      | Protein seq. analyses | 19.1                 | 27.1         | 28.6         | 22.5         | 24.2 | 28.4       |
| 458.sjeng      | Chess                 | 17.1                 | 25.4         | 32.6         | 33.1         | 24.8 | 28.3       |
| 462.libquantum | Quantum sim           | 44.7                 | 82.1         | 109          | 108          | 59.2 | 67.3       |
| 464.h264ref    | Video encoding        | 32.7                 | 45.4         | 53.3         | 48.8         | 40.7 | 40.7       |
| 471.omnetpp    | Network sim           | 23.5                 | 29.1         | 37.1         | 42.5         | 23.5 | 29.9       |
| 473.astar      | Pathfinding           | 16.5                 | 24.8         | 33.5         | 36.9         | 18.9 | 23.6       |
| 483.xalancbmk  | XML processing        | 24.9                 | 35.3         | 44.7         | 48.4         | 35.4 | 41.8       |

First we look at how well SMT-2, SMT-4 and SMT-8 work on the IBM POWER8.

SPEC CPU2006 integer performance relative to single-threaded operation (= 100%) on the IBM POWER8, 10 cores at 3.5 GHz.

| Subtest        | Application type      | Single thread | SMT-2 | SMT-4 | SMT-8 |
|----------------|-----------------------|---------------|-------|-------|-------|
| 400.perlbench  | Spam filter           | N/A           | N/A   | N/A   | N/A   |
| 401.bzip2      | Compress              | 100%          | 154%  | 193%  | 201%  |
| 403.gcc        | Compiling             | 100%          | 139%  | 176%  | 192%  |
| 429.mcf        | Vehicle scheduling    | 100%          | 106%  | 136%  | 156%  |
| 445.gobmk      | Game AI               | 100%          | 155%  | 205%  | 213%  |
| 456.hmmer      | Protein seq. analyses | 100%          | 142%  | 150%  | 118%  |
| 458.sjeng      | Chess                 | 100%          | 149%  | 191%  | 194%  |
| 462.libquantum | Quantum sim           | 100%          | 184%  | 244%  | 242%  |
| 464.h264ref    | Video encoding        | 100%          | 139%  | 163%  | 149%  |
| 471.omnetpp    | Network sim           | 100%          | 124%  | 158%  | 180%  |
| 473.astar      | Pathfinding           | 100%          | 150%  | 203%  | 224%  |
| 483.xalancbmk  | XML processing        | 100%          | 142%  | 180%  | 194%  |

The performance gains from single-threaded operation to two threads are very impressive, as expected. While Intel's SMT-2 (Hyper-Threading) improves performance by 10 to 25% in most subtests, the dual-threaded mode of the POWER8 boosts performance by 40 to 50% in most applications, more than twice the gain seen on the Xeon. Not one benchmark regresses when we throw 4 threads at the IBM POWER8 core. High-IPC benchmarks such as hmmer peak at SMT-4, but most subtests gain a few percent more when running 8 threads.
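As a rough cross-check, the scaling figures discussed above can be recomputed from the raw scores. The sketch below copies the published (rounded) scores into a dictionary; the names `SCORES`, `power8_scaling`, `xeon_ht_gain` and `best_smt_mode` are hypothetical helpers for illustration. Because the inputs are already rounded, a recomputed percentage can differ from the published one by a point.

```python
# Raw scores copied from the table, per subtest:
# (POWER8 single thread, SMT-2, SMT-4, SMT-8, Xeon, Xeon +HT)
SCORES = {
    "401.bzip2":      (17.5, 26.9, 33.7, 35.2, 19.2, 25.3),
    "403.gcc":        (32.1, 44.6, 56.6, 61.5, 28.9, 33.3),
    "429.mcf":        (47.1, 50.0, 64.1, 73.5, 39.0, 43.9),
    "445.gobmk":      (20.2, 31.3, 41.4, 43.1, 22.4, 27.7),
    "456.hmmer":      (19.1, 27.1, 28.6, 22.5, 24.2, 28.4),
    "458.sjeng":      (17.1, 25.4, 32.6, 33.1, 24.8, 28.3),
    "462.libquantum": (44.7, 82.1, 109.0, 108.0, 59.2, 67.3),
    "464.h264ref":    (32.7, 45.4, 53.3, 48.8, 40.7, 40.7),
    "471.omnetpp":    (23.5, 29.1, 37.1, 42.5, 23.5, 29.9),
    "473.astar":      (16.5, 24.8, 33.5, 36.9, 18.9, 23.6),
    "483.xalancbmk":  (24.9, 35.3, 44.7, 48.4, 35.4, 41.8),
}
SMT_MODES = ("SMT-2", "SMT-4", "SMT-8")


def percent_of(score, baseline):
    """Score as a whole-number percentage of the baseline."""
    return round(100 * score / baseline)


def power8_scaling(name):
    """POWER8 SMT-2/4/8 throughput relative to one thread (= 100)."""
    single, *smt = SCORES[name][:4]
    return [percent_of(s, single) for s in smt]


def xeon_ht_gain(name):
    """Percentage gain from enabling Hyper-Threading on the Xeon."""
    xeon, xeon_ht = SCORES[name][4:]
    return percent_of(xeon_ht, xeon) - 100


def best_smt_mode(name):
    """SMT mode with the highest POWER8 score for this subtest."""
    smt_scores = SCORES[name][1:4]
    return SMT_MODES[smt_scores.index(max(smt_scores))]


if __name__ == "__main__":
    for name in SCORES:
        print(name, power8_scaling(name), xeon_ht_gain(name), best_smt_mode(name))
```

Running the script prints, for each subtest, the three POWER8 scaling percentages, the Xeon Hyper-Threading gain, and the SMT mode at which the POWER8 score peaks.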

124 Comments

  • JohanAnandtech - Thursday, July 28, 2016 - link

    Ah, you will have to wait for the improved P8 which is the first Power going after HPC :-)
  • RISC is RISKY! - Tuesday, August 2, 2016 - link

    I would support "Brutalizer". Every processor has its strengths and weaknesses. If memory architecture is considered, for the same capacity, Intel's memory is congested, IBM's is very distributed and Oracle-Sun is something in between. So Intel will always have a memory B/W problem. IBM has a memory efficiency problem. Oracle in theory doesn't have a problem, but with 2 DIMMs per channel it looks like it does. Oracle-Sun is for highly branched workloads in the real world. Intel is for 1T/core, more single-threaded workloads, and IBM is for mixed workloads with 2T-4T/core priority. So supercomputing workloads will run fast on IBM now, compared to Intel and SPARC, while analytics, graph and other distributed workloads will run faster on SPARC M7 and S7 (although S7 is resource limited). For Intel, a soft mix of applications and a highly customized OS is better. Leave aside the business decisions and the sales price. List prices are twice as much as sales prices in the real world. These three processors (Xeon E5 v4, POWER8-9, SPARC M7-S7) are thoroughly tuned for different work spaces with very little overlap. So there's no point in comparing them other than their specs. It's like comparing a falcon and a lion and a swordfish. Their environments are different even though all of them hunt. That's the real world. So benchmarks are not the real proof. We at the university of IITD have lots and lots of Intel Xeon E5 v4, some P8 (10-15 single and dual sockets), and a very few (1-2 two socket M7 and 2 two socket S7). We run anything and everything on any of these we get our hands on. And this is the real world conclusion. So don't fight. It's a context centric supply.
  • RISC is RISKY! - Tuesday, August 2, 2016 - link

    of processors!
  • rootvgnet - Friday, August 12, 2016 - link

    Johan - interesting article, I enjoyed it - especially after I discovered how to get to the next page.

    As far as the comments go - 1) a good article will get a diverse response (from those with an open, read querying, mind).
    2) I agree with those who, in other words are saying: "there is no 'one size fits all'." And my gut reaction is that you are providing a level of detail that assists in determining which platform/processor "fits my need"

    Looking forward to part2.
