Assessing IBM's POWER8, Part 1: A Low Level Look at Little Endian
by Johan De Gelas on July 21, 2016 8:45 AM EST
Single-Threaded Integer Performance: SPEC CPU2006
Even though SPEC CPU2006 is more HPC and workstation oriented, it contains a good variety of integer workloads. Running SPEC CPU2006 is a good way to evaluate single-threaded (or single-core) performance. The main problem is that the submitted results are "overengineered", which makes fair comparisons very hard.
For that reason, we wanted to keep the settings as "real world" as possible. So we used:
- 64 bit gcc 5.2.1: most used compiler on Linux, good all round compiler that does not try to "break" benchmarks (libquantum...)
- -Ofast: compiler optimization that many developers may use
- -fno-strict-aliasing: necessary to compile some of the subtests
- base run: every subtest is compiled in the same way.
The ultimate objective is to measure performance in applications where for some reason – as is frequently the case – a "multi-thread unfriendly" task keeps us waiting.
Here is the raw data. Perlbench failed to compile on Ubuntu 15.10, so we skipped it on the IBM server. Still, we are proud to present the very first SPEC CPU2006 benchmarks on Little Endian POWER8.
On the IBM server, numactl was used to physically bind the 2, 4, or 8 copies of SPEC CPU to the first 2, 4, or 8 threads of the first core. On the Intel server, the 2 copy benchmark was bound to the first core.
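The binding described above can be sketched as follows. This is a minimal illustration, not the exact command line used for the article: it assumes that the hardware threads of the first POWER8 core show up as logical CPUs 0 to 7 when SMT-8 is enabled, and `./run_spec_copy.sh` is a hypothetical placeholder for whatever launches one benchmark copy. Only the `numactl --physcpubind` option itself is taken as given.

```python
import shlex


def numactl_command(cpus, workload):
    """Build a numactl invocation that pins `workload` to the given logical CPUs."""
    cpu_list = ",".join(str(c) for c in cpus)
    return ["numactl", f"--physcpubind={cpu_list}"] + list(workload)


# Assumption: the first core's hardware threads are logical CPUs 0..7.
# "./run_spec_copy.sh" is a hypothetical launcher, not a real SPEC tool.
for copies in (2, 4, 8):
    cmd = numactl_command(range(copies), ["./run_spec_copy.sh"])
    print(shlex.join(cmd))
```

On an x86 machine the two hardware threads of the first core are often not numbered consecutively, so the CPU list should be checked against `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list` rather than assumed.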
Subtest SPEC CPU2006 Integer | Application Type | IBM POWER8 10c@3.5 GHz Single Thread | IBM POWER8 10c@3.5 GHz SMT-2 | IBM POWER8 10c@3.5 GHz SMT-4 | IBM POWER8 10c@3.5 GHz SMT-8 | Xeon E5-2699 v4 2.2-3.6 GHz | Xeon E5-2699 v4 2.2-3.6 GHz (+HT) |
400.perlbench | Spam filter | N/A | N/A | N/A | N/A | 32.2 | 36.6 |
401.bzip2 | Compress | 17.5 | 26.9 | 33.7 | 35.2 | 19.2 | 25.3 |
403.gcc | Compiling | 32.1 | 44.6 | 56.6 | 61.5 | 28.9 | 33.3 |
429.mcf | Vehicle scheduling | 47.1 | 50 | 64.1 | 73.5 | 39 | 43.9 |
445.gobmk | Game AI | 20.2 | 31.3 | 41.4 | 43.1 | 22.4 | 27.7 |
456.hmmer | Protein seq. analyses | 19.1 | 27.1 | 28.6 | 22.5 | 24.2 | 28.4 |
458.sjeng | Chess | 17.1 | 25.4 | 32.6 | 33.1 | 24.8 | 28.3 |
462.libquantum | Quantum sim | 44.7 | 82.1 | 109 | 108 | 59.2 | 67.3 |
464.h264ref | Video encoding | 32.7 | 45.4 | 53.3 | 48.8 | 40.7 | 40.7 |
471.omnetpp | Network sim | 23.5 | 29.1 | 37.1 | 42.5 | 23.5 | 29.9 |
473.astar | Pathfinding | 16.5 | 24.8 | 33.5 | 36.9 | 18.9 | 23.6 |
483.xalancbmk | XML processing | 24.9 | 35.3 | 44.7 | 48.4 | 35.4 | 41.8 |
First we look at how well SMT-2, SMT-4 and SMT-8 work on the IBM POWER8.
Subtest SPEC CPU2006 Integer | Application Type | IBM POWER8 10c@3.5 GHz Single Thread | IBM POWER8 10c@3.5 GHz SMT-2 | IBM POWER8 10c@3.5 GHz SMT-4 | IBM POWER8 10c@3.5 GHz SMT-8 |
400.perlbench | Spam filter | N/A | N/A | N/A | N/A |
401.bzip2 | Compress | 100% | 154% | 193% | 201% |
403.gcc | Compiling | 100% | 139% | 176% | 192% |
429.mcf | Vehicle scheduling | 100% | 106% | 136% | 156% |
445.gobmk | Game AI | 100% | 155% | 205% | 213% |
456.hmmer | Protein seq. analyses | 100% | 142% | 150% | 118% |
458.sjeng | Chess | 100% | 149% | 191% | 194% |
462.libquantum | Quantum sim | 100% | 184% | 244% | 242% |
464.h264ref | Video encoding | 100% | 139% | 163% | 149% |
471.omnetpp | Network sim | 100% | 124% | 158% | 180% |
473.astar | Pathfinding | 100% | 150% | 203% | 224% |
483.xalancbmk | XML processing | 100% | 142% | 180% | 194% |
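The scaling percentages are simply each score divided by the single-thread score of the same subtest. A minimal sketch of that arithmetic, using the raw POWER8 scores from the first table (the names `POWER8_SCORES` and `smt_scaling` are introduced here purely for illustration):

```python
# Raw scores from the first table: (single thread, SMT-2, SMT-4, SMT-8).
POWER8_SCORES = {
    "401.bzip2": (17.5, 26.9, 33.7, 35.2),
    "403.gcc": (32.1, 44.6, 56.6, 61.5),
    "429.mcf": (47.1, 50.0, 64.1, 73.5),
    "445.gobmk": (20.2, 31.3, 41.4, 43.1),
    "456.hmmer": (19.1, 27.1, 28.6, 22.5),
    "458.sjeng": (17.1, 25.4, 32.6, 33.1),
    "462.libquantum": (44.7, 82.1, 109.0, 108.0),
    "464.h264ref": (32.7, 45.4, 53.3, 48.8),
    "471.omnetpp": (23.5, 29.1, 37.1, 42.5),
    "473.astar": (16.5, 24.8, 33.5, 36.9),
    "483.xalancbmk": (24.9, 35.3, 44.7, 48.4),
}


def smt_scaling(scores):
    """Express each score as a rounded percentage of the single-thread score."""
    base = scores[0]
    return tuple(round(100 * s / base) for s in scores)


for name, scores in POWER8_SCORES.items():
    print(name, smt_scaling(scores))
```

Recomputing from the rounded scores can differ from the table by one point (omnetpp at SMT-8 comes out as 181 rather than 180), presumably because the published scores are themselves rounded.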
As expected, the performance gains from single-threaded operation to two threads are very impressive. While Intel's SMT-2 (Hyper-Threading) delivers between 10 and 25% better performance in most subtests, the dual-threaded mode of the POWER8 boosts performance by 40 to 50% in most applications, more than twice the relative gain seen on the Xeon. Not one benchmark regresses when we throw 4 threads at the IBM POWER8 core. The benchmarks with high IPC, such as hmmer, peak at SMT-4, but most subtests gain a few percent more when running 8 threads.
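The two-thread comparison can be checked against the raw scores. The sketch below computes the gain from one to two threads on both CPUs and compares the medians. Using the median as the summary figure is a choice made here for illustration, not something the article states. Perlbench is left out because there is no POWER8 result for it.

```python
import statistics

# (single thread, two threads) scores from the raw data table.
POWER8 = {
    "401.bzip2": (17.5, 26.9), "403.gcc": (32.1, 44.6), "429.mcf": (47.1, 50.0),
    "445.gobmk": (20.2, 31.3), "456.hmmer": (19.1, 27.1), "458.sjeng": (17.1, 25.4),
    "462.libquantum": (44.7, 82.1), "464.h264ref": (32.7, 45.4),
    "471.omnetpp": (23.5, 29.1), "473.astar": (16.5, 24.8),
    "483.xalancbmk": (24.9, 35.3),
}
XEON = {
    "401.bzip2": (19.2, 25.3), "403.gcc": (28.9, 33.3), "429.mcf": (39.0, 43.9),
    "445.gobmk": (22.4, 27.7), "456.hmmer": (24.2, 28.4), "458.sjeng": (24.8, 28.3),
    "462.libquantum": (59.2, 67.3), "464.h264ref": (40.7, 40.7),
    "471.omnetpp": (23.5, 29.9), "473.astar": (18.9, 23.6),
    "483.xalancbmk": (35.4, 41.8),
}


def two_thread_gains(table):
    """Percentage gain of the two-thread score over the single-thread score."""
    return {name: 100 * (two / one - 1) for name, (one, two) in table.items()}


power8_gains = two_thread_gains(POWER8)
xeon_gains = two_thread_gains(XEON)

# How many Xeon subtests fall in the 10-25% band mentioned in the text.
in_band = sum(10 <= g <= 25 for g in xeon_gains.values())
print(f"Xeon subtests gaining 10-25%: {in_band} of {len(xeon_gains)}")
print(f"Median POWER8 SMT-2 gain: {statistics.median(power8_gains.values()):.1f}%")
print(f"Median Xeon HT gain: {statistics.median(xeon_gains.values()):.1f}%")
```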
JohanAnandtech - Thursday, July 28, 2016 - link
Send me a mail at johan@anandtech.com
abufrejoval - Thursday, August 4, 2016 - link
Hmm, a bit fuzzy after the first paragraph or so and evidently because I dislike malwaretizement: Such links should be banned!
mystic-pokemon - Friday, July 22, 2016 - link
Hi floobit
For virtualization: powerVM and out of the box KVM (tested on Fedora 23, Ubuntu 15.04 / 15.10 / 16.04) work quite well. Xen doesn't work well or hasn't been officially tested / released.
tipoo - Thursday, July 21, 2016 - link
Fun! I was always curious about this processor.
tipoo - Thursday, July 21, 2016 - link
Interesting that the L3 eDRAM not only allows them to pack in much more L3 (what was it, 3 SRAM transistors per eDRAM or something?), but it's also low latency which was a cited concern with eDARM by some people. Appears to be an unfounded fear.
And then on top of that they put another large L4 eDRAM cache on.
Maybe Intel needs to play with eDRAM more...
tipoo - Thursday, July 21, 2016 - link
Lol, eDRAM, not eDARM
Kevin G - Thursday, July 21, 2016 - link
There was a change in how the L4 cache works from Broadwell to SkyLake on the mobile parts. The implication is that Intel was exploring the idea of a large L4 eDRAM for SkyLake-EP/EX parts. We'll see how that turns out as Intel also has explored using HMC as a cache for high bandwidth applications in Knights Landing. So either way, Intel has this idea on their radar and we'll see how it pans out next year.
tsk2k - Thursday, July 21, 2016 - link
Is it possible to run Windows on one of these?
ZeDestructor - Thursday, July 21, 2016 - link
At the moment, a very solid no.
That said, if enough partners ask for it and/or if the numbers make sense for Azure, MS will at the very least have a damn good look at porting Windows over.
DanNeely - Thursday, July 21, 2016 - link
It's probably just a case of doing QA and releasing it. They've sold a PPC build in the past; and maintain internal builds for a number of other CPU architectures to avoid accidentally baking x86isms into the core code.