Assessing IBM's POWER8, Part 2: Server Applications on OpenPOWER
by Johan De Gelas on September 15, 2016 8:01 AM EST
CPU Choices
For this article, the only current-generation Intel Broadwell-EP processors we had in the lab were the Xeon E5-2699 v4 and Xeon E5-2650 v4. Comparing the IBM POWER8 with the former would not be fair: at $4115, the Xeon costs almost three times as much as the midrange POWER8 chip ($1500). The latter was not an option either, given its TDP of 90W. There are no Intel chips with a 190W TDP, so we had to compromise.
The most comparable CPU that was available to us was the Xeon E5-2690 v3. It is a higher-end midrange Intel SKU (135W TDP) that came out around the same time as the POWER8. If the 190W TDP POWER8 cannot beat this 135W TDP chip, IBM's micro architects have not done a very good job. Don't let the 2.6 GHz label fool you: this Haswell Xeon can boost to 3.1 GHz when all cores are active and to 3.5 GHz in a single-threaded situation. So it has two more cores than the POWER8 and similar clockspeeds.
However, we can't ignore the current-generation Broadwell-EP entirely. To get a better idea of how the midrange POWER8 compares to the latest Xeons, we had to add another midrange Xeon E5 v4 SKU. So we enabled only 14 of the 22 cores of the Xeon E5-2699 v4. This gives us a chip that is somewhere between the Xeon E5-2660 v4 (14 cores at 2 GHz) and E5-2680 v4 (14 cores at 2.4 GHz). Well, at least on paper. The Xeon E5-2680 v4 runs most of the time at 2.9 GHz in heavily multi-threaded situations (+5 turbo steps, all cores active), while our Xeon E5-2699 v4 with 14 cores runs at 2.8 GHz (+6 turbo steps). As the TDP of the latter is higher, the turbo clock will be used for a higher percentage of the time. Bottom line: our Xeon E5-2699 v4 with 14 cores is very similar to an E5-2680 v4, but with a 145W TDP. As the Xeon E5-2680 v4 costs around $1745, it is in the right price range. From a price/performance point of view, that is as fair as we can get it.
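The clock and price figures above are simple arithmetic, and a minimal sketch makes them easy to re-check. It assumes that one turbo step equals 100 MHz, which is consistent with the numbers quoted in the text; the function and variable names are illustrative only.

```python
# Minimal sketch of the turbo-clock and price arithmetic quoted in the text.
# Assumption: one turbo step ("bin") equals 100 MHz.
TURBO_STEP_MHZ = 100


def all_core_turbo_mhz(base_mhz, steps):
    """Return the all-core turbo clock in MHz for a base clock plus N steps."""
    return base_mhz + steps * TURBO_STEP_MHZ


def price_ratio(price_usd, reference_usd):
    """Return how many times more expensive a chip is than the reference."""
    return price_usd / reference_usd


# Clocks and prices as quoted in the article.
e5_2680_v4_mhz = all_core_turbo_mhz(2400, 5)      # real E5-2680 v4
simulated_14c_mhz = all_core_turbo_mhz(2200, 6)   # E5-2699 v4 limited to 14 cores
POWER8_PRICE_USD = 1500

print(f"E5-2680 v4 all-core turbo:   {e5_2680_v4_mhz / 1000:.1f} GHz")
print(f"Simulated 14-core turbo:     {simulated_14c_mhz / 1000:.1f} GHz")
print(f"E5-2699 v4 vs POWER8 price:  {price_ratio(4115, POWER8_PRICE_USD):.2f}x")
print(f"E5-2680 v4 vs POWER8 price:  {price_ratio(1745, POWER8_PRICE_USD):.2f}x")
```

The two turbo clocks differ by only 100 MHz, and the E5-2680 v4 price lands within roughly 16% of the POWER8 price, which is why the simulated chip serves as the price/performance reference.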
For those looking to get the best performance per watt: we'll save you some time and tell you that it does not get any better than the Xeon E5-2600 v4 series. Intel really went all the way to make sure that the Broadwell-EP Xeon is a power sipper. And although the performance step is small, the Xeon E5-2600 v4 consumes much less power than a similar Xeon E5 v3 SKU, let alone a CPU with a 190W TDP (plus 60-80W for the memory buffers).
Benchmark Configuration and Methodology
Our testing was conducted on Ubuntu Server 15.10 (kernel 4.2.0) with gcc compiler version 5.2.1. We did not update to a newer release because this was the only version on which we got everything working.
Last but not least, we want to note how the performance graphs have been color-coded. Orange is used for the reviewed POWER8 CPU. The latest generation of the Intel Xeon (v4) gets dark blue, the previous one (v3) gets light blue. Older Xeon generations are colored with the default gray.
IBM S812LC (2U)
The IBM S812LC is based upon Tyan's "Habanero" platform. The board inside the IBM server is thus designed by Tyan.
CPU | One IBM POWER8 2.92 GHz (up to 3.5 GHz Turbo) |
RAM | 256 GB (16x16GB) DDR3-1333 |
Internal Disks | 2x Samsung 850Pro 960 GB |
Motherboard | Tyan SP012 |
PSU | Delta Electronics DSP-1200AB 1200W |
Intel's Xeon E5 Server – S2600WT (2U Chassis)
CPU | One Intel Xeon processor E5-2699 v4 (2.2 GHz, 22c, 55MB L3, 145W); one "simulated" Intel Xeon processor E5-2680 v4 (2.2 GHz, 14c, 35MB L3, 145W); one Intel Xeon processor E5-2699 v3 (2.3 GHz, 18c, 45MB L3, 145W); one Intel Xeon processor E5-2690 v3 (2.6 GHz, 12c, 30MB L3, 135W) |
RAM | 128 GB (8x16GB) Kingston DDR4-2400 or 256 GB (8x 32GB) Hynix DDR4-2133 |
Internal Disks | 2x Samsung 850Pro 960 GB |
Motherboard | Intel Server Board Wildcat Pass |
PSU | Delta Electronics 750W DPS-750XB A (80+ Platinum) |
All C-states are enabled in the BIOS.
SuperMicro 6027R-73DARF (2U Chassis)
CPU | Two Intel Xeon processor E5-2697 v2 (2.7GHz, 12c, 30MB L3, 130W) |
RAM | 128GB (8x16GB) Samsung at 1866 MHz |
Internal Disks | 2x Intel SSD3500 400GB |
Motherboard | SuperMicro X9DRD-7LN4F |
PSU | Supermicro 740W PWS-741P-1R (80+ Platinum) |
All C-states are enabled in the BIOS.
Other Notes
Both servers are fed by a standard European 230V (16 Amps max.) power line. The room temperature is monitored and kept at 23°C by our Airwell CRACs.
49 Comments
JohanAnandtech - Sunday, September 25, 2016
Thanks Jesper. Looks like I will have to spend even more time on that system :-). And indeed, out of the box performance is important if IBM ever wants to get a piece of the x86 market.
luminarian - Thursday, September 15, 2016
It was my understanding that the SMT mode on the POWER8 could be changed. Depending on the type of work this would make a giant difference, especially with MySQL/MariaDB, which are limited to 1 process/thread per connection.
With databases the real winner would be one that supports parallel queries, such as PostgreSQL 9.6, DB2, Oracle, etc.
Also, your benchmark very easily could be limiting the POWER8 if it's not opening enough connections to fill out the number of threads that thing can handle. Remember, MySQL/MariaDB are 1 process/thread per connection. A lot of database benchmarks default to a small number of connections, and this thing has 160 threads with the dual 10 core. I would suggest trying to run that same benchmark again, but doing it at the same time from multiple client machines. See if the bench takes a larger dip when a second client machine runs the same bench or if the bench shows similar figures (granted, this might hit the HD I/O limit on the POWER8 server).
So yea, that and try SMT-2 and SMT-4 modes.
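The thread-count arithmetic behind this comment can be sketched as follows. This is only an illustration, not part of the original thread: it assumes a 10-core POWER8 per socket and treats SMT-8 as the mode behind the quoted "160 threads" figure. The helper names and the 64-connection example are hypothetical.

```python
# Minimal sketch: hardware threads exposed by a system at a given SMT mode,
# and whether a benchmark's connection count can keep all of them busy when
# the database uses one thread per connection.
# Assumptions: 10 cores per POWER8 socket; SMT-8 gives the "160 threads"
# figure quoted for a dual-socket system.


def hardware_threads(sockets, cores_per_socket, smt_mode):
    """Return the number of hardware threads the OS will see."""
    return sockets * cores_per_socket * smt_mode


def can_fill_all_threads(connections, sockets, cores_per_socket, smt_mode):
    """Return True if there is at least one connection per hardware thread."""
    return connections >= hardware_threads(sockets, cores_per_socket, smt_mode)


for smt in (2, 4, 8):
    threads = hardware_threads(sockets=1, cores_per_socket=10, smt_mode=smt)
    print(f"1 socket, SMT-{smt}: {threads} hardware threads")

print("2 sockets, SMT-8:", hardware_threads(2, 10, 8), "hardware threads")

# Hypothetical benchmark run with 64 connections on a single-socket system.
print("64 connections fill SMT-8:", can_fill_all_threads(64, 1, 10, 8))
print("64 connections fill SMT-4:", can_fill_all_threads(64, 1, 10, 4))
```

With one thread per connection, a run that opens fewer connections than there are hardware threads leaves some threads idle, which is the concern the comment raises.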
JohanAnandtech - Friday, September 16, 2016
Hi, I tried SMT-4, throughput was about 25% worse: 11k instead of 14k+. 95th percentile response time was better: 3.7 ms.
JohanAnandtech - Friday, September 16, 2016
Updated the MySQL graphs with SMT-4 data. Our Spark tests get worse with SMT-4 and that is also true for SPECjbb.
luminarian - Friday, September 16, 2016
Awesome, thanks for the response.
Meteor2 - Friday, September 16, 2016
The HPC potential is awesome. You can really see why Oak Ridge chose POWER9 and Volta.
Communism - Sunday, September 18, 2016
Pretty sure most of the reason for that is due to Intel blocking every attempt Nvidia makes at getting a high bandwidth interface bolted onto a Xeon.
Given that one of the main reasons that Intel blocked Nvidia's chipset business way back in the day was to try to limit the ability of other companies bolting on high bandwidth accelerators onto Intel chips (Presumably to protect their own initiatives in that space).
Klimax - Saturday, September 17, 2016
Not terribly impressive. You have to get SW to play nice and spend time to fine tune it to outperform Intel, and it will cost you in power and cooling. More like "yes, if you get quite a bigger TDP you get a bit more power". And it won't be terribly good in many cases. (Like a public facing service where latency is critical.)
Maybe if you are in the USA and can waste admins' and devs' time and waste a lot on cooling and electricity, then maybe. Otherwise why bother...
SarahKerrigan - Sunday, September 18, 2016
I don't see this as a bad result. This is a 22nm processor, over two years old, and it beats Haswell-EP (which is newer) on efficiency. Broadwell-EP is brand new, and P9 should come out well before the end of BDW-EP's lifecycle.
Kevin G - Sunday, September 18, 2016
Some of the POWER9 chips will be out next year, though I suspect that the scale-up models may be an early 2018 part. Considering that those chips go into IBM's big iron Unix servers, they tend to launch a bit later than the low end models so it isn't game changing.
The real question is when SkyLake-EP/EX will launch in comparison to the scale-out POWER9 chips. I was expecting a first half of 2017 for the Intel parts but I have no reference as to when to expect the POWER9 SO chips. Thus there is a chance Intel can come out first.
Intel also wants a quick transition to SkyLake-EP/EX as they unify those two lines to some extent and provide some major platform improvements. I'm thinking Broadwell-EP/EX will have a relatively short life span compared to Haswell-EP/EX. This mimics much of what happened on the desktop and the challenge to move to 14 nm.