Closing Thoughts: Microarchitecture

Wrapping things up, let's first look at the POWER8 from a microarchitectural point of view. The midrange POWER8 is able to offer significantly higher performance than the best midrange contemporary Xeon "Haswell-EP" E5s. Even the performance per watt can be quite good, but only in the right conditions (high average CPU load, complex workloads).

IBM's engineers got their act together in 2014. That might seem like a trivial thing to say, but the "Netburst", "Bulldozer" and "Cell" designs show that building a balanced architecture is not an easy task.

Considering POWER8 Scale Out Servers

The performance data and analysis are all very interesting, but at the end of the day, what if you are a server buyer? The POWER8 servers are definitely not for everyone. For many people, the superior performance per watt ratio of the Xeon E5-2600 v4 is more attractive. In most virtualized environments, the CPU load is relatively low, and saving lots of power at low loads (at least 100W per socket) will keep the TCO a lot lower.
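
To put the power argument into numbers, here is a rough back-of-the-envelope sketch in Python. Only the 100W per socket figure comes from the text above; the socket count, service life, electricity price and cooling overhead (PUE) are placeholder assumptions that you should replace with your own values.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def energy_cost_delta(watts_saved_per_socket, sockets, years,
                      price_per_kwh, pue=1.0):
    """Return the electricity cost saved over the service life.

    pue scales IT power to facility power (cooling, distribution);
    1.0 means no overhead is counted.
    """
    kwh = watts_saved_per_socket * sockets * HOURS_PER_YEAR * years / 1000.0
    return kwh * pue * price_per_kwh


if __name__ == "__main__":
    # 100 W per socket is the figure quoted above; every other input is assumed.
    saved = energy_cost_delta(watts_saved_per_socket=100, sockets=2, years=3,
                              price_per_kwh=0.10, pue=1.5)
    print(f"Estimated saving per server: ${saved:,.2f}")
```

The result scales linearly with every input, so the saving weighs most heavily in large fleets with long service lives and expensive power.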

An admittedly smaller part of the market does not care that much about electricity bills, but rather favors performance per dollar. In that case, the S812LC model, which can use cheaper DIMMs to reach the same capacity (32 DIMM slots instead of 16-24), combined with the relatively cheap POWER8 CPU can make sense. It is important to note that our benchmarks are definitely not showcases for the platform. According to IBM, MariaDB and Postgres have been more optimized for the POWER8 than MySQL. In those cases, IBM claims up to 40% better performance than the Xeon E5-2690 v4.

Those IBM benchmarks of course show the POWER8 in the best light, but we feel we should not dismiss them, just take them with a grain of salt. If you read our very first article about OpenPOWER on Linux, you will notice we had trouble getting many workloads to even run. Once we got them working, performance was suboptimal. Now, less than a year later, most of the performance problems have either vanished (MySQL, Spark) or improved a lot (OpenJDK). The OpenPOWER Linux ecosystem is getting better at a very fast pace.

IBM asks $13141 for the S822LC ("for Big Data"), which includes two 10-core POWER8s, 256 GB of DRAM and two 1 TB disks. A similarly configured Dell R730 with Xeon E5-2680 v4 costs around $12-13k, and a similar HP DL380 costs around $15-17k. It is admittedly debatable how large the price advantage is, as the actual street prices are hard to determine. HP and IBM tend to give bigger discounts, but those discounts get smaller for the most affordable servers.
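
One way to read these prices is to ask how much faster (or slower) the POWER8 server has to be to tie on performance per dollar. The sketch below uses only the list prices quoted above, taking the low and high ends of each quoted range; it ignores discounts, power and software costs, and the function name is just an illustrative choice.

```python
def breakeven_speedup(power8_price, x86_price):
    """Relative performance the POWER8 server needs for equal performance per dollar.

    1.10 means it has to be 10% faster than the x86 server;
    a value below 1.0 means it can be slower and still tie.
    """
    return power8_price / x86_price


S822LC_PRICE = 13141  # IBM price quoted above

# Low and high ends of the price ranges quoted above
X86_PRICES = {
    "Dell R730 (low)": 12000,
    "Dell R730 (high)": 13000,
    "HP DL380 (low)": 15000,
    "HP DL380 (high)": 17000,
}

if __name__ == "__main__":
    for name, price in X86_PRICES.items():
        ratio = breakeven_speedup(S822LC_PRICE, price)
        print(f"{name}: break-even at {ratio:.2f}x the x86 server's performance")
```

Plugging in measured throughput for your own workload then tells you on which side of the break-even point each server lands.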

So although IBM will not convert the masses as the price advantage is not that large, the new POWER8 servers are competitively priced. The bottom line: the IBM POWER8 LC servers can offer you better performance per dollar than a similar x86 server. But it's not a guarantee; as a server buyer you have to do your research and check whether your application is among the POWER8 optimized ones, and what kind of CPU load profile your application has. The Intel Xeons, by comparison, require less research, and are much more "general purpose".

Meanwhile the most expensive of the new server models, the S822LC HPC with six quad port NVLINKs and four Tesla P100s, is unique in the HPC market. Given a workload that has real and meaningful bus bandwidth needs, it is very likely that any Xeon server with 4 GPUs will have a lot of trouble competing with it.

Overall the new POWER8 servers are not a broad full scale attack on Intel's Xeon. Rather they are a potent attempt to establish some strong beachheads in a number of niche but key markets (GPU accelerated HPC, Big Data).

And looking towards the future, it's worth considering that the POWER9 will offer a scale out version without the expensive and power hogging memory buffers. With that in the works, it's clear that IBM and the members of the OpenPOWER Foundation are definitely on the right track to grab a larger part of the server market.

49 Comments

  • JohanAnandtech - Sunday, September 25, 2016 - link

    Thanks Jesper. Looks like I will have to spend even more time on that system :-). And indeed, out of the box performance is important if IBM ever wants to get a piece of the x86 market.
  • luminarian - Thursday, September 15, 2016 - link

    It was my understanding that the SMT mode on the power8 could be changed. Depending on the type of work this would make a giant difference, especially with mysql/mariadb that are limited to 1 process/thread per connection.

    With databases the real winner would be with one that supports parallel queries, such as postgresql 9.6, db2, oracle, etc.

    Also, your benchmark could very easily be limiting the POWER8 if it's not opening enough connections to fill out the number of threads that thing can handle; remember, MySQL/MariaDB are 1 process/thread per connection. A lot of database benchmarks default to a small number of connections, and this thing has 160 threads with the dual 10-core. I would suggest running that same benchmark again, but at the same time from multiple client machines. See if the bench takes a larger dip when a second client machine runs the same bench or if it shows similar figures (granted, this might hit the disk I/O limit on the POWER8 server).

    So yea, that and try SMT-2 and SMT-4 modes.
  • JohanAnandtech - Friday, September 16, 2016 - link

    Hi, I tried SMT-4, throughput was about 25% worse: 11k instead of 14k+. 95th percentile response time was better: 3.7 ms.
  • JohanAnandtech - Friday, September 16, 2016 - link

    Updated the MySQL graphs with SMT-4 data. Our Spark tests get worse with SMT-4 and that is also true for SPECjbb.
  • luminarian - Friday, September 16, 2016 - link

    Awesome, Thanks for the response.
  • Meteor2 - Friday, September 16, 2016 - link

    The HPC potential is awesome. You can really see why Oak Ridge chose POWER9 and Volta.
  • Communism - Sunday, September 18, 2016 - link

    Pretty sure most of the reason for that is due to Intel blocking every attempt Nvidia makes at getting a high bandwidth interface bolted onto a Xeon.

    Given that one of the main reasons that Intel blocked Nvidia's chipset business way back in the day was to try to limit the ability of other companies bolting on high bandwidth accelerators onto Intel chips (Presumably to protect their own initiatives in that space).
  • Klimax - Saturday, September 17, 2016 - link

    Not terribly impressive. You have to get SW to play nice and spend time to fine tune it to outperform Intel, and it will cost you in power and cooling. More like "yes, if you get quite a bit bigger TDP you get a bit more power". And it won't be terribly good in many cases. (Like public facing services where latency is critical.)

    Maybe if you are in USA and can waste admins and devs time and waste a lot on cooling and electricity then maybe. Otherwise why bother...
  • SarahKerrigan - Sunday, September 18, 2016 - link

    I don't see this as a bad result. This is a 22nm processor, over two years old, and it beats Haswell-EP (which is newer) on efficiency. Broadwell-EP is brand new, and P9 should come out well before the end of BDW-EP's lifecycle.
  • Kevin G - Sunday, September 18, 2016 - link

    Some of the POWER9 chips will be out next year, though I suspect that the scale-up models may be an early 2018 part. Considering that those chips go into IBM's big iron Unix servers, they tend to launch a bit later than the low end models, so it isn't game changing.

    The real question is when SkyLake-EP/EX will launch and in comparison to the scale-out POWER9 chips. I was expecting a first half of 2017 for the Intel parts but I have no reference as to when to expect the POWER9 SO chips. Thus there is a chance Intel can come out first.

    Intel also wants a quick transition to SkyLake-EP/EX as they unify those two lines to some extent and provide some major platform improvements. I'm thinking Broadwell-EP/EX will have a relatively short life span compared to Haswell-EP/EX. This mimics much of what happened on the desktop and the challenge to move to 14 nm.
