Closing Thoughts: Microarchitecture

Wrapping things up, let's first look at the POWER8 from a microarchitectural point of view. The midrange POWER8 is able to offer significantly higher performance than the best midrange contemporary Xeon "Broadwell-EP" E5s. Even the performance per watt can be quite good, but only in the right conditions (high average CPU load, complex workloads).

IBM's engineers got their act together in 2014. That might seem like a trivial thing to say, but the "Netburst", "Bulldozer" and "Cell" designs show that building a balanced architecture is not an easy task.

Considering POWER8 Scale Out Servers

The performance data and analysis are all very interesting, but at the end of the day, what does it all mean for a server buyer? The POWER8 servers are definitely not for everyone. For many people, the superior performance per watt of the Xeon E5-2600 v4 is more attractive. In most virtualized environments the CPU load is relatively low, and saving a lot of power at low loads (at least 100W per socket) keeps the TCO considerably lower.
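To put the idle-power argument in numbers, here is a rough back-of-the-envelope sketch. It takes the lower bound of 100W per socket mentioned above and assumes a dual-socket server running 24/7; the electricity price of $0.10/kWh is a hypothetical figure and cooling overhead is ignored, so treat the result as an illustration rather than a TCO model.

```java
/**
 * Back-of-the-envelope sketch: what a constant extra power draw costs per year.
 * Assumptions (not measured): two sockets, 24/7 operation, a hypothetical
 * electricity price of $0.10/kWh, and no cooling overhead (PUE = 1).
 */
public class IdlePowerCost {

    // 8760 hours in a non-leap year.
    static final double HOURS_PER_YEAR = 24 * 365;

    /** Extra energy in kWh per year for a constant extra draw in watts. */
    static double extraKwhPerYear(double extraWatts) {
        return extraWatts * HOURS_PER_YEAR / 1000.0;
    }

    /** Extra electricity cost in USD per year at the given price per kWh. */
    static double extraCostPerYear(double extraWatts, double usdPerKwh) {
        return extraKwhPerYear(extraWatts) * usdPerKwh;
    }

    public static void main(String[] args) {
        double wattsPerSocket = 100.0; // lower bound quoted in the text
        int sockets = 2;               // assumed dual-socket server
        double price = 0.10;           // hypothetical $/kWh
        double extraWatts = wattsPerSocket * sockets;

        System.out.printf("Extra energy: %.0f kWh/year%n", extraKwhPerYear(extraWatts));
        System.out.printf("Extra cost:   $%.2f/year per server%n", extraCostPerYear(extraWatts, price));
    }
}
```

Under these assumptions the gap works out to 1,752 kWh, or roughly $175, per server per year. Multiply by your server count, expected service life and actual electricity price (plus cooling) to judge whether it outweighs a purchase price or performance advantage.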

An admittedly smaller part of the market does not care that much about electricity bills and instead favors performance per dollar. In that case, the S812LC, which can use cheaper DIMMs to reach the same capacity (32 DIMM slots instead of 16-24), combined with the relatively cheap POWER8 CPU, can make sense. It is important to note that our benchmarks are definitely not POWER8 showcases. According to IBM, MariaDB and Postgres have been better optimized for the POWER8 than MySQL. For those databases, IBM claims up to 40% better performance than the Xeon E5-2690 v4.
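The DIMM slot argument can be illustrated with a small sketch: given a target memory capacity and a slot count, find the smallest DIMM size that reaches the target with all slots filled identically. The list of available DIMM sizes and the 512 GB target are assumptions for illustration, not vendor configuration rules.

```java
import java.util.List;

/**
 * Sketch: smallest DIMM size that reaches a target capacity for a given slot count,
 * assuming every slot is populated with identical DIMMs.
 * The menu of DIMM sizes is hypothetical, not a vendor configuration guide.
 */
public class DimmSizing {

    // Hypothetical DIMM sizes in GB, smallest first.
    static final List<Integer> DIMM_SIZES_GB = List.of(8, 16, 32, 64);

    /** Smallest DIMM size (GB) with slots * size >= targetGb, or -1 if none is large enough. */
    static int smallestDimmGb(int targetGb, int slots) {
        for (int size : DIMM_SIZES_GB) {
            if ((long) size * slots >= targetGb) {
                return size;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int targetGb = 512; // assumed target capacity
        for (int slots : new int[] {32, 24, 16}) {
            System.out.printf("%d slots -> %d GB DIMMs%n", slots, smallestDimmGb(targetGb, slots));
        }
    }
}
```

For a 512 GB target, the sketch picks 16 GB DIMMs for 32 slots but 32 GB DIMMs for 16 or 24 slots (the latter overshooting to 768 GB). That is the effect the extra slots exploit when lower-density DIMMs are the cheaper option.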

Those IBM benchmarks of course show the POWER8 in the best light, but we feel we should not dismiss them, just take them with a grain of salt. If you read our very first article about OpenPOWER on Linux, you will remember that we had trouble getting many workloads to run at all. Once we got them working, performance was suboptimal. Now, less than a year later, most of the performance problems have either vanished (MySQL, Spark) or been greatly reduced (OpenJDK). The OpenPOWER Linux ecosystem is improving at a very fast pace.

IBM asks $13,141 for the S822LC ("for Big Data"), which includes two 10-core POWER8s, 256 GB of DRAM and two 1 TB disks. A similarly configured Dell R730 with Xeon E5-2680 v4 costs around $12-13k, while a similar HP DL380 costs around $15-17k. It is admittedly debatable how large the price advantage is, as actual street prices are hard to determine. HP and IBM tend to give bigger discounts, but those discounts get smaller for the most affordable servers.

So although IBM will not convert the masses as the price advantage is not that large, the new POWER8 servers are competitively priced. The bottom line: the IBM POWER8 LC servers can offer you better performance per dollar than a similar x86 server. But it's not a guarantee; as a server buyer you have to do your research and check whether your application is among the POWER8 optimized ones, and what kind of CPU load profile your application has. The Intel Xeons, by comparison, require less research, and are much more "general purpose".
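The "not a guarantee" point can be made concrete with a simple performance-per-dollar calculation. This sketch uses the $13,141 S822LC price quoted above and assumes $12,500 as the midpoint of the Dell R730's $12-13k quote; performance is normalized to the Xeon server (1.0), and the 1.4 figure is IBM's own best-case "up to 40%" claim, not a result of our benchmarks. List prices are used throughout, so discounts are not modeled.

```java
/**
 * Sketch: performance per dollar and the break-even speedup between two servers.
 * Assumptions: $12,500 is an assumed midpoint of the $12-13k Dell quote;
 * performance is normalized to the Xeon server = 1.0; 1.4 is IBM's best-case claim.
 */
public class PerfPerDollar {

    /** Relative performance per $1,000 spent. */
    static double perfPerKiloDollar(double relativePerf, double priceUsd) {
        return relativePerf / (priceUsd / 1000.0);
    }

    /** Speedup server A needs over server B just to match B's performance per dollar. */
    static double breakEvenSpeedup(double priceA, double priceB) {
        return priceA / priceB;
    }

    public static void main(String[] args) {
        double powerPrice = 13_141.0; // S822LC list price from the text
        double xeonPrice = 12_500.0;  // assumed midpoint of the Dell quote

        double xeon = perfPerKiloDollar(1.0, xeonPrice);
        double power = perfPerKiloDollar(1.4, powerPrice); // IBM best case, not measured

        System.out.printf("Xeon:   %.3f perf per $1k%n", xeon);
        System.out.printf("POWER8: %.3f perf per $1k%n", power);
        System.out.printf("Break-even speedup: %.3fx%n", breakEvenSpeedup(powerPrice, xeonPrice));
    }
}
```

Against the assumed Dell price, the POWER8 server needs only about a 5% speedup (13,141 / 12,500 ≈ 1.051) to break even on performance per dollar; if your application runs slower on POWER8 than on the Xeon, the price is no longer an advantage. Against the $15-17k HP DL380 quote, the break-even ratio drops below 1.0.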

Meanwhile the most expensive of the new server models, the S822LC HPC with six quad port NVLINKs and four Tesla P100s, is unique in the HPC market. Given a workload with real and meaningful bus bandwidth needs, it is very likely that any Xeon server with four GPUs will have a lot of trouble competing with it.

Overall, the new POWER8 servers are not a broad, full-scale attack on Intel's Xeon. Rather, they are a potent attempt to establish strong beachheads in a number of niche but key markets (GPU-accelerated HPC, Big Data).

Looking towards the future, it's worth considering that POWER9 will offer a scale-out version without the expensive and power-hogging memory buffers. With that in the works, it's clear that IBM and the members of the OpenPOWER Foundation are on the right track to grab a larger part of the server market.

49 Comments

  • nils_ - Monday, September 26, 2016

    Isn't the limit slightly lower than 32 GiB? At some point the JVM switches to 64-bit pointers, which means you'll lose a lot of the available heap to larger pointers. I think you might want to lower your settings. I'm curious, what kind of GC times are you seeing with your heap size? I don't currently have access to Java running on non-virtualised hardware, so I would like to know if the overhead is significant (mostly running Elasticsearch here).
  • CajunArson - Thursday, September 15, 2016

    All in all the Power chip isn't terrible but the power consumption coupled with the sheer amount of tuning that is required just to get it competitive with the Xeons isn't too encouraging. You could spend far less time tuning the Xeons and still have higher performance or go ahead with tuning to get even more performance out of those Xeons.

    On top of that, this isn't even one of the supposedly "high-end" models; the higher-end POWER parts cost more and will burn through even more power, and that's an expense that needs to be considered for the types of real-world applications that use these servers.
  • dgingeri - Thursday, September 15, 2016

    That ad on the last page that claims lower equipment cost of course compares that to an HP DL380, the most overpriced Xeon E5 system out right now. (I know because I shopped them.) Comparing it to a comparable Dell R730 would show less expense, better support, and better expansion options.
  • Morawka - Thursday, September 15, 2016

    you mean a company made a slide that uses the most extreme edge cases to make their product look good?!?! Shocking /s
  • Gondalf - Thursday, September 15, 2016

    Something is wrong with this power consumption data. The platform idles at 221W and under full load reaches only 260W?? Did the CPU vanish?? POWER8 at over 3GHz has an active power of only 40W??
    Either 1) the idle value is wrong or 2) the under-load value is wrong. None of this is consistent with IBM's official TDP values.
    IMO the energy consumption page of the article has to be rewritten.
  • JohanAnandtech - Thursday, September 15, 2016

    We have double checked those numbers. It is probably an indication that many of the power saving features do not work well under Linux right now.
    BTW, just to give you an idea: running c-ray (floating point) caused the consumption to go to 361W.
  • Kevin G - Thursday, September 15, 2016

    I presume that c-ray uses the 256 bit vector unit on POWER8?

    Also have you done any energy consumption testing that takes advantage of the hardware decimal unit?
  • mapesdhs - Thursday, September 15, 2016

    C-ray isn't that smart. :D It's a very simple code, brute force basically, and the smaller dataset can easily fit in a modern cache (actually the middling size test probably does too on CPUs like these). Hmm, I suppose it's possible one could optimise the compilation a bit to help, but I doubt anything except a full rewrite could make decent use of any vector tech, and I don't want to allow changes to the code, that would make comparisons to all other test results null. Compiler optimisations are ok, but not multi-pass optimisations that feed back info about the target data into the initial compile, that's cheating IMO (some people have done this to obtain what look like really silly run times, but I don't include them on my main C-ray page).

    Ian.
  • Gondalf - Tuesday, September 20, 2016

    Ummm, so in short, the software used doesn't stress the CPU at all, not even the hot caches near the memory banks. We need a benchmark with high memory utilization and a balanced mix of integer and FP, more in line with real-world utilization.

    I don't know if this test is enough to say POWER8 is power/perf competitive with Haswell at 22nm.
    In fact, POWER market share is definitely at a historic minimum, and 14nm Broadwell is pretty young, so this disaster is not its fault.
  • jesperfrimann - Wednesday, September 21, 2016

    If you have an OPAL system (bare metal, cannot run POWERVM), then all the power-saving features are off by default, AFAIR.
    Try to have a look at:
    https://public.dhe.ibm.com/common/ssi/ecm/po/en/po...

    Many of the features do have a performance impact, ranging from negative, through neutral, to positive for a single one.

    But again, I think your comparison with 'vanilla' software stacks is relevant. This is what people would see out of the box with an existing software stack.
    It is 101% relevant to do that comparison, as this is the market that IBM is trying to break into with these servers.

    But it would be fun to see some tests where all the bells and whistles were used, as many have written here: hardware-supported decimal floating point, the vector execution unit, hardware-assisted memory compression, etc.

    // Jesper
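For readers wondering where the ~32 GiB figure in nils_'s comment comes from: with compressed ordinary object pointers (compressed oops), HotSpot stores object references as 32-bit values that are scaled by the object alignment (8 bytes by default) when decoded, so they can address at most 2^32 × 8 bytes. The sketch below only does that arithmetic; it does not query a running JVM, and HotSpot actually stops using compressed oops slightly below this theoretical ceiling, as the comment notes.

```java
/**
 * Sketch of the arithmetic behind HotSpot's ~32 GiB compressed-oops ceiling.
 * A compressed reference is a 32-bit value shifted left by log2(alignment) when decoded,
 * so it can address 2^32 * alignment bytes in total. Pure arithmetic, no JVM query.
 */
public class CompressedOopsLimit {

    static final long GIB = 1L << 30;

    /** Maximum heap in bytes addressable by 32-bit references scaled by the object alignment. */
    static long maxAddressableHeap(long objectAlignmentBytes) {
        return (1L << 32) * objectAlignmentBytes;
    }

    public static void main(String[] args) {
        // 8 bytes is HotSpot's default object alignment (-XX:ObjectAlignmentInBytes).
        System.out.println(maxAddressableHeap(8) / GIB + " GiB");  // prints "32 GiB"
        // A 16-byte alignment doubles the ceiling, at the cost of more padding per object.
        System.out.println(maxAddressableHeap(16) / GIB + " GiB"); // prints "64 GiB"
    }
}
```

On a HotSpot JVM, running java -Xmx31g -XX:+PrintFlagsFinal -version prints the final value of UseCompressedOops, which shows whether compressed oops were enabled for that heap size.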
