Reading the Benchmarks

Plenty of published benchmarks compare the IBM POWER8 to Xeons. One example is SAP, the Enterprise Resource Planning (ERP) software. We have used the SAP Sales & Distribution 2 Tier benchmark many times because it is one of the very few benchmarks that represent real-world high-end enterprise workloads well.

SAP Sales & Distribution 2 Tier benchmark

Now combine this with the benchmarks that IBM has compiled on their marketing slides and the fact that we know that the POWER8 chip has a TDP of 190W at nominal speed and 247W when running at "Turbo" clockspeeds.

It all seems very simple: the IBM POWER8 is a more power-hungry chip but delivers much better performance. But as always, you should take the time to read the benchmarks very closely. The IBM S824 is typically the system featured in the benchmarks. However, we are pretty sure that it is not the system that will sway current Intel Xeon customers towards OpenPOWER. Nor are we convinced that the most widely reported benchmarks accurately predict the experience of those customers.

There are three reasons for that. First of all, most of the benchmarks are run on AIX (7), IBM's own proprietary UNIX. AIX is a high performance, extremely robust OS, but it does not have the rich software ecosystem and support that Linux has. Furthermore, even though the two share common design elements, even an excellent Linux administrator will have to invest some time to reach the same level of expertise in AIX. Second, and more importantly, the S824 is a pretty expensive machine, both in acquisition cost (starting at $21,000, up to $60,000 and more) and energy cost. That kind of pricing lands the system in hostile and more powerful quad Xeon E7 territory.

Lastly, the S824 uses two CPU cards or Dual Chip Modules (DCM), each containing two six-core POWER8 modules at 3.5 GHz. Now consider that the third party OpenPOWER servers have 190/247W TDP 10-core 3.4 GHz POWER8 CPUs. The power consumption does not increase linearly as you add more cores and higher clocks. So the CPU modules found inside the S824 are definitely more power hungry, probably well above 250W.

There is more. Take a look at IBM's "Scale-out" servers, the more affordable range of IBM servers. First, a bit of IBM server nomenclature, which is actually quite logical and easy to decipher (take note, Intel marketing).

  • S stands for "Scale-out"
  • 8 stands for POWER8
  • 1 or 2 is the number of sockets
  • 2 or 4 is the height, expressed in rack units (U).

So an S824 contains 2 sockets in a 4U chassis and an S812 is a one-socket system in a 2U chassis. There is one designation left: the "L" suffix, for Linux.
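The naming scheme is regular enough to decode mechanically. The following is a minimal sketch in Python, assuming only the rules listed above; the names `ScaleOutModel`, `parse_model` and `linux_variant` are hypothetical and chosen for illustration, and "S812L" is used purely as an example of a name carrying the "L" suffix.

```python
import re
from typing import NamedTuple


class ScaleOutModel(NamedTuple):
    generation: int      # POWER generation, e.g. 8 for POWER8
    sockets: int         # number of sockets: 1 or 2
    rack_units: int      # chassis height in U: 2 or 4
    linux_variant: bool  # True when the name carries the "L" suffix


# "S" + generation digit + socket count + chassis height + optional "L".
# Only the combinations described in the article are accepted (assumption).
_MODEL_PATTERN = re.compile(r"S(8)([12])([24])(L?)")


def parse_model(name: str) -> ScaleOutModel:
    """Decode a scale-out model name such as "S824" into its parts."""
    match = _MODEL_PATTERN.fullmatch(name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised scale-out model name: {name!r}")
    generation, sockets, height, suffix = match.groups()
    return ScaleOutModel(int(generation), int(sockets), int(height), suffix == "L")


if __name__ == "__main__":
    for name in ("S824", "S812", "S812L"):
        print(name, parse_model(name))
```

Note that the sketch only records whether the "L" suffix is present; it makes no claim about which operating systems a given model supports.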

Notice that the non-L versions also support Linux, but a few months ago they supported only the Big Endian (BE) versions (the slide is from the beginning of this year). IBM told us that all POWER8 servers now support both Little Endian (LE) and BE Linux.

This is important since using an LE version (Ubuntu, SUSE) makes data migration from and data sharing (NAS, SAN) with an x86 system much easier, as x86 only supports LE.
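To make the byte-order point concrete, here is a small Python sketch of what happens when the same four bytes are interpreted as little endian and as big endian. It is an illustration only, not a description of how any particular file system or database stores data; the helper names `encode_u32` and `decode_u32` are hypothetical.

```python
def encode_u32(value: int, byteorder: str) -> bytes:
    """Serialise an unsigned 32-bit integer in the given byte order."""
    return value.to_bytes(4, byteorder)


def decode_u32(raw: bytes, byteorder: str) -> int:
    """Interpret four raw bytes as an unsigned 32-bit integer."""
    return int.from_bytes(raw, byteorder)


if __name__ == "__main__":
    value = 0x0A0B0C0D

    # What a little endian machine (x86, or LE Linux on POWER8) writes:
    # the least significant byte comes first, i.e. 0d 0c 0b 0a.
    on_disk = encode_u32(value, "little")
    print("bytes on disk:", on_disk.hex())

    # Read back with the same byte order: the original value is recovered.
    print("read as LE:", hex(decode_u32(on_disk, "little")))

    # Read back as big endian without conversion: a different number.
    print("read as BE:", hex(decode_u32(on_disk, "big")))
```

When both sides use the same byte order, raw binary data can be shared as-is; with a BE system in the mix, every multi-byte field has to be swapped on the way in or out.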

146 Comments

  • usernametaken76 - Thursday, November 12, 2015

    Technically this is not true. IBM had a working version of AIX running on PS/2 systems as late as the 1.3 release. Unfortunately support was withdrawn and future releases of AIX were not compiled for x86 compatible processors. One can still find a copy of this release if one knows where to look. It's completely useless to anyone but a museum or curious hobbyist, but it's out there.
  • Steven Perron - Monday, November 23, 2015

    Hello Johan,

    I was reading this article, and I found it interesting. Since I am a developer for the IBM XL compiler, the comparisons between GCC and XL were particularly interesting. I tried to reproduce the results you are seeing for the LZMA benchmark. My results were similar, but not exactly the same.

    When I compared GCC 4.9.1 (I know, a slightly different version than yours) to XL 13.1.2 (I assume this is the version you used), I saw XL consistently ahead of GCC, even when I used -O3 for both compilers.

    I'm still interested in trying to reproduce your results, so I can see what XL can do better, so I have a couple questions on areas that could be different.

    1) What version of the XL compiler did you use? I assumed 13.1.2, but it is worth double checking.
    2) Which version of the 7-zip software did you use? I picked up p7zip 15.09.
    3) Also, I noticed when the Power 8 machine was running at full capacity (for me that was 192 threads on a 24 core machine), the results would fluctuate a bit. How many runs did you do for each configuration? Were the results stable?
    4) Did you try XL at the less aggressive and more stable options like "-O3" or "-O3 -qhot"?

    Thanks for your time.
  • Toyevo - Wednesday, November 25, 2015

    Other than the ridiculous price of CDIMMs the power efficiency just doesn't look healthy. For data centers leasing their hardware like Amazon AWS, Google AppEngine, Azure, Rackspace, etc, clients who pay for hardware yet fail to use their allocation significantly help the bottom line of those companies by reduced overheads. For others high usage is a mandatory part of the ROI equation during its period as an operating asset, thus power consumption is a real cost. Even with our small cluster of 12 nodes the power efficiency is a real consideration, let alone companies standardizing toward IBM and utilising 100s or 1000s of nodes that are arguably less efficient.

    Perhaps you could devise some sort of theoretical total cost of ownership breakdown for these articles. My biggest question after all of this is, which one gets the most work done with the lowest overheads. Don't get me wrong though, I commend you and AnandTech on the detail you already provide.
  • AstroGuardian - Tuesday, December 8, 2015

    It's good to have someone challenging Intel, since AMD crap their pants on a regular basis.
  • dba - Monday, July 25, 2016

    Dear Johan:

    Can you extrapolate how much faster the Sparc S7 will be in your Cluster Benchmarking,
    if the 2 on Die Infiniband ports are Activated, 5, 10, 20% ???

    Thank You, dennis b.
