Java Performance

The SPECjbb 2015 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations." It uses the latest Java 7 features and makes use of XML, compressed communication, and messaging with security.

We tested with four groups of transaction injectors and backends. We used the "Multi JVM" test because it is more realistic: running multiple JVMs on a single server is very common practice.

The Java version was OpenJDK 1.8.0_91. We applied relatively basic tuning to mimic real-world use, while aiming to fit everything inside a server with 128 GB of RAM:

"-server -Xmx24G -Xms24G -Xmn16G -XX:+AlwaysPreTouch -XX:+UseLargePages"
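To make the setup concrete, one backend JVM in a Multi-JVM SPECjbb 2015 run would be launched roughly as follows. This is a sketch: the jar name and the -m/-G/-J arguments follow SPECjbb 2015's launch conventions, and the group/JVM identifiers are placeholders.

```shell
# Sketch of launching one SPECjbb 2015 backend JVM with the tuning above.
# -Xms equal to -Xmx avoids heap resizing during the run,
# -XX:+AlwaysPreTouch touches all heap pages at startup, and
# -XX:+UseLargePages backs the heap with huge pages.
java -server -Xmx24G -Xms24G -Xmn16G \
     -XX:+AlwaysPreTouch -XX:+UseLargePages \
     -jar specjbb2015.jar -m BACKEND -G GRP1 -J JVM1
```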

The graph below shows the maximum throughput numbers for our Multi-JVM SPECjbb test.

SPECJBB 2015-Multi Max-jOPS

The Critical-jOPS metric is a throughput metric under a response time constraint.

SPECJBB 2015-Multi Critical-jOPS

After checking the other benchmarks and IBM's published benchmarks at spec.org we suspected that there was something suboptimal for the POWER8 server. A Java expert at IBM responded:

Currently OpenJDK 8 performance on power lags behind the IBM JDK for this benchmark, and is not reflective of the hardware. The gap is being closed by driving changes into OpenJDK 9 that will be back ported to OpenJDK 8.

This is one of those cases where the "standard" open source ecosystem is not optimal, so we tried out IBM's 64-bit SDK, version LE 8.0-3.11. Out of the box, throughput got even worse. Only when we used more complicated tuning and more memory...

"-XX:-RuntimeInstrumentation -Xmx32G -Xms32G -Xmn16G -Xlp -Xcompressedrefs -Dcom.ibm.crypto.provider.doAESInHardware=true -XnotlhPrefetch"

...did performance improve. We also had to use static huge pages (16 MB each) instead of transparent huge pages. On the Intel side we likewise gave each VM 32 GB and fitted eight 32 GB DIMMs, but we stuck with OpenJDK there, as the IBM JDK caused a performance decrease.
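For reference, reserving static huge pages on Linux looks roughly like the sketch below. The page count is an illustrative assumption (it should be sized to cover the -Xmx of all JVMs); the page size itself is platform-dependent, 16 MB on POWER8 versus typically 2 MB on x86.

```shell
# Reserve a pool of static huge pages for the JVM heaps
# (8192 is an illustrative count, not the value used in this test).
echo 8192 | sudo tee /proc/sys/vm/nr_hugepages

# Disable transparent huge pages so only the static pool is used.
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

# Verify the reservation took effect.
grep -i huge /proc/meminfo
```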

SPECJBB 2015-Multi Max-jOPS optimized

Both the Xeon and the IBM POWER8 lost maximum throughput. In the case of Intel we can easily explain this: its 32 GB DDR4-2133 memory offers less bandwidth and higher latency. In the case of IBM, we can only say that we lack in-depth knowledge of the IBM JDK.

SPECJBB 2015-Multi Critical-jOPS optimized

Again, the Xeon E5-2680 v4 loses performance because we had to use more, but slower, memory.

This is not the apples-to-apples comparison we would like, but it gives a good indication that, with optimized software, the IBM POWER8 might actually be able to beat even more expensive Xeon E5s than our (simulated) Xeon E5-2680 v4. A Xeon E5-2690 v4 ($2090) would be beaten by a decent margin, and even the E5-2695 v4 ($2400) might be within reach.

49 Comments

  • loa - Monday, September 19, 2016 - link

    This article neglects one important aspect to costs:
    per-core licensed software.
    Those licenses can easily be north of $10,000. PER CORE. For some special-purpose software the license cost can be over $100,000 per core. Yes, per core. It sounds ridiculous, but it's true.
    So if your 10-core IBM system has the same performance as a 14-core Intel system, and your license cost is $10,000 per core, then you just saved yourself $40,000 by using the IBM processor.
    Even with lower license fee / core, the cost advantage can be significant, easily outweighing the additional electricity bill over the lifetime of the server.
  • aryonoco - Tuesday, September 20, 2016 - link

    Thanks Johan for another very interesting article.

    As I have said before, there is literally nothing on the web that compares with your work. You are one of a kind!

    Looking forward to POWER 9. Should be very interesting.
  • HellStew - Tuesday, September 20, 2016 - link

    Good article as usual. Thanks Johan.
    I'd still love to see some VM benchmarks!
  • cdimauro - Wednesday, September 21, 2016 - link

    I don't know how much value the performed tests have, because they don't reflect what happens in the real world. In the real world you don't use an old OS version and an old compiler for an x86/x64 platform only because the POWER platform has problems with the newer ones. And a company that spends so much money setting up its systems can also spend just a fraction more and buy an Intel compiler to squeeze out the maximum performance.
    IMO you should perform the tests with the best environment(s) available for each specific platform.
  • JohanAnandtech - Sunday, September 25, 2016 - link

    I missed your reaction, but we discussed this in the first part. Using Intel's compiler is good practice in HPC, but it is not common at all in the rest of the server market. And I do not see what an Intel compiler can do when you install mysql or run Java-based applications. Nobody is running recompiled databases or most other server software.
  • cdimauro - Sunday, October 2, 2016 - link

    Then why haven't you used the latest available distro (and compiler) for x86? That's the one people usually use when installing a brand-new system.
  • nils_ - Monday, September 26, 2016 - link

    This seems rather disappointing. With regard to the optimized Postgres and MariaDB results, I think in that case one should also build these software packages optimized for the Xeon Broadwell.
  • jesperfrimann - Thursday, September 29, 2016 - link

    @nils_
    "Optimized for" simply means that the software has been officially ported to POWER, and yes, that would normally include making sure the specific accelerators inside the POWER architecture are actually used by the software, which usually means changing the code a bit.
    So .. to put it in other words .. just like it is with Intel x86 Xeons.

    // Jesper
  • alpha754293 - Monday, October 3, 2016 - link

    I look forward to your HPC benchmarks if/when they become available.
