Java Performance

The SPECjbb 2015 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations." It exercises the latest Java 7 features and relies on XML, compressed communication, and messaging with security.

We tested with four groups of transaction injectors and backends. We chose the "Multi JVM" test because it is the more realistic scenario: running multiple JVMs on a single server is very common practice.
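
For readers who want to map this to the benchmark kit: the Multi-JVM topology is set through a couple of properties in the SPECjbb2015 configuration file (config/specjbb2015.props in the kit). The snippet below is only a sketch of the four-group setup described above, not our complete configuration:

    # Four groups, each with one backend JVM and one transaction injector JVM
    specjbb.group.count=4
    specjbb.txi.pergroup.count=1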

The Java version was OpenJDK 1.8.0_91. We applied relatively basic tuning to mimic real-world use, while aiming to fit everything inside a server with 128 GB of RAM:

"-server -Xmx24G -Xms24G -Xmn16G -XX:+AlwaysPreTouch -XX:+UseLargePages"

The graph below shows the maximum throughput numbers for our Multi-JVM SPECjbb test.

[Graph: SPECjbb 2015 Multi-JVM Max-jOPS]

The Critical-jOPS metric is a throughput metric under response time constraints: it is derived from the throughput the system can sustain while meeting several response time SLAs, ranging from 10 ms to 100 ms.

[Graph: SPECjbb 2015 Multi-JVM Critical-jOPS]

After checking our other benchmarks and IBM's published results at spec.org, we suspected that something was suboptimal on the POWER8 server. A Java expert at IBM responded:

Currently OpenJDK 8 performance on power lags behind the IBM JDK for this benchmark, and is not reflective of the hardware. The gap is being closed by driving changes into OpenJDK 9 that will be back ported to OpenJDK 8.

So this is one of those cases where the "standard" open source ecosystem is not optimal. We therefore tried out IBM's 64-bit LE SDK, version 8.0-3.11. Out of the box, throughput got even worse. Only when we used more complicated tuning and more memory...

"-XX:-RuntimeInstrumentation -Xmx32G -Xms32G -Xmn16G -Xlp -Xcompressedrefs -Dcom.ibm.crypto.provider.doAESInHardware=true -XnotlhPrefetch"

...did performance get better. We also had to use static huge pages (16 MB each) instead of transparent huge pages. For comparison, we gave the Intel Xeons 32 GB per JVM as well, backed by eight 32 GB DIMMs, but we did not use the IBM JDK on them, as it caused a performance decrease.
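
For completeness, this is roughly what the static huge page setup looks like on the POWER8 host, where the default huge page size is 16 MB. The numbers are purely illustrative (2048 pages x 16 MB = 32 GB, i.e. one backend heap) and have to be scaled to the number of JVMs and the installed memory; the group id passed to hugetlb_shm_group is an assumption:

    # Reserve static 16 MB huge pages so the IBM JDK's -Xlp flag can use them
    sysctl -w vm.nr_hugepages=2048
    # Let the (non-root) benchmark user's group allocate huge-page backed shared memory
    # (gid 1000 is an assumption; use the gid of the benchmark user)
    sysctl -w vm.hugetlb_shm_group=1000
    # Verify the reservation before starting the JVMs
    grep Huge /proc/meminfo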

[Graph: SPECjbb 2015 Multi-JVM Max-jOPS, optimized settings]

Maximum throughput got worse on both the Xeon and the IBM POWER8. In the case of Intel, the explanation is easy: the 32 GB DDR4-2133 DIMMs offer less bandwidth and higher latency than the 16 GB DIMMs used in the first test. In the case of IBM, we can only say that we lack deeper knowledge of the IBM JDK.

[Graph: SPECjbb 2015 Multi-JVM Critical-jOPS, optimized settings]

Again, the Xeon E5-2680 v4 loses performance because we had to use more, but slower, memory.

This is not the apples-to-apples comparison we would have liked, but it gives a good indication that, with optimized software, the IBM POWER8 might actually be able to beat even more expensive Xeon E5s than our (simulated) Xeon E5-2680 v4. A Xeon E5-2690 v4 ($2090) would be beaten by a decent margin, and even the E5-2695 v4 ($2400) might be within reach.

Comments

  • PowerOfFacts - Friday, September 16, 2016 - link

    troll
  • BOMBOVA - Friday, October 7, 2016 - link

    Rich info , good scout
  • PowerOfFacts - Friday, September 16, 2016 - link

    Sigh ....
  • PowerOfFacts - Friday, September 16, 2016 - link

    That's strange, this site says you can buy a POWER8 server for $4800. https://www.ibm.com/marketplace/cloud/big-data-inf...

    Screwed up Power (so many times)? Please explain? Compared to what... SPARC? Itanium? If you are talking about those platforms, POWER has 70% of that market share. Do you mean against "Good Enough" Intel? Absolutely Intel is the market leader but only in share as it isn't in innovation. Power still delivers enterprise features for AIX and IBM i customers with features Intel could only dream about. Where the future of the data center is going with Linux, well it did take IBM a while to figure out they couldn't do it their way. Now, they are committed 100% (from my perspective as a non-IBMer while also being committed to AIX & IBM i as there is a solid install base there) which we all see in the form of IBM & even non-IBM solutions built by OpenPOWER partners and ISV solutions using little endian Linux. Yes, there are some workloads that require extra work to optimize but for those already optimized or those which can be optimized, those customers can now buy a server for less money that has the potential to outperform Intel by up to 2X, in a system using innovative technology (CAPI & NVLink) that is more reliable. I don't know, IBM may be late and Power has some work to do but I really don't think you can back up your statement that "IBM has screwed up power so many times". Latest OpenPOWER Summit was a huge success. Here is a Google interview https://www.youtube.com/watch?v=f0qTLlvUB-s&fe...

    Oh, but you were probably just trying to be clever and take a few competitive shots.
  • CajunArson - Saturday, September 17, 2016 - link

    Yeah, that $4800 Power server wasn't nearly equivalent to what was benchmarked in this review with the "midrange" server that costs over $11K on the same web page you cited.

    I could build an 8 or 12 core Xeon that would put the hurt on that low-end Power box for less money and continue to save money during every minute of operation.
  • JohanAnandtech - Saturday, September 17, 2016 - link

    " it will cost anywhere from 5-10X" . What do you base this on? Several SKUs of IBM are in the $1500 range. "Something like $10K for the processor". This seems to be about the high-end. The E7s are in the $4.6-7k range. Even if IBM would charge $10k for the high end CPUs, it is nowhere near being 5x more expensive. Unless I am missing something, you seem to have missed that IBM has a scale out range and is offering much more affordable OpenPOWER CPUs.
  • jesperfrimann - Wednesday, September 21, 2016 - link

    IMHO, the place where POWER servers make sense right now is for use with IBM software. So if you are using something like DB2 or WebSphere, where the real cost is the software licenses, then it's really a no-brainer. Not that your local IBM sales guy will like that you'll do a switch to a Linux@Power solution :)

    // Jesper
  • YukaKun - Thursday, September 15, 2016 - link

    For the Java tests, did you change the GC collector settings? Also, why only 24GB for the JVM? I run JBoss with 32GB across our servers. I'd use more, but they still have issues with going to higher levels.

    Cheers!
  • madwolfa - Thursday, September 15, 2016 - link

    Unless working with huge datasets you want to keep your JVM heap size as reasonably low as possible... otherwise there would be a penalty on GC performance. Granted, with this sort of hardware it would be pretty minuscule, but the general rule of thumb still applies...
  • JohanAnandtech - Thursday, September 15, 2016 - link

    No changes to the GC collector settings. 24 GB per JVM = 4x 24 GB, plus 4x 3 GB for the transaction injectors and 2 GB for the controller = +/- 110 GB of memory. We wanted to run it inside 128 GB, as most of our DIMMs are 16 GB at DDR4-2400/2133.
