Java Server Performance

The SPECjbb 2013 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations." It uses the latest Java 7 features and makes use of XML, compressed communication, and messaging with security.

Benchmark architecture diagram

We tested with four groups of transaction injectors and back-ends. We applied relatively basic tuning to mimic real-world use. We used this JVM configuration setting for the systems limited to 32 GB (all Xeon E3):

"-server -Xmx4G -Xms4G -Xmn2G -XX:+AlwaysPreTouch -XX:+UseLargePages"

With these settings, the benchmark takes about 20-27GB of RAM. For the servers that could address 64 GB or more (Atom, Xeon D and Xeon E5), we used a slightly beefier setting:

"-server -Xmx8G -Xms8G -Xmn4G -XX:+AlwaysPreTouch -XX:+UseLargePages"

With these settings, the benchmark takes about 43-57GB of RAM. The first metric is basically maximum throughput.

SPECJBB 2013-Multi max-jOPS

As long as you run enough JVMs on top your server, the Xeon D and Xeon E5 will not dissapoint. The Xeon D is at least 37% faster than the previous Xeon E3 generation, the Xeon E5 delivers 50% more. 

The Critical-jOPS metric is a throughput metric under response time constraint.

SPECJBB 2013-Multi Critical-jOPS

The Xeon D seems to be slightly hindered by the lack of memory bandwidth in the max throughput benchmark, but less than in our HPC benchmark. It is important to understand that maximum throughput is very important in a HPC benchmark, but for a Java based back-end server, the critical benchmark matters much more than the maximum one. The reason is simple: the critical benchmark tells you what your customers will experience on a daily basis, the maximum throughput benchmark descibes what you will get in the worst case scenario when your server is pushed to its limits. 

In the critical benchmark, the Xeon D is at least 65% faster than any Xeon E3. The Broadwell core is a minor improvement over the Haswell core when you look at performance only (single threaded integer performance), but once it is integrated in a chip like the Xeon D, it is astonishing how much performance per watt you get. A 60-70% increase in performance per watt is a rare thing indeed. 

SPECJBB®2013 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).
HPC: Fluid Dynamics Web Server Performance
Comments Locked

90 Comments

View All Comments

  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Hi Patrick, the base clock of our chip is 2 GHz, not 1.9 GHz as the one pre-production version that we got from Intel. I have to check the turboclocks though, but I do believe we have measured 2.6 GHz. I'll doublecheck.
  • pjkenned - Wednesday, June 24, 2015 - link

    Awesome! Our ES ones were 1.9GHz.
  • Chrisrodinis1 - Tuesday, June 23, 2015 - link

    For comparison, this server uses Xeon's. It is the HP Proliant BL460c G9 blade server: https://www.youtube.com/watch?v=0s_w8JVmvf0
  • MrDiSante - Wednesday, June 24, 2015 - link

    Why use only -O2 when compiling the benchmarks? I would imagine that in order to squeeze out every last bit of performance, all production software is compiled with all optimizations turned up to 11. I noticed that their github uses -O2 as an example - is it that TinyMemBenchmark just doesn't play nice with -O3?
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    The standard makefile had no optimization whatsoever. If you want to measure latency, you do not want maximum performance but rather accuracy, so I played it safe and used -O2. I am not convinced that all production software is optimized with all optimization turned on.
  • diediealldie - Wednesday, June 24, 2015 - link

    Intel seems disARMing them... X-Gene 2 doesn't look so promising, as they'll have to fight mighty Skylake-based Xeons, not Broadwell ones.

    Thanks for great article again.
  • jfallen - Wednesday, June 24, 2015 - link

    Thanks Johan for the great article. I'm a tech enthusiast, and will never buy or use one of these. But it makes great reading and I appreciate the time you take to research and write the article.

    Regards
    Jordan
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Happy to read this! :-)
  • TomWomack - Wednesday, June 24, 2015 - link

    This looks very much consistent with my experience; the disconcertingly high idle power (I looked at the board with a thermal camera; the hot chips were the gigabit PHY, the inductors for the power supply, and the AST2400 management chip), the surprisingly good memory performance, the fairly hot SoC (running sixteen threads of number-crunching I get a power draw of 83W at the plug) and the generally pretty good computation.

    I'm not entirely sure it was a better buy for my use case than a significantly cheaper 6-core Haswell E - Haswell E is not that hot, electricity not that expensive, and from my supplier the X10SDV-F board and memory were £929 whilst Scan get me an i7-5820K board, CPU and memory for £702. And four-channel DDR4 probably is usefully faster than two-channel for what I do.

    I quite strongly don't believe in server mystique - the outbuilding is big enough that I run out of power before I run out of space for micro-ATX cases, and I am lucky enough to be doing calculations which are self-checking to the point that ECC is a waste of money.
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Hi Tom, I believe we saw up to 90 Watt at the wall when running OpenFOAM (10 Gbit enabled). It is however less relevant for such a chip which is not meant to be a HPC chip as we have shown in the article. HPC really screams for an E5.

Log in

Don't have an account? Sign up now