Java Server Performance

The SPECjbb 2013 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations." It uses the latest Java 7 features and makes use of XML, compressed communication, and messaging with security.

Benchmark architecture diagram

We tested with four groups of transaction injectors and back-ends. We applied relatively basic tuning to mimic real-world use. We used this JVM configuration setting for the systems limited to 32 GB (all Xeon E3):

"-server -Xmx4G -Xms4G -Xmn2G -XX:+AlwaysPreTouch -XX:+UseLargePages"

With these settings, the benchmark takes about 20-27GB of RAM. For the servers that could address 64 GB or more (Atom, Xeon D and Xeon E5), we used a slightly beefier setting:

"-server -Xmx8G -Xms8G -Xmn4G -XX:+AlwaysPreTouch -XX:+UseLargePages"

With these settings, the benchmark takes about 43-57GB of RAM. The first metric is basically maximum throughput.

SPECJBB 2013-Multi max-jOPS

As long as you run enough JVMs on top your server, the Xeon D and Xeon E5 will not dissapoint. The Xeon D is at least 37% faster than the previous Xeon E3 generation, the Xeon E5 delivers 50% more. 

The Critical-jOPS metric is a throughput metric under response time constraint.

SPECJBB 2013-Multi Critical-jOPS

The Xeon D seems to be slightly hindered by the lack of memory bandwidth in the max throughput benchmark, but less than in our HPC benchmark. It is important to understand that maximum throughput is very important in a HPC benchmark, but for a Java based back-end server, the critical benchmark matters much more than the maximum one. The reason is simple: the critical benchmark tells you what your customers will experience on a daily basis, the maximum throughput benchmark descibes what you will get in the worst case scenario when your server is pushed to its limits. 

In the critical benchmark, the Xeon D is at least 65% faster than any Xeon E3. The Broadwell core is a minor improvement over the Haswell core when you look at performance only (single threaded integer performance), but once it is integrated in a chip like the Xeon D, it is astonishing how much performance per watt you get. A 60-70% increase in performance per watt is a rare thing indeed. 

SPECJBB®2013 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).
HPC: Fluid Dynamics Web Server Performance
Comments Locked

90 Comments

View All Comments

  • extide - Tuesday, June 23, 2015 - link

    That's ECC Registered, -- not sure if it will take that, but probably, although you dont need registered, or ECC.
  • nils_ - Wednesday, June 24, 2015 - link

    If you want transcoding, you might want to look at the Xeon E3 v4 series instead, which come with Iris Pro graphics. Should be a lot more efficient.
  • bernstein - Thursday, June 25, 2015 - link

    for using ECC UDIMMs, a cheaper option would be an i3 in a xeon e3 board.
  • psurge - Tuesday, June 23, 2015 - link

    Has Intel discussed their Xeon-D roadmap at all? I'm wondering in particular if 2x25GbE is coming, whether we can expect a SOC with higher clock-speed or more cores (at a higher TDP), and what the timeframe is for Skylake based cores.
  • nils_ - Tuesday, June 23, 2015 - link

    Is 25GbE even a standard? I've heard about 40GbE and even 56GbE (matching infiniband), but not 25.
  • psurge - Tuesday, June 23, 2015 - link

    It's supposed be a more cost effective speed upgrade to 10GbE than 40GbE (it uses a single 25Gb/s serdes lane, as used in 100GbE, vs 4 10Gb/s lanes), and IIRC is being pushed by large datacenter shops like Google and Microsoft. There's more info at http://25gethernet.org/. I'm not sure where things are in the standardization process.
  • nils_ - Wednesday, June 24, 2015 - link

    It also has an interesting property when it comes to using a breakout cable of sorts, you could connect 4 servers to 1 100GbE port (this is already possible with 40GbE which can be split into 4x10GbE).
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Considering that the Xeon D must find a home in low power high density servers, I think dual 10 Gbit will be standard for a while. Any idea what 25/40 Gbit PHY would consume? Those 10 Gbit PHYs already need 3 Watt in idle, probably around 6-8W at full speed. That is a large chunk of the power budget in a micro/scale out server.
  • psurge - Wednesday, June 24, 2015 - link

    No I don't, sorry. But, I thought SFP+ with SR optics (10GBASE-SR) was < 1W per port, and that SFP+ direct attach (10GBASE-CR) was not far behind? 10GBASE-T is a power hog...
  • pjkenned - Tuesday, June 23, 2015 - link

    Hey Johan - just re-read. A few quick thoughts:
    First off - great piece. You do awesome work. (This is Patrick @ ServeTheHome.com btw)

    Second - one thing should probably be a bit clearer - you were not using a Xeon D-1540. It was a ES Broadwell-DE version at 2.0GHz. The shipping product has 100MHz higher clocks on both base and max turbo. I did see a 5% or so performance bump from the first ES version we tested to the shipping parts. The 2.0GHz parts are really close to shipping spec though. One both of my pre-release Xeon D and all of the post-release Xeon D systems was nearly identical.

    Those will not change your conclusions but does make the actual Intel Xeon D-1540 a bit better than the one you tested. LMK if you want me to set aside some time on a full speed version on a Xeon D-1540 system for you.

Log in

Don't have an account? Sign up now