Java Server Performance

The SPECjbb 2013 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations." It uses the latest Java 7 features and makes use of XML, compressed communication, and messaging with security.

Benchmark architecture diagram

We tested with four groups of transaction injectors and back-ends. We applied relatively basic tuning to mimic real-world use. We used this JVM configuration setting for the systems limited to 32 GB (all Xeon E3):

"-server -Xmx4G -Xms4G -Xmn2G -XX:+AlwaysPreTouch -XX:+UseLargePages"

With these settings, the benchmark takes about 20-27 GB of RAM. For the servers that could address 64 GB or more (Atom, Xeon D and Xeon E5), we used a slightly beefier setting:

"-server -Xmx8G -Xms8G -Xmn4G -XX:+AlwaysPreTouch -XX:+UseLargePages"

With these settings, the benchmark takes about 43-57 GB of RAM. The first metric is basically maximum throughput.
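Before trusting any numbers, it can be worth confirming that a JVM really picked up the heap flags quoted above. The following is a minimal sketch using the standard `java.lang.management` API. The class name `HeapFlagCheck` and the helpers `parseSizeBytes` and `flagValue` are our own illustration and are not part of SPECjbb. It does not verify that large pages were actually allocated.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapFlagCheck {

    // Converts a HotSpot-style size such as "4G", "512m" or "2048k" to bytes.
    public static long parseSizeBytes(String size) {
        String s = size.trim().toUpperCase();
        char unit = s.charAt(s.length() - 1);
        long factor;
        switch (unit) {
            case 'K': factor = 1024L; break;
            case 'M': factor = 1024L * 1024; break;
            case 'G': factor = 1024L * 1024 * 1024; break;
            default:  return Long.parseLong(s); // plain byte count, no suffix
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * factor;
    }

    // Returns the value of the last argument that starts with the given
    // prefix (for example "-Xmx"), or null if no such argument is present.
    public static String flagValue(List<String> args, String prefix) {
        String value = null;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    public static void main(String[] ignored) {
        // Arguments passed to this JVM, e.g. -Xmx8G -Xms8G -Xmn4G
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String xmx = flagValue(args, "-Xmx");
        System.out.println("JVM arguments: " + args);
        if (xmx != null) {
            System.out.println("Requested max heap: " + parseSizeBytes(xmx) + " bytes");
        }
        // The value reported here can be somewhat lower than -Xmx,
        // depending on the garbage collector in use.
        System.out.println("Max heap reported by the JVM: "
                + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

Running it as `java -Xmx8G -Xms8G -Xmn4G HeapFlagCheck` should echo the flags back and print the heap limit the JVM is working with.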

SPECJBB 2013-Multi max-jOPS

As long as you run enough JVMs on top of your server, the Xeon D and Xeon E5 will not disappoint. The Xeon D is at least 37% faster than the previous Xeon E3 generation, and the Xeon E5 delivers 50% more.

The Critical-jOPS metric is a throughput metric under response time constraint.
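To make "throughput under a response time constraint" concrete, here is a simplified sketch of the idea. For each response time limit (SLA) it picks the highest throughput step whose 99th-percentile response time still meets the limit, then combines the per-SLA results with a geometric mean. This is our own illustration, not SPEC's implementation: the class and method names are hypothetical, the five SLA values are our understanding of the run rules and should be checked against the SPEC documentation, and the load steps are made-up numbers rather than measurements.

```java
public class CriticalJopsSketch {

    // Highest throughput among the load steps whose 99th-percentile
    // response time meets the SLA; returns 0 if no step qualifies.
    public static double maxJopsUnderSla(double[] jops, double[] p99Millis, double slaMillis) {
        double best = 0.0;
        for (int i = 0; i < jops.length; i++) {
            if (p99Millis[i] <= slaMillis && jops[i] > best) {
                best = jops[i];
            }
        }
        return best;
    }

    // Geometric mean, computed via logarithms to avoid overflow.
    public static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        // Hypothetical load steps: throughput and the p99 latency seen at that step.
        double[] jops      = {1000, 2000, 3000, 4000};
        double[] p99Millis = {   5,   20,   60,  150};

        // Assumed SLA points in milliseconds (verify against the SPEC run rules).
        double[] slas = {10, 25, 50, 75, 100};

        double[] perSla = new double[slas.length];
        for (int i = 0; i < slas.length; i++) {
            perSla[i] = maxJopsUnderSla(jops, p99Millis, slas[i]);
            System.out.println("SLA " + slas[i] + " ms -> " + perSla[i] + " jOPS");
        }
        System.out.println("Critical-jOPS (sketch): " + geometricMean(perSla));
    }
}
```

The point of the sketch is that a server only gets credit for throughput it can deliver while staying responsive, which is why this metric can rank systems differently from max-jOPS.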

SPECJBB 2013-Multi Critical-jOPS

The Xeon D seems to be slightly hindered by the lack of memory bandwidth in the max throughput benchmark, but less than in our HPC benchmark. It is important to understand that maximum throughput is very important in an HPC benchmark, but for a Java-based back-end server, the critical benchmark matters much more than the maximum one. The reason is simple: the critical benchmark tells you what your customers will experience on a daily basis, while the maximum throughput benchmark describes what you will get in the worst-case scenario, when your server is pushed to its limits.

In the critical benchmark, the Xeon D is at least 65% faster than any Xeon E3. The Broadwell core is a minor improvement over the Haswell core when you look at performance only (single-threaded integer performance), but once it is integrated into a chip like the Xeon D, it is astonishing how much performance per watt you get. A 60-70% increase in performance per watt is a rare thing indeed.
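For readers who want to redo the performance per watt arithmetic with their own measurements, a minimal sketch follows. The class name, method names and the figures in `main` are purely hypothetical placeholders, not our measured results. They only illustrate how a moderate throughput gain combined with lower power draw compounds into a much larger efficiency gain.

```java
public class PerfPerWatt {

    // Throughput delivered per watt of power drawn.
    public static double perfPerWatt(double throughput, double watts) {
        return throughput / watts;
    }

    // Relative improvement of candidate over baseline, in percent.
    public static double relativeGainPercent(double baseline, double candidate) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: replace with your own jOPS and wall power readings.
        double oldEfficiency = perfPerWatt(10000, 100); // older system
        double newEfficiency = perfPerWatt(13000, 80);  // newer system

        System.out.println("Old: " + oldEfficiency + " jOPS/W");
        System.out.println("New: " + newEfficiency + " jOPS/W");
        System.out.println("Gain: " + relativeGainPercent(oldEfficiency, newEfficiency) + " %");
    }
}
```

With these placeholder inputs, a 30% throughput gain at 20% lower power works out to a gain of roughly 60% in performance per watt.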

SPECJBB®2013 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).

90 Comments


  • zodiacfml - Tuesday, June 23, 2015 - link

    this is the reason why Intel focuses on mobile, it benefits their server cpus too.

    the 14nm process is the one to thank for these massive improvements. Samsung also has 14nm and the S6 Exynos is in similar achievement
  • Refuge - Tuesday, June 23, 2015 - link

    I disagree, the Exynos is no where close to a similar achievement.

    Granted it is doing better than Qualcomm's equivalent at the moment.

    But I'm also faster than a fat man with a broken leg running on a hot and humid day.
  • zodiacfml - Tuesday, June 23, 2015 - link

    Still, these 14nm SoCs are the best in their class as they pack more cores while using less power.
  • LukaP - Thursday, June 25, 2015 - link

    Just a note, Samsung's (and TSMC's 16nm FF(+)) process isn't really 16nm entirely. The interconnects are still 28nm, making it not nearly as dense as Intel's 14nm, as well as being more leaky. IIRC their density and leakage can be compared to Intel's 22nm Tri-Gate in the times of Ivy Bridge.
  • nils_ - Tuesday, June 23, 2015 - link

    Few questions:
    1. Why did you disable x2apic?
    2. Did the Large Page allocation in the Java Benchmark actually work? It can be a bit tricky some times and then falls back to 4KiB pages
    3. What were the JVM settings for elasticsearch?
  • JohanAnandtech - Thursday, June 25, 2015 - link

    1. It was disabled out of the box. I have to admit I did not check that option. The performance impact should be negligible though.
    2. I have not monitored that, but there was a performance impact if we disabled it.
    3. ES_heap_size = 20 G; otherwise standard ES settings
  • Daniel Egger - Tuesday, June 23, 2015 - link

    Wow, that is still quite pricey here. For the price of the SuperMicro tower you can actually get a 1U 2S Xeon E5 system with one socket equipped and some memory. I'd really love to replace my home server (running on Core i5 rather than Xeon E3 for efficiency reasons, those C chipset suck balls) with one of those systems if they can make them efficient and quiet.
  • hifiaudio2 - Tuesday, June 23, 2015 - link

    Two questions:

    1. How does the Xeon D compare to the c2700 series for a home NAS that will also serve as an Emby server and HDHR DVR (when that software is available). Could be one or two 1080p transcodes going on at the same time at most. Usually no transcoding if I am using Kodi or something that can natively play back the file, but for remote viewing or random uses over the network, some transcoding by Emby could be required -- if you are not familiar with Emby think of the same thing using Plex. So would the extra power of the Xeon D be of use to me, or is the 8 core c2750 plenty for the aforementioned use case?

    2. If I do go with this unit, which dimms specifically does it use? The Supermicro c2750 board takes laptop style dimms. What does this take?
  • JohanAnandtech - Tuesday, June 23, 2015 - link

    I can answer 2: see the picture here: http://www.anandtech.com/show/9185/intel-xeon-d-re... RDIMMs or UDIMMS (= basically "normal" DDR-4) will do.
  • hifiaudio2 - Tuesday, June 23, 2015 - link

    Thanks.. So this ram:?

    http://www.amazon.com/Crucial-PC4-2133-Registered-...

    And what is the SR x4 / DR x8 difference in the two choices for the 8gb sticks?
