SPECjbb2005

SPECjbb2005 from SPEC (Standard Performance Evaluation Corporation) evaluates the performance of server-side Java by emulating a three-tier client/server system with emphasis on the middle tier. Instead of relying on a potentially disk-intensive database system, SPECjbb uses tables of objects implemented with Java Collections. The SPECjbb score thus depends on:
  • The JVM (Java Virtual Machine) and the way the JVM is tuned
  • CPU processing power
  • Caching and memory speed
  • Multiprocessing configuration (Scalability)
The latest version, SPECjbb2005, is much more memory intensive than its predecessor and adds XML processing, among other changes. From spec.org:
"SPECjbb2005 is a follow-on release to SPECjbb2000, which was inspired by the TPC-C benchmark and loosely follows the TPC-C specification for its schema, input generation, and transaction profile. SPECjbb2005 runs in a single JVM in which threads represent terminals, where each thread independently generates random input before calling transaction specific logic. There is neither network nor disk IO in SPECjbb2005."
SPECjbb starts up to two threads per logical CPU. For example, with Hyper-Threading enabled on our eight core/quad CPU Xeon MP 7030M system, 32 threads were started on the 16 logical CPUs. Each thread maps to one warehouse. Again from SPEC.org:
"A warehouse is a unit of stored data. It contains roughly 25MB of data stored in many objects in several Collections (HashMaps, TreeMaps). A thread represents an active user posting transaction requests within a warehouse. There is a one-to-one mapping between warehouses and threads, plus a few threads for SPECjbb2005 main and various JVM functions. As the number of warehouses increases during the full benchmark run, so does the number of threads. A "point" represents the throughput during the measurement interval at a given number of warehouses. A full benchmark run consists of a sequence of measurement points with an increasing number of warehouses (and thus an increasing number of threads)"
First we tested with some decent but rather generic tuning that we could use on all systems. The JVM was Sun's, version 1.5.0_08.
java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props
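To put the 3072MB heap in perspective, here is a rough back-of-the-envelope sketch in Python of the warehouse ramp on the 16 logical CPU system mentioned above. The helper names are made up for illustration, and "roughly 25MB" is treated as exactly 25MB per warehouse, so take the result as an estimate only.

```python
# Rough warehouse arithmetic for the 16 logical CPU test system.
# Assumptions (not from SPEC): the helper names, and treating
# "roughly 25MB" as exactly 25 MB of data per warehouse.

MB_PER_WAREHOUSE = 25  # from the SPEC description of a warehouse


def max_warehouses(logical_cpus: int, threads_per_cpu: int = 2) -> int:
    """Highest warehouse count in the ramp (one thread per warehouse)."""
    return logical_cpus * threads_per_cpu


def warehouse_data_mb(warehouses: int) -> int:
    """Approximate warehouse data held at a given measurement point."""
    return warehouses * MB_PER_WAREHOUSE


logical_cpus = 16                     # 8 cores with Hyper-Threading enabled
peak = max_warehouses(logical_cpus)   # 32 warehouses, hence 32 threads
ramp = list(range(1, peak + 1))       # one measurement point per warehouse count

print(f"measurement points: {ramp[0]}..{ramp[-1]}")
print(f"warehouse data at the last point: ~{warehouse_data_mb(peak)} MB")
```

Under those assumptions, the warehouse data alone comes to about 800MB at the top of the ramp, which fits comfortably inside the heap set with -Xmx3072m.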

Our first test is done with only one instance, and you might recall from our Xeon MP coverage that this is a setup that the Opteron does not like. Let us focus mostly on the Intel results. Interestingly, the "Core based" Xeon 5345 cannot outperform the "Pentium 4 based" Xeon MP 7130. The higher clock speed of the Xeon MP (3.2GHz) helps of course, but it is still a surprise, especially considering that the cache system of the Xeon 5345 is quite competitive (4MB of low latency L2 per two cores) compared to that of the Xeon MP (1MB of L2 per core and a high latency 8MB L3 per two cores). Clovertown also has the better memory subsystem, especially if you compare memory latency (120 versus 195 ns).

Next, we also tested SPECjbb with four application instances. Using numactl, a clever utility written by Andi Kleen, we were able to bind each Java application to one CPU node on the HP DL585. We didn't bind instances to CPUs on the Intel platforms (it is possible with taskset), as doing so gives worse performance.

On the Opteron we used:
numactl --cpubind=(1-4) --membind=(1-4) java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id (1-4)
On the Xeons we used:
java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id (1 to 4)
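The four command lines per platform differ only in the instance id and, on the Opteron, in the node binding. A minimal Python sketch of how they could be generated follows. It only builds and prints the commands and does not launch anything. The function name is hypothetical, and mapping instance 1-4 onto nodes 0-3 is an assumption on our part (numactl numbers nodes from zero), not something spelled out in the commands above.

```python
# Build (but do not run) the per-instance command lines shown above.
# Assumptions: instance_command() is a hypothetical helper, and instance
# ids 1-4 are mapped onto NUMA nodes 0-3 because numactl counts from zero.

JVM_ARGS = [
    "java", "-classpath", "jbb.jar:check.jar",
    "-Xms3072m", "-Xmx3072m", "-Xmn1024m", "-Xss128k",
    "-XX:+AggressiveOpts", "-XX:+UseParallelOldGC", "-XX:+UseParallelGC",
    "spec.jbb.JBBmain", "-propfile", "SPECjbb.props",
]


def instance_command(instance_id: int, bind_to_node: bool = False) -> list:
    """Return the argument list for one SPECjbb instance."""
    prefix = []
    if bind_to_node:
        node = instance_id - 1  # assumed mapping: instance 1 -> node 0, etc.
        prefix = ["numactl", f"--cpubind={node}", f"--membind={node}"]
    return prefix + JVM_ARGS + ["-id", str(instance_id)]


for instance_id in range(1, 5):
    # Opteron style: each instance pinned to its own node and local memory
    print(" ".join(instance_command(instance_id, bind_to_node=True)))
```

Calling instance_command(i) without bind_to_node gives the unbound form used on the Xeons.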

As we have noticed before, the Xeons do not benefit from using more instances, while Opteron performance is boosted significantly. That is quite good news for AMD, as testing with multiple instances is more realistic according to most Java people we talked to. The four dual core 2.4GHz Opterons outperform the 2.33GHz Xeon E5345 by a small margin. This really deserves more attention, as normally the Core based CPUs are capable of outperforming similarly clocked Opterons by a margin of 20% or more. We decided to check out the scaling of the different CPUs by testing with four and eight cores. We also tested the Opteron 880 with DDR-400. Unfortunately, we were not able to test with more than 8GB, so we could only test with two CPUs. The blue numbers are extrapolated numbers. The 2.8GHz Opteron numbers were based on the performance scaling we saw from a 2.2GHz Opteron to a 2.4GHz Opteron.

SPECjbb2005 4 instances (Sun Hotspot)
Per core performance
CPU                                     | Quad core | Octal core | Scaling 4->8
Xeon 7130 3.2 GHz                       | 39942     | 72980      | 83%
Xeon 5345 2.33 GHz                      | 39781     | 67447      | 70%
Opteron 880 2.4 GHz                     | 37397     | 71364      | 91%
Opteron 880 2.4 GHz DDR400              | 41137     | 78500      | 91%
Opteron 890 2.8 GHz                     | 46073     | 87920      | 91%
Xeon 5160 3 GHz                         | 47743     | N/A        | N/A
Xeon Scaling 2.33 -> 3 GHz              | 20%       |            |
Opteron 880 vs. Quad core Xeon 2.33 GHz | 3%        | 16%        | 31%

Here is a first indication that the quad core Xeon does not scale as well as the other systems. Two 2.4GHz Opteron 880 processors are as fast as one Xeon 5345, but four Opterons outperform the dual quad core Xeon by 16%. In other words, the quad Opteron system scales 31% better than the Xeon system.
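For readers who want to check the scaling figures, the arithmetic can be sketched in a few lines of Python. The scores are copied from the Sun Hotspot per core table. The function and variable names are ours, and the Opteron versus Xeon comparison assumes the DDR400 Opteron 880 result, since that is the one that reproduces the 3%, 16% and 31% figures.

```python
# Recompute the scaling column from the Sun Hotspot scores.
# Names are illustrative; the DDR400 choice for the Opteron vs. Xeon
# comparison is an assumption inferred from the published percentages.

scores = {
    # CPU: (four core score, eight core score)
    "Xeon 7130 3.2 GHz": (39942, 72980),
    "Xeon 5345 2.33 GHz": (39781, 67447),
    "Opteron 880 2.4 GHz": (37397, 71364),
    "Opteron 880 2.4 GHz DDR400": (41137, 78500),
}


def gain(new: float, old: float) -> float:
    """Relative improvement of new over old (0.16 means 16% faster)."""
    return new / old - 1.0


scaling = {cpu: gain(eight, four) for cpu, (four, eight) in scores.items()}
for cpu, value in scaling.items():
    print(f"{cpu}: scaling 4->8 = {value:.0%}")

opteron = scores["Opteron 880 2.4 GHz DDR400"]
xeon = scores["Xeon 5345 2.33 GHz"]
print(f"Opteron vs. Xeon, four cores:  {gain(opteron[0], xeon[0]):.0%}")
print(f"Opteron vs. Xeon, eight cores: {gain(opteron[1], xeon[1]):.0%}")

# "Scales 31% better" compares the two scaling gains with each other.
relative = gain(scaling["Opteron 880 2.4 GHz DDR400"], scaling["Xeon 5345 2.33 GHz"])
print(f"Opteron scaling vs. Xeon scaling: {relative:.0%}")
```

Note that the last figure is a ratio of two percentages (roughly 91% against 70%), not a difference in raw throughput.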

When you are in the market for a new server system, you typically care less about performance per core; instead, you care about performance per dollar. That is why we should also look at performance per socket. If we look at typical HP systems for example, a two socket system with 8GB of RAM can be found in the $6000-$7000 price range, while a similar quad socket system can cost $11000-$14000.

SPECjbb2005 4 instances
Per socket performance
CPU                                     | Dual Socket
Quad core Xeon 2.33 GHz vs. Xeon 5160   | 41%
Quad core Xeon 2.33 GHz vs. Opteron 880 | 64%
Quad core Xeon 2.33 GHz vs. Opteron 890 | 46%
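The per socket percentages follow from the per core table: a dual socket Xeon 5345 system brings eight cores, while a dual socket system built on any of the dual core CPUs brings four. A short Python sketch of that comparison is shown here. The names are ours, and using the DDR400 result for the Opteron 880 is an assumption, chosen because it is the figure that reproduces the 64% in the table.

```python
# Dual socket comparison derived from the Sun Hotspot scores.
# Assumption: the Opteron 880 entry uses the DDR400 result, which is
# the one that matches the published 64%.

XEON_5345_DUAL_SOCKET = 67447  # two quad core CPUs, eight cores in total

dual_socket_rivals = {
    "Xeon 5160 3 GHz": 47743,             # two dual core CPUs
    "Opteron 880 2.4 GHz DDR400": 41137,  # two dual core CPUs
    "Opteron 890 2.8 GHz": 46073,         # extrapolated score
}


def advantage(score: int, rival_score: int) -> float:
    """How much faster score is than rival_score, as a fraction."""
    return score / rival_score - 1.0


per_socket = {
    cpu: advantage(XEON_5345_DUAL_SOCKET, rival)
    for cpu, rival in dual_socket_rivals.items()
}
for cpu, value in per_socket.items():
    print(f"Quad core Xeon 2.33 GHz vs. {cpu}: {value:.0%}")
```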

The Xeon 5345 might scale worse than the Opteron, but it offers a remarkable price/performance ratio. The Opteron 890/8220 costs about the same as the Xeon 5345, but to be fair, FB-DIMMs seem to be about 30% more expensive than comparable DDR2 DIMMs. With 8GB of RAM, this might amount to an extra cost of $300, making the Xeon 5345 system more expensive. Still, the Xeon 5345 offers a compelling performance advantage.

SPECjbb2005 - BEA JRockit

We suspected that the Sun JVM is reasonably well optimized for the Opteron and that maybe a little less effort went into the Intel optimizations. After all, Sun sells Opteron and SPARC servers. The BEA JRockit JDK provides a highly optimized JVM for running Java applications on x86-64 and Itanium CPUs, so we also did some testing with the BEA JRockit JVM. JRockit is known for being a rather memory gobbling but highly tunable JVM, so we aggressively tuned our server JVM.

On the Xeons we used the following parameters:
/java/jrockit-jdk1.5.0_06/bin/java -cp jbb.jar:check.jar -Xms2048m -Xmx2048m -XXaggressive -XXthroughputcompaction -XXallocprefetch -XXallocRedoPrefetch -XXcompressedRefs -XXlazyUnlocking -XXtlasize128k spec.jbb.JBBmain -propfile SPECjbb.props -id 1-4
On the Opterons we used the following parameters:
numactl --cpubind=0-4 --membind=0-4 /java/jrockit-jdk1.5.0_06/bin/java -classpath jbb.jar:check.jar -XXaggressive -XXcompressedRefs -XXthroughputCompaction -XXlazyUnlocking -XXtlasize=64k -Xms1536m -Xmx1536m spec.jbb.JBBmain -propfile SPECjbb.props -id 1-4

As we suspected, JRockit is better optimized for Intel. A single Xeon 5345 outperforms a dual Opteron 880 by a large margin (26-39%). The victory is significant; however, the Clovertown scaling remains quite mediocre.

SPECjbb2005 / BEA
Per core performance
CPU                                     | Quad core | Octal core | Scaling 4->8
Xeon 7130 3.2 GHz                       | 50000     | 85909      | 72%
Xeon 5345 2.33 GHz                      | 70035     | 103957     | 48%
Opteron 880 2.4 GHz                     | 50346     | 92213      | 83%
Opteron 880 2.4 GHz DDR400              | 55381     | 101434     | 83%
Xeon 5160 3 GHz                         | 79154     | N/A        | N/A
Xeon Scaling 2.33 -> 3 GHz              | 13%       |            |
Opteron 880 vs. Quad core Xeon 2.33 GHz | -28%      | -11%       | 72%

Even with DDR-400, a dual Opteron 880 is not able to come close to a single Xeon E5345. However, the picture changes when we look at the "octal core" numbers. A dual Xeon E5345 is only 48% faster than a single one, while the Opteron increases its performance by 83% when the number of cores doubles.
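One way to see how much more JRockit favors the Intel chips is to put the four core scores of both JVMs side by side. The Python sketch below does that with the numbers from the two per core tables. The names are ours, and keep in mind that the two JVMs ran with different heap sizes and tuning flags, so this is a rough comparison rather than a controlled one.

```python
# Four core scores under Sun Hotspot and BEA JRockit, taken from the
# two per core tables. Names are illustrative; the JVM settings differ
# between the runs, so treat the gains as a rough indication only.

four_core_scores = {
    # CPU: (Sun Hotspot, BEA JRockit)
    "Xeon 7130 3.2 GHz": (39942, 50000),
    "Xeon 5345 2.33 GHz": (39781, 70035),
    "Xeon 5160 3 GHz": (47743, 79154),
    "Opteron 880 2.4 GHz": (37397, 50346),
    "Opteron 880 2.4 GHz DDR400": (41137, 55381),
}


def jvm_gain(hotspot: int, jrockit: int) -> float:
    """Relative gain from switching from Hotspot to JRockit."""
    return jrockit / hotspot - 1.0


jrockit_gain = {
    cpu: jvm_gain(hotspot, jrockit)
    for cpu, (hotspot, jrockit) in four_core_scores.items()
}
for cpu, value in sorted(jrockit_gain.items(), key=lambda item: -item[1]):
    print(f"{cpu}: {value:.0%} faster with JRockit")
```

With these numbers the Core based Xeons gain far more from JRockit than the Opteron 880 or the Xeon 7130 do.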

SPECjbb2005 / BEA
Per socket performance
CPU                                     | Dual Socket
Quad core Xeon 2.33 GHz vs. Xeon 5160   | 41%
Quad core Xeon 2.33 GHz vs. Opteron 880 | 64%

Still, the quad core Xeon remains the champion, offering 41% more performance for the same price as its 3GHz dual core brother. If you are using the BEA JVM, the Xeon is a much better choice than the AMD Opteron.

15 Comments

  • zsdersw - Friday, December 29, 2006

    quote:

    as opposed to a single die approach like Smithfield and Paxville DP


    Smithfield/Paxville is a MCM chip (two pieces of silicon in one package), as well.
  • Khato - Wednesday, December 27, 2006

    Agreed on it being quite the good review, save for the lack of power consumption numbers/analysis. Form factor and power consumption can be just as important as the performance when the application can be spread across multiple machines, now can't it? At the very least, it would be nice to link to the power consumption numbers for the opteron platform in the first review it showed up in (which puts the dual clovertown at 365W load, while the quad 880 is supposedly 657W load.)
  • rowcroft - Wednesday, December 27, 2006

    Loved the article, great job.

    I'm in the process of purchasing two dual quad core servers for VMWare use. Looking at the cost to performance analysis, it would be worth mentioning that many of the high end applications are licensed on a per socket basis. This alone is saving us $20,000 on our VMWare license and making it a compelling solution.

    I would love to see more of this type of article as well- very interesting and not something you can easily find elsewhere on the net. (Tom's hardware reviewed the chip running XP Pro!)
  • duploxxx - Friday, December 29, 2006

    If you think that reading this review will help you decide what to buy as a VMware base, you are going the wrong way! Yes, these small tests are in favor of the new MCM architecture, as we saw before, since heavy workloads seem hard to test for some sites like Anand! Keep in mind that VMware is a heavy workload: you can combine the CPU and RAM however you want, but guess what, the FSB can't be combined like you wish!

    Thinking that a 2x quad will outperform the 4P Opteron is a big laugh! The FSB will kill your whole ESX instantly from 4+ OSes on your system with normal load.

    The money you save is indeed for sure; the power you lose is another thing!

    friendly info from a certified esx 3.0 beta tester :)
  • Viditor - Wednesday, December 27, 2006

    Probably one of your most thorough and well-rounded articles Johan...many thanks!
    It was nice to see you working with large (16GB) memory.
    If you do get a Socket F system, will you be updating the article?
