64-bit Linux Java Performance: SPECjbb2005

SPECjbb needs good integer performance and an excellent memory subsystem, especially if you test with several instances as we do. So what integer improvements could help Barcelona here?

Fetching 32 bytes per cycle instead of 16 bytes (Intel Core, previous AMD Opterons) makes decoding a bit faster as the average decoding bandwidth increases, but it only helps performance when the CPU can actually execute many instructions per cycle. That is not the case in a lot of applications, including SPECjbb (IPC of 0.2 - 0.5). It might help with some branch-intensive code, however (unaligned branch targets).

The biggest improvement for integer code, and especially for code that accesses memory a lot, is that AMD finally has an architecture that can move loads ahead of other loads and, in some cases, ahead of stores. This feature has been lacking in the AMD family, while it has been present in Intel CPUs since the Pentium Pro. It makes AMD's newest quad-core CPUs more "out of order" than previous CPUs. Intel's Core architecture is still a lot more flexible in this respect, but Barcelona should like the SPECjbb benchmark quite a bit: it has more memory bandwidth available than the Core CPUs, and the gap with Core in OOO integer processing has been reduced quite a bit.

SPECjbb2005 from SPEC (Standard Performance Evaluation Corporation) evaluates the performance of server-side Java by emulating a three-tier client/server system with emphasis on the middle tier. Instead of testing with a separate, possibly disk-intensive database system, SPECjbb uses tables of objects, implemented by Java Collections. A longer description can be found here.

Again, it is not our objective to show the best possible scores. Very few people will take the time to fully tune the JVM and take the risk that some of the ultra-aggressive optimizations backfire. So we tested with some decent but rather generic tuning that we could use on all systems. The JVM is Sun's version 1.5.0_08, which allows us to compare scores with previous results, as we have had only a few days to test the newly arrived systems.

We tested SPECjbb2005 with four application instances. Using numactl, a clever utility written by Andi Kleen, we were able to bind each Java application to a separate node. We didn't bind instances to CPUs on the Intel platforms (though it is possible with taskset), as doing so gives lower performance. The -X and -XX parameters show the actual JVM optimizations.

On the Opteron we used:
numactl --cpunodebind=$node --membind=$node -- java -cp jbb.jar:check.jar -Xms2g -Xmx2g -Xmn1g -Xss128K -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id $x
On the Xeons we used:
java -classpath jbb.jar:check.jar -Xms2g -Xmx2g -Xmn1g -Xss128K -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id $x
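As a rough sketch of how the four Opteron instances might be started, the loop below builds the numactl command line shown above once per node. The node numbering (0 to 3), the instance ids (1 to 4), the helper names jbb_cmd and launch_all, and the DRY_RUN switch are all assumptions made for illustration, not details of the actual test scripts. By default it only prints the commands, and any other setup the benchmark needs is outside this sketch.

```shell
#!/bin/sh
# Sketch: one SPECjbb2005 instance per NUMA node, each bound with numactl.
# Assumptions (not from the article): nodes are numbered from 0, instance
# ids start at 1, and DRY_RUN=1 (the default) only prints the commands.

JVM_OPTS="-Xms2g -Xmx2g -Xmn1g -Xss128K -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC"
JBB_ARGS="spec.jbb.JBBmain -propfile SPECjbb.props"

# Print the command line for instance id $2, bound to NUMA node $1.
jbb_cmd() {
    node=$1
    id=$2
    echo "numactl --cpunodebind=$node --membind=$node -- java -cp jbb.jar:check.jar $JVM_OPTS $JBB_ARGS -id $id"
}

# Start (or, in dry-run mode, print) one instance per node; $1 = node count.
launch_all() {
    nodes=${1:-4}
    node=0
    while [ "$node" -lt "$nodes" ]; do
        cmd=$(jbb_cmd "$node" $((node + 1)))
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd &    # background each instance so they all run concurrently
        fi
        node=$((node + 1))
    done
    # When really launching, block until every instance has exited.
    [ "${DRY_RUN:-1}" = "1" ] || wait
}

launch_all 4
```

Running the script with DRY_RUN=0 would actually start the instances, which requires numactl, a JVM, and the SPECjbb2005 jars in the current directory.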
Below you can find the final score reported by SPECjbb2005, which is an average of the last four runs.

[Chart: SPECjbb2005 final scores]

The newest Opteron does well and performs like a 2.4GHz Clovertown. Note that it cannot outperform the old (but more expensive) four-socket Opteron 880, as that platform has even more bandwidth available and runs at an almost 20% higher clock speed. Still, we can conclude that the improved memory subsystem does pay off in SPECjbb. That's a good sign for the majority of server applications, but what about the HPC world?

46 Comments

  • Phynaz - Monday, September 10, 2007

    Isn't this intentionally crippling the system?
  • JohanAnandtech - Monday, September 10, 2007

    No. Just check what Intel and other companies do when they submit SPECjbb scores, for example. With HW prefetch on, you get about 10% lower scores.
  • nj2112 - Tuesday, September 11, 2007

    Was HW prefetching off for all tests?
  • lplatypus - Monday, September 10, 2007

    I thought that 2x00 series CPUs only supported one coherent HyperTransport link, so would this mean that the "Dual Link" feature involving two HT links would require 8300 series CPUs?
  • mino - Tuesday, September 11, 2007

    Well, maybe they changed that and all links are active (to enable setups like this) and the CPU just refuses to communicate more than one coherent hop away.
  • MDme - Monday, September 10, 2007

    Let the games begin!
  • Viditor - Thursday, September 13, 2007

    Are you going to be re-doing the review with the shipping version (stepping BA) anytime soon?
    I'm most curious to see if the claims of a 5%+ improvement are true...
  • MDme - Monday, September 10, 2007

    I think Barcelona will be a success in the server world. Its performance is around 20% faster than equivalently clocked Xeons, with the exception of certain programs like Fritz and the Intel LINPACK library, where it is around 5-10% slower. But since it scales better than the Xeon chips, that should negate the deficit and increase its lead elsewhere as cores/sockets increase. Add to that its power efficiency tweaks and aggressive pricing, and AMD will be able to hold off Intel in the server world.....maybe.

    With 2.5GHz Barceys coming up, that would be equivalent to around 3-3+ GHz Xeons. So AMD was right that they need to get to 2.6GHz....AMD needs to ramp up clocks to get the highest-end performance crown, but for now, their offering provides a nice balance of performance and power efficiency for the price.

    Now time for the Phenom to get its act together.
  • TA152H - Monday, September 10, 2007

    The article should have mentioned the performance penalty Intel chips suffer with regard to FB-DIMMs. While it's true they should be benchmarked in servers with that memory, it's also widely rumored that Intel is going to offer other choices in the near future. This memory has a really big impact on a lot of benchmarks, so when looking towards the future, or the desktop, it's important to keep in mind the importance of Intel using different memory. I don't think even Intel is stubborn enough to stick with this seriously slow and power-hungry memory. Maybe as a choice it's fine, but it must be clear to them that offering something else as well as FB-DIMMs is very desirable in the server space. Then again, look at how long they stuck with Rambus.
