SPECjbb2005

SPECjbb2005 from SPEC (Standard Performance Evaluation Corporation) evaluates the performance of server-side Java by emulating a three-tier client/server system, with the emphasis on the middle tier. Rather than relying on a separate, potentially disk-intensive database system, SPECjbb stores its data in tables of objects implemented with Java Collections. The SPECjbb score thus depends on:
  • The JVM (Java Virtual Machine) and the way the JVM is tuned
  • CPU processing power
  • Caching and memory speed
  • Multiprocessing configuration (Scalability)
The latest version, SPECjbb2005, is much more memory intensive than its predecessor and adds XML processing, among other changes. From spec.org:
"SPECjbb2005 is a follow-on release to SPECjbb2000, which was inspired by the TPC-C benchmark and loosely follows the TPC-C specification for its schema, input generation, and transaction profile. SPECjbb2005 runs in a single JVM in which threads represent terminals, where each thread independently generates random input before calling transaction specific logic. There is neither network nor disk IO in SPECjbb2005."
SPECjbb starts up to two threads per logical CPU. For example, with Hyper-Threading enabled on our quad-socket, 8-core Xeon MP 7030M system, 32 threads were started on the 16 logical CPUs. Each thread corresponds to one warehouse. Again from spec.org:
"A warehouse is a unit of stored data. It contains roughly 25MB of data stored in many objects in several Collections (HashMaps, TreeMaps). A thread represents an active user posting transaction requests within a warehouse. There is a one-to-one mapping between warehouses and threads, plus a few threads for SPECjbb2005 main and various JVM functions. As the number of warehouses increases during the full benchmark run, so does the number of threads. A "point" represents the throughput during the measurement interval at a given number of warehouses. A full benchmark run consists of a sequence of measurement points with an increasing number of warehouses (and thus an increasing number of threads)"
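As a rough illustration of the sizing described above, the sketch below derives the peak thread count and the approximate warehouse data footprint from the figures quoted in this section (two warehouses per logical CPU, roughly 25MB per warehouse). The variable names are ours, and the estimate ignores JVM overhead and the transient objects created by transactions, so treat it as back-of-the-envelope arithmetic and not as a measured value.

```shell
#!/bin/sh
# Back-of-the-envelope sizing for a SPECjbb2005 run.
# Inputs are the figures quoted in the text; the names are illustrative.
logical_cpus=16          # 8 cores with Hyper-Threading enabled
threads_per_cpu=2        # threads started per logical CPU at the peak
mb_per_warehouse=25      # approximate data per warehouse, according to SPEC

# One thread per warehouse, so the peak warehouse count equals the thread count.
warehouses=$((logical_cpus * threads_per_cpu))
data_mb=$((warehouses * mb_per_warehouse))

echo "peak warehouses (threads): $warehouses"
echo "approximate warehouse data: ${data_mb} MB"
```

With these inputs the estimate comes to 32 warehouses and about 800MB of warehouse data.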
First we tested with some decent but rather generic tuning that we could use on all systems. The JVM was Sun's, version 1.5.0_08.
java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props

Performance on the quad Opteron machine is absolutely horrible: the dual Xeon DP 5160 is only a few percent slower than our quad Opteron. As SPECjbb is very memory sensitive, we suspected that the NUMA architecture of the Opteron might be influencing the result. The scaling numbers confirmed our assumption: the quad Opteron scored only 48% higher than the dual Opteron, while we expected a 70% increase from the two extra CPUs.

In many cases you would like to run several Java applications on one server, with or without virtualization, especially on quad socket machines. Therefore we also tested SPECjbb with four application instances. Using numactl, a clever utility written by Andi Kleen, we were able to bind each Java application to one CPU node on the HP DL585.

On the Opteron we used:
numactl --cpubind=(1-4) --membind=(1-4) java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id (1-4)
On the Xeon MP we used:
java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id (1 to 4)
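The four Opteron invocations above can be scripted as a small loop. What follows is a minimal sketch, not the exact script we ran: the `build_cmd` helper and the `DRY_RUN` switch are hypothetical names introduced for illustration, and the mapping from instance id to NUMA node assumes the kernel numbers the nodes 0 to 3 (check with `numactl --hardware` and adjust if your system differs). By default it only prints the command lines.

```shell
#!/bin/sh
# Sketch: start four SPECjbb2005 instances, one per NUMA node.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
JVM_OPTS="-Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC"

# build_cmd ID: print the command line for instance ID (1-4).
# Assumption: instance N is bound to NUMA node N-1.
build_cmd() {
    node=$(( $1 - 1 ))
    echo "numactl --cpubind=$node --membind=$node java -classpath jbb.jar:check.jar $JVM_OPTS spec.jbb.JBBmain -propfile SPECjbb.props -id $1"
}

for id in 1 2 3 4; do
    cmd=$(build_cmd "$id")
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd &      # run in the background so all instances overlap
    fi
done
wait                # block until every background instance has exited
```

Run it as-is to inspect the generated command lines, and set `DRY_RUN=0` to actually start the JVMs. Dropping the `numactl` prefix gives the Xeon MP variant, where placement is left to the Linux scheduler.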

If we let Linux manage the four instances, performance increases about 16% compared to using one instance. If we force each instance to stay on one node (one CPU + memory), performance increases spectacularly by 56%! So it seems that it is rather hard for the Linux kernel to keep the instances where they should be. This is good and bad news for AMD: it means that the Opteron 880 can compete with the more expensive Xeon MP, but it also means that the Opteron requires more "manual" optimization than the Xeon MP. The Xeon MP performs at the same level with 4 instances as it does with one.

We suspect that the Sun JVM is reasonably well optimized for the Opteron, and that maybe a little less effort went into the Intel optimizations, as Sun sells mostly Opteron and SPARC servers. The BEA JRockit JDK provides a highly optimized JVM for running Java applications on x86-64 and Itanium CPUs. We are still in the process of testing with this JVM, but so far it seems that the HP DL585 is capable of attaining 110,000 bops, the Supermicro Dual Xeon 5160 about 70,000 to 75,000 bops, and the Tulsa system about 140,000 bops. We are trying to find out which tuning parameters are realistic and which ones are maybe a little too extreme. We'll report back soon with our findings, as we have another new server CPU to show you in the near future.


88 Comments


  • JohanAnandtech - Saturday, November 11, 2006 - link

    Well, we did mention it in our price comparison. From a performance point of view, the G2 is within 2% of the DL585 given a similar configuration.

    Getting a server in the lab is not like getting a video chip for review. The machines are much more expensive, and you need much more time to review them properly. So OEMs are less likely to send you the necessary hardware. For a video card they send out a $500 item that can be reviewed in a few weeks, maybe even a few days. For servers like these, they have to send out a $20,000 machine and be able to miss it for a month or two at the least.
  • Viditor - Saturday, November 11, 2006 - link

    quote:

    Well, we did mention it in our price comparison. From a performance point of view, the G2 is within 2% of the DL585 given a similar configuration


    I can certainly understand and empathise with the situation...and I did enjoy the article, Johan!
    The reason I mentioned it is that line in your conclusion...
    quote:

    The HP DL585 also has a few shortcomings: it does not offer any PCIe expansion slots, the SCSI controller is an old SCSI 160 model, and there are no USB ports on the front of the machine

    I thought that (considering the circumstances) it was a bit unfair and misleading...
  • JohanAnandtech - Saturday, November 11, 2006 - link

    I just pointed out that it is a bit weird that a newer revision of the DL585 (it was thé HP Opteron machine just a few months ago) used SCSI 160. There is no reason at all why HP could not replace this: they revised the server anyway.

    I should have mentioned that these issues were solved in the G2, but it is still a missed chance... even though I reported it a bit too late :-)
  • photoguy99 - Friday, November 10, 2006 - link

    yes, bring it on!
  • finalfan - Friday, November 10, 2006 - link

    On page The Official SPEC Numbers, in second table SPEC FP 2000 Performance, the positions of (4/8) HP Opteron AM2 and (8/8) Hitachi Itanium 2 should be switched. No Itanium runs at 3.4G and no way a 4way 1.6G AM2 can sit in second place.
  • JohanAnandtech - Friday, November 10, 2006 - link

    Corrected. It is weird: the accurate numbers were in the original document, but the generation of the table went wrong. I have double-checked and now the FP numbers should all be accurate.
  • JarredWalton - Friday, November 10, 2006 - link

    Probably my fault. I think when it got put into Excel that the various x/y numbers were converted to dates. I thought I fixed all of those, but probably missed one or two. Sorry.
  • icarus4586 - Friday, November 10, 2006 - link

    quote:

    There has been a relentless assault without any mercy on the Server CPU market...


    This report brought to you by the department of redundancy department.
  • bwmccann - Friday, November 10, 2006 - link

    When are you guys going to start benchmarking server CPUs using applications that are widely used in organizations on a daily basis?

    Most companies have a very high percentage of servers running Windows. With that in mind, I would love to see some tests on SQL, Oracle, Exchange, and other core components of enterprises today.

    Also it would be nice to see a closer comparison of the servers. For example you tested a DL585. A DL580 (Intel Woodcrest) would have been better suited since some of the components would be the same.
  • JohanAnandtech - Friday, November 10, 2006 - link

    http://www.anandtech.com/IT/showdoc.aspx?i=2793

    Most of the time Jason does the Windows benchmarking, me and my team do the Linux benchmarking.

    Java, MySQL and SSL are also core components of many enterprise apps.


    We are working on Oracle and got access to a real-world Oracle database a few weeks ago (for the first time), but it takes time to really understand what your benchmark is telling you and how you must configure your db. And Oracle is... very stubborn: even patching to a slightly higher version can lead to big trouble.

    The DL585 is a direct competitor (quad socket) in this space, more so than the DL580 (dual socket).


