Intel's newest Quad Xeon MP versus HP's DL585 Quad Opteron
by Johan De Gelas on November 10, 2006 12:00 PM EST, posted in IT Computing
SPECjbb2005
SPECjbb2005 from SPEC (Standard Performance Evaluation Corporation) evaluates the performance of server side Java by emulating a three-tier client/server system with emphasis on the middle tier. Instead of testing with a possible disk intensive database system, SPECjbb uses tables of objects, implemented by Java Collections, rather than a separate database. The SPECjbb score thus depends on:
- The JVM (Java Virtual Machine) and the way the JVM is tuned
- CPU processing power
- Caching and memory speed
- Multiprocessing configuration (Scalability)
"SPECjbb2005 is a follow-on release to SPECjbb2000, which was inspired by the TPC-C benchmark and loosely follows the TPC-C specification for its schema, input generation, and transaction profile. SPECjbb2005 runs in a single JVM in which threads represent terminals, where each thread independently generates random input before calling transaction specific logic. There is neither network nor disk IO in SPECjbb2005."
SPECjbb starts up to two threads per core. For example, with Hyper-Threading enabled on our 8 core quad CPU Xeon MP 7030M system, 32 threads were started on the 16 logical CPUs. Each thread is a warehouse. Again from SPEC.org:
"A warehouse is a unit of stored data. It contains roughly 25MB of data stored in many objects in several Collections (HashMaps, TreeMaps). A thread represents an active user posting transaction requests within a warehouse. There is a one-to-one mapping between warehouses and threads, plus a few threads for SPECjbb2005 main and various JVM functions. As the number of warehouses increases during the full benchmark run, so does the number of threads. A "point" represents the throughput during the measurement interval at a given number of warehouses. A full benchmark run consists of a sequence of measurement points with an increasing number of warehouses (and thus an increasing number of threads)"
First we tested with some decent but rather generic tuning that we could use on all systems. The JVM was Sun's, version 1.5.0_08.
java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props
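As a rough sanity check on how the warehouse count relates to the 3072 MB heap set on the command line, the arithmetic can be sketched as follows. This is only a back-of-the-envelope estimate: the function and variable names are made up for illustration, and the calculation covers only the roughly 25 MB of stored data per warehouse, not the objects created by transactions or any JVM overhead.

```shell
#!/bin/sh
# Back-of-the-envelope sizing using the figures quoted in the article.
# All names below are illustrative, not part of SPECjbb2005.

LOGICAL_CPUS=16       # 8 cores with Hyper-Threading enabled
THREADS_PER_CPU=2     # 32 threads were started on 16 logical CPUs
MB_PER_WAREHOUSE=25   # "roughly 25MB" per warehouse, per SPEC.org
HEAP_MB=3072          # from -Xms3072m / -Xmx3072m

# One warehouse per thread, so peak warehouses = logical CPUs * threads per CPU.
peak_warehouses() { echo $(( $1 * $2 )); }

# Stored warehouse data in MB = warehouses * MB per warehouse.
warehouse_data_mb() { echo $(( $1 * $2 )); }

W=$(peak_warehouses "$LOGICAL_CPUS" "$THREADS_PER_CPU")
D=$(warehouse_data_mb "$W" "$MB_PER_WAREHOUSE")

echo "peak warehouses: $W"
echo "approx. stored warehouse data: ${D} MB of ${HEAP_MB} MB heap"
```

Under these assumptions the stored warehouse data at the peak measurement point is well below the configured heap, which leaves the rest for short-lived transaction objects and the young generation sized by -Xmn1024m.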
Performance on the quad Opteron machine is absolutely horrible: the dual Xeon DP 5160 is only a few percent slower than our quad Opteron. As SPECjbb is very memory sensitive we suspected that the NUMA architecture of the Opteron might be influencing the result. The scaling numbers confirmed our assumption: the dual Opteron scored only 48% lower, while we expect a 70% increase from 2 extra cores.
In many cases you would like to run several Java applications on one server, with or without virtualization, especially on quad socket machines. Therefore we also tested SPECjbb with four application instances. Using numactl, a clever utility written by Andi Kleen, we were able to bind each Java application to one CPU node on the HP DL585.
On the Opteron we used:
numactl --cpubind=(1-4) --membind=(1-4) java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id (1-4)
On the Xeon MP we used:
java -classpath jbb.jar:check.jar -Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id (1 to 4)
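The four-instance Opteron launch can be sketched as a small loop that builds one node-bound command per instance. This is a dry-run sketch, not the script used for the article: the `build_cmd` helper is hypothetical, `--cpunodebind` is the current numactl spelling of the node-binding option, and node numbers are assumed to be 0-based (0 to 3 on a four-socket machine, as `numactl --hardware` would list them), while instance ids run from 1 to 4 as in the article.

```shell
#!/bin/sh
# Dry run: print one numactl-wrapped SPECjbb launch command per NUMA node.
# Nothing is executed; review the output before running any of it.

JVM_OPTS="-Xms3072m -Xmx3072m -Xmn1024m -Xss128k -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC"

# build_cmd NODE ID (hypothetical helper)
#   NODE: NUMA node to pin both CPU and memory to (assumed 0-based)
#   ID:   SPECjbb instance id passed via -id
build_cmd() {
  node=$1
  id=$2
  echo "numactl --cpunodebind=$node --membind=$node java -classpath jbb.jar:check.jar $JVM_OPTS spec.jbb.JBBmain -propfile SPECjbb.props -id $id"
}

# One instance per node: node 0 runs instance 1, node 1 runs instance 2, and so on.
for node in 0 1 2 3; do
  build_cmd "$node" "$((node + 1))"
done
```

Binding both the CPUs and the memory of an instance to the same node is what keeps its heap local; binding only the CPUs would still allow the kernel to place pages on a remote node.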
If we let Linux manage the four instances, performance increases about 16% compared to using one instance. If we force each instance to stay on one node (one CPU + memory), performance increases spectacularly by 56%! So it seems that it is rather hard for the Linux kernel to keep the instances where they should be. This is good and bad news for AMD: it means that the Opteron 880 can compete with the more expensive Xeon MP, but it also means that the Opteron requires more "manual" optimization than the Xeon MP. The Xeon MP performs at the same level with 4 instances as it does with one.
We suspect that the Sun JVM is reasonably well optimized for the Opteron, and maybe a little bit less effort went into the Intel optimizations as Sun sells mostly Opteron and SPARC servers. The BEA JRockit JDK provides a highly optimized JVM for running Java applications on x86-64 and Itanium CPUs. We are still in the process of testing with this JVM, but so far it seems that the HP DL585 is capable of attaining 110,000 bops, the Supermicro Dual Xeon 5160 about 70,000 to 75,000 bops and the Tulsa system about 140,000 bops. We are trying to find out which tuning parameters are realistic and which ones are maybe a little too extreme. We'll report back soon with our findings, as we have another new server CPU to show you in the near future.
88 Comments
Niv KA - Saturday, November 11, 2006 - link
I believe Clovertown is going to be announced sometime in the next week or two. On Thursday I went to the "Microsoft: Ready for a New Day" here in Belgium (where Bill Gates made an appearance of about half an hour, although not related!) and at the Intel booth they were showing off 4 servers which were running an "unannounced platform"! One of the technical guys at the booth let me in on a little "secret"! The Supermicro systems were running "two sockets each box, each socket 4 cores! Eight cores each box! And the best part is its woodcrest arch!". I asked him if it was Clovertown and he said that he "is just a technical assistant, not allowed to say anything" but he made the answer clear on his face! Clovertown is ready to go, and it's FAST! They were running benchmarks all the time! I will post pictures on the forums if I have enough time, but I have a HUGE project I need to hand in by Tuesday so I might forget!
---Niv K Aharonovich
PS: About the "outdated" system comments above, I am fully on Anandtechs side, it is impossible for an online newspaper company to make enough money to BUY everything, esp. in the $15,000 area! The only way is to ask for it from the vendors, and the vendors decide what to provide! Good job anandtech and continue the good work!!!!!!!
Dennis Travis - Saturday, November 11, 2006 - link
Great job as usual. Keep up the excellent work.
AnandThenMan - Friday, November 10, 2006 - link
Another bullshit "comparison" nice job guys. You are comparing an AMD system that has been out for over 2 years. Useless review as usual. Why are you not comparing new with new? Why don't you use a Xeon box that was out 2 years ago?
Anandtech's reviews have become more and more worthless.
JohanAnandtech - Saturday, November 11, 2006 - link
1. AMD has confirmed that they feel the HP DL585 with 4x 880 is a worthy competitor for our Tulsa machine.
2. This server is 5 months old, not 2 years. As I made clear in the article, this is the 2006 revision.
As we invest a lot of time and effort to convince OEMs and others to send us extremely expensive hardware for review, and spend weeks tweaking benchmarks and the OS to give you benchmarks, we hope we may expect some useful feedback from our readers.
Just writing "useless" with little or no explanation why you feel it is worthless is not helping anyone.
AnandThenMan - Sunday, November 12, 2006 - link
I was going to post an explanation as to why the "review" is very poorly done. But Scientia over at AMDz did a far better explanation than I could come up with.
http://www.amdzone.com/index.php?name=PNphpBB2&...
Either the review is intentionally authored to show Intel in as best light as possible, or the author is incompetent and should not be doing reviews at all. I stand by what I originally posted, the review is bullshit.
primer - Saturday, November 11, 2006 - link
Agreed.
goldfish2 - Friday, November 10, 2006 - link
Can I just quickly mention how nice it is to read an article where the author has managed to present all the relevant information in as concise a manner as is possible, good job.
JohanAnandtech - Saturday, November 11, 2006 - link
Thanks!
Server reviews are extremely time consuming so most publications are not interested in them, so I am glad AT allows me to do this kind of review.
AllYourBaseAreBelong2Us - Friday, November 10, 2006 - link
Can you guys get a new DL585 G2 and do benchmarks with this new model instead?
Viditor - Friday, November 10, 2006 - link
I thought this too...the G2 has 7 PCIe slots (3 x8, 4 x4), is $800 less expensive, and offers newer SCSI controllers.