AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look

Name: AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
Item: AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
Author: Johan De Gelas

by Johan De Gelas on November 27, 2007 6:00 AM EST

Posted in
IT Computing

43 Comments | Add A Comment

43 Comments

64-bit Linux Java Performance: SPECjbb2005

If you are not familiar with SPECJbb2005, please read our introduction to it here. SPECjbb2005 from SPEC (Standard Performance Evaluation Corporation) evaluates the performance of server side Java by emulating a three-tier client/server system with emphasis on the middle tier.

We tested with two JVMs:

SUN Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)
BEA JRockit(R) (build R27.4.0-90 linux-x86_64, compiled mode)

We used the following optimizations:

Sun JVM: -Xms2g -Xmx2g -Xmn1g -Xss128K -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC
BEA JVM: -Xms1800m -Xmx1800m -Xns1500m -XXaggressive -XXlazyunlocking -Xgc:genpar -XXtlasize:min=4k,preferred=512k -XXcallprofiling

The BEA JVM uses memory more aggressively, making more use of the assigned memory. A heap size of 2GB would probably result in the JVM gobbling up too much memory, which could result in errors or poor performance on our 8GB system. That is why we lowered the JVM heap size from 2G to 1.8 G. We also applied slightly more aggressive tuning to the BEA JVM, as their customers are more interested in squeezing out the last bit of performance.

We also used four JVMs per system. The reason is that most IT departments emphasize consolidation today, and it is very rare that one JVM gets eight cores. We fully disclosed our testing methods here. However, note that you cannot compare the results below with our previous findings as we use newer JVMs now.

Below you can find the final score reported by SPECjbb2005, which is an average of the last four runs.

It took us hours, but we managed to complete one run of SPECjbb with faster 800MHz DDR. It shows with four JVMs that the memory subsystem on the Xeon systems is a clear bottleneck. It is also important to note that SPECjbb is one of the most sensitive benchmarks to memory bandwidth. If SPECjbb improves 6% when we use 800MHz ram instead of 667MHz, this is probably about the maximum boost the new Xeon will get from slightly faster 800MHz memory.

Let us see how the systems fare with the BEA JRockit JVM.

BEA uses some clever compression techniques to alleviate the pressure on the memory system. However, it is a bit funny to see how the hardware prefetching gets in the way. While it boosted performance by 14% in our zVisuel Kribi 3D, it is now slowing down the SPECjbb benchmark by 14%. When we disable it, the Xeon 5472 takes the lead.

The Xeon 3GHz looks like a sports car with his wheels deeply entrenched in the mud: the raw power keeps spinning on the lack of memory bandwidth. A 3GHz chip is hardly faster than a 2.33GHz chip, and here AMD's quad-core at 2GHz beats Xeon.

Software Rendering: zVisuel (32-bit Windows) MySQL and WinRAR

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

43 Comments

View All Comments

befair - Friday, November 28, 2008 - link
ok .. getting tired of this! Intel loving Anandtech employs very unfair & unreasonable tactics to show AMD processors in bad light every single time. And most readers have no clue about the jargon Anandtech uses every time.

1 - HPL needs to be compiled with appropriate flags to optimize code for the processor. Anandtech always uses the code that is optimized for Intel processors to measure performance on AMD processors. As much as AMD and Intel are binary compatible, when measuring performance even a college grad who studies HPC knows the code has to be recompiled with the appropriate flags

2 - Clever words: sometimes even 4 GFLOPS is described as significant performance difference

3- "The Math Kernel Libraries are so well optimized that the effect of memory speed is minimized." - So ... MKL use is justified because Intel processors need optimized libraries for good performance. However, they dont want to use ACML for AMD processors. Instead they want to use MKL optimized for Intel on AMD processors. Whats more ... Intel codes optimize only for Intel processors and disable everything for every other processors. They have corrected it now but who knows!! read here http://techreport.com/discussions.x/8547">http://techreport.com/discussions.x/8547

I am not saying anything bad about either processor but an independent site that claims to be fair and objective in bringing facts to the readers is anything but fair and just!!! what a load!
DonPMitchell - Friday, December 7, 2007 - link
I think a lot of us are intrigued by AMD's memory architecture, its ability to support NUMA, etc. A lot of benchmarch test how fast a small application runs with a high cash-hit rate, and that's not necessarily interesting to everyone.

The MySQL test is the right direction, but I'd rather see numbers for a more sophisticated application that utilizes multiple cores -- Oracle or MS SQL Server, for example. These are products designed to run on big iron like Unisys multi-proc servers, so what happens when they are running on these more economical Harpertown or Barcelona.
kalyanakrishna - Thursday, November 29, 2007 - link
http://scalability.org/?p=453">http://scalability.org/?p=453
kalyanakrishna - Thursday, November 29, 2007 - link
a much better review than the original one. But I still see some cleverly put sentences, wish it were otherwise.
Viditor - Thursday, November 29, 2007 - link
Nice review Johan!

On the steppimgs note you made, it's not the B2 stepping that is supposed to perform better, it's the BA stepping...
The BA stepping was the improved form for B1s, and the B3 stepping is the improved form of the B2. BA and B2 came out at the same time in Sept (though BA was the one launched, B1 was what was reviewed), B2 for Phenom and performance clockspeeds, BA for standard and low power chips.
Do you happen to have a BA chip to test (those are the production chips)?
BitByBit - Wednesday, November 28, 2007 - link
Despite K10's rather extensive architectural improvements, it looks likes its core performance isn't too different to K8. In fact, the gains we've seen so far could easily be attributable to the improved memory controller and increased cache bandwidth. It seems that introducing load reordering, a dedicated stack, improved branch prediction, 32B instruction fetch, and improved prefetching has had little impact, certainly far less than expected. The question is, why?
JohanAnandtech - Wednesday, November 28, 2007 - link
Well, we are still seeing 5-10% better integer performance on applications that are runing in the L2, so it is more than just a K8 with a better IMC. But you are right, I expected more too.

However, the MySQL benchmark deserves more attention. In this case the Barcelona core is considerably faster than the previous generation (+ 25%). This might be a case where 32 bit fetch and load reordering are helping big time. But unfortunately our Codeanalyst failed to give all the numbers we needed
BaronMatrix - Wednesday, November 28, 2007 - link
At any rate, it was the most in-depth review I've seen, especially with the code analysis. I too, thought it would be higher, but remember that Barcelona is NOT HT3 and doesn't have the advantage of "gangning\unganging." There was an interesting article recently that showed perf CAN be improved by unganging (maybe it was ganging, can't find it) the HT3 links.

I really hate that OEMs decided to stand up to the big, bad AMD and DEMAND that Barcelona NOT have HT3 with ALL OF ITS BENEFITS.

I mean people complain that Barcelona uses more power, but HT3 would cut that somewhat. At least in idle mode, and even in cases where IMC is used more than the CPU or vice versa.

I also may as well use this to CONDEMN all of these "analysts" who insist on crapping on the underdog that keeps prices reasonable and technology advancing.

INSERT SEVERAL EXPLETIVES. REPEATEDLY. FOR A FEW DAYS. A WEEK. FOR A YEAR.

INSERT MORE EXPLETIVES.
donaldrumsfeld - Wednesday, November 28, 2007 - link
Conjecture regarding why AMD went quad core on the same die... and this has nothing to do with performance. I think one place where Intel is way ahead of AMD is package technology. Remember they were doing a type of Multichip module with the P6. Having 2 dice instead of a single die allows them to have an overall lower defect rate, higher yield, and higher GHz. This is vs. AMD's lower GHz but (it was hoped) greater data efficiency using an L3 die and lower latency of on-die communications amongst cores vs. Intel's solution of die to die communication.

Can anyone confirm/deny this?

thanks
tshen83 - Tuesday, November 27, 2007 - link
Seriously, can you buy the 2360SE? Newegg doesn't even stock the 1.7Ghz 2344HEs.

The same situation exist on the Phenom line of CPUs. I don't see the value of reviewing Phenom 9700, 9900s when AMD cannot deliver them. I have trouble locating Phenom 9500s.

AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look

Post Your Comment

43 Comments

View All Comments

befair - Friday, November 28, 2008 - link

DonPMitchell - Friday, December 7, 2007 - link

kalyanakrishna - Thursday, November 29, 2007 - link

kalyanakrishna - Thursday, November 29, 2007 - link

Viditor - Thursday, November 29, 2007 - link

BitByBit - Wednesday, November 28, 2007 - link

JohanAnandtech - Wednesday, November 28, 2007 - link

BaronMatrix - Wednesday, November 28, 2007 - link

donaldrumsfeld - Wednesday, November 28, 2007 - link

tshen83 - Tuesday, November 27, 2007 - link

Log in

Don't have an account? Sign up now