Intel Xeon 5570: Smashing SAP records (scoop!)
Date: December 16th, 2008
Author: Johan De Gelas
 
 

We have emphasized it more than once: the Nehalem architecture is all about regaining the performance crown in servers and HPC; desktop and mobile use were sometimes a bonus, sometimes an afterthought. Today it becomes almost painfully obvious. Just read Anand's thoughts about the Core i7:
 
"The Core i7's general purpose performance is solid, you're looking at a 5 - 10% increase in general application performance at the same clock speeds as Penryn"
Now look at the graph below.

 
Intel has apparently allowed HP and Fujitsu-Siemens to break the NDA on the Xeon 5570 for PR reasons, as both companies have published SAP results for a dual Xeon 5570 system. The Xeon 5570 is based on the same architecture as the Core i7: it is a 2.93 GHz quad-core CPU with four 256 KB L2 caches (one per core) and one huge 8 MB L3 cache shared by all four cores.
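The sketch below makes the configuration concrete. It totals cores, hardware threads and cache for a two-socket Xeon 5570 server. It uses only the per-socket figures above, plus two hardware threads per core from Hyper-Threading, which matches the 16 threads on 8 cores listed on the certification page for the Fujitsu-Siemens system. The class and function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeonSocket:
    """Per-socket figures for the Xeon 5570 as described in the post (hypothetical helper)."""
    cores: int = 4
    threads_per_core: int = 2   # assumption: Hyper-Threading (SMT) enabled
    l2_kb_per_core: int = 256   # private L2 per core
    l3_mb_shared: int = 8       # L3 shared by all cores on one socket


def system_totals(socket: XeonSocket, sockets: int) -> dict:
    """Aggregate core, thread and cache counts for a multi-socket server."""
    return {
        "cores": socket.cores * sockets,
        "threads": socket.cores * socket.threads_per_core * sockets,
        "l2_mb": socket.cores * socket.l2_kb_per_core * sockets / 1024,
        "l3_mb": socket.l3_mb_shared * sockets,
    }


totals = system_totals(XeonSocket(), sockets=2)
print(totals)  # {'cores': 8, 'threads': 16, 'l2_mb': 2.0, 'l3_mb': 16}
```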
 
 
SAP Sales & Distribution (SD) 2-Tier benchmark
 
The SAP numbers are absolutely astonishing: Intel's dual-socket system outperforms quad-socket Opteron machines. Based on the scaling of Barcelona, we speculate that a quad-socket Shanghai system at 2.7 GHz would roughly match the dual Xeon 5570 without HT. The new Xeon 5570 outperforms the "old" Xeon 5450 by 119%!!!
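Note that "119% faster" means the new system delivers about 2.19 times the SAPS of the old one, not 1.19 times. The minimal sketch below shows both conversions. The helper names are hypothetical, and the raw SAPS scores behind the 119% figure are not reproduced here.

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert 'X% faster' into a throughput ratio (new score / old score)."""
    return 1.0 + percent_faster / 100.0


def percent_faster(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score / old_score - 1.0) * 100.0


ratio = speedup_factor(119)
print(f"{ratio:.2f}x")  # 2.19x
```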
 
These numbers are so high that we checked and checked again. The database is the same (SQL Server 2005), so the database cannot explain the gap, unless HP and FS have discovered some incredible tuning parameter that we have yet to hear about.
 
At this point we have no idea how a 3 GHz Nehalem can outperform the latest Opteron by 80% or more, but we can try to explain it. In a previous server-oriented article, we summed up a rough profile of SAP S&D:

• Very parallel resulting in excellent scaling
• Low to medium IPC, mostly due to “branchy” code
• Not really limited by memory bandwidth
• Likes large caches
• Sensitive to Sync (“cache coherency”) latency
 
One of the biggest bottlenecks for Intel has been sync latency. Once that "sync" bottleneck is removed, the Intel architecture may be able to show its real integer-crunching power, thanks to out-of-order loads (memory disambiguation) and better branch prediction. Those are two areas where the Opteron architecture is still weak.
 
The slightly lower latency of Nehalem's L3 cache helps too. This kind of software also fills up the out-of-order (OOO) buffers because of its long dependency chains. Nehalem has larger OOO buffers, and its very low-latency L2 cache and relatively fast L3 shorten those dependency chains.
 
Still, we are absolutely amazed that the difference is this large; we would have expected Nehalem to outperform Shanghai by a smaller margin. Although we remain a bit skeptical ("too good to be true" syndrome), we do not see how you could artificially inflate an SAP benchmark. It is certainly not as easy as with SPECjbb or SPECint/SPECfp.
 
 
Update (a few hours later): It seems that the SAP page was wrong about HT. It reported 8 threads on 8 cores for the Fujitsu-Siemens Primergy server, but the certification page says otherwise: 16 threads on 8 cores. So Hyper-Threading (SMT) probably plays an important role in this benchmark, as the SAP application has very low IPC and is very parallel. This crushing performance thus comes from combining a wide superscalar CPU with an excellent simultaneous multithreading implementation. Hats off to the Intel engineers...
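A deliberately simplified issue-slot model shows why low-IPC, highly parallel code suits SMT. It assumes a hypothetical 4-wide core and two hardware threads that never contend for caches, buffers or execution units. Under those assumptions, a thread that uses only a fraction of the core's issue slots leaves room for a second thread, while a high-IPC thread does not. Real SMT gains are far below this idealized ceiling. The function names and IPC values are illustrative only.

```python
def smt_throughput(width: int, ipc_per_thread: float, threads: int) -> float:
    """Toy model: aggregate instructions per cycle on one core.

    Assumes each thread independently sustains ipc_per_thread and the
    threads share only the core's issue width (no other contention).
    """
    return min(float(width), ipc_per_thread * threads)


def smt_gain(width: int, ipc_per_thread: float) -> float:
    """Relative throughput gain from running two threads instead of one."""
    one = smt_throughput(width, ipc_per_thread, 1)
    two = smt_throughput(width, ipc_per_thread, 2)
    return two / one - 1.0


# A low-IPC, branchy workload leaves most issue slots idle...
print(smt_gain(width=4, ipc_per_thread=0.8))  # 1.0, i.e. up to +100% in this idealized model
# ...while a high-IPC workload has little room left for a second thread.
print(smt_gain(width=4, ipc_per_thread=3.0))
```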
 
 
 

29 Comments
Remember... by Trisagion, 339 days ago
If it's too good to be true, it probably is...

RE: Remember... by JohanAnandtech, 339 days ago
I agree. Still, this is an SAP-certified benchmark, and one that is mostly CPU-limited. I don't see how you can "cheat" on this one. It is not like you can recompile the SAP code.

RE: Remember... by Riek, 339 days ago
My guess would be that they screwed up the number of cores (dual = quad)... That would bring it down to expected gains and figures...

Although, if the performance is indeed correct... the i7-based server chips will be the fastest CPUs in the server market for a very long time... And that might be a very bad thing for AMD and the microprocessor industry in general.


RE: Remember... by defter, 339 days ago
That's not possible, since quad socket Nehalem will not be available until H2 2009.

RE: Remember... by Riek, 339 days ago
Since it turned out that HT was enabled, I was not that wrong :')

RE: Remember... by duploxxx, 339 days ago
Both systems have HT on; check the detailed scores.

Too good to be true??? No, it is just obvious that HT works fine in this SAP benchmark; take 70-80% off when you shut it down. Whether that is required in real-life SAP environments is yet to be shown.

RE: Remember... by JohanAnandtech, 339 days ago
Good point, I updated the blog post. Well, when we see a +100% boost over the previous generation, we have to be prudent.

I don't think 70% is a result of HT. Doubling the cores gives you a 70% increase, and there is no way that HT can be as good as doubling the cores. I expect 40% to be more realistic. Still, it is incredible that a dual-socket machine can defeat a quad-socket server that is only a few months older.
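This reasoning can be made explicit with a simple decomposition. It assumes that the overall gain is the product of an HT factor and a factor for everything else (architecture, platform, clock). Under that assumption, a 119% total gain leaves about 56% for the non-HT improvements if HT contributes 40%, and about 29% if HT contributes 70%. The function name is hypothetical.

```python
def residual_gain(total_percent: float, component_percent: float) -> float:
    """Gain (in %) left for everything else once one component is factored out.

    Assumes the overall speedup is a product of independent factors:
    (1 + total) = (1 + component) * (1 + everything_else).
    """
    return ((1 + total_percent / 100) / (1 + component_percent / 100) - 1) * 100


for ht_percent in (40, 70):
    print(ht_percent, round(residual_gain(119, ht_percent), 1))
```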

RE: Remember... by Wernte, 339 days ago
The numbers are certainly high, but I think they could be possible. Compared to a dual Opteron 8384, the new Xeon is about 67% faster clock-for-clock in this benchmark. That isn't too out of whack, considering HT and all the changes made to eliminate the various system bottlenecks, since the Intel CPU itself (by that I mean the capabilities of the CPU only, such as its wider execution core) has always been more powerful than its AMD counterpart.

If this is indeed true, though, it would mean that Intel will wipe AMD out of its coveted 4- and 8-socket server market, even with the new Opteron based on the K10.5 architecture. Very scary...

RE: Remember... by JohanAnandtech, 339 days ago
I have been a hardware journalist for 10 years now, and I have never seen this: a new CPU + platform doubling the performance of the previous one without 1) using new instructions, 2) a newer process technology, 3) a large jump in clock speed, or 4) running a very exotic benchmark that stresses only a very small part of the CPU.

RE: Remember... by TeXWiller, 339 days ago
The ample bandwidth of POWER6 results in very good SMT scaling, not only in SAP but in SPEC tests as well. Nehalem's increased bandwidth could be a reason for the good SMT scaling in this case.

RE: Remember... by amazi, 338 days ago
Now both the web page and the PDF show that FSC had HT on (16 threads). So you need to correct the chart.

RE: Remember... by BSMonitor, 339 days ago
Tell that to the guy who won the $207 million lotto this past weekend....


Not really surprising. Wolfdale dual-cores were always competitive with quad-core Phenoms... Now you have removed the one thing that kept Core processors from scaling as well as K10, i.e. the FSB, especially in a highly threaded application, as the writer mentions. It shows how data-starved Penryn really was!

"Heads off ..." ? by zsdersw, 339 days ago
Don't you mean "hats off"? I don't think the Intel engineers should have their heads taken off for this stellar result :)

RE: "Heads off ..." ? by JohanAnandtech, 339 days ago
Ouch. Fixed :-).

Heads off or hats off? by icrf, 339 days ago
Heads off sounds more like they're on the chopping block.

"Heads off" by wpapolis, 339 days ago
Yes, indeed, "Head's off those Intel Engineers!"

How dare they?

Bill

Core arch. starts to shine by Pablitus, 339 days ago
I think that the Core architecture starts to shine with the addition of the on-die memory controller. Having the memory controller off-chip gives you flexibility in mainboard/chipset selection, but you pay for it in latency. Now the improvements in Nehalem (wider execution units, HT, blah blah blah) plus the on-die memory controller give the CPU all the bandwidth necessary to keep it very busy crunching integers.

It is well documented that moving the memory controller on-die boosts the performance of any CPU, so I think this record was expected by the Intel engineers... but not by this huge margin.

Gentlemen, here is the simple explanation by geekfool, 338 days ago
I have a friend who used to run SAP SD 2-tier benchmarks for his employer. This benchmark is well known to yield very different results depending on how the system is tuned. Here are two examples of two nearly identical systems (dual Opteron 2356, 32 GB RAM, exact same software version), one of which achieves an 86% higher SAPS value thanks to different, undisclosed and unreported tunings made to the second system:

5730 SAPS: http://download.sap.com/download.epd?co...4234287185B5E4E64983BA79A4007AC4A4875E
10520 SAPS: http://download.sap.com/download.epd?co...8F3FA6B0C494F83E1FCBA9033CE500D893C1BB

Because of that, absolutely nothing can be extrapolated from the "surprisingly" high Xeon W5570 result... The engineers on the teams submitting the results all know what causes them to vary so much, but expect the marketing departments to claim it is all because their servers are so much better than their competitors'.
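Here is a quick check of the two scores quoted above. It uses the straightforward definition of "X% higher" as (a / b - 1) × 100, and by this arithmetic the gap is roughly 84%. The helper name is hypothetical.

```python
def percent_higher(a: float, b: float) -> float:
    """How much higher score a is than score b, in percent."""
    return (a / b - 1) * 100


print(f"{percent_higher(10520, 5730):.1f}%")  # 83.6%
```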


RE: Gentlemen, here is the simple explanation by yasinag, 338 days ago
I think the HP benchmarks are well tuned on both the AMD and Intel platforms.
The benchmark result for the Opteron 2384 on HP servers is in line with AMD's claim (30-35% better than Barcelona).

Yasin

RE: Gentlemen, here is the simple explanation by geekfool, 338 days ago
I don't know, man... Actually, the public results indicate HP does not always do the best job of optimizing SAP: for example, their 8-processor Opteron 8360 score (26180 SAPS) is worse than the best score for a similarly configured system (Sun: 29670 SAPS).

The 30-35% diff between HP's score for Barcelona and Shanghai simply reflects they tuned the systems identically, not that they tuned them optimally (as they presumably did for the Xeon X5570).
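The same arithmetic applied to the two 8-socket Opteron 8360 scores quoted above puts Sun's result about 13% ahead of HP's. The helper name is hypothetical.

```python
def percent_higher(a: float, b: float) -> float:
    """How much higher score a is than score b, in percent."""
    return (a / b - 1) * 100


print(f"{percent_higher(29670, 26180):.1f}%")  # 13.3%
```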

RE: Gentlemen, here is the simple explanation by yasinag, 338 days ago
That could be true, as Sun also activated Unicode (15% additional load).

In the past, HP used to publish its disk setup and always used RAID 0.


RE: Gentlemen, here is the simple explanation by RadnorHarkonnen, 338 days ago
Although I do not know much about SAP benchmarks, I tend to agree with you. In 15 years working in IT, I have yet to see a speed bump of this kind. 119% is a lot of improvement. 20% would already be very nice; 119% is just too good to be true.

Several such leaps have been announced in various fields. The Willamette core was supposed to be a big bang and was advertised as such. And others, of course.
Anyway, the same results could be achieved with a web server. You just need to know how to tinker.

But 119%? The dice must be rigged somewhere in the pipeline, even if it is just that they cherry-picked which tests to run.

RE: Gentlemen, here is the simple explanation by IntelUser2000, 338 days ago
Looks like Nehalem is about to shake the server market...

RE: Gentlemen, here is the simple explanation by IntelUser2000, 338 days ago
Here's the explanation from our experts at RealWorldTech:

http://www.realworldtech.com/forums/ind...=95155&threadid=95144&roomid=2

"Basically, there are two classes of SAP-SD 2-tier submissions - "fast" with response time around 1 second and "throughput-oriented" with response time around 1.6-2 seconds."

The two results your friend posted also differ in response time by roughly 2x.

Sun Sparc by ordoequester, 338 days ago
So what? A 2-tier T2 from Sun gets 20,900 SAPS, and that's only a 1.4 GHz, 65 nm product.

Per Dollar? by alphadog, 338 days ago

In some business situations, cost doesn't matter. But that doesn't mean it should be ignored entirely. So, given Intel's tendency to overprice, can we get a pretty, shiny graph of SAPS per dollar?

hmm by stimudent, 338 days ago
This kind of sounds like one of those 'Intel-approved' articles.

RE: hmm by IntelUser2000, 336 days ago
"This kind of sounds like one of those 'Intel-approved' articles."

No, it doesn't. It's merely pointing out the results that are out there. Intel is winning hands down in performance, so it's logical that the review sites would be drooling all over it.

There's nothing in the near term that suggests AMD will bring that sort of change.

WHAT HAPPENED TO GRAPHS? by androticus, 332 days ago
What happened to the 5570 bars on the chart? As of 12/23/08 they have disappeared (I remember seeing them in the original article when I viewed it). Did AnandTech get slapped for breaking some kind of non-disclosure agreement? Shouldn't the article be updated or the graph pulled altogether???

Copyright © 1997-2009 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.