We have emphasized it more than once: the Nehalem architecture is all about regaining the performance crown in servers and HPC; desktop and mobile use were sometimes a bonus, sometimes an afterthought. Today that becomes almost painfully obvious. Just read Anand's thoughts about the Core i7:
"The Core i7's general purpose performance is solid, you're looking at a 5 - 10% increase in general application performance at the same clock speeds as Penryn"
and now look at the graph below.

Intel has apparently allowed HP and Fujitsu-Siemens to break the NDA on the Xeon 5570 processor for PR reasons, as both companies have published SAP numbers on a dual Xeon 5570 system. The Xeon 5570 is based on the same architecture as the Core i7. It is a 2.93 GHz quad-core CPU with four 256 KB L2 caches (one per core) and one huge shared 8 MB L3 cache.
SAP Sales & Distribution 2 Tier benchmark
The SAP numbers are absolutely astonishing, as Intel's dual-socket system is able to outperform quad-socket Opteron machines. Based on the scaling of Barcelona, we speculate that a quad Shanghai at 2.7 GHz would obtain the performance of the dual Xeon 5570 without HT. The new Xeon 5570 outperforms the "old" 5450 by 119%!
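To put that percentage in perspective, here is a minimal sketch of the arithmetic behind "X% faster" claims. The function names are ours and purely illustrative; the only figure taken from the article is the 119% improvement over the Xeon 5450.

```python
def improvement_to_ratio(percent_faster: float) -> float:
    """Convert an 'X% faster' claim into a throughput ratio (new / old)."""
    return 1.0 + percent_faster / 100.0


def ratio_to_improvement(ratio: float) -> float:
    """Convert a throughput ratio (new / old) back into 'X% faster'."""
    return (ratio - 1.0) * 100.0


# 119% is the improvement quoted above for the Xeon 5570 over the Xeon 5450.
ratio = improvement_to_ratio(119.0)
print(f"119% faster means {ratio:.2f}x the throughput")
```

In other words, a 119% improvement means the new system delivers more than twice the benchmark throughput of the old one, which is why the result looks so unusual for a single generation.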
These numbers are so high that we checked and checked again. The database used is the same (SQL Server 2005), so unless HP and Fujitsu-Siemens have discovered some incredible tuning parameter that we have yet to hear about, the database does not explain the difference.
At this point we have no idea how it is possible that a 3 GHz Nehalem outperforms the latest Opteron by a margin of 80% or more. But we can venture a guess. In a previous server-oriented article, we summed up a rough profile of SAP S&D:

• Very parallel resulting in excellent scaling
• Low to medium IPC, mostly due to “branchy” code
• Not really limited by memory bandwidth
• Likes large caches
• Sensitive to Sync (“cache coherency”) latency
One of the biggest bottlenecks for Intel has been the sync latency. It is possible that once the "sync" bottleneck was removed, the Intel architecture was able to show its real integer crunching power thanks to out-of-order loads (memory disambiguation) and better branch prediction. Those are two areas where the Opteron architecture is still weak.
The slightly lower latency of Nehalem's L3 cache helps too. This kind of software also tends to fill up the out-of-order (OOO) buffers because of its long dependency chains. Nehalem's OOO buffers are larger, and the very low latency L2 cache and relatively fast L3 reduce the time those dependency chains take to resolve.
Still, we are absolutely amazed that the difference is this large. We would have expected Nehalem to outperform Shanghai by lower margins. Although we are still a bit skeptical that the difference is this large ("too good to be true" syndrome), we do not see how you could artificially inflate an SAP benchmark. It sure is not as easy as with SPECjbb or SPECfp/int.
Update (a few hours later): It seems that the SAP page was wrong about HT. It reported 8 threads on 8 cores on the Fujitsu-Siemens Primergy server. The certification page says otherwise: 16 threads on 8 cores. So Hyper-Threading (SMT) probably plays an important role in this benchmark, as the SAP application has very low IPC and is very parallel. This crushing performance therefore comes from combining a wide superscalar CPU with an excellent Simultaneous Multithreading implementation. Hats off to the Intel engineers...

  • androticus - Tuesday, December 23, 2008 - link

    What happened to the 5570 bars on the chart? As of 12/23/08 they disappeared (I remember seeing them in the original article when I viewed it.) Did anandtech get slapped under some non-disclosure of some kind? Shouldn't the article be updated or the graph yanked altogether???
  • stimudent - Wednesday, December 17, 2008 - link

    This kind of sounds like one of those 'Intel-approved' articles.
  • IntelUser2000 - Friday, December 19, 2008 - link

    "This kind of sounds like one of those 'Intel-approved' articles."

    No, it doesn't. It's merely pointing out the results that are out there. Intel is winning hands down in performance, so it's logical that the review sites would be drooling all over it.

    There's nothing in the near term that suggests AMD will bring that sort of change.
  • alphadog - Wednesday, December 17, 2008 - link

    In one way, for some business situation, cost doesn't matter. But, this doesn't mean it should be wholly ignored. SO, given Intel tendencies to overprice, can we get a pretty, shiny graph of SAPS/dollar?
  • ordoequester - Wednesday, December 17, 2008 - link

    So what
    A 2-tier T2 from Sun gets 20,900 SAPS
    And that's only a 1.4 GHz 65nm product
  • IntelUser2000 - Wednesday, December 17, 2008 - link

    Here's the explanation from our experts at RealWorldTech:


    "Basically, there are two classes of SAP-SD 2-tier submissions - "fast" with response time around 1 second and "throughput-oriented" with response time around 1.6-2 seconds."

    The response times of the two results your friend cited also differ by about 2x.
  • yasinag - Wednesday, December 17, 2008 - link

    Could be true as SUN also activated Unicode (15% additional Load).

    In the past HP used to publish their disk setup and always used RAID0.
  • RadnorHarkonnen - Wednesday, December 17, 2008 - link

    Although I do not know much about SAP benchmarks, I tend to agree with you. In 15 years working in IT, I have yet to see a speed bump of this kind. 119% is a lot of improvement. 20% was very nice already; 119% is just too good to be true.

    Several were announced in several fields, The Willamette Core was supposed to be a big bang, and advertised as it. And others of course.
    Anyway, the same results could be achieved with a web server. You just need to know how to tinker.

    But 119%? The dice must be rigged somewhere in the pipeline, even if they cherry-picked which test they ran.
  • IntelUser2000 - Wednesday, December 17, 2008 - link

    Looks like Nehalem is about to shake the server market...
  • yasinag - Wednesday, December 17, 2008 - link

    I think HP benchmarks are well tuned on both AMD and Intel platforms.
    The benchmark value of 2384 on HP servers is in line with AMD's claim (30-35% better than Barcelona)

