ERP benchmark 1: SAP SD

The SAP SD (Sales and Distribution, 2-tier internet configuration) benchmark is an interesting benchmark because it is a real-world client-server application. We decided to take a look at SAP's benchmark database. The results below are 2-tier benchmarks, so the database and the underlying OS can make a difference. It is best to keep those parameters the same, although the type of database (Oracle, MS SQL Server, MaxDB or DB2) only makes a small difference. All results below were obtained on Windows Server 2003 Enterprise Edition with an MS SQL Server 2005 database (both 64-bit), and every "2-tier Sales & Distribution" benchmark was performed on SAP's "ERP release 2005".

In our previous server-oriented article, we summed up a rough profile of SAP S&D:

  • Very parallel, resulting in excellent scaling
  • Low to medium IPC, mostly due to "branchy" code
  • Somewhat limited by memory bandwidth
  • Likes large caches (memory latency!)
  • Very sensitive to sync ("cache coherency") latency
[Chart: SAP Sales & Distribution 2-Tier benchmark. (*) Estimate based on Intel's internal testing]

If you focus on the cores alone, the differences between the Xeon 55xx "Nehalem" and the previous-generation Xeon 54xx "Harpertown" and Xeon 53xx "Clovertown" are relatively small. The enormous differences in SAP scores are solely the result of Hyper-Threading, the "uncore", and the NUMA platform. According to SAP benchmark specialist Tuan Bui (Intel), enabling Hyper-Threading accounts for a 31% performance boost. Using somewhat higher-clocked DDR3 (1066 instead of 800, or 1333 instead of 1066) is good for another 2-3%. Enabling the prefetcher adds another 3%, and Turbo mode increases performance by almost 5%. As this SAP benchmark scales almost perfectly with clock speed, this means that the 2.93GHz Xeon X5570 was in fact running at about 3.07GHz on average.
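The Turbo arithmetic above can be checked with a few lines of Python. This is a minimal sketch under our own assumptions: throughput scales linearly with clock speed, the quoted gains compound multiplicatively, and the 2-3% memory gain is taken at its 2.5% midpoint. The names `GAINS`, `effective_clock` and `combined_gain` are hypothetical helpers, not part of any SAP or Intel tool.

```python
# Back-of-the-envelope arithmetic for the SAP SD tuning gains quoted above.
# Assumptions (ours, not Intel's): gains compound multiplicatively and the
# benchmark score scales linearly with clock speed.

BASE_CLOCK_GHZ = 2.93  # Xeon X5570 nominal clock

# Relative gains quoted in the text; memory uses the midpoint of the 2-3% range.
GAINS = {
    "hyper_threading": 0.31,
    "faster_ddr3": 0.025,
    "prefetcher": 0.03,
    "turbo": 0.048,  # "almost 5%"
}


def effective_clock(base_ghz: float, turbo_gain: float) -> float:
    """Average clock implied by a Turbo gain, if the score scales linearly with clock."""
    return base_ghz * (1.0 + turbo_gain)


def combined_gain(gains: dict) -> float:
    """Total speedup if each gain multiplies the result of the previous ones."""
    total = 1.0
    for g in gains.values():
        total *= 1.0 + g
    return total - 1.0


print(f"Average clock with Turbo: {effective_clock(BASE_CLOCK_GHZ, GAINS['turbo']):.2f} GHz")
print(f"Combined gain (if multiplicative): {combined_gain(GAINS):.0%}")
```

With a 4.8% Turbo gain, `effective_clock` returns about 3.07GHz, matching the average clock quoted above. The combined figure (roughly 45%) only holds if the gains really stack multiplicatively, which the breakdown itself does not state.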

Consider the following facts:

  • The quad-core AMD Opteron 8384 at 2.7GHz has no problem beating the higher-clocked Xeon 5470 at 3.3GHz.
  • It is well known that the raw integer power of the Xeon 54xx is a lot higher than that of any of the Opterons (just take a look at SPECint2006).
  • Faster memory, and thus higher bandwidth, plays only a minor role in the SAP benchmark.
  • SAP threads share a lot of data (as is typical for this kind of database-driven application).

It is clear that fast synchronization, both between the L2 caches (via the shared L3 cache) and between CPUs (via dedicated interconnects), is what made AMD's "native quad-cores" the winners in this benchmark. Slow cache synchronization is probably the main reason why the integer crunching power hidden deep inside the "Harpertown" cores did not result in better performance.

Take the same (slightly improved) core, give it the right cache architecture (an L3 that acts as a quick syncing point for the L2s) and a NUMA platform with fast CPU interconnects, and all that integer power is unleashed. The result: clock for clock, the Nehalem Xeon X5570 is about 66% faster than its predecessor (19000 vs. 11420). Add SMT (Simultaneous Multi-Threading) and the integer core can process a second thread when it is bogged down by one of those pesky branches. With that, the last hurdle to supreme SAP performance is cleared: the eight-core "Nehalem" server is just as fast as a 24-core "Dunnington" server and 80% faster than the competition.
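The two ratios in this paragraph can be reproduced as follows. This is a minimal sketch: the scores 19000 and 11420 and the core counts come from the text, but because the text only says the eight-core Nehalem is "just as fast" as the 24-core Dunnington, both of those scores are normalized to 1.0 (our assumption). The helpers `speedup` and `per_core_ratio` are hypothetical names for illustration.

```python
# Ratios behind the Nehalem claims above. Scores and core counts are from the
# text; the Nehalem/Dunnington scores are normalized to 1.0 (assumed equal).

def speedup(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score (0.66 means 66% faster)."""
    return new_score / old_score - 1.0


def per_core_ratio(score_a: float, cores_a: int, score_b: float, cores_b: int) -> float:
    """How much more throughput each core of system A delivers than each core of B."""
    return (score_a / cores_a) / (score_b / cores_b)


nehalem_vs_predecessor = speedup(19000, 11420)
# "Just as fast": equal scores, so only the core counts matter.
nehalem_vs_dunnington = per_core_ratio(1.0, 8, 1.0, 24)

print(f"Nehalem vs. predecessor: {nehalem_vs_predecessor:.0%} faster")
print(f"Per-core throughput, 8-core Nehalem vs. 24-core Dunnington: {nehalem_vs_dunnington:.1f}x")
```

Matching Dunnington's score with a third of the cores means each Nehalem core (with its two SMT threads) delivers roughly three times the SAP throughput of a Dunnington core.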

AMD has just launched the Opteron 2389 at 2.9GHz. We estimate that this will bring AMD's best SAP score to about 14800, so Nehalem's advantage will be lowered to ~70%. Unfortunately for AMD, that is still a very large advantage!
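The ~70% estimate can be sanity-checked without knowing either absolute score. This sketch assumes (our assumption, not stated in the text) that AMD's current best score comes from a 2.7GHz Shanghai Opteron and that the SAP score scales linearly with clock speed; `new_advantage` is a hypothetical helper.

```python
# Does the ~70% estimate follow from clock scaling alone?
# Assumptions (ours): AMD's current best result comes from a 2.7GHz Opteron,
# and the SAP score scales linearly with clock speed.

CURRENT_ADVANTAGE = 0.80  # Nehalem vs. best AMD score, from the text
OLD_CLOCK_GHZ = 2.7       # assumed clock of the current best AMD result
NEW_CLOCK_GHZ = 2.9       # Opteron 2389


def new_advantage(advantage: float, old_clock: float, new_clock: float) -> float:
    """Nehalem's lead after AMD's score grows by new_clock / old_clock."""
    nehalem_relative = 1.0 + advantage    # Nehalem score / old AMD score
    amd_relative = new_clock / old_clock  # new AMD score / old AMD score
    return nehalem_relative / amd_relative - 1.0


lead = new_advantage(CURRENT_ADVANTAGE, OLD_CLOCK_GHZ, NEW_CLOCK_GHZ)
print(f"Estimated Nehalem lead over the Opteron 2389: {lead:.0%}")
```

This yields about 68%, close to the ~70% quoted. The absolute scores cancel out, so only the current 80% lead and the clock ratio matter.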

44 Comments

  • gwolfman - Tuesday, March 31, 2009

    Why was this article pulled yesterday after it first posted?
  • JohanAnandtech - Tuesday, March 31, 2009

    Because the NDA date was noon in the Pacific time zone, not CET. We were slightly too early...
  • yasbane - Tuesday, March 31, 2009

    Hi Johan,

    Any chance of some more comprehensive Linux benchmarks? Haven't seen any on IT Anandtech for a while.

    cheers
  • JohanAnandtech - Tuesday, March 31, 2009

    Yes, we are working on that. Our first Oracle tests on the AMD platform are finished, but we are still working on the rest.

    Mind you, all our articles so far have included Linux benchmarking: all the MySQL testing, for example, as well as Stream, SPECjbb and Linpack.
  • Exar3342 - Monday, March 30, 2009

    Thanks for the extremely informative and interesting review Johan. I am definitely looking forward to more server reviews; are the 4-way CPUs out later this year? That will be interesting as well.
  • Exar3342 - Monday, March 30, 2009

    Forgot to mention that I was surprised HT had such an impact in some of the benches. It made huge differences in certain applications and slightly hindered performance in others. Overall, I can see why Intel wanted to bring back SMT for the Nehalem architecture.
  • duploxxx - Monday, March 30, 2009

    Awesome performance, but I would like to see how the Intel 5510/5520/5530 fare against the AMD 2378/2380/2382; after all, they are in the same price range.

    It was the same with the Woodcrest and Conroe launch: everybody saw a huge performance lead, but then only bought the very slow versions... so the question is which still offers the best value in terms of performance/price/power.

    Istanbul had better come soon for AMD. As it looks now, with decent 45nm power consumption it will be able to put up a fight against the high-end 55xx versions.
  • eryco - Tuesday, April 14, 2009

    Very informative article... I would also be interested in seeing how any of the midrange 5520/30 Xeons compare to the 2382/84 Opterons. Especially now that some vendors are giving discounts on AMD-based servers, the premium for a server with X5550/60/70s is even bigger. It would be interesting to see how performance scales across the Nehalem Xeons, and how it compares to Shanghai Opterons in the same price range. We're looking to acquire some new servers, and we can afford 2P systems with 2384s, but on the Intel side we can only go as far as E5530s. Unfortunately, there's no performance data for midrange Xeons anywhere online, so we can't make a comparison.
  • haplo602 - Monday, March 30, 2009

    I only skimmed the graphs, but how about some consistency? Some of the graphs feature only dual-core Opterons, some have a mix of dual- and quad-core... the pricing chart also features only dual-core Opterons...

    Looking just at the graphs, I cannot draw any conclusion...
  • TA152H - Monday, March 30, 2009

    Part of the problem with the 54xx CPUs is not the CPUs themselves, but the FB-DIMMs. Part of the big improvement for the Nehalem in the server world is because Intel sodomized their 54xx platform, for reasons that escape most people, with the FB-DIMMs. But it's really not mentioned except with regard to power. If the IMC (which is not an AMD innovation, by the way; it's been done many times before they did it, even on x86 by NexGen, a company they later bought) is so important, then surely the FB-DIMMs are. They are both related to the same issue: memory latency.

    It's not really important though, since that's what you'd get if you bought the Intel 54xx; it's more of an academic complaint. But, I'd like to see the Nehalem tested with dual channel memory, which is a real issue. The reason being, it has lower latency while only using two channels, and for some benchmarks, certainly not all or even the majority, you might see better performance by using two (or maybe it never happens). If you're running a specific application that runs better using dual channel, it would be good to know.

    Overall, though, a very good article. The first thing I mention is a nitpick, the second may not even matter if three channel performance is always better.
