ERP: SAP Sales & Distribution

Enterprise Resource Planning (ERP) software is a particularly complex type of database application. Studies have shown that the performance profile of these applications can differ significantly from that of the underlying database. So we decided to take a look at SAP's benchmark database, to see if we could extract some extra information to complete our view of the quad core Xeon.

The results below come from two-tier benchmarks, so the database and the underlying OS can make a big difference: unless we keep those parameters the same, we cannot compare the results. As the only available results for the quad core Xeon were obtained on Windows 2003 Enterprise Edition and MS SQL Server 2005 (both 64-bit), we filtered the results to find systems that ran the same OS and database. With the exception of the Xeon 5160, all systems in the following table are equipped with 32GB of RAM. All of these results come from SAP's "ERP release 2005" two-tier Sales & Distribution benchmark.

SAP ERP Release 2005, Windows 2003 EE
CPUs | Cores | CPU Type | CPU Speed (MHz) | Response Time (s) | SAPS | Central Server
4 | 8 | Intel Xeon 7041 | 3000 | 1.97 | 5630 | Hitachi HA8000 Model 270
2 | 4 | Intel Xeon 5160 | 3000 | 1.71 | 5020 | Hitachi HA8000 Model 130
2 | 8 | Quad-Core Intel Xeon Processor X5355 | 2660 | 1.97 | 8770 | FS PRIMERGY Model TX300 S3 / RX300 S3
2 | 8 | Quad-Core Intel Xeon Processor X5355 | 2660 | 1.98 | 8970 | HP ProLiant DL380 G5
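The tables report throughput in SAPS. For readers who prefer a more concrete unit, the short Python sketch below converts SAPS into order line items per hour, using SAP's definition of the unit (100 SAPS = 2,000 fully business-processed order line items per hour). The function and variable names are our own, and the scores are simply copied from the release 2005 table; this is an illustration of the unit, not part of SAP's benchmark reports.

```python
# Convert SAP SD benchmark scores (SAPS) into order line items per hour.
# SAP defines 100 SAPS as 2,000 fully business-processed order line items per hour.
LINE_ITEMS_PER_HOUR_PER_100_SAPS = 2_000


def saps_to_line_items_per_hour(saps: int) -> int:
    """Return the order line items per hour that correspond to a SAPS score."""
    return saps * LINE_ITEMS_PER_HOUR_PER_100_SAPS // 100


# Release 2005 scores from the table (Windows 2003 EE, MS SQL Server 2005).
results_2005 = {
    "Intel Xeon 7041 (4 CPUs, 8 cores)": 5630,
    "Intel Xeon 5160 (2 CPUs, 4 cores)": 5020,
    "Xeon X5355, FS PRIMERGY TX300 S3 (2 CPUs, 8 cores)": 8770,
    "Xeon X5355, HP ProLiant DL380 G5 (2 CPUs, 8 cores)": 8970,
}

if __name__ == "__main__":
    for system, saps in results_2005.items():
        items = saps_to_line_items_per_hour(saps)
        print(f"{system}: {saps} SAPS = {items:,} order line items/hour")
```

For example, the HP ProLiant DL380 G5's 8970 SAPS corresponds to 179,400 order line items per hour.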

Unfortunately, this leaves us without an Opteron system to compare against. We can work around that by keeping every other parameter the same and looking at the benchmarks that were run on SAP release 2004.

SAP ERP Release 2004, Windows 2003 EE
CPUs | Cores | CPU Type | CPU Speed (MHz) | Response Time (s) | SAPS | Central Server
4 | 8 | Intel Xeon 7140M | 3400 | 1.99 | 10650 | HP ProLiant DL580 G4
4 | 8 | AMD Opteron processor Model 8220SE | 2800 | 1.97 | 9920 | HP ProLiant DL585 G2
4 | 8 | Intel Xeon 7140M | 3400 | 1.97 | 9850 | Dell PowerEdge 6850
4 | 8 | AMD Opteron processor Model 8218 | 2600 | 1.98 | 9570 | HP ProLiant BL45p G2
4 | 8 | AMD Opteron processor Model 885 | 2600 | 1.97 | 8520 | Sun Blade x8400
4 | 8 | AMD Opteron processor Model 880 | 2400 | 1.96 | 7520 | FS PRIMERGY Model BX630
4 | 8 | AMD Opteron processor Model 875 | 2200 | 1.84 | 7020 | FS PRIMERGY Model BFa40
2 | 4 | Intel Xeon 5160 | 3000 | 1.98 | 5780 | FS PRIMERGY Model RX200 S3
2 | 4 | AMD Opteron processor Model 880 | 2400 | 1.87 | 4400 | FS PRIMERGY Model BX630

If you go to SAP's two-tier benchmark results page, you will notice that the performance differences between similar systems benchmarked on release 2004 and release 2005 are minor. The roughly 15% gap between the two Xeon 5160 servers in our tables comes from their response times: the first benchmark ran with an average response time of 1.71 s, the second with 1.98 s. Since the benchmark only requires the average response time to stay below 2 seconds, a run that settles at 1.71 s is carrying a lighter load than it could. Given a similar response time, we can be fairly sure the results would be very close. In short, if we keep all other parameters the same, the results in the first table should be comparable to those in the second. So, while it is not an exact science, a dual quad core Xeon at 2.66GHz should be about 55% faster than a dual Xeon 5160, and a bit slower than a quad socket Opteron 2.6GHz system. SAP scales very well with additional cores: based on our assumptions, we may conclude that both the Xeon and the Opteron systems improve by about 70% when moving from four to eight cores.
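The percentages in the paragraph above are simple ratios of SAPS scores taken from the two tables. A minimal Python sketch that reproduces them follows; which rows are paired with which is our reading of the text, and the helper name `gain_percent` is hypothetical.

```python
# Reproduce the back-of-the-envelope percentages from the SAP SD results
# (SAPS scores, Windows 2003 EE, MS SQL Server 2005).


def gain_percent(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Xeon 5160 (2 CPUs, 4 cores): release 2005 run at 1.71 s vs release 2004 run at 1.98 s.
xeon5160_r2005 = 5020
xeon5160_r2004 = 5780

# Dual quad core Xeon X5355 at 2.66 GHz (HP ProLiant DL380 G5, release 2005, 1.98 s).
x5355_dual = 8970

# Quad socket Opteron 8218 at 2.6 GHz (HP ProLiant BL45p G2, release 2004, 1.98 s).
opteron8218_quad = 9570

# Opteron 880 at 2.4 GHz (FS PRIMERGY BX630, release 2004): 4 cores vs 8 cores.
opteron880_4cores = 4400
opteron880_8cores = 7520

if __name__ == "__main__":
    print(f"Xeon 5160, 2004 run vs 2005 run: {gain_percent(xeon5160_r2004, xeon5160_r2005):.0f}%")
    print(f"X5355 dual vs Xeon 5160 (2004):  {gain_percent(x5355_dual, xeon5160_r2004):.0f}%")
    print(f"Opteron 8218 quad vs X5355 dual: {gain_percent(opteron8218_quad, x5355_dual):.0f}%")
    print(f"Opteron 880, 4 -> 8 cores:       {gain_percent(opteron880_8cores, opteron880_4cores):.0f}%")
```

This prints gains of about 15%, 55%, 7% and 71%. Keep in mind that the 55% Xeon figure compares a 2.66GHz quad core against a 3.0GHz dual core, so it is not a pure core-count comparison; the Opteron 880 pair, which runs at 2.4GHz in both configurations, is the cleaner illustration of the roughly 70% gain from four to eight cores.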

One other item warrants mention: the Xeon MP "Tulsa" seems to outperform its dual socket sibling by a small margin, which confirms the good integer performance profile that we noticed in SPECjbb2005.

15 Comments

  • Antinomy - Wednesday, March 7, 2007

    A great review, very interesting.
    But there are a few things to mention. First, there is a mistake in the Cinebench results: in the overall table the uni Clovertown system got 1272 points, but in the next one (per core performance) it got 1169. The result was swapped with that of the Xeon 7130. Second, a comment about the scalability extrapolation. The scalability result of the 2.33 Clovertown vs. the 3.0 dual Woodcrest can hardly be compared, because the systems are organized differently. These motherboards have two independent FSBs, which means that the two Woodcrests are provided with twice the peak memory bandwidth. This must have some influence on the result. Also, the 4-channel memory mode provides a 5% increase in real bandwidth versus 2-channel, so we can't say that these applications do not suffer from a lack of memory bandwidth.
    It would be interesting to see a test of a uni Woodcrest system, and a test of Woodcrest systems (both uni and dual) at the same frequency as the Clovertown. Kentsfield/Conroe systems (even though they aren't server parts) would also be nice to look at, because of their more efficient use of memory bandwidth and the FSB.
  • afuruhed - Thursday, December 28, 2006

    We are getting more Clovertowns. There is a chart at pantor.com (http://www.pantor.com/software.html) that indicates that some applications benefit a lot. The FIX protocol (http://en.wikipedia.org/wiki/FIX_protocol) is a technical specification for electronic communication of trade-related messages (financial markets).
  • henriks - Thursday, December 28, 2006

    Agree with other responses - good article!

    Some comments on the jbb results page:

    You state that JRockit is (only) available for x86-64 and Itanium. x86 and Sparc should be added to this list.

    The JRockit configuration you're using enables a single-spaced GC. In that configuration, performance is tied to heap size (larger heap means fewer GC events). Increasing the heap size to 3 GB - as for the Sun benchmark results - would increase performance slightly but in particular give much better scalability when you increase the number of warehouses to large numbers.

    It looks like you have not enabled large pages in the OS. Doing this would give a large performance boost and help scalability regardless of chip or JVM vendor.

    Astute readers may note that your results are lower than the published results on www.spec.org. Apart from OS and possibly BIOS tuning, the reason is that the most recent results use a newer JRockit version (not yet available for public download). This new version improves performance on this benchmark by 20-30% on x86 chips (Intel *and* AMD), with the largest positive effect on high-bin chips from the respective vendors. The effect on other Java applications varies from zero to a lot.

    Cheers!

    Henrik, JRockit team
  • dropadrop - Wednesday, December 27, 2006

    Considering how much we just paid for some DL585s compared to DL380s, I think the performance is pretty impressive. There is still something the DL380s (and most other two socket servers) can't do, and that is host 64GB or more of RAM.

    I mainly take care of VMware servers, and there the amount of memory becomes a bottleneck long before the processors, at least in most setups. I don't think I'd have a lot of use for eight-core servers unless I got a minimum of 32GB of RAM, probably 64.
  • rowcroft - Thursday, December 28, 2006

    I've run into the same challenge when planning for the quads. My take is that I'm getting dual quads for half the price of quad dual cores. With ESX 3's HA functionality I can group the host servers, get the 32GB of RAM with double the cores, and have host-based redundancy for critical VMs.
  • mino - Thursday, December 28, 2006

    There is another thing the DL380 lacks: a drop-in upgrade like Barcelona on the horizon...
  • Justin Case - Wednesday, December 27, 2006

    Finally a good article at AT, written by someone who knows what he's talking about. Meaningful benchmarks, meaningful comments, and conclusions that make sense. If only some Johanness could rub off on other AT writers...
  • hans007 - Wednesday, December 27, 2006

    I think an alternative to, say, a dual dual-core AMD system, even as a server or workstation, is a quad core Socket 775 CPU. I know the lower 3xxx series Xeons are made for this (and are exactly the same as the Core 2 Duo), so you could do a comparison of two AMD dual cores vs. a single Socket 775 quad with ECC DDR2, etc.
  • mino - Thursday, December 28, 2006

    Check QuadFX vs. Kentsfield reviews.

    With ECC both results will be a bit lower, but the comparison remains.

    A small hint: NO ONE tested QuadFX as a DB server against Kentsfield...

    Guess what: Quad FX is cheaper and would rule the roost on server-like tasks.
  • ltcommanderdata - Wednesday, December 27, 2006

    Well it's nice to finally see a review of the 5145, although I was hoping for more detailed power consumption numbers. The performance benchmarks were very detailed though which was great.

    Thought I would point out a few errors I noticed as I was flipping through. First, on page 2, in the Cache2Cache Latency chart, the 201 for the Xeon DP 5060 that is placed in the "Same die, same package" row should be in the "Different die, same package" row. Dempsey uses a dual die approach like Presler and Clovertown, as opposed to a single die approach like Smithfield and Paxville DP. And on the last page, in the conclusion, you mentioned Clarksboro having "four DIBs", which implies 8 FSBs. I believe that should read two DIBs, or really a Quad Independent Bus (QIB), since I'm pretty sure it only has 4 FSBs. (On a side note, Intel slides showed those 4 FSBs clocked at 1066MHz, which is really disappointing. Hopefully, now that Clovertown turns out to come in 1333MHz versions instead of only the 1066MHz versions that were first announced, Tigerton (and therefore Clarksboro), which is based on Clovertown, will also have 1333MHz versions.)
