SAP S&D

The SAP S&D 2-Tier benchmark has always been one of my favorites. It is probably the most real-world of all the server benchmarks that vendors publish: a full-blown application living on top of a heavy relational database. And don't forget that SAP is one of the most successful software companies out there, the undisputed market leader in Enterprise Resource Planning.

SAP misses the L2 cache much more than most applications out there, with the exception of some exotic HPC apps. We made an in-depth profile of SAP S&D, but here is the summary:

  • The application has very low instruction-level parallelism (ILP) and as a result does not tax the integer units much (IPC = 0.3-0.55, versus more than 1 for SPECint2006).
  • SAP misses the L2 cache much more than most applications out there (4 to 10 times more than SPECint2006 apps).
  • The application has a relatively large but "prefetchable" instruction footprint, which allows the prefetchers to reduce the instruction-related cache misses.
  • The application has a massive and random data footprint, putting great pressure on the load subsystem. As a result, the out-of-order engine has to hide the latency as best it can, and a large ROB and large load buffers help a lot. The latency of the memory subsystem matters.

SAP Sales & Distribution 2 Tier benchmark

The new Opteron does not boost SAP performance. A 6% clock increase translates into a 5% performance increase. As we discussed previously, SAP is one of the few complex server applications where the "Interlagos" Opteron performs a lot better than its predecessor. The application does not seem to benefit from any of the small improvements that the Piledriver core offers. Or maybe HP's benchmark team did not spend much time on this particular benchmark. Since the HP score is the only Interlagos score available, we can only wonder which of the two explanations is closer to the truth.

Not that it matters much: the best SAP servers are Xeon E5 based. In this market of expensive consulting and software, a $500 saving on hardware is peanuts. So people tend to go for the best performance, and the Xeon E5s are clearly better at delivering raw SAP performance.
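
The clock-versus-performance figures quoted above can be turned into a rough scaling ratio. The snippet below is only a sketch of that arithmetic: the function name `scaling_efficiency` is made up for illustration, and the inputs are the rounded 6% and 5% figures from the text, not the underlying benchmark scores.

```python
def scaling_efficiency(clock_gain_pct: float, perf_gain_pct: float) -> float:
    """Fraction of a clock-speed increase that shows up as extra throughput.

    Hypothetical helper for illustration; both arguments are percentages.
    """
    if clock_gain_pct <= 0:
        raise ValueError("clock gain must be positive")
    return perf_gain_pct / clock_gain_pct


# Rounded figures from the text: 6% higher clock, 5% higher SAP score.
efficiency = scaling_efficiency(6.0, 5.0)
print(f"{efficiency:.2f}")  # prints 0.83
```

A ratio below 1 is what one would expect from a workload that, as described earlier, spends much of its time waiting on memory rather than on the core clock.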


55 Comments


  • Sivar - Wednesday, February 20, 2013 - link

    Please go away. You don't add any new information to the discussion.

    Your writing is of a teenager who knows nothing of processor architecture, the brilliant engineers at both AMD and Intel, or the competitive landscape.

    You present no data, only misinformed opinion. You reduce the quality of this discussion, and have shown no interest in improving your knowledge.
  • JamesAnthony - Wednesday, February 20, 2013 - link

    In the article it mentions you were using the E5-2660 CPU (8 core 2.2 GHz) 95W, in a Dell PowerEdge R720 server

    It may have been a lot more useful to also have included the E5-2680 (8 core 2.7 GHz) and the E5-2690 (8 Core 2.9 GHz) as while they are 130W parts, they are ones that are often used in the PowerEdge R720 and from what we find in a lot of server sales the higher performance ones are very popular for transactional database servers and payment processing servers.

    If you want to go head to head on Intel's top part vs AMD's top part, then it would seem it should be the E5-2690 vs 6386 SE
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    We all know that when you want top performance, Intel is the way to go. So I don't really see the point; even AMD will tell you that the 6376 and 6380 are their most competitive parts. It is pretty obvious that the E5-2690 2.9 GHz will be faster and consume less than a 6386 SE. I don't think our readers really need to see numbers on that.

    And I really doubt that the E5-2690s sell that much. Most reports say that the top bins with the highest TDP are less than 5% of total sales.
  • lwatcdr - Wednesday, February 20, 2013 - link

    Wow this is about the most gibberish I have seen in a post ever.
    Good heavens you are an idiot.
    Let's just tear this post to bits so this person will NEVER post on here again.
    1"No, it's worth per dollar that you have paid to buy Intel based servers. Intel is more reliable because it has Hyperthreading so you can reduce the latencies that will occur in every workloads."
    Hyperthreading has nothing to do with reliability. So that was a waste of bandwidth.
    "Unlike AMD's engineers who can not design a microprocessor properly. It was AMD's own fault why AMD did not have money like Intel"
    May I introduce you to Titan http://www.olcf.ornl.gov/titan/ The world's most powerful computer, and powered by AMD CPUs. AKA yeah, I think that AMD can actually do pretty well at designing CPUs, so this part of your post is also pure manure.
    "Look 99% Bank's in the world uses Intel based ATM as Intel processor can send information without any error." And here we can see that you understand nothing about digital theory or communications. Again a waste of bandwidth.
    "That is why IBM itself does not use Power based processors for its ATM machine because its CEO has admitted that its engineers are not capable to design a lower power processor. So, IBM uses Intel as the standard processor to exchange information between ATM machine to server, so every digits that sent will come in exact same digits when it has been received."
    The IBM Power line is for high-end systems, not for ATM machines. Odds are good that many banks use Power-based systems for handling ATM transactions. IBM uses Intel or AMD because it is cheap and you can get standard boards. As to the every digit sent nonsense: IT IS DIGITAL you MORON. The communications links have error checking and correction, not the CPUs. Please NEVER WASTE OUR TIME AGAIN, YOU KNOW NOTHING OF VALUE ON THIS SUBJECT.
  • toyotabedzrock - Wednesday, February 20, 2013 - link

    Something is wrong with the LZMA benchmarks.

    Can you do a real-world test? There are scripts out there to do this.

    LZMA is built around the idea that decompression is supposed to be much faster than compression.
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    From the 7zip manual:

    "The benchmark shows a rating in MIPS (million instructions per second). The rating value is calculated from the measured speed, and it is normalized with results of Intel Core 2 CPU with multi-threading option switched off. "

    So that is the reason why the compression MIPS values are of the same order as the decompression values. The decompression "MB/s" values are indeed about 10x or more higher than the compression ones.
  • Oldboy1948 - Thursday, February 21, 2013 - link

    It is an interesting bench and if cache and memory are fast decompress and compress will be very close. It looks better for Bulldozer in this:
    http://www.7-cpu.com/

    ARM has a long way to go if it will be a server one day.
  • extide - Wednesday, February 20, 2013 - link

    Can we PLEASE get folding@home benches?! musky on the hardocp forums has come up with a system where you can run repeatable benchmarks. Myself as well as many others would really love to see F@H benches on systems like this!
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    Ok, Link? :-)
  • alpha754293 - Wednesday, February 20, 2013 - link

    Because of the way that the current Opteron architecture is (1 FPU per module), did you run with the number of LS-DYNA processes equal to the number of FPUs on chip or did you run it based on per "core" (i.e. 2 processes per module)?
