Virtualization Performance: Linux VMs on ESXi

We introduced our new vApus FOS (For Open Source) server workloads in our review of the Facebook "Open Compute" servers. In a nutshell, it is a mix of four VMs with open source workloads: two PhpBB websites (Apache2, MySQL), one OLAP MySQL "Community server 5.1.37" database, and one VM with VMware's open source groupware Zimbra 7.1.0. Zimbra is quite a complex application as it contains the following components:

  • Jetty, the web application server
  • Postfix, an open source mail transfer agent
  • OpenLDAP, for user authentication
  • MySQL, the database
  • Lucene, a full-featured text and search engine
  • ClamAV, an anti-virus scanner
  • SpamAssassin, a mail filter
  • James/Sieve filtering (mail)

All VMs are based on a minimal CentOS 6 setup with VMware Tools installed. All our current virtualization testing is on top of the hypervisor which we know best: ESXi (5.0). We have changed two things in our vApusMark FOS setup: we upgraded the guest OS from CentOS 5.6 to 6.0 and increased the number of vCPUs of the OLAP VM from 2 to 4. These changes mean that our latest results should not be compared to the results in our older articles. We test with four tiles (one tile = four VMs). Each tile needs nine vCPUs, so the test requires 36 vCPUs.
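
The vCPU budget above is simple arithmetic, sketched below. The constant and function names are illustrative; the only inputs are the figures quoted in the text (four tiles, nine vCPUs per tile, four vCPUs for the OLAP VM). The text does not say how the remaining vCPUs are divided among the two PhpBB VMs and the Zimbra VM, so the sketch reports only their combined total.

```python
# Figures quoted in the text; the names are illustrative, not from vApus.
TILES = 4
VCPUS_PER_TILE = 9
OLAP_VCPUS = 4  # raised from 2 in this revision of the benchmark


def total_vcpus(tiles: int = TILES, vcpus_per_tile: int = VCPUS_PER_TILE) -> int:
    """Return the number of vCPUs the whole test asks the host to schedule."""
    return tiles * vcpus_per_tile


def non_olap_vcpus_per_tile() -> int:
    """vCPUs left in one tile for the two PhpBB VMs and the Zimbra VM combined."""
    return VCPUS_PER_TILE - OLAP_VCPUS


if __name__ == "__main__":
    print(total_vcpus())              # 36
    print(non_olap_vcpus_per_tile())  # 5
```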

vApusMark FOS

For a minor update, the new Piledriver core does pretty well. Clock-for-clock performance goes up by 11%. The total performance gain (IPC plus clock) is 20%, which is significant. The Opteron 6376 performs only 4% better than its direct competitor, the E5-2630 (which should perform very similarly to our E5-2660 with six cores), but that is not bad at all: you get slightly better performance for a lower (server) price.
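
The two percentages compound rather than add. As a rough sketch, assuming the total speedup is the product of the clock-for-clock (IPC) gain and the clock-speed gain, the quoted 11% and 20% imply a clock contribution of about 8%. The function name is illustrative, and the multiplicative model is an assumption rather than something stated in the text.

```python
def implied_clock_gain(total_gain: float, ipc_gain: float) -> float:
    """Return the clock-speed gain implied by a total gain and an IPC gain.

    Gains are fractions (0.20 means 20%).
    Assumes total = (1 + ipc) * (1 + clock) - 1.
    """
    return (1.0 + total_gain) / (1.0 + ipc_gain) - 1.0


if __name__ == "__main__":
    gain = implied_clock_gain(total_gain=0.20, ipc_gain=0.11)
    print(f"{gain:.1%}")  # 8.1%
```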

The top-of-the-line 6380 cannot keep up with the best Xeons. Offering 86% of the performance of the more expensive Xeon E5-2660 is hardly a disaster, however. Note that "maximum amount of affordable memory" is at the top of the shopping list for many virtualization hosts, followed by price/performance. For those buyers, given that an Opteron-based server costs less, the Opteron is once again a potent virtualization host, provided the power usage is similar.

Lacking C-states, the Opteron 6174 did pretty poorly. The Opteron 6276 consumed a lot less at idle than its predecessor, but consumed a lot more under high load. So we were very keen to learn whether AMD has improved power consumption too. Did AMD finally get that part right?


55 Comments

  • Sivar - Wednesday, February 20, 2013 - link

    Please go away. You don't add any new information to the discussion.

    Your writing is that of a teenager who knows nothing of processor architecture, the brilliant engineers at both AMD and Intel, or the competitive landscape.

    You present no data, only misinformed opinion. You reduce the quality of this discussion, and have shown no interest in improving your knowledge.
  • JamesAnthony - Wednesday, February 20, 2013 - link

    The article mentions that you were using the E5-2660 CPU (8 core 2.2 GHz, 95W) in a Dell PowerEdge R720 server.

    It might have been a lot more useful to also include the E5-2680 (8 core 2.7 GHz) and the E5-2690 (8 core 2.9 GHz). While they are 130W parts, they are often used in the PowerEdge R720, and from what we see in a lot of server sales, the higher-performance ones are very popular for transactional database servers and payment processing servers.

    If you want to go head to head on Intel's top part vs AMD's top part, then it would seem it should be the E5-2690 vs 6386 SE
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    We all know that when you want top performance, Intel is the way to go. So I don't really see the point; even AMD will tell you that the 6376 and 6380 are their most competitive parts. It is pretty obvious that the E5-2690 2.9 GHz will be faster and consume less than a 6386 SE. I don't think our readers really need to see numbers on that.

    And I really doubt that the E5-2690 sells that much. Most reports say that the top bins with the highest TDP are less than 5% of the total sales.
  • lwatcdr - Wednesday, February 20, 2013 - link

    Wow this is about the most gibberish I have seen in a post ever.
    Good heavens you are an idiot.
    Let's just tear this post to bits so this person will NEVER post on here again.
    "No, it's worth per dollar that you have paid to buy Intel based servers. Intel is more reliable because it has Hyperthreading so you can reduce the latencies that will occur in every workloads."
    Hyperthreading has nothing to do with reliability. So that was a waste of bandwidth.
    "Unlike AMD's engineers who can not design a microprocessor properly. It was AMD's own fault why AMD did not have money like Intel"
    May I introduce you to Titan: http://www.olcf.ornl.gov/titan/ The world's most powerful computer, and powered by AMD CPUs. AKA yeah, I think that AMD can actually do pretty well at designing CPUs, so this part of your post is also pure manure.
    "Look 99% Bank's in the world uses Intel based ATM as Intel processor can send information without any error." And here we can see that you understand nothing about digital theory or communications. Again a waste of bandwidth.
    "That is why IBM itself does not use Power based processors for its ATM machine because its CEO has admitted that its engineers are not capable to design a lower power processor. So, IBM uses Intel as the standard processor to exchange information between ATM machine to server, so every digits that sent will come in exact same digits when it has been received."
    The IBM power line is for high end systems not for ATM machines. Odds are good that many banks use Power based system for handling ATM transactions. IBM uses Intel or AMD because it is cheap and you can get standard boards. As to the every digit sent nonsense. IT IS DIGITAL you MORON. The communications links have error checking and correction not the CPUs. Please NEVER WASTE OUR TIME AGAIN, YOU KNOW NOTHING OF VALUE ON THIS SUBJECT.
  • toyotabedzrock - Wednesday, February 20, 2013 - link

    Something is wrong with the LZMA benchmarks.

    Can you do a real-world test? There are scripts out there to do this.

    LZMA is built around the idea that decompression is supposed to be much faster than compression.
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    From the 7zip manual:

    "The benchmark shows a rating in MIPS (million instructions per second). The rating value is calculated from the measured speed, and it is normalized with results of Intel Core 2 CPU with multi-threading option switched off. "

    So that is the reason why the compression MIPS values are of the same order of magnitude as the decompression values. The decompression "MB/s" values are indeed 10x or more higher than the compression values.
  • Oldboy1948 - Thursday, February 21, 2013 - link

    It is an interesting bench, and if cache and memory are fast, decompress and compress will be very close. It looks better for Bulldozer in this:
    http://www.7-cpu.com/

    ARM has a long way to go if it will be a server one day.
  • extide - Wednesday, February 20, 2013 - link

    Can we PLEASE get folding@home benches?! musky on the hardocp forums has come up with a system where you can run repeatable benchmarks. Myself as well as many others would really love to see F@H benches on systems like this!
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    Ok, Link? :-)
  • alpha754293 - Wednesday, February 20, 2013 - link

    Because of the way that the current Opteron architecture is (1 FPU per module), did you run with the number of LS-DYNA processes equal to the number of FPUs on chip or did you run it based on per "core" (i.e. 2 processes per module)?
