Virtualization Performance: Linux VMs on ESXi

We introduced our new vApus FOS (For Open Source) server workloads in our review of the Facebook "Open Compute" servers. In a nutshell, it is a mix of four VMs with open source workloads: two PhpBB websites (Apache2, MySQL), one OLAP MySQL "Community server 5.1.37" database, and one VM with VMware's open source groupware Zimbra 7.1.0. Zimbra is quite a complex application, as it contains the following components:

  • Jetty, the web application server
  • Postfix, an open source mail transfer agent
  • OpenLDAP, for user authentication
  • MySQL, the database
  • Lucene, a full-featured text search engine
  • ClamAV, an anti-virus scanner
  • SpamAssassin, a mail filter
  • James/Sieve, for mail filtering

All VMs are based on a minimal CentOS 5.6 setup with VMware Tools installed. All our current virtualization testing runs on the hypervisor we know best: ESXi (5.0). CentOS 5.6 is not ideal for the Interlagos Opteron, but we designed the benchmark a few months ago. It took us weeks to get this benchmark working and repeatable (repeatability being the hard part). For example, it was not easy to get Zimbra fully configured and properly benchmarked due to its complex usage patterns and high I/O usage. Besides, the reality is that VMs often contain older operating systems. We hope to show some benchmarks based on Linux kernel version 3.0 or later in our next article.

We tested with five tiles (one tile = four VMs). Each tile needs seven vCPUs, so the test requires 35 vCPUs.
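To make the vCPU arithmetic concrete, here is a minimal sketch that multiplies it out and compares the demand with the logical CPUs each host offers. The tile and vCPU counts come from the text above. The dual-socket layout is an assumption (it is not stated in this section), and the per-socket logical CPU counts are the published core/thread counts for each chip.

```python
# Workload figures taken from the text above.
TILES = 5
VCPUS_PER_TILE = 7

# Assumption: every test server is a dual-socket machine.
SOCKETS = 2

# Logical CPUs per socket (published specs: cores, or cores x 2 with Hyper-Threading).
LOGICAL_CPUS_PER_SOCKET = {
    "AMD Opteron 6276": 16,   # 16 integer cores (8 modules)
    "AMD Opteron 6174": 12,   # 12 cores
    "Intel Xeon X5670": 12,   # 6 cores x 2 threads
    "Intel Xeon X5650": 12,   # 6 cores x 2 threads
}


def total_vcpus(tiles, vcpus_per_tile):
    """Number of vCPUs the whole test asks the hypervisor to schedule."""
    return tiles * vcpus_per_tile


def overcommit_ratio(vcpus, logical_cpus):
    """vCPUs per logical CPU; above 1.0 means the host is overcommitted."""
    return vcpus / logical_cpus


demand = total_vcpus(TILES, VCPUS_PER_TILE)
for cpu, per_socket in LOGICAL_CPUS_PER_SOCKET.items():
    host = per_socket * SOCKETS
    ratio = overcommit_ratio(demand, host)
    print(f"{cpu}: {demand} vCPUs on {host} logical CPUs ({ratio:.2f}x)")
```

Under these assumptions every host has fewer logical CPUs than the 35 vCPUs requested, so the hypervisor scheduler is always kept busy.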

vApus FOS

The Opteron 6276 stays close to the more expensive Xeons. That makes the Opteron server the one with the best performance per dollar. Still, we feel a bit underwhelmed as the Opteron 6276 fails to outperform the previous Opteron by a tangible margin.

The benchmark above measures throughput. Response times are even more important. Let us take a look at the table below, which gives you the average response time per VM:

vApus FOS Average Response Times (ms), lower is better!

CPU                 PhpBB1   PhpBB2   MySQL OLAP   Zimbra
AMD Opteron 6276    737      587      170          567
AMD Opteron 6174    707      574      118          630
Intel Xeon X5670    645      550      63           593
Intel Xeon X5650    678      566      102          655

The Xeon X5670 wins by a landslide in MySQL. MySQL has always scaled better with clock speed than with cores, so we expect that clock speed played a major role here. The same is true for our first VM: this VM gets only one CPU and as a result runs quicker on the Xeon. In the other applications, the Opteron's higher (integer) core count starts to show. However, AMD cannot really be satisfied with the fact that the old Opteron 6174 delivers much better MySQL performance. We suspect that the high latency L2 cache and the higher branch misprediction penalty (20 vs 12 cycles) are to blame. MySQL performance is characterized by a relatively high number of branches and a lot of accesses to the L2. The Bulldozer server does manage to get the best response time on our Zimbra VM, however, so it's not a complete loss.
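The comparisons in this paragraph can be checked directly against the response time table. The sketch below uses the published numbers; the helper names `fastest_per_vm` and `percent_slower` are ours, purely for illustration.

```python
# Average response times in ms, copied from the table above (lower is better).
RESPONSE_MS = {
    "AMD Opteron 6276": {"PhpBB1": 737, "PhpBB2": 587, "MySQL OLAP": 170, "Zimbra": 567},
    "AMD Opteron 6174": {"PhpBB1": 707, "PhpBB2": 574, "MySQL OLAP": 118, "Zimbra": 630},
    "Intel Xeon X5670": {"PhpBB1": 645, "PhpBB2": 550, "MySQL OLAP": 63, "Zimbra": 593},
    "Intel Xeon X5650": {"PhpBB1": 678, "PhpBB2": 566, "MySQL OLAP": 102, "Zimbra": 655},
}


def fastest_per_vm(table):
    """Map each VM to the CPU with the lowest average response time."""
    vms = next(iter(table.values())).keys()
    return {vm: min(table, key=lambda cpu: table[cpu][vm]) for vm in vms}


def percent_slower(table, cpu, baseline, vm):
    """How much higher (in %) cpu's response time is than baseline's for one VM."""
    return (table[cpu][vm] / table[baseline][vm] - 1) * 100


for vm, cpu in fastest_per_vm(RESPONSE_MS).items():
    print(f"{vm}: fastest on {cpu} ({RESPONSE_MS[cpu][vm]} ms)")

gap = percent_slower(RESPONSE_MS, "AMD Opteron 6276", "AMD Opteron 6174", "MySQL OLAP")
print(f"Opteron 6276 vs 6174 in MySQL OLAP: {gap:.0f}% higher response time")
```

Run against the table, this puts the Opteron 6276 first only on the Zimbra VM, and shows its MySQL OLAP response time is roughly 44% higher than that of the Opteron 6174.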

Performance per watt remains the most important metric for a large part of the server market. So let us check out the power consumption that we measured while we ran vApus FOS.

vApus FOS Power Consumption

The power consumption numbers are surprising, to say the least. The Opteron 6174 needs quite a bit less energy than the two other contenders. That is bad news for the newest Opteron. We found out later that some tinkering could improve the situation, as we will see later in this article.

106 Comments

  • veri745 - Tuesday, November 15, 2011 - link

    Shouldn't there be 8 x 2MB L2 for Interlagos instead of just 4x?
  • ClagMaster - Tuesday, November 15, 2011 - link

    A core this complex in my opinion has not been optimized to its fullest potential.

    Expect better performance when AMD introduces later steppings of this core with regard to power consumption and higher clock frequencies.

    I have seen this in earlier AMD and Intel Cores, this new core will be the same.
  • C300fans - Tuesday, November 15, 2011 - link

    1x i7 3960x or 2x Interlagos 6272? It is up to you. Money cow.
  • tech6 - Tuesday, November 15, 2011 - link

    We have a bunch of 6100 in our data center and the performance has been disappointing. They do no better in single thread performance than old 73xx series Xeons. While this is OK for non-interactive stuff, it really isn't good enough for much else. These results just seem to confirm that the Bulldozer series of processors is over-hyped and that AMD is in danger of becoming irrelevant in the server, mobile and desktop market.
  • mino - Wednesday, November 16, 2011 - link

    Actually, for interactive stuff (read VDI/Citrix/containers) core counts rule the roost.
  • duploxxx - Thursday, November 17, 2011 - link

    this is exactly what should be fixed now with the turbo when set correct, btw the 73xx series were not that bad on single thread performance, it was wide scale virtualization and IO throughput which was awefull one these systems.
  • alpha754293 - Tuesday, November 15, 2011 - link

    "Let us first discuss the virtualization scene, the most important market." Yea, I don't know about that.

    Considering that they've already shipped like some half-a-million cores to the leading supercomputers of the world; where some of them are doing major processor upgrades with this new release; I wouldn't necessarily say that it's the most IMPORTANT market. Important, yes. But MOST important...I dunno.

    Looking forward to more HPC benchmark results.

    Also, you might have to play with thread schedule/process affinity (masks) to make it work right.

    See the Techreport article.
  • JohanAnandtech - Thursday, November 17, 2011 - link

    Are you talking about the Euler3D benchmark?

    And yes, by any metric (revenue, servers sold) the virtualization market is the most important one for servers. Depending on the report 60 to 80% of the servers are bought to be virtualized.
  • alpha754293 - Tuesday, November 15, 2011 - link

    Folks: chip-multithreading (CMT) is nothing new.

    I would explain it this way: it is the physical, hardware manifestation of simultaneous multi-threading (SMT). Intel's HTT is SMT.

    IBM's POWER (since I think as early as POWER4), Sun/Oracle/UltraDense's Niagara (UltraSPARC T-series), maybe even some of the older Crays were all CMT. (Don't quote me on the Crays though. MIPS died before CMT came out. API WOULD have had it probably IF there had been an EV8).

    But the way I see it - remember what a CPU IS: it's a glorified calculator. Nothing else/more.

    So, if it can't calculate, then it doesn't really do much good. (And I've yet to see an entirely integer-only program).

    Doing integer math is fairly easy and straightforward. Doing floating-point math is a LOT harder. If you check the power consumption while solving a linear algebra equation using Gauss elimination (parallelized or using multiple instances of the solver); I can guarantee you that you will consume more power than if you were trying to run VMs.

    So the way I see it, if a CPU is a glorified calculator, then a "core" is where/whatever the FPU is. Everything else is just ancillary and that point.
  • mino - Wednesday, November 16, 2011 - link

    1) Power is NOT CMT, it allways was a VERY(even by RISC standards) wide SMT design.

    2) Niagara is NOT a CMT. It is interleaved multipthreading with SMT on top.

    Bulldozer indeed IS a first of its kind. With all the associated advantages(future scaling) and disadvantages(alfa version).

    There is a nice debate somewhere on cpu.arch groups from the original author(think 1990's) of the CMT concept.
