Virtualization Performance: Linux VMs on ESXi

We introduced our new vApus FOS (For Open Source) server workloads in our review of the Facebook "Open Compute" servers. In a nutshell, it a mix of four VMs with open source workloads: two PhpBB websites (Apache2, MySQL), one OLAP MySQL "Community server 5.1.37" database, and one VM with VMware's open source groupware Zimbra 7.1.0. Zimbra is quite a complex application as it contains the following components:

  • Jetty, the web application server
  • Postfix, an open source mail transfer agent
  • OpenLDAP software, user authentication
  • MySQL is the database
  • Lucene full-featured text and search engine
  • ClamAV, an anti-virus scanner
  • SpamAssassin, a mail filter
  • James/Sieve filtering (mail)

All VMs are based on a minimal CentOS 5.6 setup with VMware Tools installed. All our current virtualization testing is on top of the hypervisor which we know best: ESXi (5.0). CentOS 5.6 is not ideal for the Interlagos Opteron, but we designed the benchmark a few months ago. It took us weeks to get this benchmark working and repeatable (especially the latter is hard). For example it was not easy to get Zimbra fully configured and properly benchmarked due to the complex usage patterns and high I/O usage. Besides, the reality is that VMs often contain older operating systems. We hope to show some benchmarks based on Linux kernel version 3.0 or later in our next article.

We tested with five tiles (one tile = four VMs). Each tile needs seven vCPUs, so the test requires 35 vCPUs.

vApus FOS

The Opteron 6276 stays close to the more expensive Xeons. That makes the Opteron server the one with the best performance per dollar. Still, we feel a bit underwhelmed as the Opteron 6276 fails to outperform the previous Opteron by a tangible margin.

The benchmark above measures throughput. Response times are even more important. Let us take a look at the table below, which gives you the average response time per VM:

vApus FOS Average Response Times (ms), lower is better!
CPU PhpBB1 PHPBB2 MySQL OLAP Zimbra
AMD Opteron 6276 737 587 170 567
AMD Opteron 6174 707 574 118 630
Intel Xeon X5670 645 550 63 593
Intel Xeon X5650 678 566 102 655

The Xeon X5670 wins a landslide victory in MySQL. MySQL has always scaled better with clock speed than with cores, so we expect that clock speed played a major role here. The same is true for our first VM: this VM gets only one CPU and as result runs quicker on the Xeon. In the other applications, the Opteron's higher (integer) core count starts to show. However, AMD cannot really be satisfied with the fact that the old Opteron 6174 delivers much better MySQL performance. We suspect that the high latency L2 cache and higher branch misprediction penalty (20 vs 12) is to blame. MySQL performance is characterized by a relatively high amount of branches and a lot of accesses to the L2. The Bulldozer server does manage to get the best response time on our Zimbra VM, however, so it's not a complete loss.

Performance per watt remains the most important metric for a large part of the server market. So let us check out the power consumption that we measured while we ran vApus FOS.

vApus FOS Power Consumption

The power consumption numbers are surprising to say the least. The Opteron 6174 needs quite a bit less energy than the two other contenders. That is bad news for the newest Opteron. We found out later that some tinkering could improve the situation, as we will see further.

Benchmark Configuration Measuring Real-World Power Consumption, Part One
Comments Locked

106 Comments

View All Comments

  • Kevin G - Tuesday, November 15, 2011 - link

    I'm curious if CPU-Z polls the hardware for this information or if it queries a database to fetch this information. If it is getting the core and thread count from hardware, it maybe configurable. So while the chip itself does not use Hyperthreading, it maybe reporting to the OS that does it by default. This would have an impact in performance scaling as well as power consumption as load increases.
  • MrSpadge - Tuesday, November 15, 2011 - link

    They are integer cores, which share few ressources besides the FPU. On the Intel side there are two threads running concurrently (always, @Stuka87) which share a few less ressources.

    Arguing which one deserves the name "core" and which one doesn't is almost a moot point. However, both designs are nto that different regarding integer workloads. They're just using a different amount of shared ressources.

    People should also keep in mind that a core does not neccessaril equal a core. Each Bulldozer core (or half module) is actually weaker than in Athlon 64 designs. It got some improvements but lost in some other areas. On the other hand Intels current integer cores are quite strong and fat - and it's much easier to share ressources (between 2 hyperthreaded treads) if you've got a lot of them.

    MrS
  • leexgx - Wednesday, November 16, 2011 - link

    but on Intel side there are only 4 real cores with HT off or on (on an i7 920 seems to give an benefit, but on results for the second gen 2600k HT seems less important)

    where as on amd there are 4 cores with each core having 2 FP in them (desktop cpu) issue is the FPs are 10-30% slower then an Phenom cpu clocked at the same speed
  • anglesmith - Tuesday, November 15, 2011 - link

    which version of windows 2008 R2 SP1 x64 was used enterprise/datacenter/standard?
  • Lord 666 - Tuesday, November 15, 2011 - link

    People who are purchasing SB-E will be doing similar stuff on workstations. Where are those numbers?
  • Kevin G - Tuesday, November 15, 2011 - link

    Probably waiting in the pipeline for SB-E base Xeons. Socket LGA-2011 based Xeon's are still several months away.
  • Sabresiberian - Tuesday, November 15, 2011 - link

    I'm not so sure I'd fault AMD too much because 95% of the people that their product users, in this case, won't go through the effort of upgrading their software to get a significant performance increase, at least at first. Sometimes, you have to "force" people to get out of their rut and use something that's actually better for them.

    I freely admit that I don't know much about running business apps; I build gaming computers for personal use. I can't help but think of my Father though, complaining about Vista and Win 7 and how they won't run his old, freeware apps properly. Hey, Dad, get the people that wrote those apps to upgrade them, won't you? It's not Microsoft's fault that they won't bring them up to date.

    Backwards compatibility can be a stone around the neck of progress.

    I've tended to be disappointed in AMD's recent CPU releases as well, but maybe they really do have an eye focused on the future that will bring better things for us all. If that's the case, though, they need to prove it now, and stop releasing biased press reports that don't hold up when these things are benched outside of their labs.

    ;)
  • JohanAnandtech - Tuesday, November 15, 2011 - link

    The problem is that a lot of server folks buy new servers to run the current or older software faster. It is a matter of TCO: they have invested a lot of work into getting webapplication x.xx to work optimally with interface y.yy and database zz.z. The vendor wants to offer a service, not a the latest technology. Only if the service gets added value from the newest technology they might consider upgrading.

    And you should tell your dad to run his old software in virtual box :-).
  • Sabresiberian - Wednesday, November 16, 2011 - link

    Ah I hadn't thought of it in terms of services, which is obvious now that you say it. Thanks for educating me!

    ;)
  • IlllI - Tuesday, November 15, 2011 - link

    amd was shooting to capture 25% of the market? (this was like when the first amd64 chips came out)

Log in

Don't have an account? Sign up now