Virtualization Performance: ESXi 5.1 & vApus FOS Mark 2 (beta)

We introduced our new vApus FOS (For Open Source) benchmark in our review of the Facebook "Open Compute" servers. In a nutshell, it is a mix of four VMs with open source workloads: two PhpBB websites (Apache2, MySQL), one OLAP MySQL "Community server 5.1.37" database, and one VM with VMware's open source groupware Zimbra 7.1.0.

As we try to keep our benchmarks up to date, some changes have been made to the original vApus FOS Mark. We've added more realistic workloads and tuned them in accordance with optimizations performed by our industry partners.

With our latest and greatest version (a big thanks to Wannes De Smet), we're able to:

  • Simulate real-world loads
  • Measure throughput, response times, and energy usage at each concurrency level (see the sketch after this list)
  • Scale to servers with 80 logical cores and beyond
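
To make the "per concurrency" measurements more concrete, below is a minimal sketch of what such a load driver does, written in Python purely for illustration. The function names (run_action, read_power_meter) are hypothetical stand-ins; the real vApus stress client is of course a far more elaborate tool.

    import time
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical sketch: step through concurrency levels and record the
    # throughput, average response time, and a power reading at each step.

    def run_action():
        """Stand-in for one simulated user action (e.g. loading a forum page)."""
        start = time.time()
        time.sleep(0.05)                 # pretend the request took 50 ms
        return time.time() - start

    def read_power_meter():
        """Stand-in for a reading from an external power meter, in watts."""
        return 0.0

    def measure(concurrency, duration=30):
        latencies = []
        deadline = time.time() + duration
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            while time.time() < deadline:
                futures = [pool.submit(run_action) for _ in range(concurrency)]
                latencies += [f.result() for f in futures]
        throughput = len(latencies) / duration            # actions per second
        avg_response = sum(latencies) / len(latencies)    # seconds
        return throughput, avg_response, read_power_meter()

    for concurrency in (25, 50, 100, 200):
        tput, resp, watts = measure(concurrency)
        print(f"{concurrency:>4} users: {tput:7.1f} actions/s, "
              f"{resp * 1000:6.1f} ms avg response, {watts:5.1f} W")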

We have grouped our different workloads into what we call a 'tile'. A tile consists of four VMs, each running a different load:

  • A phpBB forum atop a LAMP stack. The load consists of navigating through the forum, creating new threads, and posting replies. There are also high-resolution pictures on the pages, which generate a realistic network load.
  • Zimbra, which is stressed by navigating the site, sending emails, creating appointments, adding and searching contacts, etc.
  • Our very own Drupal-based website. We create new posts, send contact emails, and generate views in this workload.
  • A MySQL database from a news aggregator, loaded with queries from the aggregator for an OLAP workload.

Each VM's hardware configuration is specced to fit each workload's needs. These are the detailed configurations:

Workload   CPUs   Memory (GB)   OS               Versions
phpBB      2      4             Ubuntu 12.10     Apache 2.2.22, MySQL server 5.5.27
Zimbra     4      4             Ubuntu 12.04.3   Zimbra 8
Drupal     4      10            Ubuntu 12.04.2   Drupal 7.21, Apache 2.2.22, MySQL server 5.5.31
MySQL      16     8             Ubuntu 12.04.2   MySQL server 5.5.31

Depending on the system hardware, we place a number of these tiles on the system under test to max it out, and then compare its performance to that of other servers. Developing a new virtualization benchmark takes a lot of time, but we wanted to give you our first results. Our benchmark is still in beta, so the results are not final yet. Therefore we tested only one system, the Intel system, with three different CPUs.
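
As a rough illustration of the tile-placement idea, the sketch below keeps adding tiles until an extra tile no longer improves aggregate throughput by a meaningful amount. Everything here is hypothetical (add_tile, total_throughput, and the 5% cut-off are placeholders), not how vApus actually decides on the tile count.

    # Hypothetical sketch: add four-VM tiles until the system under test stops
    # gaining throughput, then report how many tiles it took to saturate it.

    def add_tile(host):
        """Stand-in for deploying one tile (phpBB, Zimbra, Drupal, MySQL VMs)."""
        host["tiles"] += 1

    def total_throughput(host):
        """Stand-in for a measured aggregate score; here a diminishing-returns curve."""
        return 100.0 * (1 - 0.7 ** host["tiles"])

    def max_out(host, min_gain=0.05):
        best = 0.0
        while True:
            add_tile(host)
            current = total_throughput(host)
            if best > 0 and (current - best) / best < min_gain:
                return host["tiles"] - 1, best   # the last tile added little; stop
            best = max(best, current)

    host = {"tiles": 0}
    tiles, score = max_out(host)
    print(f"Saturated at {tiles} tiles with an aggregate score of {score:.1f}")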

vApusMark FOS 2013 - beta

Intel reports that the Xeon E5-2697 v2 is 30% faster than the Xeon E5-2690 on SPECvirt_sc2010. Our current benchmark is slightly less optimistic; however, it is pretty clear that the Ivy Bridge-based Xeons are tangibly faster.

We also measured the power needed to run the three tiles of vApusMark FOS 2013 beta. Running everything at full load is by no means a realistic scenario, but even then peak power remains an interesting metric, since all CPUs are tested in the same server.

vApusMark FOS 2013 - beta Power Consumption

According to our measurements, the Xeon E5-2697 v2 needs only 85% of the peak power of the Xeon E5-2690. That is a considerable power saving, considering that we also get 22% more throughput. Also note that the new virtualization improvements (vAPIC, VT-d large pages) are not yet supported by ESXi 5.1.
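
Putting those two figures together, 22% more throughput at 85% of the power works out to roughly 44% better performance per watt, as a quick back-of-the-envelope check shows.

    # Back-of-the-envelope performance-per-watt comparison, using the figures
    # quoted above (22% more throughput at 85% of the peak power).
    throughput_ratio = 1.22   # Xeon E5-2697 v2 vs. Xeon E5-2690
    power_ratio = 0.85        # E5-2697 v2 peak power as a fraction of the E5-2690's

    perf_per_watt_gain = throughput_ratio / power_ratio - 1
    print(f"Performance per watt improvement: {perf_per_watt_gain:.0%}")
    # prints: Performance per watt improvement: 44%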

Comments

  • mczak - Tuesday, September 17, 2013 - link

    Yes that's surprising indeed. I wonder how large the difference in die size is (though the reason for two dies might have more to do with power draw).
  • zepi - Tuesday, September 17, 2013 - link

    How about adding turbo frequencies to the SKU comparison tables? That would make comparing the SKUs a bit easier, as that is sometimes a more representative figure, depending on the load these babies run under.
  • JarredWalton - Tuesday, September 17, 2013 - link

    I added Turbo speeds to all SKUs as well as linking the product names to the various detail pages at AMD/Intel. Hope that helps! (And there were a few clock speed errors before that are now corrected.)
  • zepi - Wednesday, September 18, 2013 - link

    Appreciated!
  • zepi - Wednesday, September 18, 2013 - link

    For most server buyers things are not this simple, but for armchair sysadmins this might do:
    http://cornflake.softcon.fi/export/ivyexeon.png
  • ShieTar - Tuesday, September 17, 2013 - link

    "Once we run up to 48 threads, the new Xeon can outperform its predecessor by a wide margin of ~35%. It is interesting to compare this with the Core i7-4960x results , which is the same die as the "budget" Xeon E5s (15MB L3 cache dies). The six-core chip at 3.6GHz scores 12.08."

    What I find most interesting here is that the Xeon manages to show a factor 23 between multi-threaded and single-threaded performance, a very good scaling for a 24-thread CPU. The 4960X only manages a factor of 7 with its 12 threads. So it is not merely a question of "cores over clock speed", but rather hyperthreading seems to not work very well on the consumer CPUs in the case of Cinebench. The same seems to be true for the Sandy Bridge and Haswell models as well.

    Do you know why this is? Is hyperthreading implemented differently for the Xeons? Or is it caused by the different OS used (Windows 2008 vs Windows 7/8)?
  • JlHADJOE - Tuesday, September 17, 2013 - link

    ^ That's very interesting. Made me look over the Xeon results and yes, they do appear to be getting close to a 100% increase in performance for each thread added.
  • psyq321 - Tuesday, September 17, 2013 - link

    Hyperthreading is the same.

    However, the HCC version of IvyTown has two separate memory controllers and more features enabled (Direct Cache Access, different prefetchers, etc.), so it might scale better.

    I am achieving a 1.41x speed-up with a dual Xeon E5-2697 v2 setup compared to my old dual Xeon E5-2687W setup. That is so close to the "ideal" 1.5x scaling that it is pretty amazing. And the 2687W was running at a slightly higher clock in all-core turbo.

    So, I must say I am very happy with the IvyTown upgrade.
  • garadante - Tuesday, September 17, 2013 - link

    It's not 24 threads, it's 48 threads for that scaling. 2x physical CPUs with 12 cores each, for 24 physical cores and a total of 48 logical cores.
  • Kevin G - Tuesday, September 17, 2013 - link

    Actually if you run the numbers, the scaling factor from 1 to 48 threads is actually 21.9. I'm curious what the result would have been with Hyperthreading disabled as that can actually decrease performance in some instances.
