Virtualization Performance: ESXi 5.1 & vApus FOS Mark 2 (beta)

We introduced our new vApus FOS (For Open Source) benchmark in our review of the Facebook "Open Compute" servers. In a nutshell, it is a mix of four VMs with open source workloads: two PhpBB websites (Apache2, MySQL), one OLAP database (MySQL Community Server 5.1.37), and one VM with VMware's open source groupware Zimbra 7.1.0.

As we try to keep our benchmarks up to date, some changes have been made to the original vApus FOS Mark. We've added more realistic workloads and tuned them in accordance with optimizations performed by our industry partners.

With our latest and greatest version (a big thanks to Wannes De Smet), we're able to:

  • Simulate real-world loads
  • Measure throughput, response times, and energy usage at each concurrency level (sketched below)
  • Scale to 80 (logical) core servers and beyond
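
To give an idea of what such a concurrency sweep looks like, here is a minimal Python sketch of the measurement loop. This is purely our own illustration, not vApus code: the dummy do_request() stands in for a real recorded user action, and energy is logged by an external power meter rather than in software.

    import time
    import statistics
    from concurrent.futures import ThreadPoolExecutor

    def do_request():
        # Stand-in for one user action (e.g. loading a forum page);
        # a real test would issue an HTTP request against the VM.
        time.sleep(0.01)

    def run_at_concurrency(users, duration_s=5.0):
        # Drive the workload with a fixed number of concurrent users and
        # report throughput (requests/s) plus the 95th-percentile response time.
        latencies = []
        deadline = time.monotonic() + duration_s

        def user_loop():
            while time.monotonic() < deadline:
                start = time.monotonic()
                do_request()
                latencies.append(time.monotonic() - start)

        with ThreadPoolExecutor(max_workers=users) as pool:
            for _ in range(users):
                pool.submit(user_loop)

        throughput = len(latencies) / duration_s
        p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
        return throughput, p95

    if __name__ == "__main__":
        for users in (25, 50, 100, 200):
            tput, p95 = run_at_concurrency(users)
            print(f"{users:>4} users: {tput:7.1f} req/s, p95 {p95 * 1000:6.1f} ms")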

We have grouped our different workloads into what we call a 'tile'. A tile consists of four VMs, each running a different load (see the sketch after this list):

  • A phpBB forum atop a LAMP stack. The load consists of navigating through the forum, creating new threads, and posting replies. The pages also contain large high-resolution pictures, generating a realistic network load.
  • Zimbra, which is stressed by navigating the site, sending emails, creating appointments, adding and searching contacts, etc.
  • Our very own Drupal-based website. We create new posts, send contact emails, and generate views in this workload.
  • A MySQL database from a news aggregator, loaded with queries from the aggregator for an OLAP workload.
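
As a rough sketch of how such a mixed load can be described, the snippet below models each workload as a weighted mix of user actions. The action names and weights are invented for this illustration; the real vApus scenarios are recorded from live user sessions.

    import random

    # Illustrative action mix per workload; names and weights are our own.
    TILE_ACTIONS = {
        "phpbb":  [("browse_forum", 0.6), ("create_thread", 0.1), ("post_reply", 0.3)],
        "zimbra": [("read_mail", 0.4), ("send_mail", 0.3),
                   ("create_appointment", 0.15), ("search_contacts", 0.15)],
        "drupal": [("view_page", 0.7), ("create_post", 0.2), ("contact_email", 0.1)],
        "mysql":  [("olap_query", 1.0)],
    }

    def pick_action(workload):
        # Choose the next simulated user action according to its weight.
        actions, weights = zip(*TILE_ACTIONS[workload])
        return random.choices(actions, weights=weights, k=1)[0]

    print(pick_action("phpbb"))  # e.g. 'browse_forum'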

Each VM's hardware configuration is specced to fit its workload's needs. These are the detailed configurations:

Workload   CPUs   Memory (GB)   OS               Versions
phpBB      2      4             Ubuntu 12.10     Apache 2.2.22, MySQL server 5.5.27
Zimbra     4      4             Ubuntu 12.04.3   Zimbra 8
Drupal     4      10            Ubuntu 12.04.2   Drupal 7.21, Apache 2.2.22, MySQL server 5.5.31
MySQL      16     8             Ubuntu 12.04.2   MySQL server 5.5.31
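
In code form, the same sizing can be captured in a small data structure; this is purely our own representation of the table above. Summing the columns shows what one tile costs the host: 26 vCPUs and 26 GB of RAM.

    # VM sizing per tile, mirroring the table above (our own representation).
    TILE_VMS = {
        "phpbb":  {"vcpus": 2,  "memory_gb": 4},
        "zimbra": {"vcpus": 4,  "memory_gb": 4},
        "drupal": {"vcpus": 4,  "memory_gb": 10},
        "mysql":  {"vcpus": 16, "memory_gb": 8},
    }

    vcpus_per_tile = sum(vm["vcpus"] for vm in TILE_VMS.values())
    ram_per_tile = sum(vm["memory_gb"] for vm in TILE_VMS.values())
    print(vcpus_per_tile, ram_per_tile)  # -> 26 26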

Depending on the system hardware, we place a number of these tiles on the system under test to max it out, and then compare its performance to that of other servers. Developing a new virtualization benchmark takes a lot of time, but we wanted to give you our first results. The benchmark is still in beta, so results are not final yet; therefore, we tested only one system, the Intel server, with three different CPUs.
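
For example (our own back-of-the-envelope math, assuming the dual Xeon E5-2697 v2 host from this review with 24 cores and 48 logical threads): three tiles amount to 78 vCPUs and 78 GB of RAM, or roughly a 1.6x vCPU overcommit.

    # Tiles-to-host arithmetic for a three-tile run (host configuration assumed).
    host_logical_cores = 48    # dual E5-2697 v2: 2 x 12 cores, Hyper-Threading on
    tiles = 3
    total_vcpus = tiles * 26   # 26 vCPUs per tile (see table above)
    total_ram_gb = tiles * 26  # 26 GB per tile
    print(total_vcpus, total_ram_gb)                   # -> 78 78
    print(round(total_vcpus / host_logical_cores, 2))  # -> 1.62 vCPU overcommit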

vApusMark FOS 2013 - beta

Intel reports that the Xeon E5-2697 v2 is 30% faster than the Xeon E5-2690 in SPECvirt_sc2010. Our current benchmark is slightly less optimistic, but it is pretty clear that the Ivy Bridge based Xeons are tangibly faster.

We also measured the power needed to run the three tiles of vApusMark FOS 2013 beta. Running every tile at full load is by no means a realistic scenario, but even then, peak power remains an interesting metric, since all CPUs are tested in the same server.

vApusMark FOS 2013 - beta Power Consumption

According to our measurements, the Xeon E5-2697 v2 needs only 85% of the peak power of the Xeon E5-2690. That is a considerable power saving, considering that we also get 22% more throughput. Note too that ESXi 5.1 does not yet implement the new virtualization improvements (vAPIC, VT-d large pages).
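
Those two figures combine into a tidy performance-per-watt gain. A quick check, using only the 22% throughput and 85% power numbers quoted above:

    # Performance per watt implied by the measurements above.
    throughput_ratio = 1.22  # E5-2697 v2 throughput vs. E5-2690
    power_ratio = 0.85       # E5-2697 v2 peak power as a fraction of the E5-2690's
    gain = throughput_ratio / power_ratio
    print(f"{gain:.2f}x")    # -> 1.44x, i.e. ~44% better performance per watt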

Comments

  • ShieTar - Tuesday, September 17, 2013 - link

    Oops, you are perfectly right of course. In that case the 4960X actually gets slightly better efficiency (12.08 is 0.28 per thread and GHz) than the dual 2697s (33.56 is 0.26 per thread and GHz), which makes perfect sense.
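
    For reference, the arithmetic checks out against the two parts' thread counts and base clocks (12 threads at 3.6 GHz for the 4960X, 48 threads at 2.7 GHz for the dual 2697 v2):

        # Checking the efficiency figures: score / (threads x base GHz).
        print(round(12.08 / (12 * 3.6), 2))  # Core i7-4960X           -> 0.28
        print(round(33.56 / (48 * 2.7), 2))  # 2x Xeon E5-2697 v2      -> 0.26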

    It also indicates the 4960X gets about 70% of the performance of a single 2697 at 38% of the cost. Then again, a 1270v3 gets you 50% of the performance at 10% of the price. So when talking farms (i.e. more than one system cooperating), four single-socket boards with 1270v3 will get you almost the power of a dual-socket board with 2697v2 (minus communication overhead), will likely have a similar power demand (plus communication overhead), and save you $4400 in the process. Since you use 32 instead of 48 threads, but 4 installations instead of 1, software licensing cost may vary strongly in either direction.

    Would be interesting to see this tested. Anybody willing to send AT four single-socket workstations?
  • hpvd - Tuesday, September 17, 2013 - link

    Yes, this would be really interesting. But you should use an InfiniBand interconnect for good scaling. And that could only be done without an expensive IB switch with three machines...
  • DanNeely - Tuesday, September 17, 2013 - link

    Won't the much higher price of a 4 socket board kill any CPU cost savings?

    In any event, the 1270v3 is a unisocket chip so you'd need to do 4 boxes to cluster.

    Poking around on Intel's site, it looks like all 1xxx Xeons are uniprocessor, 2xxx is dual socket, 4xxx quad, and 8xxx octo-socket. But the 4xxx series is still on 2012 models and the 8xxx on 2011 releases. The 4-way chips could just be a bit behind the 2-way ones being reviewed now; but with the 8-way ones not updated in two years, I'm wondering if they're being stealth discontinued due to the minimal number of cases where two smaller servers aren't a better buy.
  • hpvd - Tuesday, September 17, 2013 - link

    I think we are talking about four systems, each with one CPU, one mainboard, RAM, etc., plus a network interface card.
  • hpvd - Tuesday, September 17, 2013 - link

    Another advantage would be that these CPUs use the latest Haswell architecture: some workloads would greatly benefit from its AVX2 ...
  • Kevin G - Tuesday, September 17, 2013 - link

    I'd fathom the bigger benefit of Haswell is found in the TSX and L4 cache for server workloads. The benefits of AVX2 would be exploited in more HPC centric workloads. Now if Intel would just release a socketed 1200v3 series CPU with L4 cache.
  • MrSpadge - Tuesday, September 17, 2013 - link

    > Now if Intel would just release a socketed 1200v3 series CPU with L4 cache.

    Agreed! And someone would test it at server loads. And BOINC. And if only Intel would release an overclockable Haswell with L4 which we can actually buy!
  • ShieTar - Tuesday, September 17, 2013 - link

    A 4 socket board is expensive, but that's not the point I was making. A Xeon E5-4xxx is not likely to be less expensive than the E5-2xxx part anyway.

    The question was specifically how four single-socket boards (with 4 cores each, at 3.5 GHz, and Haswell technology) would position themselves against a dual-socket board with 24 cores at 2.7 GHz and Ivy Bridge EP tech. Admittedly, the 3 extra boards will add a bit of cost (~$500), and extra memory & communications cards, etc. can also add something depending on the usage scenario. Then again, a single 4-core might get the work done with less than half the memory of a 12-core, so you might save a little there as well.
  • psyq321 - Tuesday, September 17, 2013 - link

    E5-46xx v2 is coming in a few months; qualification samples are already available and for all intents and purposes it is ready - Intel just needs to ramp up production.

    E7-88xx v2 is coming in Q1 2014. It is definitely not discontinued, and the platform (Brickland) will be compatible with both Ivy Bridge EX (E7-88xx v2, among others) and Haswell EX (E7-88xx v3, among others) CPUs, and will also be able to take DDR4 RAM. It will require a different LGA 2011 socket, though.

    The EX platform will come with up to 15 cores in the Ivy Bridge EX generation.
  • Kevin G - Tuesday, September 17, 2013 - link

    The E5-46xx is simply a rebranded E5-26xx with official support for quad socket. The dies are going to be the same between both families. Intel is just doing extra validation for the quad-socket market, as that market tends to favor more reliability features as socket count goes up.

    While not socket compatible, Brickland as a platform is expected to be used for the next (last?) Itanium chips.
