The Quest for an Independent Real-World Virtualization Benchmark

As we explained in our Xeon Nehalem review, comprehensive real-world server benchmarks that cover the market are beyond what one man can perform. Virtualization benchmarking needs much more manpower, and it is always good to understand the motivation of the group doing the testing. Large OEMs want to show off their latest server platforms, and virtualization software vendors want to show how efficient their hypervisor is. So why did we undertake this odyssey?

This virtualization benchmark was developed by an academic research group called the Sizing Server Lab. (I am also part of this research group.) Part of the lab's work is academic; the other part is research that is immediately applied in the field, in this case for software developers. The main motivation and objective of the applied research is to tell developers how their own software behaves and performs in a virtualized environment. Therefore, the focus of our efforts was to develop a very flexible stress test that shows how any real-world application behaves in a virtualized environment. A side effect of all this work is that we ended up with a virtualization server benchmark, which we think will be very interesting for the readers of AnandTech.

Although the benchmark is the result of research by an academic lab, the most important objectives in designing our own virtualization benchmark were that it be:

  • Repeatable
  • Relevant
  • Comparable
  • Heavy

Repeatable is the hardest one. Server benchmarks tend to run into all kinds of non-hardware limits such as too few connections, locking contention, and driver latency. The result is a benchmark that rarely runs at 100% CPU utilization and whose CPU load differs from one CPU to another. Under a native OS this is still workable: you can get a decent idea of how two CPUs compare even if one runs at 78% and the other at 83% load. In a virtualized setup, however, this becomes a complete mess, especially when you have more virtual than physical CPUs: when comparing two servers, some VMs will report significantly lower CPU load and others significantly higher. And since each VM reports a different metric (for example queries per second, transactions per second, or URLs per second), average CPU load does not tell you the whole story either. To remedy this, we carefully selected our applications and kept only those benchmarks that allowed us to push the system close to 95-99% load. Note that this was only possible after a lot of tuning.
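
To make the point about mixed metrics concrete, here is a minimal sketch (in Python, with made-up throughput numbers, not our actual scoring code) of one way to combine per-VM results reported in different units: express each VM's throughput relative to the same VM on a reference system and take the geometric mean of those ratios.

```python
from math import prod

# Hypothetical per-VM throughput, each in its own unit:
# web VM in URLs/s, OLAP VM in queries/s, OLTP VM in transactions/s.
reference = {"web": 180.0, "olap": 35.0, "oltp": 420.0}   # reference server
candidate = {"web": 230.0, "olap": 31.0, "oltp": 510.0}   # server under test

def relative_scores(measured, baseline):
    """Express each VM's throughput relative to the same VM on the reference system."""
    return {vm: measured[vm] / baseline[vm] for vm in baseline}

def overall_score(ratios):
    """Geometric mean of the per-VM ratios, so no single VM's unit or magnitude dominates."""
    return prod(ratios.values()) ** (1.0 / len(ratios))

ratios = relative_scores(candidate, reference)
print(ratios)                           # roughly 1.28 (web), 0.89 (olap), 1.21 (oltp)
print(round(overall_score(ratios), 3))  # one comparable score for the whole machine
```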

Comparable: our virtualization benchmark can run on Xen, Hyper-V and ESX.

Heavy: While VMmark and others go for the scenario of running many very light virtual machines with extremely small workloads, we opt for a scenario with four or eight VMs. The objective is to find out how the CPUs handle "hard to consolidate" applications such as complex dynamic websites, OnLine Transaction Processing (OLTP), and OnLine Analytical Processing (OLAP) databases.

Most importantly: Relevant. We have been working towards benchmarks that use applications people run every day. In this article we had to make one compromise: as we are comparing the virtualization capabilities of different CPUs, we had to push CPU utilization close to 100%. Few virtualized servers will run close to 100% all the time, but it allows us to be sure that the CPU is the bottleneck. We use real-world applications instead of synthetic benchmarks, but the other side of the coin is that this virtualization benchmark is not easily reproducible by third parties. We cannot release the benchmark to third parties, as some of the software used is the intellectual property of other companies. However, we are prepared to fully disclose the details of how we perform the benchmarks to every interested and involved company.

Comments

  • Bandoleer - Thursday, May 21, 2009 - link

    I have been running VMware Virtual Infrastructure for 2 years now. While this article can be useful for someone looking for hardware upgrades or scaling of a virtual system, CPU and memory are hardly the bottlenecks in the real world. I'm sure there are some organizations that want to run 100+ VMs on "one" physical machine with 2 physical processors, but what are they really running?

    The fact is, if you want VM flexibility, you need central storage of all your VMDKs, accessible by all hosts. That is where you find your bottlenecks: in the storage arena. FC or iSCSI, where are those benchmarks? Where's the TOE vs. QLogic HBA comparison? Consider that 2 years ago there was no QLogic HBA for blade servers, nor does VMware support TOE.

    However, it does appear I'll be able to do my own baselining/benchmarking once vSphere (i.e. VI4) materializes, to see if it's even worth sticking with VMware or making the move to Hyper-V, which already supports jumbo frames and TOE iSCSI with 600% increased iSCSI performance on the exact same hardware.
    But it would really be nice to see central storage benchmarks, considering that is the single most expensive investment of a virtual system.

  • duploxxx - Friday, May 22, 2009 - link

    Perhaps before you even consider moving from VMware to Hyper-V, first check how much functionality you will actually lose in exchange for some small gains in Hyper-V.

    ESX 3.5 does support jumbo frames and iSCSI offload adapters. And how are you going to gain 600% when iSCSI is only about 15% slower than FC, given a decent network and a dedicated iSCSI box?
  • Bandoleer - Friday, May 22, 2009 - link

    "perhaps before you would even consider to move from Vmware to HyperV check first in reality what huge functionality you will loose in stead of some small gains in HyperV. "

    What you are calling functionality here are the same features that will not work in ESX 4.0 in order to gain direct hardware access for performance.
  • Bandoleer - Friday, May 22, 2009 - link

    The reality is I lost around 500MB/s of storage throughput when I moved from direct attached storage. Not because of our new central storage, but because of the limitations of the driver-less Linux iSCSI capability, or the lack thereof. Yes!! In ESX 3.5, VMware added jumbo frame support as well as flow control support for iSCSI! It was GREAT, except for the part where you can't run jumbo frames + flow control; you have to pick one, flow control or jumbo frames.

    I said 2 years ago there was no such thing as iSCSI HBAs for blade servers, and that ESX does not support the TOE feature of multifunction adapters (because that "functionality" requires a driver).

    Functionality you lose by moving to Hyper-V? In my case, I call them useless features, which come second to performance and functionality.



  • JohanAnandtech - Friday, May 22, 2009 - link

    I fully agree that in many cases the bottleneck is your shared storage. However, the article's title indicated "Server CPU", so it was clear from the start that this article would discuss CPU performance.

    "move to HyperV which already supports Jumbo, TOE iSCSI with 600% increased iSCSI performance on the exact same hardware. "

    Can you back that up with a link to somewhere? Because the 600% sounds like an MS advertisement :-).

  • Bandoleer - Friday, May 22, 2009 - link

    My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish.

    I wasn't ranting at the article; it's great for what it is, which is what the title represents. I was responding to this part of the article, and it accidentally came out as a rant because I'm so passionate about virtualization.

    "What about ESX 4.0? What about the hypervisors of Xen/Citrix and Microsoft? What will happen once we test with 8 or 12 VMs? The tests are running while I am writing this. We'll be back with more. Until then, we look forward to reading your constructive criticism and feedback.

    Sorry, I meant to be more constructive haha...



  • JohanAnandtech - Sunday, May 24, 2009 - link

    "My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish. "

    Yes, please do. I'm very interested in reading what you found.

    "I wasn't ranting at the article, its great for what it is, which is what the title represents. "

    Thx, no problem... Just understand that these things take time and the cooperation of the large vendors. And getting the right $5000 storage hardware in the lab is much harder than getting a $250 video card. About 20 times harder :-).


  • Bandoleer - Sunday, May 24, 2009 - link

    I haven't looked recently, but high-performance tiered storage was anywhere from $40k-$80k each, just for the iSCSI versions; the FC versions are clearly absurd.

  • solori - Monday, May 25, 2009 - link

    Look at ZFS-based storage solutions. ZFS enables hybrid storage pools and an elegant use of SSDs with commodity hardware. You can get it from Sun or Nexenta, or by rolling your own with OpenSolaris:

    http://solori.wordpress.com/2009/05/06/add-ssd-to-...
  • pmonti80 - Friday, May 22, 2009 - link

    Still, it would be interesting to see those central storage benchmarks, or at least to know whether or not you will be doing them, for whatever reason.
