The Virtualization Benchmarking Chaos

There are an incredible number of pitfalls in the world of server application benchmarking, and virtualization just makes the whole situation much worse. In this report, we want to measure how well the CPUs are coping with virtualization. That means we need to choose our applications carefully. If we use a benchmark that spends very little time in the hypervisor, we are mostly testing the integer processing power and not how the CPU copes with virtualization overhead. As we have pointed out before, a benchmark like SPECjbb does not tell you much, as it spends less than one percent of its time in the hypervisor.

How is virtualization different? CPU A that beats CPU B in native situations can still be beaten by CPU B in virtualized scenarios. This can happen for various reasons, for example when CPU A…

  1. Takes much more time for switching from the VM to hypervisor and vice versa.
  2. Does not support hardware assisted paging: memory management will cause a lot more hypervisor interventions.
  3. Has smaller TLBs; Hardware Assisted Paging (EPT, NPT/RVI) places much more pressure on the TLBs.
  4. Has less bandwidth; an application that needs only 20% of the maximum bandwidth will be bottlenecked if you run six VMs of the same application.
  5. Has smaller caches; the more VMs, the more pressure there will be on the caches.
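The bandwidth example in point 4 is simple arithmetic, and a minimal sketch makes it concrete. The function name `bandwidth_share` is ours, and the model assumes identical VMs that share the available bandwidth equally once it is saturated; real memory controllers and hypervisors will behave less neatly.

```python
def bandwidth_share(n_vms, demand_fraction):
    """Return (total_demand, served) for n_vms identical VMs.

    demand_fraction: what one VM needs, as a fraction of the maximum
    bandwidth of the platform (0.20 means 20%).
    served: the fraction of its need each VM actually gets, assuming
    equal sharing under contention (an assumption for illustration).
    """
    total_demand = n_vms * demand_fraction
    if total_demand <= 1.0:
        # Enough bandwidth for everybody: no VM is held back.
        return total_demand, 1.0
    # Demand exceeds supply: every VM gets the same reduced share.
    return total_demand, 1.0 / total_demand


if __name__ == "__main__":
    for n in (1, 4, 5, 6):
        total, served = bandwidth_share(n, 0.20)
        print(f"{n} VMs: demand is {total:.0%} of peak, "
              f"each VM gets {served:.0%} of what it needs")
```

With six VMs at 20% each, total demand exceeds what the platform can deliver, so every VM is held back even though no single VM comes close to the limit on its own.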

To fully understand this, it helps to read our article "Hardware Virtualization: the nuts and bolts". Some applications run with negligible performance impact inside a virtual machine, while others are tangibly slower in a virtualized environment. To get a rough idea of which group your application belongs to, a relatively easy rule of thumb can be used: how much time does your application spend in user mode, and how much time does it spend in the kernel? The kernel performs three tasks for user applications:

  • System calls (File system, process creation, etc.)
  • Interrupts (Accessing the disks, NICs, etc.)
  • Memory management (e.g. allocating memory for buffers)

The more work your kernel has to perform for your application, the higher the chance that the hypervisor will need to work hard as well. If your application writes a small log after spending hours crunching numbers, it should be clear it's a typical (almost) "user mode only" application. The prime example of a "kernel intensive" application is an intensively used transactional database server that gets lots of requests from the network (interrupts, system calls), has to access the disks often (interrupts, system calls), and has buffers that grow over time (memory management).
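The user mode versus kernel rule of thumb can be checked with a few lines of code. The following is a minimal sketch, not a profiling tool: it uses Python's standard `os.times()` to read the CPU time a process has spent in user mode and in the kernel, and the two sample workloads (`crunch_numbers` and `many_small_writes`) are hypothetical stand-ins for a number cruncher and a system call heavy application.

```python
import os


def user_kernel_split(workload):
    """Run workload() and return (user_seconds, system_seconds, kernel_share).

    kernel_share is the fraction of consumed CPU time spent in the kernel.
    """
    before = os.times()
    workload()
    after = os.times()
    user = after.user - before.user
    system = after.system - before.system
    total = user + system
    share = system / total if total > 0 else 0.0
    return user, system, share


def crunch_numbers():
    # Almost pure user mode: arithmetic only, no I/O.
    sum(i * i for i in range(2_000_000))


def many_small_writes():
    # Leans on the kernel: one write() system call per loop iteration.
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        for _ in range(200_000):
            os.write(fd, b"x")
    finally:
        os.close(fd)


if __name__ == "__main__":
    for job in (crunch_numbers, many_small_writes):
        user, system, share = user_kernel_split(job)
        print(f"{job.__name__}: user {user:.2f}s, kernel {system:.2f}s, "
              f"kernel share {share:.0%}")
```

A high kernel share suggests the application belongs to the group that is likely to slow down when virtualized. Note that this only counts CPU time of the process itself; time spent waiting on disks or the network does not show up here.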

However, a "user mode only" application can still lose a lot of performance in a virtualized environment in some situations:

  • Oversubscribing: you assign more virtual CPUs to the virtual machines than there are physical cores available. (This is a very normal and common way to get more out of your virtualized server.)
  • Cache Contention: your application demands a lot of cache and the other virtualized applications do as well.
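Oversubscription is easy to quantify. As a rough sketch (the function name and the VM layout below are made up for illustration), the ratio of assigned virtual CPUs to physical cores tells you how hard the cores are being shared:

```python
def oversubscription_ratio(vcpus_per_vm, physical_cores):
    """Return total assigned vCPUs divided by physical cores.

    A result above 1.0 means the host is oversubscribed.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


if __name__ == "__main__":
    # Hypothetical host: 16 physical cores, eight VMs with 4 vCPUs each.
    ratio = oversubscription_ratio([4] * 8, 16)
    print(f"vCPU to core ratio: {ratio:.2f}")
```

A ratio above 1.0 is not a problem in itself; it only hurts when the VMs actually want their CPUs at the same time.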

These kinds of performance losses are relatively easy to minimize. You could buy CPUs with larger caches and dedicate some of the physical cores to certain cache- or CPU-hungry applications by setting affinity. The other, less intensive applications would share the remaining CPU cores. In this article, we will focus on the more sensitive workloads that do quite a bit of I/O (and thus interrupts), need large memory buffers, and therefore talk to the kernel a lot. This way we can really test the virtualization capabilities of the servers.


66 Comments


  • JohanAnandtech - Friday, May 22, 2009 - link

    Most of the time, the number of sessions on TS is limited by the amount of memory. Can you give some insight into what you are running inside a session? If it is light on CPU or I/O resources, sizing will be based on the amount of memory per session only.
  • dragunover - Thursday, May 21, 2009 - link

    would be interesting if this was done on desktop CPUs with price / performance ratios
  • jmke - Thursday, May 21, 2009 - link

    nope, that would not be interesting at all. You don't want desktop motherboards, RAM or CPUs in your server room;
    nor do you run ESX at home. So there's no point to test performance of desktop CPUs.
  • simtex - Thursday, May 21, 2009 - link

    Why so harsh? Virtualization will eventually become a part of desktop users' everyday life.

    Imagine tabbing between different virtualized environments, like you do with tabs in your browser. You might have a secure one for your web applications and a fast one for your games, another for streaming music and maybe capturing television. All on one computer, which you seldom have to reboot because everything runs virtualized.
  • Azsen - Monday, May 25, 2009 - link

    Why would you run all those applications on your desktop in VMs? Surely they would just be separate application processes running under the one OS.
  • flipmode - Thursday, May 21, 2009 - link

    Speaking from the perspective of how the article can be the most valuable, it is definitely better off to stick to true server hardware for the time being.

    For desktop users, it is a curiosity that "may eventually" impart some useful data. The tests are immediately valuable for servers and for current server hardware. They are merely of academic curiosity for desktop users on hardware that will be outdated by the time virtualization truly becomes a mainstream scenario on the desktop.

    And I do not think he was being harsh, I think he was just being as brief as possible.
