Conclusions so Far

Both VMmark and vApus Mark I seem to give results that are almost black and white: two opposite but equally interesting data points. When you are consolidating extremely high numbers of VMs on one physical server, the Xeon Nehalem annihilates, crushes, and walks all over every other CPU, including its own older Xeon brothers… if it is running VMware ESX 4.0 (vSphere). A quick look at the VMmark results posted so far suggests you should just rip your old Xeon and Opteron servers out of the rack and start again with the brand-spanking-new Nehalem Xeon. I am exaggerating, of course, but the contrast with our own virtualization benchmarking was quite astonishing.

vApus Mark I gives the opposite view: the Xeon Nehalem is without a doubt the fastest platform, but the latest quad-core Opteron is not far behind. If your applications are somewhat similar to the ones we used in vApus Mark I, pricing and power consumption may bring the Opteron Shanghai and even the Xeon 54xx back into the picture. However, we are well aware that the current vApus Mark I has its limitations. We tested on ESX 3.5 Update 4, which is in fact the only available hypervisor from VMware right now. For future decisions, we admit that testing on ESX 4.0 will be a lot more relevant, but that does not make the numbers above meaningless. Moving to a new virtualization platform is not something even experienced IT professionals do quickly: many scripts might not work properly anymore, the default virtual hardware is not compatible between hypervisor versions, and so on. For example, ESX 3.5 servers won't recognize the version 7 virtual hardware of ESX 4.0 VMs. In a nutshell: if ESX 3.5 is your most important hypervisor platform, the Xeon 55xx, the Xeon 54xx, and the quad-core Opteron are all very viable platforms.
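The virtual hardware version recorded in each VM's .vmx configuration is what ties the VM to a minimum hypervisor release. As a rough sketch of the entry involved (virtualHW.version is the standard .vmx key; the comments are ours and purely illustrative):

    # Relevant .vmx entry for a VM created on ESX 3.5 (virtual hardware version 4):
    virtualHW.version = "4"

    # The same entry for a VM created on ESX 4.0 (virtual hardware version 7);
    # an ESX 3.5 host cannot power such a VM on, so it would have to be
    # recreated or downgraded before moving back.
    virtualHW.version = "7"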

It is also interesting to see the enormous advances CPUs have made in the virtualization area:

  • The latest Xeon 55xx of early 2009 is about 4.2 times faster than the best 3.7GHz dual-core Xeons of early 2006.
  • The latest Opterons are 2.5 times faster than the slightly higher-clocked 3.0GHz dual-core Opterons of mid 2007, and based on this we calculate that they are about 3 times faster than their three-year-older brothers.

Moving from 3-4 year old dual-core servers to the newest quad-core Opterons/Xeons will improve your total server performance by roughly a factor of three to four.
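To make the calculation behind that last Opteron estimate explicit, here is a small sketch using only the figures quoted above; the roughly 1.2x step between the 2006 and mid-2007 dual-core Opterons is implied by those figures rather than something we measured separately:

    # Python sketch: chaining the ratios quoted above (not new measurements)
    shanghai_vs_mid2007 = 2.5            # newest quad-core Opteron vs. 3.0GHz dual-core (mid 2007)
    implied_mid2007_vs_2006 = 3.0 / 2.5  # ~1.2x, implied by the "about 3 times" estimate
    print(shanghai_vs_mid2007 * implied_mid2007_vs_2006)  # ~3.0x vs. the three-year-older Opterons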


What about ESX 4.0? What about the hypervisors of Xen/Citrix and Microsoft? What will happen once we test with 8 or 12 VMs? The tests are running while I am writing this. We'll be back with more. Until then, we look forward to reading your constructive criticism and feedback.

I would like to thank Tijl Deneut for assisting me with the insane amount of testing and retesting; Dieter Vandroemme for the excellent programming work on vApus; and of course Liz Van Dijk and Jarred Walton for helping me with the final editing of this article.

Comments

  • GotDiesel - Thursday, May 21, 2009 - link

    "Yes, this article is long overdue, but the Sizing Server Lab proudly presents the AnandTech readers with our newest virtualization benchmark, vApus Mark I, which uses real-world applications in a Windows Server Consolidation scenario."

    spoken with a mouth full of microsoft cock

    Where are the Linux reviews?

    Not all of us VM with Windows, you know.

  • JohanAnandtech - Thursday, May 21, 2009 - link

    A minimum form of politeness would be appreciated, but I am going to assume you were just disappointed.

    The problem is that right now the calling circle benchmark runs half as fast on Linux as it does on Windows. What is causing Oracle to run slower on Linux than on Windows is a mystery even to some of the experienced DBAs we have spoken with. We either have to replace that benchmark with an alternative (probably Sysbench) or find out what exactly happened.

    When you construct a virtualization benchmark, it is not enough to just throw in a few benchmarks and VMs; you really have to understand each benchmark thoroughly. There are enough half-baked benchmarks on the internet already that look like Swiss cheese because of all the holes in the methodology.
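
    Just to make that concrete, here is a rough sketch of the kind of Sysbench OLTP run that could replace the calling circle test (the table size, thread count, and MySQL credentials below are placeholders, not a final test configuration):

        # Prepare a test table, then run a read/write OLTP workload (sysbench 0.4 syntax)
        sysbench --test=oltp --db-driver=mysql --mysql-user=vapus --mysql-password=secret \
                 --oltp-table-size=1000000 prepare
        sysbench --test=oltp --db-driver=mysql --mysql-user=vapus --mysql-password=secret \
                 --oltp-table-size=1000000 --num-threads=16 --max-requests=100000 run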
  • JarredWalton - Thursday, May 21, 2009 - link

    Page 4: vApus Mark I: the choices we made

    "vApus mark I uses only Windows Guest OS VMs, but we are also preparing a mixed Linux and Windows scenario."

    Building tests, verifying them, and running them on all the servers takes a lot of time. That's why the 2-tile and 3-tile results are not yet ready. I suppose Linux will have to wait for Mark II (or Mark I.1).
  • mino - Thursday, May 21, 2009 - link

    What you have done so far is great. No more words needed.

    What I would like to see is a vApus Mark I "small," where you make the tiles smaller, about 1/3 to 1/4 of your current tiles.
    The tile structure should remain similar for simplicity; the tiles would just be smaller.

    Once you have 2 different tile sizes, you would be able to consider 1 big + 1 small tile as one "condensed" tile for the general score.

    Having 2 reference points will allow for evaluating "VM size scaling" situations.
  • JohanAnandtech - Sunday, May 24, 2009 - link

    Can you elaborate a bit? What do you mean by "1/3 of my current tile"? A tile = 4 VMs. Are you talking about a smaller memory footprint or the number of vCPUs?

    Are you saying we should test with a tile of small VMs and then afterwards with the large ones? How do you see such a "VM scaling" evaluation?
  • mino - Monday, May 25, 2009 - link

    Thanks for the response.

    By 1/3 I mean smaller VMs, mostly from the load point of view. Probably 1/3 of the load would go with 1/2 the memory footprint.

    The point being that currently there is only a single data point, with a specific load size per tile/per VM.

    By "VM scaling" I mean I would like to see what effect smaller loads would have on overall performance.

    I suggest 1/3 or 1/4 of the load, to get a measurable difference while remaining within a reasonable memory/VM scale.

    In the end, if you get similar overall performance from 1/4 tiles, it may not make sense to include this in the future.
    Even then, the information that your benchmark results can be safely extrapolated to smaller loads would be of great value by itself.
  • mino - Monday, May 25, 2009 - link

    Eh, that last text of mine reads like gibberish...
    Clarification needed:

    To be able to run more tiles per box, a smaller memory footprint is a must.
    With a smaller memory footprint, smaller DBs are a must.

    The end results may not be directly comparable, but correctly interpreted they should still give some reference point.

    Please let me know if this makes sense to you.
    There are multiple dimensions to this. I may easily be on the imaginary branch :)
  • ibb27 - Thursday, May 21, 2009 - link

    Can we have a chance to see benchmarks for Sun VirtualBox, which is open source?
  • winterspan - Tuesday, May 26, 2009 - link

    This test is misleading because you are not using the latest version of VMware, which supports Intel's EPT. Since AMD's version of this feature is supported in the older version, the test is not at all a fair representation of their respective performance.
  • Zstream - Thursday, May 21, 2009 - link

    Can someone please perform a Win2008 RC2 Terminal Server benchmark? I have been looking everywhere and no one can provide that.

    If I can take this benchmark and tell my boss this is how the servers will perform in a TS environment, please let me know.
