vApus Mark I: Performance-Critical applications virtualized

Our vApus Mark I benchmark is not a VMmark replacement. It is meant to be complementary: while VMmark runs 60 to 120 light loads, vApus Mark I runs 8 heavy VMs on 24 virtual CPUs (vCPUs). Our current vApus Stressclient is being improved to scale to a much higher number of vCPUs, but for now we limit the benchmark to 24 virtual CPUs.

A vApus Mark I tile combines one OLTP database, one OLAP database and two heavy websites. These are the kind of demanding applications that still got their own dedicated, natively running machine a year ago. vApus Mark I shows what happens if you virtualize them. If you want to fully understand our benchmark methodology: vApus Mark I has been described in great detail here. We have changed only one thing compared to our original benchmarking: we used large pages, as this is generally considered a best practice (with RVI, EPT).

The current vApus Mark I uses two tiles. Each tile thus has 4 VMs with 4 server applications:

  • A SQL Server 2008 x64 database running on Windows 2008 64-bit, stress tested by our in-house developed vApus test (4 vCPUs).
  • Two heavy-duty MCS eFMS portals running PHP, IIS on Windows 2003 R2, stress tested by our in-house developed vApus test (2 vCPUs each).
  • One OLTP database, based on the Oracle 10G Calling Circle benchmark by Dominic Giles (4 vCPUs).
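The VM and vCPU totals follow directly from the tile layout listed above. As a minimal sketch (the dictionary keys are illustrative labels chosen here, not names used by vApus itself), the arithmetic can be checked like this:

```python
# vCPUs per VM in one tile, taken from the list above.
# The keys are illustrative labels, not identifiers used by vApus.
TILE = {
    "sql_server_2008": 4,
    "mcs_efms_portal_1": 2,
    "mcs_efms_portal_2": 2,
    "oracle_calling_circle_oltp": 4,
}
TILES = 2  # the current vApus Mark I uses two tiles


def total_vms(tile, tiles):
    """Number of VMs across all tiles."""
    return len(tile) * tiles


def total_vcpus(tile, tiles):
    """Number of virtual CPUs across all tiles."""
    return sum(tile.values()) * tiles


if __name__ == "__main__":
    print("VMs:", total_vms(TILE, TILES))      # VMs: 8
    print("vCPUs:", total_vcpus(TILE, TILES))  # vCPUs: 24
```

Each tile accounts for 12 vCPUs, so two tiles give the 8 VMs and 24 vCPUs quoted earlier.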

The beauty is that vApus (stress testing software developed by the Sizing Servers Lab) uses actions made by real people (as can be seen in logs) to stress test the VMs, not some benchmarking algorithm.

vApus Mark I 2 tile test - 24 vCPUs - ESX 4.0

As always, vApus Mark paints a totally different picture than VMmark. In this case, “only” 8 Opteron cores are needed to keep up with the six Xeons. While the Xeon X5670 is currently ahead of the six-core Opteron by a significant margin (34%), an octal-core Opteron might be competitive, on the condition that AMD prices it right.

We are proud to present our first vApus Mark I results on Hyper-V. One of the great advantages of our virtualization benchmark is that it runs on all popular hypervisors. Below we tested with Hyper-V R2 6.1.7600.16385 (21st of July 2009).

vApus Mark I 2 tile test - 24 vCPUs - Hyper-V

Hyper-V R2 performs well, very well. The scheduler prefers to work with a number of physical CPUs that can be easily divided among the virtual CPUs. Contrary to ESX, where the 16 logical cores of the Xeon X5570 prevail, Hyper-V prefers the twelve cores of the Opteron 2435, much to our surprise. It is interesting to see that ESX seems to prefer the Nehalem-based architectures much more than Hyper-V does. With ESX, the gap between the six-core Opteron and the six-core Xeon is 34%. With Hyper-V, this shrinks to 15%.
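To make the "easily divided" remark concrete, here is a small sketch that compares the 24 vCPUs of the benchmark with the logical CPU counts mentioned above. It only illustrates the arithmetic; it does not model how the Hyper-V or ESX scheduler actually places vCPUs, and the function names are made up for this example.

```python
from fractions import Fraction

VCPUS = 24  # two vApus Mark I tiles

# Logical CPU counts as quoted in the text above.
HOSTS = {
    "Xeon X5570 (16 logical cores)": 16,
    "Opteron 2435 (12 cores)": 12,
}


def vcpus_per_logical_cpu(vcpus, logical_cpus):
    """Exact ratio of virtual CPUs to logical CPUs."""
    return Fraction(vcpus, logical_cpus)


def divides_evenly(vcpus, logical_cpus):
    """True if every logical CPU can get the same whole number of vCPUs."""
    return vcpus % logical_cpus == 0


if __name__ == "__main__":
    for name, logical_cpus in HOSTS.items():
        ratio = vcpus_per_logical_cpu(VCPUS, logical_cpus)
        even = divides_evenly(VCPUS, logical_cpus)
        print(f"{name}: {float(ratio):.2f} vCPUs per logical CPU, even split: {even}")
```

With 12 cores, each core gets exactly two vCPUs; with 16 logical cores the ratio is 1.5, so the split cannot be uniform.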

Take our results with a grain of salt though, as this is the very first time we have run vApus Mark I on Hyper-V on different architectures. We need more time for analysis to understand what is going on. My first bet is that ESX is very well optimized for the Nehalem architecture. This includes the excellent Hyper-Threading optimizations and probably some optimizations to avoid a few of the Nehalem architecture's limitations: the small “prefetch” (16 bytes on Nehalem, 32 bytes on Istanbul) and especially the relatively small TLB. That is pure speculation though; we will need more time to investigate this.


39 Comments


  • DigitalFreak - Tuesday, March 16, 2010

    Are you seriously going to buy a dual socket server (or workstation at a minimum) to play games? I'd rather see them take the time to do more enterprise benchmarking than waste it on what 0.00001% of the market wants.
  • Starglider - Wednesday, March 17, 2010

    No but some HPC / CAD / scientific computing benchmarks would be good. Presumably we'll get the full suite when Nehalem EX and Magny Cours turn up.
  • vitchilo - Tuesday, March 16, 2010

    I want to encode video, I mean a s***load of video + play games from time to time.
  • Starglider - Tuesday, March 16, 2010

    > You can now use up to two DIMMs at 1333MHz,
    > while the Xeon 5500 would throttle back to
    > 1066MHz if you did this.

    Presumably you mean 'up to two DIMMs per channel'?
  • DigitalFreak - Tuesday, March 16, 2010

    Not sure about the 2 DIMMs per channel forcing 1066Mhz. We've been ordering Dell R710s with the X5570 and 12x4GB of memory, which runs at 1333Mhz.
  • TurboMax3 - Wednesday, March 17, 2010

    You are right. I work for Dell, since a couple of months after the launch of the 5500 Xeons we could do 2 DIMM per Channel (DPC) at 1333 MHz. It is a property of the chipset, rather than the CPU.

    Also, going to 3 DPC will clock the memory down to 800 MHz, and this has been available in R710 (and similar products from others) for some time now.

    The 8GB DIMM is getting cheap enough to be quoted without shame. 16 GB DIMMS still cost as much as my car.
  • Navier - Tuesday, March 16, 2010

    Do you have information on Nehalem-EX and how that is going to fit in the updated road map with the latest 6 core systems?
  • DigitalFreak - Tuesday, March 16, 2010

    The Nehalem-EX (probably called the Xeon 7500 series) are for quad socket boxes. From what I've been hearing, they should be released on 3/30. Not sure when the Poweredge R910 and Proliant DL580 G7 will show up though.
  • duploxxx - Wednesday, March 17, 2010

    it is launched on 30/3 but actually only available mid june, call it a paper launch or whatever you want.
  • yinan - Tuesday, March 16, 2010

    Bah 6 cores. 8 sockets by 8 cores is where it is at :)
