Real-world virtualization benchmarking: the best server CPUs compared

Author: Johan De Gelas

by Johan De Gelas on May 21, 2009 3:00 AM EST

Inquisitive Minds Want to Know

Tynopik, one of our readers, commented: "Is Nehalem better at virtualization simply because it's a faster CPU? Or are the VM-specific enhancements making a difference?" For some IT professionals that might not matter, but many of our readers are very keen (rightfully so!) to understand the "why" and "how". Which characteristics make a certain CPU a winner in vApus Mark I? And which will matter as we make further progress with our stress testing, profiling, and benchmarking research for virtualization in general?

Understanding how the individual applications behave would be very interesting, but this is close to impossible with our current stress test scenario. We give each of the four VMs four virtual CPUs, and there are only eight physical cores available. The result is that the VMs steal time from each other and thus influence each other's results. It is therefore easier to zoom in on the total scores rather than the individual scores. We measured the following numbers with ESXtop:

Dual Opteron 8389 2.9GHz CPU Usage
	Percentage of CPU Time
Web portal VM1	19.8
Web portal VM2	19.425
OLAP VM	27.2125
OLTP VM	27.0625
Total "Work"	93.5
"Pure" Hypervisor	1.9375
Idle	4.5625

The "pure" hypervisor percentage is calculated as what is left after subtracting the work that is done in the VMs and the "idle worlds". The work done in the VMs includes the VMM, which is part of the hypervisor. It is impossible, as far as we know, to determine the exact amount of time spent in the guest OS and in the hypervisor. That is the reason why we speak of "pure" hypervisor work: it does not include all the hypervisor work, but it is the part that happens in the address space of the hypervisor kernel.
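As a quick check of that subtraction, here is a minimal sketch that recomputes the "pure" hypervisor share from the dual Opteron 8389 table. The function and variable names are our own illustration and are not part of ESXtop; the only inputs are the percentages listed above.

```python
def pure_hypervisor_pct(vm_pcts, idle_pct):
    """Return the CPU time left after subtracting VM work and idle time."""
    total_work = sum(vm_pcts.values())
    return 100.0 - total_work - idle_pct


# Dual Opteron 8389 figures from the table above (percent of CPU time).
dual_opteron_vms = {
    "Web portal VM1": 19.8,
    "Web portal VM2": 19.425,
    "OLAP VM": 27.2125,
    "OLTP VM": 27.0625,
}
dual_opteron_idle = 4.5625

total_work = sum(dual_opteron_vms.values())
hypervisor = pure_hypervisor_pct(dual_opteron_vms, dual_opteron_idle)

print(f"Total work:      {total_work:.4f}%")
print(f"Pure hypervisor: {hypervisor:.4f}%")
```

The remainder matches the 1.9375% reported in the table. Keep in mind that this remainder only covers time in the hypervisor kernel; the VMM overhead is still hidden inside the per-VM numbers.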

Notice how the ESX scheduler is pretty smart, as it gives the more intensive OLAP and OLTP VMs more physical CPU time. You could say that those VMs "steal" a bit of time from the web portal VMs. The Nehalem-based Xeons show very similar numbers when it comes to CPU usage:

Dual Xeon X5570 CPU Usage (no Hyper-Threading)
	Percentage of CPU time
Web portal VM1	18.5
Web portal VM2	17.88
OLAP VM	27.88
OLTP VM	27.89
Total "Work"	92.14
"Pure" Hypervisor	1.2
Idle	6.66

With Hyper-Threading, we see something interesting. VMware ESXtop does not count the "Hyper-Threading CPUs" as real CPUs but does see that the CPUs are utilized better:

Dual Xeon X5570 CPU Usage (Hyper-Threading Enabled)
	Percentage of CPU time
Web portal VM1	20.13
Web portal VM2	20.32
OLAP VM	28.91
OLTP VM	28.28
Total "Work"	97.64
"Pure" Hypervisor	1.04
Idle	1.32

Idle time is reduced from 6.7% to 1.3%.
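To make the shift easier to see, the sketch below subtracts the two Xeon X5570 tables from each other. The dictionary layout and the `delta` helper are our own illustration; the values are the totals reported above.

```python
# Totals from the two Xeon X5570 tables above (percent of CPU time).
no_ht = {"work": 92.14, "hypervisor": 1.2, "idle": 6.66}
ht = {"work": 97.64, "hypervisor": 1.04, "idle": 1.32}


def delta(before, after):
    """Return the change in percentage points for each category."""
    return {key: round(after[key] - before[key], 2) for key in before}


change = delta(no_ht, ht)
for category, points in change.items():
    print(f"{category:>10}: {points:+.2f} points")
```

Almost all of the idle time that disappears shows up as extra VM work, while the "pure" hypervisor share barely moves. Note that these are utilization percentages as reported by ESXtop, not throughput numbers, so a higher "work" share does not by itself say how much faster the benchmark ran.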

The Xeon 54XX: no longer a virtualization wretch

It's also interesting that VMmark tells us that the Shanghais and Nehalems are running circles around the relatively young Xeon 54xx platform, whereas our vApus Mark I tells us that although the Xeon 54xx might not be the first choice for virtualization, it is nevertheless a viable platform for consolidation. The ESXtop numbers you just saw give us some valuable clues, and the Xeon 54xx "virtualization revival" is a result of the way we test now. Allow us to explain.

In our case, we have eight physical cores and four VMs with four vCPUs each. So on average the hypervisor has to allocate two physical cores to each virtual machine. ESXtop shows us that the scheduler plays it smart. In many cases, a VM gets one dual-core die on the Xeon 54xx, and cache coherency messages are exchanged via a very fast shared L2 cache. ESXtop indicates quite a few "core migrations" but never "socket migrations". In other words, the ESX scheduler keeps the virtual machines on the same cores as much as possible, keeping the L2 cache "warm". In this scenario, the Xeon 5450 can leverage a formidable weapon: the very fast and large 6MB L2 cache that each pair of cores shares. In contrast, on Nehalem two cores working on the same VM have to content themselves with a tiny 512KB of L2 and a slower and smaller L3 cache (4MB per two cores). The way we test right now is probably the best case for the Xeon 54xx Harpertown. We'll update with two- and three-tile results later.
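The arithmetic behind this paragraph can be sketched as follows. The constant names and the dictionary are our own illustration; the cache sizes are the per-two-core figures quoted in the text, and the 0 MB L3 entry reflects that Harpertown has no L3 cache.

```python
# Test setup as described in the text above.
PHYSICAL_CORES = 8
VM_COUNT = 4
VCPUS_PER_VM = 4

total_vcpus = VM_COUNT * VCPUS_PER_VM            # vCPUs the scheduler must place
overcommit_ratio = total_vcpus / PHYSICAL_CORES  # vCPUs per physical core
cores_per_vm = PHYSICAL_CORES / VM_COUNT         # average physical cores per VM

print(f"{total_vcpus} vCPUs on {PHYSICAL_CORES} cores "
      f"({overcommit_ratio:.1f} vCPUs per core, "
      f"{cores_per_vm:.1f} cores per VM on average)")

# Cache within reach of a two-core VM, in MB, using the article's figures.
cache_mb = {
    "Xeon 54xx (Harpertown)": {"L2": 6.0, "L3": 0.0},
    "Xeon X5570 (Nehalem)": {"L2": 0.5, "L3": 4.0},
}
for cpu, levels in cache_mb.items():
    print(f"{cpu}: L2 {levels['L2']} MB, L3 {levels['L3']} MB")
```

With only one tile, each VM fits on a single dual-core die, which is exactly the situation where Harpertown's large shared L2 helps most.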

Quad Opteron: room for more

Our current benchmark scenario is not taxing enough for a quad Opteron server:

Quad Opteron 8389 CPU Usage
	Percentage of CPU time
Web portal VM1	14.70625
Web portal VM2	14.93125
OLAP VM	23.75
OLTP VM	23.625
Total "Work"	77.0125
"Pure" Hypervisor	2.85
Idle	21.5625

Still, we were curious how a quad machine would handle our virtualization workload, even at 77% CPU load. Be warned that the numbers below are not accurate, but they give an initial idea.

Despite the fact that we are only using 77% of the four CPUs compared to the 94-97% on Intel, the quad socket machine remains out of reach of the dual CPU systems. The quad Shanghai server outperforms the best dual socket Intel by 31% and improves performance by 58% over its dual socket sibling. We expect that once we run with two or three "tiles" (8 or 12 VMs), the quad socket machine will outperform the dual Shanghai by roughly 90%. Again, this is a completely different picture than what we see in VMmark.
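One back-of-the-envelope way to arrive at an estimate in that range is to assume that performance scales linearly with CPU utilization. That assumption is ours, purely for illustration; it is not a measured result, and real scaling with more tiles may well differ. The variable names are hypothetical, and the inputs are the ESXtop totals and the 58% figure quoted above.

```python
# Inputs taken from the article (percent of CPU time doing VM work).
dual_util = 93.5       # dual Opteron 8389, total "work"
quad_util = 77.0125    # quad Opteron 8389, total "work"
quad_vs_dual = 1.58    # quad is 58% faster than the dual in the current test

# Naive assumption: throughput grows in proportion to utilization, so a quad
# machine loaded as heavily as the dual would scale its score accordingly.
projected_ratio = quad_vs_dual * (dual_util / quad_util)

print(f"Projected quad vs. dual: {projected_ratio:.2f}x "
      f"(about {100 * (projected_ratio - 1):.0f}% faster)")
```

Under this simple model the quad machine would land a little over 90% ahead of the dual Shanghai, in line with the rough estimate above.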

66 Comments

Bandoleer - Thursday, May 21, 2009 - link
I have been running VMware Virtual Infrastructure for 2 years now. While this article can be useful for someone looking for hardware upgrades or scaling of a virtual system, CPU and memory are hardly the bottlenecks in the real world. I'm sure there are some organizations that want to run 100+ VMs on "one" physical machine with 2 physical processors, but what are they really running?

The fact is, if you want VM flexibility, you need central storage of all your VMDKs that is accessible by all hosts. That is where you find your bottlenecks, in the storage arena. FC or iSCSI, where are those benchmarks? Where's the TOE vs QLogic HBA? Considering that 2 years ago there was no QLogic HBA for blade servers, nor does VMware support TOE.

However, it does appear I'll be able to do my own baseline/benching once vSphere (i.e. VI4) materializes, to see if it's even worth sticking with VMware or making the move to HyperV which already supports Jumbo, TOE iSCSI with 600% increased iSCSI performance on the exact same hardware.
But it would really be nice to see central storage benchmarks, considering that is the single most expensive investment of a virtual system.
duploxxx - Friday, May 22, 2009 - link
perhaps before you would even consider to move from VMware to HyperV check first in reality what huge functionality you will lose instead of some small gains in HyperV.

ESX 3.5 does support Jumbo and iSCSI offload adapters, and I have no idea how you are going to gain 600% if iSCSI is only about 15% slower than FC when you have a decent network and a dedicated iSCSI box?
Bandoleer - Friday, May 22, 2009 - link
"perhaps before you would even consider to move from VMware to HyperV check first in reality what huge functionality you will lose instead of some small gains in HyperV."

what you are calling functionality here are the same features that will not work in ESX4.0 in order to gain direct hardware access for performance.
Bandoleer - Friday, May 22, 2009 - link
The reality is I lost around 500MBps storage throughput when I moved from Direct Attached Storage. Not because of our new central storage, but because of the limitations of the driver-less Linux iSCSI capability, or the lack thereof. Yes!! In ESX 3.5 VMware added Jumbo frame support as well as flow control support for iSCSI!! It was GREAT, except for the part that you can't run JUMBO frames + flow control; you have to pick one, flow control or JUMBO.

I said 2 years ago there was no such thing as iSCSI HBAs for blade servers. And that ESX does not support the TOE feature of Multifunction adapters (because that "functionality" requires a driver).

Functionality you lose by moving to HyperV? In my case, I call them useless features, which are second to performance and functionality.
JohanAnandtech - Friday, May 22, 2009 - link
I fully agree that in many cases the bottleneck is your shared storage. However, the article's title indicated "Server CPU", so it was clear from the start that this article would discuss CPU performance.

"move to HyperV which already supports Jumbo, TOE iSCSI with 600% increased iSCSI performance on the exact same hardware. "

Can you back that up with a link to somewhere? Because the 600% sounds like an MS Advertisement :-).
Bandoleer - Friday, May 22, 2009 - link
My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish.

I wasn't ranting at the article, it's great for what it is, which is what the title represents. I was responding to this part of the article, and it accidentally came out as a rant because I'm so passionate about virtualization.

"What about ESX 4.0? What about the hypervisors of Xen/Citrix and Microsoft? What will happen once we test with 8 or 12 VMs? The tests are running while I am writing this. We'll be back with more. Until then, we look forward to reading your constructive criticism and feedback."

Sorry, I meant to be more constructive haha...
JohanAnandtech - Sunday, May 24, 2009 - link
"My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish. "

Yes, please do. Very interested in reading what you found.

"I wasn't ranting at the article, it's great for what it is, which is what the title represents."

Thx, no problem... Just understand that these things take time and the cooperation of the large vendors. And getting the right $5000 storage hardware in the lab is much harder than getting a $250 videocard. About 20 times harder :-).
Bandoleer - Sunday, May 24, 2009 - link
I haven't looked recently, but high performance tiered storage was anywhere from $40k - $80k each, just for the iSCSI versions, the FC versions are clearly absurd.
solori - Monday, May 25, 2009 - link
Look at ZFS-based storage solutions. ZFS enables hybrid storage pools and an elegant use of SSDs with commodity hardware. You can get it from Sun, Nexenta or by rolling-your-own with OpenSolaris:

http://solori.wordpress.com/2009/05/06/add-ssd-to-...
pmonti80 - Friday, May 22, 2009 - link
Still, it would be interesting to see those central storage benchmarks, or at least to know whether you will or won't be doing them, and for what reason.
