Network Performance in ESX 4.0 Update 1

We used up to eight VMs, and each was assigned an “endpoint” in the Ixia IxChariot network test. This way we could measure the total network throughput achievable with one, four, or eight VMs. With ESX NetQueue enabled, the cards should be able to leverage their separate queues and the hardware Layer 2 “switch”.
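To make the idea of separate queues concrete, here is a minimal toy model of sorting incoming frames into per-VM receive queues. It is only a sketch: it assumes frames are sorted on destination MAC address alone, and the MAC table, queue names, and `sort_frames` helper are hypothetical illustrations, not part of ESX NetQueue or any NIC driver.

```python
from collections import defaultdict, deque

# Hypothetical MAC-to-VM table (assumption: sorting is keyed on destination MAC only).
MAC_TO_VM = {
    "00:50:56:00:00:01": "vm1",
    "00:50:56:00:00:02": "vm2",
}
DEFAULT_QUEUE = "default"


def sort_frames(frames, mac_to_vm=MAC_TO_VM):
    """Place each (dest_mac, payload) frame on the receive queue of its VM."""
    queues = defaultdict(deque)
    for dest_mac, payload in frames:
        # Frames without a matching entry fall back to a shared default queue.
        queue_name = mac_to_vm.get(dest_mac, DEFAULT_QUEUE)
        queues[queue_name].append(payload)
    return queues


frames = [
    ("00:50:56:00:00:01", "a"),
    ("00:50:56:00:00:02", "b"),
    ("00:50:56:00:00:01", "c"),
    ("ff:ff:ff:ff:ff:ff", "d"),  # no matching entry, so it lands on the default queue
]
queues = sort_frames(frames)
for name in sorted(queues):
    print(name, list(queues[name]))
```

With a single queue, the hypervisor would have to do this sorting in software for every frame; with per-VM queues, the sorting work moves onto the card.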

First, we test with NetQueue disabled. The cards will behave like a card with only one Rx/Tx queue. To make the comparison more interesting, we added two dual-port gigabit NICs into the benchmark mix. Teamed NICs are currently by far the most cost effective way to increase network bandwidth.

NIC performance on ESX 4.0, no NetQueue

The 10G cards show their potential. With four VMs, they are able to achieve 5 to 6Gbit/s. There is clearly a queue bottleneck: both 10G cards perform worse with eight VMs. Notice also that 4x1Gbit does very well. This combination has more queues and can cope well with the different network streams. Out of a maximum line speed of 4Gbit/s, we achieve almost 3.8Gbit/s with four and eight VMs. Now let's look at CPU load.

CPU load on ESX 4.0, no NetQueue

Once you need more than 1Gbit/s, you should pay attention to the CPU load. Four gigabit ports or one 10G port require 25 to 35% utilization of eight 2.9GHz Opteron cores. That means that you would need two or three cores dedicated just to keeping the network pipes filled. Let us see if NetQueue can do some magic.
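The "two or three cores" figure follows directly from the utilization numbers. A small sketch of the arithmetic, assuming the reported percentage is the average across all eight cores (the helper name is made up for illustration):

```python
TOTAL_CORES = 8  # eight Opteron cores in the test server, per the article


def core_equivalents(utilization, total_cores=TOTAL_CORES):
    """Convert a whole-system CPU utilization fraction into busy-core equivalents."""
    return utilization * total_cores


for utilization in (0.25, 0.35):
    cores = core_equivalents(utilization)
    print(f"{utilization:.0%} of {TOTAL_CORES} cores = {cores:.1f} cores")
```

So 25% corresponds to two fully busy cores and 35% to just under three.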

NIC performance on ESX 4.0, NetQueue enabled

The performance of the Neterion card improves a bit, but it's not really impressive (+8% in the best case). The Intel 82598 EB chip on the Supermicro 10G NIC is now achieving 9.5Gbit/s with eight VMs, very close to the theoretical maximum. The 4x1Gbit/s NIC numbers were repeated in this graph for reference (no NetQueue was available).
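As a rough comparison, the throughput figures quoted in the text can be expressed as a fraction of nominal line rate. This sketch uses only the numbers stated in the article (almost 3.8Gbit/s on the teamed gigabit ports, 9.5Gbit/s on the Intel-based card) and ignores protocol overhead; the function and labels are illustrative.

```python
def line_rate_fraction(achieved_gbps, line_rate_gbps):
    """Return achieved throughput as a fraction of the nominal line rate."""
    return achieved_gbps / line_rate_gbps


# (achieved Gbit/s, nominal line rate Gbit/s), taken from the article text
results = {
    "4x1Gbit teamed": (3.8, 4.0),
    "Intel 82598 10G, NetQueue, eight VMs": (9.5, 10.0),
}

for label, (achieved, nominal) in results.items():
    fraction = line_rate_fraction(achieved, nominal)
    print(f"{label}: {fraction:.0%} of line rate")
```

Both configurations land at roughly the same fraction of their nominal line rate; the difference is in how much absolute bandwidth that represents.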

So how much CPU power did these huge network streams absorb?

CPU load on ESX 4.0, NetQueue enabled

The Neterion driver does not seem to be optimized for ESX 4. Using NetQueue should lower CPU load, not increase it. The Supermicro/Intel 10G combination shows the way. It delivers twice as much bandwidth at half the CPU load compared to the two dual-port gigabit NICs.
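Taken at face value, "twice as much bandwidth at half the CPU load" means four times as much bandwidth per unit of CPU load. A minimal sketch of that ratio, using the article's round figures rather than measured values:

```python
def relative_efficiency(bandwidth_ratio, cpu_load_ratio):
    """Bandwidth delivered per unit of CPU load, relative to a baseline of 1.0."""
    return bandwidth_ratio / cpu_load_ratio


# Intel 10G vs. the two dual-port gigabit NICs: 2x the bandwidth, 0.5x the CPU load
advantage = relative_efficiency(bandwidth_ratio=2.0, cpu_load_ratio=0.5)
print(f"Bandwidth per unit of CPU load: {advantage:.1f}x the baseline")
```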


49 Comments


  • RequiemsAllure - Tuesday, March 9, 2010

    So, basically, what these cards are doing (figuratively speaking) is taking in ("multiplexing") 8 or 16 requests (or however many virtual queues there are) together on a single NIC and sorting ("demultiplexing") them to the respective VMs; each VM then takes care of its request and sends it on its way.

    Can anyone tell me if I got this right?
  • has407 - Wednesday, March 10, 2010

    Yes, I think you've got it... that's pretty much how it works. At the risk of oversimplifying... these cards are like a multi-port switch with 10Gbe uplinks.

    Consider a physical analog (depending on the card, and not exact but close enough): 8/16x 1Gbe ports on the server connected to a switch with 8/16x 1Gbe ports and 1/2x 10Gbe uplinks to the backbone.

    Now replace that with a card on the server and 1/2x 10Gbe backbone ports. Port/switch/cable consolidation ratios of 8:1 or 16:1 can save serious $$$ (and with better/dynamic bandwidth allocation).

    The typical sticking point is that 10Gbe switches/routers are still quite expensive, and unless you've got a critical mass of 10Gbe, the infrastructure cost can be a tough hump to get over.
  • LuxZg - Tuesday, March 9, 2010

    I've got to admit that I've skimmed through the article (and the first page and a half of comments).. But it seems from your testing & numbers that you haven't used a dedicated NIC for every card in the 4x 1Gbit example (4 VMs test), otherwise you'd get lower CPU numbers simply because you skip the load scheduling that's done on the CPU.

    Any "VM expert" will tell you that you have 3 basic bottlenecks in any VM server:
    - RAM (the more the better, mostly not a problem)
    - disks (again, more is better, and the absolute minimum is at least one drive per VM)
    - NICs

    For NICs the basic rule would be - if a VM is loaded with a network-heavy application, then that VM should have a dedicated NIC. CPU utilization drops heavily, and NIC utilization is higher.

    Having one 10Gbit NIC shared among 8 VMs which are all bottlenecked by NICs means you have your 35% CPU load. With one NIC dedicated to each VM you'd have CPU load near zero at file-copy loads (the NIC has a hardware scheduler, the disc controller has the same for HDDs).

    Like I've said, maybe I've overlooked something in the article, but it seems to me your tests are based on wrong assumptions. Besides, if you've got 8 file servers as VMs, you've got unnecessary overhead as well; it's one application (file serving), so there's no need to virtualize it into 8 VMs on the same hardware.

    As a conclusion, VMs are all about planning, so I believe your test had a wrong approach.
  • JohanAnandtech - Tuesday, March 9, 2010

    "a dedicated NIC for every VM"

    That might be the right approach when you have a few VMs on the server, but it does not seem reasonable when you have tens of VMs running. What do you mean by dedicating? Pass-through? Port grouping? Only pass-through has near-zero CPU load AFAIK, and I don't see many scenarios where pass-through is handy.

    Also, if you use dedicated NICs for network-intensive apps, that means that you cannot use that bandwidth for the occasional spike in another "non-NIC-privileged" VM.

    It might not be feasible at all if you use DRS or live migration.

    The whole point of VMDQ is to offer the necessary bandwidth to the VM that needs it (for example, give one VM 5Gbit/s, one VM 1Gbit/s, and the others only 1Mbit/s), with the layer 2 routing overhead mostly on the NIC. It seems to me that the planning you promote is very inflexible, and I can see several scenarios where dedicated NICs will perform worse than one big pipe that can be load balanced across the different VMs.
  • LuxZg - Wednesday, March 10, 2010

    Yes, I meant "dedicated" as "pass-through".

    Yes, there are several scenarios where "one big" is better than several small ones, but think about whether 35% CPU load (and that's 35% of a very-expensive-CPU) is worth it as a sacrifice to have a reserve for a few occasional spikes.

    I do agree that putting several VMs on one NIC is ok, but that's for applications that aren't loaded with heavy network transfers. VM load balancing should be done for example like this (just a stupid example, don't hold onto it too hard):
    - you have file server as one VM
    - you have mail server on second VM
    - you have some CPU-heavy app on separate VM

    File server is heavy on networking and the disc subsystem, but almost none on RAM/CPU. Mail server is dependent on several variables (antispam, antivirus, number of mailboxes & incoming mail, etc), so it can be a light-to-heavy load for all subsystems. For this example let's say it's a lighter kind of load. Let's say this hardware machine has 2 NICs. You've got a few CPUs with multiple cores, and plenty of disc/RAM. So what's the right thing to do? Add a CPU-intensive VM, so that the CPU isn't idle too much. You dedicate one NIC to the file server, and you let the mail server share a NIC with the CPU-intensive VM. That way the file server has enough bandwidth that isn't taxing the CPU to 35% cos of stupid virtual routing of great amounts of network packets, the CPU is left mostly free for the CPU-intensive VM, and the mail server happily lives in between the two, as it will be satisfied with leftover CPU and networking..

    Now scale that to 20-30 VMs, and all you need is 10 NICs. For VMs that aren't network dependent you put them on "shared NICs", and for network-intensive apps you give those VMs a dedicated NIC.

    Just remember - 35% of a multi-socket & multi-core server is a huge expense, when you can do it on a dedicated NIC. NIC is, was, and will be much more cost effective for doing network packet scheduling than CPU.. Why pay several thousand $$$ for CPU if all you need is another NIC.
  • LuxZg - Tuesday, March 9, 2010

    I hate my own typos.. 2nd sentence.. "dedicated NIC for every VM" .. not "for every card".. there's probably more nonsense.. I'm in a hurry, sorry ppl!
  • anakha32 - Tuesday, March 9, 2010

    All the new 10G kit appears to be coming with SFP+ connectors. They can be used either with a transceiver for optical, or a pre-terminated copper cable (known as 'SFP+ Direct Attach').
    CX4 seems to be deprecated as the cables are quite big and cumbersome.
  • zypresse - Tuesday, March 9, 2010

    I've seen some mini-clusters (3-10 machines) lately with Ethernet interconnects. Although I doubt that this is the best solution, it would be nice to know how 10G Ethernet actually performs in that area.
  • Calin - Tuesday, March 9, 2010

    I don't find a power use of <10W for a 10Gb link such a bad compromise over 0.5W per 1Gb Ethernet link (assuming that you can use that 10Gb link at close to maximum capacity). If nothing else, you're trading two 4-port 1Gb network cards for one 10Gb card.
  • MGSsancho - Tuesday, March 9, 2010

    Sun's 40Gb adapters are not terribly expensive (starting at $1500). Apparently they support 8 virtual lanes? Mellanox provides Sun with their silicon. I went to their site, and they do have other silicon/cards that explicitly state they support Virtual Protocol Interconnect. I'm curious if this is the same thing. I know you stated that the need really isn't there, but it would be interesting to see if you can ask for testing samples or look into the viability of InfiniBand. Looking at their partners page, they provide the silicon for Xsigo, as a previous poster stated. Again, it would be nice to see if 40Gb InfiniBand with and without VPI technologies is superior to 10Gb Ethernet with acceleration, as you showed us today. For SANs, anything to lower latency for iSCSI is desired. Perhaps spending a little for reduced latency on the network layer makes it worth the extra price for faster transactions? So many possibilities! Thank you for all the insightful research you have provided us!
