Delving Deeper

Let us take a closer look at how the Neterion and Intel 10G chips are configured on VMware's vSphere/ESX platform. First, we checked what Neterion's S2IO driver did when ESX was booting.

In the boot messages you can see that eight Rx queues are recognized, but only one Tx queue. Compare this to the Intel ixgbe driver:

Eight Tx and eight Rx queues are recognized, one pair for each VM. This is also confirmed when we start up the VMs: each VM gets its own Rx and Tx queue. The Xframe-E has eight transmit and eight receive paths, but for some reason the driver does not seem able to use the full potential of the card on ESX 4.0.
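For readers who want to repeat this check on their own box, a minimal sketch along the following lines could work. It assumes the classic ESX 4.0 service console, that the driver reports its queue allocation to /var/log/vmkernel when it loads, and that the wording of those messages roughly matches the patterns below; the exact strings differ per driver (s2io, ixgbe) and version, so treat the regular expressions as placeholders.

    # Rough sketch: count the Rx/Tx queue messages a NIC driver printed to the
    # ESX 4.0 vmkernel log at boot. The log path is the classic service console
    # location; the regular expressions are assumptions and should be adapted
    # to the actual wording of your driver version.
    import re

    LOG_PATH = "/var/log/vmkernel"
    RX_PATTERN = re.compile(r"rx\s*queue", re.IGNORECASE)   # assumed wording
    TX_PATTERN = re.compile(r"tx\s*queue", re.IGNORECASE)   # assumed wording

    def count_queue_messages(path=LOG_PATH):
        rx = tx = 0
        with open(path) as log:
            for line in log:
                if RX_PATTERN.search(line):
                    rx += 1
                if TX_PATTERN.search(line):
                    tx += 1
        return rx, tx

    if __name__ == "__main__":
        rx, tx = count_queue_messages()
        print("Rx queue messages: %d, Tx queue messages: %d" % (rx, tx))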

Conclusion

The goal of this short test was to discover the possibilities of 10 Gigabit Ethernet in a virtualized server. If you have suggestions for more real-world testing, let us know.

CX4 is still the only affordable option that comes with reasonable power consumption. Our one-year-old dual-port CX4 card consumes only 6.5W; a similar 10GBase-T solution would probably need twice as much. The latest 10GBase-T advancements (4W instead of >10W per port) are very promising, as we might see power-efficient 10G cards with CAT-6 UTP cables this year.
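To put those figures in perspective, here is a quick back-of-the-envelope comparison of watts per Gbit/s of link bandwidth. It uses only the numbers quoted in this article and its comments (6.5W for our dual-port CX4 card, >10W and ~4W per port for older and newer 10GBase-T, and roughly 0.5W for a gigabit port); these are rough vendor figures, not our own measurements.

    # Back-of-the-envelope comparison: watts per Gbit/s of raw link bandwidth
    # for the options discussed above. All wattages are approximate figures
    # quoted in the article/comments, not measurements.
    options = {
        "Dual-port CX4 (6.5 W, 2x 10 Gbit/s)": (6.5, 20.0),
        "Older 10GBase-T (>10 W per port)":    (10.0, 10.0),
        "Newer 10GBase-T (~4 W per port)":     (4.0, 10.0),
        "Single gigabit port (~0.5 W)":        (0.5, 1.0),
    }

    for name, (watts, gbits) in options.items():
        print("%-40s %.2f W per Gbit/s" % (name, watts / gbits))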

The Neterion Xframe-E could not fulfill the promise of near 10Gbit speeds at low CPU utilization, but our test can only give a limited indication. This is rather odd, as the card we tested was announced as one of the first to support NetQueue in ESX 3.5. We can only guess that driver support for ESX 4.0 is not optimal (yet). The Xframe X3100 is Neterion's most advanced product and its spec sheet emphasizes VMware NetQueue support. Neterion ships mostly to OEMs, so it is hard to get an idea of the pricing. When you spec your HP, Dell or IBM server for ESX 4.0 virtualization purposes, it is probably a good idea to check whether the 10G Ethernet card is an older Neterion model.

At a price of about $450-$550, the dual-port Supermicro AOC-STG-I2 with the Intel 82598EB chip is a very attractive solution. A quad-port gigabit Ethernet solution will typically cost you half as much, but it delivers only half the bandwidth at twice the CPU load in a virtualized environment.
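A quick sanity check of that claim, using only the ratios mentioned here (roughly half the price for half the measured throughput at twice the CPU load; the absolute dollar amounts are ballpark figures):

    # Toy comparison based on the ratios quoted above: the quad-port gigabit
    # card costs about half as much as the ~$500 dual-port 10G card, but in
    # our virtualized tests delivered roughly half the throughput at twice
    # the CPU load. Throughput and CPU load are expressed relative to the
    # 10G card (1.0 = the 10G result); prices are ballpark figures.
    cards = {
        "dual-port 10G (Intel 82598EB)": {"price": 500, "throughput": 1.0, "cpu": 1.0},
        "quad-port gigabit":             {"price": 250, "throughput": 0.5, "cpu": 2.0},
    }

    for name, c in cards.items():
        print("%-32s throughput per $: %.4f   throughput per unit of CPU load: %.2f"
              % (name, c["throughput"] / c["price"], c["throughput"] / c["cpu"]))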

In general, we would advise going with link aggregation of quad-port gigabit Ethernet ports in native mode (Linux, Windows) for non-virtualized servers. For heavily loaded virtualized servers, 10Gbit CX4 based cards are quite attractive. CX4 uplinks cost about $400-$500; switches with 24 Gbit RJ-45 ports and two CX4 uplinks are in the $1500-$3000 range. 10Gbit is no longer limited to the happy few but is a viable backbone technology.
 
This article would not have been possible without the help of my colleague Tijl Deneut.

Comments

  • RequiemsAllure - Tuesday, March 9, 2010 - link

    So, basically, what these cards are doing (figuratively speaking) is taking in ("multiplexing") 8 or 16 requests (or however many virtual queues) into a single NIC, then sorting (demultiplexing) them to the respective VM; the VM then takes care of the request and sends it on its way.

    Can anyone tell me if I got this right?
  • has407 - Wednesday, March 10, 2010 - link

    Yes, I think you've got it... that's pretty much how it works. At the risk of oversimplifying... these cards are like a multi-port switch with 10GbE uplinks.

    Consider a physical analog (depending on the card, and not exact but close enough): 8/16x 1GbE ports on the server connected to a switch with 8/16x 1GbE ports and 1/2x 10GbE uplinks to the backbone.

    Now replace that with a card on the server and 1/2x 10GbE backbone ports. Port/switch/cable consolidation ratios of 8:1 or 16:1 can save serious $$$ (and with better/dynamic bandwidth allocation).

    The typical sticking point is that 10GbE switches/routers are still quite expensive, and unless you've got a critical mass of 10GbE, the infrastructure cost can be a tough hump to get over.
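    To make the multiplexing/demultiplexing idea above concrete, here is a purely illustrative toy sketch (not actual driver code) of a NIC sorting incoming frames into per-VM receive queues by destination MAC address, which is roughly what VMDq/NetQueue does in hardware; the MAC-to-queue table and the frames are made up:

        # Toy illustration of VMDq/NetQueue-style demultiplexing: one 10G port
        # receives frames for many VMs and the NIC sorts them into per-VM Rx
        # queues by destination MAC address, so the hypervisor does not have to
        # do that sorting in software. Real hardware works on ring buffers and
        # descriptors, not Python lists; the MAC addresses below are made up.
        from collections import defaultdict

        MAC_TO_QUEUE = {
            "00:50:56:aa:00:01": 0,   # VM 1
            "00:50:56:aa:00:02": 1,   # VM 2
            "00:50:56:aa:00:03": 2,   # VM 3
        }
        DEFAULT_QUEUE = 0             # unknown destinations fall back to queue 0

        def demultiplex(frames):
            """Sort (dst_mac, payload) frames into per-queue lists."""
            queues = defaultdict(list)
            for dst_mac, payload in frames:
                queues[MAC_TO_QUEUE.get(dst_mac, DEFAULT_QUEUE)].append(payload)
            return queues

        incoming = [("00:50:56:aa:00:02", "frame A"),
                    ("00:50:56:aa:00:01", "frame B"),
                    ("00:50:56:aa:00:02", "frame C")]
        for queue, payloads in sorted(demultiplex(incoming).items()):
            print("Rx queue %d -> %s" % (queue, payloads))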
  • LuxZg - Tuesday, March 9, 2010 - link

    I've got to admit that I've skimmed through the article (and the first page and a half of comments).. But it seems from your testing & numbers that you haven't used a dedicated NIC for every card in the 4x 1Gbit example (4 VMs test); otherwise you'd get lower CPU numbers simply because you skip the load scheduling that's done on the CPU.

    Any "VM expert" will tell you that you have 3 basic bottlenecks in any VM server:
    - RAM (the more the better, mostly not a problem)
    - disks (again, more is better, and the absolute minimum is at least one drive per VM)
    - NICs

    For NICs the basic rule would be: if a VM is loaded with a network-heavy application, then that VM should have a dedicated NIC. CPU utilization drops heavily, and NIC utilization is higher.

    Having one 10Gbit NIC shared among 8 VMs which are all bottlenecked by NICs means you have your 35% CPU load. With one NIC dedicated to each VM you'd have CPU load near zero at file-copy loads (the NIC has a hardware scheduler, and the disk controller has the same for HDDs).

    Like I've said, maybe I've overlooked something in the article, but it seems to me your tests are based on wrong assumptions. Besides, if you've got 8 file servers as VMs, you've got unnecessary overhead as well; it's one application (file serving), so there's no need to virtualize it into 8 VMs on the same hardware.

    In conclusion, VMs are all about planning, so I believe your test took the wrong approach.
  • JohanAnandtech - Tuesday, March 9, 2010 - link

    "a dedicated NIC for every VM"

    That might be the right approach when you have a few VMs on the server, but it does not seem reasonable when you have tens of VMs running. What do you mean by dedicating? Pass-through? Port grouping? Only pass-through has near-zero CPU load AFAIK, and I don't see many scenarios where pass-through is handy.

    Also, if you use dedicated NICs for network-intensive apps, that means you cannot use that bandwidth for the occasional spike in another "non NIC privileged" VM.

    It might not be feasible at all if you use DRS or Live migration.

    The whole point of VMDq is to offer the necessary bandwidth to the VM that needs it (for example, give one VM 5 Gbit/s, one VM 1 Gbit/s and the others only 1 Mbit/s) and to keep the layer 2 routing overhead mostly on the NIC. It seems to me that the planning you promote is very inflexible, and I can see several scenarios where dedicated NICs will perform worse than one big pipe which can be load balanced across the different VMs.
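    To illustrate that trade-off with a toy calculation (the per-VM demand figures below are hypothetical, chosen only to mimic the "one VM spikes while the others idle" case):

        # Toy comparison of the two approaches being debated: eight VMs with
        # bursty demand served either by (a) one dedicated 1 Gbit/s NIC per VM
        # or (b) one shared 10 Gbit/s pipe any VM may use. Demand figures are
        # hypothetical; the point is only that a dedicated NIC caps a single
        # VM at 1 Gbit/s even while the rest of the capacity sits idle.
        demand_gbit = [5.0, 1.0, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001]

        # (a) dedicated 1 Gbit/s NIC per VM: each VM is capped at its own link
        dedicated = [min(d, 1.0) for d in demand_gbit]

        # (b) shared 10 Gbit/s pipe: serve everything if it fits, otherwise
        # squeeze all VMs proportionally (a crude stand-in for load balancing)
        capacity = 10.0
        scale = min(1.0, capacity / sum(demand_gbit))
        shared = [d * scale for d in demand_gbit]

        print("Dedicated 1G NICs: %.2f Gbit/s delivered in total" % sum(dedicated))
        print("Shared 10G pipe:   %.2f Gbit/s delivered in total" % sum(shared))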





  • LuxZg - Wednesday, March 10, 2010 - link

    Yes, I meant "dedicated" as "pass-through".

    Yes, there are several scenarios where "one big" is better than several small ones, but think about whether a 35% CPU load (and that's 35% of a very expensive CPU) is worth the sacrifice just to have a reserve for a few occasional spikes.

    I do agree that putting several VMs on one NIC is ok, but that's for applications that aren't loaded with heavy network transfers. VM load balancing should be done for example like this (just a stupid example, don't hold onto it too hard):
    - you have file server as one VM
    - you have mail server on second VM
    - you have some CPU-heavy app on separate VM

    The file server is heavy on networking and the disk subsystem, but puts almost no load on RAM/CPU. The mail server is dependent on several variables (anti-spam, antivirus, number of mailboxes & incoming mail, etc.), so it can be a light-to-heavy load for all subsystems. For this example let's say it's a lighter kind of load. Let's say this hardware machine has 2 NICs. You've got a few CPUs with multiple cores, and plenty of disk/RAM. So what's the right thing to do? Add a CPU-intensive VM, so that the CPU isn't idle too much. You dedicate one NIC to the file server, and you let the mail server share a NIC with the CPU-intensive VM. That way the file server has enough bandwidth without taxing the CPU to 35% because of stupid virtual routing of great amounts of network packets, the CPU is left mostly free for the CPU-intensive VM, and the mail server happily lives in between the two, as it will be satisfied with the leftover CPU and networking..

    Now scale that to 20-30 VMs, and all you need is 10 NICs. For VMs that aren't network dependent you put them on "shared NICs", and for network-intensive apps you give those VMs a dedicated NIC.

    Just remember - 35% of a multi-socket & multi-core server is a huge expense when you can do it on a dedicated NIC. A NIC is, was, and will be much more cost effective at network packet scheduling than a CPU.. Why pay several thousand $$$ for a CPU if all you need is another NIC?
  • LuxZg - Tuesday, March 9, 2010 - link

    I hate my own typos.. 2nd sentence.. "dedicated NIC for every VM" .. not "for every card".. probably there are more nonsense.. I'm in a hurry, sorry ppl!
  • anakha32 - Tuesday, March 9, 2010 - link

    All the new 10G kit appears to be coming with SFP+ connectors. They can be used either with a transceiver for optical, or a pre-terminated copper cable (known as 'SFP+ Direct Attach').
    CX4 seems to be deprecated as the cables are quite big and cumbersome.
  • zypresse - Tuesday, March 9, 2010 - link

    I've seen some mini-clusters (3-10 machines) lately with Ethernet interconnects. Although I doubt that this is the best solution, it would be nice to know how 10G Ethernet actually performs in that area.
  • Calin - Tuesday, March 9, 2010 - link

    I don't find a power use of <10W for a 10Gb link such a bad compromise over 0.5W per 1Gb Ethernet link (assuming that you can use that 10Gb link at close to maximum capacity). If nothing else, you're trading two 4-port 1Gb network cards for one 10Gb card.
  • MGSsancho - Tuesday, March 9, 2010 - link

    Sun's 40Gb adapters are not terribly expensive (starting at $1,500); apparently they support 8 virtual lanes? Mellanox provides Sun their silicon. I went to their site and they do have other silicon/cards that explicitly state they support Virtual Protocol Interconnect; I'm curious if this is the same thing. I know you stated that the need really isn't there, but it would be interesting to see if you can ask for testing samples or look into the viability of InfiniBand. Looking at their partners page, they provide the silicon for Xsigo, as a previous poster stated. Again, it would be nice to see whether 40Gb InfiniBand, with and without VPI technologies, is superior to 10Gb Ethernet with the acceleration you showed us today. For SANs, anything to lower latency for iSCSI is desired. Perhaps spending a little for reduced latency on the network layer makes it worth the extra price for faster transactions? So many possibilities! Thank you for all the insightful research you have provided us!
