10Gbit Ethernet: Killing Another Bottleneck?by Johan De Gelas on March 8, 2010 12:00 PM EST
- Posted in
- IT Computing
Network Performance in ESX 4.0 Update 1
We used up to eight VMs, and each was assigned an “endpoint” in the Ixia IxChariot network test. This way we could measure the total network throughput that is possible to achieve with one, four or eight VMs. Making use of ESX NetQueue, the cards should be able to leverage their separate queues and the hardware Layer 2 “switch”.
First, we test with NetQueue disabled. The cards will behave like a card with only one Rx/Tx queue. To make the comparison more interesting, we added two dual-port gigabit NICs into the benchmark mix. Teamed NICs are currently by far the most cost effective way to increase network bandwidth.
The 10G cards show their potential. With four VMs, they are able to achieve 5 to 6Gbit/s. There is clearly a queue bottleneck: both 10G cards perform worse with eight VMs. Notice also that 4x1Gbit does very well. This combination has more queues and can cope well with the different network streams. Out of a maximum line speed of 4Gbit/s, we achieve almost 3.8Gbit/s with four and eight VMs. Now let's look at CPU load.
Once you need more than 1Gbit/s, you should pay attention to the CPU load. Four gigabit ports or one 10G port require 25~35% utilization of eight 2.9GHz Opterons cores. That means that you would need two or three cores dedicated just to keeping the network pipes filled. Let us see if NetQueue can do some magic.
The performance of the Neterion card improves a bit, but it's not really impressive (+8% in the best case). The Intel 82598 EB chip on the Supermicro 10G NIC is now achieving 9.5Gbit/s with eight VMs, very close to the theoretical maximum. The 4x1Gbit/s NIC numbers were repeated in this graph for reference (no NetQueue was available).
So how much CPU power did these huge network streams absorb?
The Neterion driver does not seem to be optimized for ESX 4. Using NetQueue should lower CPU load, not increase it. The Supermicro/Intel 10G combination shows the way. It delivers twice as much bandwidth at half the CPU load compared to the two dual-port gigabit NICs.