The Final Piece of the Puzzle: SR-IOV

The final step is to add a few buffers and Rx/Tx descriptors to each queue of your multi-queued device, and a single NIC can pretend to be a collection of tens of "small" NICs. That is exactly what the PCI-SIG did, and they call each of those small NICs a virtual function (VF). According to the PCI-SIG SR-IOV specification, you can have up to 256 (!) virtual functions per NIC. (Note: the SR-IOV specification is not limited to NICs; other I/O devices can be SR-IOV capable too.)


Courtesy of the excellent YouTube video "Intel SR-IOV"
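
To make the virtual function concept a bit more concrete, here is a minimal sketch (ours, not from the article) of how a Linux host exposes a NIC's VFs through the standard sriov_* sysfs attributes; the PCI address is a placeholder, and the attributes assume a reasonably recent kernel and an SR-IOV capable driver.

```python
from pathlib import Path

PF_ADDR = "0000:03:00.0"  # placeholder: PCI address of the physical function
dev = Path("/sys/bus/pci/devices") / PF_ADDR

def read_attr(name: str) -> int:
    """Read an integer sysfs attribute exposed by the physical function."""
    return int((dev / name).read_text().strip())

total_vfs = read_attr("sriov_totalvfs")   # VFs the NIC/firmware advertises
active_vfs = read_attr("sriov_numvfs")    # VFs currently enabled
print(f"{PF_ADDR}: {active_vfs} of {total_vfs} virtual functions enabled")

# Enabling, say, 8 VFs requires root and an SR-IOV capable driver:
# (dev / "sriov_numvfs").write_text("8")
```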

Make sure there is a chipset with an IOMMU (Intel VT-d) in the system, and the end result is that each of those virtual functions can DMA packets in and out without any help from the hypervisor. In other words, the CPU no longer has to copy packets from the memory space of the NIC to the memory space of the VM. The VT-d/IOMMU-capable chipset ensures that the DMA transfers of the different virtual functions do not interfere with each other. The beauty is that the VMs connect to these virtual functions through a standard paravirtualized driver (such as VMXNET in VMware), and as a result you should be able to migrate VMs without any trouble.
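
As a rough illustration (our sketch, not part of the original article), this is one way to verify on a Linux host that the IOMMU provided by the chipset is actually active; without it, the virtual functions cannot safely DMA straight into guest memory.

```python
from pathlib import Path

# When VT-d/AMD-Vi is enabled in the BIOS and the kernel, every PCI device
# ends up in an IOMMU group under this sysfs directory.
groups_dir = Path("/sys/kernel/iommu_groups")
groups = list(groups_dir.iterdir()) if groups_dir.exists() else []

if groups:
    print(f"IOMMU active: {len(groups)} IOMMU groups found")
else:
    print("No IOMMU groups found: enable VT-d/AMD-Vi in the BIOS and boot "
          "with intel_iommu=on (or amd_iommu=on) on the kernel command line")
```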

There you have it: all the puzzle pieces are in place. Multiple queues, virtual-to-physical address translation for DMA transfers, and a multi-headed NIC offer you higher throughput, lower latency, and lower CPU overhead than emulated hardware. At the same time, they retain the two advantages that made virtualized emulated hardware so popular: the ability to share one hardware device across several VMs and the ability to decouple the virtual machine from the underlying hardware.

SR-IOV Support

Of course, this is all theory until all the software and hardware layers work together to support it. You need a VT-d or IOMMU-capable chipset, the motherboard's BIOS has to be adapted to recognize all those virtual functions, and each virtual function must get memory-mapped I/O space like any other PCI device. A hypervisor that supports SR-IOV is also necessary. Last but not least, the NIC vendor has to provide you with an SR-IOV capable driver for the operating system and hypervisor of your choice.
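
Once all of those layers cooperate, each enabled VF shows up as a regular PCI device with its own memory-mapped I/O space and its own driver binding. The following hedged sketch (assuming a Linux host and a placeholder PCI address) inspects the virtfn* links the kernel creates under the physical function to show this.

```python
from pathlib import Path

PF_ADDR = "0000:03:00.0"  # placeholder: PCI address of the physical function
dev = Path("/sys/bus/pci/devices") / PF_ADDR

# Each virtfn* symlink points at one virtual function's own PCI device node.
for vf_link in sorted(dev.glob("virtfn*")):
    vf_dev = vf_link.resolve()                # e.g. 0000:03:10.0
    driver_link = vf_dev / "driver"
    driver = driver_link.resolve().name if driver_link.exists() else "(none)"
    print(f"{vf_link.name}: PCI device {vf_dev.name}, driver: {driver}")
```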

With some help from mighty Intel, the open source hypervisors (Xen, KVM) and their commercial derivatives (Red Hat, Citrix) were first to market with SR-IOV. At the end of 2009, both Xen and KVM had support for SR-IOV, more specifically for Intel's 82599 10 Gigabit Ethernet controller, which can offer up to 64 VFs. Citrix announced support for SR-IOV in XenServer 5.6, so the only ones missing in action are VMware's ESX and Microsoft's Hyper-V.

Comments

  • Kahlow - Friday, November 26, 2010 - link

    Great article! The argument between fiber and 10GbE is interesting, but from what I have seen it is so application and workload dependent that you would need a 100-page review to figure out which medium is better for which workload.
    Also, in most cases your disk arrays are the real bottleneck, and maxing out your 10GbE or your FC isn't the issue.

    It is good to have a reference point though and to see what 10gig translates to under testing.

    Thanks for the review,
  • JohanAnandtech - Friday, November 26, 2010 - link

    Thanks.

    I agree that it highly depends on the workload. However, there are lots and lots of smaller setups out there that are now using unnecessarily complicated and expensive setups (several physically separated GbE and FC networks). One of the objectives was to show that there is an alternative. As many readers have confirmed, a dual 10GbE setup can be a great solution if you're not running some massive databases.
  • pablo906 - Friday, November 26, 2010 - link

    It's free and you can get it up and running in no time. It's gaining a tremendous amount of users because of the recent Virtual Desktop licensing program Citrix pushed. You could double your XenApp (MetaFrame Presentation Server) license count and upgrade them to XenDesktop for a very low price, cheaper than buying additional XenApp licenses. I know of at least 10 very large organizations that are testing XenDesktop and preparing rollouts right now.

    What gives? VMware is not the only hypervisor out there.
  • wilber67 - Sunday, November 28, 2010 - link

    Am I missing something in some of the comments?
    Many are discussing FCoE, and I do not believe any of the NICs tested were CNAs, just 10GbE NICs.
    FCoE requires a CNA (Converged Network Adapter). Also, you cannot connect them to a garden-variety 10GbE switch and use FCoE. And don't forget that you cannot route FCoE.
  • gdahlm - Sunday, November 28, 2010 - link

    You can use software initiators on switches which support 802.3x flow control. Many web-managed switches do support 802.3x, as do most 10GbE adapters.

    I am unsure how that would affect performance in a virtualized shared environment, as I believe it pauses at the port level.

    If your workload is not storage or network bound it would work, but I am betting that when you hit that hard knee in your performance curve, things get ugly pretty quickly.
  • DyCeLL - Sunday, December 5, 2010 - link

    Too bad HP Virtual Connect (a blade option) couldn't be tested.
    It splits the 10GbE NICs into a maximum of 8 NICs for the blades, and it can do this for both fiber and Ethernet.
    Check: http://h18004.www1.hp.com/products/blades/virtualc...
  • James5mith - Friday, February 18, 2011 - link

    I still think that 40Gbps InfiniBand is the best solution. It seems to offer by far the best $/Gbps ratio of any of the platforms. Not to mention it can carry pretty much any traffic type you want.
  • saah - Thursday, March 24, 2011 - link

    I loved the article.

    I just reminded myself that VMware published official drivers for ESX 4 recently: http://downloads.vmware.com/d/details/esx4x_intel_...
    The ixgbe version is 3.1.17.1.
    Since the post says the driver "enables support for products based on the Intel 82598 and 82599 10 Gigabit Ethernet Controllers," I would like to see the test redone with an 82599-based card and recent drivers.
    Would it be feasible?
