10G Ethernet: More Than a Big Pipe
by Johan De Gelas on November 24, 2010 2:34 PM EST
Solving the Virtualization I/O Puzzle
Step One: IOMMU, VT-d
The solution for the high CPU load, higher latency, and lower throughput comes in three steps. The first step was to bypass the hypervisor and assign a NIC directly to the network-intensive VM. This approach has several advantages. The VM has direct access to a native device driver, and as such can use every hardware acceleration feature that is available. The NIC is also not shared, so all of its queues and buffers are available to a single VM.
However, even though the hypervisor lets the VM access the native driver directly, a virtual machine cannot bypass the hypervisor’s memory management. The guest OS inside that VM does not see the real physical memory, but a virtual memory map that is managed by the hypervisor. So when the guest OS hands addresses to the driver for DMA, it hands out these virtual addresses instead of the real physical ones the device expects (the white arrow).
Intel solved this with VT-d, AMD with the “new” (*) IOMMU. The I/O hub translates the virtual or “guest OS fake physical” addresses (purple) into real physical addresses (blue). This new IOMMU also isolates the I/O devices from each other by giving each device access to only its own subset of physical memory.
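To make the remapping concrete, here is a minimal C sketch of what the translation step does. It is purely illustrative: the flat table, the names (iommu_table, iommu_translate), and the 4KB page size are assumptions made for readability, not the actual multi-level structures that VT-d or the AMD IOMMU use.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12                 /* illustrative: assume 4KB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* One remapping table per device: guest-physical page -> host-physical page.
 * A real IOMMU walks multi-level tables in memory; a flat array keeps the
 * idea visible. */
typedef struct {
    uint64_t *gpa_to_hpa;   /* indexed by guest page number                */
    size_t    num_pages;    /* size of the guest's "physical" address space */
} iommu_table;

/* Translate a DMA address coming from the device. Addresses outside the
 * table are rejected: that is the isolation half of the IOMMU's job. */
int iommu_translate(const iommu_table *t, uint64_t guest_pa, uint64_t *host_pa)
{
    uint64_t page   = guest_pa >> PAGE_SHIFT;
    uint64_t offset = guest_pa & (PAGE_SIZE - 1);

    if (page >= t->num_pages)
        return -1;  /* DMA fault: device touched memory it doesn't own */

    *host_pa = (t->gpa_to_hpa[page] << PAGE_SHIFT) | offset;
    return 0;
}
```

The bounds check is where the isolation comes from: a device that DMAs outside the range mapped for it simply faults instead of scribbling over another VM’s memory.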
Very few virtualized servers use this feature, as it makes virtual machine migration impossible: instead of decoupling the virtual machine from the underlying hardware, direct assignment firmly ties the VM to it. So AMD’s IOMMU and Intel’s VT-d are not that useful on their own. They are just one of the three pieces of the I/O virtualization puzzle.
(*) There is also an "old IOMMU", the Graphics Address Remapping Table, which performed address translation to let the graphics card access main memory.
Step Two: Multiple Queues
The next step was making the NIC a lot more powerful. Instead of letting the hypervisor sort all the received packets and send them off to the right VM, the NIC becomes a complete hardware switch that sorts all packets into multiple queues, one for each VM (a sketch after the list below illustrates the idea). This has several advantages.
Fewer interrupts and lower CPU load. If you let the hypervisor handle the packet switching, CPU 0 (which in most cases is the one doing the hypervisor tasks) is interrupted and has to examine the received packet to determine the destination VM. Then the CPU associated with that destination VM has to be interrupted as well. With a hardware switch in the NIC, the packet is immediately sorted into the right queue, and only the right CPU is interrupted to come and get it.
Lower latency. A single queue that receives and transmits packets for multiple VMs can get overwhelmed and drop packets. By giving each VM its own queue, throughput is higher and latency is lower.
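As a rough sketch of what that hardware switch does, consider the following C fragment. Everything in it (vm_queue, nic_demux, matching on the destination MAC alone) is a simplification invented for illustration; a real multi-queue NIC does this in silicon and can classify on more than just the MAC address.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define NUM_VMS    4
#define QUEUE_LEN  256

/* One receive queue per VM, tied to the MAC address of that VM's virtual
 * NIC and to the CPU that runs the VM. */
struct vm_queue {
    uint8_t     mac[6];             /* destination MAC this queue owns  */
    int         cpu;                /* CPU to interrupt for this queue  */
    const void *pkts[QUEUE_LEN];    /* ring of received frame pointers  */
    unsigned    tail;               /* producer index; the VM consumes  */
};

static struct vm_queue queues[NUM_VMS];

/* Called for every incoming frame: find the owning VM's queue by the
 * destination MAC (the first 6 bytes of an Ethernet frame), enqueue,
 * and interrupt only that VM's CPU. The hypervisor never sees the frame. */
static void nic_demux(const void *frame, size_t len)
{
    const uint8_t *dst_mac = frame;

    for (int i = 0; i < NUM_VMS; i++) {
        struct vm_queue *q = &queues[i];
        if (memcmp(dst_mac, q->mac, 6) == 0) {
            q->pkts[q->tail % QUEUE_LEN] = frame;
            q->tail++;
            printf("frame (%zu bytes) -> queue %d, interrupt CPU %d\n",
                   len, i, q->cpu);
            return;
        }
    }
    /* No match: a real NIC would fall back to a default queue here. */
}
```

The point is that the hypervisor’s CPU never appears in the receive path: the NIC picks both the queue and the interrupt target by itself.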
Although Virtual Machine Device Queues (VMDq) solves a lot of problems, there is still some CPU overhead left: every time the CPU of a certain VM is interrupted, the hypervisor has to copy the received data from the hypervisor’s memory space into the VM’s memory space.
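That leftover copy looks roughly like this in C (again an illustrative sketch, not actual hypervisor code; the buffer layout and names are made up):

```c
#include <string.h>
#include <stdint.h>

/* Illustrative only: with VMDq the NIC still DMAs the frame into a buffer
 * owned by the hypervisor, so on every receive interrupt the hypervisor
 * must copy the payload into the guest's own memory before the VM sees it. */
struct rx_buffer {
    uint8_t data[2048];
    size_t  len;
};

static void deliver_to_guest(const struct rx_buffer *hv_buf,
                             uint8_t *guest_ring, size_t guest_ring_size)
{
    /* The copy that remains: hypervisor space -> guest memory space. */
    if (hv_buf->len <= guest_ring_size)
        memcpy(guest_ring, hv_buf->data, hv_buf->len);
    /* ...then inject an interrupt into the guest (not shown). */
}
```

Removing that per-packet copy is what the third and final piece of the puzzle addresses.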