Hardware Virtualization: the Nuts and Bolts
by Johan De Gelas on March 17, 2008 3:00 AM EST
Much has been written about kernels, but they remain one of the most confusing subjects. Some publications give the impression that the kernel is some kind of "overlord" process that is always watching in the background. This is wrong of course, because it would mean that modern multitasking operating systems could not work on a single-threaded, single-core CPU. When only one thread can be active at a given time, how can the OS keep control?
A kernel is just another process that gets time slices from the multitasking CPU. The difference from other processes is that it has privileged access to CPU instructions that other processes don't have. Therefore, "normal" (user) processes have to switch to the kernel to perform a privileged task such as accessing the hardware. If they try to perform such a task themselves, the CPU raises an exception and the kernel takes over anyway. At the same time, the kernel's scheduler uses the timer interrupt to intervene from time to time, making sure that no process keeps the CPU to itself for too long (preemption). You could also say that the CPU is forced to load the OS scheduler from time to time.
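The preemption idea described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the names `process` and `run` are hypothetical, each `yield` stands for one unit of CPU work, and a loop counter stands in for the hardware timer interrupt.

```python
from collections import deque

def process(name, work_units):
    """A toy user process: each yield represents one unit of CPU work."""
    for i in range(work_units):
        yield f"{name}:{i}"

def run(processes, time_slice):
    """Round-robin scheduling with a simulated timer interrupt.

    After `time_slice` units of work the running process is suspended
    and moved to the back of the ready queue, whether it likes it or not.
    """
    ready = deque(processes)
    trace = []
    while ready:
        proc = ready.popleft()
        for _ in range(time_slice):
            try:
                trace.append(next(proc))
            except StopIteration:
                break  # the process finished before its slice expired
        else:
            # Slice used up: the "timer interrupt" hands control back
            # to the scheduler, which requeues the process.
            ready.append(proc)
    return trace

trace = run([process("A", 3), process("B", 2)], time_slice=2)
print(trace)
```

Process A needs three units of work but is interrupted after two, so B gets the CPU before A can finish. No "overlord" runs in parallel: the scheduler only executes when the timer (here, the slice counter) forces control back to it.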
A system call is thus the result of a user application requesting a service from the kernel. x86 provides a very low latency way to get system calls done: SYSENTER (or SYSCALL) and SYSEXIT. A system call gives the Virtual Machine Monitor (VMM) quite a bit of extra work, especially with Binary Translation (BT). As we have stated before, software virtualization techniques (such as BT) place the (32-bit) operating system at a slightly less privileged ring than normal (1 instead of 0). The problem is that a SYSENTER (a request for the service of the kernel) is sent to a page with privilege 0. It expects to find the operating system there, but it arrives in the VMM. So the VMM has to emulate every system call, translate the code, and then hand over control to the translated kernel code, which runs in ring 1.
A system call is a lot more complex when it happens on a virtualized machine.
When the binary translated guest OS code is done, it will use a SYSEXIT to return to the user application. However, the guest OS is running in ring 1 and doesn't have the necessary privileges to perform SYSEXIT, so the CPU faults to ring 0 and the VMM has to emulate what the guest OS should have done. It is clear that system calls cause a lot of overhead. A system call on a virtualized machine costs roughly 10 times more than on a native machine. Engineers at VMware measured on a 3.8 GHz Pentium 4:
- A native system call takes 242 cycles
- A binary translated one with the 32-bit guest OS running in ring 1 takes 2308 cycles
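The two round trips described above can be written down as a small model to make the extra detours through the VMM visible. This is a simplified sketch, not a description of any particular VMM: `syscall_path` and `ring_transitions` are hypothetical helpers, and the user application is assumed to run in ring 3, as is usual on x86.

```python
def syscall_path(virtualized_bt):
    """Return the (ring, actor, action) steps of one system call round trip."""
    if not virtualized_bt:
        return [
            (3, "application", "executes SYSENTER"),
            (0, "kernel", "services the call and executes SYSEXIT"),
            (3, "application", "resumes"),
        ]
    return [
        (3, "application", "executes SYSENTER"),
        (0, "VMM", "receives the call and emulates the entry"),
        (1, "translated guest kernel", "services the call, attempts SYSEXIT"),
        (0, "VMM", "handles the fault and emulates SYSEXIT"),
        (3, "application", "resumes"),
    ]

def ring_transitions(path):
    """Count how many times the privilege ring changes along the path."""
    return sum(1 for a, b in zip(path, path[1:]) if a[0] != b[0])

native = ring_transitions(syscall_path(virtualized_bt=False))
virtualized = ring_transitions(syscall_path(virtualized_bt=True))

for ring, actor, action in syscall_path(virtualized_bt=True):
    print(f"ring {ring}: {actor} {action}")
print(f"ring transitions: native={native}, BT={virtualized}")
```

The model only counts ring changes. It says nothing about how expensive each step is, so the doubling of transitions should not be read as a prediction of the measured cycle counts; the emulation and translation work inside the VMM accounts for much of the difference.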
If you have a few of those virtualized machines running, system calls are suddenly much more than the background noise they were on a modern OS running on a native machine.
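To put the measured cycle counts in perspective, the arithmetic can be worked through in a few lines. The cycle counts and the 3.8 GHz clock come from the measurements quoted above; the number of virtual machines and the system call rate per machine are purely hypothetical values chosen for illustration.

```python
CLOCK_HZ = 3.8e9       # 3.8 GHz Pentium 4, from the measurement above
NATIVE_CYCLES = 242    # native system call
BT_CYCLES = 2308       # binary translated, guest OS in ring 1

def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock speed."""
    return cycles / clock_hz * 1e9

def cpu_share(cycles_per_call, calls_per_second, clock_hz=CLOCK_HZ):
    """Fraction of one core consumed by system calls at a given rate."""
    return cycles_per_call * calls_per_second / clock_hz

slowdown = BT_CYCLES / NATIVE_CYCLES

# Hypothetical load: 4 virtual machines, 100,000 system calls per second each.
vms = 4
calls_per_vm = 100_000
total_calls = vms * calls_per_vm

native_share = cpu_share(NATIVE_CYCLES, total_calls)
bt_share = cpu_share(BT_CYCLES, total_calls)

print(f"native: {cycles_to_ns(NATIVE_CYCLES):.1f} ns per call")
print(f"BT:     {cycles_to_ns(BT_CYCLES):.1f} ns per call")
print(f"slowdown: {slowdown:.1f}x")
print(f"CPU share at {total_calls} calls/s: "
      f"native {native_share:.1%}, BT {bt_share:.1%}")
```

Under these assumed rates, the same workload that would take a few percent of one core natively takes roughly a quarter of it under binary translation. The exact share depends entirely on the assumed call rate, but the ratio between the two cases follows directly from the measured cycle counts.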