Mac OS X versus Linux

Lmbench 2.04 provides a suite of micro benchmarks that measure the bottlenecks at the Unix operating system and CPU level. This makes it very suitable for testing the theory that Mac OS X might be the culprit for the terrible server performance of the Apple platform.

Signals allow processes (and thus threads) to interrupt other processes. In a database system such as MySQL 4.x where so many processes/threads (60 in our MySQL screenshot) and many accesses to the kernel must be managed, signal handling is a critical performance factor.

Larry McVoy (SGI) and Carl Staelin (HP):
"Lmbench measures both signal installation and signal dispatching in two separate loops, within the context of one process. It measures signal handling by installing a signal handler and then repeatedly sending itself the signal."

Host           OS          Mhz   null call  null I/O  stat  open clos  slct TCP  sig inst  sig hndl
Xeon 3.06 GHz  Linux 2.4   3056  0.42       0.63      4.47  5.58       18.2      0.68      2.33
G5 2.7 GHz     Darwin 8.1  2700  1.13       1.91      4.64  8.60       21.9      1.67      6.20
Xeon 3.6 GHz   Linux 2.6   3585  0.19       0.25      2.30  2.88       9.00      0.28      2.70
Opteron 850    Linux 2.6   2404  0.08       0.17      2.11  2.69       12.4      0.17      1.14

All numbers are expressed in microseconds, so lower is better. First of all, you can see that kernel 2.6 is in most cases a lot more efficient than kernel 2.4. Secondly, although this is not the most accurate benchmark, the message is clear: Darwin, the foundation of Mac OS X Server, handles signals the slowest. In some cases, Darwin is even several times slower.
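
To make the method that McVoy and Staelin describe more concrete, here is a rough Python sketch of the same two loops: one that repeatedly installs a handler, and one in which the process repeatedly sends itself a signal. This is not lmbench's own C code. The function names are made up for illustration, a POSIX system with SIGUSR1 is assumed, and the Python interpreter adds overhead of its own, so the timings it prints are not comparable to the table above.

```python
import os
import signal
import time


def time_signal_install(n=10000):
    """Average seconds per call to install a SIGUSR1 handler (hypothetical helper)."""
    def handler(signum, frame):
        pass

    start = time.perf_counter()
    for _ in range(n):
        signal.signal(signal.SIGUSR1, handler)
    elapsed = time.perf_counter() - start
    # Restore the default action so the process is left in a clean state.
    signal.signal(signal.SIGUSR1, signal.SIG_DFL)
    return elapsed / n


def time_signal_dispatch(n=10000):
    """Average seconds per self-sent SIGUSR1, plus how many times the handler ran."""
    handled = 0

    def handler(signum, frame):
        nonlocal handled
        handled += 1

    signal.signal(signal.SIGUSR1, handler)
    pid = os.getpid()
    start = time.perf_counter()
    for _ in range(n):
        os.kill(pid, signal.SIGUSR1)  # the process signals itself
    elapsed = time.perf_counter() - start
    signal.signal(signal.SIGUSR1, signal.SIG_DFL)
    return elapsed / n, handled


if __name__ == "__main__":
    install = time_signal_install()
    dispatch, handled = time_signal_dispatch()
    print(f"install:  {install * 1e6:.2f} microseconds per call")
    print(f"dispatch: {dispatch * 1e6:.2f} microseconds per signal ({handled} handled)")
```

Both loops run inside a single process, as in the quoted description, so no second process or scheduler hand-off is involved in the measurement.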

As we increase the level of concurrency in our database test, many threads must be created. Unix process/thread creation is called "forking", as a copy of the calling process is made.

lmbench "fork" measures simple process creation by creating a process and immediately exiting the child process. The parent process waits for the child process to exit. The benchmark is intended to measure the overhead for creating a new thread of control, so it includes the fork and the exit time.
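
The fork measurement described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lmbench source: `time_fork` is a hypothetical name, a POSIX system is assumed, and interpreter overhead means the result will be larger than what the C benchmark reports.

```python
import os
import time


def time_fork(n=200):
    """Average seconds to fork a child that exits at once and to reap it."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping Python-level cleanup.
            os._exit(0)
        # Parent: wait for the child, so the exit time is included.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print(f"fork + exit + wait: {time_fork() * 1e6:.0f} microseconds")
```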

lmbench "exec" measures the time to create a completely new process, while "sh" measures the time to start a new process and run a little program via /bin/sh (complicated new process creation).
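
The difference between the "exec" and "sh" cases can be sketched in the same way. In this hypothetical example, `spawn_and_wait` and `time_spawn` are illustrative names, the `true` utility stands in for lmbench's small test program, and a POSIX system with /bin/sh is assumed. Treat the output as a rough indication only.

```python
import os
import shutil
import time


def spawn_and_wait(argv):
    """Fork, replace the child with argv via exec, and return the wait status."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(argv[0], argv)  # does not return on success
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return status


def time_spawn(argv, n=100):
    """Average seconds per fork + exec + wait of argv."""
    start = time.perf_counter()
    for _ in range(n):
        spawn_and_wait(argv)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    # Assumption: a `true` binary exists; fall back to a common location.
    tiny_program = shutil.which("true") or "/usr/bin/true"
    direct = time_spawn([tiny_program])
    via_shell = time_spawn(["/bin/sh", "-c", tiny_program])
    print(f"exec: {direct * 1e6:.0f} microseconds")
    print(f"sh:   {via_shell * 1e6:.0f} microseconds")
```

The second call pays for starting the shell before the small program even runs, which is why the "sh" column is the most expensive one in the results.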

Host           OS      Mhz   fork proc  exec proc  sh proc
Xeon 3.06 GHz  Linux   3056  163        544        3021
G5 2.7 GHz     Darwin  2700  659        2308       4960
Xeon 3.6 GHz   Linux   3585  158        467        2688
Opteron 850    Linux   2404  125        471        2393

Mac OS X is incredibly slow, between 2 and 5(!) times slower, in creating new threads, as it doesn't use kernel threads and has to go through extra layers (wrappers). No need to continue our search: the G5 might not be the fastest integer CPU on earth, but its database performance is completely crippled by an asthmatic operating system that needs up to 5 times more time to handle and create threads.
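
For readers who want to check the ratios themselves, the short script below divides the Darwin times by each Linux result from the process-creation table. The numbers are copied from that table; the dictionary layout and the `slowdown` helper are just one illustrative way to arrange the arithmetic.

```python
# Times in microseconds, copied from the process-creation table.
TIMES = {
    "Xeon 3.06 GHz / Linux": {"fork": 163, "exec": 544, "sh": 3021},
    "G5 2.7 GHz / Darwin": {"fork": 659, "exec": 2308, "sh": 4960},
    "Xeon 3.6 GHz / Linux": {"fork": 158, "exec": 467, "sh": 2688},
    "Opteron 850 / Linux": {"fork": 125, "exec": 471, "sh": 2393},
}


def slowdown(baseline="G5 2.7 GHz / Darwin"):
    """How many times slower the baseline is than each other host, per test."""
    base = TIMES[baseline]
    return {
        host: {test: base[test] / value for test, value in results.items()}
        for host, results in TIMES.items()
        if host != baseline
    }


if __name__ == "__main__":
    for host, ratios in slowdown().items():
        print(host, {test: round(r, 2) for test, r in ratios.items()})
```

The ratios range from roughly 1.6 for "sh" against the Xeon 3.06 GHz to roughly 5.3 for "fork" against the Opteron 850.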

Mac OS X: beautiful but… Workstation, yes; Server, no.

116 Comments

  • exdeath - Friday, June 3, 2005 - link

    Wow look at a 2.4 GHz Opteron clean house.

    I'd like to see what a 2.6 GHz FX-55 with unregistered memory would do ;) I'll be fair and say keep it at 2.6 GHz stock ;)
  • bersl2 - Friday, June 3, 2005 - link

    Right. GCC 4.0 has an all new optimization framework, including autovectorization:

    http://gcc.gnu.org/projects/tree-ssa/vectorization...
  • Pannenkoek - Friday, June 3, 2005 - link

    It is well known that GCC 3.3 can't vectorize code. However, GCC 4 should be able to, eventually if not already.

    The small cache of the G5 would hamper its server performance I'd reckon, regardless of other factors.
  • jimbailey - Friday, June 3, 2005 - link

    I'm curious if you rebuilt Apache and MySQL from source. Apple has added a significant amount of optimization to gcc and I would love to know if it has been included in this test. I don't doubt the results though. The trade-off for using the Mach micro-kernel is well known.
  • rubikcube - Friday, June 3, 2005 - link

    Johan, I agree that all the facts point to your conclusions being accurate. I would bet all the money in the world that you are correct. However, this hypothesis is easily confirmed by running mysql on a G5 running linux.
  • Olaf van der Spek - Friday, June 3, 2005 - link

    > In Unix, this is done with a Syscall, and it results in two context switches (the CPU has to swap out one process for another)

    Does it?
    As far as I know it doesn't. The page tables don't need to be swapped and neither does the CPU state. The CPU gets access to the kernel-data because it goes to kernel-mode, but that doesn't require a full context switch I think.
  • WileCoyote - Friday, June 3, 2005 - link

    Tough crowd...
  • Eug - Friday, June 3, 2005 - link

    Of the stuff I understand, I agree with your conclusions, but I think it's reasonable to state that running Linux on the G5 yourself would have been the most definitive test.

    Anyways, I like fusion food. :)
  • cHodAXUK - Friday, June 3, 2005 - link

    Great article, very educational read and it was very interesting to see what is holding the G5 back. IBM/Apple really need to address these issues, people are paying a lot of money for G5s that are delivering nowhere near the level of performance that they *theoretically* should be.
  • Netopia - Friday, June 3, 2005 - link

    WOW... great article.

    I too would like to see Yellow Dog (Or FC4) loaded on the G5 for a true head-to-head. I hope you have the time with the box to get 'er done!

    Joe
