Mac OS X: beautiful but...

The Mac OS X (Server) operating system can't be described easily. Apple:

"Mac OS X Server starts with Darwin, the same open source foundation used in Mac OS X, Apple's operating system for desktop and mobile computers. Darwin is built around the Mach 3.0 microkernel, which provides features critical to server operations, such as fine-grained multi-threading, symmetric multiprocessing (SMP), protected memory, a unified buffer cache (UBC), 64-bit kernel services and system notifications. Darwin also includes the latest innovations from the open source BSD community, particularly the FreeBSD development community."

While there are many very good ideas in Mac OS X, it reminds me a lot of fusion cooking, where you make a hotch-potch of very different ingredients. Let me explain.


Hexley the platypus, the Darwin mascot

Darwin is indeed the open Source project around the Mach kernel 3.0. This operating system is based around the idea of a microkernel, a kernel that only contains the essence of the operating system, such as protected memory, fine-grained multithreading and symmetric multiprocessing support. This in contrast to "monolithic" operating systems, which have all of the code in a single large kernel.

Everything else is located in smaller programs, servers, which communicate with each other via ports and an IPC (Inter Process Communication) system. Explaining this in detail is beyond the scope of this article (read more here). But in a nutshell, a Mach microkernel should be more elegant, easier to debug and better at keeping different processes from writing in eachother's protected memory areas than our typical "monolithic" operating systems such as Linux and Windows NT/XP/2000. The Mach microkernel was believed to be the future of all operating systems.

However, you must know that applications (in the userspace) need, of course, access to the services of the kernel. In Unix, this is done with a Syscall, and it results in two context switches (the CPU has to swap out one process for another): from the application to the kernel and back.

The relatively complicated memory management (especially if the server process runs in user mode instead of kernel) and IPC messaging makes a call to the Mach kernel a lot slower, up to 6 times slower than the monolithic ones!

It also must be remarked that, for example, Linux is not completely a monolithic OS. You can choose whether you like to incorporate a driver in the kernel (faster, but more complex) or in userspace (slower, but the kernel remains slimmer).

Now, while Mac OS X is based on Mach 3, it is still a monolithic OS. The Mach microkernel is fused into a traditional FreeBSD "system call" interface. In fact, Darwin is a complete FreeBSD 4.4 alike UNIX and thus monolithic kernel, derived from the original 4.4BSD-Lite2 Open Source distribution.

The current Mac OS X has evolved a bit and consists of a FreeBSD 5.0 kernel (with a Mach 3 multithreaded microkernel inside) with a proprietary, but superb graphical user interface (GUI) called Aqua.

Performance problems

As the mach kernel is hidden away deep in the FreeBSD kernel, Mach (kernel) threads are only available for kernel level programs, not applications such as MySQL. Applications can make use of a POSIX thread (a " pthread"), a wrapper around a Mach thread.


Mac OS X thread layering hierarchy (Courtesy: Apple)

This means that applications use slower user-level threads like in FreeBSD and not fast kernel threads like in Linux. It seems that FreeBSD 5.x has somewhat solved the performance problems that were typical for user-level threads, but we are not sure if Mac OS X has been able to take advantage of this.

In order to maintain binary compatibility, Apple might not have been able to implement some of the performance improvements found in the newer BSD kernels.

Another problem is the way threads could/can get access to the kernel. In the early versions of Mac OS X, only one thread could lock onto the kernel at once. This doesn't mean only one thread can run, but that only one thread could access the kernel at a given time. So, a rendering calculation (no kernel interaction) together with a network access (kernel access) could run well. But many threads demanding access to the memory or network subsystem would result in one thread getting access, and all others waiting.

This "kernel locked bottleneck" situation has improved in Tiger, but kernel locking is still very coarse. So, while there is a very fine grained multi-threading system (The Mach kernel) inside that monolithic kernel, it is not available to the outside world.

So, is Mac OS X the real reason why MySQL and Apache run so slow on the Mac Platform? Let us find out... with benchmarks, of course!

The G5 as Server CPU Mac OS X versus Linux
Comments Locked

116 Comments

View All Comments

  • Rosyna - Friday, June 3, 2005 - link

    Actually, for better or worse the GCC Apple includes is being used for most Mac OS X software. OS X itself was compiled with it.
  • elvisizer - Friday, June 3, 2005 - link

    rosyna's right.
    i'm just not sure if there IS anyway to do the kind of comparison you seem to've been shooting for (pure competition between the chips with as little else affecting the outcome as possible). you could use the 'special' compilers on each platform, but those aren't used for compiling most of the binaries you buy at compusa.
  • elvisizer - Friday, June 3, 2005 - link

    why didn't you run some tests with YD linux on the g5?!?!?!?!?!?!? you could've answered the questions you posed yourself!!!!!
    argh.
    and you definitly should've included after effects. "we don't have access to that software" what the heck is THAT about?? you can get your hands on a dual 3.6 xeon machine, a dual 2.5 gr, and adual 2.7 g5, and you can't buy a freaking piece of adobe software at retail?!?!?!?!?!
    some seroiusly weird decisions being made here.
    other than that, the article was ok. re-confirmed suspicions i've had for awhile about OS X server handling large numbers of thread. My OS X servers ALWAYS tank hard with lots of open sessions, so i keep them around only for emergencies. They are so very easy to admin, tho, they're still attractive to me for small workgroup sizes. like last month, I had to support 8 people working on a daily magazine being published at e3. litterally inside the convention center. os x server was perfect in that situation.
  • Rosyna - Friday, June 3, 2005 - link

    There appears to be either a typo or a horrible flaw in the test. It says you used GCC 3.3.3 but OS X comes with gcc version 3.3 20030304 (Apple Computer, Inc. build 1809).

    If you did use GCC 3.3.3 then you were giving the PPC a severe disadvantage as the stock GCC has almost no optimizations for PPC while it has many for x86.
  • Eug - Friday, June 3, 2005 - link

    "But do you really think that Oracle would migrate to this if it wasn't on a par?"

    [Eug dons computer geek wannabe hat]

    There are lots of reasons to migrate, and I'm sure absolute performance isn't always the primary concern. We won't know the real performance until we actually see tests on Oracle/Sybase.

    My uneducated guess is that they won't be anywhere near as bad as the artifical server benches might suggest, but OTOH, I could easily see Linux on G5 significantly besting OS X on G5 for this type of stuff.

    ie. The most interesting test I'd like to see is Oracle on the G5, with both OS X and Linux, compared to Xeon and Opteron with Linux.

    And yeah, it would be interesting to see what gcc 4 brings to the table, since 3.3 provides no autovectorization at all. It would also be interesting to see how xlc/xlf does, although that doesn't provide autovectorization either. Where are the autovectorizing IBM compilers that were supposed to come out???
  • melgross - Friday, June 3, 2005 - link

    As none of us has actual experiance with this, none of us can say yes or no.

    But do you really think that Oracle would migrate to this if it wasn't on a par? After all Ellison isn't on Apple's board anymore, so there's nothing to prove there.

    I also remember that going back to Apple's G4 XServes, their performance was better than the x86 crowd, and the Sun servers as well. Those tests were on several sites. Been a while though.
  • JohanAnandtech - Friday, June 3, 2005 - link

    querymc: Yes, you are right. The --noaltivec flag and the comment that altivec was enabled by default in the gcc 3.3.3 compiler docs made me believe there is autovectorization (or at least "scalarisation"). As I wrote in the article we used -O2 and and then tried a bucket load of other options like --fast-math --mtune=G5 and others I don't remember anymore but it didn't make any big difference.
  • querymc - Friday, June 3, 2005 - link

    The SSE support would probably also be improved by using GCC 4 with autovectorization, I should note. There's a reason it does poorly in GCC 3. :)
  • querymc - Friday, June 3, 2005 - link

    Johan: I didn't see this the first time through, but you need to make a slight clarification to the floating point stuff. There is no autovectorization capability in GCC 3.3. None. There is limited support for SSE, but that is not quite the same, as SSE isn't SIMD to the extent that AltiVec is. If you want to use the AltiVec unit in otherwise unaltered benchmarks, you don't have a choice other than GCC 4 (and you need to pass a special flag to turn it on).

    Also, what compiler flags did you pass on each platform? For example, did you use --fast-math?
  • JohanAnandtech - Friday, June 3, 2005 - link

    Melgross: Apple told me that most xserves in europe are sold as "do it all". A little webserver (apache), a database sybase, samba and so on. They didn't have any client who had heavy traffic on the webserver, so nobody complains.

    Sybase/oracle seems to have done quite a bit of work to get good performance out of Mac OS-x, so it must be interesting to see how they managed to solve those problems. But I am sceptical that Oracle/Sybase runs faster on Mac OS x than on Linux.

Log in

Don't have an account? Sign up now