Mac OS X Achilles Heel

It is clear that profiling MySQL on the kernel is the only way that we are going to be able to pin-point why exactly MySQL is so slow on Mac OS X. So, why did I state that I believe the threading engine in Mac OS X to be rather slow? Well, I admit that I should have made it more clear in the article that I didn't have rock-solid evidence. However, my suspicion is based on more than speculation.

First of all, notice that the Mac OS X performance is decent with a concurrency of one, or one simulated user. It still performs well when a second user is simulated, as the second CPU can kick in and push performance higher. Let us check the scaling, by putting the numbers of our MySQL graphic into a table.

Concurrency Dual G5 2,5 GHz Tiger Scaling (Concurrency one=100%) Dual G5 2,5 GHz Linux 2.6 Scaling (Concurrency one=100%) Dual Opteron 2.4Ghz Scaling (Concurrency one=100%)
1 192 100% 228 100% 290 100%
2 274 143% 349 153% 438 151%
5 113 59% 411 180% 543 187%
10 62 32% 443 194% 629 217%
20 50 26% 439 193% 670 231%
35 50 26% 422 185% 650 224%
50 47 25% 414 182% 669 230%

The performance at concurrency 1 and 2 is mediocre, but not really bad. Notice that the scaling of Mac OS X from one to two is not fantastic, but is almost as good as the Linux machines. Once we worked with 5 concurrent users, however, performance collapses on Mac OS X: we get only 60% of the performance at concurrency one. With Linux, both CPUs are not stressed at a concurrency of two, and increasing the load makes the CPUs work harder.

The G5 (Linux) achieves its peak quicker as it is a bit slower in this integer intensive task than the Opteron. However, it is important to remark that while performance begins to decline very slowly as we increase the number of users, there is no collapse! At a concurrency of 50, we still have 80% more performance than at a concurrency of one, showing that Linux handles the extra load of the extra threads very well. On Mac OS X, performance has plummeted to one quarter of our initial performance, showing that the threads are creating an additional overhead somehow.

Secondly, it is a fact that our benchmark is not disk limited. In that case, it is well documented that MySQL performance depends on the threading performance of the OS. A few examples:
MySQL Reference Manual for version 5.0.3-alpha:

"MySQL is very dependent on the thread package used. So when choosing a good platform for MySQL, the thread package is very important"
More:
"The capability of the kernel and the thread library to run many threads that acquire and release a mutex over a short critical region frequently without excessive context switches. If the implementation of pthread_mutex_lock() is too anxious to yield CPU time, this will hurt MySQL tremendously. If this issue is not taken care of, adding extra CPUs will actually make MySQL slower"
Darwin (6.x and older) used to be quite a bit slower when it came to context switches, but our own LMBench testing shows that the latest Darwin 8.0 performs context switches just as/nearly as fast as Linux kernel 2.6. So, a possible explanation might be that more context switches happen, but we still have to find a method to measure this. Suggestions are welcome....

From the MySQL site:
"As a multithreaded server, MySQL is most efficient on an operating system that has a well implemented threading system"
Thirdly, we have the Lmbench benchmarks, which are not conclusive, but point in the same direction. Even the high latency for the TCP measurements (see above) on Mac OS X might indicate relatively poor threading performance. MySQL has a TCP/IP connection thread, which handles all connection requests. This thread creates a new dedicated thread to handle the authentication and SQL query processing for each connection.

The split funnel suspect

The last suspect is the locking system. In Panther, only two threads could lock into the kernel to execute code of the kernel. One thread could lock into the networking part, while the other into the rest of the kernel services.

In Tiger, the locking is finer. Although Apple's documents indicate that it is still rather coarse grained, it is clear that more than two locks into the kernel can exist at the same time. In the case of MySQL, this should be a very important improvement, but we didn't see any improvement at all when performing the tests on both Panther and Tiger. This is speculation, but based on our data, we are tempted to hypothesize that the new locking system isn't really working right now, and that Tiger continues to behave like Panther.

Does it affect you?

What does this all mean? Whether or not you skipped the technical part, you probably want to know how it affects your Apple (server) experience.

It is clear that if you plan to run MySQL on Apple hardware, it is better to install YDL Linux than to use OS X. If you need excellent read performance, the maximum performance of your server will be up to 8 times better. If your server is only going to serve a limited number of users, YDL Linux will allow you to run with a less expensive system.

If the usage pattern of your server is more OLTP, Transaction processing oriented, we give you the same advice. Our quick tests with InnoDB show the same kind of behavior and we have noticed very slow file system performance. At this point, we do not have enough data to be conclusive. We noticed, for example, that importing data in our database (via the ">" command) took up to 8 times longer.

Low level benchmarking on Mac OS X and Linux The G5 a.k.a. Power 970FX
Comments Locked

47 Comments

View All Comments

  • JohanAnandtech - Friday, September 2, 2005 - link

    Sorry couldn't resist :-). (for the rest of the world, pannekoek is dutch for Pancake)

    Desktop performance is ok, as desktop apps are similar to the workstation apps we tested in the first article. Those apps spend from 5-20% in the OS, while server apps spend up to 80% of their time in the OS!

    However, I should point out that we tested Mac OS X SERVER, so it is a problem for the Xserves.
  • Pannenkoek - Friday, September 2, 2005 - link

    I stand corrected then. However, my reasoning still applies, it's just that Apple relies even more on its brand than on technology to sell server systems apparently. Who runs Mac OS servers anyway, it's an oxymoron. ;-)

    P.S. Do not mock my nick, it served well in beating godlike UT bots, and should be honoured as much as Loque.
  • Tanclearas - Thursday, September 1, 2005 - link

    "Apple told us that the problem lies in the Apachebench (the client side), which stalls from time to time and thus, generates too low of a load on the (Apache) server."

    How does this explanation make any sense? Linux obviously doesn't have a problem with these "stalls".
  • JohanAnandtech - Friday, September 2, 2005 - link

    What follows is not what Apple said, but my interpretation...

    They are probably pointing out that the version for Mac OS X has a Mac OS X specific bug. Of course, who is to blame? I am sceptical like you.
  • mariush - Thursday, September 1, 2005 - link

    Page 4 :

    We used the following on the Opteron based PCs:

    Gcc -O2 -mcpu=G5 flops.c -o flops

    And, on the G5 machines, we used:

    Gcc -O2 -march=k8 flops.c -o flops

    I think it's the other way around.
  • Houdani - Thursday, September 1, 2005 - link

    Aye, was gonna point that out also.

    In addition, on page 3 should you list the Yellow Dog Linux along with OSX in the Software section for the Apple PowerMac G5?
  • Shinei - Thursday, September 1, 2005 - link

    My question is, would the memory latencies be so high for the 970FX if high-end RAM was used for the Linux tests (like, say, some TCCD or BH-5 at 2-2-2-5), instead of the standard 3-3-3-8 SPD that ships with the G5 system? Or is there some limitation to the G5 motherboard that prevents posting with performance RAM as a way for Apple to ensure that only certain, accepted DIMMs are used with their computers?
    Anyway, these results are very telling about what the OSX86 Macs are going to perform like--that is to say, ~25% slower than the equivalent Windows/Linux boxes running the same hardware...
  • IntelUser2000 - Sunday, September 4, 2005 - link

    quote:

    My question is, would the memory latencies be so high for the 970FX if high-end RAM was used for the Linux tests (like, say, some TCCD or BH-5 at 2-2-2-5), instead of the standard 3-3-3-8 SPD that ships with the G5 system? Or is there some limitation to the G5 motherboard that prevents posting with performance RAM as a way for Apple to ensure that only certain, accepted DIMMs are used with their computers?


    That doesn't matter since they are testing workstations, Irwindale and Opteron is also using CAS3 RAM. No workstations/servers use 2-2-2-5 RAM.


    The poor scores of OS X compared to Linux makes sense. G5 was rumored to be fast in speccpu benchmarks but came out to be slower. Must be that rumor systems were benched with Linux and the production was benched with OSX.

    I am impressed with OS X's features though.
  • Jedi2155 - Thursday, September 1, 2005 - link

    The G5 motherboard has the limitations due to Apple's way to insure you only buy certified ram. The SPD settings must be perfect.
  • ceefka - Thursday, September 1, 2005 - link

    I am humbled by the sheer expertise of Johan. Amazing work, Johan!

    This makes me even more curious about Intel's contribution to the next generation of Macs. How will they compare to the best G5s?

Log in

Don't have an account? Sign up now