No more mysteries: Apple's G5 versus x86, Mac OS X versus Linuxby Johan De Gelas on June 3, 2005 7:48 AM EST
- Posted in
Mac OS X versus LinuxLmbench 2.04 provides a suite of micro benchmarks that measure the bottlenecks at the Unix operating system and CPU level. This makes it very suitable for testing the theory that Mac OS X might be the culprit for the terrible server performance of the Apple platform.
Signals allow processes (and thus threads) to interrupt other processes. In a database system such as MySQL 4.x where so many processes/threads (60 in our MySQL screenshot) and many accesses to the kernel must be managed, signal handling is a critical performance factor.
Larry McVoy (SGI) and Carl Staelin (HP):
" Lmbench measure both signal installation and signal dispatching in two separate loops, within the context of one process. It measures signal handling by installing a signal handler and then repeatedly sending itself the signal."
|Xeon 3.06 GHz||Linux 2.4||3056||0.42||0.63||4.47||5.58||18.2||0.68||2.33|
|G5 2.7 GHz||Darwin 8.1||2700||1.13||1.91||4.64||8.60||21.9||1.67||6.20|
|Xeon 3.6 GHz||Linux 2.6||3585||0.19||0.25||2.30||2.88||9.00||0.28||2.70|
|Opteron 850||Linux 2.6||2404||0.08||0.17||2.11||2.69||12.4||0.17||1.14|
All numbers are expressed in microseconds, lower is thus better. First of all, you can see that kernel 2.6 is in most cases a lot more efficient. Secondly, although this is not the most accurate benchmark, the message is clear: the foundation of Mac OS X server, Darwin handles the signals the slowest. In some cases, Darwin is even several times slower.
As we increase the level of concurrency in our database test, many threads must be created. The Unix process/thread creation is called "forking" as a copy of the calling process is made.
lmbench "fork" measures simple process creation by creating a process and immediately exiting the child process. The parent process waits for the child process to exit. The benchmark is intended to measure the overhead for creating a new thread of control, so it includes the fork and the exit time.
lmbench "exec" measures the time to create a completely new process, while " sh" measures to start a new process and run a little program via /bin/ sh (complicated new process creation).
|Xeon 3.06 GHz||Linux||3056||163||544||3021|
|G5 2.7 GHz||Darwin||2700||659||2308||4960|
|Xeon 3.6 GHz||Linux||3585||158||467||2688|
Mac OS X is incredibly slow, between 2 and 5(!) times slower, in creating new threads, as it doesn't use kernel threads, and has to go through extra layers (wrappers). No need to continue our search: the G5 might not be the fastest integer CPU on earth - its database performance is completely crippled by an asthmatic operating system that needs up to 5 times more time to handle and create threads.
Post Your CommentPlease log in or sign up to comment.
View All Comments
michaelok - Saturday, June 4, 2005 - link"with one benchmark showing that the PowerMac is just a mediocre PC while another shows it off as a supercomputer, the unchallenged king of the personal computer world."
Well, things are a little different when you connect, say, 32768 processors together, i.e. you go from running MySQL to Teradata, so yes, the Power architecture seems to dominate, and the Virginia Tech supercomputer is still up there, at 7th.
" The RISC ISA, which is quite complex and can hardly be called "Reduced" (The R of RISC), provides 32 architectural registers"
'Reduced Instruction Set' is misleading, it actually refers to a design philosophy of using *smaller, simpler* instructions, instead of a single complex instruction. This is to be compared with the Itanium for example, which Intel calls 'EPIC' (Explicit Parallel Instruction Computing), but it is essentially derived from VLIW (Very Long Instruction Word).
Anyway, nice article, certainly much more to discuss here, such as SMT (Simultaneous Multithreading), (when that is available for the Apple :), vs. Intel's Hyperthreading. We'll still be comparing Apples to Oranges but isn't that why everybody buys the Motor Trend articles, i.e. '68 Mustang vs. '68 GT?
psychodad - Saturday, June 4, 2005 - linkI agree. Recently I read a review which pitted macs against pcs using software blatantly optimized for macs. If you have ever used unoptimized software, you will know it. It is slow, often unstable and not at all usable, especially if you're after productivity.
Viditor - Friday, June 3, 2005 - linkIntelUser2000 - "about the AMD TDP number, they never state that its max power, they say its maximum power achievable under most circumstances, its not absolute max power"
Not true at all...AMD's datasheet clearly states that it's not only max power, but max theoretical power.
trooper11 - Friday, June 3, 2005 - linkI think its hard enough comparing a G5 to PC systems. I dont belive there will ever be a 'fair' comparison that satisfies everyone on both sides. There are too few general programs to compare and people will always complain about using or not using optimized apps for either platform. many of the varibles are subjective and the benchmarks to be compared are so heavily debated without a clear answer.
I think this was a good attempt, but I gave up trying to 'fairly' compare the two a long time ago. Anyhting that sheds a bit of light is a good thing, but i never expect an end to the contreversy, too many questions that cant be answered.
I would though love to see the addition of dual core amd chips since they are out there and would be serious competition, of course it would fly in server applications. hopefully the numbers for that could be added in a later article.
psychodad - Friday, June 3, 2005 - linkFascinating. You run these tests using a compiler that Apple does not use (unless it is Yellow Dog) against software generally optimized for x86 architectures and you make conclusions. This makes your data tainted (actually biased) and your conclusions faulty. I would suggest that in fairness you make your tests more "real world" by using the software compiled by compilers that the rest of us nontechnical people use on a daily basis.
smitty3268 - Friday, June 3, 2005 - linkRosyna:
Oh, I assumed he was using the Apple version of gcc. If not, then I see what you mean.
crimsonson - Friday, June 3, 2005 - linkThis article may be moot by Monday
Garyclaus16 - Friday, June 3, 2005 - link" Oh and the graph on page 5 doesnt display correctly in firefox. "
AND you are using firefox for what reason?...you deserve to view pages incorrectly
Rosyna - Friday, June 3, 2005 - linksmitty3268, that's part of the problem. Almost no one uses GCC 3.3.3 (stock, from the main gcc branch) for Mac OS X development because it really sucks at optimizing for the PPC. On the other hand, OS X was compiled with the Apple shipped GCC 3.3/GCC 4.0.
smitty3268 - Friday, June 3, 2005 - linkI think its fair to use the compilers most people are going to be using. That would be gcc on both platforms. As far as autovectorization in 4.0, don't expect very much from it. Obviously it will be better than 3.3, but the real work is being added now in 4.1.
I'll join the other 50 posters who would have liked to see at least 1 page showing the G5's performance under linux compared to OSX. That and maybe a few more real world benchmarks. But your article was very informative and answered a lot of questions. It was frustrating that there really wasn't anything done like this before.