The G5 as Server CPU

While the Xserve, not the PowerMac, is Apple's server platform, we could not resist the temptation to test the G5-based machine as a server too. The server version of Mac OS X Tiger was installed on the machine. In fact, we are giving the Apple platform a small advantage: the 2.5 GHz CPUs are a bit faster than the 2.3 GHz CPUs of the Xserve, and the RAM is not ECC memory, unlike in the Xserve.

A few months earlier, we had a quick test run with the beautifully designed and incredibly silent 1U Xserve, and the results were similar to, albeit lower than, the ones that we measured on the PowerMac.

Network performance wasn't an issue. We used a direct Gigabit Ethernet link between client and server. On average, the server received 4 Mbit/s and sent 19 Mbit/s of data, with a peak of 140 Mbit/s, far below the limits of Gigabit. The disk system wasn't very challenged either: up to 600 KB/s of reads and at most 23 KB/s of writes. You can read more about our MySQL test methods here.
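To put those traffic figures in perspective, here is a minimal sketch that expresses them as a share of the link. It assumes the nominal 1000 Mbit/s rate of Gigabit Ethernet and ignores protocol overhead; the function name is ours, chosen for illustration.

```python
# Link utilisation for the traffic figures quoted in the text.
# Assumption: nominal Gigabit Ethernet rate, ignoring protocol overhead.
LINK_CAPACITY_MBIT = 1000.0

traffic_mbit = {
    "average received": 4.0,
    "average sent": 19.0,
    "peak": 140.0,
}


def utilisation_percent(rate_mbit, capacity_mbit=LINK_CAPACITY_MBIT):
    """Return the share of the link capacity used, in percent."""
    return 100.0 * rate_mbit / capacity_mbit


for label, rate in traffic_mbit.items():
    print(f"{label}: {utilisation_percent(rate):.1f}% of the link")
```

Even the peak stays at a small fraction of what the link can carry, which is why we rule out the network as a bottleneck.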

Ever heard of the famous English plum pudding? That is the best way to describe the MySQL performance on the G5/Mac OS X Server combination. Performance is decent with one or two virtual clients connecting. Once we go to 5 and 10 concurrent connections, the Apple plum pudding collapses.

Concurrent connections   Dual G5 2.5 GHz PowerMac   Dual Xeon DP 3.6 GHz (HT on)   Dual Xeon DP 3.6 GHz (HT off)   Dual Opteron 2.4 GHz
1                        192                        286                            287                             290
2                        274                        450                            457                             438
5                        113                        497                            559                             543
10                       62                         517                            583                             629
20                       50                         545                            561                             670
35                       50                         486                            573                             650
50                       47                         495                            570                             669

At that point, performance is only about 1/10th of that of the Opteron and Xeon. We have tested this on Panther (10.3) and on Tiger (10.4.1) and triple-checked for every possible error, and the result remains the same: something is terribly wrong with the MySQL server performance.
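The size of the collapse is easy to check from the table. The sketch below copies the measured values and computes the G5 result as a fraction of the best x86 result at each concurrency level; the helper name and data layout are ours, for illustration only.

```python
# Results copied from the MySQL table, keyed by concurrent connections.
# Tuple order: Dual G5 PowerMac, the two Dual Xeon configurations, Dual Opteron.
mysql_results = {
    1: (192, 286, 287, 290),
    2: (274, 450, 457, 438),
    5: (113, 497, 559, 543),
    10: (62, 517, 583, 629),
    20: (50, 545, 561, 670),
    35: (50, 486, 573, 650),
    50: (47, 495, 570, 669),
}


def g5_share(row):
    """Return the G5 score as a fraction of the best x86 score in the row."""
    g5, *x86 = row
    return g5 / max(x86)


for connections, row in mysql_results.items():
    print(f"{connections:>2} connections: G5 at {g5_share(row):.0%} of the best x86 result")
```

With one or two connections the G5 stays above half of the best x86 score; from 10 connections onwards it drops to roughly a tenth or less.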

SPEC CPU 2000 Int numbers compiled with GCC show that the G5 reaches about 75% of the integer performance of an equally clocked Opteron. So pure integer performance is not the issue. The Opteron should be somewhat faster, but not 10 times faster.

We checked with Activity Monitor, and the CPUs were indeed working hard: up to 185% CPU load on the MySQL process. Notice that the MySQL process consists of no fewer than 60 threads.
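As a rough back-of-the-envelope check, the sketch below estimates the advantage that the 2.4 GHz Opteron should have over the 2.5 GHz G5 if integer throughput were the only factor. It assumes that performance scales linearly with clock speed and that the 75% per-clock figure carries over to this workload; both are simplifications, and the function name is hypothetical.

```python
# Per-clock figure and clock speeds are taken from the text.
PER_CLOCK_RATIO = 0.75    # G5 integer performance vs. an equally clocked Opteron
G5_CLOCK_GHZ = 2.5
OPTERON_CLOCK_GHZ = 2.4


def expected_opteron_advantage(per_clock_ratio, g5_ghz, opteron_ghz):
    """Estimate how many times faster the Opteron should be.

    Assumption: performance scales linearly with clock speed.
    """
    g5_relative = per_clock_ratio * g5_ghz / opteron_ghz
    return 1.0 / g5_relative


advantage = expected_opteron_advantage(PER_CLOCK_RATIO, G5_CLOCK_GHZ, OPTERON_CLOCK_GHZ)
print(f"Expected Opteron advantage: {advantage:.2f}x")
```

Under these assumptions the expected gap is well under 2x, an order of magnitude smaller than what the MySQL test shows.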

We did a cross-check with Apache 1.3 and the standard "ab" (ApacheBench) benchmark:

Concurrency   Dual PowerMac G5 2.5 GHz (Panther)   Dual PowerMac G5 2.7 GHz (Tiger)   Dual Xeon 3.6 GHz
5             216.34                               217.6                              3776.44
20            216.24                               217.68                             3711.4
50            269.38                               218.32                             3624.63
100           249.51                               217.69                             3768.89
150           268.59                               256.89                             3600.1
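For readers who want to repeat this kind of run, here is a minimal sketch of how the command lines could be built and how the headline figure could be pulled from ab's report. The URL, the request count, and both helper names are assumptions of ours. We also assume that the table lists ab's "Requests per second" metric, and the report excerpt is a hypothetical illustration, not captured output.

```python
import re


def build_ab_command(url, concurrency, total_requests=10000):
    """Build an ab command line; -n is the request count, -c the concurrency."""
    return ["ab", "-n", str(total_requests), "-c", str(concurrency), url]


def parse_requests_per_second(ab_output):
    """Extract the 'Requests per second' figure from ab's text report."""
    match = re.search(r"^Requests per second:\s+([\d.]+)", ab_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))


# Hypothetical excerpt of an ab report, for illustration only.
sample_report = """\
Concurrency Level:      5
Requests per second:    216.34 [#/sec] (mean)
"""

# Placeholder URL (assumption); concurrency levels match the table.
for level in (5, 20, 50, 100, 150):
    print(" ".join(build_ab_command("http://server.example/index.html", level)))

print(parse_requests_per_second(sample_report))
```

Passing each command to a process runner and feeding its output to the parser would give one number per concurrency level, in the same layout as the table.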

The new OS, Tiger, doesn't help: the 2.7 GHz machine on Tiger (10.4.1) is about as fast as the 2.5 GHz machine on Panther (10.3). More importantly, Apache shows exactly the same picture as MySQL: performance is more than 10 times worse than on the Xeon (and Opteron) on Linux. Apple is very proud of Mac OS X's Unix roots, but it seems that typical Unix/Linux software isn't too fond of Apple. Let us find out what happened!


116 Comments

  • Icehawk - Friday, June 3, 2005 - link

    Interesting stuff. I'd like to see more data too. Mmm Solaris.

    Unfortunately the diagrams weren't labeled for the most part (in terms of "higher is better") making it difficult to determine the results.

    And the whole not displaying on FF properly... come on.
  • NetMavrik - Friday, June 3, 2005 - link

    You can say that again! NT shares a whole lot more than just similarities to VMS. There are entire structures that are copied straight from VMS. I think most people have forgotten or never knew what "NT" stood for anyway. Take VMS, increment each letter by one, and you get WNT! New Technology my a$$.
  • Guspaz - Friday, June 3, 2005 - link

    Good article. But I'd like to see it re-done with the optimal compiler per-platform, and I'd like to see PowerPC Linux used to confirm that OSX is the cause of the slow MySQL performance.
  • melgross - Friday, June 3, 2005 - link

    I was just thinking back about this and remembered something I've seen.

    Computerworld has had articles over the past two years or so about companies who have gone to XServes. They are using them with Apache, Sybase or Oracle. I don't remember any complaints about performance.

    Also Oracle itself went to XServes for its own datacenter. Do you think they would have done that if performance was bad? They even stated that the performance was very good.

    Something here seems screwed up.
  • brownba - Friday, June 3, 2005 - link

    johan, i always appreciate your articles.

    you've been /.'d !!!!
    and anandtech is holding up well.
    good job
  • bostrov - Friday, June 3, 2005 - link

    Since so much effort went into vector facilities and instruction sets ever since the P54 days, shouldn't "best effort" on each CPU be used (use the IBM compiler on G5 and the Intel compiler on x86)? By using gcc you're using an almost artificially bad compiler, and there is no guarantee that gcc will provide equivalent optimizations for each platform anyway.

    I think it'd be very interesting to see an article with the very best available compilers on each platform running the benchmarks.

    Incidentally, Intel C with the vector instruction sets disabled still does better.
  • JohanAnandtech - Friday, June 3, 2005 - link

    bostrov: because the Intel compiler is superb at vectorizing code. I am testing x87 FPU and gcc, you are testing SSE-2 performance with the Intel compiler.
  • JohanAnandtech - Friday, June 3, 2005 - link

    minsctdp: A typo which happened during final proofread. All my original tables say 990 MB/s. Fixed now.
  • bostrov - Friday, June 3, 2005 - link

    My own results for flops 2.0: (compiled with Intel C 8.1, 3.2 GHz Prescott with 160 MHz - 5:4 ratio - FSB)

    flops20-c_prescott.exe

    FLOPS C Program (Double Precision), V2.0 18 Dec 1992

    Module   Error          RunTime (usec)   MFLOPS
    1         1.7764e-013   0.0109           1288.7451
    2        -1.4166e-013   0.0082            852.7242
    3         8.1046e-015   0.0067           2531.7045
    4         9.0483e-014   0.0052           2858.2062
    5        -6.2061e-014   0.0140           2065.6650
    6         3.3640e-014   0.0100           2906.2439
    7        -5.7980e-012   0.0327            366.4559
    8         3.7692e-014   0.0111           2700.8968

    Iterations = 512000000
    NullTime (usec) = 0.0000
    MFLOPS(1) = 1088.7826
    MFLOPS(2) = 854.7579
    MFLOPS(3) = 1609.7508
    MFLOPS(4) = 2753.5016

    Why are the anandtech results so poor?
  • melgross - Friday, June 3, 2005 - link

    I thought that GCC comes with Tiger. I have read Apple's own info, and it definitely mentions GCC 4. Perhaps that would help the vectorization process.

    Altivec is such an important part of the processor and the performance of the machine that I would like to see properly written code used to compare these machines.