The G5, a.k.a. the PowerPC 970FX

You might not have noticed it, but this article contains a lot of good news for owners of current Apple systems. GCC 4.0 promises much better floating-point (FP) performance in open source software. The improvement of GCC 4.0 over GCC 3.3.3 and 3.3 is remarkable on the PowerPC 970 family: FP performance improves by almost 70%!

Now that the open source community finally has a decent compiler for the Apple platform, Apple management has decided to switch to another architecture. Ironically, the Intel architecture currently needs a super-optimized compiler (Intel's own) to reach the FP performance that the G5 now reaches with a very popular but far less aggressive compiler (GCC).

Combined with the data from our first article, we can safely say that the FP performance of the 2.7 GHz G5 is at least as good as that of the best x86 CPUs. Integer performance seems to be between 70% and 80% of that of the fastest x86 CPUs, while FP/SIMD performance can actually surpass x86 in certain situations.

With the dual-core PowerPC 970MP available and IBM's outstanding track record in multi-core CPUs, it is very questionable whether the switch to Intel CPUs will, from a technical point of view, be as big a step forward as Steve Jobs claims. There is more: each core of the 970MP has 1 MB of L2 cache instead of the current 512 KB, which will improve integer performance quite a bit, as it lessens the impact of the G5's biggest problem: the high latency of access to the memory system.


Xserve, silently cooled. Below the G5 with the cover, you can see the heatsink.

It is again ironic that the PowerPC 970MP is far more advanced than the current Intel dual-core CPUs when it comes to power management. Each core can be placed independently in a power-saving state called doze, while the other core continues operation.

A low-power PowerPC 970FX is also available and consumes about 16 W at 1.6 GHz, so it seems that IBM, although slightly late, could have provided everything that Apple needs. With its 58 million transistors and 66 mm² die, the G5 is not really a hot CPU. The Xserve (2 x 2.3 GHz G5) was by far the quietest air-cooled 1U server that has ever entered our lab in Kortrijk.

The Usual Suspects

The Mac OS X kernel environment includes the Mach kernel, BSD, the I/O Kit, file systems, and networking components. Some of these components slow down MySQL significantly. While our rough profiling has not identified the true culprits, we think that we can narrow the possible suspects to:
  1. The relatively high TCP latency that we measured.
  2. The implementation of the threading system. Does the mapping of pthreads onto Mach threads involve some overhead, or is this the traditional performance problem of the microkernel, namely the high latency of system calls? While Mac OS X is not a microkernel, the problem might still exist, as the Mach core sits deep inside that kernel. Is there IPC overhead? The Lmbench signaling benchmarks seem to suggest that there is.
  3. The finer-grained locking in the current version of Tiger does not appear to be working for some reason, and we still have the "two lock" system of Panther.
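
The first suspect, TCP latency, is the kind of figure that a simple ping-pong test can expose. What follows is a minimal Python sketch of such a test, not the Lmbench code behind the numbers in this article: a client sends one byte over the loopback interface, an echo thread sends it back, and the median round-trip time is reported. The names echo_server and tcp_round_trip_us are our own, and interpreter overhead is included in the result, so only comparisons between operating systems on the same hardware are meaningful.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so single bytes are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(1)
            if not data:
                break
            conn.sendall(data)


def tcp_round_trip_us(rounds=2000):
    """Return the median loopback round-trip time in microseconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    client = socket.create_connection(listener.getsockname())
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        client.sendall(b"x")   # ping
        client.recv(1)         # pong
        samples.append(time.perf_counter() - start)

    client.close()   # the echo thread sees end-of-stream and exits
    server.join()
    listener.close()

    samples.sort()
    return samples[len(samples) // 2] * 1e6


if __name__ == "__main__":
    print(f"median TCP round trip: {tcp_round_trip_us():.1f} us")
```

Running the same script under Mac OS X and under Linux on the same machine gives a rough idea of how much latency the TCP stack and the kernel add to every request.
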
Will this performance problem only be visible in MySQL? At this point, we can only speculate, but we have a strong suspicion that this is not the case. Server workloads spend, contrary to other workloads such as workstation apps, a substantial portion of their execution in the kernel and TCP stack. Porting such applications to Mac OS X is more complicated than just recompiling code. We didn't have to search long before we found examples of companies that increased their number of servers or upgraded in order to run MySQL faster.
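
How much of its CPU time a workload spends in the kernel can be estimated from the user and system times that the operating system keeps per process. The sketch below is an illustration under our own assumptions, not the profiling method used for this article: kernel_time_share, compute_bound and syscall_bound are hypothetical names, and the two toy workloads only stand in for a workstation-style and a server-style application.

```python
import os
import tempfile


def kernel_time_share(workload):
    """Run workload() and return the fraction of its CPU time spent in the kernel."""
    before = os.times()
    workload()
    after = os.times()
    user = after.user - before.user
    system = after.system - before.system
    total = user + system
    # Guard against a workload too short to register any CPU time.
    return system / total if total > 0 else 0.0


def compute_bound():
    """Pure arithmetic: almost no system calls."""
    total = 0.0
    for i in range(2_000_000):
        total += i * 0.5
    return total


def syscall_bound():
    """Many small unbuffered writes: one write() system call per iteration."""
    with tempfile.TemporaryFile(buffering=0) as f:
        for _ in range(200_000):
            f.write(b"x")


if __name__ == "__main__":
    print(f"compute-bound kernel share: {kernel_time_share(compute_bound):.0%}")
    print(f"syscall-bound kernel share: {kernel_time_share(syscall_bound):.0%}")
```

The higher the kernel share of an application, the more its performance depends on the operating system rather than on the CPU and the compiler.
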

We look forward to testing other database and server apps on the Mac OS X platform. Critical reports that point out weaknesses can only help the Apple community move forward and keep the Apple people on their toes.

References

[1] Threading on OS X
http://developer.apple.com/technotes/tn/tn2028.html

[2] Lmbench: Portable Tools for Performance Analysis
Larry McVoy, Silicon Graphics
Carl Staelin, Hewlett-Packard Laboratories

[3] Performance Characterization of a Quad Pentium Pro SMP Using OLTP
Kimberly Keeton*, David A. Patterson*, Yong Qiang He+, Roger C. Raphael+, and Walter E. Baker
Computer Science Division
University of California at Berkeley

47 Comments

  • JohanAnandtech - Friday, September 2, 2005

    Sorry, couldn't resist :-). (For the rest of the world: pannekoek is Dutch for pancake.)

    Desktop performance is ok, as desktop apps are similar to the workstation apps we tested in the first article. Those apps spend from 5-20% in the OS, while server apps spend up to 80% of their time in the OS!

    However, I should point out that we tested Mac OS X SERVER, so it is a problem for the Xserves.
  • Pannenkoek - Friday, September 2, 2005

    I stand corrected then. However, my reasoning still applies, it's just that Apple relies even more on its brand than on technology to sell server systems apparently. Who runs Mac OS servers anyway, it's an oxymoron. ;-)

    P.S. Do not mock my nick, it served well in beating godlike UT bots, and should be honoured as much as Loque.
  • Tanclearas - Thursday, September 1, 2005

    "Apple told us that the problem lies in the Apachebench (the client side), which stalls from time to time and thus, generates too low of a load on the (Apache) server."

    How does this explanation make any sense? Linux obviously doesn't have a problem with these "stalls".
  • JohanAnandtech - Friday, September 2, 2005

    What follows is not what Apple said, but my interpretation...

    They are probably pointing out that the version for Mac OS X has a Mac OS X specific bug. Of course, who is to blame? I am sceptical like you.
  • mariush - Thursday, September 1, 2005

    Page 4 :

    We used the following on the Opteron based PCs:

    Gcc -O2 -mcpu=G5 flops.c -o flops

    And, on the G5 machines, we used:

    Gcc -O2 -march=k8 flops.c -o flops

    I think it's the other way around.
  • Houdani - Thursday, September 1, 2005

    Aye, was gonna point that out also.

    In addition, on page 3 should you list the Yellow Dog Linux along with OSX in the Software section for the Apple PowerMac G5?
  • Shinei - Thursday, September 1, 2005

    My question is, would the memory latencies be so high for the 970FX if high-end RAM was used for the Linux tests (like, say, some TCCD or BH-5 at 2-2-2-5), instead of the standard 3-3-3-8 SPD that ships with the G5 system? Or is there some limitation to the G5 motherboard that prevents posting with performance RAM as a way for Apple to ensure that only certain, accepted DIMMs are used with their computers?
    Anyway, these results are very telling about what the OSX86 Macs are going to perform like--that is to say, ~25% slower than the equivalent Windows/Linux boxes running the same hardware...
  • IntelUser2000 - Sunday, September 4, 2005

    quote:

    My question is, would the memory latencies be so high for the 970FX if high-end RAM was used for the Linux tests (like, say, some TCCD or BH-5 at 2-2-2-5), instead of the standard 3-3-3-8 SPD that ships with the G5 system? Or is there some limitation to the G5 motherboard that prevents posting with performance RAM as a way for Apple to ensure that only certain, accepted DIMMs are used with their computers?


    That doesn't matter since they are testing workstations; Irwindale and Opteron are also using CAS3 RAM. No workstations/servers use 2-2-2-5 RAM.


    The poor scores of OS X compared to Linux make sense. The G5 was rumored to be fast in SPEC CPU benchmarks but came out slower. It must be that the rumored systems were benched with Linux and the production systems were benched with OS X.

    I am impressed with OS X's features though.
  • Jedi2155 - Thursday, September 1, 2005

    The G5 motherboard has these limitations as Apple's way to ensure you only buy certified RAM. The SPD settings must be perfect.
  • ceefka - Thursday, September 1, 2005

    I am humbled by the sheer expertise of Johan. Amazing work, Johan!

    This makes me even more curious about Intel's contribution to the next generation of Macs. How will they compare to the best G5s?
