Workstation, yes; Server, no.

The G5 is a gigantic improvement over the G4e, the CPU it replaced in the Power Mac. The G5 is one of the widest superscalar CPUs ever built, and it has all the characteristics that could give Apple the edge, especially now that the clock speed race between AMD and Intel is over. However, there is still a lot of work to be done.

First, the G5 needs lower-latency memory access, because right now its integer performance leaves a lot to be desired. The Opteron and Xeon have better integer engines, and the Pentium 4/Xeon in particular also has a better branch predictor. The Opteron's memory subsystem runs circles around the G5's.
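
To make "memory latency" concrete: integer code that chases pointers (linked lists, trees, hash chains) issues loads whose addresses depend on the previous load, so the CPU cannot overlap them and each one pays the full trip to memory. The C sketch below is a generic pointer-chasing microbenchmark of the kind often used to expose that latency. It is an illustration under our own assumptions, not one of the tests behind this article, and the names build_cycle, chase and ns_per_load are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: turn next[] into one random cycle over all n slots
 * (Sattolo's algorithm), so a walk from any slot visits every slot once
 * before returning to where it started. */
static void build_cycle(size_t *next, size_t n, unsigned seed)
{
    if (n == 0)
        return;
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;  /* j < i, never j == i: keeps a single cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` loads. Each address comes from the previous
 * load, so the loads are serialised and cannot be overlapped. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}

/* Average time per dependent load in nanoseconds (steps must be > 0),
 * using the coarse, CPU-time-based clock() from standard C. */
static double ns_per_load(const size_t *next, size_t steps)
{
    clock_t t0 = clock();
    volatile size_t sink = chase(next, 0, steps);  /* volatile: keep the walk from being optimised away */
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)steps;
}
```

For a meaningful number, the array (n × sizeof(size_t) bytes) has to be much larger than the CPU caches, and the step count large enough that clock()'s coarse resolution stops mattering; with a small array the sketch measures cache latency instead.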

Second, it is clear that the G5's FP performance, despite its 32 architectural floating-point registers, depends heavily on good optimisation. Only one of our flops tests was "Altivectorized", which means that GCC needs to improve quite a bit before it can turn the many open source programs into super-fast Mac applications. In contrast, the Intel compiler can vectorize all 8 tests.
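
As a rough illustration of what "vectorizable" means to a compiler, the two C loops below differ in one key respect. In saxpy every iteration is independent and restrict promises that x and y do not overlap, so a vectorizing compiler may process four floats per 128-bit Altivec or SSE instruction. In prefix_sum each iteration needs the previous result, which defeats simple auto-vectorization. This is a generic sketch, not code from the flops benchmark; it uses single precision because Altivec vector arithmetic is single-precision only, and whether a particular compiler (for example GCC 4.x with -ftree-vectorize) actually vectorizes saxpy should be checked in the generated assembly on the target.

```c
#include <stddef.h>

/* Independent iterations, unit stride, and restrict (no aliasing between
 * x and y): the pattern an auto-vectorizer looks for. Four floats fit in
 * one 128-bit Altivec or SSE register. */
static void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: v[i] needs the freshly updated v[i - 1], so the
 * iterations cannot simply be executed four at a time. */
static void prefix_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

Both functions compute the same results whether or not they are vectorized; only the speed differs, which is why the compiler, not the source, ends up deciding how much of the G5's vector hardware is used.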

Altivec, or the Velocity Engine as Apple calls it, can make the G5 shine in workstation applications. A good example is Lightwave, where the G5 keeps up with the best x86 competition in some situations and falls behind in others.

Apple's future in the workstation market looks promising: the G5 has a lot of unused potential, and the growing market share of the Power Mac should tempt developers to put a little more effort into Mac optimisation.

The server performance of the Apple platform is, however, catastrophic. When we asked Apple for a reaction, they told us that some database vendors, namely Sybase and Oracle, have found a way around the threading problems. We'll try Sybase later, but frankly, we are very sceptical. The whole "multi-threaded Mach microkernel trapped inside a monolithic FreeBSD cocoon with several threading wrappers and coarse-grained threading access to the kernel" design, with a "backwards compatibility" millstone around its neck, sounds like a bad fusion recipe for performance.

Workstation apps will hardly mind, but the performance of server applications depends greatly on the threading, signalling and locking engine. I am no operating system expert, but with the data that we have today, I think that a PowerPC-optimised Linux such as Yellow Dog is a better choice for the Xserve than Mac OS X Server.
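
To show what "threading, signalling and locking" costs look like at the smallest scale, the sketch below has several POSIX threads hammer one mutex-protected counter, a simplified version of the contention a database server generates constantly. It is a hypothetical microbenchmark (struct shared, worker and run_contended are our own names, not taken from any vendor or from our test suite). Running it with rising thread counts on Mac OS X and on a PowerPC Linux such as Yellow Dog would be one way to check whether lock handoff, rather than the CPU, is the bottleneck.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/time.h>

struct shared {
    pthread_mutex_t lock;
    long counter;
    long iters;   /* increments per thread */
};

/* Each increment takes and releases the single shared lock, so with more
 * than one thread almost every lock operation is contended. */
static void *worker(void *arg)
{
    struct shared *s = arg;
    for (long i = 0; i < s->iters; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Runs nthreads workers (1..64 in this sketch), each doing `iters` locked
 * increments. Returns the final counter, or -1 on error, and stores the
 * wall-clock time in microseconds in *elapsed_us. */
static long run_contended(int nthreads, long iters, double *elapsed_us)
{
    pthread_t tids[64];
    struct shared s = { .counter = 0, .iters = iters };
    struct timeval t0, t1;
    int started = 0;

    if (nthreads < 1 || nthreads > 64)
        return -1;
    pthread_mutex_init(&s.lock, NULL);
    gettimeofday(&t0, NULL);
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &s) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    gettimeofday(&t1, NULL);
    pthread_mutex_destroy(&s.lock);

    *elapsed_us = (double)(t1.tv_sec - t0.tv_sec) * 1e6
                + (double)(t1.tv_usec - t0.tv_usec);
    return started == nthreads ? s.counter : -1;
}
```

Dividing elapsed_us by the total number of lock/unlock pairs (nthreads × iters) gives an average cost per handoff; how that figure grows from one thread to four on each operating system says more than any single number.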

References

Threading on OS X
http://developer.apple.com/technotes/tn/tn2028.html

OS X basics
http://developer.apple.com/documentation/macosx/index.html


Mac OS X versus Linux

116 Comments

  • Icehawk - Friday, June 3, 2005

    Interesting stuff. I'd like to see more data too. Mmm Solaris.

    Unfortunately, the diagrams weren't labeled for the most part (in terms of "higher is better"), making it difficult to determine the results.

    And the whole not displaying on FF properly... come on.
  • NetMavrik - Friday, June 3, 2005

    You can say that again! NT shares a whole lot more than just similarities with VMS. There are entire structures that are copied straight from VMS. I think most people have forgotten, or never knew, what "NT" stood for anyway. Take VMS, increment each letter by one, and you get WNT! New Technology my a$$.
  • Guspaz - Friday, June 3, 2005

    Good article. But I'd like to see it re-done with the optimal compiler per-platform, and I'd like to see PowerPC Linux used to confirm that OSX is the cause of the slow MySQL performance.
  • melgross - Friday, June 3, 2005

    I was just thinking back about this and remembered something I've seen:

    Computerworld has had articles over the past two years or so about companies that have gone to Xserves. They are using them with Apache, Sybase or Oracle. I don't remember any complaints about performance.

    Also, Oracle itself went to Xserves for its own datacenter. Do you think they would have done that if performance was bad? They even stated that the performance was very good.

    Something here seems screwed up.
  • brownba - Friday, June 3, 2005

    johan, i always appreciate your articles.

    you've been /.'d !!!!
    and anandtech is holding up well.
    good job
  • bostrov - Friday, June 3, 2005

    Since so much effort has gone into vector facilities and instruction sets ever since the P54 days, shouldn't "best effort" on each CPU be used (the IBM compiler on the G5 and the Intel compiler on x86)? By using gcc you're using an almost artificially bad compiler, and there is no guarantee that gcc will provide equivalent optimizations for each platform anyway.

    I think it'd be very interesting to see an article with the very best available compilers on each platform running the benchmarks.

    Incidentally, Intel C with the vector instruction sets disabled still does better.
  • JohanAnandtech - Friday, June 3, 2005

    bostrov: because the Intel compiler is superb at vectorizing code. I am testing the x87 FPU with gcc; you are testing SSE2 performance with the Intel compiler.
  • JohanAnandtech - Friday, June 3, 2005

    minsctdp: A typo that happened during the final proofread. All my original tables say 990 MB/s. Fixed now.
  • bostrov - Friday, June 3, 2005

    My own results for flops 2.0 (compiled with Intel C 8.1; 3.2 GHz Prescott with a 160 MHz FSB at a 5:4 ratio):

    flops20-c_prescott.exe

    FLOPS C Program (Double Precision), V2.0 18 Dec 1992

    Module   Error          RunTime   MFLOPS
                            (usec)
    1        1.7764e-013    0.0109    1288.7451
    2       -1.4166e-013    0.0082     852.7242
    3        8.1046e-015    0.0067    2531.7045
    4        9.0483e-014    0.0052    2858.2062
    5       -6.2061e-014    0.0140    2065.6650
    6        3.3640e-014    0.0100    2906.2439
    7       -5.7980e-012    0.0327     366.4559
    8        3.7692e-014    0.0111    2700.8968

    Iterations = 512000000
    NullTime (usec) = 0.0000
    MFLOPS(1) = 1088.7826
    MFLOPS(2) = 854.7579
    MFLOPS(3) = 1609.7508
    MFLOPS(4) = 2753.5016

    Why are the anandtech results so poor?
  • melgross - Friday, June 3, 2005

    I thought that GCC comes with Tiger. I have read Apple's own info, and it definitely mentions GCC 4. Perhaps that would help the vectorization process.

    Altivec is such an important part of the processor and the performance of the machine that I would like to see properly written code used to compare these machines.
