Architecture and Memory Performance

When Johan wrote his No More Mysteries article, he found that, as a processor, the G5 is quite competitive with modern-day x86 CPUs.  In fact, he found that it offered floating point performance on par with that of the fastest x86 processor - the Athlon 64/Opteron. 

Separately, I looked at Core Duo performance and found that, clock-for-clock, it was a pretty solid competitor to AMD's offerings.  Intel had effectively matched the performance of AMD's Athlon 64 at lower clock speeds, without the use of an on-die memory controller. 

But now, it's time for judgment day. How does the Core Duo stack up to the G5?  Let's start at one of the G5's weakest points - memory speed.

I turned to lmbench, compiled it for both the G5 and Intel x86 architectures, and used it to get some hints as to how memory speed has changed with the new platform.

I organized the data in terms of distance from the CPU. So first, we have the performance of the on-die L2 cache of these two chips.  The Core Duo's L2 cache took 7.649ns to access, which at 1.83GHz translates into 14 clock cycles, a number that agrees with my ScienceMark results from previous articles. 

L2 Cache Latency - lmbench 2.5

The G5's L2 cache took 6.329ns to access, which at 1.9GHz, translates into a 12 cycle L2 - a slight performance advantage over the Core Duo.  Remember that the Core Duo's predecessor originally had a 10 cycle L2 cache, but thanks to the new power saving technology and some other unmentionable (for now) changes to the cache, Core Duo's L2 now takes 14 cycles to access.  Despite the greater access time, it's important to note that Core Duo's L2 cache is four times as large as the G5's. 
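The cycle counts quoted above follow from multiplying the measured latency by the clock speed. Here is a minimal Python sketch of that arithmetic, using the lmbench figures and clock speeds from this article (the function name is only for illustration):

```python
def latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert an access latency in nanoseconds to CPU clock cycles.

    A clock running at N GHz ticks N times per nanosecond,
    so cycles = nanoseconds * GHz.
    """
    return latency_ns * clock_ghz


# Figures from the article: lmbench L2 latency and each chip's clock speed.
core_duo_l2 = latency_cycles(7.649, 1.83)
g5_l2 = latency_cycles(6.329, 1.9)

print(f"Core Duo L2: {core_duo_l2:.1f} cycles")
print(f"G5 L2:       {g5_l2:.1f} cycles")
```

Rounded to whole cycles, that gives 14 for the Core Duo and 12 for the G5.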

The 1.83GHz Core Duo features a 64-bit wide 667MHz FSB, offering 5.336GB/s of bandwidth.  The FSB connects the chip to a 945 Express MCH with a dual channel DDR2-667 memory controller, which could provide it with about 10.67GB/s of memory bandwidth. However, the Intel-based iMac only ships with a single SO-DIMM installed, meaning that it is only operating in single-channel mode - delivering 5.336GB/s of memory bandwidth. I didn't have a DDR2 SO-DIMM on hand to test whether or not installing a second one would actually enable dual channel mode. 

The 1.9GHz G5 features a bi-directional 64-bit wide 633MHz FSB, offering a total of 5.06GB/s of bandwidth.  The chip connects to a North Bridge that appears to have a dual channel DDR2-533 memory controller, but with only one channel active, it provides just 4.264GB/s of memory bandwidth. Since that is less than the FSB can carry, the iMac G5 is slightly memory bandwidth starved.
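The bandwidth figures above are theoretical peaks: bus width in bytes multiplied by the effective transfer rate. Here is a rough Python sketch of that calculation, assuming the widths and rates quoted in the text (the helper name and its `channels` parameter are my own, for illustration):

```python
def peak_bandwidth_gbs(width_bits: int, mega_transfers: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s, with 1 GB = 1000 MB as in the article.

    width_bits     -- bus width per channel, in bits
    mega_transfers -- effective transfer rate in millions of transfers per second
    channels       -- number of active channels (assumed to scale linearly)
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mega_transfers * channels / 1000


# Core Duo iMac: 667MHz FSB, DDR2-667 memory (one SO-DIMM installed).
core_duo_fsb = peak_bandwidth_gbs(64, 667)
core_duo_mem_single = peak_bandwidth_gbs(64, 667)
core_duo_mem_dual = peak_bandwidth_gbs(64, 667, channels=2)

# iMac G5: 633MHz FSB, DDR2-533 memory with one channel active.
g5_fsb = peak_bandwidth_gbs(64, 633)
g5_mem_single = peak_bandwidth_gbs(64, 533)

for label, value in [
    ("Core Duo FSB", core_duo_fsb),
    ("Core Duo memory, single channel", core_duo_mem_single),
    ("Core Duo memory, dual channel", core_duo_mem_dual),
    ("G5 FSB", g5_fsb),
    ("G5 memory, single channel", g5_mem_single),
]:
    print(f"{label}: {value:.3f} GB/s")
```

On these numbers, the G5's single active memory channel falls short of what its FSB can carry, while the Core Duo's single channel matches its FSB.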

In Johan's article, he uncovered that the G5 is in terrible need of a lower latency memory controller, with memory requests taking almost twice as long as on an Intel platform!  Whether it is the G5's FSB or its chipset's memory controller that is at fault is difficult to isolate, but needless to say, the comparison to the Core Duo isn't pretty:

Memory Access Latency - lmbench 2.5

It takes the G5 almost twice as long as the Core Duo just to get data back from memory. That means that the G5 has to waste around twice as many clock cycles waiting on memory, which leads to higher power consumption and lower performance.  To make matters worse, the G5 only has a 512KB L2 cache, so it has to go to main memory more often than the Core Duo with its massive 2MB L2 cache.  While sticking with a 512KB L2 cache may have kept the CPU small, the G5 really needed a larger cache much earlier in its lifetime (either that or a better FSB/memory controller). 

The high latency memory access and slower memory bus are why the G5 suffers tremendously when it comes to memory bandwidth:

Memory Read Speed - lmbench 2.5

Memory Write Speed - lmbench 2.5

It is worth noting, though, that the G5 actually posts a higher memory write speed here than the Core Duo.  It's not easy to explain why, and it could very well be a compiler issue. Remember that here, we are relying on gcc 4.0 and not Intel's C compiler to extract the best performance out of the Intel platform.  Over time, you can expect that to change, but for a first showing, it's not terrible. 

35 Comments


  • ohnnyj - Tuesday, January 31, 2006

    I have already preordered one (did so on the day they were announced), but now I am having serious doubts about keeping the order (does not ship until the 15th). The only thing that really worries me is if Apple will release new MacBooks when Intel releases the Conroe processor. I would think by that time (fall?) they would have most of the programs ported (i.e. Photoshop) and then an even better processor to run it with. I have been waiting so long for a laptop... decisions, decisions.
  • Furen - Tuesday, January 31, 2006

    I would say you should tough it out for a bit. Like Anand said, this is basically a Public Beta test. Kind of sucks that Apple brought out a 32bit version of the OS considering that it could've been x86-64 native if Apple had waited for a couple of quarters. Then again, it makes no difference if the OS is not 64 bits yet, since a 64 bit version would be able to run 32 bit apps anyway.
  • IntelUser2000 - Tuesday, January 31, 2006

    I wonder if Rosetta itself doesn't take advantage of multi-thread...
  • IntelUser2000 - Tuesday, January 31, 2006

    Wait, doesn't X1600 use H.264 decoding on hardware??
  • smitty3268 - Tuesday, January 31, 2006

    It does if the drivers are set up to use it properly. Given that Windows users only got this about a month ago I'd say it probably isn't doing that yet on Macs. Could be, though.
