Architecture and Memory Performance

When Johan did his No More Mysteries article, he found that as a processor, the G5 is quite competitive with modern day x86 CPUs.  In fact, he found that it offered floating point performance on par with that of the fastest x86 processor - the Athlon 64/Opteron. 

Separately, I looked at Core Duo performance and found that clock-for-clock, it was a pretty solid competitor to AMD's offerings.  Intel had effectively created a chip that matches the performance of AMD's Athlon 64 at lower clock speeds, without the use of an on-die memory controller.

But now, it's time for judgment day. How does the Core Duo stack up to the G5?  Let's start at one of the G5's weakest points - memory speed.

I turned to lmbench, compiled it for both the G5 and Intel x86 architectures, and used it to give me some hints as to how memory speed has changed with the new platform.

I organized the data in terms of distance from the CPU. So first, we have performance of the on-die L2 cache of these two chips.  The Core Duo's L2 cache took 7.649ns to access, which translates into 14 clock cycles, a number that agrees with my ScienceMark results from previous articles. 

L2 Cache Latency - lmbench 2.5

The G5's L2 cache took 6.329ns to access, which at 1.9GHz, translates into a 12 cycle L2 - a slight performance advantage over the Core Duo.  Remember that the Core Duo's predecessor originally had a 10 cycle L2 cache, but thanks to the new power saving technology and some other unmentionable (for now) changes to the cache, Core Duo's L2 now takes 14 cycles to access.  Despite the greater access time, it's important to note that Core Duo's L2 cache is four times as large as the G5's. 
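The cycle counts quoted above are simply the measured latency multiplied by the clock speed (1GHz is one cycle per nanosecond). As a minimal sketch, using only the lmbench figures and clock speeds from this article (the function name is hypothetical):

```python
def latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert an access latency in nanoseconds to CPU clock cycles.

    A clock of N GHz completes N cycles per nanosecond.
    """
    return latency_ns * clock_ghz


# lmbench 2.5 L2 latencies and clock speeds as quoted in the article
core_duo_cycles = latency_cycles(7.649, 1.83)
g5_cycles = latency_cycles(6.329, 1.9)

print(f"Core Duo L2: {core_duo_cycles:.1f} cycles")  # Core Duo L2: 14.0 cycles
print(f"G5 L2:       {g5_cycles:.1f} cycles")        # G5 L2:       12.0 cycles
```

Note that the G5's lead in cycles is real but small: the two chips run at nearly the same clock speed, so the gap in wall-clock time is only about 1.3ns.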

The 1.83GHz Core Duo features a 64-bit wide 667MHz FSB, offering 5.336GB/s of bandwidth.  The FSB connects the chip to a 945 Express MCH with a dual channel DDR2-667 memory controller, which could provide up to 10.67GB/s of memory bandwidth. However, the Intel based iMac only ships with a single SO-DIMM installed, meaning that it is only operating in single-channel mode - delivering 5.336GB/s of memory bandwidth. I didn't have a DDR2 SO-DIMM on hand to test whether or not installing a second one would actually enable dual channel mode.

The 1.9GHz G5 features a bi-directional 64-bit wide 633MHz FSB, offering a total of 5.06GB/s of bandwidth.  The chip connects to a North Bridge that appears to have a dual channel DDR2-533 memory controller, but with only one channel active it provides just 4.264GB/s of memory bandwidth. Since that is less than the FSB can carry, the iMac G5 is slightly memory bandwidth starved.
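The bandwidth figures for both platforms come from the same arithmetic: bus width in bytes, times the transfer rate, times the number of active channels. A rough sketch, assuming the quoted MHz figures are effective transfer rates and using decimal gigabytes (the function and parameter names are hypothetical):

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: float,
                       channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB).

    bytes per transfer * million transfers per second * channels
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * channels / 1000


# Figures as quoted in the article
core_duo_fsb = peak_bandwidth_gbs(64, 667)            # 667MHz FSB
core_duo_mem_single = peak_bandwidth_gbs(64, 667)     # one DDR2-667 channel
core_duo_mem_dual = peak_bandwidth_gbs(64, 667, 2)    # both channels populated
g5_fsb = peak_bandwidth_gbs(64, 633)                  # 633MHz FSB
g5_mem_single = peak_bandwidth_gbs(64, 533)           # one DDR2-533 channel

print(f"Core Duo FSB:          {core_duo_fsb:.3f} GB/s")
print(f"Core Duo memory (1ch): {core_duo_mem_single:.3f} GB/s")
print(f"Core Duo memory (2ch): {core_duo_mem_dual:.3f} GB/s")
print(f"G5 FSB:                {g5_fsb:.3f} GB/s")
print(f"G5 memory (1ch):       {g5_mem_single:.3f} GB/s")
```

These are theoretical peaks only. The measured lmbench read and write speeds are lower, since latency and controller efficiency also play a part.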

In Johan's article, he uncovered that the G5 is in terrible need of a lower latency memory controller, with memory requests taking almost twice as long as on an Intel platform!  Whether it is the G5's FSB or its chipset's memory controller that is at fault is difficult to isolate, but needless to say, the comparison to the Core Duo isn't pretty:

Memory Access Latency - lmbench 2.5

It takes the G5 almost twice as long just to get data back from memory as the Core Duo. That means that the CPU has to waste around twice as many clock cycles as the Core Duo, which leads to higher power consumption and lower performance.  To make matters worse, the G5 only has a 512KB L2 cache, so it has to go to main memory more often than the Core Duo with its massive 2MB L2 cache.  While sticking with a 512KB L2 cache may have kept the CPU small, the G5 really needed a larger cache much earlier in its lifetime (either that or a better FSB/memory controller). 

The high latency memory access and slower memory bus are why the G5 suffers tremendously when it comes to memory bandwidth:

Memory Read Speed - lmbench 2.5

Memory Write Speed - lmbench 2.5

It is worth noting, though, that the G5 actually posts a higher memory write speed here than the Core Duo.  It's not easy to explain why, as it could very well be a compiler issue. Remember that here, we are relying on gcc 4.0 and not Intel's C compiler to extract the best performance out of each platform.  Over time, you can expect that to change, but for a first showing, it's not terrible.

35 Comments


  • Anand Lal Shimpi - Tuesday, January 31, 2006 - link

    Turning off one core leaves the full 2MB of cache for the other core to use since it is a shared L2.

    Take care,
    Anand
  • Eug - Tuesday, January 31, 2006 - link

    quote:

    Turning off one core leaves the full 2MB of cache for the other core to use since it is a shared L2.

    Take care,
    Anand

    Cool thanks.

    P.S. I have read elsewhere that the new iMac Core Duo uses less than half of the CPU's processing power to play back H.264 Hi-Def 1920x1080 video at a full 24 fps. If true, that's great, because my iMac 2.0 chokes on that. It plays back relatively smoothly, but only at about 12-15 fps.

    That bodes well for a future single-core Yonah Mac mini.

    Then again, probably not, considering that I suspect the iMac Core Duo does so well on H.264 playback because of its Radeon X1600. I'd doubt the Mac mini would get anything close to that any time soon.
  • Anand Lal Shimpi - Tuesday, January 31, 2006 - link

    Max CPU utilization (across both CPUs) when playing a 1080p stream scaled to fit the screen is about 60%, but it usually hovers below 50%. I am not sure whether or not the X1600's H.264 decode acceleration is taken advantage of (I doubt it), I'm trying to find out now. Also remember that on the PC side, the X1600 will only accelerate up to 720p.

    Take care,
    Anand

  • Anand Lal Shimpi - Tuesday, January 31, 2006 - link

    I just confirmed with ATI, the X1600's H.264 decode acceleration is currently not supported under OS X. ATI is working with Apple on trying to get the support built in, but currently it isn't there.

    Take care,
    Anand
  • Eug - Tuesday, January 31, 2006 - link

    quote:

    I just confirmed with ATI, the X1600's H.264 decode acceleration is currently not supported under OS X. ATI is working with Apple on trying to get the support built in, but currently it isn't there.

    Thanks again for the info. That's actually good news in a way. Things are looking up for that single-core Yonah Mac mini HTPC.
  • andrep74 - Tuesday, January 31, 2006 - link

    Isn't performance/Watt a function of the CPU, not the platform?
  • Kyteland - Tuesday, January 31, 2006 - link

    That picture of Jobs doesn't say "PC vs Intel" it says "PowerPC vs Intel". Jobs is just standing in the way. He's comparing the old mac to the new mac.
  • Calin - Tuesday, January 31, 2006 - link

    You could think about it that way - but in the end, the buyer is interested in the total energy consumption/heat production (as this is what he pays for, and what he must get rid of).
    Have you heard of the Toyota D4D engine? It has a record of 2.4 liters (less than a gallon) of diesel fuel used per hundred kilometers (60 miles). However, the same engine on a Land Cruiser 4x4 with all options will get you much less (four times less maybe).
    It isn't worth talking about performance per watt at the processor level, it is better at the platform level.
  • BUBKA - Tuesday, January 31, 2006 - link

    Were these benches done with a USB 2.0 device plugged in?
  • Furen - Tuesday, January 31, 2006 - link

    I was under the impression that Intel was blaming Microsoft for that, so that would not apply to OSX, though if the driver works perfectly for every platform except Napa I'd guess it's a hardware problem that MS will fix in software (which is well enough as long as it works). The power consumption difference is probably less than 10W anyway. It matters on a notebook but hardly matters with a desktop.
