Memory Performance

We'll start our look at the Mac Pro's performance with some low-level memory tests, since arguably the most controversial aspect of the Mac Pro is its use of Fully Buffered DIMMs.  For more information about FB-DIMMs, be sure to read our original article on the Mac Pro.

L2 Cache Latency

The G5 had a very quick 12-cycle L2 cache, which gives it a slight performance advantage over the 14-cycle L2 of the Xeons in the Mac Pro.  Access latency is only one part of the puzzle, however: the G5s benchmarked here had only a 512KB L2 cache (the G5 later got an upgrade to a 1MB cache), while the Xeons in the Mac Pro have a 4MB L2 cache per chip.  The G5's L2 is slightly faster in cycles, but the Xeon can reach higher clocks, which reduces its effective latency in nanoseconds, and its larger L2 holds more data.
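To make the cycles-versus-clock trade-off concrete, here is a minimal Python sketch that converts an L2 latency in cycles to nanoseconds. The 12-cycle and 14-cycle figures come from the text and 2.66GHz is the Mac Pro configuration tested; the 2.5GHz G5 clock is an assumption chosen purely for illustration, and the function names are hypothetical.

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz


def break_even_clock_ghz(fast_cycles: int, fast_clock_ghz: float, slow_cycles: int) -> float:
    """Clock at which the higher-cycle cache matches the lower-cycle one in ns."""
    return fast_clock_ghz * slow_cycles / fast_cycles


G5_L2_CYCLES = 12      # from the text
XEON_L2_CYCLES = 14    # from the text
XEON_CLOCK_GHZ = 2.66  # Mac Pro configuration used in this review
G5_CLOCK_GHZ = 2.5     # ASSUMPTION: illustrative value, not stated in this section

print(f"Xeon L2: {latency_ns(XEON_L2_CYCLES, XEON_CLOCK_GHZ):.2f} ns")
print(f"G5 L2:   {latency_ns(G5_L2_CYCLES, G5_CLOCK_GHZ):.2f} ns")
print(
    "Xeon clock needed to match the G5 in ns: "
    f"{break_even_clock_ghz(G5_L2_CYCLES, G5_CLOCK_GHZ, XEON_L2_CYCLES):.2f} GHz"
)
```

The break-even helper shows why clock speed matters here: a 14-cycle cache only needs a clock about 17% higher (14/12) than a 12-cycle cache to reach the same latency in nanoseconds.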

Memory Access Latency

And here we see the real killer with FB-DIMMs; although the Mac Pro boasts lower latency memory accesses than the PowerMac G5, it actually takes longer to access main memory than the Core Duo processor in the MacBook Pro.  This is much worse than it sounds once you take into account the fact that the MacBook Pro features a 667MHz FSB compared to the 1333MHz FSB (per chip) used in the Mac Pro. 

We can further put things in perspective by looking at memory latency under Windows XP, compared to Intel's Core 2 processor.  Remember that the Core 2 is identical to the Xeons in the Mac Pro; the difference is that its chipset uses regular DDR2 memory instead of DDR2-667 FB-DIMMs.  Note that for the Core 2 system in the comparison below we ran the memory at DDR2-667 with 5-5-5-15 timings as well as at DDR2-800 with 4-4-4-12 timings, to provide apples-to-apples as well as apples-to-fastest comparisons.

| CPU | Everest Latency | CPU-Z 1.35 Latency (8192KB, 256-byte stride) | Everest READ | Everest WRITE |
|---|---|---|---|---|
| Apple Mac Pro 2.66GHz (DDR2-667 FB-DIMM Quad Channel) | 100 ns | 87.4 ns | 4292 MB/s | 3759 MB/s |
| Apple Mac Pro 2.66GHz (DDR2-667 FB-DIMM Dual Channel) | 105.8 ns | 92.3 ns | 4141 MB/s | 3096 MB/s |
| Intel Core 2 Duo E6700 2.66GHz (DDR2-800 4-4-4-12 Dual Channel) | 59.9 ns | 52.8 ns | 7413 MB/s | 4859 MB/s |
| Intel Core 2 Duo E6700 2.66GHz (DDR2-667 5-5-5-15 Dual Channel) | 68.9 ns | 59 ns | 6782 MB/s | 4858 MB/s |

 

It's not Apple's fault, but FB-DIMMs absolutely kill memory latency; even running in quad channel mode, the FB-DIMM equipped Mac Pro takes 45% more time to access memory than our DDR2 equipped test bed at the same memory frequency.  Things don't get any prettier when we look at memory bandwidth either.
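The 45% figure can be checked directly from the latency numbers above. The following is a small Python sketch using only the measured values from the table; the dictionary keys and function name are hypothetical labels introduced for this example.

```python
# Measured latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "mac_pro_quad": {"everest": 100.0, "cpuz": 87.4},
    "mac_pro_dual": {"everest": 105.8, "cpuz": 92.3},
    "core2_ddr2_667": {"everest": 68.9, "cpuz": 59.0},
}


def extra_time_pct(slower_ns: float, faster_ns: float) -> float:
    """Percentage of additional time the slower system needs per access."""
    return (slower_ns / faster_ns - 1.0) * 100.0


baseline = LATENCY_NS["core2_ddr2_667"]
for system in ("mac_pro_quad", "mac_pro_dual"):
    for tool in ("everest", "cpuz"):
        penalty = extra_time_pct(LATENCY_NS[system][tool], baseline[tool])
        print(f"{system:13s} {tool:8s} +{penalty:.1f}% vs. Core 2 at DDR2-667")
```

The Everest quad channel result gives the roughly 45% penalty quoted above; the CPU-Z numbers and the dual channel configuration come out somewhat worse.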

Remember the overhead we were worried about with the serialization of parallel memory requests?  With four FBD channels, the best we're able to see out of the Mac Pro is 4.292GB/s, compared to the 6.782GB/s of bandwidth our dual channel Core 2 testbed is able to provide.  The efficiency table below says it all:

| CPU | Peak Theoretical Bandwidth | Everest READ | Efficiency |
|---|---|---|---|
| Apple Mac Pro 2.66GHz (DDR2-667 FB-DIMM Quad Channel) | 21.3GB/s | 4.292GB/s | 20% |
| Apple Mac Pro 2.66GHz (DDR2-667 FB-DIMM Dual Channel) | 10.67GB/s | 4.141GB/s | 38.8% |
| Intel Core 2 Duo E6700 2.66GHz (DDR2-800 4-4-4-12 Dual Channel) | 12.8GB/s | 7.413GB/s | 57.9% |
| Intel Core 2 Duo E6700 2.66GHz (DDR2-667 5-5-5-15 Dual Channel) | 10.67GB/s | 6.782GB/s | 63.6% |
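For readers who want to reproduce the efficiency figures above, here is a minimal Python sketch. It assumes each memory channel moves 8 bytes (64 bits) per transfer and that DDR2-667 runs at its nominal 666.67 MT/s, which reproduces the peak bandwidth figures in the table; the function names are hypothetical.

```python
BYTES_PER_TRANSFER = 8        # ASSUMPTION: 64-bit data path per channel
DDR2_667_MT_S = 2000.0 / 3.0  # nominal 666.67 mega-transfers per second
DDR2_800_MT_S = 800.0


def peak_gb_s(channels: int, mt_s: float) -> float:
    """Peak theoretical bandwidth in GB/s for a number of memory channels."""
    return channels * mt_s * BYTES_PER_TRANSFER / 1000.0


def efficiency_pct(measured_gb_s: float, peak: float) -> float:
    """Measured read bandwidth as a percentage of the theoretical peak."""
    return measured_gb_s / peak * 100.0


# (label, channels, transfer rate, Everest READ in GB/s from the table above)
SYSTEMS = [
    ("Mac Pro quad channel", 4, DDR2_667_MT_S, 4.292),
    ("Mac Pro dual channel", 2, DDR2_667_MT_S, 4.141),
    ("Core 2 DDR2-800", 2, DDR2_800_MT_S, 7.413),
    ("Core 2 DDR2-667", 2, DDR2_667_MT_S, 6.782),
]

for label, channels, rate, measured in SYSTEMS:
    peak = peak_gb_s(channels, rate)
    print(f"{label:22s} peak {peak:5.2f} GB/s, "
          f"efficiency {efficiency_pct(measured, peak):.1f}%")
```

Note that doubling the channel count doubles the theoretical peak while the measured read bandwidth barely moves, which is why the quad channel configuration shows the lowest efficiency.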

 

FB-DIMMs are simply not good for memory performance; the added capacity allowed by having 8 FB-DIMM slots on the Mac Pro had better be worth it, because if Apple were to release a Core 2 based Mac chances are that it could give the Mac Pro a run for its money in a number of memory sensitive tasks. 

96 Comments

  • tmohajir - Thursday, August 17, 2006 - link

    I had the same question. I was debating whether to buy 2 x 512MB or 1GB, and then thought it might affect performance if I went with the 1GB sticks. I think for now the best bet would be to buy 2 512s so that each branch has a channel with the same amount of memory. Then if I want to upgrade later, move all 4 512s to riser 1, and buy 2 1GB DIMMs when the price drops a little more and stick them in riser 2. So that way you still have 2GB total per branch.
  • dborod - Thursday, August 17, 2006 - link

    I decided to order my MacPro with 4 x 512 MB dimms so as to be able to fully utilize the available memory bandwidth. It seemed the easiest and safest approach for now.
  • Questar - Wednesday, August 16, 2006 - link

    Why use MP3 encoding for performance testing in a multi-CPU environment? MP3 encoding is not very threadable, and most likely is not threaded to any great extent in iTunes.
  • Griswold - Thursday, August 17, 2006 - link

    Somebody obviously has never used the multithreaded encoder of the LAME MT project. I see gains of up to 50% with that. Sure, that may not be relevant for a mac pro user, but it is proof that MP3 encoding benefits from SMT.
  • Questar - Thursday, August 17, 2006 - link

    Griswald strikes again.

    Yes I've heard of LAMEMT. So how's that sound quality you're getting without a bit reservoir? Pretty crappy I'll bet.
  • Griswold - Friday, August 18, 2006 - link

    Oh give me a break nutsack. Don't pretend you know what you're talking about here, as it doesn't match your first (false) post - you obviously never used LameMT. Disabling bit reservoir may come with a certain loss of quality, but it's not nearly as much as you (or the poster above) want to make it out to be. I'm willing to bet 95% of the people using mp3 won't notice the difference.

    I'm listening to the same song encoded with standard Lame and LameMT and the quality is virtually the same. Of course, you'll now say your ears are so much better, you got so much better audio equipment and what not.. but meh, it's just questdork talking.
  • michael2k - Saturday, August 19, 2006 - link

    The same 95% of the population who purchase tracks from the iTMS I bet :)
  • Questar - Friday, August 18, 2006 - link

    Thanks for the best laugh I've had all week!
    I really needed it!!
  • saratoga - Thursday, August 17, 2006 - link

    quote:

    Somebody obviously has never used the multithreaded encoder of the LAME MT project. I see gains of up to 50% with that.


    Yeah but you also lose quality, so very few people use it.

    quote:

    Sure, that may not be relevant for a mac pro user, but it is proof that MP3 encoding benefits from SMT.


    Not exactly. LAMEMT is only multithreaded when you use parts of the MP3 standard that can be multithreaded. Typical MP3 encoding as done by lame, itunes, xing, etc. simply cannot be multithreaded. LAMEMT can be multithreaded because it disables certain features that are incompatible and then implements software pipelining.

    The LAME devs have talked about trying to work around this problem in the past, but so far most people seem to think it's just not worth the effort because the speedup is much worse than just running two copies of LAME (which gives a 100% speedup versus the 50% you saw), and of course there are the unresolved questions about just how badly quality would be hurt by rewriting LAME profiles from scratch to use the approach in LAMEMT.
