As we mentioned before, the AMB interface is only 24 bits wide thanks to its high-speed serial nature, but there's far more detail to this bus than meets the eye. The AMB bus is split into a 14-bit read bus ("Northbound" lanes) and a 10-bit write bus ("Southbound" lanes), both operating at 6 times the DDR2 frequency (e.g. with DDR2-667 FB-DIMMs, the AMB links run at 667MHz x 6, or roughly 4GHz). Because reads and writes each have a dedicated bus, they can happen simultaneously, which increases performance in some circumstances. The read bus is a bit wider than the write bus because, more often than not, your system is reading from memory rather than writing to it.
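A minimal sketch of the link arithmetic above. It assumes the 6x multiplier is applied to the 667MHz figure and that each lane moves one bit per transfer; the constant names are illustrative, not taken from any specification.

```python
# Sketch: per-lane signaling rate and lane split of an FB-DIMM channel.
# Assumption: the 6x multiplier applies to the DDR2-667 rate, one bit per lane per transfer.
DDR2_RATE_MHZ = 667        # DDR2-667, quoted as 667MHz in the text
LINK_MULTIPLIER = 6        # serial links run at 6x the DDR2 rate
NORTHBOUND_LANES = 14      # read lanes
SOUTHBOUND_LANES = 10      # write lanes


def lane_rate_ghz(ddr2_rate_mhz: int = DDR2_RATE_MHZ,
                  multiplier: int = LINK_MULTIPLIER) -> float:
    """Per-lane transfer rate in GHz (equivalently Gb/s at one bit per transfer)."""
    return ddr2_rate_mhz * multiplier / 1000


total_lanes = NORTHBOUND_LANES + SOUTHBOUND_LANES
print(f"{total_lanes} lanes at ~{lane_rate_ghz():.1f}GHz each")
```

Run as-is, this prints "24 lanes at ~4.0GHz each".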

Neither bus has dedicated lines for addresses, commands and data; all three types of signals are sent over the same pins. In conventional parallel interfaces, the address of a memory request is placed on a dedicated set of address pins, and the data at that address is then placed on a separate set of data pins. With FBD, information is sent in packets or frames (much like network traffic); each frame generally consists of either address/control signals or command and data signals. The frames are 15 bytes long for writes and 21 bytes long for reads, but not all of that is raw data: some of it is ECC data, which we don't normally count when comparing bandwidths, so we'll have to strip it out.
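The 21-byte and 15-byte frame sizes fall straight out of the lane counts if, as the next paragraph states, each frame spans 12 cycles and every lane carries one bit per cycle. A minimal sketch of that arithmetic; it does not model how the bits are divided between data, ECC and other overhead.

```python
# Frame size = lanes x cycles per frame, converted from bits to bytes.
# Assumption: every lane carries one bit per cycle for all 12 cycles of a frame.
CYCLES_PER_FRAME = 12


def frame_bytes(lanes: int, cycles: int = CYCLES_PER_FRAME) -> int:
    """Total frame size in bytes, overhead included."""
    return lanes * cycles // 8


northbound = frame_bytes(14)   # read frame: 168 bits
southbound = frame_bytes(10)   # write frame: 120 bits
print(northbound, southbound)  # prints: 21 15
```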

For northbound traffic (reads), each frame is 12 cycles long, and a frame that contains data can carry at most 16 bytes of it, which puts peak read bandwidth with DDR2-667 FB-DIMMs at 5.34GB/s. For southbound traffic (writes), each frame is still 12 cycles long but carries only 8 bytes of data, giving us a peak write bandwidth of 2.67GB/s.
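Those two peak figures can be reproduced by dividing the ~4GHz link rate into 12-cycle frames and multiplying by the data payload per frame. A minimal sketch, assuming every frame in a given direction carries its maximum payload (which, as discussed next, real traffic does not):

```python
# Peak payload bandwidth per direction for one DDR2-667 FB-DIMM channel.
# Assumption: every frame in the direction carries its maximum data payload.
LANE_RATE_HZ = 667e6 * 6                               # ~4GHz link rate
CYCLES_PER_FRAME = 12
FRAMES_PER_SECOND = LANE_RATE_HZ / CYCLES_PER_FRAME   # ~333.5 million frames/s


def peak_gb_per_s(data_bytes_per_frame: int) -> float:
    """Peak data bandwidth in GB/s (10^9 bytes per second)."""
    return FRAMES_PER_SECOND * data_bytes_per_frame / 1e9


read_bw = peak_gb_per_s(16)    # northbound: up to 16 data bytes per frame
write_bw = peak_gb_per_s(8)    # southbound: 8 data bytes per frame
print(f"read {read_bw:.2f}GB/s, write {write_bw:.2f}GB/s")
```

This prints "read 5.34GB/s, write 2.67GB/s", matching the figures above.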

Total data bandwidth then weighs in at just over 8GB/s for a single channel, but keep in mind that not every frame is a data frame, so effective bandwidth will be noticeably lower. This touches on one of the major drawbacks of serial buses: they carry more overhead than parallel buses. Although there is more total peak bandwidth on the AMB bus than between the AMB and its DDR2 devices (8GB/s vs. 5.34GB/s), the AMB bus may actually deliver less read bandwidth once you factor in the overhead of the serial link. Peak write bandwidth is of course lower still, but writes take much longer to complete and generally never reach peak bandwidth numbers anyway. At the end of the day, despite the best efforts of its designers, there may be situations where an FBD system is actually bandwidth limited by its AMBs. How often those situations occur and what the average performance impact is are, unfortunately, very complicated questions, and both are beyond the scope of this already long article.
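To make the overhead argument concrete, the sketch below compares the combined link bandwidth with the DDR2 side of the AMB and then asks what happens if only some northbound frames carry data. It assumes the DDR2 devices behind the AMB form a standard 64-bit (8-byte) interface; the 75% data-frame share is a purely hypothetical input, not a measured value.

```python
# Combined link bandwidth vs. the DDR2 side of the AMB, plus a what-if on frame mix.
# Assumptions: DDR2-667 behind the AMB is a 64-bit (8-byte) interface, and the
# share of frames carrying data is a hypothetical input, not a measured value.
FRAMES_PER_SECOND = 667e6 * 6 / 12
READ_PEAK = FRAMES_PER_SECOND * 16 / 1e9    # ~5.34GB/s northbound
WRITE_PEAK = FRAMES_PER_SECOND * 8 / 1e9    # ~2.67GB/s southbound
DDR2_PEAK = 667e6 * 8 / 1e9                 # ~5.34GB/s between AMB and DRAM


def effective_read(data_frame_share: float) -> float:
    """Read bandwidth if only this share of northbound frames carry data."""
    return READ_PEAK * data_frame_share


print(f"link total {READ_PEAK + WRITE_PEAK:.2f}GB/s vs DDR2 {DDR2_PEAK:.2f}GB/s")
print(f"reads with 75% data frames: {effective_read(0.75):.2f}GB/s")
```

Any data-frame share below 100% leaves the northbound link delivering less than the 5.34GB/s the DDR2 devices could supply, which is the scenario described above.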

The FBD proposition gets a little less appetizing when you look at the other major aspect of memory performance: latency. Because the protocol relies on point-to-point links between AMBs, there's an additional latency penalty for each AMB a request has to pass through on its way to the FB-DIMM that will fulfill it. Intel states that the additional delay is in the range of 3 - 5 ns per FB-DIMM, meaning that a configuration of 8 x 1GB FB-DIMMs will have a higher worst-case latency than 4 x 2GB FB-DIMMs (on a four-channel controller like the Mac Pro's, the former puts two modules on each channel). The argument in favor of FBD is that even though you give up some latency, you make up for it with the ability to cram more memory channels onto your memory controller and to support configurations with more DIMMs.
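A rough model of the 8 x 1GB vs. 4 x 2GB comparison, assuming Intel's 3 - 5 ns figure applies once per position in the chain and that DIMMs are spread evenly over four channels. Both assumptions are simplifications; the function names are hypothetical.

```python
# Worst-case latency added by daisy-chained AMBs.
# Assumptions: Intel's 3-5ns per FB-DIMM applies per chain position, and DIMMs are
# spread evenly over the four channels of the 5000-series chipset.
import math

PER_DIMM_NS = (3.0, 5.0)   # low and high ends of Intel's stated range
CHANNELS = 4


def chain_depth(num_dimms: int, channels: int = CHANNELS) -> int:
    """Position of the farthest FB-DIMM on the most heavily loaded channel."""
    return math.ceil(num_dimms / channels)


def added_latency_ns(num_dimms: int) -> tuple:
    depth = chain_depth(num_dimms)
    return (depth * PER_DIMM_NS[0], depth * PER_DIMM_NS[1])


for config, n in (("8 x 1GB", 8), ("4 x 2GB", 4)):
    low, high = added_latency_ns(n)
    print(f"{config}: depth {chain_depth(n)}, +{low:.0f} to {high:.0f}ns worst case")
```

Under these assumptions the 8-DIMM configuration adds 6 to 10ns in the worst case, versus 3 to 5ns for four DIMMs.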

There's one more issue worth discussing: power consumption. The AMB on each FB-DIMM has a pretty big job, converting 4GHz serialized memory requests into 667MHz parallel requests that regular DDR2 memory devices can service. This translation consumes quite a bit of power, so the AMB dissipates a noticeable amount of heat. Apple's Mac Pro web page says the following about the FB-DIMMs it uses:

"To help dissipate heat, every Apple DIMM you purchase for your Mac Pro comes with its own preinstalled heat sink. This unique heat sink lets fans run slower - and quieter - yet keeps the memory cool enough to run at full speed."

That heatsink is made necessary by the AMB on each FB-DIMM, which seems to dissipate somewhere between 3 and 6W. The range exists because how active an AMB is depends on how close it sits to the memory controller: the first AMB in the chain has to handle every request from the memory controller, passing them along as needed, while the last AMB in the chain only receives the requests aimed at its own module. With 8 FB-DIMM slots in the Mac Pro, you're looking at up to another ~40W of power if you've got all slots populated.
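A back-of-the-envelope check on the ~40W figure. It assumes the Mac Pro's 8 slots sit on four channels (two FB-DIMMs per chain), that the busier first AMB on each channel draws the top of the 3 - 6W range and the last AMB the bottom. These per-position wattages are our assumption, not measurements.

```python
# Back-of-the-envelope AMB power for a fully populated Mac Pro.
# Assumptions: 8 slots on 4 channels (two FB-DIMMs per chain); the first AMB on
# each channel sits at the top of the 3-6W range and the last at the bottom.
CHANNELS = 4


def total_amb_watts(per_position_watts: list, channels: int = CHANNELS) -> float:
    """Sum AMB power across all channels, given watts for each chain position."""
    return channels * sum(per_position_watts)


estimate = total_amb_watts([6.0, 3.0])   # busiest first AMB, quieter last AMB
upper = total_amb_watts([6.0, 6.0])      # every AMB at the top of the range
print(f"~{estimate:.0f}W estimated, {upper:.0f}W upper bound")
```

This prints "~36W estimated, 48W upper bound", bracketing the ~40W estimate.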

Despite the lower pincount bus, current FB-DIMMs use the same number of pins as DDR2 DIMMs. That's because each AMB needs two sets of buses, one to communicate with the FB-DIMM before it and one to communicate with the module after it, so each AMB needs approximately 120 signaling pins. Once you add power and ground pins, not to mention pins reserved for future use, you're not far off the 240 pins used on current desktop DDR2 DIMMs. Rather than introducing a brand new connector and module design, FB-DIMMs simply reuse the current DDR2 DIMM design, keyed differently so they only fit FB-DIMM slots. Remember that the signal routing from the chipset to the first memory slot still uses only 69 signaling pins, since the chipset doesn't have to communicate with anything "before" it in the chain, so you do still get the benefits of a lower pincount interface.


Front and back of a FB-DIMM

The major benefit the Mac Pro seems to get from FB-DIMMs is that its memory bus and FSBs offer identical peak bandwidths of 21.3GB/s (ignoring the unknowns we discussed earlier about the efficiency of FBD). By using a lower pincount interface, Intel was able to fit four FBD channels on its 5000 series chipset and thus offer the bandwidth equivalent of a 256-bit wide DDR2 memory controller. However, the additional memory bandwidth comes at the high cost of additional latency, higher power consumption and more expensive DIMMs.
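The 21.3GB/s match can be checked with simple arithmetic. This sketch assumes two independent 64-bit FSBs at 1333MT/s (the Xeon 5100 configuration) and 5.336GB/s of peak read bandwidth per FBD channel from the frame math; it counts read bandwidth only.

```python
# Aggregate FB-DIMM read bandwidth vs. aggregate FSB bandwidth.
# Assumptions: two independent 64-bit FSBs at 1333MT/s, and peak read bandwidth
# per DDR2-667 FBD channel taken from the frame math (~5.336GB/s).
FBD_CHANNELS = 4
READ_GB_PER_CHANNEL = 667e6 * 6 / 12 * 16 / 1e9   # ~5.336GB/s
FSB_COUNT = 2
FSB_GB_EACH = 1333e6 * 8 / 1e9                     # 8 bytes per transfer

memory_total = FBD_CHANNELS * READ_GB_PER_CHANNEL
fsb_total = FSB_COUNT * FSB_GB_EACH
bus_width_bits = FBD_CHANNELS * 64   # each channel stands in for a 64-bit DDR2 bus
print(f"memory {memory_total:.1f}GB/s, FSBs {fsb_total:.1f}GB/s, "
      f"{bus_width_bits}-bit equivalent")
```

This prints "memory 21.3GB/s, FSBs 21.3GB/s, 256-bit equivalent".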

There are a couple of things you can do to maximize performance and minimize the cost of additional memory on your Mac Pro, and it starts with the number of FB-DIMMs you configure your system with. The Mac Pro ships with a default configuration of 2 x 512MB FB-DIMMs; unfortunately, that means you're only using two of the four available memory channels, which cuts your peak theoretical memory bandwidth in half. You'll want to upgrade to at least four FB-DIMMs so that you can run in quad-channel mode. In the coming weeks we'll be running some tests to figure out exactly how much additional performance you gain by doing that, and whether it's noticeable.
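A small sketch of why two FB-DIMMs halve peak bandwidth. It assumes each new module lands on an unused channel until all four are occupied, and reuses the ~5.336GB/s per-channel read figure; the helper names are hypothetical.

```python
# Peak read bandwidth as a function of how many FB-DIMMs are installed.
# Assumption: modules fill empty channels first, up to the chipset's four channels.
CHANNELS = 4
READ_GB_PER_CHANNEL = 5.336   # DDR2-667 FB-DIMM peak read bandwidth


def active_channels(num_dimms: int) -> int:
    return min(num_dimms, CHANNELS)


def peak_read_gb(num_dimms: int) -> float:
    return active_channels(num_dimms) * READ_GB_PER_CHANNEL


for n in (2, 4, 8):
    print(f"{n} DIMMs -> {active_channels(n)} channel(s), {peak_read_gb(n):.1f}GB/s")
```

Beyond four modules, extra DIMMs add capacity but no peak bandwidth, only another position in each chain.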

If you find yourself about to fill all 8 memory slots on the Mac Pro, we would suggest moving to 4 higher-density modules instead. Remember that each FB-DIMM hop adds at least another 3 - 5ns of latency, so the fewer FB-DIMMs you have, the lower your worst-case memory latency will be. But since you still want to run in quad-channel mode, you don't want to drop below four FB-DIMMs, making four the magic number for the Mac Pro.

As always, Apple's pricing for memory upgrades is much higher than what you can get elsewhere. We are going to try to test memory compatibility once our Mac Pro arrives, but there's no reason FB-DIMMs that work on current Xeon motherboards shouldn't work in the Mac Pro. We would recommend holding off on ordering a Mac Pro with any of Apple's memory upgrades until we can verify that 3rd party memory works. If it does, the table below gives you an idea of the possible savings:

 
Memory Upgrade    Apple's Price    Newegg's Price
2 x 512MB         $300             $210
4 x 1GB           $1100            $676
8 x 1GB           $2500
4 x 2GB           $2700
8 x 2GB           $5700

The prices above are for Kingston ECC DDR2-667 FB-DIMMs, which may or may not work in the Mac Pro. We will find out for sure in the coming weeks, but the price differential is large enough that you may want to hold off on ordering a lot of memory in case you can get it much cheaper elsewhere.
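For the two configurations where both prices are listed, the savings work out as follows. This is a rough sketch: Apple's figure may be an upgrade price rather than the full cost of the modules, so the percentages are only indicative.

```python
# Savings implied by the price table, for the two rows that list both prices.
# Caveat: Apple's figure may be an upgrade price, so this is only a rough comparison.
PRICES_USD = {
    "2 x 512MB": (300, 210),   # (Apple, Newegg)
    "4 x 1GB": (1100, 676),
}

for config, (apple, newegg) in PRICES_USD.items():
    saved = apple - newegg
    print(f"{config}: save ${saved} ({saved / apple:.0%})")
```

This prints savings of $90 (30%) for 2 x 512MB and $424 (39%) for 4 x 1GB.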


33 Comments


  • michael2k - Friday, August 11, 2006

    fb dimms, found in Mac Pros, are fast serial ram using DDR chips.
  • OddTSi - Friday, August 11, 2006

    Perhaps you missed the part where I said "non-ad hoc."

    I know what FB-DIMMs are, but they're more of a band-aid fix or a hack than a ground-up design.
  • michael2k - Friday, August 11, 2006

    Maybe you misused "ad hoc". Ad hoc means unplanned and temporary. Why do you think fb-dimm is a band-aid or a hack? Because the RAM chips themselves are not serial in nature?

    I mean, are you asking "Is there any designs or plans for serial memory chips?"

    To be cost effective you either have to use existing infrastructure, or create a logical evolution/adaptation of the existing infrastructure.
  • AdvanS13 - Thursday, August 10, 2006

    does anyone know apples market segment share for dual processor workstations?
  • peternelson - Thursday, August 10, 2006


    1) I think a gpu swap will need drivers or firmware updating.

    2) To buy a commodity sata drive is good but it MIGHT require the apple carrier in order to fit into the chassis.

    3) You compare apple memory with commodity FBDIMM.
    In the table you quote Apple's UPGRADE (ie on top of base machine) price against the complete cost of the memory. This makes Apple's pricing appear better than it is. Even then it looks like a ripoff, but also consider they are charging you for the base memory in with the basic system price.

  • aliasfox - Friday, August 11, 2006

    As far as I've read, the Mac Pros come with carriers in all four bays - carriers that don't need cables (ribbon or round). Didn't know the backs of SATA drives were similar enough that they could just be plugged in.
  • JeffDM - Saturday, August 12, 2006

    It's not stated in the Anand article, but all drive carriers are included. Apple's Tech Specs page says it, although it could have been more clearly stated. For what it's worth, I think it is worth downgrading the stock drive to 160GB and spending that difference toward additional drives. Going from 250GB to 160GB saves $75, that price difference would buy you a 250GB SATAII drive.
  • JAS - Thursday, August 10, 2006

    It appears that some people managed to receive their Mac Pro quickly.

    http://www.macworld.com/weblogs/macword/2006/08/ma...
  • IntelUser2000 - Wednesday, August 09, 2006

    http://www.tomshardware.co.uk/2006/06/26/xeon_wood...

    Check out the memory bandwidth benchmark. Quad channel is needed to match Core 2 systems' memory bandwidth using only dual channel. Dual channel on Xeon 5100 drops to approximately 68% of the quad channel bandwidth. That in numbers is 3.8GB/sec. Not to mention Xeon 5100 series has 25% higher memory FSB. It needs 25% higher FSB and 2x memory channels to achieve the same memory bandwidth numbers the desktop Core 2's can. According to memory latency benchmarks, the latency is also significantly higher on the Woodcrest than Conroe's platform.

    The chipset on the Xeon 5100 is worse in performance than the chipset on the Core 2. It will NOT beat Core 2 because of the 25% higher FSB, it will rather be SLOWER. Not to mention FB-DIMM makes it even slower.

    SpecFP benchmarks also support this:
    Xeon 5160(3GHz/1333MHz FSB/4MB L2/8x1024MB FB-DIMM DDR2-667): 2775
    Core 2 Extreme X6800(2.93GHz/1066MHz FSB/4MB L2/2x1024MB DDR2-800 5-5-5-15): 3046

    Core 2 Extreme scores almost 10% higher in the memory subsystem portion of the SPEC CPU2000 benchmark, even though it has 2.2% less clock speed than the Xeon 5160.

    Look here: http://www.anandtech.com/IT/showdoc.aspx?i=2772&am...

    "ScienceMark didn't agree completely and reported about 65-70 ns latency on the Opteron system and 70-76 ns (230 cycles) on the Woodcrest system. We have reason to believe that Woodcrest's latency is closer to what LMBench reports: the excellent prefetchers are hiding the true latency numbers from Sciencemark. It must also be said that the measurements for the Opteron on the Opteron are only for the local memory, not the remote memory."

    Xeon 5160 got 70-76ns in ScienceMark, what did Core 2 get?? It got 36.75. Xeon 5160's ScienceMark latency is higher than Pentium Extreme Edition 965's latency, and twice the latency of Core 2.

    Everest shows the same thing: http://pc.watch.impress.co.jp/docs/2006/0801/graph...

    Xeon 5160: 99.1
    Opteron 285: 57.7(seems higher than FX-62 results but this system uses Registered DDR DIMM, you can see in AT's results that AM2 further lowers latency)

    Core 2 Extreme: 59.8
    http://www.anandtech.com/cpuchipsets/showdoc.aspx?...
  • dcalfine - Wednesday, August 09, 2006

    Overall, I think this is a very well-designed system, and in price comparisons with Dell, the Mac Pro came out over a thousand dollars cheaper for a similar system. I may be a fanboy, but I can admit that Apple still has some work to do here. As good as the Mac Pro is, I think Apple needs to start having better video options. For starters, the X500 chipset is used, which means that there's only one 16X PCIe lane. Also, Apple should get closer with Nvidia and start working in SLI, as well as FX4500X2 and FX5500. A Vanilla FX4500 just doesn't make the cut anymore. Also, the X500 chipset supports one 133X PCIX slot, which, I think, Apple should have incorporated, since not every expansion card has moved to the PCIe format.

    I'd like to see some speed comparisons between the mac pro and some pcs. I imagine that in most (if not all) test the Mac Pro will come out slightly slower than the PC due to the bells and whistles of Mac OS X, but I'd like to see just how much slower it runs, and how it runs in Boot Camp running Windows/Linux.

    But, yeah. Good goin', Apple!
    And AnandTech, get your hands on one of these ASAP!
