The Clock Crossing Procedure and MCH Read Delay (tRD)

Have you ever stopped to wonder how the Memory Controller Hub (MCH) manages to transfer data from the system bus (FSB) to the memory bus and vice versa? Maybe you questioned why it was not possible to run your memory at any speed you desired, independent of the selected FSB. While there have been some chipsets that provide this capability, performance was rather spotty at times depending on the particular settings in use. In fact, these systems often ended up with rather large ranges of bus speed to memory speed ratios that refused to function, stable or otherwise. The concept was certainly fair enough, but at the time a well-engineered implementation turned out to be far more difficult than anyone may have been willing to concede. It should not come as a surprise that Intel chipsets have only ever allowed the use of certain validated memory ratios - those same numbers we have grown to love (or hate) like 1:1, 5:4, 3:2, and so forth.

Back in the days of relatively slow-paced memory performance improvements, Intel had no problems pumping out processors with default FSB values that made running memory at synchronous speeds an expensive proposition. Thus they created the "down" divider that allowed the memory subsystem to run at a bus speed below that of the system bus. This allowed for a period of innovation where processor bus architecture was able to improve without placing an undue burden on the memory industry to create reasonably priced modules capable of these increased speeds.

In a lot of ways this also prevented cost-prohibitive aspects of validating higher data transfer rates from ultimately affecting the market as a whole. Because Intel was not obligated to allocate additional funding for the completion of this expensive process, the savings (however small) could be passed on to the consumer. When coupled with the added benefit of allowing for the purchase and use of lower-binned memory, we can easily see why a move such as this helped stimulate the industry into a period of unprecedented growth.

Today's market is much different. Older signaling technologies like Assisted Gunning Transceiver Logic (AGTL+), used almost exclusively by Intel in their monolithic CPU-to-MCH interface architecture, have just about reached their practical limits. As such, "up" dividers, those that allow for memory bus speeds in excess of the system bus speed, are used almost exclusively by high-performance systems. Indeed, it would be hard to imagine ever being satisfied with a system able to make use of memory operating at transfer rates of up to DDR2-667, and no higher. As odd as it may seem, the next generation of memory (DDR3) is based entirely on the concept of running at speeds mandating the use of at least a 2:1 memory bus to system bus frequency ratio.

When examined from this aspect it becomes clear what DDR3 really is - an early look at the next generation's mainstream memory component. Only when Intel has removed the performance barrier of their antiquated FSB and loosely-coupled MCH will we be fully able to finally experience the real improvements that DDR3 promises to provide. But we digress - speculating about what could have been done better or things to come is trite. Instead, let's concentrate on what can be done now to improve the collective performance of our systems.

Below is a somewhat basic representation of the problem at hand (and later we will introduce the fairly elegant solution that Intel has crafted in order to tackle said problem). Are you still wondering what the problem is? Put briefly: designing an effective means for transferring data from one bus "domain" to the other, while still respecting the timing requirements for each bus.



We made mention earlier that we believe Intel's solution to be quite elegant. While this is true, the system is still far from perfect as it introduces a significant amount of extra data access latency to each and every memory read request. It might seem ironic that the design initially constructed in order to allow for higher memory speeds ends up being the very reason for the additional latency penalty incurred, but this is the case. Coincidentally, we can apply a similar accounting of the system's flaws when explaining why actualized system performance improvements are always far less than predicted when making use of the larger dividers.

Moving data between two simultaneously running parallel buses requires a fine level of timing control and a fair amount of programmable logic to be successful, especially if the scheme is to remain flexible. The graphic below should help to illustrate the concept. Four memory data clock cycles for every three system data bus cycles (a 4:3 DRAM:FSB memory ratio) means that the MCH can do little more than queue up data requested from the memory for transport to the system bus.

This process works to ensure that no FSB cycles are left unused, but it does little to improve actual memory throughput. Because of this, the gains seen are largely due to the increased transfer efficiency. For the same reason, sustained data rates, while improved, come at the expense of the added latency associated with the transfer scheme. This has the effect of retarding the responsiveness of short bursts of random read activity. Some users [Ed: Androids in disguise?] even swear that they can tell when they are using an Intel system because of the suppression of the "snappy" experience that comes when using true low-latency systems.
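To make the divider arithmetic concrete, here is a minimal Python sketch. It assumes only that the memory clock equals the FSB clock multiplied by the DRAM:FSB ratio and that DDR memory transfers data twice per clock. The function names are our own, and the last function is a purely illustrative timing model (ideal, phase-aligned clocks), not a description of how the MCH is actually implemented.

```python
from fractions import Fraction
from math import ceil


def memory_clock_mhz(fsb_mhz, dram, fsb):
    """Memory bus clock implied by an FSB clock and a DRAM:FSB ratio."""
    return fsb_mhz * Fraction(dram, fsb)


def ddr_data_rate(mem_clock_mhz):
    """DDR memory transfers data twice per memory clock."""
    return 2 * mem_clock_mhz


def waits_until_next_fsb_edge(dram, fsb):
    """Toy model: for each memory clock edge in one repeating window,
    the wait (in FSB clock periods) until the next FSB clock edge.

    Assumes ideal clocks that start perfectly aligned at time zero.
    """
    period = Fraction(fsb, dram)  # one memory clock, measured in FSB periods
    # The pattern repeats after `period.denominator` memory clocks.
    return [ceil(k * period) - k * period for k in range(period.denominator)]


if __name__ == "__main__":
    mem = memory_clock_mhz(400, 3, 2)  # 400 MHz FSB with a 3:2 divider
    print(float(mem), float(ddr_data_rate(mem)))
    print(waits_until_next_fsb_edge(4, 3))  # the 4:3 case discussed above
```

In this simplified model only one memory clock edge in each 4:3 window lines up with an FSB edge; data arriving on the others would have to be held for a fraction of an FSB cycle, which is one way to picture why the crossing requires queuing.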



In order to better quantify exactly how the MCH clock crossing procedure influences system performance, it might be helpful to take a step back and examine the different component parameters that contribute to a system's overall memory read (access) latency. The diagram below helps to illustrate just this point: we have identified each distinct operation that takes place during a CPU data request, and the sum of the individual latencies involved with each part of the process makes up the total system memory access latency, as measured. As we will see, this time - expressed in nanoseconds (ns) - is critical to overall system performance.



Although the user has at least minor control over just about every aspect of the system as modeled above, there are a few settings that when altered have the potential to drastically impact performance for the better (or worse, depending on how well you understand what you are doing). One of the most well known is the selection of primary memory timings (tCL-tRCD-tRP-tRAS). You may be surprised to learn that the effect of properly setting these values, represented by part "E", is quite insignificant when compared to your choice of tRD (the singular consideration in developing the latency associated with part "F"). This will be our focus for the remainder of our discussion. We have already lightly touched on this mysterious setting on the prior page.
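Because both settings are entered as cycle counts, it can help to convert them to nanoseconds before comparing them. The short Python sketch below does only that conversion. It assumes that tRD is counted in FSB (MCH) clock cycles and tCL in memory clock cycles; the 400MHz FSB, 3:2 divider, tRD of 5 and tCL of 5 are example values chosen for illustration, not measurements, and the function name is our own.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count at a given clock (in MHz) to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


# Example values (assumptions for illustration only).
fsb_mhz = 400.0              # FSB clock
mem_mhz = fsb_mhz * 3 / 2    # memory clock with a 3:2 DRAM:FSB divider

trd_ns = cycles_to_ns(5, fsb_mhz)  # tRD, assumed to be counted in FSB clocks
tcl_ns = cycles_to_ns(5, mem_mhz)  # tCL, counted in memory clocks

if __name__ == "__main__":
    print(f"tRD = {trd_ns:.2f} ns, tCL = {tcl_ns:.2f} ns")
```

Under these assumptions each step of tRD is worth one full FSB clock period, while each step of tCL is worth only the shorter memory clock period, so a one-step change in tRD moves the total by more time than a one-step change in tCL.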


73 Comments


  • DragonStefan - Tuesday, June 09, 2009 - link

    Hello all.

    I have:
    - motherboard: ASUS Rampage Formula (Intel X48) (logical) and
    - Corsair XMS2 Dominator Series 2x2048MB Kit PC2-8500 CL5-5-5-15 (TWIN2X4096-8500C5D)

    Should i go for the following setup in bios:
    FSB: 400
    tRD: 5
    Trd: 12,5
    Divider: 3:2
    tCL: 5
    VDDR: High
    Allowed: Yes.

    Or should i go for a different setup?
    If i understand correctly, this is possible..
    What do i forget?
    I made the calculation, and the answer of the Question if it is possible Yes or No, is 1,67 > 1,33. 1,67 is higher than 1,33. So yes..

    Greets From DS
  • danderson00 - Thursday, October 23, 2008 - link

    Hi,

    I realise this article is quite old now, but found it very useful for tuning my Rampage Formula. Have achieved significantly increased memory performance from this setting. The board seems to configure them fairly well on the auto setting, but there are some cases where manually tweaking them can give a good performance boost.

    I am curious about one thing - I would have thought that running a 1:1 divider would allow the lowest tRD value as the two clocks are running at the same speed. Data should be able to be passed between the two buses without delay, whereas if the memory clock is running faster, it might need the delay to prevent 'overlapping' with the previous data transfer. However, according to the formula (and indeed a couple of quick tests confirm it), a 1:1 divider is actually the worst for tRD, the wider the ratio the better.

    Any ideas why this is?

    Great article anyways!

    Dale
  • Maxxxx - Sunday, June 14, 2009 - link

    Yes, you are right about 1:1 divider and tRD. This article incorrectly describes work of the memory controller.
  • geok1ng - Sunday, August 03, 2008 - link

    I have a P5WDH a 975X mobo. if i am understand correctly this chipset would apply the TRD from the basic table and my best options would be a Trd of 6 or 8? Is there any way of knowing what Trd number is being applyed? I am running an E4300 at 9x329Mhz and 4 1GB sticks of DDR1100 at 987Mhz Cas 5/6/6/18/21. everest gave me a memo latency of 55.5ns ( better than quite a few 45nm/P35 owners here). Any use going for the Trd 6 option (8:5 divider i believe) since neither my my mobo can reach FSB above 1333 nor my memo can go above 1000mhz and keeping CAS 5 ( it is rated at cas 5/7/7/25/32 but the P5WDH just cant go above 5/6/6/6/18/21). Using a 8:5 divider bellow 1000Mhz memory mean runing the CPU at 2,7Ghz...and using crazy DDR/MCH voltages.
  • Sarsbaby - Wednesday, July 16, 2008 - link

    Wow, I just learned alot, I think.
    Very nice article! Well written and presented.
    I'll definately have to clear my CMOS for this one.
  • jamstan - Friday, July 11, 2008 - link

    I would have liked a review of the board itself instead of page after page about clocking. I have this board ready to build my rig today with 2 4870s in CF and I would have liked to read about the crossfire setup, the sound card, etc instead of page after page about clocking. Althou informative I feel the review should have remained focused on the board itself and the clocking crap should have been in a different article. It's a nice feature on this board but its like doing a review of a Corvette and wasting the whole review on its transmission.
  • Sarsbaby - Wednesday, July 16, 2008 - link

    You know, this is only one of many reviews for this board, and only one of many on this forum.
    Try some more searching, and maybe educate yourself more before calling most of this article "Crap". This is probably one of the most useful articles on this motherboard I have found.
    With all these new options open to ROG owners, i'm glad someone is taking the time to explain what they mean and why we have paid for them.

    And have you ever re-built a transmision? Or tuned an LSD? It's alot more complicated than you think apparently.
  • DEFLORATOR - Tuesday, May 27, 2008 - link

    Why does the author says that the board revision is 1.03G while it is clearly seen on the photoes that it's 1.00G (imprinted between PCIe slots)? Please owners of the board confirm that 1.00G is the latest revision of Rampage Formula (gonna order that tomorrow)
  • viqarqadir2 - Monday, April 21, 2008 - link

    Hello
    I am very new to this stuff and havent been able to make a lot of sense of the configurations despite reading the article several times.

    I have the following setup:
    Intel Q6600@2.4 Ghertz
    Kingston Ram 8500 (5.5.15) 1X4 Gigs - 1066Mh
    XFX Geforce 8800GTX XXX edition. (I guess this doesnt matter)

    What sort of configuration should I apply?

    I also wanted to know if someone has had problems with the MB temperature and whether 51 Centigrades after playing STALKER for about one hour is normal. Any help will be appreciated.
  • viqarqadir2 - Thursday, April 24, 2008 - link

    hmm...
    I dont know if I've done something wrong but for some reason, 3dMark is showing the memory at 1.9 Ghertz. It's a DDR2 rated at 1066 and I am running it at (according to my calculation) 1000.
    The pc feels ridiculously fast. All MB lights are green. The 3d Mark app is giving a score of about 11000. I am not a techie but is it possible that I have discovered something? Is there a way to post screenshots in the comments area?
