Real-World Results: What Does a Lower tRD Really Provide?

Up until this point we have spent a lot of time writing about the "performance improvement" available by changing just tRD. First, let's define the gain: lower tRD settings result in lower associated TRD values (at equivalent FSB clocks), which allow for a lower memory read latency time, ultimately providing a higher memory read speed (MB/s). Exactly how a system tends to respond to this increase in available bandwidth remains to be seen, as this is largely dependent on just how sensitive the application/game/benchmark is to variations in memory subsystem performance. It stands to reason that more bandwidth and lower latencies cannot possibly be a bad thing, and we have yet to encounter a situation in which any improvement (i.e. decrease) in tRD has ever resulted in lower observed performance.

EVEREST - a popular diagnostics, basic benchmarking, and system reporting program - gives us a means for quantifying the change in memory read rates experienced when directly altering tRD though the use of its "Cache & Memory Benchmark" tool. We have collected these results and present them below for your examination. The essential point to remember when reviewing these figures is that all of this data was collected using memory speeds and settings well within the realm of normal achievement - an FSB of 400MHz using a 5:4 divider for DDR2-1000 with 4-4-4-10 primary timings at a Command Rate of 2N. The only change made between data collection runs was a modification to tRD.


Memory
Read Bandwidth - Variable tRD

Using the default tRD of 12, our system was able to reach a maximum memory read bandwidth value of 7,597 MB/s - a predictable result considering the rather relaxed configuration. Tightening tRD all the way to a setting of 5 provides us with dramatically different results: 9,166 MB/s, more than 20% higher total throughput! Keep in mind that this was done completely independent of any memory setting adjustment. There is a central tenet of this outcome: because the MCH is solely responsible for delivering the additional performance gains, this concept can be applied to any system, regardless of memory type or quality.

The next graph shows how memory access (read) latency changes with each tRD setting. As we can see, the values march steadily down as we continue to lower tRD. We can also note that the change in latency between any two successive steps is always about 2.5ns, the Tcycle value for 400MHz FSB and the expected equivalent change in TRD for a drop in tRD of one. No other single memory-related performance setting has the potential to influence a reduction in read latency of this magnitude, not even the primary memory timings, making tRD unique in this respect. For this reason, tRD is truly the key to unlocking hidden memory performance, much more so than the primary memory timings traditionally associated with latencies.


Memory
Read Latency - Variable tRD

We realized our best performance by pushing the MCH well beyond its specified range of operation. Not only were we able overclock the controller to 450MHz FSB but we also managed to maintain a tRD of 5 (for a TRD of about 11.1ns) at this exceptional bus speed. Using the 3:2 divider and loosening the primary memory timings to 5-5-5-12 allowed us to capture some of the best DDR2 memory bandwidth benchmarks attainable on an Intel platform. As expected, our choice of tRD plays a crucial role in enabling these exceptional results. Screenshots from EVEREST show just how big a difference tRD can make - we have included shots using tRD values of 7, 6, and 5.







A considerable share of the memory read performance advantage that AMD-based systems have over Intel-based systems can be directly attributed to the lower memory latency times made possible by the design of the AMD processor's on-die memory controller. So far we have done a lot to show you why reducing TRD to a lower level can make such a positive impact on performance; knowing this you might tend to believe that the optimal value would be about zero, and you would be right. Eliminating the latency associated with the MCH Read Delay would further reduce total system memory read latency by another 12.5ns (as modeled by the results above).

Given this, Intel-based systems would perform memory read operations about on par with the last generation AMD-based systems. Although not the only reason, this is one of the main motivations behind Intel's decision to finally migrate to a direct point-to-point bus interface not unlike that which has been historically attributed to AMD. Removing the middleman in each memory access operation will do wonders for performance when Intel's next step in 45nm process technology, codenamed Nehalem, hits the shelves in ~Q4'08. Until then we'll have to try to do the best with what we've got.

MCH Read Delay Scaling and Default tRD Settings for Each Strap The Rules of Working with tRD: What's Allowed and What Isn't
POST A COMMENT

73 Comments

View All Comments

  • DragonStefan - Tuesday, June 09, 2009 - link

    Hello all.

    I have:
    - motherboard: ASUS Rampage Formula (Intel X48) (logical) and
    - Corsair XMS2 Dominator Series 2x2048MB Kit PC2-8500 CL5-5-5-15 (TWIN2X4096-8500C5D)

    Should i go for the following setup in bios:
    FSB: 400
    tRD: 5
    Trd: 12,5
    Divider: 3:2
    tCL: 5
    VDDR: High
    Allowed: Yes.

    Or should i go for a different setup?
    If i understand correctly, this is possible..
    What do i forget?
    I made the calculation, and the answer of the Question if it is possible Yes or No, is 1,67 > 1,33. 1,67 is higher than 1,33. So yes..

    Greets From DS
    Reply
  • danderson00 - Thursday, October 23, 2008 - link

    Hi,

    I realise this article is quite old now, but found it very useful for tuning my Rampage Formula. Have achieved significantly increased memory performance from this setting. The board seems to configure them fairly well on the auto setting, but there are some cases where manually tweaking them can give a good performance boost.

    I am curious about one thing - I would have thought that running a 1:1 divider would allow the lowest tRD value as the two clocks are running at the same speed. Data should be able to be passed between the two buses without delay, whereas if the memory clock is running faster, it might need the delay to prevent 'overlapping' with the previous data transfer. However, according to the formula (and indeed a couple of quick tests confirm it), a 1:1 divider is actually the worst for tRD, the wider the ratio the better.

    Any ideas why this is?

    Great article anyways!

    Dale
    Reply
  • Maxxxx - Sunday, June 14, 2009 - link

    Yes, you are right about 1:1 divider and tRD. This article incorrectly describes work of the memory controller. Reply
  • geok1ng - Sunday, August 03, 2008 - link

    I have a P5WDH a 975X mobo. if i am understand correctly this chipset would apply the TRD from the basic table and my best options would be a Trd of 6 or 8? Is there any way of knowing what Trd number is being applyed? I am running an E4300 at 9x329Mhz and 4 1GB sticks of DDR1100 at 987Mhz Cas 5/6/6/18/21. everest gave me a memo latency of 55.5ns ( better than quite a few 45nm/P35 owners here). Any use going for the Trd 6 option (8:5 divider i believe) since neither my my mobo can reach FSB above 1333 nor my memo can go above 1000mhz and keeping CAS 5 ( it is rated at cas 5/7/7/25/32 but the P5WDH just cant go above 5/6/6/6/18/21). Using a 8:5 divider bellow 1000Mhz memory mean runing the CPU at 2,7Ghz...and using crazy DDR/MCH voltages. Reply
  • Sarsbaby - Wednesday, July 16, 2008 - link

    Wow, I just learned alot, I think.
    Very nice article! Well written and presented.
    I'll definately have to clear my CMOS for this one.
    Reply
  • jamstan - Friday, July 11, 2008 - link

    I would have liked a review of the board itself instead of page after page about clocking. I have this board ready to build my rig today with 2 4870s in CF and I would have liked to read about the crossfire setup, the sound card, etc instead of page after page about clocking. Althou informative I feel the review should have remained focused on the board itself and the clocking crap should have been in a different article. It's a nice feature on this board but its like doing a review of a Corvette and wasting the whole review on its transmission. Reply
  • Sarsbaby - Wednesday, July 16, 2008 - link

    You know, this is only one of many reviews for this board, and only one of many on this forum.
    Try some more searching, and maybe educate yourself more before calling most of this article "Crap". This is probably one of the most useful articles on this motherboard I have found.
    With all these new options open to ROG owners, i'm glad someone is taking the time to explain what they mean and why we have paid for them.

    And have you ever re-built a transmision? Or tuned an LSD? It's alot more complicated than you think apparently.
    Reply
  • DEFLORATOR - Tuesday, May 27, 2008 - link

    Why does the author says that the board revision is 1.03G while it is clearly seen on the photoes that it's 1.00G (imprinted between PCIe slots)? Please owners of the board confirm that 1.00G is the latest revision of Rampage Formula (gonna order that tomorrow) Reply
  • viqarqadir2 - Monday, April 21, 2008 - link

    Hello
    I am very new to this stuff and havent been able to make a lot of sense of the configurations despite reading the article several times.

    I have the following setup:
    Intel Q6600@2.4 Ghertz
    Kingston Ram 8500 (5.5.15) 1X4 Gigs - 1066Mh
    XFX Geforce 8800GTX XXX edition. (I guess this doesnt matter)

    What sort of configuration should I apply?

    I also wanted to know if someone has had problems with the MB temperature and whether 51 Centigrades after playing STALKER for about one hour is normal. Any help will be appreciated.
    Reply
  • viqarqadir2 - Thursday, April 24, 2008 - link

    hmm...
    I dont know if I've done something wrong but for some reason, 3dMark is showing the memory at 1.9 Ghertz. It's a DDR2 rated at 1066 and I am running it at (according to my calculation) 1000.
    The pc feels ridiculously fast. All MB lights are green. The 3d Mark app is giving a score of about 11000. I am not a techie but is it possible that I have discovered something? Is there a way to post screenshots in the comments area?
    Reply

Log in

Don't have an account? Sign up now