Real-World Results: What Does a Lower tRD Really Provide?

Up to this point we have spent a lot of time discussing the "performance improvement" available from changing tRD alone. First, let's define the gain: at a given FSB clock, a lower tRD setting produces a lower TRD (the MCH read delay expressed in time), which shortens memory read latency and ultimately raises memory read bandwidth (MB/s). How much a particular system benefits from this added bandwidth remains to be seen, since that depends largely on how sensitive the application, game, or benchmark is to changes in memory subsystem performance. It stands to reason that more bandwidth and lower latency cannot be a bad thing, and we have yet to encounter a case in which lowering tRD resulted in lower observed performance.

EVEREST, a popular diagnostics, basic benchmarking, and system reporting program, lets us quantify the change in memory read rates caused by altering tRD through its "Cache & Memory Benchmark" tool. We have collected these results and present them below. The essential point to remember when reviewing these figures is that all of this data was collected at memory speeds and settings well within normal reach: a 400MHz FSB with a 5:4 divider for DDR2-1000, 4-4-4-10 primary timings, and a Command Rate of 2N. The only change between data collection runs was the tRD setting.
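The test configuration ties the memory data rate to the FSB clock through the divider. The Python sketch below shows that arithmetic, assuming the divider is read as DRAM:FSB (so 5:4 runs the memory clock at 5/4 of the FSB clock) and that DDR2 performs two transfers per memory clock. `ddr2_data_rate` is a hypothetical helper, not part of EVEREST or any BIOS.

```python
def ddr2_data_rate(fsb_mhz: float, dram_ratio: int, fsb_ratio: int) -> float:
    """Return the DDR2 data rate in MT/s.

    Assumption: the divider is written DRAM:FSB, so the memory clock is
    fsb_mhz * dram_ratio / fsb_ratio, and DDR2 transfers twice per clock.
    """
    memory_clock_mhz = fsb_mhz * dram_ratio / fsb_ratio
    return 2 * memory_clock_mhz


# 400MHz FSB with the 5:4 divider used for the tRD runs
print(ddr2_data_rate(400, 5, 4))  # 1000.0 -> DDR2-1000
```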


[Figure: Memory Read Bandwidth - Variable tRD]

Using the default tRD of 12, our system reached a maximum memory read bandwidth of 7,597 MB/s, a predictable result given the rather relaxed configuration. Tightening tRD all the way to 5 produced dramatically different results: 9,166 MB/s, more than 20% higher total throughput! Keep in mind that this was achieved without changing any other memory setting. The central point of this outcome is that because the MCH alone delivers the additional performance, the same approach can be applied to any system, regardless of memory type or quality.
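The "more than 20%" figure follows directly from the two EVEREST readings. A short Python check, where `percent_gain` is an illustrative helper name:

```python
def percent_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# EVEREST read bandwidth from the text, in MB/s
default_trd_12 = 7597
tight_trd_5 = 9166
print(f"{percent_gain(default_trd_12, tight_trd_5):.1f}%")  # 20.7%
```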

The next graph shows how memory access (read) latency changes with each tRD setting. The values march steadily downward as we lower tRD, and the change between any two successive steps is always about 2.5ns. That is the Tcycle value for a 400MHz FSB, which is exactly the change in TRD we expect when tRD drops by one. No other single memory-related setting can reduce read latency by this magnitude, not even the primary memory timings traditionally associated with latency, which makes tRD unique in this respect. For this reason, tRD is truly the key to unlocking hidden memory performance.
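The 2.5ns step follows if TRD is modeled as tRD clock cycles multiplied by the FSB clock period. The sketch below assumes exactly that model (inferred from the figures in this article rather than taken from an Intel specification) and tabulates TRD for each tRD setting tested at 400MHz FSB. `trd_ns` is a hypothetical helper name.

```python
def trd_ns(trd: int, fsb_mhz: float) -> float:
    """Assumed model: MCH read delay (ns) = tRD cycles x FSB clock period."""
    t_cycle_ns = 1000.0 / fsb_mhz  # 400MHz FSB -> 2.5ns per cycle
    return trd * t_cycle_ns


# Walk tRD from the default of 12 down to 5 at 400MHz FSB
for trd in range(12, 4, -1):
    print(f"tRD {trd:2d}: TRD = {trd_ns(trd, 400):.1f} ns")
```

Under this model, going from the default tRD of 12 (30.0ns) to 5 (12.5ns) removes 17.5ns of MCH read delay, in 2.5ns increments.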


[Figure: Memory Read Latency - Variable tRD]

We realized our best performance by pushing the MCH well beyond its specified range of operation. Not only were we able to overclock the controller to a 450MHz FSB, but we also maintained a tRD of 5 (a TRD of about 11.1ns) at this bus speed. Using the 3:2 divider and loosening the primary memory timings to 5-5-5-12 allowed us to capture some of the best DDR2 memory bandwidth results attainable on an Intel platform. As expected, our choice of tRD plays a crucial role in enabling these results. Screenshots from EVEREST show just how big a difference tRD can make; we have included shots using tRD values of 7, 6, and 5.
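The same arithmetic shows why a tRD of 5 lands near 11.1ns at 450MHz, and what data rate the 3:2 divider implies. As before, this is a sketch that assumes TRD equals tRD times the FSB clock period and that the divider reads DRAM:FSB; both helper names are hypothetical.

```python
def trd_ns(trd: int, fsb_mhz: float) -> float:
    """Assumed model: MCH read delay (ns) = tRD cycles x FSB clock period."""
    return trd * 1000.0 / fsb_mhz


def ddr2_data_rate(fsb_mhz: float, dram_ratio: int, fsb_ratio: int) -> float:
    """Assumed model: divider read as DRAM:FSB, two transfers per memory clock."""
    return 2 * fsb_mhz * dram_ratio / fsb_ratio


print(f"tRD 5 at 400MHz FSB: TRD = {trd_ns(5, 400):.1f} ns")  # 12.5 ns
print(f"tRD 5 at 450MHz FSB: TRD = {trd_ns(5, 450):.1f} ns")  # 11.1 ns
print(f"3:2 divider at 450MHz FSB: DDR2-{ddr2_data_rate(450, 3, 2):.0f}")  # DDR2-1350
```

Holding tRD at 5 while raising the FSB shortens each cycle, so TRD drops by roughly another 1.4ns on top of the gains from tightening tRD itself.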

A considerable share of the memory read performance advantage that AMD-based systems hold over Intel-based systems can be attributed directly to the lower memory latency made possible by the AMD processor's on-die memory controller. We have shown why reducing TRD can have such a positive impact on performance; knowing this, you might conclude that the optimal value would be about zero, and you would be right. Eliminating the latency associated with the MCH Read Delay would reduce total system memory read latency by another 12.5ns (the TRD for a tRD of 5 at 400MHz FSB in the results above).

Given this, Intel-based systems would perform memory read operations roughly on par with last-generation AMD-based systems. Although not the only reason, this is one of the main motivations behind Intel's decision to finally move to a direct point-to-point bus interface, not unlike the one historically associated with AMD. Removing the middleman from each memory access will do wonders for performance when Nehalem, Intel's next step on its 45nm process technology, hits the shelves in ~Q4'08. Until then, we'll have to make the best of what we've got.

73 Comments

  • Bozo Galora - Friday, January 25, 2008

    Yet another world class article by Mr. Boughton
    Not only do you give the insight, but you make it easily UNDERSTANDABLE.
    You da man
  • AndyKH - Friday, January 25, 2008

    Also... is this tRD adjustment only possible with an X48 board? If not, I would have preferred that this article was kept separate from an article about a specific motherboard. Don't get me wrong, I think it is a very informative article :-).
    If it is possible to adjust the tRD on other chipsets than the X48, can the possibility of setting the tRD as low as 5 then be attributed to the X48?
  • Gary Key - Friday, January 25, 2008

    tRD functionality within the BIOS is dependent upon the motherboard manufacturer. We have been harping on the motherboard suppliers to fully open up the BIOS on the enthusiast boards, this includes tRD and associated phase changes. ASUS is one of the first (DFI also) to offer an extensive range of settings in this particular area (most BIOS releases handle tRD adjustments automatically). We debated on separating the article content but due to the BIOS options available, they were more or less tied to each other. Yes, if tRD is available in the BIOS, it can be set on other Intel based boards or chipsets. In fact, I had very good success on the ASUS 780i board with tRD adjustments. Thanks for the comments! :)
  • Georgeisdead - Wednesday, February 27, 2008

    Would tRD be called something else? Perhaps Read to Write Delay (tRWD)? I have an EVGA 680i board and I cannot find the tRD setting. I don't even see it as an available option with memset 3.4. Does anyone know of a synonym for tRD?
  • Brunnis - Friday, January 25, 2008

    The Gigabyte GA-P35-DS3 has a BIOS option to set tRD and I seem to remember that it had a large effect on memory performance. Would this be the setting that you talk about here? If it is, it seems ASUS isn't the first one to offer it.
  • Shoal07 - Friday, January 25, 2008

    Can anyone confirm you can set the tRD to anything besides innocuous settings like "auto", "high" and "low" on the GA-P35-DS3, and specify if it's the L or R? Also, what memory was used in this test? (I read the whole article and I don't recall the specs of the system/testbed as a whole.)
  • Brunnis - Friday, January 25, 2008

    I have checked my GA-P35-DS3 again. The option is labeled "Static tRead Value" in the BIOS and can be set to any integer value between 1 and 31. Modifying this value changes the "Performance Level" as reported by the Windows program MemSet 3.4 accordingly. Changing the value from 8 to 7 on my board yielded the following results in the SiSoft Sandra bandwidth benchmark:

    tRD 7: 7117 / 7139 (MB/s)
    tRD 8: 7026 / 7045 (MB/s)

    Pretty large difference from changing a single timing by one step.
  • AndyKH - Friday, January 25, 2008

    Is it correctly understood that no other motherboards allow the tRD to be adjusted from within the BIOS, or is it simply because this board has named the setting something sensible? I think the article is a bit unclear about that.
  • legoman666 - Friday, January 25, 2008

    Very enlightening article. The only thing missing are real world application tests showing the benefits in office applications, games (most important ;) ), and encoding.
  • Gary Key - Friday, January 25, 2008

    We will have full application benchmarks in the X48 roundup that Kris and Raja are working on.
