Real-World Results: What Does a Lower tRD Really Provide?

Up until this point we have spent a lot of time writing about the "performance improvement" available by changing just tRD. First, let's define the gain: lower tRD settings result in lower associated TRD values (at equivalent FSB clocks), which allow for a lower memory read latency time, ultimately providing a higher memory read speed (MB/s). Exactly how a system tends to respond to this increase in available bandwidth remains to be seen, as this is largely dependent on just how sensitive the application/game/benchmark is to variations in memory subsystem performance. It stands to reason that more bandwidth and lower latencies cannot possibly be a bad thing, and we have yet to encounter a situation in which any improvement (i.e. decrease) in tRD has ever resulted in lower observed performance.

EVEREST - a popular diagnostics, basic benchmarking, and system reporting program - gives us a means of quantifying the change in memory read rates experienced when directly altering tRD through the use of its "Cache & Memory Benchmark" tool. We have collected these results and present them below for your examination. The essential point to remember when reviewing these figures is that all of this data was collected using memory speeds and settings well within the realm of normal achievement - an FSB of 400MHz using a 5:4 divider for DDR2-1000 with 4-4-4-10 primary timings at a Command Rate of 2N. The only change made between data collection runs was a modification to tRD.
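As a quick sanity check on the test configuration, the arithmetic that turns an FSB clock and a divider into a DDR2 rating can be sketched as follows. This is an illustration only: the function name `ddr2_rating` is made up for this example, and it assumes the divider is written memory:FSB, which is consistent with 400MHz at 5:4 yielding DDR2-1000.

```python
def ddr2_rating(fsb_mhz: float, mem_part: int, fsb_part: int) -> float:
    """Effective DDR2 data rate (MT/s) for an FSB clock and a memory:FSB divider.

    Hypothetical helper for illustration; assumes the divider is memory:FSB.
    """
    # The divider scales the FSB clock to the memory clock.
    mem_clock_mhz = fsb_mhz * mem_part / fsb_part
    # DDR transfers data twice per memory clock.
    return 2 * mem_clock_mhz


print(f"DDR2-{ddr2_rating(400, 5, 4):.0f}")  # the 400MHz FSB, 5:4 test setup
```

Under the same assumption, a 450MHz FSB with a 3:2 divider works out to a 675MHz memory clock, or DDR2-1350.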


[Graph: Memory Read Bandwidth - Variable tRD]

Using the default tRD of 12, our system was able to reach a maximum memory read bandwidth value of 7,597 MB/s - a predictable result considering the rather relaxed configuration. Tightening tRD all the way to a setting of 5 provides us with dramatically different results: 9,166 MB/s, more than 20% higher total throughput! Keep in mind that this was done completely independent of any memory setting adjustment. The central point of this outcome is simple: because the MCH is solely responsible for delivering the additional performance gains, this concept can be applied to any system, regardless of memory type or quality.
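The "more than 20%" figure follows directly from the two read bandwidth numbers quoted above. A minimal sketch of that arithmetic (the variable and function names are chosen for this example only):

```python
# Read bandwidth figures quoted in the text, in MB/s.
default_read_mbs = 7597  # tRD = 12
tight_read_mbs = 9166    # tRD = 5


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


gain = percent_gain(default_read_mbs, tight_read_mbs)
print(f"{gain:.1f}% higher read bandwidth")  # prints "20.7% higher read bandwidth"
```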

The next graph shows how memory access (read) latency changes with each tRD setting. As we can see, the values march steadily down as we continue to lower tRD. We can also note that the change in latency between any two successive steps is always about 2.5ns, the Tcycle value for 400MHz FSB and the expected equivalent change in TRD for a drop in tRD of one. No other single memory-related performance setting has the potential to influence a reduction in read latency of this magnitude, not even the primary memory timings, making tRD unique in this respect. For this reason, tRD is truly the key to unlocking hidden memory performance, much more so than the primary memory timings traditionally associated with latencies.
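The 2.5ns step size can be reproduced from the relationship the text relies on: TRD (in ns) is the tRD setting multiplied by the FSB cycle time. The sketch below assumes exactly that relationship and nothing more; the helper names are invented for illustration, and it models only the MCH read delay, not the total latency EVEREST reports.

```python
def t_cycle_ns(fsb_mhz: float) -> float:
    """FSB clock period in nanoseconds."""
    return 1000.0 / fsb_mhz


def read_delay_ns(trd: int, fsb_mhz: float) -> float:
    """MCH read delay TRD in ns, assuming TRD = tRD * Tcycle."""
    return trd * t_cycle_ns(fsb_mhz)


# Walk tRD from the default of 12 down to 5 at 400MHz FSB.
for trd in range(12, 4, -1):
    print(f"tRD {trd:2d} -> TRD {read_delay_ns(trd, 400):5.1f} ns")
```

Each row differs from the one before it by one Tcycle (2.5ns at 400MHz), running from 30.0ns at tRD 12 down to 12.5ns at tRD 5.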


[Graph: Memory Read Latency - Variable tRD]

We realized our best performance by pushing the MCH well beyond its specified range of operation. Not only were we able to overclock the controller to 450MHz FSB, but we also managed to maintain a tRD of 5 (for a TRD of about 11.1ns) at this exceptional bus speed. Using the 3:2 divider and loosening the primary memory timings to 5-5-5-12 allowed us to capture some of the best DDR2 memory bandwidth benchmarks attainable on an Intel platform. As expected, our choice of tRD plays a crucial role in enabling these exceptional results. Screenshots from EVEREST show just how big a difference tRD can make - we have included shots using tRD values of 7, 6, and 5.
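The 11.1ns figure can be checked by hand, assuming the same relationship used throughout (TRD equals the tRD setting times the FSB cycle time):

```latex
T_{\text{cycle}} = \frac{1}{450\ \text{MHz}} \approx 2.22\ \text{ns},
\qquad
\text{TRD} = \text{tRD} \times T_{\text{cycle}} = 5 \times 2.22\ \text{ns} \approx 11.1\ \text{ns}
```

By the same arithmetic, tRD values of 7 and 6 at 450MHz correspond to roughly 15.6ns and 13.3ns. A tRD of 5 at 400MHz is 12.5ns, so holding the setting at the higher bus speed trims about another 1.4ns from the read delay.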







A considerable share of the memory read performance advantage that AMD-based systems have over Intel-based systems can be directly attributed to the lower memory latency times made possible by the design of the AMD processor's on-die memory controller. So far we have done a lot to show you why reducing TRD to a lower level can make such a positive impact on performance; knowing this you might tend to believe that the optimal value would be about zero, and you would be right. Eliminating the latency associated with the MCH Read Delay would further reduce total system memory read latency by another 12.5ns (as modeled by the results above).

Given this, Intel-based systems would perform memory read operations about on par with the last-generation AMD-based systems. Although not the only reason, this is one of the main motivations behind Intel's decision to finally migrate to a direct point-to-point bus interface much like the one AMD has long used. Removing the middleman in each memory access operation will do wonders for performance when Intel's next step in 45nm process technology, codenamed Nehalem, hits the shelves in ~Q4'08. Until then we'll have to try to do the best with what we've got.

73 Comments

  • Vikendios - Thursday, January 31, 2008 - link

    Very Interesting. But I believe that AT is also guilty of perpetuating the chipset/multiple GPU incompatibility (or non-optimization) myths, by not giving us systematic reviews of X38/48 and 680/790i using both ATI and Nvidia twinned cards.

    And if some BIOS adjustments or driver updates are necessary to twin Nvidia cards under Intel chipsets, or ATI/AMD cards under Nvidia's, kindly tell and guide us.

    I'm not a conspiracy theorist, but I think there is more than meets the eye in the present situation.

    The apparent paradox of Intel (chipsets) pushing AMD (Crossfire) solutions is just marketing cycle hysteresis from the days when ATI was still an independent Canadian company.

    But both Intel and AMD resent video card chip manufacturers forcing their way into hard-wired motherboard real estate thru the multiple GPU concept, with attendant slot and chipset modifications. With the demise of Via, Intel and AMD believe they can own the chipsets, as long as the motherboard manufacturers are only assemblers.

    For Nvidia, multiple GPU is an easy way to extend the life of a good graphic chip until the next generation comes up, but mostly it provides for a temporary proprietary claim on the motherboard design. 3dfx first tried that years ago in Voodoo days and it worked. It worked again when ATI couldn't follow up fast enough on SLI and had to fall in AMD's arms.

    Nvidia gambled that SLI would allow it to impose its own chipset business, either by technical or marketing (SLI endorsement) means. What next? Special gaming CPUs? That's a dangerous taunt, although Intel doesn't yet dare buy them, or compete directly with them with their own GPUs, out of anti-trust concerns in Brussels.



  • Holly - Wednesday, January 30, 2008 - link

    Excellent description of memory timing magic. Thumbs up :-)
  • FSBastrd - Tuesday, January 29, 2008 - link

    I may have come off a little brash with my first comment. The article is pretty sweet, and I was able to read through it without the pictures, but that doesn't mean I wouldn't like to view them. It's not just this article either. Pictures pretty much never load on this website for me.
  • kjboughton - Wednesday, January 30, 2008 - link

    Do you run some type of ad blocker? It may be causing problems by incorrectly blocking images from our servers...
  • FSBastrd - Wednesday, January 30, 2008 - link

    I'm basically running a stock version of Firefox, so no. Ironically, the ads are just about the only pictures that do load for me. Also, all of the pictures for the AnandTech homepage load for me, it's just the pics in the articles. This is the only website that really gives me problems. One last thing, some (rare) pictures do load for me from the articles. All in all, it's quite strange, and I can't figure it out.
  • FSBastrd - Tuesday, January 29, 2008 - link

    Am I the only one who can't get pictures to load from this site? It would sure make this article a whole lot easier to follow along.
  • sje123 - Tuesday, January 29, 2008 - link

    Excellent review as ever!

    Quick question with regard to Watercooling blocks for this board. It looks more or less identical to the X38 apart from the different chip in the NB, therefore I'm wondering if you could tell me whether or not you think an ASUS X38 NB block would also fit the ASUS X48 Rampage?

    Is the NB under the cooler the same size etc., and are the mounting screws in the same position as the X38, e.g. the Maximus?

    The SB and the MOSFET coolers will be the same as the Maximus.
  • snarfbot - Sunday, January 27, 2008 - link

    alright, pretty exciting results here.

    at trd of 8 (default) at 400mhz 1:1 cas 4, i got 7687mb/s read, and 64ns latency in everest.

    at trd of 6 at the same speed, divider and cas setting i got 8089mb/s read, and 59.8ns latency.

    then just for fun i bumped the speed up to 500 and loosened the timings to cas 5, at 5:4, i left the trd at 6. at these settings i got 8640mb/s read, and 57.5ns latency.

    the latency surprised me, as the trd remained the same, and i actually loosened the cas latency.

    anyways pretty good results.

    processor is a e2140@3200mhz.
  • snarfbot - Sunday, January 27, 2008 - link

    alright, i have a ga-p35-ds3l. im running the fsb at 400, memory at 1:1 cas 4.

    i set trd to 6 in the bios. based on the formula, it shouldnt even post.

    trd(6) - tcl(4)/n(1) =fsb400(2)/1
    2=2

    im gonna run through sandra and see what the difference is, if there is any, or perhaps this setting doesnt work correctly on this board.
  • Fyl - Sunday, January 27, 2008 - link

    not to lower the merits of this great article but since I've read it I've been experimenting on my machine different settings and for some of them your formula doesn't seem to stand; here's an example of a stable configuration, no overvoltage to anything:

    E8500@3.6 (400MHzx9)
    P35-DS4 (tRD 7)
    2x2G DDR2 800 (400MHz, 5-5-5-12)

    based on your formula N = 400:400 = 1 and x = 2
    therefore 7-(5/1) > 2/1 => 2 > 2 => false but actually working

    am I missing anything?
