Real-World Results: What Does a Lower tRD Really Provide?

Up until this point we have spent a lot of time writing about the "performance improvement" available by changing just tRD. First, let's define the gain: lower tRD settings result in lower associated TRD values (at equivalent FSB clocks), which allow for a lower memory read latency time, ultimately providing a higher memory read speed (MB/s). Exactly how a system tends to respond to this increase in available bandwidth remains to be seen, as this is largely dependent on just how sensitive the application/game/benchmark is to variations in memory subsystem performance. It stands to reason that more bandwidth and lower latencies cannot possibly be a bad thing, and we have yet to encounter a situation in which any improvement (i.e. decrease) in tRD has ever resulted in lower observed performance.

EVEREST - a popular diagnostics, basic benchmarking, and system reporting program - gives us a means for quantifying the change in memory read rates experienced when directly altering tRD though the use of its "Cache & Memory Benchmark" tool. We have collected these results and present them below for your examination. The essential point to remember when reviewing these figures is that all of this data was collected using memory speeds and settings well within the realm of normal achievement - an FSB of 400MHz using a 5:4 divider for DDR2-1000 with 4-4-4-10 primary timings at a Command Rate of 2N. The only change made between data collection runs was a modification to tRD.


Memory
Read Bandwidth - Variable tRD

Using the default tRD of 12, our system was able to reach a maximum memory read bandwidth value of 7,597 MB/s - a predictable result considering the rather relaxed configuration. Tightening tRD all the way to a setting of 5 provides us with dramatically different results: 9,166 MB/s, more than 20% higher total throughput! Keep in mind that this was done completely independent of any memory setting adjustment. There is a central tenet of this outcome: because the MCH is solely responsible for delivering the additional performance gains, this concept can be applied to any system, regardless of memory type or quality.

The next graph shows how memory access (read) latency changes with each tRD setting. As we can see, the values march steadily down as we continue to lower tRD. We can also note that the change in latency between any two successive steps is always about 2.5ns, the Tcycle value for 400MHz FSB and the expected equivalent change in TRD for a drop in tRD of one. No other single memory-related performance setting has the potential to influence a reduction in read latency of this magnitude, not even the primary memory timings, making tRD unique in this respect. For this reason, tRD is truly the key to unlocking hidden memory performance, much more so than the primary memory timings traditionally associated with latencies.


Memory
Read Latency - Variable tRD

We realized our best performance by pushing the MCH well beyond its specified range of operation. Not only were we able overclock the controller to 450MHz FSB but we also managed to maintain a tRD of 5 (for a TRD of about 11.1ns) at this exceptional bus speed. Using the 3:2 divider and loosening the primary memory timings to 5-5-5-12 allowed us to capture some of the best DDR2 memory bandwidth benchmarks attainable on an Intel platform. As expected, our choice of tRD plays a crucial role in enabling these exceptional results. Screenshots from EVEREST show just how big a difference tRD can make - we have included shots using tRD values of 7, 6, and 5.







A considerable share of the memory read performance advantage that AMD-based systems have over Intel-based systems can be directly attributed to the lower memory latency times made possible by the design of the AMD processor's on-die memory controller. So far we have done a lot to show you why reducing TRD to a lower level can make such a positive impact on performance; knowing this you might tend to believe that the optimal value would be about zero, and you would be right. Eliminating the latency associated with the MCH Read Delay would further reduce total system memory read latency by another 12.5ns (as modeled by the results above).

Given this, Intel-based systems would perform memory read operations about on par with the last generation AMD-based systems. Although not the only reason, this is one of the main motivations behind Intel's decision to finally migrate to a direct point-to-point bus interface not unlike that which has been historically attributed to AMD. Removing the middleman in each memory access operation will do wonders for performance when Intel's next step in 45nm process technology, codenamed Nehalem, hits the shelves in ~Q4'08. Until then we'll have to try to do the best with what we've got.

MCH Read Delay Scaling and Default tRD Settings for Each Strap The Rules of Working with tRD: What's Allowed and What Isn't
Comments Locked

73 Comments

View All Comments

  • Orthogonal - Friday, January 25, 2008 - link

    Just so I understand this correctly, due to the path the data and clocks must travel throughout the devices as explained on page 5, even though you can increase the bandwidth of the Memory modules, the MCH is ultimately the "bottleneck". Historically we falsely assumed higher bandwidth and lower CAS latency translated to better data throughput, but since tRD increased along with it, it was essentially wiped out or unused bandwidth. Now we try to lower tRD as low as possible to reduce MCH latency as it performs the "Clock crossing procedure", which is why the 400Mhz FSB with the lowest tRD latency gives the best data throughput.

    Also, does this mean that in your "Best Pick" DDR2 configuration summary that the two A+ choices highlighted in Green will effectively result in about the same performance since even though DDR2-1200 has more bandwidth than DDR2-1000, since the tRD=5, they will have the same Trd Delay (12.5ns).
  • Aivas47a - Friday, January 25, 2008 - link

    I'm glad to see Asus implementing these new memory phase adjustment options in the bios. Now if they would provide a greater ability to fine-tune GTL reference voltages I would be a happy camper. GTL is a key setting for quad core overclocking success as Raja has helpfully explained in his DFI P35 review. The selectable percentages Asus currently provides are too crude and don't go high enough.
  • mrlobber - Friday, January 25, 2008 - link

    FCG, your article just flat out rocks, thanks for this one, we needed it badly :)

    One question about the previous Asus boards: X38 and also P35, which lack the exact tRD manipulation, providing the Transaction Booster stuff instead. As far as I understand, your analysis about the default tRD values set by different default fsb and memory divider combinations could also be used to determine the starting tRD value at least for the X38 chipset as well in a pretty straightforward way, and from that point being able to offset the tRD with Transaction Booster up or down to control it as necessary? (P35 would have different default tRD's, but the underlying principles should stay the same?)

    And, by making appropriate changes in x values if needed, your POST / no POST inequality should stay applicable as well, right?
  • kjboughton - Friday, January 25, 2008 - link

    All true, although we did talk about how these straps at one time had default tRD values associated with them, the difference has become that these default values are now usually based on the real underlying requirements, such as FSB. Now, exactly how each motherboard vendor sets up and implements this value has a lot to do with how their motherboard falls out in comparison testing. With that being said, boards that perform better generally make use of lower tRD values by default. And because X48 is a speed-binned version of X38, which is superior to P35 with it comes to MCH overclocking, it is also safe to say that the higher-end chipsets will allower the same (or lower) tRD values at FSB levels where the other chipsets may fall flat on their faces. Make sense?

    Regarding the 'Test POST Equation' - absolutely, I know those equations to be true for X38/X48 but I wouldn't doubt if they ended up being exactly the same for say, P35. A little bit of testing should validate this assumption... ;)
  • Orthogonal - Friday, January 25, 2008 - link

    Can we expect a similar analysis and optimization of strappings, timings etc... when an X48 DDR3 compatible board is released?
  • kjboughton - Friday, January 25, 2008 - link

    Yes, the will be an easy bridge to make. DDR3 is very similar to DDR2 and in a lot of respects is a simply extension of the logic already developed. In any case, we will provide this information for reference when the time comes.
  • daddyo323 - Friday, January 25, 2008 - link

    I've overclocked a couple cpus before, and each time, I had stability problems due to memory.

    I have built many systems, but since gave up on overclocking... these new Cores and chipsets look like they were made for it...

    My question is, was that CPU stable at 4ghz, and could we have a chart on which settings to set, exactly... I wonder how far we can push this platform with the air cooling.
  • kjboughton - Friday, January 25, 2008 - link

    Everything you want to know, about more, about this CPU can be seen here: http://www.anandtech.com/cpuchipsets/intel/showdoc...">http://www.anandtech.com/cpuchipsets/intel/showdoc...

    We used the same CPU that can be read about in the above review. The short answer is yes, we were completely stable at 4GHz with just 1.28V real under load.

    Cheers,
    Kris
  • Quiksilver - Friday, January 25, 2008 - link

    Has there been an ETA on the release date of the X48 chipset? I thought they were supposed to come out in December but they never appeared and this would be the second X48 preview I've seen for AT. Also I remember seeing a flow chart somewhere that had DDR2 & DDR3 being the differences between X38 and X48 of which X38 had both but now it seems X48 has DDR2 as well but will the DDR2 boards be available at launch or are they coming later on?
  • Gary Key - Friday, January 25, 2008 - link

    ASUS is telling us mid-February for the X48 launch now. Gigabyte and MSI have confirmed that also, but we have had dates confirmed about a dozen times over the last two months and it always seems to change about three days before the next "official" launch. ;)

Log in

Don't have an account? Sign up now