Real-World Results: What Does a Lower tRD Really Provide?

Up until this point we have spent a lot of time writing about the "performance improvement" available by changing just tRD. First, let's define the gain: a lower tRD setting produces a lower associated TRD value (at an equivalent FSB clock), which in turn lowers memory read latency and ultimately yields a higher memory read speed (MB/s). Exactly how a system responds to this additional bandwidth depends largely on just how sensitive the application, game, or benchmark is to variations in memory subsystem performance. It stands to reason that more bandwidth and lower latency can hardly be a bad thing, and we have yet to encounter a situation in which any improvement (i.e. decrease) in tRD resulted in lower observed performance.

EVEREST - a popular diagnostics, basic benchmarking, and system reporting program - gives us a means of quantifying the change in memory read rates experienced when directly altering tRD through the use of its "Cache & Memory Benchmark" tool. We have collected these results and present them below for your examination. The essential point to remember when reviewing these figures is that all of this data was collected using memory speeds and settings well within the realm of normal achievement - an FSB of 400MHz using a 5:4 divider for DDR2-1000 with 4-4-4-10 primary timings at a Command Rate of 2N. The only change made between data collection runs was a modification to tRD.
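For readers who want to verify these relationships themselves, the arithmetic behind this test configuration is simple enough to script. The sketch below is a rough illustration only - the function and variable names are our own, not taken from any tool - showing how the effective DDR2 data rate follows from the FSB and divider, and how a tRD setting converts into its corresponding TRD in nanoseconds:

```python
# Minimal sketch of the timing relationships discussed in this article.
# tRD is the MCH Read Delay in FSB clocks; TRD is that same delay expressed in time.

def tcycle_ns(fsb_mhz):
    """Length of one FSB clock period in nanoseconds."""
    return 1000.0 / fsb_mhz

def trd_ns(trd_clocks, fsb_mhz):
    """MCH Read Delay in nanoseconds: TRD = tRD x Tcycle."""
    return trd_clocks * tcycle_ns(fsb_mhz)

def ddr2_rating(fsb_mhz, divider=(5, 4)):
    """Effective DDR2 data rate for a given FSB and memory divider."""
    num, den = divider
    return fsb_mhz * num / den * 2  # DDR transfers data twice per memory clock

fsb = 400  # MHz, as used for the runs below
print(ddr2_rating(fsb, (5, 4)))  # 1000.0 -> DDR2-1000 with the 5:4 divider
print(trd_ns(12, fsb))           # 30.0 ns at the default tRD of 12
print(trd_ns(5, fsb))            # 12.5 ns at a tRD of 5
```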


Memory Read Bandwidth - Variable tRD

Using the default tRD of 12, our system reached a maximum memory read bandwidth of 7,597 MB/s - a predictable result considering the rather relaxed configuration. Tightening tRD all the way down to 5 provides dramatically different results: 9,166 MB/s, more than 20% higher total throughput! Keep in mind that this was done completely independent of any memory setting adjustment. The central point of this outcome is that because the MCH alone is responsible for delivering the additional performance, the concept can be applied to any system, regardless of memory type or quality.
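As a quick back-of-the-envelope check on that "more than 20%" figure, using the two EVEREST readings quoted above:

```python
# Relative read bandwidth gain from tightening tRD from 12 to 5 (EVEREST figures above).
read_trd12 = 7597  # MB/s at tRD = 12
read_trd5  = 9166  # MB/s at tRD = 5
gain = (read_trd5 - read_trd12) / read_trd12
print(f"{gain:.1%}")  # ~20.7%
```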

The next graph shows how memory access (read) latency changes with each tRD setting. As we can see, the values march steadily down as we continue to lower tRD. We can also note that the change in latency between any two successive steps is always about 2.5ns - the Tcycle value for a 400MHz FSB and the expected change in TRD for a drop in tRD of one. No other single memory-related performance setting can effect a reduction in read latency of this magnitude, not even the primary memory timings, making tRD unique in this respect. For this reason, tRD is truly the key to unlocking hidden memory performance, much more so than the primary memory timings traditionally associated with latency.


Memory Read Latency - Variable tRD
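A short sketch makes the expected step size explicit (again, the names here are our own and purely illustrative):

```python
# Each one-step drop in tRD should remove exactly one FSB clock (Tcycle)
# from the observed read latency; at 400MHz FSB that is 2.5ns per step.
fsb_mhz = 400
tcycle_ns = 1000.0 / fsb_mhz   # 2.5 ns per FSB clock
steps = 12 - 5                 # lowering tRD from the default of 12 down to 5
print(tcycle_ns)               # 2.5
print(steps * tcycle_ns)       # 17.5 ns expected total reduction in read latency
```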

We realized our best performance by pushing the MCH well beyond its specified range of operation. Not only were we able to overclock the controller to 450MHz FSB, but we also managed to maintain a tRD of 5 (for a TRD of about 11.1ns) at this exceptional bus speed. Using the 3:2 divider and loosening the primary memory timings to 5-5-5-12 allowed us to capture some of the best DDR2 memory bandwidth benchmarks attainable on an Intel platform. As expected, our choice of tRD plays a crucial role in enabling these exceptional results. Screenshots from EVEREST show just how big a difference tRD can make - we have included shots using tRD values of 7, 6, and 5.

[EVEREST screenshots: tRD = 7, 6, and 5 at 450MHz FSB]
A considerable share of the memory read performance advantage that AMD-based systems have over Intel-based systems can be directly attributed to the lower memory latency made possible by the design of the AMD processor's on-die memory controller. So far we have done a lot to show you why reducing TRD can make such a positive impact on performance; knowing this, you might suspect that the optimal value would be about zero, and you would be right. Eliminating the latency associated with the MCH Read Delay entirely would reduce total system memory read latency by another 12.5ns - the TRD that remains even at a tRD of 5 with a 400MHz FSB (as modeled by the results above).

Given this, Intel-based systems would perform memory read operations roughly on par with last-generation AMD-based systems. Although not the only reason, this is one of the main motivations behind Intel's decision to finally migrate to a direct point-to-point bus interface not unlike the one historically associated with AMD. Removing the middleman from each memory access operation will do wonders for performance when Intel's next step in 45nm process technology, codenamed Nehalem, hits the shelves in ~Q4'08. Until then we'll have to make the best of what we've got.

Comments

  • poohbear - Friday, January 25, 2008 - link

    one thing i liked about some of the recent high end mobo releases was the inclusion of an onboard wi-fi chip on a desktop mobo, but this mobo seems to be lacking that. i mean, they threw in everything but the kitchen sink, why not include wi-fi?:(
  • TheDoc9 - Friday, January 25, 2008 - link

    One of the best I've read here, definitely one of the best on overclocking I've ever read. It takes it to the next level; it reminded me of how a bodybuilder friend of mine schedules and calculates his workouts, calories, and entire life to be the best he can be. Hope to see more like this one in the future.
  • jimru22 - Friday, January 25, 2008 - link

    The article references the use of an Intel Extreme processor with an adjustable multiplier. I'm planning on building a system hopefully anchored by the Asus Rampage Formula and an Intel Q9450 with a locked 8x multiplier. Based on the charts, it seems to me that in order to run the Q9450 (333MHz) at 3.6GHz, a 450MHz FSB is required. Therefore in this case, a tRD of 6 / TRD of 13.3ns is the optimum value. Is this correct?
  • kjboughton - Friday, January 25, 2008 - link

    You would be correct. Processors with lower maximum multipliers present somewhat of a challenge when selecting the best memory configuration. In this case the 8x multiplier forces a higher than normally desired FSB; freedom from that limitation is one of the many benefits of owning an Extreme processor. As such, the next best option, and the first choice for you, would be to go to 450MHz FSB and set a tRD of 6. Although this might not be completely ideal (we like to stick with 400MHz), your results will without a doubt be within a few percent of real-world performance at 400MHz FSB and a tRD of 5. Yet another reason why the Extreme line of processors is worth the price.
  • Odeen - Saturday, January 26, 2008 - link

    I'd like to differ on that. As someone who first discovered overclocking during the Celeron 300A days, when a budget chip could run 50-60% faster than its stock speed and deliver higher performance than a $400 (at release time) Pentium III 450MHz - all without overstressing the rest of the platform (i.e. with bog-standard FSB and memory speeds) - I view overclocking as two ratios:
    Maximum attainable clock speed / original clock speed. 3:2 is the minimum ratio that isn't depressing to see booting up.
    Cost of equivalent performance from a processor w/o overclocking / cost of the actual processor. In that case the ratio was 4:1. In some of the best-case scenarios (like the very last 300A's being 100% overclockable to 600MHz), the ratio can be 6-7:1.

    The Black Edition CPUs fail both value tests, because they are typically ONLY available at the fastest speed grades. Therefore, they are unlikely to reach a 30% overclock, never mind the requisite 50%. And being the most expensive SKU in the class, combined with the lackluster overclocking potential, means that they are unlikely to outperform a processor that costs 4x as much (even an imaginary SKU that fits on the price-performance regression line of the class).

    That said, if the Wolfdale E8190 is $130 and Intel somehow offers an "enthusiast edition" of it for $180 (that is, an edition for true enthusiasts, who want to extract the maximum bang for their buck), I would get one - the unlocked multiplier would make overclocking less of a "platform" issue (i.e. "how fast will the chip go before my motherboard peters out") and more a question of "how fast will this particular chip go, period". I can definitely get behind that.
  • jimru22 - Friday, January 25, 2008 - link

    Thank you Kris for the outstanding article as well as your response.

    Kind regards,
    Jim
  • Orthogonal - Friday, January 25, 2008 - link

    What are the chances someone could whip up an Excel macro to incorporate all these inputs, equations, and graphs for easy computation of optimal settings for a given CPU and memory configuration?
  • kjboughton - Friday, January 25, 2008 - link

    Already exists, although you'll have to sweet-talk me into releasing the file. Seriously though, the Excel spreadsheet makes choosing the right settings downright simple.
  • Orthogonal - Friday, January 25, 2008 - link

    Fair enough, pretty please!

    Well, maybe there could at least be a web applet on the site or something of the sort. That would be killer.
  • LoneWolf15 - Friday, January 25, 2008 - link

    Just one thought...IMO, no "Board Layout" portion of a review is complete without a picture of the port cluster on the back of the board.
