Memory Timings and Bandwidth Explained

With that brief overview of the memory subsystem, we are ready to talk about memory timings. There are usually four and sometimes five timings listed with memory. They are expressed as a set of numbers, e.g. 2-3-2-7, corresponding to CAS-tRCD-tRP-tRAS. On modules that list a fifth number, it is usually the CMD value, e.g. 1T. Some might also include a range for the tRAS value. These are really only a small subset of the total number of timing figures that memory companies use, but they tend to be the more important ones and encapsulate the other values. So, what does each setting mean? By referring back to the previous sections on how memory is accessed, we can explain where each value comes into play.

The most commonly discussed timing is the CAS Latency, or CL value. CAS stands for Column Address Strobe. CL is the number of memory cycles that elapse between the time a column is requested from an active page and the time the data is ready to begin bursting across the bus. Because this is the most common type of access, CAS Latency generally has the largest impact on overall memory performance for applications that depend on memory latency. Applications that depend on memory bandwidth, on the other hand, care less about CAS Latency. Of course, other factors come into play as well: our tests with OCZ 3500EB RAM have shown that well designed CL2.5 RAM can keep up with, and sometimes even outperform, CL2 RAM. Note that purely random memory accesses stress the other timings more than CL, as there is little spatial locality in that case. Random memory access is not typical of general computing, which explains why theoretical memory benchmarks that use it as a performance metric frequently have little to no correlation with real world performance.
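One point worth making concrete: timings are counted in memory clock cycles, so the same CL figure corresponds to less real time on a faster clock. A quick sketch of the conversion (the clock speeds are illustrative; DDR400, for example, runs its memory clock at 200 MHz):

```python
# Convert a timing figure (in memory clock cycles) into an absolute
# delay in nanoseconds. One cycle lasts 1000 / clock_mhz nanoseconds.

def cycles_to_ns(cycles, clock_mhz):
    return cycles * 1000.0 / clock_mhz

# CL2 at a 200 MHz memory clock (DDR400):
print(cycles_to_ns(2, 200))    # 10.0 ns
# CL2.5 at the same clock:
print(cycles_to_ns(2.5, 200))  # 12.5 ns
```

This is why a CL value cannot be compared across different clock speeds in isolation: CL2.5 on a faster clock can represent less absolute delay than CL2 on a slower one.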

The next value is tRCD, which is referred to as the RAS to CAS Delay. This is the delay in memory cycles between the time a row is activated and when a column of data within the row can actually be requested. It comes into play when a request arrives for data that is not in an active row, so it occurs less frequently than CL and is generally not as important. As mentioned a moment ago, certain applications and benchmarks can have different memory access patterns, though, which can make tRCD more of a factor.

The term tRP stands for the time for RAS Precharge, which can be somewhat confusing. Time for a Row Precharge is another interpretation of the term and explains the situation better. tRP is the time in memory cycles that is required to flush an active row out of the sense amp ("cache") before a new row can be requested. As with tRCD, this only comes into play when a request is made to an inactive row.
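The three access cases described so far can be sketched as a rough cycle count, using the sample 2-3-2 (CL-tRCD-tRP) timings from the introduction. This is an illustrative model only, not a cycle-accurate one:

```python
# Rough per-access latency, in memory clock cycles, for the three
# cases discussed above. Timing values are the sample set from the
# introduction, not measurements of any particular module.

CL, tRCD, tRP = 2, 3, 2

page_hit   = CL               # row already active: pay only CL
page_empty = tRCD + CL        # bank idle: activate the row, then CL
page_miss  = tRP + tRCD + CL  # wrong row active: precharge first

print(page_hit, page_empty, page_miss)  # 2 5 7
```

The spread between the page-hit and page-miss cases shows why access patterns matter so much: a workload with good locality pays mostly CL, while one that hops between rows pays the full tRP + tRCD + CL penalty far more often.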

Moving on, we have tRAS - or more properly, tRASmin - which is the minimum time that a row must remain active before a new row within the same bank can be activated. In other words, after a row is activated, it cannot be closed, and another row in the same bank opened, until a minimum amount of time (tRASmin) has elapsed. This is why having more memory banks can help to improve memory performance, provided it does not slow down other areas of the memory: there is less chance that a new page/row will need to be activated in a bank for which tRASmin has not yet elapsed. Taken together, tRAS and tRP make up the Row Cycle time (tRC), as they occur in sequence.

CMD is the command rate of the memory. The command rate specifies for how many consecutive clock cycles commands must be presented to the DRAMs before the DRAMs sample the address and command bus wires. The package of the memory controller, the wires of the address and command buses, and the package of each DRAM all have some electrical capacitance. As the 1's and 0's of a command are sent from the memory controller to the DRAMs, the capacitance of these (and other) elements of the memory system slows the rate at which an electrical transition between a 1 and a 0 (and vice versa) can occur. At ever-increasing memory bus clock speeds, the clock period shrinks, meaning that there is less time available for each transition to complete. Because of the way that addresses and commands are routed to the DRAMs on memory modules, the total capacitance on these wires may be so high that transitions cannot occur reliably within a single clock cycle. For this reason, commands may need to be sent for 2 consecutive clock cycles so that they are assured of settling to their appropriate values before the DRAMs take action. A 2T command rate means that commands are presented to the DRAMs for 2 consecutive clocks. In some implementations, the command rate is always 1T, while in others, it may be either 1T or 2T. On DDR/DDR2, for instance, using high-quality memory modules (which cost a little more) and/or reducing the number of memory modules on each channel can allow 1T command rates. As for how the command rate impacts performance: CMD can be just as important as CL. Every memory access incurs both the CMD and CL delays, so removing one memory clock cycle from either benefits every memory access.
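A toy model makes the 1T-versus-2T trade-off plain. The CL2 figure below is a sample value, not a measurement; the point is simply that both CMD and CL are paid on every access, so 1T shaves a full cycle off each one:

```python
# Fixed per-access cost in the best (page-hit) case: every access
# pays the command rate (CMD) plus the CAS Latency (CL).
# Sample timings only: CL2 memory at 1T vs. 2T command rates.

def page_hit_cycles(cmd, cl):
    return cmd + cl

print(page_hit_cycles(cmd=1, cl=2))  # 3 cycles at 1T
print(page_hit_cycles(cmd=2, cl=2))  # 4 cycles at 2T
```

In this sketch, moving from 2T to 1T has the same effect on a page hit as dropping from CL3 to CL2 would.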

In addition to all of these timings, the question of memory bandwidth still remains. Bandwidth is the rate at which data can be sent from the DRAMs over the memory bus. Lower timings allow faster access to the data, while higher bandwidth allows access to more data. Applications that access large amounts of data - either sequentially or randomly - usually benefit from increased bandwidth. Bandwidth can be increased either by increasing the number of memory channels (i.e. dual-channel) or by increasing the clock speed of the memory. Doubling memory bandwidth will never lead to a doubling of actual performance except in theoretical benchmarks, but it could provide a significant boost in performance. Many games and multimedia benchmarks process large amounts of data that cannot reside within the cache of the CPU, and being able to retrieve the data faster can help out. All other things being equal, more bandwidth will never hurt performance.
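The bandwidth arithmetic itself is straightforward: peak theoretical bandwidth is the memory clock rate times two transfers per clock (for DDR), times the bus width, times the number of channels. The figures below assume DDR400 on a 64-bit bus purely for illustration:

```python
# Theoretical peak bandwidth for a DDR memory bus, in MB/s.
# Illustrative defaults: 64-bit bus, 2 transfers per clock (DDR).

def peak_bandwidth_mb_s(clock_mhz, bus_bits=64, channels=1,
                        transfers_per_clock=2):
    return clock_mhz * transfers_per_clock * (bus_bits // 8) * channels

print(peak_bandwidth_mb_s(200))              # 3200 MB/s (PC3200)
print(peak_bandwidth_mb_s(200, channels=2))  # 6400 MB/s dual channel
```

This is why going dual-channel doubles the theoretical figure without touching any of the timings: it widens the path rather than shortening the delays.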

It is important to make clear that this is only a very brief overview of common RAM timings. Memory is really very complex, and stating that lower CAS Latencies and higher bandwidths are better is a generalization. It compares to stating that "larger caches and higher clock speeds are better" in the CPU realm. This is often true, but there are many other factors that come into play. For CPUs, we also need to consider pipeline lengths, number of in-flight instructions, specific instruction latencies, number and type of execution units, etc. RAM has numerous other timings that can come into play, and the memory controller, FSB, and many other influences can also affect the resulting performance and efficiency of a system. Some people might think that designing memory is relatively simple compared to working on CPUs, but especially with rising clock speeds, this is not the case.

