Memory Timings and Bandwidth Explained

With that brief overview of the memory subsystem, we are ready to talk about memory timings. There are usually four and sometimes five timings listed with memory. They are expressed as a set of numbers, e.g. 2-3-2-7, corresponding to CAS-tRCD-tRP-tRAS. On modules that list a fifth number, it is usually the CMD value, e.g. 1T. Some might also include a range for the tRAS value. These are really only a small subset of the total number of timing figures that memory companies use, but they tend to be the more important ones and encapsulate the other values. So, what does each setting mean? By referring back to the previous sections on how memory is accessed, we can explain where each value comes into play.
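To make the notation concrete, the label printed on a module can be split into named fields. The sketch below is a hypothetical helper (the names `MemoryTimings` and `parse_timings` are our own, not from any vendor tool). It assumes the common "CL-tRCD-tRP-tRAS" label with an optional command rate such as "1T". It does not handle labels that give a range for tRAS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryTimings:
    cl: float                  # CAS Latency; DDR allows half steps such as 2.5
    trcd: int                  # RAS to CAS Delay
    trp: int                   # RAS (Row) Precharge
    tras: int                  # minimum row active time (tRASmin)
    cmd: Optional[int] = None  # command rate in clocks (1 for 1T, 2 for 2T), if listed


def parse_timings(label: str) -> MemoryTimings:
    """Parse a label such as '2-3-2-7' or '2-3-2-7 1T'.

    The label format is an assumption based on how modules are usually
    marketed; tRAS ranges (e.g. '5-8') are not handled.
    """
    parts = label.split()
    fields = parts[0].split("-")
    if len(fields) != 4:
        raise ValueError(f"expected CL-tRCD-tRP-tRAS, got {parts[0]!r}")
    cl = float(fields[0])
    trcd, trp, tras = (int(f) for f in fields[1:])
    cmd = None
    if len(parts) > 1:
        rate = parts[1].upper()
        if not rate.endswith("T"):
            raise ValueError(f"expected a command rate like '1T', got {parts[1]!r}")
        cmd = int(rate[:-1])
    return MemoryTimings(cl, trcd, trp, tras, cmd)


print(parse_timings("2-3-2-7 1T"))
# MemoryTimings(cl=2.0, trcd=3, trp=2, tras=7, cmd=1)
```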

The most commonly discussed timing is the CAS Latency, or CL value. CAS stands for Column Address Strobe. This is the number of memory cycles that elapse between the time a column is requested from an active page and the time that the data is ready to begin bursting across the bus. Requests to a row that is already open are the most common case, so CAS Latency generally has the largest impact on overall memory performance for applications that depend on memory latency. Applications that depend on memory bandwidth do not care as much about CAS Latency, though. Of course, other factors come into play: our tests with OCZ 3500EB RAM have shown that well-designed CL2.5 RAM can keep up with and sometimes even outperform CL2 RAM. Note that purely random memory accesses stress the other timings more than CL, as there is little spatial locality in that case. Random memory access is not typical of general computing, which explains why theoretical memory benchmarks that use it as a performance metric frequently have little to no correlation with real-world performance.
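Because CL is counted in memory clock cycles, its absolute cost depends on the clock. The minimal sketch below converts cycles to nanoseconds. It assumes DDR400, whose memory clock is 200 MHz (data moves on both clock edges, hence "400"), so one cycle lasts 5 ns. The function name `cycles_to_ns` is our own.

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a latency in memory clock cycles to nanoseconds.

    clock_mhz is the real memory clock, not the DDR data rate: DDR400
    uses a 200 MHz clock and transfers data on both clock edges.
    """
    cycle_time_ns = 1000.0 / clock_mhz
    return cycles * cycle_time_ns


for cl in (2, 2.5, 3):
    print(f"CL{cl} at 200 MHz (DDR400): {cycles_to_ns(cl, 200):.1f} ns")
```

At DDR400, each half-cycle step of CL is worth 2.5 ns, and the same CL value represents less absolute time at a higher clock.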

The next value is tRCD, the RAS to CAS Delay (RAS stands for Row Address Strobe). This is the delay in memory cycles between the time a row is activated and when a column of data within the row can actually be requested. It comes into play when a request arrives for data that is not in an active row, so it occurs less frequently than CL and is generally not as important. As mentioned a moment ago, though, certain applications and benchmarks have different memory access patterns, which can make tRCD more of a factor.

The term tRP stands for the time for RAS Precharge, which can be somewhat confusing; "Time for Row Precharge" is another interpretation of the term and describes the situation better. tRP is the time in memory cycles required to flush an active row out of the sense amps ("cache") before a new row can be activated. As with tRCD, this only comes into play when a request is made to an inactive row, specifically when a different row is already open in the same bank.
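Putting CL, tRCD, and tRP together shows why access patterns matter. The sketch below is a deliberately simplified model of our own (it ignores tRAS, command rate, refresh, and the burst itself) with three cases: a row hit costs CL, a request to an idle bank costs tRCD + CL, and a request that must first close a different row costs tRP + tRCD + CL. The names `RowState`, `cycles_to_first_data`, and `average_cycles` are hypothetical, and the average assumes every non-hit is a row conflict.

```python
from enum import Enum


class RowState(Enum):
    HIT = "hit"            # the requested row is already open in the bank
    EMPTY = "empty"        # the bank is precharged; no row is open
    CONFLICT = "conflict"  # a different row is open in the same bank


def cycles_to_first_data(state: RowState, cl: float, trcd: int, trp: int) -> float:
    """Cycles from request to first data under a simplified model.

    Ignores tRAS, command rate, refresh, and the burst transfer itself.
    """
    if state is RowState.HIT:
        return cl                # column access only
    if state is RowState.EMPTY:
        return trcd + cl         # activate the row, then access the column
    return trp + trcd + cl       # close the old row, activate, then access


def average_cycles(hit_rate: float, cl: float, trcd: int, trp: int) -> float:
    """Average latency assuming every non-hit is a row conflict."""
    hit = cycles_to_first_data(RowState.HIT, cl, trcd, trp)
    miss = cycles_to_first_data(RowState.CONFLICT, cl, trcd, trp)
    return hit_rate * hit + (1 - hit_rate) * miss


# 2-3-2-7 memory: a hit costs 2 cycles, a conflict 2 + 3 + 2 = 7 cycles
for rate in (0.9, 0.5, 0.1):
    print(f"row hit rate {rate:.0%}: {average_cycles(rate, 2, 3, 2):.2f} cycles")
```

With a high row hit rate, CL dominates the average; as accesses become more random and the hit rate falls, tRCD and tRP account for most of the latency.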

Moving on, we have tRAS, or more properly tRASmin, which is the minimum time that a row must remain active before a new row within that bank can be activated. In other words, after a row is activated, it cannot be closed and another row in the same bank opened until a minimum amount of time (tRASmin) has elapsed. This is why having more memory banks can help to improve memory performance, provided it does not slow down other areas of the memory: there is less chance that a new page/row will need to be activated in a bank for which tRASmin has not elapsed. Taken together, tRAS and tRP make up the Row Cycle time (tRC = tRAS + tRP), the minimum time between activating one row and activating another row in the same bank, since a row must stay open for at least tRAS and then be precharged for tRP.
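The benefit of multiple banks can be illustrated with a toy scheduler. The sketch below is our own simplification, not a model of any real memory controller: each request needs a new row, and the only constraint enforced is that a bank cannot activate another row until tRC = tRAS + tRP cycles after its previous activation. Other limits (such as tRRD or command bus contention) are ignored, and `schedule_activates` is a hypothetical name.

```python
def schedule_activates(requests, tras, trp):
    """Return the cycle at which each request's row is activated.

    requests: list of (arrival_cycle, bank) pairs, each needing a new row.
    Only the per-bank row cycle time tRC = tRAS + tRP is enforced; other
    constraints (tRRD, command bus limits) are ignored.
    """
    next_activate = {}  # bank -> earliest cycle a new row may be activated
    activates = []
    for arrival, bank in requests:
        t = max(arrival, next_activate.get(bank, 0))
        activates.append(t)
        # the row stays open at least tRAS cycles, then precharges for tRP
        next_activate[bank] = t + tras + trp
    return activates


# 2-3-2-7 memory: tRAS = 7, tRP = 2, so tRC = 9 cycles
print(schedule_activates([(0, 0), (1, 0)], tras=7, trp=2))  # same bank: [0, 9]
print(schedule_activates([(0, 0), (1, 1)], tras=7, trp=2))  # different banks: [0, 1]
```

The second request waits eight extra cycles when it targets the same bank, but can proceed almost immediately when it lands in a different bank.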

CMD is the command rate of the memory. The command rate specifies for how many consecutive clock cycles commands need to be presented to the DRAMs before the DRAMs sample the address and command bus wires. The package of the memory controller, the wires of the address and command buses, and the package of the DRAM all have some electrical capacitance. As electrical 1's and 0's in the commands are sent from the memory controller to the DRAMs, the capacitance of these (and other) elements of the memory system slows the rate at which an electrical transition between a 1 and a 0 (and vice versa) can occur. At ever-increasing memory bus clock speeds, the clock period shrinks, meaning that there is less time available for the transition between a 1 and a 0 (and vice versa) to occur. Because of the way that addresses and commands are routed to DRAMs on memory modules, the total capacitance on these wires may be so high that transitions between 1 and 0 cannot occur reliably in only one clock cycle. For this reason, commands may need to be sent for 2 consecutive clock cycles so that they can be assured of settling to their appropriate values before the DRAMs take action. A 2T command rate means that commands are presented for 2 consecutive clocks to the DRAMs. In some implementations, the command rate is always 1T, while in others it may be either 1T or 2T. On DDR/DDR2, for instance, using high-quality memory modules (which cost a little more) and/or reducing the number of memory modules on each channel can allow 1T command rates. If you are wondering how much the command rate can affect performance, this explanation should make it clear that CMD can be just as important as CL. Every memory access incurs the CMD and CL delays, so removing one memory clock cycle from either benefits every memory access.
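A rough comparison shows why CMD can matter as much as CL. The sketch below assumes, as a simplification of our own, that a 2T command rate adds exactly one clock of latency to every access relative to 1T, and it looks only at row hits on DDR400 (200 MHz clock). The function name `row_hit_latency_ns` is hypothetical.

```python
def row_hit_latency_ns(cl: float, command_rate: int, clock_mhz: float) -> float:
    """First-data latency for a row hit, in nanoseconds (simplified model).

    A 1T command is sampled after one clock; a 2T command is held for two,
    which we model as one extra clock of latency per access.
    """
    extra_cycles = command_rate - 1
    return (cl + extra_cycles) * 1000.0 / clock_mhz


# DDR400 (200 MHz clock): CL2 at 2T versus CL2.5 at 1T
print(row_hit_latency_ns(2, 2, 200))    # 15.0
print(row_hit_latency_ns(2.5, 1, 200))  # 12.5
```

Under this model, CL2.5 memory running at 1T finishes a row hit sooner than CL2 memory running at 2T.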

In addition to all of these timings, the question of memory bandwidth still remains. Bandwidth is the rate at which data can be sent from the DRAMs over the memory bus. Lower timings allow faster access to the data, while higher bandwidth allows more data to be transferred per second. Applications that access large amounts of data, either sequentially or randomly, usually benefit from increased bandwidth. Bandwidth can be increased either by increasing the number of memory channels (i.e. dual-channel) or by increasing the clock speed of the memory. Doubling memory bandwidth will never double actual performance except in theoretical benchmarks, but it can provide a significant boost in performance. Many games and multimedia benchmarks process large amounts of data that cannot reside within the cache of the CPU, and being able to retrieve the data faster can help out. All other things being equal, more bandwidth will never hurt performance.
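Peak bandwidth is simple arithmetic: transfers per second, times bytes per transfer, times the number of channels. The sketch below assumes a standard 64-bit (8-byte) data path per channel; `peak_bandwidth_mb_s` is our own name. The result is a theoretical ceiling that ignores timings, refresh, and bus turnarounds, so sustained bandwidth is lower.

```python
def peak_bandwidth_mb_s(data_rate_mt_s: int, bus_width_bits: int = 64, channels: int = 1) -> int:
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes).

    data_rate_mt_s is the number of transfers per second in millions,
    e.g. 400 for DDR400 (200 MHz clock, two transfers per clock).
    """
    bytes_per_transfer = bus_width_bits // 8
    return data_rate_mt_s * bytes_per_transfer * channels


print(peak_bandwidth_mb_s(400))              # 3200: single-channel DDR400 (PC3200)
print(peak_bandwidth_mb_s(400, channels=2))  # 6400: dual-channel DDR400
```

This is also where module names such as PC3200 come from: DDR400 on a 64-bit bus peaks at 3200 MB/s.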

It is important to make clear that this is only a very brief overview of common RAM timings. Memory is really very complex, and stating that lower CAS Latencies and higher bandwidths are better is a generalization. It is comparable to saying that "larger caches and higher clock speeds are better" in the CPU realm. This is often true, but there are many other factors that come into play. For CPUs, we also need to consider pipeline lengths, number of in-flight instructions, specific instruction latencies, number and type of execution units, etc. RAM has numerous other timings that can come into play, and the memory controller, FSB, and many other influences can also affect the resulting performance and efficiency of a system. Some people might think that designing memory is relatively simple compared to designing CPUs, but, especially with rising clock speeds, this is not the case.


22 Comments


  • 666an666 - Thursday, May 14, 2009

    Thanks for the details. Unfortunately, most sellers of RAM (and most brand packaging) fail to mention these measurement details. They only show obscure model numbers and "PC-3200" or whatever. They usually only offer a choice of brands, not of CL values.
  • letter rip - Saturday, December 25, 2004

    This is great reading. When's the next installment?
  • Herm0 - Wednesday, November 10, 2004

    There are two things that should greatly improve DIMM performance, in addition to the well-known timings ("2-2-2-6"...), but they are hard to find in DIMM specs:

    - The number of internal banks. When a DIMM uses multiple banks, the DIMM is divided into pieces, each holding its own grid of data and the logic to access it. Going from one bank to another has no penalty: the memory controller has to send the bank address on two physical DIMM pins (so there can't be more than 4 banks in a DIMM) with each access. Having a 2/4 bank DIMM is really like having 2/4 DIMMs: while one bank is waiting for a delay to expire (a CAS latency, a RAS latency, a RAS precharge...), the memory controller can send a command or do reads/writes on another one... Most manufacturers build 2-bank DIMMs (when they publish that information!), and few of them build 4-bank DIMMs.

    - The width of their rows. It's slow to access the first data in a row (1: wait for tRP, Row Precharge, from the last operation; 2: send the new row address and wait tRCD, RAS to CAS Delay; 3: send the column address and wait tCL, CAS delay; then read the first 64-bit block of data), but it's fast to read from the activated row (send the starting column and wait tCL, then read/write data, 1 or 2 per clock (SDRAM or DDR), of the pre-programmed length and order). In an ideal DIMM having only 1 row, the only penalty would be tCL! The wider a row is, the more data can be accessed before dealing with row delays (precharge and RAS to CAS). The row size is almost never published, and I don't know how to get the number from the detailed DIMM/DRAM specs...

    Looking at 1 GB DDR400 DIMM modules too, as in #19, a good one, theoretically, seems to be Kingston's DIMMs:
    - Timings = 2.5-3-3-7 (shouldn't the last digit be 2.5+3+2 = 7.5 or 8?); most 1 GB DIMMs are 3-3-3-8 or slower.
    - Banks = 4; most DIMMs, even high-end ones, have only 2 banks.
    - Row size = ??? Unknown...

    Am I right, or do I have to redo the Ars Technica lessons? :-)
  • Gioron - Thursday, September 30, 2004

    In terms of buying 512M of fast memory or 1G of slow memory... here's what a quick look at memory prices showed (all Corsair sticks and only from one vendor because I'm lazy and didn't want to complicate things):
    512M "Value" (CL2.5): $77
    512M "XMS" (CL2): $114
    512M "Xtra low" (2-2-2-5): $135
    1G "Value" kit (CL3, 2x512M):$158

    To me, it looks like the "Xtra low" is indeed not a good bang for the buck, with the 1G upgrade only $20 more. However, the "XMS" 512M might be a good price point if you don't want to go all the way to $158 but have more than $77. Going for insanely low latencies seems to be only worth it if you have plenty of cash to spare and are already at 1G or more. (Or else are optimizing for a single, small application that relies heavily on RAM timings, but I don't think you'll run into that too much in a desktop environment.)

    One thing that might be useful in later articles is a brief discussion of the tradeoffs between size and performance in relation to swapping pages to disk. Not sure if that will fit in with the planned article content, however.
  • JarredWalton - Wednesday, September 29, 2004

    ??? I didn't think I actually started with a *specific* type of RAM - although I suppose it does apply to SDRAM/DDR, it also applies to most other types of RAM at an abstract level. There are lots of abstractions, like the fact that a memory request actually puts the row address and column address on different pins - it doesn't just "arrive". I didn't want to get into really low-level details, but look more at the overall picture. The article was more about the timings and what each one means, but you have to have a somewhat broader understanding of how RAM is accessed before such detail as CAS and RAS can really be explained in a reasonable manner.
  • Lynx516 - Wednesday, September 29, 2004

    Not much has changed fundamentally with SDRAM since the early days of DDR.

    I never actually said a burst was a column, but in fact a continuous set of columns (unless interleaved).

    OK, I admit there aren't many books on processor design and latency; however, there are data sheets and articles that describe the basics. Once you have grasped the basics, you can work it out using the data sheets, etc.

    Probably a better place to start with this series would have been the memory hierarchy instead of starting with a specific type of RAM.
  • JarredWalton - Wednesday, September 29, 2004

    The idea here is to have an article on Anandtech.com. :) I like Ars Technica as much as the next guy, but there are lots of different ways of describing technology. Sometimes you just have to write a new article covering information available elsewhere, you know? How many text books are there on processor design and latency? Well, here's another article discussing memory. Also worth noting is that Ars hasn't updated their memory information since the days of SDRAM and DDR (late 2000), and things certainly have changed since then.

    I should clarify my last comment I made: the column width of DDR is not really 32 bytes or 64 bytes, but that seems to be how many memory companies now refer to it in *layman's* terms. This article is much more of a layman's approach. The deep EE stuff on how everything works is more than most people really want to know or understand (for better or for worse). A column can also be regarded as each piece of a burst, which is probably the correct terminology. We'll be looking at various implementations in the next article - hopefully stuff that you haven't read a lot about yet. :)
  • greendonuts3 - Tuesday, September 28, 2004

    Meh. You kind of started in the middle of the topic and worked your way outward/backward/forward. As a general user, I found the wealth of info more confusing than helpful in understanding ram. Maybe you could focus just on timing issues, which seems to be your intent, and refer the reader to other articles (eg the Ars one mentioned above) for the basics?
    Thanks.
  • JarredWalton - Tuesday, September 28, 2004

    The comparison with set associativity is not that bad, in my opinion. What you have to remember is that we would then be talking about a direct-mapped cache with a whopping four entries (one per sense amp/active row). I guess I didn't explain it too well, and it's not a perfect match, true.

    Regarding burst lengths, each burst is not a column of information, although perhaps it was on older RAM types. For instance, the burst length of DDR can be 4 or 8. Each burst transmits (in the case of single-channel configurations) 64 bits of data, or 8 bytes. The column size is not 8 bytes these days, however - it is either 32 bytes or 64 bytes on DDR. (Dual-channel would effectively double those values.)
  • ss284 - Tuesday, September 28, 2004

    I wouldn't say that the article is that confusing, but there is much truth in the post above^^^.

    -Steve
