An Anecdote

Getting a complete picture of how memory performance impacts overall system performance is still a very difficult task. If all this talk of timings and latencies has not helped, let us offer another comparison. Think of the CPU as a cook at a restaurant, busily working to keep up with customer demand. The process runs like this: waiters or cashiers take the orders and send them to the cook, the cook prepares the food, and the finished dish is delivered to the customer. Sounds simple enough, right? Let's look at some of the details.

When an order for a dish comes in, certain common items (fries, rice, soup, salads, and so forth) may already be prepared, so delivering them to the customer happens quickly. We can think of this as the processor finding something in the L1 cache. This is great when it occurs, but it only occurs for a very limited number of items. Most of the time, the cook will need to start preparing the order, so he will get the items from the cupboard, freezer, and refrigerator and begin cooking. This time, the ingredients come from the L2/L3 cache. So far so good, but where does RAM come into play?

As items are pulled from the fridge, freezer, etc., the restaurant will need to restock them. The supplies have to be ordered from headquarters or whomever the restaurant uses; this is akin to system RAM (or maybe even the hard drive, but we'll leave that out of the analogy for now). If the restaurant can anticipate its needs properly, it can order supplies in advance. Sometimes, though, supplies run low - or maybe the order was wrong - and someone has to be sent off to a local store for additional ingredients. This is a cache miss, and the store is the system RAM. In a time-critical situation like this one, the cook wants the ingredients ASAP. A closer store would be better, or perhaps a store with faster checkout lanes, but provided that the trip does not take too long, any store is about as good as another. Basically, system RAM with its timings and latencies can have an impact, but a really fast memory controller (i.e. a store next door) paired with slower RAM (slow checkout lanes) can matter more than having the fastest RAM in the world.

This is all well and good for smaller restaurants and chains, but a large corporation (e.g. McDonald's) cannot simply walk next door to pick up some frozen burgers. In this case, the whole supply chain needs to be highly efficient. Instead of ordering supplies once a week, inventories might be checked every night and orders placed as necessary. Headquarters has forecasts based on past requirements and may send orders to its suppliers months in advance. This supply chain correlates loosely with the idea of outstanding memory requests, prefetch logic, deeper buffers, and so on. Bandwidth also comes into play here, as a large chain might have several large trailers of supplies en route at any point in time, while a smaller chain might get by with only one or two moderately sized delivery vans.

With faster processors, faster buses, faster RAM, etc., the analogy moves toward every processor being a large corporation with huge demands. Early 8088 and 8086 processors could simply wander to the local store as necessary - much as most adults shop for their own cooking needs. As the amount of data being processed grows, though, everything becomes dramatically more difficult. There is a big jump from running one small restaurant that serves a few dozen people daily, to serving hundreds of people daily, to running several locations, to running a corporation with locations scattered across the world. That is essentially what we have seen in the world of computer processors: we have gone from running a local "mom-and-pop" burger joint to running McDonald's, Burger King, and several other hamburger chains.

This analogy is probably flawed on numerous levels, but hopefully it helps. If you think about it, the complexity of any one subsystem of the modern PC is probably hundreds of times greater than that of the entire original IBM PC. The change did not occur overnight, but even the largest technology corporations will have trouble staying at the top of every area of computing.

Comments

  • 666an666 - Thursday, May 14, 2009 - link

    Thanks for the details. Unfortunately, most sellers of RAM (and most brands' packaging) fail to mention these measurement details. They only show obscure model numbers and "PC-3200" or whatever. They usually only offer the choice of various brands, not various CL values.
  • letter rip - Saturday, December 25, 2004 - link

    This is great reading. When's the next installment?
  • Herm0 - Wednesday, November 10, 2004 - link

    There are two things that should greatly improve a DIMM's performance, in addition to the well-known timing figures ("2-2-2-6" and so on), but they are hard to determine from a DIMM's specs:

    - The number of internal banks. When a DIMM uses multiple banks, the DIMM is divided into pieces, each holding its own grid of data and the logic to access it. Going from one bank to another carries no penalty: the memory controller just sends the bank address on two physical DIMM pins (so there can't be more than 4 banks in a DIMM) with each access. Having a 2- or 4-bank DIMM is really like having 2 or 4 DIMMs: while one bank is waiting for a delay to expire (a CAS latency, a RAS latency, a RAS precharge...), the memory controller can send a command or do read/write work on another one... Most manufacturers build 2-bank DIMMs (when they publish that information!), and few of them build 4-bank DIMMs.

    - The width of their rows. It's slow to access the first data of a row (1: wait out tRP, the Row Precharge from the last operation; 2: send the new row address and wait out tRCD, the RAS-to-CAS Delay; 3: send the column address and wait out tCL, the CAS latency, then read the first 64-bit block of data), but it's fast to read from an already-activated row (send the starting column, wait out tCL, then read/write data, 1 or 2 transfers per clock (SDRAM or DDR), in the pre-programmed length and order). In an ideal DIMM having only one row, the only penalty would be tCL! The wider a row, the more data can be accessed before dealing with row delays (precharge and RAS-to-CAS). The row size is almost never published, and I don't know how to derive the number from the detailed DIMM/DRAM specs...

    Looking at 1GB DDR400 DIMM modules too, as #19 did, a good one theoretically seems to be one of Kingston's DIMMs:
    - Timings = 2.5-3-3-7 (shouldn't the last digit be 2.5+3+2 = 7.5 or 8?); most 1GB DIMMs are 3-3-3-8 or slower.
    - Banks = 4; most DIMMs, even high-end ones, have only 2 banks.
    - Row size = ??? Unknown...

    Am I right, or do I have to redo the Ars Technica lessons? :-)
  • Gioron - Thursday, September 30, 2004 - link

    In terms of buying 512MB of fast memory or 1GB of slow memory... here's what a quick look at memory prices turned up (all Corsair sticks, and only from one vendor, because I'm lazy and didn't want to complicate things):
    512M "Value" (CL2.5): $77
    512M "XMS" (CL2): $114
    512M "Xtra low" (2-2-2-5): $135
    1G "Value" kit (CL3, 2x512M):$158

    To me, it looks like the "Xtra low" is indeed not a good bang for the buck, with the 1G upgrade only $20 more. However, the "XMS" 512M might be a good price point if you don't want to go all the way to $158 but have more than $77. Going for insanely low latencies seems to be only worth it if you have plenty of cash to spare and are already at 1G or more. (Or else are optimizing for a single, small application that relies heavily on RAM timings, but I don't think you'll run into that too much in a desktop environment.)

    One thing that might be useful in later articles is a brief discussion of the tradeoffs between size and performance in relation to swapping pages to disk. Not sure if that will fit in with the planned article content, however.
  • JarredWalton - Wednesday, September 29, 2004 - link

    ??? I didn't think I actually started with a *specific* type of RAM - although I suppose it does apply to SDRAM/DDR, it also applies to most other types of RAM at an abstract level. There are lots of abstractions, like the fact that a memory request actually puts the row address and column address on different pins - it doesn't just "arrive". I didn't want to get into really low-level details, but rather look at the overall picture. The article was more about the timings and what each one means, but you have to have a somewhat broader understanding of how RAM is accessed before such details as CAS and RAS can really be explained in a reasonable manner.
  • Lynx516 - Wednesday, September 29, 2004 - link

    Not much has changed fundamentally with SDRAM since the early days of DDR.

    I never actually said a burst was a column, but in fact a continuous set of columns (unless interleaved).

    OK, I admit there aren't many books on processor design and latency; however, there are data sheets and articles that describe the basics. Once you have grasped the basics, you can work it out using the data sheets, etc.

    Probably a better place to start this series would have been the memory hierarchy instead of a specific type of RAM.
  • JarredWalton - Wednesday, September 29, 2004 - link

    The idea here is to have an article on Anandtech.com. :) I like Ars Technica as much as the next guy, but there are lots of different ways of describing technology. Sometimes you just have to write a new article covering information available elsewhere, you know? How many textbooks are there on processor design and latency? Well, here's another article discussing memory. Also worth noting is that Ars hasn't updated their memory information since the days of SDRAM and DDR (late 2000), and things have certainly changed since then.

    I should clarify the last comment I made: the column width of DDR is not really 32 bytes or 64 bytes, but that seems to be how many memory companies now refer to it in *layman's* terms. This article is much more of a layman's approach. The deep EE stuff on how everything works is more than most people really want to know or understand (for better or for worse). A column can also be regarded as each piece of a burst, which is probably the correct terminology. We'll be looking at various implementations in the next article - hopefully stuff that you haven't read a lot about yet. :)
  • greendonuts3 - Tuesday, September 28, 2004 - link

    Meh. You kind of started in the middle of the topic and worked your way outward/backward/forward. As a general user, I found the wealth of info more confusing than helpful in understanding RAM. Maybe you could focus just on timing issues, which seems to be your intent, and refer the reader to other articles (e.g. the Ars one mentioned above) for the basics?
    Thanks.
  • JarredWalton - Tuesday, September 28, 2004 - link

    The comparison with set associativity is not that bad, in my opinion. What you have to remember is that we would then be talking about a direct-mapped cache with a whopping four entries (one per sense amp/active row). I guess I didn't explain it too well, and it's not a perfect match, true.

    Regarding burst lengths, each burst is not a column of information, although perhaps it was on older RAM types. For instance, the burst length of DDR can be 4 or 8. Each burst transmits (in the case of single-channel configurations) 64 bits of data, or 8 bytes. The column size is not 8 bytes these days, however - it is either 32 bytes or 64 bytes on DDR. (Dual-channel would effectively double those values.)
  • ss284 - Tuesday, September 28, 2004 - link

    I wouldn't say that the article is that confusing, but there is much truth in the post above ^^^.

    -Steve
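
A few of the comments above lend themselves to quick numerical sanity checks. First, Herm0's row-access arithmetic: here is a minimal Python sketch of the simplified cost model he describes. This is an illustration only - it ignores command scheduling and bus turnaround, the function name is ours, and the 200 MHz figure is simply the standard DDR400 bus clock:

    # First-word access cost for a closed row, per the comment: precharge the
    # old row (tRP), activate the new row (tRCD), then read the first column
    # (tCL). All timings are in memory-bus clock cycles.
    def first_word_latency(tCL, tRCD, tRP, clock_mhz=200.0):  # DDR400 bus clock
        cycles = tRP + tRCD + tCL          # worst case: row conflict
        ns_per_cycle = 1000.0 / clock_mhz  # 5 ns per cycle at 200 MHz
        return cycles, cycles * ns_per_cycle

    # The Kingston module from the comment (2.5-3-3-7):
    cycles, ns = first_word_latency(tCL=2.5, tRCD=3, tRP=3)
    print(f"Row-conflict access: {cycles} cycles = {ns:.1f} ns")  # 8.5 cycles = 42.5 ns

    # A hit in an already-open row pays only tCL:
    print(f"Open-row access: 2.5 cycles = {2.5 * 5:.1f} ns")      # 12.5 ns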
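Second, Gioron's price list reduces to a dollars-per-megabyte figure, which makes the "bang for the buck" point explicit. The prices are exactly those quoted in the comment; the metric is just one reasonable framing:

    # Price per megabyte for the Corsair options quoted by Gioron.
    options = [
        ("512MB Value (CL2.5)",      512,  77),
        ("512MB XMS (CL2)",          512, 114),
        ("512MB Xtra low (2-2-2-5)", 512, 135),
        ("1GB Value kit (CL3)",     1024, 158),
    ]
    for name, size_mb, price in options:
        print(f"{name:28s} ${price:3d}  ->  ${price / size_mb:.3f}/MB")
    # The 1GB kit works out to about $0.154/MB, barely above the cheapest
    # 512MB stick at $0.150/MB - which is why the $135 low-latency stick
    # looks like poor value next to a capacity upgrade.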
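Finally, the burst arithmetic from Jarred's reply about column sizes, spelled out for the single-channel DDR case with the 64-bit module width he states:

    # One DDR burst moves the full 64-bit module width on every beat.
    bus_width_bytes = 64 // 8  # 8 bytes per beat, single channel
    for burst_length in (4, 8):
        print(f"Burst of {burst_length}: {burst_length * bus_width_bytes} bytes")
    # Burst of 4 -> 32 bytes, burst of 8 -> 64 bytes, matching the 32/64-byte
    # "column" sizes quoted above; dual channel would double both figures.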
