Memory Subsystem Overview

We mentioned how changes to the module design can require changes to the memory controller as well. When an address arrives at the memory, it does not simply appear there directly from the CPU; we are really talking about several steps. First, the CPU sends the request to the cache, and if the data is not in the cache, the request is forwarded to the memory controller via the Front Side Bus (FSB). (In some newer systems like the Athlon 64, requests may arrive via a HyperTransport bus, but the net result is basically the same.) The memory controller then sends the request to the memory modules over the memory bus. Once the data is retrieved internally on the memory module, it gets sent from the RAM via the memory bus back to the memory controller. The memory controller then sends it onto the FSB, and eventually, the requested data arrives at the CPU.
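For readers who prefer to see the steps laid out, here is a minimal sketch of that round trip in Python. The step names follow the description above, but the nanosecond figures are purely illustrative placeholders rather than measurements of any real platform.

```python
# A minimal sketch of the cache-miss round trip described above.
# All latency figures are hypothetical placeholders, not real measurements.

ROUND_TRIP_NS = [
    ("CPU -> cache lookup (miss)",          2.0),
    ("cache -> memory controller over FSB", 10.0),
    ("memory controller -> memory modules", 5.0),
    ("DRAM internal access",                40.0),
    ("memory modules -> memory controller", 5.0),
    ("memory controller -> CPU over FSB",   10.0),
]

def total_latency_ns(steps=ROUND_TRIP_NS):
    """Sum the per-hop delays for one read that misses in the cache."""
    return sum(ns for _, ns in steps)

if __name__ == "__main__":
    for name, ns in ROUND_TRIP_NS:
        print(f"{name:38s} {ns:5.1f} ns")
    print(f"{'total':38s} {total_latency_ns():5.1f} ns")
```

The point of the sketch is simply that the request passes through several hops, and each hop adds to the total latency the CPU sees.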

Note that the data could also be requested/sent somewhere else. DMA (Direct Memory Access) allows other devices such as network adapters, sound cards, graphics cards, controller cards, etc. to send requests directly to the memory controller, bypassing the CPU. In this overview, we were talking about the CPU to RAM pathway, but the CPU could be replaced by other devices. Normally, the CPU generates the majority of the memory traffic, and that is what we will mostly cover. However, there are other uses of the RAM that can come into play, and we will address those when applicable.

Now that we have explained how the requests actually arrive, we need to cover a few details about how the data is transmitted from the memory module(s). As we said before, when the requested column is ready to be transmitted back to the memory controller, it is sent in "bursts". What this means is that data will be sent on every memory bus clock edge - think of it as a "slot" - for the RAM's burst length. If the memory bus is running at a different speed than the FSB, though - especially if it's running slower - there can be some additional delays. The significance of these delays varies by implementation, but at best, you will end up with some "bubbles" (empty slots) in the FSB. Consider the following specific example.

On Intel's quad-pumped bus, each non-empty transmission needs to be completely full, so all four slots need to have data. (There are caveats that allow this rule to be "bent", but they incur a loss of performance and so they are avoided whenever possible.) If you have a quad-pumped 200 MHz FSB (the current P4 bus) and the RAM is running on a double-pumped 166 MHz bus, the FSB is capable of transmitting more data than the RAM is supplying. In order to guarantee that all four slots on an FSB clock cycle contain data, the memory controller needs to buffer the data to make sure an "underrun" does not occur - i.e. the memory controller starts sending data and then runs out after the first one or two slots. Each FSB cycle comes at 5 ns intervals, and with a processor running at 3.0 GHz, a delay of 5 ns could mean as many as 15 missed CPU cycles!
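Working through the numbers: a quad-pumped 200 MHz FSB has a 5 ns base clock, and a 3.0 GHz core runs 15 cycles in that time. The short sketch below also estimates how long double-pumped 166 MHz RAM needs to supply the four transfers that fill one FSB clock; the assumption that one RAM transfer carries as much data as one FSB slot is ours, made purely for illustration.

```python
# Back-of-the-envelope arithmetic for the quad-pumped FSB example.
# Assumption (ours): one RAM transfer carries as much data as one FSB slot.

FSB_CLOCK_HZ = 200e6   # base clock; quad-pumped -> 4 data slots per clock
CPU_CLOCK_HZ = 3.0e9
RAM_CLOCK_HZ = 166e6   # base clock; double-pumped -> 2 transfers per clock

fsb_cycle_ns = 1e9 / FSB_CLOCK_HZ                      # 5 ns per FSB clock
cpu_cycles_per_fsb_cycle = CPU_CLOCK_HZ / FSB_CLOCK_HZ # 15 CPU cycles

# Time for the RAM to deliver the 4 transfers needed to fill one FSB clock.
ram_time_for_4_transfers_ns = 4 / (2 * RAM_CLOCK_HZ) * 1e9   # ~12 ns vs 5 ns

print(f"FSB cycle: {fsb_cycle_ns:.1f} ns = {cpu_cycles_per_fsb_cycle:.0f} CPU cycles")
print(f"RAM needs ~{ram_time_for_4_transfers_ns:.1f} ns to fill one FSB cycle")
```

In other words, the slower memory bus needs roughly two and a half FSB clocks' worth of time to gather enough data for one full FSB clock, which is exactly why the memory controller must buffer to avoid an underrun.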

There are a couple of options to help speed up the flow of data from the memory controller to the FSB. One is to use dual-channel memory, so the buffer will fill up in half the time. This helps to explain why Intel benefits more from dual-channel RAM than AMD: their FSB and memory controller are really designed for the higher bandwidth. Another option is simply to get faster RAM until its bandwidth matches that of the FSB. Either one generally works well, but having a memory subsystem with less bandwidth than what the FSB can use is not an ideal situation, especially for the Intel design. This is why most people recommend against running your memory and system buses asynchronously. Running RAM that provides a higher bandwidth than what the FSB can use does not really help, other than to reduce latencies in certain situations. If the memory can provide 8.53 GB/s of bandwidth and the FSB can only transmit 6.4 GB/s, the added bandwidth generally goes to waste. For those wondering why benchmarks using DDR2-533 with an 800 FSB P4 do not show much of an advantage for the faster memory, this is the main reason. (Of course, on solutions with integrated graphics, the additional memory bandwidth could be used for graphics work, and in servers, the additional bandwidth can be helpful for I/O.)
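For reference, here is where those bandwidth figures come from: peak rate is simply transfers per second times bus width, and we assume a 64-bit (8-byte) data path for both the FSB and each memory channel.

```python
# Peak bandwidth = transfers per second * bytes per transfer * channels.
# Assumption: 64-bit (8-byte) data path on the FSB and on each memory channel.

def peak_gb_per_s(transfers_per_s: float, bytes_per_transfer: int = 8,
                  channels: int = 1) -> float:
    return transfers_per_s * bytes_per_transfer * channels / 1e9

fsb_800     = peak_gb_per_s(800e6)               # quad-pumped 200 MHz  -> 6.4 GB/s
ddr2_533_x2 = peak_gb_per_s(533e6, channels=2)   # dual-channel DDR2-533 -> ~8.53 GB/s

print(f"FSB:  {fsb_800:.2f} GB/s")
print(f"RAM:  {ddr2_533_x2:.2f} GB/s "
      f"-> {ddr2_533_x2 - fsb_800:.2f} GB/s that the FSB cannot carry")
```

The roughly 2.1 GB/s difference is the "wasted" bandwidth mentioned above: the FSB is the narrower pipe, so the extra memory throughput has nowhere to go on a standard desktop system.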

If you take that entire description of the memory subsystem, you can also see how AMD was able to benefit by moving the memory controller onto the CPU die. Now, the delays associated with the transmission of data over the FSB are almost entirely removed. The memory controller still has to do work, but with the controller running at CPU clock speeds, it will be much faster than before. The remaining performance deficit that Athlon 64 and Opteron processors suffer when running slower RAM can be attributed to the loss of bandwidth and the increased latencies, which we will discuss more in a moment. There are a few other details that we would like to mention first.

Comments

  • Lynx516 - Tuesday, September 28, 2004 - link

    Your description of how SDRAM works is wrong. You do not burst down the rows as your article implies; instead, it bursts along the columns.

    The whole column is sent immediately, but the other columns in the burst are not; they are sent sequentially (ideally - not quite the case if you want to interleave them).

    Comparing banks to set associativity is probably counterproductive, as most of your readers won't fully understand how it works. In fact, comparing banks to set associativity is a bad analogy. A better one would be just to say that the memory space in the chip is split up into banks.

    On top of this, you have referred to a detailed comparison of DRAM types. Even though there are many different types of DRAM, most are not that interesting or used that much in PCs. I also assume that, since you have said this, you will not be talking about SRAM or RDRAM in forthcoming articles which highlight the different approaches that can be taken when designing a memory subsystem. (SRAM the low-latency, high-bandwidth but low-density approach; RDRAM the serial approach.)

    I assume you are going to talk a bit about how a memory controller works, as they are one of the most complex components in a PC (more complex than the execution core of a CPU), but you have not referred to any plans to talk about memory controllers and how the type of memory you are using affects the design of a memory controller.

    All in all, a pretty confusingly written article. If you want DRAM for beginners, Ars Technica has two good articles (one is fairly old, but it at least correctly and CLEARLY describes how SDRAM works).

  • Resh - Tuesday, September 28, 2004 - link

    I really think that some diagrams would help, especially for novices like #10. Other than that, great article and hope to see the follow-ups soon.
  • Modal - Tuesday, September 28, 2004 - link

    Great article, thanks. I like these "this is how the pieces of your computer work" articles... very interesting stuff, but it's usually written in far too complicated a manner for a relative novice like me. This was quite readable and understandable however; nice work.
  • danidentity - Tuesday, September 28, 2004 - link

    This is one of the best articles I've seen at Anandtech in a long while, keep up the good work.
  • deathwalker - Tuesday, September 28, 2004 - link

    I, for one, would rather have 1 GB of CL 2.5 high-quality memory than 512 MB of CL 2 high-quality memory. I'm convinced that in this instance quantity wins out over speed.
  • AlphaFox - Tuesday, September 28, 2004 - link

    where are the pictures? ;)
  • Pollock - Tuesday, September 28, 2004 - link

    Excellent read!
  • mino - Tuesday, September 28, 2004 - link

    Sry for triple post but some major typpos:
    "1) buy generic memory until your budget could afford no more than 512M DDR400"
    should be:
    "1) buy generic memory until your budget could afford more than 512M DDR400"
    and
    "Goog"(ROFL) should be "Good"
    onother -> another
    Hope that's all ;)
  • mino - Tuesday, September 28, 2004 - link

    OK, 3 rules ;) - I added 3rd after some thought.
  • mino - Tuesday, September 28, 2004 - link

    #2 You are missing one important point. That is, unless You can(want) afford at least 512M high quality RAM, it makes NO SENSE to buy 256M DDR400 CL2 since there are 2 basic rules:

    1) buy generic memory until your budget could afford no more than 512M DDR400
    2) then spend some aditional money for brand memory
    3) then go 1G and only at this point spent all additional money for better latencies and so on.

    Also do remember that at many shops(here in Slovakia) there is 3 or 4 yrs warranty for generic memory(like A-DATA) and also if you have major problems with compatibility they will usually allow you to choose different brand/type for your board for no additional cost except price difference. Also in case the memory works fine with onother board.
    Also Twinmos parts have 99month warranty (for price 10% higher than generic). That speaks for itself.

    Except this little missing part of reality,

    Goog work Jarred.
