A Quick Flash Refresher

DRAM is very fast. Writes happen in nanoseconds, as do CPU clock cycles, so those two get along very well. The problem with DRAM is that it's volatile storage; if the charge stored in each DRAM cell isn't refreshed, it's lost. Pull the plug and whatever you stored in DRAM will eventually disappear (and unlike most other changes, "eventually" here means fractions of a second).

Magnetic storage, on the other hand, is not very fast. It's faster than writing trillions of numbers down on paper, but compared to DRAM it plain sucks. For starters, magnetic disk storage is mechanical - things have to physically move to read and write. Now it's impressive how fast these things can move and how accurate and relatively reliable they are given their complexity, but to a CPU, they are slow.

The fastest consumer hard drives take 7 milliseconds to read data off of a platter. The fastest consumer CPUs can do something with that data in one hundred-thousandth of that time.
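To put those two numbers side by side, here's a quick back-of-the-envelope sketch. The 7 ms figure and the 100,000x ratio come from the paragraph above; the 3 GHz clock is purely my assumption for illustration, not a measured value.

```python
# Figures from the text above; the 3 GHz clock is an assumption for illustration.
HDD_ACCESS_NS = 7_000_000      # 7 ms, expressed in nanoseconds
SPEED_RATIO = 100_000          # "one hundred-thousandth of that time"
ASSUMED_CLOCK_GHZ = 3          # hypothetical CPU clock: 3 cycles per nanosecond

# Time the CPU needs to act on the data, per the ratio in the text.
cpu_time_ns = HDD_ACCESS_NS // SPEED_RATIO

# Clock cycles that tick by while the CPU waits on a single disk access.
cycles_spent_waiting = HDD_ACCESS_NS * ASSUMED_CLOCK_GHZ

print(f"CPU-side work: {cpu_time_ns} ns")
print(f"Cycles that pass during one disk access: {cycles_spent_waiting:,}")
```

Under those assumptions the CPU needs about 70 ns, and some 21 million cycles go by while it waits on the platter.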

The only reason we put up with mechanical storage (HDDs) is that they are cheap, store tons of data and are non-volatile: the data is still there even when you turn 'em off.

NAND flash gives us the best of both worlds. It is effectively non-volatile (flash cells can lose their charge, but only after about a decade) and relatively fast (data accesses take microseconds, not milliseconds). Through electron tunneling a charge is inserted into an N-channel MOSFET. Once the charge is in there, it's there for good - no refreshing necessary.


N-Channel MOSFET. One per bit in a NAND flash chip.

One MOSFET is good for one bit. Group billions of these MOSFETs together, in silicon, and you've got a multi-gigabyte NAND flash chip.

The MOSFETs are organized into lines, and the lines into groups called pages. These days a page is usually 4KB in size. NAND flash can't be written to one bit at a time; it's written at the page level, so 4KB at a time. Once you write the data though, it's there for good. Erasing is a bit more complicated.

Coaxing the charge out of the MOSFETs requires a bit more effort. The way NAND flash works, you can't discharge a single MOSFET; you have to erase in larger groups called blocks. NAND blocks are commonly 128 pages, which means that if you want to re-write a page in flash you first have to erase it and the other 127 pages in its block. And allow me to repeat myself: if you want to overwrite 4KB of data in a full block, you need to erase and re-write 512KB of data.
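Here's a minimal sketch of that worst case, assuming the 4KB page and 128-page block sizes quoted above. It's a toy model of a naive in-place overwrite, not how any particular controller actually does it; the function name and data layout are made up for illustration.

```python
PAGE_KB = 4
PAGES_PER_BLOCK = 128
BLOCK_KB = PAGE_KB * PAGES_PER_BLOCK   # 512KB per block

def overwrite_page(block, page_index, new_data):
    """Naive overwrite of one page in a full block (hypothetical helper).

    Reads the whole block, erases it, then writes every page back.
    Returns the number of KB physically written to flash.
    """
    saved = list(block)            # read all 128 pages into a buffer
    saved[page_index] = new_data   # patch the single page that changed

    for i in range(PAGES_PER_BLOCK):
        block[i] = None            # erase: the whole block goes at once

    kb_written = 0
    for i, data in enumerate(saved):
        block[i] = data            # program each page back, one at a time
        kb_written += PAGE_KB
    return kb_written

block = [f"old-{i}" for i in range(PAGES_PER_BLOCK)]
kb = overwrite_page(block, 5, "new")
amplification = kb / PAGE_KB       # physical KB written per logical KB requested
print(kb, amplification)
```

In this model a 4KB update costs 512KB of physical writes, or 128 times what the host asked for.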

To make matters worse, every time you write to a flash page you reduce its lifespan. The JEDEC spec for MLC (multi-level cell) flash is 10,000 writes before the flash can start to fail.

Dealing with all of these issues requires that controllers get very crafty with how they manage writes. A good controller must split writes up among as many flash channels as possible, while avoiding writing to the same pages over and over again. It must also deal with the fact that some data is going to get frequently updated while other data will remain stagnant for days, weeks, months or even years. It has to detect all of this and organize the drive in real time without knowing anything about how it is you're using your computer.
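To make the "avoid writing to the same pages over and over" part concrete, here's a toy sketch of one possible approach: remap each logical write to whichever free physical page has been written the fewest times. The class and its fields are hypothetical, and real controllers are far more involved (they also have to erase stale pages at the block level before reusing them, which this model ignores).

```python
class ToyWearLeveler:
    """Toy logical-to-physical page mapper (illustrative, not a real controller)."""

    def __init__(self, num_physical_pages):
        self.write_counts = [0] * num_physical_pages
        self.mapping = {}                          # logical page -> physical page
        self.free = set(range(num_physical_pages))

    def write(self, logical_page):
        # Pick the least-written free physical page (ties go to the lowest index).
        target = min(self.free, key=lambda p: (self.write_counts[p], p))
        self.free.remove(target)

        # The old copy is now stale. This model frees it immediately; a real
        # drive would have to erase its whole block before reusing it.
        old = self.mapping.get(logical_page)
        if old is not None:
            self.free.add(old)

        self.mapping[logical_page] = target
        self.write_counts[target] += 1
        return target

ctl = ToyWearLeveler(num_physical_pages=8)
for _ in range(80):
    ctl.write(0)                   # hammer the same logical page 80 times

print(ctl.write_counts)
```

Even though the host rewrites the same logical page every time, the wear gets spread across all eight physical pages instead of landing 80 times on one.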

It's a tough job.

But not impossible.

295 Comments

  • nemitech - Monday, August 31, 2009 - link

    Oops - not eBay - it was Newegg.
  • Loki726 - Monday, August 31, 2009 - link

    Thanks a ton for including the pidgin compiler benchmarks. I didn't think that HD performance would make much of a difference (linking large builds might be a different story), but it is great to have numbers to back up that intuition. Keep it up.
  • torsteinowich - Monday, August 31, 2009 - link

    Hi

    You write that the Indilinx wiper tool collects a free page list from the OS, then wipes the pages. This sounds like a dangerous operation to me since the OS might allocate some of these blocks after the tool collects the list, but before they are wiped.

    Have you received a good explanation from Indilinx about how they ensure file system integrity? As far as I know Windows cannot temporarily switch to read-only mode on an active file system (at least not the system drive). The only way I could see this tool working safely would be by booting off a different medium and accessing the file system to be trimmed offline with a tool that correctly identifies the unused pages for the particular file system being used. I could be wrong of course, maybe Windows 7 has a system call to temporarily freeze FS writes, but I doubt it.
  • has407 - Monday, August 31, 2009 - link

    It: (1) creates a large temporary file (wiper.dat) which gobbles up all (or most) of the free space; (2) determines the LBAs occupied by that file; (3) tells the SSD to TRIM those LBAs; and then (4) deletes the temporary file (wiper.dat).

    From the OS/filesystem perspective, it's just another app and another file. (A similar technique is used by, e.g., the Sysinternals SDelete app for Windows to zero free space. For Windows you could also probably use the hooks used by defrag utilities to accomplish it, but that would be a lot more work.)
  • cghebert - Monday, August 31, 2009 - link

    Anand,

    Great article. Once again you have outclassed pretty much every other site out there with the depth of content in this review. You should start marketing t-shirts that say "Everything I learned about SSDs I learned from AnandTech"

    I did have a question about gaming benchmarks, since you made this statement:

    " but as you'll see later on in my gaming tests the benefits of an SSD really vary depending on the game"

    But I never saw any gaming benchmarks. Did I miss something?
  • nafhan - Monday, August 31, 2009 - link

    Just wanted to say awesome review.
    I've been reading Anandtech since 2000, and while other sites have gone downhill or (apparently) succumbed to pressure from advertisers, you guys have continued to give in depth, critical reviews.
    I also appreciate that you do some real analysis instead of just throwing 10 pages of charts online.
    Thanks, and keep up the good work!
  • zysurge - Monday, August 31, 2009 - link

    Awesome amazing article. So much information, presented clearly.

    Question, though? I have an Intel G2 160GB drive coming in the next few days for my Dell D830 laptop, which will be running Windows 7 x64.

    Do I set the controller to ATA and use the Intel Matrix driver, or set it to AHCI and use Microsoft's driver? Will either provide an advantage? I realize neither will provide TRIM until Q4, but after the firmware update, both should, right?

    Thanks in advance!
  • ggathagan - Wednesday, September 16, 2009 - link

    From page 15 (Early Trim support...):
    Under Windows 7 that means you have to use a Microsoft made IDE or AHCI driver (you can't install chipset drivers from anyone else).
  • Mumrik - Monday, August 31, 2009 - link

    but I can't live with less than 300GB on that drive, and SSDs in usable sizes still cost more than high end video cards :-(

    I really hope I'll be able to pick up a 300GB drive for 100-200 bucks in a year or so, but it is probably a bit too optimistic.
  • Simen1 - Monday, August 31, 2009 - link

    This is simply wrong. Ask anyone over 10 years old if they think this mathematical statement is true or false. 80 can never equal 74,5.

    Now, someone claims that 1 GB = 10^9 B and others claim that 1 GB is 2^30 B. Who is really right? What does the G and the B mean? Who defines that?

    The answer is easy to find and document. B means Byte. G stands for Giga and means 10^9, not 2^30. Giga is defined in the International System of Units, SI.

    No standardization organization has _ever_ defined Giga to be 2^30. But the IEC, the International Electrotechnical Commission, has defined "Gi" to be 2^30. This is supposed to be used for digital storage so people won't be confused by all the misunderstandings around this. Misunderstandings that mainly come from Microsoft and quite a few other big software vendors. Companies that ignore the mathematical errors in their software when they claim that 80GB = 74,5 GB, and ignore both international standards on how to shorten large numbers.
