A Quick Flash Refresher

DRAM is very fast. Writes happen in nanoseconds as do CPU clock cycles, those two get along very well. The problem with DRAM is that it's volatile storage; if the charge stored in each DRAM cell isn't refreshed, it's lost. Pull the plug and whatever you stored in DRAM will eventually disappear (and unlike most other changes, eventually happens in fractions of a second).

Magnetic storage, on the other hand, is not very fast. It's faster than writing trillions of numbers down on paper, but compared to DRAM it plain sucks. For starters, magnetic disk storage is mechanical - things have to physically move to read and write. Now it's impressive how fast these things can move and how accurate and relatively reliable they are given their complexity, but to a CPU, they are slow.

The fastest consumer hard drives take 7 milliseconds to read data off of a platter. The fastest consumer CPUs can do something with that data in one hundred thousandth that time.

The only reason we put up with mechanical storage (HDDs) is because they are cheap, store tons of data and are non-volatile: the data is still there even when you turn em off.

NAND flash gives us the best of both worlds. They are effectively non-volatile (flash cells can lose their charge but after about a decade) and relatively fast (data accesses take microseconds, not milliseconds). Through electron tunneling a charge is inserted into an N-channel MOSFET. Once the charge is in there, it's there for good - no refreshing necessary.


N-Channel MOSFET. One per bit in a NAND flash chip.

One MOSFET is good for one bit. Group billions of these MOSFETs together, in silicon, and you've got a multi-gigabyte NAND flash chip.

The MOSFETs are organized into lines, and the lines into groups called pages. These days a page is usually 4KB in size. NAND flash can't be written to one bit at a time, it's written at the page level - so 4KB at a time. Once you write the data though, it's there for good. Erasing is a bit more complicated.

To coax the charge out of the MOSFETs requires a bit more effort and the way NAND flash works is that you can't discharge a single MOSFET, you have to erase in larger groups called blocks. NAND blocks are commonly 128 pages, that means if you want to re-write a page in flash you have to first erase it and all 127 adjacent pages first. And allow me to repeat myself: if you want to overwrite 4KB of data from a full block, you need to erase and re-write 512KB of data.

To make matters worse, every time you write to a flash page you reduce its lifespan. The JEDEC spec for MLC (multi-level cell) flash is 10,000 writes before the flash can start to fail.

Dealing with all of these issues requires that controllers get very crafty with how they manage writes. A good controller must split writes up among as many flash channels as possible, while avoiding writing to the same pages over and over again. It must also deal with the fact that some data is going to get frequently updated while others will remain stagnant for days, weeks, months or even years. It has to detect all of this and organize the drive in real time without knowing anything about how it is you're using your computer.

It's a tough job.

But not impossible.

Index Live Long and Prosper: The Logical Page
POST A COMMENT

296 Comments

View All Comments

  • zodiacfml - Wednesday, September 02, 2009 - link

    Very informative, answered more than anything in my mind. Hope to see this again in the future with these drive capacities around $100. Reply
  • mgrmgr - Wednesday, September 02, 2009 - link

    Any idea if the (mid-Sept release?) OCZ Colossus's internal RAID setup will handle the problem of RAID controllers not being able to pass Windows 7's TRIM command to the SSD array. I'm intent on getting a new Photoshop machine with two SSDs in Raid-0 as soon as Win7 releases, but the word here and elsewhere so far is that RAID will block the TRIM function. Reply
  • kunedog - Wednesday, September 02, 2009 - link

    All the Gen2 X-25M 80GB drives are apparently gone from Newegg . . . so they've marked up the Gen1 drives to $360 (from $230):
    http://www.newegg.com/Product/Product.aspx?Item=N8...">http://www.newegg.com/Product/Product.aspx?Item=N8...

    Unbelievable.
    Reply
  • gfody - Wednesday, September 02, 2009 - link

    What happened to the gen2 160gb on Newegg? For a month the ETA was 9/2 (today) and now it's as if they never had it in the first place. The product page has been removed.

    It's like Newegg are holding the gen2 drives hostage until we buy out their remaining stock of gen1 drives.
    Reply
  • iwodo - Tuesday, September 01, 2009 - link

    I think it acts as a good summary. However someone wrote last time about Intel drive handling Random Read / Write extremely poorly during Sequential Read / Write.

    Has Aanand investigate yet?

    I am hoping next Gen Intel SSD coming in Q2 10 will bring some substantial improvement.
    Reply
  • statik213 - Tuesday, September 01, 2009 - link

    Does the RAID controller propagate TRIM commands to the SSD? Or will having RAID negate TRIM? Reply
  • justaviking - Tuesday, September 01, 2009 - link

    Another great article, Anand! Thanks, and keep them coming.

    If this has already been discussed, I apologize. I'm still exhausted from reading the wonderful article, and have not read all 17 pages of comments.

    On PAGE 3, it talks about the trade-off of larger vs. smaller pages.

    I wonder if it would be feasible to make a hybrid drive, with a portion of the drive using small pages for faster performance when writing small files, and the majority of it being larger pages to keep the management of the drive reasonable.

    Any file could be written anywhere, but the controller would bias small writes to the small pages, and large writes to large files.

    Externally it would appear as a single drive, of course, but deep down in the internals, it would essentially be two drives. Each of the two portions would be tuned for maximum performance in different areas, but able to serve as backup or overflow if the other portion became full or ever got written to too many times.

    Interesting concept? Or a hair brained idea buy an ignorant amateur?
    Reply
  • CList - Tuesday, September 01, 2009 - link

    Great article, wonderful to see insightful, in depth analysis.

    I'd be curious to hear anyone's thoughts on the implications are of running virtual hard disk files on SSD's. I do a lot of work these days on virtual machines, and I'd love to get them feeling more snappy - especially on my laptop which is limited to 4GB of ram.

    For example;
    What would the constant updates of those vmdk (or "vhd") files do to the disk's lifespan?

    If the OS hosting the VM is windows 7, but the virtual machine is WinServer2003 will the TRIM command be used properly?

    Cheers,
    CList
    Reply
  • pcfxer - Tuesday, September 01, 2009 - link

    Great article!

    "It seems that building Pidgin is more CPU than IO bound.."

    Obviously, Mr. Anand doesnt' understand how compilers work ;). Compilers will always be CPU and memory bound, reduce your memory in the computer to say 256MB (or lower) and you'll see what I mean. The levels of recursion necessary to follow the production (grammars that define the language) use up memory but would rarely use the drive unless the OS had terrible resource management. :0.
    Reply
  • CMGuy - Wednesday, September 02, 2009 - link

    While I can't comment on the specifics of software compilers I know that faster disk IO makes a big difference when your performing a full build (compilation and packaging) of software.
    IDEs these days spend a lot their time reading/writing small files (thats a lot of small, random, disk IO) and a good SSD can make a huge difference to this.
    Reply

Log in

Don't have an account? Sign up now