A Quick Flash Refresher

DRAM is very fast. Writes happen in nanoseconds as do CPU clock cycles, those two get along very well. The problem with DRAM is that it's volatile storage; if the charge stored in each DRAM cell isn't refreshed, it's lost. Pull the plug and whatever you stored in DRAM will eventually disappear (and unlike most other changes, eventually happens in fractions of a second).

Magnetic storage, on the other hand, is not very fast. It's faster than writing trillions of numbers down on paper, but compared to DRAM it plain sucks. For starters, magnetic disk storage is mechanical - things have to physically move to read and write. Now it's impressive how fast these things can move and how accurate and relatively reliable they are given their complexity, but to a CPU, they are slow.

The fastest consumer hard drives take 7 milliseconds to read data off of a platter. The fastest consumer CPUs can do something with that data in one hundred thousandth that time.

The only reason we put up with mechanical storage (HDDs) is because they are cheap, store tons of data and are non-volatile: the data is still there even when you turn em off.

NAND flash gives us the best of both worlds. They are effectively non-volatile (flash cells can lose their charge but after about a decade) and relatively fast (data accesses take microseconds, not milliseconds). Through electron tunneling a charge is inserted into an N-channel MOSFET. Once the charge is in there, it's there for good - no refreshing necessary.


N-Channel MOSFET. One per bit in a NAND flash chip.

One MOSFET is good for one bit. Group billions of these MOSFETs together, in silicon, and you've got a multi-gigabyte NAND flash chip.

The MOSFETs are organized into lines, and the lines into groups called pages. These days a page is usually 4KB in size. NAND flash can't be written to one bit at a time, it's written at the page level - so 4KB at a time. Once you write the data though, it's there for good. Erasing is a bit more complicated.

To coax the charge out of the MOSFETs requires a bit more effort and the way NAND flash works is that you can't discharge a single MOSFET, you have to erase in larger groups called blocks. NAND blocks are commonly 128 pages, that means if you want to re-write a page in flash you have to first erase it and all 127 adjacent pages first. And allow me to repeat myself: if you want to overwrite 4KB of data from a full block, you need to erase and re-write 512KB of data.

To make matters worse, every time you write to a flash page you reduce its lifespan. The JEDEC spec for MLC (multi-level cell) flash is 10,000 writes before the flash can start to fail.

Dealing with all of these issues requires that controllers get very crafty with how they manage writes. A good controller must split writes up among as many flash channels as possible, while avoiding writing to the same pages over and over again. It must also deal with the fact that some data is going to get frequently updated while others will remain stagnant for days, weeks, months or even years. It has to detect all of this and organize the drive in real time without knowing anything about how it is you're using your computer.

It's a tough job.

But not impossible.

Index Live Long and Prosper: The Logical Page
Comments Locked

295 Comments

View All Comments

  • GourdFreeMan - Tuesday, September 1, 2009 - link

    Yes, rewriting a cell will refill the floating gate with trapped electrons to the proper voltage level unless the gate has begun to wear out, so backing up your data, secure erasing your drive and copying the data back will preserve the life (within reason) of even drives that use minimalistic wear leveling to safeguard data. Charge retention is only a problem for users if they intend to use the drive for archival storage, or operate the drive at highly elevated temperatures.

    It is a bigger problem for flash engineers, however, and one of the reasons why MLC cannot be moved easily to more bits per cell without design changes. To store n-bits in a single cell you need 2^n separate energy levels to represent them, and thus each bit is only has approximately 1/(2^(n-1)) the amount of energy difference between states when compared to SLC using similar designs and materials.
  • Zheos - Tuesday, September 1, 2009 - link

    Man you seem to know a lot about what you're talking about :)

    Yeah now i understand why SSD for database and file storage server would be quite a bad idea.

    But for personal windows & everyday application storage, seems like a pure win to me if you can afford one :)

    I was only worried about its life-span but thankx to you and you're quick replys (and for the maths and technical stuff about how it realy work ;) im sold on the fact that i will buy one soon.

    The G2 from Intel seems like the best choice for now but I'll just wait and see how it's going when TRIM will become almost enable on every SSD and i'll make my decision there in a couple of months =)


  • GourdFreeMan - Wednesday, September 2, 2009 - link

    It isn't so much that SSDs make a bad storage server, but rather that you can't neglect to make periodic backups, as with any type of storage, if your data has great monetary or sentimental value. In addition to backups, RAID (1-6) is also an option if cost is no object and you want to use SSDs for long term storage in a running server. Database servers are a little more complicated, but SSDs can be an intelligent choice there as well if your usage patterns aren't continuous heavy small (i.e. <= 4K) writes.

    I plan on getting a G2 myself for my laptop after Intel updates the firmware to support TRIM and Anand reviews the effects in Windows 7, and I have already been using an Indilinx-based SLC drive in my home server.

    If you do anything that stresses your hard drive(s), or just like snappy boot times and application load times you will probably be impressed by the speeds of a new SSD. The cost per GB and lack of long term reliability studies are really the only things holding them back from taking the storage market by storm now.
  • ninevoltz - Thursday, September 17, 2009 - link

    GourdFreeMan could you please continue your explanation? I would like to learn more. You have really dived deeply into the physical properties of these drives.
  • GourdFreeMan - Tuesday, September 1, 2009 - link

    Minor correction to the second paragraph in my post above -- "each bit is only has" should read "each representation only has" in the last sentence.
  • philosofool - Monday, August 31, 2009 - link

    Nice job. This has been a great series.

    I'm getting a SSD once I can get one at $1/GB. I want a system/program files drive of at least 80GB and then a conventional HDD (a tenth of the cost/GB) for user data.

    Would keeping user data on a conventional HDD affect these results? It would seem like it wouldn't, but I would like to see the evidence.

    I would really like to see more benchmarks for these drives that aren't synthetic. Have you tried things like Crysis or The Witcher load times? (Both seemed to me to have pretty slow loads for maps.) I don't know if these would be affected, but as real world applications, I think it makes sense to try them out.
  • Anand Lal Shimpi - Monday, August 31, 2009 - link

    Personally I keep docs on my SSD but I keep pictures/music on a hard drive. Neither gets touched all that often in the grand scheme of things, but one is a lot smaller :)

    In The SSD Anthology I looked at Crysis load times. Performance didn't really improve when going to an SSD.

    Take care,
    Anand
  • Eeqmcsq - Monday, August 31, 2009 - link

    I would have thought that the read speed of an SSD would have helped cut down some of the compile time. Is there any tool that lets you analyze disk usage vs cpu usage during the compile time, to see what percentage of the compile was spent reading/writing to disk vs CPU processing?

    Is there any way you can add a temperature test between an HDD and an SSD? I read a couple of Newegg reviews that say their SSDs got HOT after use, though I think that may have just been 1 particular brand that I don't remember. Also, there was at least one article online that tested an SSD vs an HDD and the SSD ran a little warmer than the HDD.

    Also, garbage collection does have one advantage: It's OS independent. I'm still using Ubuntu 8.04 at work, and I'm stuck on 8.04 because my development environment WORKS, and I won't risk upgrading and destabilizing it. A garbage collecting SSD would certainly be helpful for my system... though your compiling tests are now swaying me against an SSD upgrade. Doh!

    And just for fun, have you thought about running some of your benchmarks on a RAM drive? I'd like to see how far SSDs and SATA have to go before matching the speed of RAM.

    Finally, any word from JMicron and their supposed update to the much "loved" JMF602 controller? I'd like to see some non-stuttering cheapo SSDs enter the market and really bring the $$$/GB down, like the Kingston V-series. Also, I'd like to see a refresh in the PATA SSD market.

    "Am I relieved to be done with this article? You betcha." And I give you a great THANK YOU!!! for spending the time working on it. As usual, it was a great read.
  • Per Hansson - Monday, August 31, 2009 - link

    Photofast have released Indilinx based PATA drives;
    http://www.photofastuk.com/engine/shop/category/G-...">http://www.photofastuk.com/engine/shop/category/G-...
  • aggressor - Monday, August 31, 2009 - link

    What ever happened to the price drops that OCZ announced when the Intel G2 drives came out? I want 128GB for $280!

Log in

Don't have an account? Sign up now