The Anatomy of an SSD

Let’s meet Mr. N-channel MOSFET again:

Say Hello

This is the building block of NAND-flash; one transistor is required per cell. A single NAND-flash cell can store either one or two bits of data. If it stores one, it’s called Single Level Cell (SLC) flash, and if it stores two, it’s Multi Level Cell (MLC) flash. Both are physically made the same way; in fact there’s nothing that separates MLC from SLC flash, it’s just a matter of how the data is stored in and read from the cell.

SLC flash (left) vs. MLC flash (right)

Flash is read from and written to in a guess-and-test fashion. You apply a voltage to the cell and check to see how it responds. You keep increasing the voltage until you get a result.
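To make the guess-and-test idea concrete, here is a minimal sketch in Python. It is a toy model, not how a real NAND controller is implemented: the `read_cell` function, the reference voltages and the idea of describing a cell by a single threshold voltage are all assumptions made for illustration.

```python
# Toy model of a stepped-voltage read. All names and voltage values here
# are hypothetical; real NAND sensing is considerably more involved.

def read_cell(threshold_v, reference_voltages):
    """Apply each reference voltage in increasing order until the cell responds.

    Returns (level, steps): the index of the first reference voltage that is
    at or above the cell's threshold, and how many voltages were tried.
    If no reference voltage is high enough, the cell is in the top level.
    """
    steps = 0
    for level, v_ref in enumerate(reference_voltages):
        steps += 1
        if v_ref >= threshold_v:  # the cell "responds" at this voltage
            return level, steps
    return len(reference_voltages), steps


SLC_REFS = [2.0]            # one boundary  -> 2 levels -> 1 bit per cell
MLC_REFS = [1.0, 2.0, 3.0]  # three boundaries -> 4 levels -> 2 bits per cell

print(read_cell(2.5, SLC_REFS))  # one check is enough for SLC
print(read_cell(2.5, MLC_REFS))  # MLC may need several checks
```

The point of the sketch is only that more levels per cell means more voltages to step through before you get a result.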

Operation       SLC NAND flash    MLC NAND flash
Random Read     25 µs             50 µs
Erase           2 ms per block    2 ms per block
Programming     250 µs            900 µs


With four voltage levels to check, MLC flash takes more than 3x longer to write to than SLC (900 µs vs. 250 µs in the table above). On the flip side you get twice the capacity at the same cost. Because of this distinction, and the fact that even MLC flash is more than fast enough for an SSD, you’ll only see MLC used for desktop SSDs while SLC is used for enterprise level server SSDs.
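The tradeoff can be worked out directly from the table. The short Python sketch below uses only the timings listed above; the dictionary layout and the `ratio` helper are just illustrative names.

```python
# Timings from the table above, in microseconds.
TIMINGS_US = {
    "SLC": {"read": 25, "program": 250},
    "MLC": {"read": 50, "program": 900},
}
BITS_PER_CELL = {"SLC": 1, "MLC": 2}


def ratio(op):
    """How many times slower MLC is than SLC for the given operation."""
    return TIMINGS_US["MLC"][op] / TIMINGS_US["SLC"][op]


read_penalty = ratio("read")        # 50 / 25
program_penalty = ratio("program")  # 900 / 250
capacity_gain = BITS_PER_CELL["MLC"] / BITS_PER_CELL["SLC"]

print(f"read: {read_penalty:.1f}x slower")
print(f"program: {program_penalty:.1f}x slower")
print(f"capacity: {capacity_gain:.0f}x per cell")
```

So MLC reads are 2x slower and programs are 3.6x slower, in exchange for 2x the bits per cell.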

Cells are strung together in arrays as depicted in the image to the right

So a single cell stores either one or two bits of data, but where do we go from there? Groups of cells are organized into pages, the smallest structure that’s readable/writable in an SSD. Today 4KB pages are standard on SSDs.

Pages are grouped together into blocks; today it’s common to have 128 pages in a block (512KB per block). A block is the smallest structure that can be erased in a NAND-flash device. So while you can read from and write to a page, you can only erase a block (128 pages at a time). This is where many of the SSD’s problems stem from. I’ll repeat this later because it’s one of the most important parts of understanding SSDs.

Arrays of cells are grouped into a page, arrays of pages are grouped into blocks
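The page/block mismatch is easier to see in code. Below is a minimal Python sketch using the 4KB page and 128-pages-per-block figures from the text. The `Block` class and its method names are hypothetical, and the rule that a written page must be erased before it can be written again is an assumption of this model rather than something spelled out above.

```python
PAGE_SIZE_KB = 4        # page size from the text
PAGES_PER_BLOCK = 128   # pages per block from the text
BLOCK_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_BLOCK  # 512KB


class Block:
    """Toy model: reads and writes work on pages, erases work on the whole block."""

    def __init__(self):
        # None marks an erased page that is ready to be written.
        self.pages = [None] * PAGES_PER_BLOCK

    def read_page(self, index):
        return self.pages[index]

    def write_page(self, index, data):
        # Assumption of this model: no in-place overwrite of a written page.
        if self.pages[index] is not None:
            raise ValueError("page already written; erase the whole block first")
        self.pages[index] = data

    def erase(self):
        # There is no per-page erase: all 128 pages are cleared together.
        self.pages = [None] * PAGES_PER_BLOCK


block = Block()
block.write_page(0, b"hello")
block.write_page(1, b"world")
block.erase()                       # wipes page 1 along with page 0
print(BLOCK_SIZE_KB, block.read_page(1))
```

Changing a single 4KB page therefore means dealing with the other 127 pages in its block as well.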

Blocks are then grouped into planes, and you’ll find multiple planes on a single NAND-flash die.

The combining doesn’t stop there; you can usually find either one, two or four die per package. While you’ll see a single NAND-flash IC, there may actually be two or four die in that package. You can also stack multiple ICs on top of each other to minimize board real estate usage.





  • SkullOne - Wednesday, March 18, 2009 - link

    Fantastic article. Definitely one of the best I've read in a long time. Incredibly informative. Everyone who reads this article is a little bit smarter afterwards.

    All the great information about SSDs aside, I think the best part though is how OCZ is willing to take blame for failure earlier and fix the problems. Companies like that are the ones who will get my money in the future especially when it is time for me to move from HDD to SSD.
  • Apache2009 - Wednesday, March 18, 2009 - link

    I got one Vertex SSD. Why does suspend cause a system halt? My laptop has an nVidia chipset and it works fine with an HDD. Does anybody know?
  • MarcHFR - Wednesday, March 18, 2009 - link


    You wrote that there is spare-area on X25-M :

    "Intel ships its X25-M with 80GB of MLC flash on it, but only 74.5GB is available to the user"

    It's a mistake. 80 GB of flash looks like 74.5GB to the user because 80,000,000,000 bytes of flash is 74.5 GB from the user's point of view (with 1 KB = 1024 bytes).

    You didn't point out the other problem of the X25-M: LBA "optimisation". After doing a lot of random write I/O, the sequential write speed can drop to only 10 MB/s :/
  • Kary - Thursday, March 19, 2009 - link

    The extra space would be invisible to the end user (it is used internally)

    Also, addressing is normally done in binary; as a result, actual sizes are typically in binary in memory devices (flash, RAM...):

    80 GB...not compatible with binary addressing

    (though 48GB of a 128GB drive being used for this seems pretty high)
  • ssj4Gogeta - Wednesday, March 18, 2009 - link

    Did you bother reading the article? He pointed out that you can get any SSD (NOT just Intel's) stuck into a situation when only a secure erase will help you out. The problem is not specific to Intel's SSD, and it doesn't occur during normal usage.
  • MarcHFR - Wednesday, March 18, 2009 - link

    The problem I've pointed out has nothing to do with the performance degradation related to writing to a filled page; it's a performance degradation related to an LBA optimisation that is specific to Intel's SSD.
  • VaultDweller - Wednesday, March 18, 2009 - link

    So where would Corsair's SSD fit into this mix? It uses a Samsung MLC controller... so would it be comparable to the OCZ Summit? I would expect not since the rated sequential speeds on the Corsair are tremendously lower than the Summit, but the Summit is the closest match in terms of the internals.
  • kensiko - Wednesday, March 18, 2009 - link

    No, OCZ Summit = newest Samsung controller. The Corsair uses the previous controller, with lower performance.
  • VaultDweller - Wednesday, March 18, 2009 - link

    So what's the difference?

    The Summit is optimized for sequential performance at the cost of random I/O, as per the article. That is clearly not the case with the Corsair drive, so how does the Corsair hold up in terms of random I/O? That's what I'm interested in, since the sequential on the Corsair is "fast enough" if the random write performance is good.
  • jatypc - Wednesday, March 18, 2009 - link

    A detailed description of how SSDs operate makes me wonder: Imagine hypothetically I have an SSD that is more than 90% full (e.g., 95%) and those 90% are read-only things (or almost read-only things such as exe and other application files). The remaining 10% is free or frequently written to (e.g., page/swap file). Then the use of the drive results - from what I understood in the article - in very fast aging of those 10% of the SSD because the 90% is occupied by read-only stuff. If the disk in question has for instance 32GB, those 10% are 3.2 GB (e.g., the size of a usual swap file) and after writing it approx. 10000 times, the respective part of the disk would become dead. Being occupied by a swap file, this number of reads/writes can be achieved in one or two years... Am I right?
