Putting Theory to Practice: Understanding the SSD Performance Degradation Problem

Let’s look at the problem in the real world. You, me, and our best friend have decided to start making SSDs. We buy up some NAND flash and build a controller. The table below summarizes our drive’s characteristics:

  Our Hypothetical SSD
  Page Size      4KB
  Block Size     5 Pages (20KB)
  Drive Size     1 Block (20KB)
  Read Speed     2KB/s
  Write Speed    1KB/s
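To keep the numbers below straight, here's that spec sheet as a few constants (a minimal sketch in Python; the names are mine, not anything from a real controller):

```python
# Specs of our hypothetical SSD (sizes in KB, speeds in KB/s)
PAGE_SIZE_KB = 4
PAGES_PER_BLOCK = 5
BLOCK_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_BLOCK  # 20KB
DRIVE_SIZE_KB = BLOCK_SIZE_KB * 1               # the whole drive is one block
READ_SPEED_KBPS = 2
WRITE_SPEED_KBPS = 1
```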

Through impressive marketing and your incredible good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty and allocates the first page to the text file.


[Figure: Our SSD. The yellow boxes are empty pages.]

The user then goes and saves an 8KB JPEG. The request, once again, comes down to our controller, which fills the next two pages with the image.


[Figure: The picture is 8KB and thus occupies two pages, which are thankfully empty.]

The OS reports that 60% of our drive is now full, which it is. Three of the five pages are occupied with data and the remaining two are empty.

Now let’s say that the user goes back and deletes that original text file. This request never reaches our controller; the OS simply marks the file’s LBA as available in its own tables. As far as our controller is concerned, we still have three valid pages and two empty ones.
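Here's that disagreement in miniature (a toy sketch, names my own; without something like the TRIM command, the delete never propagates to the drive):

```python
# After the delete, the OS and the controller see two different drives (KB)
os_view = {"used_kb": 8, "free_kb": 12}            # the OS freed the text file's LBA
controller_view = {"valid_kb": 12, "empty_kb": 8}  # the drive never heard about the delete
assert os_view["free_kb"] > controller_view["empty_kb"]  # trouble waiting to happen
```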

For our final write, the user saves a 12KB JPEG, which requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten, so it tells our controller to overwrite that LBA and to store the remaining 8KB of the image in our last two available LBAs.

Now we have a problem once these requests reach our SSD controller: we’ve got three pages’ worth of write requests incoming, but only two pages free. Remember that the OS believes we have 12KB free, but on the drive only 8KB is actually free; the other 4KB is tied up by an invalid page. We need to erase that page in order to complete the write request.


[Figure: Uh oh, problem. We don't have enough empty pages.]

Remember Flash 101: even though we only need to erase a single page, we can’t. Flash can’t be erased at page granularity, only in whole blocks. We have to erase all of our data just to get rid of one invalid page, then write everything back again.

To do so we first read the entire block back into memory somewhere; if we’ve got a good controller we’ll just read it into an on-die cache (steps 1 and 2 below), and if not, hopefully there’s some off-die memory we can use as a scratch pad. With the block in memory, we can modify it, removing the invalid page and replacing it with good data (steps 3 and 4). But we’ve only done that in memory; now we need to get it back into flash. Since we’ve got all of our data in memory, we can erase the entire block in flash and then write the new block (step 5).
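Sketched in Python, the whole read-modify-write cycle looks something like this (the page states and names are my own invention, not any real controller's firmware):

```python
# Read-modify-write, as described in steps 1-5 above.
# A page holds a label (valid data), "INVALID" (stale), or None (empty).
PAGES_PER_BLOCK = 5

block = ["text.txt", "jpeg_a", "jpeg_b", None, None]  # state after the first two writes
block[0] = "INVALID"                # text file deleted; the drive still holds the page

incoming = ["big_a", "big_b", "big_c"]               # the 12KB JPEG: three 4KB pages

if block.count(None) < len(incoming):                # not enough empty pages
    cache = [p for p in block if p not in (None, "INVALID")]  # steps 1-2: read valid pages
    cache += incoming                                # steps 3-4: merge in the new data
    cache += [None] * (PAGES_PER_BLOCK - len(cache))
    block = [None] * PAGES_PER_BLOCK                 # step 5: erase the *entire* block...
    block = cache                                    # ...then program it all back

print(block)  # ['jpeg_a', 'jpeg_b', 'big_a', 'big_b', 'big_c']
```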

Now let’s think about what just happened. As far as the OS is concerned, we needed to write 12KB of data and it got written. Our SSD controller knows what really transpired, however: in order to write that 12KB of data we had to first read 12KB and then write an entire block, or 20KB.

Our SSD is quite slow: it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds, but since we had to read 12KB (6 seconds) and then write 20KB (20 seconds), the whole operation took 26 seconds.

To the end user it would look like our write speed dropped from 1KB/s to 0.46KB/s, since it took us 26 seconds to write 12KB.
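The back-of-the-envelope math, spelled out with the specs from the table at the top of the page:

```python
READ_SPEED_KBPS, WRITE_SPEED_KBPS = 2, 1

requested_kb = 12                 # what the OS asked us to write
read_kb, written_kb = 12, 20      # what the controller actually had to do

naive_s = requested_kb / WRITE_SPEED_KBPS                             # 12 s
actual_s = read_kb / READ_SPEED_KBPS + written_kb / WRITE_SPEED_KBPS  # 6 + 20 = 26 s
effective_kbps = requested_kb / actual_s                              # ~0.46 KB/s

print(f"{actual_s:.0f} s instead of {naive_s:.0f} s -> {effective_kbps:.2f} KB/s")
```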

Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why write speeds drop the most while read speeds stay about the same. When writing to an empty page the SSD can write very quickly, but writing to a page that already has data in it incurs additional overhead, which drags write speeds down.

Comments

  • SkullOne - Wednesday, March 18, 2009

    Fantastic article. Definitely one of the best I've read in a long time. Incredibly informative. Everyone who reads this article is a little bit smarter afterwards.

    All the great information about SSDs aside, I think the best part is how OCZ is willing to take the blame for earlier failures and fix the problems. Companies like that are the ones that will get my money in the future, especially when it's time for me to move from HDD to SSD.
  • Apache2009 - Wednesday, March 18, 2009

    I got a Vertex SSD. Why does suspend cause a system halt? My laptop has an nVidia chipset and it works fine with an HDD. Does anybody know?
  • MarcHFR - Wednesday, March 18, 2009

    Hi,

    You wrote that there is spare area on the X25-M:

    "Intel ships its X25-M with 80GB of MLC flash on it, but only 74.5GB is available to the user"

    It's a mistake. 80GB of flash looks like 74.5GB to the user because 80,000,000,000 bytes of flash is 74.5GB from the user's point of view (with 1KB = 1024 bytes).

    You didn't point out the other problem with the X25-M: LBA "optimisation". After doing a lot of random I/O writes, sequential write speed can drop to only 10 MB/s :/
  • Kary - Thursday, March 19, 2009

    The extra space would be invisible to the end user (it is used internally)

    Also, addressing is normally done in binary; as a result, actual sizes are typically binary in memory devices (flash, RAM, ...):
    64GB
    128GB

    80 GB...not compatible with binary addressing

    (though 48GB of a 128GB drive being used for this seems pretty high)
  • ssj4Gogeta - Wednesday, March 18, 2009

    Did you bother reading the article? He pointed out that you can get any SSD (NOT just Intel's) stuck in a situation where only a secure erase will help you out. The problem is not specific to Intel's SSD, and it doesn't occur during normal usage.
  • MarcHFR - Wednesday, March 18, 2009

    The problem I've pointed out has nothing to do with the performance degradation related to writes on filled pages; it's a performance degradation related to an LBA "optimisation" that is specific to Intel's SSD.
  • VaultDweller - Wednesday, March 18, 2009

    So where would Corsair's SSD fit into this mix? It uses a Samsung MLC controller... so would it be comparable to the OCZ Summit? I would expect not, since the rated sequential speeds on the Corsair are tremendously lower than the Summit's, but the Summit is the closest match in terms of internals.
  • kensiko - Wednesday, March 18, 2009

    No, OCZ Summit = newest Samsung controller. The Corsair uses the previous controller, with lower performance.
  • VaultDweller - Wednesday, March 18, 2009

    So what's the difference?

    The Summit is optimized for sequential performance at the cost of random I/O, as per the article. That is clearly not the case with the Corsair drive, so how does the Corsair hold up in terms of random I/O? That's what I'm interested in, since the sequential on the Corsair is "fast enough" if the random write performance is good.
  • jatypc - Wednesday, March 18, 2009

    A detailed description of how SSDs operate makes me wonder: imagine, hypothetically, that I have an SSD that is more than 90% full (e.g., 95%), and those 90% are read-only things (or almost read-only, such as exe and other application files). The remaining 10% is free or frequently written to (e.g., the page/swap file). From what I understood in the article, using the drive then results in very fast aging of those 10% of the SSD, because the other 90% is occupied by read-only stuff. If the disk in question is, for instance, 32GB, those 10% are 3.2GB (e.g., the size of a usual swap file), and after writing it approx. 10,000 times, the respective part of the disk would be dead. Being occupied by a swap file, that number of writes could be reached in one or two years... Am I right?
