Putting Theory to Practice: Understanding the SSD Performance Degradation Problem

Let’s look at the problem in the real world. You, me and our best friend have decided to start making SSDs. We buy up some NAND-flash and build a controller. The table below summarizes our drive’s characteristics:

  Our Hypothetical SSD
Page Size 4KB
Block Size 5 Pages (20KB)
Drive Size 1 Block (20KB
Read Speed 2 KB/s
Write Speed 1 KB/s

 

Through impressive marketing and your incredibly good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty, and allocates the first page to this text file.


Our SSD. The yellow boxes are empty pages

The user then goes and saves an 8KB JPEG. The request, once again, comes down to our controller, and fills the next two pages with the image.


The picture is 8KB and thus occupies two pages, which are thankfully empty

The OS reports that 60% of our drive is now full, which it is. Three of the five open pages are occupied with data and the remaining two pages are empty.

Now let’s say that the user goes back and deletes that original text file. This request doesn’t ever reach our controller, as far as our controller is concerned we’ve got three valid and two empty pages.

For our final write, the user wants to save a 12KB JPEG, that requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten; so it tells our controller to overwrite that LBA as well as store the last 8KB of the image in our last available LBAs.

Now we have a problem once these requests get to our SSD controller. We’ve got three pages worth of write requests incoming, but only two pages free. Remember that the OS knows we have 12KB free, but on the drive only 8KB is actually free, 4KB is in use by an invalid page. We need to erase that page in order to complete the write request.


Uhoh, problem. We don't have enough empty pages.

Remember back to Flash 101, even though we have to erase just one page we can’t; you can’t erase pages, only blocks. We have to erase all of our data just to get rid of the invalid page, then write it all back again.

To do so we first read the entire block back into memory somewhere; if we’ve got a good controller we’ll just read it into an on-die cache (steps 1 and 2 below), if not hopefully there’s some off-die memory we can use as a scratch pad. With the block read, we can modify it, remove the invalid page and replace it with good data (steps 3 and 4). But we’ve only done that in memory somewhere, now we need to write it to flash. Since we’ve got all of our data in memory, we can erase the entire block in flash and write the new block (step 5).

Now let’s think about what’s just happened. As far as the OS is concerned we needed to write 12KB of data and it got written. Our SSD controller knows what really transpired however. In order to write that 12KB of data we had to first read 12KB then write an entire block, or 20KB.

Our SSD is quite slow, it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds but since we had to read 12KB and then write 20KB the whole operation now took 26 seconds.

To the end user it would look like our write speed dropped from 1KB/s to 0.46KB/s, since it took us 26 seconds to write 12KB.

Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why the write speeds drop the most while the read speeds stay about the same. When writing to an empty page the SSD can write very quickly, but when writing to a page that already has data in it there’s additional overhead that must be dealt with thus reducing the write speeds.

The Blind SSD Free Space to the Rescue
Comments Locked

250 Comments

View All Comments

  • punjabiplaya - Wednesday, March 18, 2009 - link

    Great info. I'm looking to get an SSD but was put off by all these setbacks. Why should I put away my HDDS and get something a million times more expensive that stutters?
    This article is why I visit AT first.
  • Hellfire26 - Wednesday, March 18, 2009 - link

    Anand, when you filled up the drives to simulate a full drive, did you also write to the extended area that is reserved? If you didn't, wouldn't the Intel SLC drive (as an example) not show as much of a performance drop, versus the MLC drive? As you stated, Intel has reserved more flash memory on the SLC drive, above the stated SSD capacity.

    I also agree with GourdFreeMan, that the physical block size needs to be reduced. Due to the constant erasing of blocks, the Trim command is going to reduce the life of the drive. Of course, drive makers could increase the size of the cache and delay using the Trim command until the number of blocks to be erased equals the cache available. This would more efficiently rearrange the valid data still present in the blocks that are being erased (less writes). Microsoft would have to design the Trim command so it would know how much cache was available on the drive, and drive makers would have to specifically reserve a portion of their cache for use by the Trim command.

    I also like Basilisk's comment about increasing the cluster size, although if you increase it too big, you are likely to be wasting space and increasing overhead. Surely, even if MS only doubles the cluster size for NTFS partitions to 8KB's, write cycles to SSD's would be reduced. Also, There is the difference between 32bit and 64bit operating systems to consider. However, I don't have the knowledge to say whether Microsoft can make these changes without running into serious problems with other aspects of the operating system.
  • Anand Lal Shimpi - Wednesday, March 18, 2009 - link

    I only wrote to the LBAs reported to the OS. So on the 80GB Intel drive that's from 0 - 74.5GB.

    I didn't test the X25-E as extensively as the rest of the drives so I didn't look at performance degradation as closely just because I was running out of time and the X25-E is sooo much more expensive. I may do a standalone look at it in the near future.

    Take care,
    Anand
  • gss4w - Wednesday, March 18, 2009 - link

    Has anyone at Anandtech talked to Microsoft about when the "Trim" command will be supported in Windows 7. Also it would be great if you could include some numbers from Windows 7 beta when you do a follow-up.

    One reason I ask is that I searched for "Windows 7 ssd trim" and I saw a presentation from WinHEC that made it sound like support for the trim command would be a requirement for SSD drives to meet the Windows 7 logo requirements. I would think if this were the case then Windows 7 would have support for trim. However, this article made it sound like support for Trim might not be included when Windows 7 is initially released, but would be added later.

  • ryedizzel - Thursday, March 19, 2009 - link

    I think it is obvious that Windows 7 will support TRIM. The bigger question this article points out is whether or not the current SSDs will be upgradeable via firmware- which is more important for consumers wanting to buy one now.
  • Martimus - Wednesday, March 18, 2009 - link

    It took me an hour to read the whole thing, but I really enjoyed it. It reminded me of the time I spent testing circuitry and doing root cause analysis.
  • alpha754293 - Wednesday, March 18, 2009 - link

    I think that it would be interesting if you were to be able to test the drives for the "desktop/laptop/consumer" front by writing a 8 GiB file using 4 kiB block sizes, etc. for the desktop pattern and also to test the drive then with a larger sizes and larger block size for the server/workstation pattern as well.

    You present some very very good arguments and points, and I found your article to be thoroughly researched and well put.

    So I do have to commend you on that. You did an excellent job. It is thoroughly enjoyable to read.

    I'm currently looking at a 4x 256 GB Samsung MLC on Solaris 10/ZFS for apps/OS (for PXE boot), and this does a lot of the testing; but I would be interested to see how it would handle more server-type workloads.
  • korbendallas - Wednesday, March 18, 2009 - link

    If The implementation of the Trim command is as you described here, it would actually kind of suck.

    "The third step was deleting the original 4KB text file. Since our drive now supports TRIM, when this deletion request comes down the drive will actually read the entire block, remove the first LBA and write the new block back to the flash:"

    First of all, it would create a new phenomenon called Erase Amplification. This would negatively impact the lifetime of a drive.

    Secondly, you now have worse delete performance.


    Basically, an SSD 4kB block can be in 3 different states: erased, data, garbage. A block enters the garbage state when a block is "overwritten" or the Trim command marks the contents as invalid.

    The way i would imagine it working, marking block content as invalid is all the Trim command does.

    Instead the drive will spend idle time finding the 512kB pages with the most garbage blocks. Once such a page is found, all the data blocks from that page would be copied to another page, and the page would be erased. Doing it in this way maximizes the number of garbage blocks being converted to erased.
  • alpha754293 - Wednesday, March 18, 2009 - link

    BTW...you might be able to simulate the drive as well using Cygwin where you go to the drive and run the following:

    $ dd if=/dev/random of=testfile bs=1024k count=76288

    I'm sure that you can come up with fancier shell scripts and stuff that uses the random number generator for the offsets (and if you really want it to work well, partition it so that when it does it, it takes up the entire initial 74.5 GB partition, and when you're done "dirtying" the data using dd and offset in a random pattern, grow the partition to take up the entire disk again.)

    Just as a suggestion for future reference.

    I use parts of that to some (varying) degree for when I do my file/disk I/O subsystem tests.
  • nubie - Wednesday, March 18, 2009 - link

    I should think that most "performance" laptops will come with a Vertex drive in the near future.

    Finally a performance SSD that comes near mainstream pricing.

    Things are looking up, if more manufacturers get their heads out of the sand we should see prices drop as competition finally starts breeding excellence.

Log in

Don't have an account? Sign up now