Putting Theory to Practice: Understanding the SSD Performance Degradation Problem

Let’s look at the problem in the real world. You, me and our best friend have decided to start making SSDs. We buy up some NAND-flash and build a controller. The table below summarizes our drive’s characteristics:

  Our Hypothetical SSD
Page Size 4KB
Block Size 5 Pages (20KB)
Drive Size 1 Block (20KB
Read Speed 2 KB/s
Write Speed 1 KB/s

 

Through impressive marketing and your incredibly good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty, and allocates the first page to this text file.


Our SSD. The yellow boxes are empty pages

The user then goes and saves an 8KB JPEG. The request, once again, comes down to our controller, and fills the next two pages with the image.


The picture is 8KB and thus occupies two pages, which are thankfully empty

The OS reports that 60% of our drive is now full, which it is. Three of the five open pages are occupied with data and the remaining two pages are empty.

Now let’s say that the user goes back and deletes that original text file. This request doesn’t ever reach our controller, as far as our controller is concerned we’ve got three valid and two empty pages.

For our final write, the user wants to save a 12KB JPEG, that requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten; so it tells our controller to overwrite that LBA as well as store the last 8KB of the image in our last available LBAs.

Now we have a problem once these requests get to our SSD controller. We’ve got three pages worth of write requests incoming, but only two pages free. Remember that the OS knows we have 12KB free, but on the drive only 8KB is actually free, 4KB is in use by an invalid page. We need to erase that page in order to complete the write request.


Uhoh, problem. We don't have enough empty pages.

Remember back to Flash 101, even though we have to erase just one page we can’t; you can’t erase pages, only blocks. We have to erase all of our data just to get rid of the invalid page, then write it all back again.

To do so we first read the entire block back into memory somewhere; if we’ve got a good controller we’ll just read it into an on-die cache (steps 1 and 2 below), if not hopefully there’s some off-die memory we can use as a scratch pad. With the block read, we can modify it, remove the invalid page and replace it with good data (steps 3 and 4). But we’ve only done that in memory somewhere, now we need to write it to flash. Since we’ve got all of our data in memory, we can erase the entire block in flash and write the new block (step 5).

Now let’s think about what’s just happened. As far as the OS is concerned we needed to write 12KB of data and it got written. Our SSD controller knows what really transpired however. In order to write that 12KB of data we had to first read 12KB then write an entire block, or 20KB.

Our SSD is quite slow, it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds but since we had to read 12KB and then write 20KB the whole operation now took 26 seconds.

To the end user it would look like our write speed dropped from 1KB/s to 0.46KB/s, since it took us 26 seconds to write 12KB.

Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why the write speeds drop the most while the read speeds stay about the same. When writing to an empty page the SSD can write very quickly, but when writing to a page that already has data in it there’s additional overhead that must be dealt with thus reducing the write speeds.

The Blind SSD Free Space to the Rescue
POST A COMMENT

235 Comments

View All Comments

  • siliq - Wednesday, April 01, 2009 - link

    With Anand's excellent article, it's clear that the sequential read/write thoroughput doesn't matter so much - all SSDs, even the notorious JMicron series, can do a good job on that metric. What is relevant to our daily use is the random write rate. Latencies and IOs/second are the most important metric in the realm of SSD.

    Based on that, I would suggest Anand (and other Tech reporters) to include a real world test of evaluating the Random Write performance for SSD. Because current real-world tests: booting windows, loading games, rendering 3D, etc. they focus on the random read. However, measuring how long it takes to install Windows, Microsoft Visual Studio, or a 4-GB PC Game would thoroughly test the Random Write / Latency performance. I think this is a good complementary of our current testing methodology
    Reply
  • Sabresiberian - Tuesday, March 31, 2009 - link

    Just wanted to add my thanks to Anand for this article in particular and for the quality work he has done over the years; I am so grateful for Anandtech's quality and information and the fact that it has been maintained! Reply
  • Sabresiberian - Tuesday, March 31, 2009 - link

    Oops didn't proof, sorry about the misspell Anand! Reply
  • hongmingc - Saturday, March 28, 2009 - link

    Anand, This is a great Article and a good story too.
    The OCZ story caught my attention that a quick firmware upgrade make a big improvement. From my understanding that SSD system designers try to trade off Space, Speed, and Durability (Also SSD :)) due the nature of NAND flash.
    We can clearly see the trade off of Space and Speed when SSD is getting more full the slower the speed (This is due to out-of-place write to increase the write operation and a block reclaim routine). However, Speed is also sacrificed to achieve the Durability (by doing wear leveling). Remember SLC nand's life time is about 100K write, while MLC nand has only about 10K write. Without considering doing wear leveling to improve the life cycle of the SSD, the firmware can be much simple and easy which will improve the write operation speed quite a bit.
    I echo you that the performance test should reflect user's daily usage which can be small size files write and may not be 80% full.
    However, users may be more concern about the Durability, the life cycle of the SSD.
    Is there such a test? How long will the black box OCZ Vertex live?
    How long will the regular OCZ Vertex live? and How long will the X25 live?
    Reply
  • antcasq - Sunday, April 05, 2009 - link

    This article was excellent, explaining several issues regarding performance.

    It would be great if the next article abou ssd addresses durability and reliability.

    My main concert is the swap partition (Linux) or virtual memory file (Windows). I found an post in another website saying that this is not an issue. Is it true? I find it hard to believe. Maybe in a real world test/scenario the problem will arise.
    http://robert.penz.name/137/no-swap-partition-jour...">http://robert.penz.name/137/no-swap-partition-jour...

    I hope AnandTech can take my concerns into consideration.

    Best regards
    Reply
  • stilz - Friday, March 27, 2009 - link

    This is the first hardware review I've read from start to finish, and the time is well worth the information you've provided.

    Thank you for your honest, professional and knowledgeable work. Also kudos to OCZ, I'll definitely consider the Vertex while making purchases.
    Reply
  • Bytales - Friday, March 27, 2009 - link

    As i read the article, i'm thinking of ways to slow down the down the degrading process. Intel is gonna ship x-25m 320gb this year. If i buy this drive and use it as an OS drive, i will obviously won't need the whole 320GB. Say i would need only 40 to 50 GB. I can make a secure erase (if the drive isn't new), made a partition of 50GB, and leave the remaining space unpartitioned. Will that solve the problem in any way ?
    Another way to solve the problem, would be a method inside the OS. The OS could use a user controlled % of the RAM memory, as a cache for those small 4kb files. Since ram reads and writes are way faster, i think it will also help. Say you got 8GB ram, and use 2gb for this purpose, and then the OS would only have 6gb ram for its use, while 2gb is used for these smaller files. That would increase also the lifespan of the SSD. Can this be possible ?
    Reply
  • Hellfire26 - Thursday, March 26, 2009 - link

    In reference to SSD's, I have read a lot of articles and comments about improved firmware and operating system support. I hope manufacturers don't forget about the on-board RAID controller.

    From the articles and comments made by users around the web, who have tested SSD's in a Raid 0 configuration, I believe that two Intel X25-M SSD's in a RAID 0 configuration would more than saturate current on-board RAID controllers.

    Intel is doing a die shrink of the NAND memory that is going into their SSD's come this fall. I would expect these new Intel SSD's to show faster read and write times. Other manufacturers will also find ways to increase the speed of their SSD's.

    SSD's scale well in a RAID configuration. It would be a shame if the on-board RAID controller limited our throughput. The alternative would be very expensive add-in RAID cards.
    Reply
  • FlaTEr1C - Wednesday, March 25, 2009 - link

    Anand, once again you wrote an article that no one else could've written. This is why I'm reading this site since 2004 and will always do. Your articles and reviews are without exception unique and a must-read. Thank you for this thorough background, analysis and review of SSD.

    I was looking a long time for a solution to make my desktop experience faster and I think I'll order a 60GB Vertex. 200€(germany) is still a lot of money but it will be worth it.

    Once again, great work Anand!
    Reply
  • blackburried - Wednesday, March 25, 2009 - link

    It's referred to as "discard" in the kernel functions.

    It works very well w/ SSD's that support TRIM, like fusion-io's drives.
    Reply

Log in

Don't have an account? Sign up now