Putting Theory to Practice: Understanding the SSD Performance Degradation Problem

Let’s look at the problem in the real world. You, me and our best friend have decided to start making SSDs. We buy up some NAND-flash and build a controller. The table below summarizes our drive’s characteristics:

Our Hypothetical SSD
Page Size      4KB
Block Size     5 Pages (20KB)
Drive Size     1 Block (20KB)
Read Speed     2KB/s
Write Speed    1KB/s

 

Through impressive marketing and your incredibly good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty, and allocates the first page to this text file.


Our SSD. The yellow boxes are empty pages

The user then goes and saves an 8KB JPEG. The request, once again, comes down to our controller, and fills the next two pages with the image.


The picture is 8KB and thus occupies two pages, which are thankfully empty

The OS reports that 60% of our drive is now full, which it is. Three of the five open pages are occupied with data and the remaining two pages are empty.

Now let’s say that the user goes back and deletes that original text file. This request never reaches our controller; as far as the controller is concerned, we’ve still got three valid pages and two empty ones.

For our final write, the user wants to save a 12KB JPEG, which requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten, so it tells our controller to overwrite that LBA and to store the last 8KB of the image in our last available LBAs.

Now we have a problem once these requests get to our SSD controller. We’ve got three pages’ worth of write requests incoming, but only two pages free. Remember that the OS believes we have 12KB free, but on the drive only 8KB is actually free; the other 4KB is occupied by an invalid page. We need to erase that page in order to complete the write request.
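The mismatch between the two views can be sketched in a few lines of Python. This is a toy model under stated assumptions, not real controller firmware: the class names `ToyController` and `ToyHost` and the file names are hypothetical, and the model simply refuses a write when it runs out of empty pages rather than recovering from it.

```python
PAGE_KB = 4           # page size from the table above
PAGES_PER_BLOCK = 5   # the whole drive is one 5-page block


def pages_for(size_kb):
    """Whole pages needed to hold a file (ceiling division)."""
    return -(-size_kb // PAGE_KB)


class ToyController:
    """Drive-side view: a page is empty (None) or holds data it treats as valid."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK

    def empty_kb(self):
        return self.pages.count(None) * PAGE_KB

    def write(self, name, size_kb):
        empty = [i for i, page in enumerate(self.pages) if page is None]
        needed = pages_for(size_kb)
        if needed > len(empty):
            raise RuntimeError(f"need {needed} empty pages, have {len(empty)}")
        for i in empty[:needed]:
            self.pages[i] = name


class ToyHost:
    """OS-side view: deleting a file only updates the OS's own bookkeeping."""

    def __init__(self, controller):
        self.controller = controller
        self.files = {}

    def save(self, name, size_kb):
        self.controller.write(name, size_kb)
        self.files[name] = size_kb

    def delete(self, name):
        del self.files[name]  # the controller is never told about this

    def free_kb(self):
        used = sum(pages_for(kb) for kb in self.files.values()) * PAGE_KB
        return PAGE_KB * PAGES_PER_BLOCK - used


controller = ToyController()
host = ToyHost(controller)
host.save("notes.txt", 4)    # the 4KB text file (hypothetical name)
host.save("photo.jpg", 8)    # the 8KB JPEG (hypothetical name)
host.delete("notes.txt")

print(host.free_kb())         # 12: what the OS believes is free
print(controller.empty_kb())  # 8: what is actually empty on the drive
```

Asking this toy drive to save the 12KB JPEG at this point raises an error, which is exactly the situation our controller now has to dig itself out of.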


Uh-oh, problem. We don't have enough empty pages.

Remember back to Flash 101: even though we only need to erase one page, we can’t. You can’t erase pages, only blocks. We have to erase all of our data just to get rid of the invalid page, then write it all back again.

To do so we first read the entire block back into memory somewhere. If we’ve got a good controller we’ll just read it into an on-die cache (steps 1 and 2 below); if not, hopefully there’s some off-die memory we can use as a scratch pad. With the block read, we can modify it: remove the invalid page and replace it with good data (steps 3 and 4). But we’ve only done that in memory somewhere; now we need to write it to flash. Since we’ve got all of our data in memory, we can erase the entire block in flash and write the new block (step 5).
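The read-modify-write cycle just described can be sketched as follows. This is a minimal illustration, assuming the block is a plain Python list of page contents with `None` for an empty page; the function name `read_modify_write` and the page labels are hypothetical, and a real controller does all of this in hardware.

```python
EMPTY = None  # marker for an erased, writable page


def read_modify_write(flash_block, overwrite_index, new_pages):
    """Replace one invalid page and fill empty pages, erasing the whole block."""
    # Steps 1 and 2: read the entire block into cache (or a scratch pad).
    cache = list(flash_block)

    # Steps 3 and 4: modify the cached copy. The invalid page is replaced
    # with the first page of new data; the rest goes into the empty pages.
    pending = list(new_pages)
    cache[overwrite_index] = pending.pop(0)
    for i, page in enumerate(cache):
        if page is EMPTY and pending:
            cache[i] = pending.pop(0)
    if pending:
        raise ValueError("not enough room in the block for the new data")

    # Step 5: erase the whole block in flash, then write the new block back.
    for i in range(len(flash_block)):
        flash_block[i] = EMPTY
    flash_block[:] = cache
    return flash_block


# The drive as the controller sees it before the 12KB write:
# one stale text page, two pages of the 8KB JPEG, two empty pages.
block = ["text", "jpeg8-a", "jpeg8-b", EMPTY, EMPTY]
read_modify_write(block, 0, ["jpeg12-a", "jpeg12-b", "jpeg12-c"])
print(block)
```

Note that the two pages of the 8KB JPEG get rewritten even though nobody asked to change them. That extra work is the whole problem.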

Now let’s think about what’s just happened. As far as the OS is concerned we needed to write 12KB of data and it got written. Our SSD controller knows what really transpired, however. In order to write that 12KB of data we had to first read 12KB, then write an entire block, or 20KB.

Our SSD is quite slow: it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds, but since we had to read 12KB (6 seconds at 2KB/s) and then write 20KB (20 seconds at 1KB/s), the whole operation took 26 seconds.

To the end user it would look like our write speed dropped from 1KB/s to 0.46KB/s, since it took us 26 seconds to write 12KB.
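The arithmetic above is easy to check. The sketch below uses only the speeds from our hypothetical drive's table; the function name `rmw_seconds` is made up for illustration.

```python
READ_KB_PER_S = 2   # read speed of our hypothetical SSD
WRITE_KB_PER_S = 1  # write speed of our hypothetical SSD


def rmw_seconds(read_kb, write_kb):
    """Time for a read-modify-write: read the old data, then write the block."""
    return read_kb / READ_KB_PER_S + write_kb / WRITE_KB_PER_S


requested_kb = 12
ideal_s = requested_kb / WRITE_KB_PER_S       # writing straight to empty pages
actual_s = rmw_seconds(read_kb=12, write_kb=20)
effective_kb_per_s = requested_kb / actual_s  # what the end user observes

print(ideal_s, actual_s, round(effective_kb_per_s, 2))  # 12.0 26.0 0.46
```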

Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why write speeds drop the most while read speeds stay about the same. When writing to an empty page the SSD can write very quickly, but when writing to a page that already has data in it there’s additional overhead that must be dealt with, which reduces write speeds.


235 Comments


  • DangerMouse4269 - Tuesday, April 13, 2010 - link

    Nicely written. Even a very out of practice Comp Eng understood that.
  • geekforhire - Monday, June 14, 2010 - link

    I have just replaced the hard drive in this 3 year old Dell Inspiron 9400 notebook computer with a new and very quick OCZ SSD, manually configured the partition with a 1024 offset, freshly installed the OS, freshly downloaded all of the latest and greatest drivers from Dell, and applied all currently available OS updates from Msft.

    The problem is that when the machine resumes from Standby, it will /reliably/ (4 out of 4 attempts) produce a BSOD 0xF4 after the power button is pressed to resume the machine from standby.

    Here's the sequence to recreate the problem:

    0) Machine is booted normally into Windows, and log in to an account which has administrative privs.
    1) Click on Start -> Shut Down -> Standby.
    2) See display turn black, disk I/O light flashes then stops, then the power indicator light begins to flash on and off slowly.
    3) Wait until the power light has made 2 slow flashes.
    4) Press the power button.
    5) See the Dell Bios splash screen, then disappear
    6) Boom: See the BSOD 0xF4

    The values reported after the STOP are:
    (0x00000003, 0x865b3020, 0x865b3194, 0x805d2954)

    Note that I've been in contact with OCZ before about this SSD+computer, because the previous BSOD that was produced was 0x77. Their recommendation was to create the partition with an offset with a 64 interval, and to reflash the SSD with their modern firmware. This was done, the OS was reinstalled as described, and now I'm getting a different BSOD code. Another mention was a question whether the notebook computer uses a SATA2 controller (definitely compatible) or SATA1 (which may have troubles).

    I've run Spinrite on the SSD, and there are lots of ECC errors being reported. I've been in contact with Spinrite, and they chalk this up to the SSD being chatty (which they like), but since SSD's are new and magnetic disks are common, they want to stay focussed on magnetic disks.

    When the machine boots back up, the OS reports that a serious error has occurred, and asks that a problem report be submitted, which I do. Then an attractive but somewhat generic page is displayed with common causes (Aging or failing hard disks, large file transfers from secondary media to local hd, loss of power to a hard drive, hard disk intensive processes (eg: antivirus scanners), recently installed hardware that might have compatibility and performance problems)

    Has anyone else encountered this kind of problem, and do you have any suggestions?
  • angavar - Thursday, September 09, 2010 - link

    As a medical student I can appreciate a well researched and analytical article when I see it. This is by far the best computer hardware review I have ever read! Thank-you for your time and effort in producing what is clearly a thoroughly researched and detailed analysis.
  • mac021 - Wednesday, October 17, 2012 - link

    Thank you for the lesson and helping me understand SSD drives. May I just ask for your advice...

    For everyday use designing and generating prototypes for websites and running typical office s/w like word and excel for long documentations while listening to music or just having some video play in the background, then the occasional gaming of, say Star Craft 2 and Dead Space 3, and lets assume I do this on a 5 hours a day average for 365 days in a year, how long before I need to replace an OCZ Vertex/Summit SSD? And does format/reinstall help in prolonging the life of an SSD just as it does for my old hard drives (from a computer that's 6 years old and counting)? Or there's no stopping the SSD's death after reaching 10,000 times of being erased and rewritten on? I'm not one who keeps upgrading or buying new computer systems for every new thing that comes out, i'm more of a keeper and maintainer for as long as the system servers my needs... but when I make a purchase, I make sure it will be enough to last me another 6-12 years IF possible! Which is why I'm still considering SATA for my next purchase late this year or early next year (and I'm only buying a new PC just because I made a mistake buying a foxconn motherboard that can't support anything higher than XP, not even Vista... weird, anyway I found that out too late).

    Also, would you know of a motherboard that supports SSD, Windows 8, Nvidea, third gen i5/i7, and up to 64GB ram?

    Thanks so much!