The Cleaning Lady and Write Amplification

Imagine you’re running a cafeteria. This is the real world and your cafeteria has a finite number of plates, say 200 for the entire cafeteria. Your cafeteria is open for dinner and over the course of the night you may serve a total of 1000 people. The number of guests outnumbers the total number of plates 5-to-1, thankfully they don’t all eat at once.

You’ve got a dishwasher who cleans the dirty dishes as the tables are bussed and then puts them in a pile of clean dishes for the servers to use as new diners arrive.

Pretty basic, right? That’s how an SSD works.

Remember the rules: you can read from and write to pages, but you must erase entire blocks at a time. If a block is full of invalid pages (files that have been overwritten at the file system level for example), it must be erased before it can be written to.

All SSDs have a dishwasher of sorts, except instead of cleaning dishes, its job is to clean NAND blocks and prep them for use. The cleaning algorithms don’t really kick in when the drive is new, but put a few days, weeks or months of use on the drive and cleaning will become a regular part of its routine.

Remember this picture?

It (roughly) describes what happens when you go to write a page of data to a block that’s full of both valid and invalid pages.

In actuality the write happens more like this. A new block is allocated, valid data is copied to the new block (including the data you wish to write), the old block is sent for cleaning and emerges completely wiped. The old block is added to the pool of empty blocks. As the controller needs them, blocks are pulled from this pool, used, and the old blocks are recycled in here.

IBM's Zurich Research Laboratory actually made a wonderful diagram of how this works, but it's a bit more complicated than I need it to be for my example here today so I've remade the diagram and simplified it a bit:

The diagram explains what I just outlined above. A write request comes in, a new block is allocated and used then added to the list of used blocks. The blocks with the least amount of valid data (or the most invalid data) are scheduled for garbage collection, cleaned and added to the free block pool.

We can actually see this in action if we look at write latencies:

Average write latencies for writing to an SSD, even with random data, are extremely low. But take a look at the max latencies:

While average latencies are very low, the max latencies are around 350x higher. They are still low compared to a mechanical hard disk, but what's going on to make the max latency so high? All of the cleaning and reorganization I've been talking about. It rarely makes a noticeable impact on performance (hence the ultra low average latencies), but this is an example of happening.

And this is where write amplification comes in.

In the diagram above we see another angle on what happens when a write comes in. A free block is used (when available) for the incoming write. That's not the only write that happens however, eventually you have to perform some garbage collection so you don't run out of free blocks. The block with the most invalid data is selected for cleaning; its data is copied to another block, after which the previous block is erased and added to the free block pool. In the diagram above you'll see the size of our write request on the left, but on the very right you'll see how much data was actually written when you take into account garbage collection. This inequality is called write amplification.


Intel claims very low write amplification on its drives, although over the lifespan of your drive a < 1.1 factor seems highly unlikely

The write amplification factor is the amount of data the SSD controller has to write in relation to the amount of data that the host controller wants to write. A write amplification factor of 1 is perfect, it means you wanted to write 1MB and the SSD’s controller wrote 1MB. A write amplification factor greater than 1 isn't desirable, but an unfortunate fact of life. The higher your write amplification, the quicker your drive will die and the lower its performance will be. Write amplification, bad.

Live Long and Prosper: The Logical Page Why SSDs Care About What You Write: Fragmentation & Write Combining
Comments Locked

295 Comments

View All Comments

  • kisjoink - Monday, August 31, 2009 - link

    "Intel doesn't need to touch the G1, the only thing faster than it is the G2."

    I've been an avid anandtech-reader since 1998, but this is the first time I'm commenting on an article. I just have to say how much I disagree with this quote!

    I've used the x25m since its launch (December 08) and at first I was very happy with the drive. Although really expensive, I got convinced to buy one after reading your first previews and review. And the performance was great, at least for the first 4-5 months. At that time I started noticing some 'hiccups' (system freeze). At first they were few and short. But over time they become more noticeable and now they're a real pain. Sometimes my system can freeze for more than 15 seconds. It usually happens when I edit a picture in photoshop, but it can also happen while writing something in Word, programming java in Eclipse or just surfing the web.

    The problem? I'm pretty sure its the Intel drive. After reading too many SSD-articles I immediately suspected the x25m when the I started noticing the hiccups. So I got used to running the "Windows Resource Monitor" in the background - studying the disk activity after every hiccup. Just take a look at this example (just started photoshop and did some light editing on a picture):
    http://img256.imageshack.us/img256/3073/hickup.jpg">http://img256.imageshack.us/img256/3073/hickup.jpg

    I'm sure there are many ways I could tune my system better. I've done a couple of things, like moved the internet temp folder to a mechanical drive etc. And the performance of the drive will probably recover if I do this special SSD-format - but it's a real pain to have to do complete OS installation 2-3 times a year when you claim it's possible for Intel to create a new firmware with TRIM-support. I mean - I really did pay premium price for this product (close to 800$ included VAT here in Norway for the 80GB version in December 08).

    So, to summarise - I got convinced to buy the drive after reading your articles (you write great reviews!) - and I understand that the problem that I (and others from what I've been reading on forums) is really difficult to recreate in a testing environment - but that doesn't mean that the problem doesn't exist. I just wish you could point this out. The expensive G1 has some really big performance issues that might force you to do a complete reinstall of your system a couple times a year - and although Intel could fix it they wont, because they have a new, better and cheaper product out - and people like me (altough we feel really screwed over by Intel) will buy their next device (as long as its the best device out there).
  • IntelUser2000 - Monday, August 31, 2009 - link

    Is that after you installed the firmware version 8820 or before?? That reduces the problem a lot unless you filled the drive to more than 70%.
  • kisjoink - Monday, August 31, 2009 - link

    Yes, I forgot to mention that - it's after I upgraded to the 8820 firmware. I don't think I've ever filled it up with more than 80%, usually I have about 30GB of free space
  • Anand Lal Shimpi - Monday, August 31, 2009 - link

    This is actually an interesting scenario that I've been investigating a bit myself. The 8820 firmware actually significantly changes the way the drive likes to store data compared to the original firmware. That's fine for a cleanly secure-erased drive, but what happens if you have data/fragmentation on the drive already?

    Every time you write to the drive the controller will look at the preferred state specified by the new firmware. It will see that your data is organized the way the old firmware liked it, but not the new firmware. Thus upon every...single...write it will try and reorganize the data until it gets to its happy state.

    I honestly have no idea how long this process will take, I can see it taking quite a bit of time but perhaps you could speed it up by writing a bunch of sequential files to fill up the drive? The safer bet would be to backup, secure erase and restore onto the drive. You shouldn't see it happen again.

    Think of it like this. I live in my house and I have everything organized a certain way. It takes me minimal time to find everything I need. Let's say tomorrow I leave my house and you move in. You look at how things are organized and it's quite different from how you like things setup. Whenever you go to grab a plate or book you try cleaning up a bit. Naturally it'll take a while before things get cleaned up and until then you won't be as quick as you're used to.

    Take care,
    Anand
  • jimhsu - Friday, September 11, 2009 - link

    The G2's I've discovered REALLY don't like to be filled up more than 80% or so. When I had 8GB free on the 80GB drive, seq write performance basically plummeted at random intervals (to levels like 30MB/s.) Random writes sometimes dropped down to 4MB/s. Now that I've freed 20GB and tried writing and deleting large ISO files to the drive, the performance is coming back slowly.
  • Dunk - Monday, August 31, 2009 - link

    Hi Anand,

    I'm blown away by your article series on SSD - absolutely fantastic.

    When new Intel firmware is launched with TRIM support for the G2, can I flash it without losing the drive and needing to reinstall everything?

    I'm happy using the out of the box MS driver for now in Win7, but would prefer to use Intel's TRIM version once available.

    Many thanks
    Duncan
  • Anand Lal Shimpi - Monday, August 31, 2009 - link

    If Intel follows the same pattern as what we saw with the G1's firmware update, you should be able to flash without destroying your data (although it's always a good idea to back up).

    Thank you for your comment :)

    Take care,
    Anand
  • mgrmgr - Monday, August 31, 2009 - link

    Okay, single X25-M G2s can be updated without losing data. But I am considering two 80GB drives in RAID-0 to overcome the sequential write slowdown with Photoshop. How will updating work for the RAID-0 pair?

    Do you have an opinion about using a RAID pair for Photoshop?
  • Noteleet - Monday, August 31, 2009 - link

    Fantastic article, I'm definitely planning on getting a SSD next time I upgrade.

    I'm fairly interested in seeing some reviews for the Solid 2. If OCZ can get the kinks worked out I think the Intel flash and the Indilinx controller would make a winning combination for price to performance.
  • Visual - Monday, August 31, 2009 - link

    Where does the drive store the mapping between logical and physical pages and other system data it needs to operate? Does it use the same memory where user data is stored? If so, doesn't it need to write-balance that map data as well? And if that's true, doesn't it need to have a map for the map written somewhere? How is that circular logic broken?

    Or does the drive have some small amount of higher-quality, more reliable, maybe single-level-cell based flash memory for its system data?

Log in

Don't have an account? Sign up now