The Cleaning Lady and Write Amplification

Imagine you’re running a cafeteria. This is the real world and your cafeteria has a finite number of plates, say 200 for the entire cafeteria. Your cafeteria is open for dinner and over the course of the night you may serve a total of 1000 people. Guests outnumber plates 5-to-1; thankfully, they don’t all eat at once.

You’ve got a dishwasher who cleans the dirty dishes as the tables are bussed and then puts them in a pile of clean dishes for the servers to use as new diners arrive.

Pretty basic, right? That’s how an SSD works.

Remember the rules: you can read from and write to pages, but you must erase entire blocks at a time. If a block is full of invalid pages (files that have been overwritten at the file system level for example), it must be erased before it can be written to.
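To make those rules concrete, here is a minimal Python sketch of a single NAND block. The class and method names (`NandBlock`, `write_page`, `invalidate_page`, `erase`) are hypothetical and exist only for illustration; a real controller tracks page state in firmware, not like this. The sketch shows one thing: a page can only be programmed after an erase, and an erase always covers the whole block.

```python
from enum import Enum

class PageState(Enum):
    FREE = "free"        # erased; ready to be programmed
    VALID = "valid"      # holds live data
    INVALID = "invalid"  # holds stale data, e.g. a file overwritten at the file system level

class NandBlock:
    """Hypothetical toy model: pages are written one at a time, erased only as a block."""

    def __init__(self, pages_per_block: int = 4) -> None:
        self.pages = [PageState.FREE] * pages_per_block

    def write_page(self, index: int) -> None:
        # A page can only be programmed if it has been erased since its last write.
        if self.pages[index] is not PageState.FREE:
            raise ValueError(f"page {index} must be erased before it can be written")
        self.pages[index] = PageState.VALID

    def invalidate_page(self, index: int) -> None:
        # Overwriting the data elsewhere leaves this copy stale, but does not erase it.
        if self.pages[index] is PageState.VALID:
            self.pages[index] = PageState.INVALID

    def erase(self) -> None:
        # Erasing is all or nothing: every page in the block goes back to FREE.
        self.pages = [PageState.FREE] * len(self.pages)

block = NandBlock()
block.write_page(0)
block.invalidate_page(0)     # page 0 is now stale...
try:
    block.write_page(0)      # ...but it still cannot be rewritten in place
except ValueError as err:
    print(err)               # page 0 must be erased before it can be written
block.erase()                # only a whole-block erase makes it writable again
block.write_page(0)
```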

All SSDs have a dishwasher of sorts, except instead of cleaning dishes, its job is to clean NAND blocks and prep them for use. The cleaning algorithms don’t really kick in when the drive is new, but put a few days, weeks or months of use on the drive and cleaning will become a regular part of its routine.

Remember this picture?

It (roughly) describes what happens when you go to write a page of data to a block that’s full of both valid and invalid pages.

In actuality the write happens more like this: a new block is allocated, valid data is copied to the new block (including the data you wish to write), and the old block is sent for cleaning and emerges completely wiped. The old block is added to the pool of empty blocks. As the controller needs them, blocks are pulled from this pool and used, and the old blocks are recycled back into it.
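That sequence can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: a block is a list of page slots, `None` marks an erased slot, a hypothetical `INVALID` marker flags stale pages, and the 4-page block size is arbitrary. The helper name `write_page_via_new_block` is made up for illustration, and the sketch ignores the mapping table a real controller must update when data moves.

```python
from collections import deque
from typing import Deque, List

PAGES_PER_BLOCK = 4   # assumption: tiny blocks keep the example readable
INVALID = object()    # hypothetical marker for a stale (invalid) page

def erased_block() -> List[object]:
    """A freshly erased block: every slot is None, i.e. ready to be written."""
    return [None] * PAGES_PER_BLOCK

def write_page_via_new_block(old: List[object], data: str,
                             free_pool: Deque[List[object]]) -> List[object]:
    """Hypothetical helper: write `data` when `old` holds both valid and invalid pages."""
    live = [p for p in old if p is not None and p is not INVALID]
    live.append(data)                  # valid data plus the data you wish to write
    if len(live) > PAGES_PER_BLOCK:
        raise ValueError("no invalid pages to reclaim in this block")
    new = free_pool.popleft()          # a new block is allocated from the empty pool
    new[:len(live)] = live             # valid data and new data are copied into it
    old[:] = erased_block()            # the old block is wiped in a single erase
    free_pool.append(old)              # and added back to the pool of empty blocks
    return new

pool = deque([erased_block()])
old = ["A", INVALID, "B", INVALID]
new = write_page_via_new_block(old, "C", pool)
print(new)   # ['A', 'B', 'C', None]
```

The erased block ends up in the same pool the new block was taken from, which is the recycling loop the paragraph describes.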

IBM's Zurich Research Laboratory actually made a wonderful diagram of how this works, but it's a bit more complicated than I need it to be for my example here today, so I've remade the diagram and simplified it a bit:

The diagram explains what I just outlined above. A write request comes in, a new block is allocated and used then added to the list of used blocks. The blocks with the least amount of valid data (or the most invalid data) are scheduled for garbage collection, cleaned and added to the free block pool.
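The selection step can be sketched as a greedy policy. In this hypothetical Python model a used block only tracks how many valid and invalid pages it holds (64-page blocks here, purely an assumption). Because every block is the same size and full, "least valid data" and "most invalid data" pick the same block. `clean_one_block` returns the number of valid pages that would have to be copied elsewhere before the erase; where those copies land is left out to keep the focus on victim selection. Real controllers may weigh other factors as well, so treat this as the simplest form of the idea rather than any vendor's algorithm.

```python
from dataclasses import dataclass
from typing import List

@dataclass(eq=False)   # eq=False: list.remove() matches this exact block, not equal counts
class UsedBlock:
    block_id: int
    valid_pages: int
    invalid_pages: int

def pick_victim(used: List[UsedBlock]) -> UsedBlock:
    """Greedy choice: the used block with the least valid (i.e. most invalid) data."""
    return min(used, key=lambda b: b.valid_pages)

def clean_one_block(used: List[UsedBlock], free_pool: List[UsedBlock]) -> int:
    """Clean the victim and add it to the free pool; return how many pages had to be copied."""
    victim = pick_victim(used)
    used.remove(victim)
    copied = victim.valid_pages   # valid data must be written somewhere else first
    victim.valid_pages = 0        # after the erase the block holds nothing at all
    victim.invalid_pages = 0
    free_pool.append(victim)
    return copied

# Hypothetical 64-page blocks: (id, valid pages, invalid pages)
used = [UsedBlock(0, 60, 4), UsedBlock(1, 10, 54), UsedBlock(2, 33, 31)]
free_pool: List[UsedBlock] = []
copied = clean_one_block(used, free_pool)
print(copied, [b.block_id for b in free_pool])   # 10 [1]
```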

We can actually see this in action if we look at write latencies:

Average write latencies for writing to an SSD, even with random data, are extremely low. But take a look at the max latencies:

While average latencies are very low, the max latencies are around 350x higher. They are still low compared to a mechanical hard disk, but what's going on to make the max latency so high? All of the cleaning and reorganization I've been talking about. It rarely makes a noticeable impact on performance (hence the ultra low average latencies), but this is an example of it happening.

And this is where write amplification comes in.

In the diagram above we see another angle on what happens when a write comes in. A free block is used (when available) for the incoming write. That's not the only write that happens, however; eventually you have to perform some garbage collection so you don't run out of free blocks. The block with the most invalid data is selected for cleaning; its remaining valid data is copied to another block, after which the previous block is erased and added to the free block pool. In the diagram above you'll see the size of our write request on the left, but on the very right you'll see how much data was actually written once garbage collection is taken into account. This inequality is called write amplification.
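The effect of garbage collection on both per-write cost and total writes can be reproduced with a small simulation. The Python sketch below is a toy page-mapped translation layer using the greedy "most invalid data" policy described above. Every parameter (16 blocks of 8 pages, 96 logical pages, uniformly random overwrites, cleaning whenever fewer than two free blocks remain) is an arbitrary assumption, and cost is counted in page programs rather than time. It is not how any particular controller works, but it shows the pattern: most host writes cost a single page program, while the occasional write that triggers cleaning also pays for copying valid pages.

```python
import random
from statistics import mean

def simulate_gc(num_blocks=16, pages_per_block=8, logical_pages=96,
                host_writes=5000, seed=1):
    """Toy page-mapped FTL with greedy GC (all sizes are illustrative assumptions).

    Returns (cost of each host write in page programs, write amplification factor).
    """
    if logical_pages >= (num_blocks - 2) * pages_per_block:
        raise ValueError("toy model needs spare blocks beyond the logical capacity")
    rng = random.Random(seed)
    valid = [set() for _ in range(num_blocks)]  # live logical pages in each block
    written = [0] * num_blocks                  # pages programmed since the last erase
    where = {}                                  # logical page -> block holding the live copy
    free = list(range(1, num_blocks))           # erased blocks; block 0 starts as active
    active = 0
    total_programs = 0

    def program(lpn):
        """Program one page into the active block and invalidate any older copy."""
        nonlocal active, total_programs
        if written[active] == pages_per_block:  # active block is full: take a free one
            active = free.pop()
        if lpn in where:
            valid[where[lpn]].discard(lpn)      # the old copy becomes an invalid page
        valid[active].add(lpn)
        written[active] += 1
        where[lpn] = active
        total_programs += 1

    def collect():
        """Relocate the full block with the fewest valid pages, then erase it."""
        candidates = [b for b in range(num_blocks) if b != active and b not in free]
        victim = min(candidates, key=lambda b: len(valid[b]))
        moved = list(valid[victim])
        for lpn in moved:
            program(lpn)                        # copying valid data = extra NAND writes
        written[victim] = 0                     # erase the whole block
        free.append(victim)                     # back into the free block pool
        return len(moved)

    costs = []
    for _ in range(host_writes):
        cost = 1                                # the host's own page
        while len(free) < 2:                    # keep a spare block for GC copies
            cost += collect()
        program(rng.randrange(logical_pages))
        costs.append(cost)
    return costs, total_programs / host_writes

costs, waf = simulate_gc()
print(f"mean {mean(costs):.2f} pages/write, max {max(costs)} pages/write, WAF {waf:.2f}")
```

Because each host write's cost is its own page plus whatever cleaning ran in front of it, the mean cost per write in this model equals the write amplification factor it returns. The gap between mean and max cost has the same shape as the average versus max latency gap discussed above, although the model says nothing about actual timings.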


Intel claims very low write amplification on its drives, although over the lifespan of your drive a < 1.1 factor seems highly unlikely.

The write amplification factor is the amount of data the SSD controller has to write in relation to the amount of data that the host controller wants to write. A write amplification factor of 1 is perfect: it means you wanted to write 1MB and the SSD’s controller wrote 1MB. A write amplification factor greater than 1 isn't desirable, but it's an unfortunate fact of life. The higher your write amplification, the quicker your drive will die and the lower its performance will be. Write amplification, bad.
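As a worked example, the factor is just a ratio. A common first-order way to see why it shortens drive life is to divide the flash's total write budget (capacity times rated program/erase cycles) by that factor. The function names and the 80GB / 10,000-cycle figures below are hypothetical illustrations, not specifications for any drive, and the endurance estimate ignores imperfect wear leveling.

```python
def write_amplification(nand_bytes: float, host_bytes: float) -> float:
    """Data the SSD controller wrote to flash divided by data the host asked to write."""
    return nand_bytes / host_bytes

def host_write_budget(capacity_bytes: float, pe_cycles: int, waf: float) -> float:
    """Rough host data writable before wear-out: raw flash write budget / WAF."""
    return capacity_bytes * pe_cycles / waf

MB = 1_000_000
print(write_amplification(1 * MB, 1 * MB))   # 1.0, the perfect case from the text
print(write_amplification(3 * MB, 1 * MB))   # 3.0, three bytes hit flash per host byte

capacity = 80 * 10**9   # hypothetical 80GB drive
cycles = 10_000         # hypothetical rated P/E cycles per block
for waf in (1.0, 1.5, 3.0):
    tb = host_write_budget(capacity, cycles, waf) / 10**12
    print(f"WAF {waf}: ~{tb:.0f} TB of host writes")
```

Tripling the factor cuts the host data the drive can absorb to a third, which is the "quicker your drive will die" part in numbers.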

295 Comments

  • CList - Tuesday, September 1, 2009

    Don't be disgusted at Newegg, be disgusted at the people who are willing to pay the premium price! Newegg is simply playing a reactionary role in the course of natural free-market economics and cannot be blamed. The consumers, on the other hand, are willing participants and are choosing to pay those prices. When no one is left who is willing to pay those prices, Newegg will quickly lower them.

    Cheers,
    CList
  • gfody - Tuesday, September 1, 2009

    I don't understand how consumers have any control over what Newegg is charging for the 160gb that's not even in stock yet.

    If Newegg wants to get the absolute most anyone is willing to pay for every piece of merchandise they may as well just move to an auction format.
  • DrLudvig - Tuesday, September 1, 2009

    Yeah, if you look at Intel's website, http://www.intel.com/cd/channel/reseller/asmo-na/e..., you will see that the R5 includes a 3.5" desktop drive bay to 2.5" SSD adapter bracket, screws, an installation guide, and warranty documentation.
    Why on earth Newegg is charging that much more for it I really don't know; here in Denmark the R5 retails for about 15 bucks more than the C1, which really isn't that bad.
  • Mr Perfect - Tuesday, September 1, 2009

    Whoa. That's it? An adapter kit? With that kind of price difference, I expected it to be the D0 stepping of SSDs or something.

    Thanks for clearing that up.
  • NA1NSXR - Monday, August 31, 2009

    The reason not being that performance or longevity is not good enough, but because improvements are still coming too quickly, and prices are still falling fast. Once the frequency of significant improvements and price drops slows down, I will more seriously consider an SSD. I suppose it depends on how much waiting on the I/O you do, though. For me, a Velociraptor is not so slow as to be intolerable.
  • bji - Tuesday, September 1, 2009

    Perhaps this is what you meant, but you should really clarify. It's still not time for YOU to buy an SSD. SSDs represent an incredible performance improvement that is well worth the money for many people.
  • DragonReborn - Monday, August 31, 2009

    say i wanted to go crazy (it happens)...should i get two 80gb intel g2's or the 160gb intel g2? same space...is the RAID 0 performance worth it?

    i have all my important data backed up on a big 2tb drive so the two SSDs (or one 160gb) will just hold my OS/progs/etc.

    thoughts?
  • kensiko - Monday, August 31, 2009

    I would say that in real world usage, you won't notice a huge difference between RAID and non-RAID; SSDs are already fast enough for the rest of the system. Also, TRIM may not work in RAID configurations for now.

    Just look at Windows Start up, no difference between Gen2 SSD!
  • Gc - Monday, August 31, 2009

    This is a nice article, but the numbers leave an open question.
    What is Samsung doing right? Multiprocess/multithread performance?

    The article finds Samsung drive performance is low on 2MB reads,

    (new 2MB sequential reads not given, assume same as 'used')
    used 2MB sequential reads (low rank, 79% of top)

    good on 2MB writes:

    new 2MB sequential writes (middle rank, 89% of top)
    used 2MB sequential writes (2nd place, 91% of top)

    and horrible on 4KB random files:

    (new 4KB random reads not given, assume same as 'used')
    used 4KB random read (bottom ssd ranked, only 36% of top)
    new 4KB random write (low rank, only 9% of top)
    used 4KB random write (bottom ssd ranked, only 3% of top, < HD)

    Yet somehow in the multitasking Productivity test and Gaming test, it was surprisingly competitive:

    multitasking productivity (mid-high rank, 88% of top)
    gaming (mid-high rank, 95% of top)

    The productivity test is described as "four tasks going on at once, searching through Windows contacts, searching through Windows Mail, browsing multiple webpages in IE7 and loading applications". In other words, nearly all READS (except maybe for occasionally writing to disk new items for the browser history or cache).

    The gaming test is described as "reading textures and loading level data", again nearly all READS.

    Q. Given that the Samsung controller's 2MB read performance and 4KB read performance are both at the bottom of the pack, how did it come out so high in the read-mostly productivity test and gaming test?

    Does this indicate the Samsung controllers might be better than Indilinx for multiprocess/multithreaded loads?

    (The Futuremark pdf indicates Productivity 2 is the only test with 4 simultaneous tasks, and doesn't say whether the browser tabs load concurrently. The Gaming 2 test is multithreaded with up to 16 threads. [The Samsung controller also ranks well on the communications test, but that may be explained: Communications 1 includes encryption and decompression tasks where Samsung's good sequential write performance might shine.])

    Since many notebooks/laptops are used primarily for multitasking productivity (students, "office"-work), maybe the Samsung was a reasonable choice for notebook/laptop OEMs. Also, in these uses the cpu and drive are idle much of the time, so the Samsung best rank on idle power looks good. (But inability to upgrade firmware is bad.)

    (The article doesn't explain what the load was in the load drive test, though it says the power drops by half if the test is switched to random writes; maybe it was sequential writes for peak power consumption. It would have been helpful to see the power consumption rankings for read-mostly loads.)

    Thanks!
  • rcocchiararo - Monday, August 31, 2009

    Your prices are way off, newegg is charging ludicrous amounts right now :(

    also, the 128 agility was 269 last week, i was super excited, then it went back to 329, and it's now 309.
