The Secret Sauce: 0.5x Write Amplification

The downfall of all NAND flash-based SSDs is the dreaded read-modify-write scenario. I’ve explained this a few times before. Basically your controller goes to write some amount of data, but because of all the reorganization that needs to be done it ends up writing a lot more. The ratio of how much data actually gets written to flash to how much you wanted to write is write amplification. Ideally this should be 1: you want to write 1GB and you actually write 1GB. In practice it can be as high as 10 or 20x on a really bad SSD. Intel claims that the X25-M’s dynamic nature keeps write amplification down to a manageable 1.1x. SandForce says its controllers write a little less than half what Intel does.

SandForce states that a full install of Windows 7 + Office 2007 results in 25GB of writes from the host, yet only 11GB of writes are passed on to the flash. In other words, 25GB of files are written and available on the SSD, but only 11GB of flash is actually occupied. Clearly it’s not bit-for-bit data storage.
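
To put numbers on that claim, here is a small sketch of the arithmetic in Python. The function name is my own, and the 1.1x figure is simply Intel's claim quoted above, not something measured here.

```python
def write_amplification(flash_gb_written: float, host_gb_written: float) -> float:
    """Ratio of data physically written to flash to data the host asked to write."""
    if host_gb_written <= 0:
        raise ValueError("host_gb_written must be positive")
    return flash_gb_written / host_gb_written


# SandForce's quoted install figures: 25GB from the host, 11GB to flash.
sandforce = write_amplification(11, 25)

# Intel's claimed X25-M write amplification, taken from the text above.
intel_claimed = 1.1

print(f"SandForce install example: {sandforce:.2f}x")            # 0.44x
print(f"Relative to Intel's claim: {sandforce / intel_claimed:.2f}")  # 0.40
```

The install example works out to 0.44x, which lines up with both the 0.5x headline and the "a little less than half what Intel does" statement.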

What SF appears to be doing is some form of real-time compression on data sent to the drive. SandForce told me that it’s not strictly compression but a combination of several techniques that are chosen on the fly depending on the workload.

SandForce referenced data deduplication as a type of data reduction algorithm that could be used. The principle behind data deduplication is simple. Instead of storing every single bit of data that comes through, store only the data that is unique, and replace any additional duplicates with references to it. Now presumably your hard drive isn’t full of copies of the same file, so deduplication isn’t exactly what SandForce is doing, but it gives us a hint.
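
As an illustration of the principle only, here is a minimal block-level deduplication sketch in Python. The 4KB block size, the use of SHA-256 and the function names are all my assumptions; SandForce hasn't disclosed how (or whether) its controller does anything like this.

```python
import hashlib


def dedup_store(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each unique block once.

    Returns (store, refs): store maps a SHA-256 digest to the block contents,
    refs is the ordered list of digests needed to rebuild the original data.
    """
    store = {}
    refs = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # only the first copy is stored
        refs.append(digest)
    return store, refs


def rebuild(store, refs) -> bytes:
    """Reassemble the original data by following the references in order."""
    return b"".join(store[digest] for digest in refs)


data = b"A" * 4096 * 3 + b"B" * 4096  # three identical blocks plus one unique block
store, refs = dedup_store(data)
assert rebuild(store, refs) == data
print(len(refs), "blocks referenced,", len(store), "blocks stored")
# 4 blocks referenced, 2 blocks stored
```

Four blocks come in, two get stored, and the original data is still fully recoverable from the references.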

Straight up data compression is another possibility. The idea behind lossless compression is to use fewer bits to represent a larger set of bits. There’s additional processing required to recover the original data, but with a fast enough processor (or dedicated logic) that part can be negligible.
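
To make the idea concrete, here is a toy run-length encoder in Python. Run-length encoding is just about the simplest lossless scheme there is and I'm using it purely as an illustration; nothing suggests it is what SandForce implements, and the function names are mine.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; each run is capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original byte string."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, value = encoded[j], encoded[j + 1]
        out += bytes([value]) * count
    return bytes(out)


original = b"\x00" * 1000 + b"\xff" * 24
packed = rle_encode(original)
assert rle_decode(packed) == original  # lossless: the round trip is exact
print(len(original), "->", len(packed), "bytes")  # 1024 -> 10 bytes
```

The decode step is the "additional processing" mentioned above. The round-trip assertion is the important part: fewer bytes are stored, but nothing is lost.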

Assuming this is how SandForce works, it means that there’s a ton of complexity in the controller and firmware, much more than what even a good SSD controller needs to deal with. Not only does SandForce have to manage bad blocks, block cleaning/recycling, LBA mapping and wear leveling, but it also needs to manage this tricky write optimization algorithm. It’s not a trivial matter: SandForce must ensure that the data remains intact while tossing away nearly half of what it would otherwise write. After all, the primary goal of storage is to store data.

The whole write-less philosophy has tremendous implications for SSD performance. The less you write, the less you have to worry about garbage collection/cleaning and the less you have to worry about write amplification. This is how the SF controllers get by without having any external DRAM; there’s just no need. There are fairly large buffers on chip though, most likely on the order of a couple of MBs (more on this later).

Manufacturers are rarely honest enough to tell you the downsides to their technologies. Representing a collection of bits with fewer bits works well if you have highly compressible data or a ton of duplicates. Data that is already well compressed, however, shouldn’t work so nicely with the DuraWrite engine. That means compressed images, videos or file archives will most likely exhibit higher write amplification than SandForce’s claimed 0.5x. Presumably that’s not the majority of writes your SSD will see on a day to day basis, but it’s going to be some portion of it.
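
You can get a feel for the difference with a general purpose compressor. The sketch below uses Python's zlib as a stand-in (it is not DuraWrite, whose algorithms are undisclosed) and random bytes as a proxy for already compressed files, which look close to random to a compressor. The function name and test payloads are my own.

```python
import os
import zlib


def compressed_fraction(payload: bytes) -> float:
    """Compressed size divided by original size, using zlib at its default level."""
    return len(zlib.compress(payload)) / len(payload)


# Highly repetitive data, loosely standing in for compressible files.
text_like = b"The quick brown fox jumps over the lazy dog. " * 2000

# Random bytes, standing in for compressed images, video or archives.
random_like = os.urandom(len(text_like))

print(f"repetitive data: {compressed_fraction(text_like):.3f} of original size")
print(f"random data:     {compressed_fraction(random_like):.3f} of original size")
```

The repetitive payload shrinks to a small fraction of its size, while the random payload doesn't shrink at all and actually picks up a few bytes of overhead. A controller relying on data reduction would have nothing to gain on that second kind of write.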

102 Comments

  • fertilizer - Tuesday, January 05, 2010

    First of all, my compliments on a great article!
    It provided me with great insight!

    It seems to me that SSD manufacturers are spending a lot of time complying with the world of HDD-based operating systems.
    Wouldn't it be time to get OSes to treat an SSD differently than an HDD?
  • j718 - Tuesday, January 05, 2010

    The OCZ Vertex EX is an SLC drive, not MLC as shown in the charts.
  • j718 - Tuesday, January 05, 2010

    Whoops, sorry, it's just the AnandTech Storage Bench charts that have the EX mislabeled.
  • Donald99 - Monday, January 04, 2010

    Any thoughts on potential energy use in a mobile environment, compared to Intel MLC? Is it still more energy efficient than a traditional drive?
    Performance results seem uber.
  • cliffa3 - Monday, January 04, 2010

    Anand,

    Great article, will be an interesting technology to watch and see how mature it really is.

    Question on the timeline for the price drop: When you said 'we'll see 160GB down at $225', were you talking about the mid-year refresh or the end of year next-gen?
  • MadMan007 - Monday, January 04, 2010

    Is it just me or is it inaccurate to mix GB and GiB when calculating overprovisioning at the bottom of page 5? By my reckoning the overprovisioning should be 6.6% (64GB/60GB, 128GB/120GB), not double that from using 64GB/55.9GiB, etc.
  • vol7ron - Monday, January 04, 2010

    Anand, the right column of the table should be marked as GiB.

    The last paragraph should take that into consideration. Either the second column should first be converted into GiB, or if it already is (and hard to believe it is), then you could do direct division from there.

    The new table:
    Adv.(GB)  Tot.(GB)  Tot.(GiB)  User(GiB)
    50        64        59.6       46.6
    100       128       119.2      93.1
    200       256       238.4      186.3
    400       512       476.8      372.5

    The new percentages should be:
    (59.6-46.6) / 59.6 x 100 = 21.8% decrease
    (119.2-93.1) / 119.2 x 100 = 21.9% decrease
    (238.4-186.3) / 238.4 x 100 = 21.9% decrease
    (476.8-372.5) / 476.8 x 100 = 21.9% decrease


    And the second table:
    Adv.(GB)  Tot.(GB)  Tot.(GiB)  User(GiB)
    60        64        59.6       55.9
    120       128       119.2      111.8
    240       256       238.4      223.5
    480       512       476.8      447

    The new percentages should be:
    (59.6-55.9) / 59.6 x 100 = 6.21% decrease
    (119.2-111.8) / 119.2 x 100 = 6.21% decrease
    (238.4-223.5) / 238.4 x 100 = 6.25% decrease
    (476.8-447) / 476.8 x 100 = 6.25% decrease


    Note, I did not use significant figures, so all numbers are approximated, yet suitable - the theoretical value may be slightly different.


    vol7ron
  • Guspaz - Sunday, January 03, 2010

    Your pricing estimates for Intel's refreshes worry me, and I worry that you're out of touch with SSD pricing.

    Intel's G2 X25-M 160GB drive currently sells for $500-550, so claims that Intel will be selling 600GB drives at the same price point raise some eyebrows.
  • kunedog - Monday, January 04, 2010

    I couldn't help but roll my eyes a little when I saw that Anand was again making Intel SSD pricing predictions. Even the G1 X25-Ms skyrocketed above his predictions for the G2s:
    http://www.anandtech.com/storage/showdoc.aspx?i=36...

    And the G1s are still higher at Newegg (the G2s are still a LOT higher). Anand has never acknowledged the stratospheric X25-M G2 pricing and how dead wrong his predictions were. He's kept us updated on negative aspects like the firmware bugs, slow stock/availability of G2s, and lack of TRIM for G1s, but never pricing.