The Secret Sauce: 0.5x Write Amplification

The downfall of all NAND flash based SSDs is the dreaded read-modify-write scenario. I’ve explained this a few times before. Basically your controller goes to write some amount of data, but because of a lot of reorganization that needs to be done it ends up writing a lot more data. The ratio of how much you write to how much you wanted to write is write amplification. Ideally this should be 1. You want to write 1GB and you actually write 1GB. In practice this can be as high as 10 or 20x on a really bad SSD. Intel claims that the X25-M’s dynamic nature keeps write amplification down to a manageable 1.1x. SandForce says its controllers write a little less than half what Intel does.

SandForce states that a full install of Windows 7 + Office 2007 results in 25GB of writes to the host, yet only 11GB of writes are passed on to the drive. In other words, 25GBs of files are written and available on the SSD, but only 11GB of flash is actually occupied. Clearly it’s not bit-for-bit data storage.
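To make the arithmetic concrete, here is a minimal sketch of the ratio using the figures quoted above. The function name `write_amplification` and the decimal `GB` constant are my own choices for illustration, not anything SandForce or Intel publish.

```python
def write_amplification(flash_bytes_written: float, host_bytes_written: float) -> float:
    """Ratio of data actually written to flash to data the host asked to write."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


GB = 10**9  # decimal gigabytes; the unit cancels out in the ratio anyway

# SandForce's quoted figures for a Windows 7 + Office 2007 install:
# 25GB written by the host, 11GB actually written to flash.
sandforce = write_amplification(11 * GB, 25 * GB)
print(f"SandForce example: {sandforce:.2f}x")  # 0.44x

# Relative to Intel's claimed 1.1x for the X25-M.
print(f"Relative to Intel's claim: {sandforce / 1.1:.2f}")
```

The 11GB/25GB example works out to 0.44x, which lines up with the "a little less than half what Intel does" claim.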

What SF appears to be doing is some form of real-time compression on data sent to the drive. SandForce told me that it’s not strictly compression but a combination of several techniques that are chosen on the fly depending on the workload.

SandForce referenced data deduplication as a type of data reduction algorithm that could be used. The principle behind data deduplication is simple. Instead of storing every single bit of data that comes through, store only the data that is unique and replace any additional duplicates with references to it. Now presumably your hard drive isn’t full of copies of the same file, so deduplication isn’t exactly what SandForce is doing, but it gives us a hint.
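As a rough illustration of the deduplication principle only (SandForce has not disclosed its actual technique), the sketch below stores each unique block once and keeps references for duplicates. The `DedupStore` class, the 4KB block size, and the use of SHA-256 to identify blocks are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy of each unique block (illustrative only)."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes, stored once
        self.files = {}   # name -> list of digests (references to blocks)

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).digest()
            # Only store the block if we have not seen identical content before.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
payload = b"A" * 4096 + b"B" * 4096
store.write("first", payload)
store.write("copy", payload)   # identical content: adds references, no new blocks
print(store.stored_bytes())    # 8192, although 16384 bytes were written
```

Both files read back intact, yet only half of what the host wrote ends up stored.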

Straight up data compression is another possibility. The idea behind lossless compression is to use fewer bits to represent a larger set of bits. There’s additional processing required to recover the original data, but with a fast enough processor (or dedicated logic) that part can be negligible.
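A quick way to see the idea is a round trip through a general-purpose lossless compressor. This uses Python's standard `zlib` module purely as a stand-in; nothing here implies SandForce uses DEFLATE or anything like it.

```python
import zlib

# Highly repetitive input, chosen so the effect is easy to see.
original = b"the quick brown fox jumps over the lazy dog\n" * 1000

compressed = zlib.compress(original)    # fewer bytes to store
restored = zlib.decompress(compressed)  # extra processing to get the data back

assert restored == original             # lossless: every bit is recovered
print(len(original), "bytes in,", len(compressed), "bytes stored")
```

The decompression step is the "additional processing" mentioned above; the stored form is smaller, but the original comes back bit for bit.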

Assuming this is how SandForce works, it means that there’s a ton of complexity in the controller and firmware, much more than what even a good SSD controller needs to deal with. Not only does SandForce have to manage bad blocks, block cleaning/recycling, LBA mapping and wear leveling, but it also needs to manage this tricky write optimization algorithm. It’s not a trivial matter: SandForce must ensure that the data remains intact while tossing away nearly half of it. After all, the primary goal of storage is to store data.

The whole write-less philosophy has tremendous implications for SSD performance. The less you write, the less you have to worry about garbage collection/cleaning and the less you have to worry about write amplification. This is how the SF controllers get by without any external DRAM: there’s just no need. There are fairly large buffers on chip though, most likely on the order of a couple of MB (more on this later).

Manufacturers are rarely honest enough to tell you the downsides to their technologies. Representing a collection of bits with fewer bits works well if you have highly compressible data or a ton of duplicates. Data that is already well compressed, however, shouldn’t work so nicely with the DuraWrite engine. That means compressed images, videos or file archives will most likely exhibit higher write amplification than SandForce’s claimed 0.5x. Presumably that’s not the majority of writes your SSD will see on a day-to-day basis, but it’s going to be some portion of it.
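The difference between compressible and already-compressed data is easy to demonstrate. In this sketch, random bytes from `os.urandom` stand in for compressed images, video or archives (an assumption: well-compressed data looks statistically close to random), and `zlib` again stands in for whatever DuraWrite really does. The helper `compression_ratio` is a name made up for the example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means fewer bytes to write."""
    return len(zlib.compress(data)) / len(data)


text_like = b"DuraWrite " * 10_000   # highly repetitive, very compressible
random_like = os.urandom(100_000)    # stand-in for already-compressed files

print(f"repetitive data: {compression_ratio(text_like):.3f}")
print(f"random data:     {compression_ratio(random_like):.3f}")
```

Under a pure-compression assumption, the ratio for the random input sits at roughly 1 (slightly above, because of framing overhead), which is why such writes would be expected to land well above the claimed 0.5x.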

102 Comments

  • Holly - Friday, January 01, 2010

    Well, you can patent an implementation and a technology, but not the idea itself. (At least that's what my boss was trying to explain to me.) So, if this idea seems worthy enough, other manufacturers will come up with their own MySuperStoringTechnology (c).

    Personally I think any improvement (even if it turns out to be a dead end) is worth it on a global scale, and this tech seems very interesting to me.

    I only have some worries about using cheaper NAND chips... Cheap USB flash drives tend to go nuts after about 6-12 months of usage (well, I am stressing them quite a bit...). Putting cheap NAND together with the best controller seems a bit unbalanced to me. Definitely not for servers/enthusiasts (who want the best quality for good reasons) and still too expensive for people earning their paychecks.
  • Holly - Friday, January 01, 2010

    P.S. Happy New Year
  • yacoub - Thursday, December 31, 2009

    I don't know that I want lower-quality Flash memory in my SSDs. I think I'd rather have both a better chip and high quality memory. But you know corners will be cut somewhere to keep the prices affordable.
  • frontliner - Thursday, December 31, 2009

    Page 10 talks about Random Write in MB/s and you're talking IOPS:

    "At 11K IOPS in my desktop 4KB random write test, the Vertex 2 Pro is 20% faster than Intel’s X25-M G2. Looking at it another way, the Vertex 2 Pro has 2.3x the 4KB random write performance of today’s OCZ Vertex Turbo."

    &

    "Random read performance is quite good at 13K IOPS, but a little lower than Intel’s X25-M G2."
  • Anand Lal Shimpi - Thursday, December 31, 2009

    Whoops! You're right, I decided to go with the MB/s graphs at the last minute but wrote the text based on the IOPS results. Fixed! :)

    Take care,
    Anand
  • Makaveli - Thursday, December 31, 2009

    My guess is Intel will release another firmware update to increase the write speed on the G2 drives, as Q4 2010 is quite a long wait for a refresh. New firmware with increased write speed and a price drop should still keep them in the driving seat.

    Kudos to OCZ for the constant shove in the back to Intel tho.
  • mikesown - Thursday, December 31, 2009

    Hi Anand,

    Great article! On the subject of Intel's monopoly bullying, I was curious if you had any information about Micron manufacturing their own C300 SSDs with (very nice, it seems) Marvell controllers (see http://www.micronblogs.com/category/ssd-concepts/). I know Micron and Intel manufactured NAND jointly through their IM Flash Technologies venture, so it seems a little bit strange that Micron would manufacture competing SSDs while in a partnership with Intel. Did Intel and Micron part ways for good?

    Thanks,
    Mike
  • efficientD - Thursday, December 31, 2009

    As an employee of Micron, I can say that Intel and Micron have not parted ways; the agreement only ever covered the actual flash memory and not all of the other parts of an SSD (controller, etc.). We are still very much in cooperation on what was agreed upon in the first place. You will notice that the flash in the OCZ drive in this article is from Micron, and not from IM Flash (the Intel/Micron joint venture). If you crack open an Intel drive, however, you will nearly exclusively find IM Flash chips along with Micron DRAM; the first gen didn't even have Micron DRAM. Hope this clarifies some things.
  • Doormat - Thursday, December 31, 2009

    I'm disappointed in the lack of SATA 6Gb/s support, but a lot of that is product timing (it's only now showing up in add-on chips, and in controllers in late 2010/early 2011). You really wonder what the speeds are on an unbridled SF-based drive.
  • Jenoin - Thursday, December 31, 2009

    "SandForce states that a full install of Windows 7 + Office 2007 results in 25GB of writes to the host, yet only 11GB of writes are passed on to the drive. In other words, 25GBs of files are written and available on the SSD, but only 11GB of flash is actually occupied. Clearly it’s not bit-for-bit data storage."
    "What SF appears to be doing is some form of real-time compression on data sent to the drive."
    Based on what they said (and what they didn't say) I have to disagree. It appears to me that they are comparing the write that is requested with the data already on the SSD and only writing the bits that need to change, hence a write amplification of ~0.5. This would perhaps explain the high number of IOPS during your compressed file write test. That test would then be a mixed test of sequential and random writes, giving you performance numbers in between the two other tests. Could you verify the actual disk usage with Windows 7 and Office installed? If it indicates 11GB used then it is using some kind of compression, but if it indicates the full size on the disk then it is using something similar to what I detailed. I just thought it interesting that SandForce never said things would take up less space (which would be a large selling point); they only said it would have to write about half as much, which supports my theory.
