The Secret Sauce: 0.5x Write Amplification

The downfall of all NAND-flash-based SSDs is the dreaded read-modify-write scenario. I’ve explained this a few times before. Basically, your controller goes to write some amount of data, but because of all the reorganization that needs to be done, it ends up writing a lot more. The ratio of how much you actually write to how much you wanted to write is write amplification. Ideally this should be 1: you want to write 1GB and you actually write 1GB. In practice it can be as high as 10 or 20x on a really bad SSD. Intel claims that the X25-M’s dynamic nature keeps write amplification down to a manageable 1.1x. SandForce says its controllers write a little less than half what Intel does.

SandForce states that a full install of Windows 7 + Office 2007 results in 25GB of writes from the host, yet only 11GB of writes are passed on to the flash. In other words, 25GB of files are written and available on the SSD, but only 11GB of flash is actually occupied. Clearly it’s not bit-for-bit data storage.
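To put numbers on that claim, here is a minimal sketch of the write amplification arithmetic in Python. The function name and variables are made up for illustration; the only inputs are the figures quoted above (25GB from the host, 11GB to flash, and Intel's claimed 1.1x).

```python
def write_amplification(flash_bytes_written: float, host_bytes_written: float) -> float:
    """Ratio of data actually written to flash to data the host asked to write."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written

GB = 1000 ** 3

# SandForce's Windows 7 + Office 2007 example: 25GB from the host, 11GB to flash.
sandforce = write_amplification(11 * GB, 25 * GB)

# Intel's claimed figure for the X25-M, used here only for comparison.
intel_claim = 1.1
relative_to_intel = sandforce / intel_claim

print(f"SandForce example: {sandforce:.2f}x")
print(f"Relative to Intel's claim: {relative_to_intel:.2f}")
```

On these figures the install works out to roughly 0.44x, which is where the "0.5x" headline comes from.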

What SF appears to be doing is some form of real-time compression on data sent to the drive. SandForce told me that it’s not strictly compression but a combination of several techniques that are chosen on the fly depending on the workload.

SandForce referenced data deduplication as a type of data reduction algorithm that could be used. The principle behind data deduplication is simple. Instead of storing every single bit of data that comes through, store only the data that is unique and replace any additional duplicates with references to it. Now presumably your hard drive isn’t full of copies of the same file, so deduplication isn’t exactly what SandForce is doing, but it gives us a hint.
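The deduplication principle can be sketched in a few lines of Python. This is a generic illustration, not SandForce's method: the fixed 4KB chunk size, the SHA-256 fingerprint and the function names are all assumptions chosen for the example.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk only once."""
    store = {}  # fingerprint -> chunk bytes (the unique data actually kept)
    refs = []   # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # only the first copy is stored
        refs.append(digest)              # duplicates become references
    return store, refs

def rebuild(store, refs) -> bytes:
    """Reassemble the original data from the unique chunks and references."""
    return b"".join(store[d] for d in refs)

# Three identical chunks followed by one different chunk.
data = (b"A" * 4096) * 3 + (b"B" * 4096)
store, refs = dedup_store(data)
stored_bytes = sum(len(c) for c in store.values())

print(len(data), "bytes in,", stored_bytes, "bytes stored")
```

Four chunks come in but only two unique chunks are kept, and `rebuild` still returns the original bytes exactly.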

Straight up data compression is another possibility. The idea behind lossless compression is to use fewer bits to represent a larger set of bits. There’s additional processing required to recover the original data, but with a fast enough processor (or dedicated logic) that part can be negligible.
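As a quick illustration of what lossless means here, the sketch below round-trips a repetitive buffer through Python's standard `zlib` module. zlib is only a stand-in; SandForce has not said which algorithms it uses, and the sample data is invented for the example.

```python
import zlib

# Highly repetitive sample data, invented for this example.
original = b"SandForce DuraWrite " * 1000

compressed = zlib.compress(original)    # fewer bits representing the same data
restored = zlib.decompress(compressed)  # extra processing to get the data back

print(len(original), "bytes ->", len(compressed), "bytes")
print("identical after round trip:", restored == original)
```

The restored buffer is identical to the input, which is the property a storage controller cannot compromise on.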

Assuming this is how SandForce works, it means that there’s a ton of complexity in the controller and firmware, much more than what even a good SSD controller needs to deal with. Not only does SandForce have to manage bad blocks, block cleaning/recycling, LBA mapping and wear leveling, but it also needs to manage this tricky write optimization algorithm. It’s not a trivial matter: SandForce must ensure that the data remains intact while tossing away nearly half of it. After all, the primary goal of storage is to store data.

The whole write-less philosophy has tremendous implications for SSD performance. The less you write, the less you have to worry about garbage collection/cleaning and the less you have to worry about write amplification. This is how the SF controllers get by without any external DRAM; there’s just no need. There are fairly large buffers on chip though, most likely on the order of a couple of MB (more on this later).

Manufacturers are rarely honest enough to tell you the downsides to their technologies. Representing a collection of bits with fewer bits works well if you have highly compressible data or a ton of duplicates. Data that is already well compressed, however, shouldn’t work so nicely with the DuraWrite engine. That means compressed images, videos or file archives will most likely exhibit higher write amplification than SandForce’s claimed 0.5x. Presumably that’s not the majority of writes your SSD will see on a day to day basis, but it’s going to be some portion of it.
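The effect is easy to reproduce with a general-purpose compressor. The sketch below uses Python's `zlib` as a stand-in for whatever DuraWrite actually does, with random bytes standing in for already-compressed files; the function name and sample data are assumptions for illustration only.

```python
import os
import zlib

def effective_write_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower means fewer bytes to write)."""
    return len(zlib.compress(payload)) / len(payload)

# Repetitive, text-like data compresses very well.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Random bytes behave like already-compressed images, video or archives.
random_like = os.urandom(len(text_like))

print(f"text-like data:   {effective_write_ratio(text_like):.3f}")
print(f"random-like data: {effective_write_ratio(random_like):.3f}")
```

The text-like buffer shrinks to a small fraction of its size, while the random buffer does not shrink at all, so a drive relying on data reduction would see no write savings on that kind of workload.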


102 Comments


  • Wwhat - Wednesday, January 06, 2010 - link

    You make a good point, and Anand seems to deliberately deflect thinking about it; now you must wonder why.
    Anyway, don't be disheartened. Your point is good regardless of this support of 'magic' that Anand seems to prefer over an intellectual approach.
  • Shining Arcanine - Thursday, December 31, 2009 - link

    As far as I can tell from Anand's description of the technology, it seems that this is being done transparently to the operating system, so while the operating system thinks that 25GB have been written, the SSD knows that it only wrote 11GB. Think of it as having two balance sheets, one that other people see that has nice figures and the other that you see which has the real figures, sort of like what Enron did, except instead of showing the better figures to everyone else when the actual figures are worse, you show the worse figures to everyone else when the actual figures are better.
  • Anand Lal Shimpi - Thursday, December 31, 2009 - link

    Data compression, deduplication, etc... are all apparently picked and used on the fly. SandForce says it's not any one algorithm but a combination of optimizations.

    Take care,
    Anand
  • AbRASiON - Friday, January 01, 2010 - link

    What about data reliability? Recovering compressed data can normally be a bit of an issue. Any thoughts?
  • Jenoin - Thursday, December 31, 2009 - link

    Could you please post actual disk capacity used for the windows 7 and office install?
    The "size" vs "size on disk" of all the folders/files on the drive, (listed by windows in the properties context tab) would be interesting, to see what level of compression there is.

    Thanks
  • Anand Lal Shimpi - Thursday, December 31, 2009 - link

    Reported capacity does not change. You don't physically get more space with DuraWrite, you just avoid wasting flash erase cycles.

    The only way to see that 25GB of installs results in 11GB of writes is to query the controller or flash memory directly. To the end user, it looks like you just wrote 25GB of data to the drive.

    Take care,
    Anand
  • notty22 - Thursday, December 31, 2009 - link


    It would be nice for the customer if OCZ did not produce multiple models with varying degrees of quality, whether it's the controller or memory, or a combination thereof.
    Go to Newegg, glance at the OCZ 60GB SSDs, and you're greeted with this:

    OCZ Agility Series OCZSSD2-1AGT60G

    OCZ Core Series V2 OCZSSD2-2C60G

    OCZ Vertex Series OCZSSD2-1VTX60G

    OCZ Vertex OCZSSD2-1VTXA60G

    OCZ Vertex Turbo OCZSSD2-1VTXT60G

    OCZ Vertex EX OCZSSD2-1VTXEX60G

    OCZ Solid Series OCZSSD2-1SLD60G

    OCZ Summit OCZSSD2-1SUM60G

    OCZ Agility EX Series OCZSSD2-1AGTEX60G

    219.00 - 409.00
    Low to high the way I listed them.
    I can understand when some say they will wait until the manufacturers work out all the various bugs/negatives that must be inherent in all these model/name changes.
    Which model gets future technical upgrades/support?
  • jpiszcz - Thursday, December 31, 2009 - link

    I agree with you on that one.

    What we need is an SSD that beats the X25-E, so far, there is none.

    BTW, is anyone here running the X25-E on enterprise servers with > 100GB/day? If so, what kind of failure rates are seen?



  • Lonyo - Thursday, December 31, 2009 - link

    I like the idea.
    Given the current state of the market, their product is pretty suitable when it comes to end user patterns.
    SSDs are just too expensive for mass storage, so traditional large capacity mechanical drives make more sense for your film or TV or music collection (all of which are likely to be compressed), while all the non-compressed stuff goes on your SSD for fast access.

    It's got sound thinking behind it for a performance drive, although in the long run I'm not so sure the approach would always be particularly useful in a consumer oriented drive.
  • dagamer34 - Thursday, December 31, 2009 - link

    At least for now, consumer-oriented drives aren't where the money is. Until you get 160GB drives down to $100, most consumers will call SSDs too expensive for laptop use.

    The nice thing about desktops though is multiple slots. 80GB is all most people need to install an OS, a few programs, and games. Media should be stored on a separate platter-based drive anyway (or even a centralized server).
