The Secret Sauce: 0.5x Write Amplification

The downfall of all NAND flash-based SSDs is the dreaded read-modify-write scenario. I’ve explained this a few times before. Basically your controller goes to write some amount of data, but because of all the reorganization that needs to be done it ends up writing a lot more. The ratio of how much you actually write to how much you wanted to write is write amplification. Ideally this should be 1: you want to write 1GB and you actually write 1GB. In practice it can be as high as 10 or 20x on a really bad SSD. Intel claims that the X25-M’s dynamic nature keeps write amplification down to a manageable 1.1x. SandForce says its controllers write a little less than half what Intel’s do.

SandForce states that a full install of Windows 7 + Office 2007 results in 25GB of writes from the host, yet only 11GB of writes are passed on to the flash. In other words, 25GB of files are written and available on the SSD, but only 11GB of flash is actually occupied. Clearly it’s not bit-for-bit data storage.
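To put numbers on that claim, here is a minimal sketch of the arithmetic. The function name `write_amplification` is my own and purely illustrative; the 25GB, 11GB and 1.1x figures are the ones quoted above.

```python
def write_amplification(flash_writes_gb: float, host_writes_gb: float) -> float:
    """Ratio of data actually written to flash to data the host asked to write."""
    if host_writes_gb <= 0:
        raise ValueError("host writes must be positive")
    return flash_writes_gb / host_writes_gb

# SandForce's quoted figures for a Windows 7 + Office 2007 install.
sandforce_wa = write_amplification(flash_writes_gb=11, host_writes_gb=25)
print(f"SandForce install example: {sandforce_wa:.2f}x")

# Intel's claimed figure for the X25-M, for comparison.
intel_wa = 1.1
print(f"Relative to Intel's claim: {sandforce_wa / intel_wa:.2f}")
```

The install example works out to 0.44x, which is where the "a little less than half" comparison comes from.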

What SF appears to be doing is some form of real-time compression on data sent to the drive. SandForce told me that it’s not strictly compression but a combination of several techniques that are chosen on the fly depending on the workload.

SandForce referenced data deduplication as a type of data reduction algorithm that could be used. The principle behind data deduplication is simple. Instead of storing every single bit of data that comes through, store only the data that is unique, plus references to it in place of any duplicates. Now presumably your hard drive isn’t full of copies of the same file, so deduplication isn’t exactly what SandForce is doing - but it gives us a hint.
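To make the deduplication idea concrete, here is a minimal sketch of generic block-level dedup. This is not SandForce's algorithm, which is undisclosed; the 4KB chunk size, the use of SHA-256 and the function names are all assumptions chosen for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size, picked only for this example

def dedup_store(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns (store, refs): store maps digest -> chunk bytes, and refs is the
    ordered list of digests needed to rebuild the original data.
    """
    store = {}
    refs = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # only the first copy is stored
        refs.append(digest)              # every copy gets a reference
    return store, refs

def rebuild(store, refs) -> bytes:
    """Reassemble the original data from the stored chunks and references."""
    return b"".join(store[d] for d in refs)

# Ten logical chunks, but only two distinct patterns.
data = (b"A" * CHUNK_SIZE) * 7 + (b"B" * CHUNK_SIZE) * 3
store, refs = dedup_store(data)
stored_bytes = sum(len(chunk) for chunk in store.values())

assert rebuild(store, refs) == data
print(f"logical: {len(data)} bytes, stored: {stored_bytes} bytes")
```

In this contrived case ten chunks of data occupy two chunks of storage, plus the references. Real data is rarely this repetitive, which is why dedup alone probably doesn't explain SandForce's numbers.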

Straight up data compression is another possibility. The idea behind lossless compression is to use fewer bits to represent a larger set of bits. There’s additional processing required to recover the original data, but with a fast enough processor (or dedicated logic) that part can be negligible.
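As a rough illustration of lossless compression, the sketch below uses Python's standard `zlib` module (DEFLATE). That is an assumption made for the example only; SandForce hasn't said which algorithms its controller uses.

```python
import zlib

# Repetitive text stands in for highly compressible data.
original = b"The quick brown fox jumps over the lazy dog. " * 1000

compressed = zlib.compress(original, 6)  # 6 is zlib's default effort level
restored = zlib.decompress(compressed)   # the extra work needed to read data back

# Lossless: the round trip must return exactly the bytes we started with.
assert restored == original

ratio = len(compressed) / len(original)
print(f"{len(original)} bytes -> {len(compressed)} bytes ({ratio:.3f}x)")
```

The decompression step is the "additional processing" mentioned above; in a controller it would have to happen on every read, hence the need for a fast processor or dedicated logic.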

Assuming this is how SandForce works, it means that there’s a ton of complexity in the controller and firmware. Much more than what even a good SSD controller needs to deal with. Not only does SandForce have to manage bad blocks, block cleaning/recycling, LBA mapping and wear leveling, but it also needs to manage this tricky write optimization algorithm. It’s not a trivial matter: SandForce must ensure that the data remains intact while tossing away nearly half of it. After all, the primary goal of storage is to store data.

The whole write-less philosophy has tremendous implications for SSD performance. The less you write, the less you have to worry about garbage collection/cleaning and the less you have to worry about write amplification. This is how the SF controllers get by without having any external DRAM; there’s just no need. There are fairly large buffers on chip though, most likely on the order of a couple of MBs (more on this later).

Manufacturers are rarely honest enough to tell you the downsides to their technologies. Representing a collection of bits with a fewer number of bits works well if you have highly compressible data or a ton of duplicates. Data that is already well compressed however, shouldn’t work so nicely with the DuraWrite engine. That means compressed images, videos or file archives will most likely exhibit higher write amplification than SandForce’s claimed 0.5x. Presumably that’s not the majority of writes your SSD will see on a day to day basis, but it’s going to be some portion of it.
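The sketch below shows why already-compressed data is the weak spot. It assumes, purely for illustration, that flash writes scale with the `zlib`-compressed size of the payload; random bytes from `os.urandom` stand in for compressed images, videos or archives, and the helper name `flash_to_host_ratio` is hypothetical.

```python
import os
import zlib

def flash_to_host_ratio(payload: bytes) -> float:
    """Compressed size over original size, a stand-in for write amplification."""
    return len(zlib.compress(payload, 6)) / len(payload)

SIZE = 1024 * 1024  # 1 MiB per sample

# Repetitive text: stands in for documents, logs and many OS files.
line = b"Windows 7 + Office 2007 install log entry\n"
text_like = (line * (SIZE // len(line) + 1))[:SIZE]

# Random bytes: a stand-in for data that has already been compressed.
random_like = os.urandom(SIZE)

print(f"text-like data:   {flash_to_host_ratio(text_like):.3f}x")
print(f"random-like data: {flash_to_host_ratio(random_like):.3f}x")
```

The text-like sample shrinks to a small fraction of its size, while the random sample doesn't shrink at all and even picks up a little overhead. For that kind of data a compressing controller would be back at 1x or worse before any of the usual flash housekeeping is counted.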

102 Comments

  • vol7ron - Monday, January 04, 2010 - link

    I don't think Anand has ever tried to predict market price. He generally lets us in on lot prices, that is, what retailers buy the merchandise for in quantities of 1000. Generally, when he does release that information, he is close to dead on. He typically does not way in on numeric estimates of market prices, other than statements like "they should be cheaper than...[insert product here]... because material/manufacturing costs are lower." The link you gave looks less to be a prediction and more to be what the suggested retail price is; much like buying a car, although the suggested price is printed, it does not mean the actual market price will be equal to it.

    As for the G1/G2, as you recall, the G2 was very low on initial release (at least at Newegg) to the tune of ~$225. There have been several factors that have driven this price up (~$300). This is due to demand, but really it is a step demand. They are on Revision 5 of the G2, but the important thing is the fact that the G2 has been recalled twice. Where demand is generally steady in terms of price, abnormal release dates have pushed demand higher at different points (the graph looks more like a staircase, hence "step"). The price will again fall in the future.

    You should note that whenever things go "out of stock," the prices will go up, supply is low and demand is high, hence bargaining power from retailers, basic economics. Criticizing Anand does not accomplish anything as his facts were correct.

  • vol7ron - Monday, January 04, 2010 - link

    Grammar/Syntax edit:
    "...lets us in on lot prices; that is, what retailers..."
    "He typically does not [weigh] in"


    Further note:
    If you look at the Arrandale article, there is a price supply list. Those prices are for lots of 1Ku (1000 units), which reaffirms the point I made earlier, before I even looked at the Arrandale article.

    As for Newegg, it's a unique site whose prices are close to, but generally higher than, the 1,000-unit price. The ~$225 G2 price on initial release was probably a promotional price point, which often happens with new products.
  • viewwin - Monday, January 04, 2010 - link

    Market forces are driving the price higher than MSRP (Manufacturer's Suggested Retail Price). Intel tried to have lower prices, but market demand pushed it higher. Prices were far lower on Newegg.com when the G2 first came out, but shot up to $600 at one point for the 160 GB. I recall an article about it, but can't find it.
  • kunedog - Tuesday, January 05, 2010 - link

    OK, so he's "out of touch" with actual market prices, instead of made-up retail prices (MSRP).


    "I recall an article about it, but can't find it."

    That's OK, I saw the whole thing play out firsthand. After Anand posted these articles . . .
    http://www.anandtech.com/storage/showdoc.aspx?i=36...
    http://www.anandtech.com/storage/showdoc.aspx?i=36...
    http://www.anandtech.com/storage/showdoc.aspx?i=36...

    . . . stressing the expected performance and *affordability* of Intel X25-M G2 drives (I quote: "The performance improved, sometimes heartily, but the pricing was the real story."), they quickly disappeared from Newegg at the Anand-predicted price (with Newegg suggesting the G1s as an alternative, for which I call foul because many or most people wouldn't know the difference). They stayed out of stock for weeks. A month later, he posted this on the weekend:
    http://www.anandtech.com/storage/showdoc.aspx?i=36...

    The very next day (a Monday), G2s were suddenly in stock again at a huge markup, and the prices continued to climb for a few days. They've slowly fallen since that week, but never to the Anand-predicted price, and that fact has never been acknowledged in any of the subsequent reviews.

    The pattern repeated with the Kingston 40GB drives:
    http://www.anandtech.com/storage/showdoc.aspx?i=36...

    The pricing prediction ($85 w/ rebate, $115 without) for it was apparently so important that it had to be right there in the summary (so you don't even have to click the full article to see it). I checked Newegg every day for a couple weeks after it was posted (and somewhat less often since) but *never* saw it in stock for less than $130 (which is the current price). Further, that article was repeatedly updated and bumped for minor and predictable updates (like new bugs/firmware), but the pricing of the Kingston never updated (even though the rebate is expired).

    I would argue that market prices matter *more* than MSRP, and deserve Anand's attention. The high prices themselves aren't a problem; clearly people are willing to pay that much, therefore the drives are "worth it." It's Anand's complete obliviousness to them (after previously stressing their importance and total awesomeness) that comes across as strange.
  • chemist1 - Sunday, January 03, 2010 - link

    Hi Anand,

    When you wrote: "Current roadmaps put the next generation of Intel SSDs out in Q4 2010, although Intel tells me it will be a 'mid-year' refresh," didn't you mean "*there* [not 'it'] will be a mid-year refresh?" I.e., that the next generation is still not expected out until Q4, but that there will be a mid-year updating of the current generation? [By writing "it" will be a mid-year refresh, you communicate that Intel told you that the next gen will be released mid-year instead of Q4, which is not what I think you meant to say .... or is it?]
  • vol7ron - Sunday, January 03, 2010 - link

    Good question.

    To clarify what he's asking:
    Is it a mid-year refresh and a 2010Q4 release?
    -or-
    Is the mid-year refresh going to take place instead of the Q4 release (Q4 is pushed back).
  • vol7ron - Saturday, January 02, 2010 - link

    I thought GIGABYTE released a motherboard with SATA6 for AMD (GA-790FXTA-UD5). It might be nice to start testing it out and putting these SSDs to the test.

    Also, is it fair to take the enterprise level controller (SF-1500) and compare that to the consumer market product (X25-M)? Granted the SF-1500 has already stood well against the X25-E, but it's going to cost a heck of a lot more than the X25-M and the target market is the enterprise sector, anyhow.

    Regardless of what it compares to, I'm already saying that this controller is overpriced. They can justify it however they would like; that is, better performance, high research and development costs, market barriers to entry, etc. The truth, though, is that they're overcharging. The logic is mostly sound, but the price is not. OCZ should sign a contract to buy the controller for a year, sell what they can, and negotiate a lower price, or else drop the controller. I'd like to see what that does to SF's profits.

    I also would like to say that not using DRAM can have bad effects down the line. To get rid of it to justify a more expensive controller seems like an ignorant bargaining chip that SF is using to make more money. That's like saying, "I've upgraded your Ferrari with a newer, bigger engine, but it'll only take Regular [gas]." There's a high correlation between horsepower and Premium fuel; suffice to say the product might be faster, but it could be better.
  • vol7ron - Sunday, January 03, 2010 - link

    Given time to think about this:

    Maybe it is fair to compare the SF-1500 to the X25-M, since they're both MLCs. However, if the SF-1500 is still supposed to be the enterprise version, the two products are not price equivalent.

    Regardless, I do like to see the comparison. I just don't like to see the criticism when one is deemed an enterprise version and the other is still targeted for the home consumer/enthusiast.
  • Capt - Saturday, January 02, 2010 - link

    ...it would be nice to have a shootout between the test field (Vertex 2, X25-M, ...) and a pair of drives in a RAID0/Stripe configuration, especially comparing equal total sizes and different platforms (Intel/AMD chipset, hardware controllers). With the new about-to-be-released Intel RST drivers, SSD stripe performance got boosted quite a bit, and although I guess there won't be much of an improvement in the 4k area, reading/writing larger blocks and sequential access does improve by a massive amount. As a pair of 80GB X25-Ms costs only 10% more than a single 160GB drive, this scenario is very tempting...
  • vol7ron - Saturday, January 02, 2010 - link

    I also have been trying to get the reviewers to show more SSD RAID configurations. Not just because the price difference is semi-negligible, but because SSDs are supposed to be more error-free, and thus a more suitable technology for RAID. After all, isn't the exponential error potential the reason why RAID-0 was frowned upon?

    On the downside, I think there have been problems recently with the Intel Matrix Storage Manager, which might be one reason why the topic has been delayed. Regardless, it would be nice if this topic was re-addressed, if only to remind us readers that it is still in your thoughts :)
