The Unmentionables: NAND Mortality Rate

When Intel introduced its X25-M based on 50nm NAND technology we presented this slide:

A 50nm MLC NAND cell can be programmed/erased 10,000 times before it's dead. The reality is good MLC NAND will probably last longer than that, but 10,000 program/erase cycles was the spec. Update: Just to clarify, once you exceed the program/erase cycles you don't lose your data, you just stop being able to write to the NAND. On standard MLC NAND your data should be intact for a full year after you hit the maximum number of p/e cycles.

When we transitioned to 34nm, the NAND makers forgot to mention one key fact: MLC NAND no longer lasts 10,000 cycles at 34nm - the number is now down to 5,000 program/erase cycles. The smaller you make these NAND structures, the harder it is to maintain their integrity over thousands of program/erase cycles. While I haven't seen datasheets for the new 25nm IMFT NAND, I've heard the consumer SSD grade stuff is expected to last somewhere between 3,000 and 5,000 cycles. This sounds like a very big problem.

Thankfully, it's not.

My personal desktop sees about 7GB of writes per day. That can be pretty typical for a power user and a bit high for a mainstream user but it's nothing insane.

Here's some math I did not too long ago:

My SSD
NAND Flash Capacity: 256 GB
Formatted Capacity in the OS: 238.15 GB
Available Space After OS and Apps: 185.55 GB
Spare Area: 17.85 GB

If I never install another application and just go about my business, my drive has 203.4GB of space to spread out those 7GB of writes per day. That means, if my SSD wear levels perfectly, in roughly 29 days I will have written to every single available flash block on my drive. Tack on another 7 days if the drive is smart enough to move my static data around to wear level even more evenly. So we're at approximately 36 days before I exhaust one of my ~10,000 write cycles. Multiply that out and it would take 360,000 days of using my machine for all of my NAND to wear out; once again, assuming perfect wear leveling. That's 986 years. Your NAND flash cells will actually lose their charge well before that time comes, in about 10 years.
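That math is simple enough to sketch in a few lines. This is just a back-of-the-envelope script using the figures from the table above; the rounding matches the numbers I quoted:

```python
# SSD endurance estimate, assuming perfect wear leveling and no write
# amplification. Figures are from the capacity table above.

available_gb = 185.55        # free space after OS and apps
spare_gb = 17.85             # spare area
writes_per_day_gb = 7.0      # my observed host writes per day

pool_gb = available_gb + spare_gb                    # 203.4 GB to rotate through
days_per_cycle = round(pool_gb / writes_per_day_gb)  # ~29 days to touch every block
days_per_cycle += 7                                  # static-data rotation buys ~7 more

pe_cycles_by_process = {"50nm": 10_000, "34nm": 5_000, "25nm": 3_000}
for process, cycles in pe_cycles_by_process.items():
    total_days = days_per_cycle * cycles
    print(f"{process}: {total_days:,} days (~{total_days // 365} years)")
```

Running it reproduces the 360,000 / 180,000 / 108,000 day figures for 50nm, 34nm and 25nm NAND respectively.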

Now that calculation is based on 50nm 10,000 p/e cycle NAND. What about 34nm NAND with only 5,000 program/erase cycles? Cut the time in half - 180,000 days. If we're talking about 25nm with only 3,000 p/e cycles the number drops to 108,000 days.

Now this assumes perfect wear leveling and no write amplification. In practice, the best SSDs don't average more than 10x write amplification; in fact they're considerably lower. But even if you are writing 10x to the NAND what you're writing from the host, even the worst 25nm compute NAND will last you well through your drive's warranty.
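To see how a deliberately pessimistic 10x multiplier changes the picture, here's the same back-of-the-envelope math with write amplification folded in, reusing my 203.4GB pool and 7GB/day figures:

```python
# Worst-case sketch: 25nm NAND rated for only 3,000 p/e cycles, paired with
# a pessimistic 10x write amplification factor.

pool_gb = 203.4              # usable + spare flash from the table above
host_writes_gb_per_day = 7.0
pe_cycles = 3_000            # worst-case 25nm consumer NAND
write_amp = 10.0             # pessimistic; good controllers are far lower

nand_writes_gb_per_day = host_writes_gb_per_day * write_amp
days = pool_gb * pe_cycles / nand_writes_gb_per_day
print(f"~{days:,.0f} days, or about {days / 365:.0f} years")
```

Even under those assumptions the drive outlives any warranty by a couple of decades.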

For a desktop user running a desktop (non-server) workload, the chances of your drive dying within its warranty period due to you wearing out all of the NAND are basically nothing. Note that this doesn't mean that your drive won't die for other reasons before then (e.g. poor manufacturing, controller/firmware issues, etc...), but you don't really have to worry about your NAND wearing out.

This is all in theory, but what about in practice?

Thankfully, one of the unwritten policies at AnandTech is to actually use anything we recommend. If we're going to suggest you spend your money on something, we're going to use it ourselves. Not in testbeds, but in primary systems. Within the company we have 5 SandForce drives deployed in real, everyday systems. The longest-serving of them has been running, without TRIM, for the past eight months at between 90 and 100% of its capacity.

SandForce, like some other vendors, exposes a method of actually measuring write amplification and remaining p/e cycles on its drives. Unfortunately, the method of doing so on SandForce is undocumented and under strict NDA. I wish I could share how it's done, but all I'm allowed to share are the results.

Remember that write amplification is the ratio of NAND writes to host writes. On all non-SF architectures that number should be greater than 1 (e.g. you go to write 4KB but you end up writing 128KB). Due to SF's real time compression/dedupe engine, it's possible for SF drives to have write amplification below 1.
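As a quick sketch, the ratio is just NAND writes over host writes:

```python
def write_amplification(nand_writes_gb: float, host_writes_gb: float) -> float:
    """Ratio of what the controller writes to flash vs. what the host sent."""
    return nand_writes_gb / host_writes_gb

# Worst case for a naive controller: a 4KB host write turns into a
# 128KB read-modify-write of a whole block - 32x amplification.
assert write_amplification(128, 4) == 32.0

# With real-time compression/dedupe, the controller can write *less*
# to the flash than the host sent, pushing the ratio below 1.
assert write_amplification(1100, 1900) < 1.0
```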

So how did our drives fare?

The worst write amplification we saw was around 0.6x. Actually, most of the drives we've deployed in house came in at 0.6x. In this particular drive the user (who happened to be me) wrote 1900GB to the drive (roughly 7.7GB per day over 8 months) and the SF-1200 controller in turn threw away 800GB and only wrote 1100GB to the flash. This includes garbage collection and all of the internal management stuff the controller does.

Over this period of time I used only 10 cycles of flash (it was a 120GB drive) out of a minimum of 3000 available p/e cycles. In eight months I only used 1/300th of the lifespan of the drive.
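A rough sanity check of those numbers, assuming even wear across the drive's raw capacity:

```python
# Cycle consumption on my drive over eight months.
host_writes_gb = 1900     # total host writes
write_amp = 0.6           # measured on this SF-1200 drive
capacity_gb = 120         # raw drive capacity
min_pe_cycles = 3_000     # conservative endurance rating

nand_writes_gb = host_writes_gb * write_amp   # ~1140 GB hit the flash
                                              # (the controller reported ~1100)
cycles_used = nand_writes_gb / capacity_gb    # ~9.5 full-drive cycles
print(f"~{cycles_used:.1f} cycles used, "
      f"{cycles_used / min_pe_cycles:.2%} of rated endurance")
```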

The other drives we had deployed internally are even healthier. It turns out I'm a bit of a write hog.

Paired with a decent SSD controller, write lifespan is a non-issue. Note that I only fold Intel, Crucial/Micron/Marvell and SandForce into this category. Write amplification goes up by up to an order of magnitude with the cheaper controllers. Characterizing this is what I've been spending much of the past six months doing. I'm still not ready to present my findings but as long as you stick with one of these aforementioned controllers you'll be safe, at least as far as NAND wear is concerned.

 


144 Comments

  • bigboxes - Thursday, February 17, 2011 - link

    Anand, I know you mentioned read/write and having your data a year after your last write. Is the future of SSDs going to allow long-term storage on these devices? Will our data last longer than a year in storage or in use as read-only? I figured when cost went down and capacity went up that we'd start seeing SSDs truly replace HDDs as the medium of long-term storage. Any insights into the (near) future?
  • marraco - Thursday, February 17, 2011 - link

    We need a roundup of SATA 6Gb controllers on AMD and Intel.

    How do added cards perform against integrated SATA 6Gb?
  • jwilliams4200 - Thursday, February 17, 2011 - link

    Here are the numbers given in the AS-SSD incompressible write speed chart for
    SF-2500 (clean, dirty, after TRIM):

    229.5 MB/s 230.0 MB/s 198.2 MB/s

    Logically, I would expect the dirty number to be less than or equal to the after-TRIM number. Is there a typo here?
  • jwilliams4200 - Thursday, February 17, 2011 - link

    Anand:

    Could you run the data files for your 2011 storage bench (heavy and light cases) through a couple of standard compression programs and report the compressed and uncompressed file sizes? That would be useful information to know when evaluating the performance of Sandforce SSDs on your storage benchmark.
  • Chloiber - Thursday, February 17, 2011 - link

    Indeed, this would be an important piece of information.
  • mstone29 - Thursday, February 17, 2011 - link

    It's been out for a few weeks and the performance is on par w/ the OCZ V3.

    Does OCZ pay better?
  • Anand Lal Shimpi - Sunday, February 20, 2011 - link

    We're still waiting for our Corsair P3 sample, as soon as we get it you'll see a review :)

    Take care,
    Anand
  • gotFrosty - Thursday, February 17, 2011 - link

    I personally will never buy from OCZ ever again... The way that they are treating the customers (including me) with this shady marketing scandal. Never will I deal with them. Never. Who is to say that they will not pull this crap somewhere down the line again.
    They changed the way they manufactured the drives. OK, that's well and fine, but at least change the product number/name so that end users can distinguish between the products. Right now I'm sitting with a drive that they can't tell me whether it's the slower 25nm or the 34nm. What kind of crap is that? I can't tell either because my build is waiting on the P67's to get fixed. Oh, and to still market the drive as the same Vertex 2 that got all the great reviews.
    Lets just say I'm a little irritated with the whole scheme. I feel robbed.
  • Mr Perfect - Friday, February 18, 2011 - link

    Just stumbled across the whole Vertex 2 issue myself. Link to an explanation of what Frosty is mad about below:

    http://www.storagereview.com/ocz_issues_mea_culpa_...

    I'm not impressed with OCZ right now. Anand, any way you could talk to OCZ about this issue?
  • db808 - Thursday, February 17, 2011 - link

    Hi Anand,

    Thanks for another great SSD article. I own a OCZ Vertex 2 for my personal use, and I have been doing some testing of SSDs for work use.

    I have a few questions/comments that will probably stir up some additional discussion.

    1) You present a good description on your personal workload write volume at 7GB / day, and how that even with that heavy amount of activity, the SSD life expectancy is much greater than the warranty period.

    Did you ever try to correlate this with the life expectancy (or read and write activity) reported by the SSD using the SMART attributes?

    In my first 3 weeks using a new Vertex 2 SSD as my boot disk, I averaged over 18 GB/day of write activity ... much greater than your reported 7 GB/day.

    I can not say for other Sandforce implementations, but the OCZ Vertex 2 does report a wide variety of useful statistics via the vendor-specific SMART statistics. These statistics can be displayed using the OCZ Toolbox:
    http://www.ocztechnologyforum.com/forum/showthread...

    I don't know if other SSD vendors have similar information. Crystal Disk Info (http://crystalmark.info/software/CrystalDiskInfo/i... also displays and formats many of the vendor-specific fields, but I don't know if it specifically displays the extended info for specific SSDs.

    Using the OCZ Toolbox (which works with all OCZ Sandforce SSDs), you can display a lot of interesting information. Here are the statistics for the first 3 weeks of usage from my SSD. No real benchmarking, just the initial install of Windows 7 64-bit, and then installing all the apps that I run. My 120 GB SSD is about half full, including an 8 GB page file and an 8 GB hibernate file. I also relocated my Windows search index off the SSD. Temp IS on the SSD (my choice).

    SMART READ DATA
    Revision: 10
    Attributes List
    1: SSD Raw Read Error Rate Normalized Rate: 100 total ECC and RAISE errors
    5: SSD Retired Block Count Reserve blocks remaining: 100%
    9: SSD Power-On Hours Total hours power on: 351
    12: SSD Power Cycle Count Count of power on/off cycles: 84
    171: SSD Program Fail Count Total number of Flash program operation failures: 0
    172: SSD Erase Fail Count Total number of Flash erase operation failures: 0
    174: SSD Unexpected power loss count Total number of unexpected power loss: 19
    177: SSD Wear Range Delta Delta between most-worn and least-worn Flash blocks: 0
    181: SSD Program Fail Count Total number of Flash program operation failures: 0
    182: SSD Erase Fail Count Total number of Flash erase operation failures: 0
    187: SSD Reported Uncorrectable Errors Uncorrectable RAISE errors reported to the host for all data access: 0
    194: SSD Temperature Monitoring Current: 1 High: 129 Low: 127
    195: SSD ECC On-the-fly Count Normalized Rate: 100
    196: SSD Reallocation Event Count Total number of reallocated Flash blocks: 0
    231: SSD Life Left Approximate SDD life Remaining: 100%
    241: SSD Lifetime writes from host Number of bytes written to SSD: 384 GB
    242: SSD Lifetime reads from host Number of bytes read from SSD: 832 GB

    For my first 3 weeks, using the PC primarily after work and on weekends, I averaged 18.2 GB/day of write activity ... or 384 GB total.

    You may want to re-assess the classification of your 7 GB/day workload as "heavy". I don't think my 18.2 GB/day workload was extra heavy. My system has 8 GB of memory, and typically runs between 2-3 gb used, so I don't believe that there is a lot of activity to the page file. I have a hibernate file because I use a UPS, and it allows me to "resume" after a power blip vs. a full shutdown.

    Well ... back to the point .... The OCZ toolbox reports an estimated remaining life expectancy. I have not run my SSD long enough to register a 1% usage yet, but I will be looking at what volume of total write activity finally triggers the disk to report only 99% remaining life.

    I don't know if the OCZ Toolbox SMART reporting will work with non-OCZ Sandforce-based SSDs.

    If you can get a life expectancy value from your Sandforce SSDs, it would be interesting to see how it correlates with your synthetic estimates.

    Thanks again for a great article!
