The Unmentionables: NAND Mortality Rate

When Intel introduced its X25-M based on 50nm NAND technology we presented this slide:

A 50nm MLC NAND cell can be programmed/erased 10,000 times before it's dead. The reality is good MLC NAND will probably last longer than that, but 10,000 program/erase cycles was the spec. Update: Just to clarify, once you exceed the program/erase cycles you don't lose your data, you just stop being able to write to the NAND. On standard MLC NAND your data should be intact for a full year after you hit the maximum number of p/e cycles.

When we transitioned to 34nm, the NAND makers forgot to mention one key fact. MLC NAND no longer lasts 10,000 cycles at 34nm - the number is now down to 5,000 program/erase cycles. The smaller you make these NAND structures, the harder it is to maintain their integrity over thousands of program/erase cycles. While I haven't seen datasheets for the new 25nm IMFT NAND, I've heard the consumer SSD grade stuff is expected to last somewhere between 3,000 and 5,000 cycles. This sounds like a very big problem.

Thankfully, it's not.

My personal desktop sees about 7GB of writes per day. That can be pretty typical for a power user and a bit high for a mainstream user but it's nothing insane.

Here's some math I did not too long ago:

                                    My SSD
NAND Flash Capacity                 256 GB
Formatted Capacity in the OS        238.15 GB
Available Space After OS and Apps   185.55 GB
Spare Area                          17.85 GB

If I never install another application and just go about my business, my drive has 203.4GB of space (185.55GB available plus 17.85GB of spare area) to spread out those 7GB of writes per day. That means that in roughly 29 days, if my SSD wear levels perfectly, I will have written to every single available flash block on the drive. Tack on another 7 days if the drive is smart enough to move my static data around to wear level even more evenly. So we're at approximately 36 days before I exhaust one out of my ~10,000 write cycles. Multiply that out and it would take 360,000 days of using my machine for all of my NAND to wear out; once again, assuming perfect wear leveling. That's 986 years. Your NAND flash cells will actually lose their charge well before that time comes, in about 10 years.
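The arithmetic above can be reproduced with a short Python sketch. The function and variable names are illustrative only, and the 7-day allowance for rotating static data is taken directly from the estimate above rather than derived.

```python
def days_per_full_pass(writable_gb, host_writes_gb_per_day):
    """Days needed to write once to every writable block, assuming perfect wear leveling."""
    return writable_gb / host_writes_gb_per_day


available_gb = 185.55                     # free space after OS and apps
spare_gb = 17.85                          # spare area
writable_gb = available_gb + spare_gb     # about 203.4 GB
daily_writes_gb = 7.0                     # host writes per day

pass_days = round(days_per_full_pass(writable_gb, daily_writes_gb))  # roughly 29
static_rotation_days = 7                  # extra allowance quoted in the text (assumption)
days_per_cycle = pass_days + static_rotation_days

pe_cycles = 10_000                        # 50nm MLC spec
lifetime_days = days_per_cycle * pe_cycles
lifetime_years = lifetime_days / 365

print(f"{days_per_cycle} days per P/E cycle, "
      f"{lifetime_days:,} days (~{lifetime_years:.0f} years) to wear out")
```

Changing `daily_writes_gb` or `pe_cycles` shows how sensitive the estimate is to each input.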

Now that calculation is based on 50nm 10,000 p/e cycle NAND. What about 34nm NAND with only 5,000 program/erase cycles? Cut the time in half - 180,000 days. If we're talking about 25nm with only 3,000 p/e cycles the number drops to 108,000 days.

All of this assumes perfect wear leveling and no write amplification. The best SSDs don't average more than 10x write amplification; in fact, they come in considerably lower. But even if you are writing 10x as much to the NAND as the host is writing to the drive, even the worst 25nm compute NAND will last you well through your drive's warranty.
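To see how write amplification changes the picture, here is a minimal sketch that scales the 36-days-per-cycle figure from the estimate above. It assumes write amplification simply divides the lifetime (a simplification), and `lifetime_years` is a hypothetical helper, not anything a drive vendor provides.

```python
DAYS_PER_CYCLE = 36  # days of host writes per full P/E cycle at 1x write amplification


def lifetime_years(pe_cycles, write_amplification=1.0, days_per_cycle=DAYS_PER_CYCLE):
    """Years until all rated P/E cycles are used, assuming perfect wear leveling."""
    return pe_cycles * days_per_cycle / write_amplification / 365


for label, cycles in [("50nm", 10_000), ("34nm", 5_000), ("25nm, low end", 3_000)]:
    ideal = lifetime_years(cycles)
    worst = lifetime_years(cycles, write_amplification=10)
    print(f"{label}: ~{ideal:.0f} years at 1x, ~{worst:.1f} years at 10x")
```

Under these assumptions, even 3,000-cycle NAND at 10x write amplification works out to roughly 30 years of 7GB-per-day use.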

For a desktop user running a desktop (non-server) workload, the chances of your drive dying within its warranty period due to you wearing out all of the NAND are basically nothing. Note that this doesn't mean that your drive won't die for other reasons before then (e.g. poor manufacturing, controller/firmware issues, etc.), but you don't really have to worry about your NAND wearing out.

This is all in theory, but what about in practice?

Thankfully one of the unwritten policies at AnandTech is to actually use anything we recommend. If we're going to suggest you spend your money on something, we're going to use it ourselves. Not in testbeds, but in primary systems. Within the company we have 5 SandForce drives deployed in real, everyday systems. The longest-serving of these has been running, without TRIM, for the past eight months at between 90 and 100% of its capacity.

SandForce, like some other vendors, exposes a method of actually measuring write amplification and remaining p/e cycles on its drives. Unfortunately the method of doing so for SandForce is undocumented and under strict NDA. I wish I could share how it's done, but all I'm allowed to share are the results.

Remember that write amplification is the ratio of NAND writes to host writes. On all non-SF architectures that number should be greater than 1 (e.g. you go to write 4KB but you end up writing 128KB). Due to SF's real-time compression/dedupe engine, it's possible for SF drives to have write amplification below 1.

So how did our drives fare?

The worst write amplification we saw was around 0.6x. Actually, most of the drives we've deployed in-house came in at 0.6x. On this particular drive the user (who happened to be me) wrote 1900GB to the drive (roughly 7.7GB per day over 8 months) and the SF-1200 controller in turn threw away 800GB and only wrote 1100GB to the flash. This includes garbage collection and all of the internal management stuff the controller does.
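Since write amplification is just NAND writes divided by host writes, the figures above are easy to check. The sketch below uses only numbers quoted in the text; the function name is illustrative, and it says nothing about how SandForce actually reports these counters.

```python
def write_amplification(nand_writes, host_writes):
    """Ratio of data written to flash to data the host asked to write (same units)."""
    return nand_writes / host_writes


# Conventional controller example from the text: a 4KB host write becomes a 128KB NAND write.
conventional_wa = write_amplification(128, 4)

# The SandForce drive described above, in GB.
host_writes_gb = 1900
nand_writes_gb = 1100
sf_wa = write_amplification(nand_writes_gb, host_writes_gb)
saved_gb = host_writes_gb - nand_writes_gb   # data the controller never had to write

print(f"conventional example: {conventional_wa:.0f}x")
print(f"SF-1200 drive: {sf_wa:.2f}x, {saved_gb} GB never written to flash")
```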

Over this period of time I used only 10 cycles of flash (it was a 120GB drive) out of a minimum of 3000 available p/e cycles. In eight months I only used 1/300th of the lifespan of the drive.
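The cycle count follows from dividing NAND writes by drive capacity. The sketch below assumes the 120GB figure is the amount of flash being cycled and that wear is spread evenly; the projection at the end simply extrapolates the same write rate and is not a measured result.

```python
import math

nand_writes_gb = 1100      # written to flash over the period
drive_capacity_gb = 120    # capacity quoted in the text (assumed to be what gets cycled)
rated_pe_cycles = 3000     # minimum rated P/E cycles
months_elapsed = 8

cycles_used = nand_writes_gb / drive_capacity_gb      # a little over 9
cycles_used_rounded = math.ceil(cycles_used)          # the text rounds this up to 10
fraction_used = cycles_used_rounded / rated_pe_cycles

# Extrapolation: how long the rated cycles would last at the same write rate.
projected_years = months_elapsed / fraction_used / 12

print(f"{cycles_used:.1f} cycles used, about 1/{round(1 / fraction_used)} of rated life")
print(f"projected time to exhaust rated cycles: ~{projected_years:.0f} years")
```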

The other drives we had deployed internally are even healthier. It turns out I'm a bit of a write hog.

Paired with a decent SSD controller, write lifespan is a non-issue. Note that I only fold Intel, Crucial/Micron/Marvell and SandForce into this category. Write amplification goes up by up to an order of magnitude with the cheaper controllers. Characterizing this is what I've been spending much of the past six months doing. I'm still not ready to present my findings but as long as you stick with one of these aforementioned controllers you'll be safe, at least as far as NAND wear is concerned.

 


144 Comments


  • sheh - Thursday, February 17, 2011 - link

    Why's data retention down from 10 years to 1 year as the rewrite limit is approached?
    Does this mean after half the rewrites the retention is down to 5 years?
    What happens after that year, random errors?
    Is there drive logic (or standard software) to "refresh" a drive?
  • AnnihilatorX - Saturday, February 19, 2011 - link

    Think about how a flash cell works. There is a thick silicon dioxide barrier separating the floating gate from the transistor. The reason they have a limited number of write cycles is that the silicon dioxide layer is eroded by the high voltages required to pump electrons into the floating gate.

    As the SiO2 is damaged, it becomes easier for the electrons in the floating gate to leak. Eventually, when sufficient charge has leaked, the data is lost (flipped from 1 to 0).
  • bam-bam - Thursday, February 17, 2011 - link

    Thanks for the great preview! Can’t wait to get a couple of these new SSDs soon.

    I’ll add them to an even more anxiously-awaited high-end SATA-III RAID Controller (Adaptec 6805) which is due out in March 2011. I’ll run them in RAID-0 and then see how they compare to my current set up:

    Two (2) Corsair P256 SSDs attached to an Adaptec 5805 controller in RAID-0 with the most current Windows 7 64-bit drivers. I’m still getting great numbers with these drives, almost a year into heavy, daily use. The proof is in the pudding:

    http://img24.imageshack.us/img24/6361/2172011atto....

    (1500+ MB/s read speeds ain’t too bad for SATA-II based SSD’s, right?)

    With my never-ending and completely insatiable need-for-speed, I can’t wait to see what these new SATA-III drives with the new Sand-Force controller and a (good-quality) RAID card will achieve!
  • Quindor - Friday, February 18, 2011 - link

    Eeehrmm.....

    Please re-evaluate what you have written above and how to perform benchmarks.

    I too own an Adaptec 5805 and it has 512MB of cache memory. So, if you run ATTO with a size of 256MB, the test fits inside the memory cache. You should see performance of around 1600MB/sec from the memory cache; this is in no way related to what your storage subsystem can or cannot do. A single disk connected to it but just using cache will give you exactly the same values.

    Please rerun your tests set to 2GB and you will get real-world results of what the storage behind the card can do.

    Actually, I'm a bit surprised that your writes don't get the same values. Maybe you don't have your write cache set to write-back mode? This will improve performance even more, but consider using a UPS or a battery backup cache module before doing so. The same goes for allowing disk cache or not. Not sure if these settings will affect your SSDs though.

    Please check whether your results are even possible before believing them. Each port can do around 300MB/sec, so 2x300MB/sec =/= 1500MB/sec. That should have been your first clue. ;)
  • mscommerce - Thursday, February 17, 2011 - link

    Super comprehensible and easy to digest. I think it's one of your best, Anand. Well done!
  • semo - Friday, February 18, 2011 - link

    "if you don't have a good 6Gbps interface (think Intel 6-series or AMD 8-series) then you probably should wait and upgrade your motherboard first"

    "Whenever you Sandy Bridge owners get replacement motherboards, this may be the SSD you'll want to pair with them"

    So I gather AMD haven't been able to fix their SATA III performance issues. Was it ever discovered what the problem is?
  • HangFire - Friday, February 18, 2011 - link

    The wording is confusing, but I took that to mean you're OK with Intel 6 or AMD 8.

    Unfortunately, we may never know, as Anand rarely reads past page 4 or 5 of the comments.

    I am getting expected performance from my C300 + 890GX.
  • HangFire - Friday, February 18, 2011 - link

    OK here's the conclusion from 3/25/2010 SSD/Sata III article:

    "We have to give AMD credit here. Its platform group has clearly done the right thing. By switching to PCIe 2.0 completely and enabling 6Gbps SATA today, its platforms won’t be a bottleneck for any early adopters of fast SSDs. For Intel these issues don't go away until 2011 with the 6-series chipsets (Cougar Point) which will at least enable 6Gbps SATA. "

    So, I think he is associating "good 6Gbps interface" with 6 & 8 series, not "don't have" with 6 & 8.
  • semo - Friday, February 18, 2011 - link

    Ok I think I get it thanks HangFire. I remember that there was an article on Anandtech that tested SSDs on AMD's chipsets and the results weren't as good as Intel's. I've been waiting ever since for a follow up article but AMD stuff doesn't get much attention these days.
  • BanditWorks - Friday, February 18, 2011 - link

    So if MLC NAND mortality rate ("endurance") dropped from 10,000 cycles down to 5,000 with the transition to 34nm manufacturing tech., does that mean that the SLC NAND mortality rate of 100,000 cycles went down to ~ 50,000?

    Sorry if this seems like a stupid question. *_*
