The Intel SSD 710 (200GB) Review
by Anand Lal Shimpi on September 30, 2011 8:53 PM EST
Flash memory is non-volatile storage and in that sense it's similar to a hard drive. Once you write to a NAND flash cell it can store that data for an extended period of time without power.
You write to NAND through a quantum tunneling process. Apply a high enough voltage across a floating-gate transistor and some electrons will actually tunnel through an insulating oxide layer and remain on the floating gate, even when the voltage is removed. Repeated tunneling can weaken the bonds of the oxide, eventually allowing electrons to freely leave the floating gate. It's this weakening that's responsible for a lot of NAND endurance issues, although there are other elements at play.
NAND is programmed and read by seeing how each cell responds to various voltages. This chart shows the difference between MLC (multi-level-cell) and SLC (single-level-cell) NAND:
Both types of NAND are architecturally identical; the only difference is how many voltage levels you map to bits in each cell. MLC (2 bits per cell) has four voltage levels that correspond to values, while SLC has only two. Note that each value corresponds to a distribution of voltages: as long as the threshold voltage falls within that range, the corresponding value is programmed or read.
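To make the level-to-bits mapping concrete, here is a minimal sketch of a read: count how many read points the cell's threshold voltage sits above, and use that count to pick a value. The read-point voltages are hypothetical, and the bit patterns (SLC erased = "1"; MLC Gray-coded as 11, 10, 00, 01 from lowest to highest voltage) are an assumed, commonly used mapping, not Intel's documented one.

```python
from bisect import bisect_right

# Hypothetical read points in volts; real NAND values are vendor-specific.
SLC_READ_POINTS = [2.0]
MLC_READ_POINTS = [1.0, 2.5, 4.0]

# Assumed bit patterns per level, lowest threshold voltage first.
SLC_LEVELS = ["1", "0"]
MLC_LEVELS = ["11", "10", "00", "01"]  # Gray-coded: neighbors differ by one bit


def read_cell(vt: float, read_points: list[float], levels: list[str]) -> str:
    """Return the bit pattern stored at threshold voltage vt.

    The level index is the number of read points that vt lies at or above.
    """
    return levels[bisect_right(read_points, vt)]


print(read_cell(1.7, MLC_READ_POINTS, MLC_LEVELS))  # 10
print(read_cell(2.5, SLC_READ_POINTS, SLC_LEVELS))  # 0
```

The only difference between the SLC and MLC calls is the number of read points, which mirrors the point that the two are the same cell with a different voltage-to-bit mapping.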
The white space between each voltage distribution is the margin you have to work with. Those blue lines above are read points. As long as the voltage distributions don't cross the read points, data is accessed correctly. The bigger the margin between the distributions and the read points, the more write cycles you'll get out of your NAND. The smaller the margin, the easier the NAND is to produce, since it's easier to manufacture NAND that doesn't require such precise voltages to store and read data from each cell. Over time, physical effects can cause these voltage distributions to shift, which ultimately leads to cell failure.
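One way to quantify "margin" is the smallest gap between any distribution edge and the read point next to it. The sketch below models each level as a (low, high) voltage range; all voltages are made up purely to contrast tight and loose distributions against the same read points.

```python
# Each programmed level is modeled as a (low, high) threshold-voltage range.
# All voltages are hypothetical, chosen only to illustrate the idea.
Distribution = tuple[float, float]


def worst_case_margin(dists: list[Distribution], read_points: list[float]) -> float:
    """Smallest gap between any distribution edge and the read point beside it.

    read_points[i] separates dists[i] from dists[i + 1]. A negative result
    means a distribution already crosses a read point.
    """
    margins = []
    for i, rp in enumerate(read_points):
        margins.append(rp - dists[i][1])      # top of the lower level
        margins.append(dists[i + 1][0] - rp)  # bottom of the upper level
    return min(margins)


read_points = [1.0, 2.5, 4.0]
tight = [(0.0, 0.6), (1.5, 2.0), (3.0, 3.5), (4.5, 5.0)]
loose = [(0.0, 0.9), (1.2, 2.3), (2.7, 3.8), (4.2, 5.0)]
print(f"{worst_case_margin(tight, read_points):.2f} V")  # 0.40 V
print(f"{worst_case_margin(loose, read_points):.2f} V")  # 0.10 V
```

The tight distributions leave four times as much room for drift before a read point is crossed, which is the sense in which bigger margins buy more write cycles.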
As MLC NAND gets close to the end of its life, these margins start narrowing considerably. Continuously programming and erasing NAND cells weakens the oxide, eventually allowing electrons to become stuck in the oxide itself. This phenomenon alters the threshold voltage of the transistor, which in turn shifts bit placements:
There's now ambiguity between bit values. If this cell were allowed to remain active in an SSD, there's a chance that when you go to read a file on your drive you won't actually get the data you requested. A good SSD should mark these cells as bad at this point.
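The failure mode above can be sketched by adding a shift to a cell's programmed threshold voltage and reading it again. Modeling trapped charge as a fixed upward shift is a simplifying assumption for illustration (the real shift depends on cycling history and retention time), and the read points and bit mapping are the same hypothetical values used for a normal read.

```python
from bisect import bisect_right

# Hypothetical read points (volts) and an assumed Gray-coded level mapping.
READ_POINTS = [1.0, 2.5, 4.0]
LEVELS = ["11", "10", "00", "01"]


def read_cell(vt: float) -> str:
    """Map a threshold voltage to its bit pattern using fixed read points."""
    return LEVELS[bisect_right(READ_POINTS, vt)]


def read_after_drift(programmed_vt: float, trapped_shift: float) -> str:
    """Read a cell whose threshold voltage has drifted by trapped_shift volts.

    The simple additive shift is an illustrative assumption, not a device model.
    """
    return read_cell(programmed_vt + trapped_shift)


programmed = 2.3  # written as "10", 0.2 V below the 2.5 V read point
print(read_cell(programmed))              # 10
print(read_after_drift(programmed, 0.1))  # 10
print(read_after_drift(programmed, 0.4))  # 00
```

A small shift stays inside the margin and reads correctly; once the distribution crosses the read point, the same cell silently returns a different value.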
There's a JEDEC spec that defines what should happen to the NAND once its cells get to this point. For consumer applications, the NAND should remain in a read-only state that can guarantee data availability for 12 months at 30°C with the drive powered off. Manufacturers must take this into account when they test and qualify their NAND. If you're curious, JEDEC also offers guidelines on how to cycle test the NAND to verify that it's compliant.
By now we all know the numbers. At 50nm Intel's MLC NAND was rated for 10,000 program/erase cycles per cell. That number dropped to 5,000 at 34nm and remained at the same level with the move to 25nm. Across the industry, 3,000 to 5,000 p/e cycles is pretty common for 2x-nm, 2-bit-per-cell (2bpc) MLC NAND.
For desktop workloads, even the lower end of that range is totally fine. The SSD in your desktop or notebook is more likely to die because of some silly firmware bug or manufacturing issue than because you wore out the NAND. For servers with tons of random writes, even 5K p/e cycles isn't enough. To meet the needs of these applications, Intel outfitted the 710 with MLC-HET (High Endurance Technology), more commonly known as eMLC.
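A common back-of-the-envelope check is that the host can write roughly capacity × p/e cycles ÷ write amplification before the rated cycles run out. The sketch below applies that formula to two hypothetical workloads; the drive sizes, daily write volumes and write amplification factors are illustrative assumptions, not Intel figures, and the formula assumes perfect wear leveling.

```python
def years_of_life(capacity_gb: float, pe_cycles: int,
                  host_gb_per_day: float, write_amplification: float) -> float:
    """Rough years until the rated p/e cycles are used up.

    Assumes perfect wear leveling (every block is cycled evenly) and that each
    GB the host writes causes write_amplification GB of NAND writes.
    """
    nand_writes_gb = capacity_gb * pe_cycles
    host_writes_gb = nand_writes_gb / write_amplification
    return host_writes_gb / host_gb_per_day / 365


# Hypothetical workloads; none of these numbers come from Intel.
desktop = years_of_life(120, 3_000, host_gb_per_day=10, write_amplification=2)
server = years_of_life(200, 5_000, host_gb_per_day=1_000, write_amplification=4)
print(f"desktop: {desktop:.1f} years")  # desktop: 49.3 years
print(f"server:  {server:.2f} years")   # server:  0.68 years
```

Under these assumptions the desktop drive outlives any reasonable upgrade cycle at just 3,000 cycles, while the write-heavy server exhausts even 5,000-cycle NAND in under a year.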
Fundamentally, Intel's MLC-HET is just binned MLC NAND. SLC NAND gets away with having ultra-high p/e cycle counts by having only two voltage levels to worry about. As a result, the voltage distributions for those two levels can be very far apart and remain well defined over time. I suspect only the highest quality NAND was used as SLC to begin with, also contributing to its excellent endurance.
Intel takes a similar approach with MLC-HET. Placements are much stricter in MLC-HET. As I mentioned earlier, requiring narrow voltage ranges for each bit level reduces the number of NAND die that will qualify, but it builds in more margin as you cycle the NAND. If placements do shift, however, Intel's SSD 710 can actually shift its read points as long as the placements aren't overlapping.
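Intel doesn't describe how the SSD 710 picks new read points, so the following is only a hypothetical sketch of the idea: given the observed (low, high) range of each level, put each read point midway between neighboring distributions, and give up when two distributions overlap. The function name and the voltages are illustrative assumptions.

```python
from typing import Optional

Distribution = tuple[float, float]  # observed (low, high) threshold voltages


def recenter_read_points(dists: list[Distribution]) -> Optional[list[float]]:
    """Place each read point midway between neighboring distributions.

    Returns None when any two neighboring distributions overlap, because no
    read point can separate them.
    """
    points = []
    for (_, lower_high), (upper_low, _) in zip(dists, dists[1:]):
        if lower_high >= upper_low:
            return None
        points.append((lower_high + upper_low) / 2)
    return points


# Hypothetical drifted levels: each one now crosses a fixed read point at
# 1.0, 2.5 or 4.0 V, but none of them overlap each other yet.
drifted = [(0.3, 1.1), (1.8, 2.7), (3.2, 4.1), (4.6, 5.3)]
print([round(p, 2) for p in recenter_read_points(drifted)])  # [1.45, 2.95, 4.35]
print(recenter_read_points([(0.3, 1.1), (1.0, 2.0)]))        # None
```

This captures the caveat in the text: moving read points rescues shifted placements only while a gap still exists between them.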
Similar to frequency binning with CPUs, the highest-quality NAND, with the tightest voltage distributions, gets binned into MLC-HET while everything else is shipped as standard MLC. And just like with frequency binning, there's a good chance you'll get standard MLC that will last a lot longer than it's rated for. In fact, I've often heard from manufacturers that hitting up to 30K p/e cycles on standard MLC NAND isn't unrealistic. With MLC-HET, Intel also refreshes idle NAND cells more frequently and thoroughly to ensure data integrity over extended periods of use.
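Binning can be pictured as a simple cutoff applied to each die's test results. In this hypothetical sketch a die qualifies as MLC-HET if its widest level distribution stays under a chosen width; the 400 mV cutoff and the measured widths are invented for illustration and are not Intel's actual qualification criteria.

```python
# Hypothetical binning cutoff: the widest level distribution (in millivolts)
# a die may show and still be sold as MLC-HET. Not Intel's actual criteria.
HET_MAX_WIDTH_MV = 400.0


def bin_die(level_widths_mv: list[float]) -> str:
    """Bin a die by its widest programmed-level voltage distribution."""
    if max(level_widths_mv) <= HET_MAX_WIDTH_MV:
        return "MLC-HET"
    return "standard MLC"


# Hypothetical measured distribution widths for two dies.
dies = {
    "die_a": [350.0, 380.0, 360.0, 390.0],
    "die_b": [420.0, 380.0, 500.0, 410.0],
}
for name, widths in dies.items():
    print(name, bin_die(widths))
# die_a MLC-HET
# die_b standard MLC
```

A die that just misses the cutoff still ships as standard MLC, which is why some standard parts can outlast their rating.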
Intel performs one other optimization on MLC-HET. After you've exceeded all available p/e cycles on standard MLC, JEDEC requires that the NAND retain your data in a power-off state for a minimum of 12 months. For MLC-HET, the minimum is reduced to 3 months. In the consumer space, that time presumably lets you transfer your data to a new drive. In the enterprise world, a dying drive is useless and the data is likely mirrored elsewhere. Apparently this tradeoff also helps Intel guarantee more cycles during the drive's useful life.
At IDF Intel told us the MLC-HET in the SSD 710 would be good for around 30x the write cycles of standard (presumably 25nm) MLC. If we use 3,000 as a base for MLC, that works out to be 90K p/e cycles for Intel's 25nm MLC-HET.
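The estimate is a single multiplication, but the result depends on which base you pick. The sketch below repeats it for the 3,000-cycle base used above and, as an alternative assumption, for the 5,000-cycle rating Intel gives its own 25nm MLC. Keep in mind that Intel's "around 30x" is itself approximate.

```python
def het_cycles(base_cycles: int, multiplier: int = 30) -> int:
    """P/E cycles implied by Intel's "around 30x standard MLC" figure."""
    return base_cycles * multiplier


for base in (3_000, 5_000):
    print(f"{base:,} MLC cycles x 30 = {het_cycles(base):,} MLC-HET cycles")
# 3,000 MLC cycles x 30 = 90,000 MLC-HET cycles
# 5,000 MLC cycles x 30 = 150,000 MLC-HET cycles
```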