NAND Recap

Flash memory is non-volatile storage and in that sense it's similar to a hard drive. Once you write to a NAND flash cell it can store that data for an extended period of time without power.

You write to NAND through a quantum tunneling process. Apply a high enough voltage across a floating-gate transistor and some electrons will actually tunnel through an insulating oxide layer and remain on the floating gate, even when the voltage is removed. Repeated tunneling can weaken the bonds of the oxide, eventually allowing electrons to freely leave the floating gate. It's this weakening that's responsible for a lot of NAND endurance issues, although there are other elements at play.

NAND is programmed and read by seeing how each cell responds to various voltages. This chart shows the difference between MLC (multi-level-cell) and SLC (single-level-cell) NAND:

Both types of NAND are architecturally identical; the difference is how many voltage levels map to bit values in each cell. MLC (2-bit-per-cell) NAND has four voltage levels that correspond to values, while SLC has only two. Note that each value actually corresponds to a distribution of voltages; as long as the cell's threshold voltage falls within that range, the corresponding value is programmed or read.

The white space between each voltage distribution is the margin you have to work with. Those blue lines above are read points. As long as the voltage distributions don't cross the read points, data is accessed correctly. The bigger the margin between distributions, the more write cycles you'll get out of your NAND. The smaller the margin, the easier the NAND is to produce, since it doesn't require such precise voltages to store and read data from each cell. Over time, physical effects can cause these voltage distributions to shift, which ultimately leads to cell failure.
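The read process can be sketched as a simple comparison of a cell's threshold voltage against the read points. The voltages and the Gray-coded bit mapping below are illustrative placeholders, not Intel's actual values:

```python
# Illustrative sketch of reading a 2bpc MLC cell. A cell's stored value is
# determined by which side of each read point its threshold voltage falls on.
READ_POINTS = [1.0, 2.0, 3.0]      # volts; placed in the margins between distributions
LEVELS = ["11", "10", "00", "01"]  # Gray-coded values, lowest threshold voltage first

def read_mlc_cell(vt):
    """Return the 2-bit value for a cell with threshold voltage vt (volts)."""
    for i, read_point in enumerate(READ_POINTS):
        if vt < read_point:
            return LEVELS[i]
    return LEVELS[-1]  # above the highest read point
```

With only one read point and two levels, the same scheme describes SLC, which is why its margins can be so much wider.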

As MLC NAND gets close to the end of its life, these margins start narrowing considerably. Continuously programming and erasing NAND cells weakens the oxide, eventually allowing electrons to become stuck in the oxide itself. This phenomenon alters the threshold voltage of the transistor, which in turn shifts bit placements:


There's now ambiguity between bit values: if this cell were allowed to remain active in an SSD, there's a chance that when you go to read a file on your drive you won't actually get the data you requested. A good SSD should mark these cells bad at this point.

There's a JEDEC spec that defines what should happen to the NAND once its cells get to this point. For consumer applications, the NAND should remain in a read-only state that can guarantee data availability for 12 months at 30C with the drive powered off. Manufacturers must take this into account when they test and qualify their NAND. If you're curious, JEDEC also offers guidelines on how to cycle test the NAND to verify that it's compliant.

By now we all know the numbers. At 50nm Intel's MLC NAND was rated for 10,000 program/erase cycles per cell. That number dropped to 5,000 at 34nm and remained at the same level with the move to 25nm. Across the industry 3,000 - 5,000 p/e cycles for 2x-nm 2-bit-per-cell MLC (2bpc) NAND is pretty common.

For desktop workloads, even the lower end of that range is totally fine. The SSD in your desktop or notebook is more likely to die because of some silly firmware bug or manufacturing issue than you wearing out the NAND. For servers with tons of random writes, even 5K p/e cycles isn't enough. To meet the needs of these applications, Intel outfitted the 710 with MLC-HET (High Endurance Technology) more commonly known as eMLC.

Fundamentally, Intel's MLC-HET is just binned MLC NAND. SLC NAND gets away with having ultra high p/e cycle counts by only having two bit levels to worry about. The voltage distributions for those two levels can be very far apart and remain well defined over time as a result. I suspect only the highest quality NAND was used as SLC to begin with, also contributing to its excellent endurance.

Intel takes a similar approach with MLC-HET: bit placements are much stricter. Remember what I said earlier: narrower voltage ranges mapping to each bit value reduce the number of NAND die that will qualify, but they build in more margin as you cycle the NAND. If placements do shift, however, Intel's SSD 710 can actually shift its read points, as long as the distributions aren't overlapping.
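One simple way to picture that read-point shifting is to re-center each read point between the mean threshold voltages of its two neighboring level distributions. This is purely a hypothetical sketch with made-up numbers; Intel hasn't published its actual tracking algorithm or voltages:

```python
# Hypothetical read-point tracking: put each read point midway between the
# mean threshold voltages of adjacent level distributions. As cycling drifts
# the distributions, the read points follow them.
def recenter_read_points(level_means):
    """level_means: sorted mean threshold voltage (volts) of each programmed level."""
    return [(lo + hi) / 2 for lo, hi in zip(level_means, level_means[1:])]

fresh_points = recenter_read_points([0.5, 1.5, 2.5, 3.5])  # evenly spaced levels
worn_points = recenter_read_points([0.7, 1.6, 2.6, 3.4])   # drifted, compressed levels
```

Once two adjacent distributions overlap, no read-point placement can separate them, which is the point at which the cell has to be retired.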

Similar to frequency binning CPUs, the highest quality NAND with the tightest margins gets binned as MLC-HET while everything else ships as standard MLC. And just like with frequency binning, there's a good chance you'll get standard MLC that lasts a lot longer than its rating suggests. In fact, I've often heard from manufacturers that hitting up to 30K p/e cycles on standard MLC NAND isn't unrealistic. With MLC-HET Intel also refreshes idle NAND cells more frequently and thoroughly to ensure data integrity over periods of extended use.

Intel performs one other optimization on MLC-HET. After you've exceeded all available p/e cycles on standard MLC, JEDEC requires that the NAND retain your data in a powered-off state for a minimum of 12 months. For MLC-HET, the minimum drops to 3 months. In the consumer space you need that time, presumably, to transfer your data over; in the enterprise world a dying drive is useless and the data is likely mirrored elsewhere. Apparently this tradeoff also helps Intel guarantee more cycles during the drive's useful life.

At IDF Intel told us the MLC-HET in the SSD 710 would be good for around 30x the write cycles of standard (presumably 25nm) MLC. If we use 3,000 as a base for MLC, that works out to be 90K p/e cycles for Intel's 25nm MLC-HET.
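The math is simple enough to spell out. This sketch assumes a hypothetical 100GB drive and ignores write amplification and wear-leveling overhead, so treat the total as a rough upper bound rather than a spec:

```python
# The endurance arithmetic from the paragraph above, spelled out.
base_mlc_cycles = 3_000   # common rating for 2x-nm 2bpc MLC
het_multiplier = 30       # Intel's claimed improvement at IDF
het_cycles = base_mlc_cycles * het_multiplier  # p/e cycles for 25nm MLC-HET

# Hypothetical drive-level total, ignoring write amplification entirely.
capacity_gb = 100
raw_write_limit_tb = capacity_gb * het_cycles / 1_000
```

Even at a fraction of that raw total after real-world write amplification, the endurance headroom over standard MLC is substantial.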

Comments

  • Juri_SSD - Saturday, October 01, 2011 - link

    Anand, I have read your previous articles and they were all reasonably good. But this one misses one important thing, and as a result many of its comparisons aren't correct. When I saw the video, I just thought: what is wrong with you?

    First of all: how can you compare a 50nm flash SSD with a 25nm flash SSD and claim the cost savings come only from using cheaper MLC instead of SLC? That is so wrong! You could shrink the 50nm SLC to 34nm SLC and halve the price, then shrink again to 25nm for a further reduction, and end up at 1/4 the price of 50nm SLC NAND just by shrinking the cells.

    Secondly: how can you compare a 50nm flash SSD with a 25nm flash SSD and then say that you get more than 64 GB just because Intel wisely uses MLC? Hello? What about shrinking again? Your video is so wrong... 64 GB 50nm SLC -> shrink -> 128 GB 34nm SLC -> shrink -> 256 GB 25nm SLC!

    So what do we have? Intel could have made a 256 GB SLC drive just by shrinking. Instead of pointing this out, you told people what a "good" job Intel does by binning out good MLC NAND to compete against a very, very old SSD. The only winner from this "good" job is Intel itself. Enterprise customers are still waiting for a competitor who actually shrinks SLC NAND to 25nm.

    Then again: you compare GB per dollar. That's nice. And then you give a long speech about servers that really need all these p/e cycles. But if servers really need all these p/e cycles, why don't you compare p/e cycles per dollar? Perhaps because the new 710 does badly in that comparison, even against a really old SLC SSD like the Intel X25-E?

    You might reply: "All right, Juri, you're right, but there is no 34nm SLC flash." That's also untrue; 34nm SLC flash drives exist, so why don't you compare GB per dollar against those? Don't know what I mean? How about the Intel SSD 311? If you look at that 20 GB 34nm SLC NAND flash drive, you can see that a drive at the 710's price could easily be made with a simple shrink of SLC NAND, just as I said in my first point.

    I am really disappointed by your review.

    PS: If you think my English is bad, you can try reading it in German: http://hardware-infos.com/news.php?news=3946
  • lemonadesoda - Saturday, October 01, 2011 - link

    I disagree with the statement that the SSD market is a race to the bottom. I think this is a lazy catchphrase that demonstrates a company's unwillingness to innovate. It is like saying the CPU or GPU or TFT or mobile handset business is a race to the bottom. Clearly, this is not true!

    There is plenty of room for Intel to innovate, differentiate, and gain margin on consumer SSD.

    What SSD "technologies" would be interesting for the consumer? Encryption; Response-to-theft management; Wear leveling; SMART 2; Thunderbolt, etc. that would allow Intel to lead and to charge a premium on the consumer product.

    Intel owns the Light Peak/Thunderbolt technology. Intel should get Thunderbolt onto its PC chipsets and get a range of SSDs onto Thunderbolt. Why are we using (e)SATA as a slow intermediary protocol layer when Thunderbolt could do the job better? With Intel Thunderbolt on the Intel motherboard and a compatible Intel SSD, we would no longer find PCIe-based SSDs or RAID0 SATA interesting. Intel could claim the enthusiast (not just enterprise) market in one swoop. And the enthusiast segment drives consumer branding and perception.

    There's still a lot of room for Intel in the SSD market. Or perhaps the current team has run out of ideas and motivation?
  • Friendly0Fire - Saturday, October 01, 2011 - link

    Actually, no, there's a point you're missing. At the moment the biggest barrier to adoption with SSDs is... price. Specifically cost/GB. CPUs, GPUs and mobile handsets can be had for all price ranges, thus you see a good amount of spread between low and high end. CPUs and GPUs also have the advantage of being bundled in prefab computers, while mobiles get heavy price cuts through mobile plans.

    SSDs, however, are still restricted to a niche market, only seen as an optional component on high-end computers or bought directly as a separate piece. Sadly, most people still consider "performance" to be summarized by how many GHz and GBs your computer has. SSDs can improve performance tremendously, but good luck explaining what IOPS or bandwidth mean. Until prices are closer to that of magnetic drives, most people won't even be interested in learning about them.

    So yeah, for the time being SSDs are a race to the bottom in the consumer market. Performance is what I'd call good enough for 99.95% of computer users, even on last-generation 3Gbps drives. What matters now is price drops.
  • EddyKilowatt - Tuesday, October 04, 2011 - link

    I agree that price is the #1 barrier in the minds of potential adopters, but right after that comes reliability, and I think this looms equally large once people get used to the price and understand the performance benefit.

    Many are waiting for all the myriad 'issues' to get sorted out... until they do, it won't truly be a price-driven commodity market. And until they do, Intel can offer added value -- if they're careful about reliability themselves -- that justifies the price premium they'd like to charge.

    Perhaps SSDs aren't as architecture and innovation driven as CPUs, but there's way more to them than just bulk memory mass produced at sweatshop wages.
  • AnnonymousCoward - Saturday, October 01, 2011 - link

    Synthetic hard drive comparisons are not reality.
  • Luke212 - Sunday, October 02, 2011 - link

    Anand, businesses do not run SSDs as single drives or RAID 0. With failure rates of 1-2%, that's too disruptive to the business (unless the drives are read-only). Can you consider testing these drives in RAID 1, which is how they are used in real life?
  • Iketh - Monday, October 17, 2011 - link

    That would depend on the RAID controller's performance, not the drive.
  • ClagMaster - Sunday, October 02, 2011 - link

    "It wouldn't be untrue to say that Intel accomplished its mission."

    Means after the reader deciphers this ...

    "It would be true to say that Intel accomplished its mission."

    Do not do this. I have skinned engineers alive for making this kind of double-negative grammatical error in the reports I often have to shovel through. I hate teaching engineers English. Do make the change.

    Your comments about Intel's leadership in the consumer SSD and enterprise SSD development pretty much hit the nail on the head.

    Intel essentially created the consumer market for these SSDs. Not OCZ, Marvell or SandForce. They are the dogs eating the crumbs.

    Intel does some serious prototype testing before these products hit the shelves. Far more than its competitors.

    This is another well balanced, high quality SSD.
  • ClagMaster - Sunday, October 02, 2011 - link

    When I say well balanced, I mean this is not an SSD for the obsessive-compulsive speed freak with money to burn.

    This SSD is a good balance of cost, performance and reliability for the enterprise space. It's optimized for cost and reliability, which limits performance somewhat.

    Although slow compared to a Vertex 3, the SSD 710 would still provide fine performance as a boot drive for consumer PCs.
  • AnnonymousCoward - Sunday, October 02, 2011 - link

    "slow compared to a Vertex 3, the SSD710 would still provide fine performance"

    Wouldn't it be nice to have quantified results??? Like Windows boot time, time to launch programs, and time to open big files.

    Synthetic benchmarks are inaccurate and provide no relative information. And synthetic benchmarks have been shown to be inaccurate on Anand's own site!
    ____________
    http://tinyurl.com/yamfwmg

    In IOPS, RAID0 was 20-38% faster; then the loading *time* comparison had RAID0 giving equal and slightly worse performance! Anand concluded, "Bottom line: RAID-0 arrays will win you just about any benchmark, but they'll deliver virtually nothing more than that for real world desktop performance."
    ____________

    Anand sticks stubbornly to his flawed SSD performance test methods. If anyone is deciding between a Vertex 3 and an Intel drive, the single most important data would be the quantified time differences in performing different operations. You'll have to go to another website to find that out.
