Weaknesses of TLC: One Degree Worse than MLC

In a perfect world, increasing the number of bits per cell sounds like a very easy way to increase capacities while keeping the prices down. So, why not put a thousand bits inside every cell? Unfortunately, there's a downside to storing more bits per cell.

Fundamentally, TLC shares the same problems as MLC when compared to SLC, but takes things one step further. Now that there are eight voltage levels to check, random reads will take more time: 100µs for TLC. That's four times longer than what it takes SLC to random read one bit, and twice as long as what it takes for MLC to complete the same task. Programming will also take longer, but unfortunately we don't have any figures for TLC yet.

               SLC             MLC             TLC
Bits per Cell  1               2               3
Random Read    25 µs           50 µs           100 µs
Erase          2 ms per block  2 ms per block  ?
Programming    250 µs          900 µs          ?
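
As a quick check on the numbers above, the short Python sketch below derives the number of voltage levels from the bits per cell and compares the random read times from the table. The function and dictionary names are made up for this illustration; only the latency figures come from the table.

```python
# Random read latencies in microseconds, taken from the table above.
RANDOM_READ_US = {"SLC": 25, "MLC": 50, "TLC": 100}
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def voltage_levels(bits_per_cell: int) -> int:
    """Number of distinct voltage levels needed to encode the given bits."""
    return 2 ** bits_per_cell


def read_slowdown(cell_type: str, baseline: str = "SLC") -> float:
    """How many times longer a random read takes compared with the baseline."""
    return RANDOM_READ_US[cell_type] / RANDOM_READ_US[baseline]


for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {voltage_levels(bits)} levels, "
          f"{read_slowdown(name):.0f}x the SLC random read time")
```

Each extra bit doubles the number of levels to tell apart, and in these figures the random read time doubles along with it.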

On top of the decrease in performance, TLC also has worse endurance than MLC and SLC. Precise P/E cycle figures are not yet known, but we are most likely looking at around 1,000 cycles. Hynix has a brief product sheet for its 48nm TLC flash, which is rated at 2,500 P/E cycles. In MLC flash, at least, the move to 3Xnm halved the P/E cycles (from 10,000 to 5,000), so the same step would put TLC at 1,250 cycles. 2Xnm brought MLC down further, to roughly 3,000 cycles, and with the same math we get 750 cycles for 2Xnm TLC. X-bit labs reported 1,000 cycles for TLC, which sounds fair. It's also good to keep in mind that endurance can vary depending on the manufacturer and the maturity of the process. For example, the first 25nm NAND chips were good for only ~1,000 cycles, whereas today's chips should last for over 3,000 cycles.

P/E Cycles  5Xnm     3Xnm     2Xnm
SLC         100,000  100,000  N/A
MLC         10,000   5,000    3,000
TLC         2,500    1,250    750
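
The TLC estimates in the table can be reproduced with a few lines of Python. This is only a sketch of the back-of-the-envelope math used above: it assumes TLC endurance shrinks between process nodes by the same ratio as MLC endurance, which is an extrapolation rather than a manufacturer figure. The names are hypothetical.

```python
# P/E cycle figures quoted in the text and table above.
MLC_CYCLES = {"5Xnm": 10_000, "3Xnm": 5_000, "2Xnm": 3_000}
TLC_5XNM_CYCLES = 2_500  # from the Hynix 48nm TLC product sheet


def estimate_tlc_cycles(node: str) -> int:
    """Scale the 5Xnm TLC figure by the MLC ratio between nodes.

    Assumption: TLC loses endurance at the same rate as MLC when the
    process shrinks. Integer math keeps the result exact.
    """
    return TLC_5XNM_CYCLES * MLC_CYCLES[node] // MLC_CYCLES["5Xnm"]


for node in MLC_CYCLES:
    print(f"{node}: ~{estimate_tlc_cycles(node)} P/E cycles (estimated)")
```

The 3Xnm and 2Xnm results are estimates only; the 5Xnm value simply returns the Hynix figure.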

But why does NAND with more bits degrade quicker? The reason lies in the physics of silicon. To understand this, we need to take a look at our beloved Mr. N-channel MOSFET again.

When you program a cell, you are placing a voltage on the control gate, while the source and drain regions are held at 0V. The voltage forms an electric field, which allows electrons to pass through the silicon oxide barrier from the N-channel to the floating gate. This process is called tunneling. The silicon oxide acts as an insulator and will not allow electrons to enter or escape the floating gate unless an electric field is formed. To erase a cell, you apply voltage on the silicon substrate (P-well in the picture) and keep the control gate voltage at zero. An electric field will be formed which allows the electrons to get back through the silicon oxide barrier. This is why NAND flash needs to be erased before it can be re-programmed: you need to get rid of the old electrons (i.e. old data) before you can apply new electrons (i.e. new data).

But what does this have to do with SLC, MLC and TLC? The actual MOSFET is exactly the same in all three cases, but take a look at the table below.

Type  State  Voltage
SLC   "0"    High Voltage
      "1"    Low Voltage
MLC   "00"   High Voltage
      "01"   Med-High Voltage
      "10"   Med-Low Voltage
      "11"   Low Voltage
TLC   "000"  Highest Voltage
      "001"  High Voltage
      "010"  Med-High Voltage
      "100"  High-Medium Voltage
      "011"  Low-Medium Voltage
      "101"  Med-Low Voltage
      "110"  Low Voltage
      "111"  Lowest Voltage


[Figures: SLC, MLC, TLC]

SLC only has two program states, "0" and "1". Hence either a high or low voltage is required. When the number of bits goes up, you need more voltage states. With MLC, there are four states, and with TLC there are eight. The problem is that the silicon oxide layer is only about 10nm thick and it's not immortal; it wears out every time it's used in the tunneling process. When the silicon oxide layer wears out, the atomic bonds break, and during the tunneling process some electrons may get trapped inside the silicon oxide. This builds up negative charge in the silicon oxide, which negates some of the control gate voltage.

At first, erasing becomes slower because higher voltages need to be applied (and for a longer time) before the right voltage is found. Higher voltage causes more stress on the oxide, wearing it out even more. Eventually, erasing will take so long that the block has to be retired to maintain performance. There is a side effect, though: programming becomes faster because there is already some voltage in the cell due to the electron trapping. However, the time gained this way is much smaller than the extra time spent on the additional voltage pulses needed to erase the cell. That's why the block has to be retired when the wear level reaches a certain point.

Here comes the difference between SLC, MLC and TLC. The fewer bits you have per cell, the more voltage room you have. In other words, SLC can tolerate more changes in the voltage states because it has only two states. In TLC, there are eight, so the margin for error is a lot smaller.

Let's assume that we have an SLC NAND that takes a voltage between 0V and 14V. To program the cell to "1", a voltage between 4V and 5V needs to be applied. Likewise, you need a voltage from 9V to 10V to program the cell to "0". In this scenario, there is 4V of "spare" voltage between the states. If we apply this example to MLC NAND, the spare voltage will be cut in half, to 2V (four 1V states leave 10V to share between five gaps). With TLC, that spare value is only 0.67V if we use the same idea of 1V per voltage state.
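
The spare-voltage figures above can be reproduced with a small Python sketch. It assumes one particular layout that happens to match the numbers in the example: every state occupies a 1V band, and the remaining voltage is split evenly into gaps below the lowest state, between neighbouring states, and above the highest state. That layout and the function name are assumptions made for this illustration, not a description of how real NAND sets its thresholds.

```python
def spare_voltage(bits_per_cell: int,
                  total_range_v: float = 14.0,
                  state_width_v: float = 1.0) -> float:
    """Spare voltage per gap for evenly spaced states (illustrative model)."""
    states = 2 ** bits_per_cell
    # One gap below the lowest state, one between each pair of
    # neighbouring states, and one above the highest state.
    gaps = states + 1
    return (total_range_v - states * state_width_v) / gaps


for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3)):
    print(f"{name}: {spare_voltage(bits):.2f} V spare between states")
```

With the default 0V to 14V range this gives 4V for SLC, 2V for MLC and roughly 0.67V for TLC, matching the example.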

However, when the oxide wears out and a higher voltage is needed, the programming voltages go up. To use the SLC example above, you would now need a voltage between 4V and 6V to program the cell to "1". That means a 1V loss in the spare voltage. And here comes the difference: since SLC has more spare voltage between the states, it can tolerate a larger voltage change before erasing becomes so slow that the block needs to be retired. This is why SLC has a substantially higher P/E cycle count; you can erase and reprogram the cell more times. Likewise, TLC tolerates the least change in voltage states, so it has the lowest number of P/E cycles.
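
To see how quickly the margin disappears, the sketch below applies the 1V widening from the SLC example to all three cell types. It reuses the same simplified layout as the example (1V states spread evenly over 0V to 14V) and assumes, purely for illustration, that wear widens every state band by the same amount; real cells will not behave this uniformly, and the function name is hypothetical.

```python
def spare_after_wear(bits_per_cell: int,
                     widening_v: float,
                     total_range_v: float = 14.0,
                     state_width_v: float = 1.0) -> float:
    """Spare voltage left per gap once each state band has widened.

    A negative result means neighbouring states would overlap in this
    simplified model.
    """
    states = 2 ** bits_per_cell
    fresh_spare = (total_range_v - states * state_width_v) / (states + 1)
    return fresh_spare - widening_v


for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3)):
    left = spare_after_wear(bits, widening_v=1.0)
    status = "still separable" if left > 0 else "states overlap"
    print(f"{name}: {left:+.2f} V spare after 1 V of widening ({status})")
```

Under these assumptions SLC still has 3V of room and MLC has 1V, while TLC has already run out of margin, which is the intuition behind its lower P/E cycle count.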


  • themossie - Friday, February 24, 2012 - link

    The information is straight from Micron, it's just an awkward way to explain the concept. If you want to keep the industry standard capacities in your explanation, perhaps show the math as capacity/(1, 2, 3) = transistors rather than transistors * (1, 2, 3) = capacity? If capacity is fixed, solving for number of transistors required seems more intuitive.

    Corsair, OCZ and Kingston all make 90 GB Sandforce 2281 SSDs. I don't know how many channels / what NAND die they use. Searching that information brought up this website first every time! Upon further consideration, I blame aNAND... :-)
  • Kristian Vättö - Saturday, February 25, 2012 - link

    90GB SSDs have 96GB of NAND in them (remember that SandForce drives have ~7% over-provisioning). Most 2.5" drives have sockets for 16 NAND devices so that's simply twelve 8GB packages.
  • Confusador - Friday, February 24, 2012 - link

    I read the comments thread looking for this answer, so thank you. I still don't see the logic behind it, as others have pointed out that storage capacities haven't been power-of-2 for decades. It could conceivably be firmware related, but given that overprovisioning makes (e.g.) 60 and 120 GB fairly common that seems unlikely.

    Anyway, just some questions to keep in mind as you're in contact with the manufacturers. Thanks again for the great article, as the coverage here continues to be second to none.
  • AnnihilatorX - Friday, February 24, 2012 - link

    It has been claimed that algorithms to minimize write amplification will follow Moore's Law


    That's not really possible due to information theory. You can only compress information to reduce write by so much (entropy theory). The improvement will be more like an exponential decay rather than an exponential growth (Moore's law)
  • Shadowmaster625 - Friday, February 24, 2012 - link

    I estimate somewhere around $80 billion has been invested in the NAND flash market, cumulatively. Despite this enormous capital investment, I am surprised prices are still so high. You'd think with this type of mass economy of scale, it wouldn't cost so much to produce 1TB of flash. I wonder how much energy it takes to produce 1TB of flash...
  • MrSpadge - Friday, February 24, 2012 - link

    There's so much unused space in 2.5" SSDs, let alone 3.5" drives for desktops. People wouldn't need to worry about TLC endurance, if the NAND was put into sockets and could easily be replaced. Or upgraded later on for higher capacities. And by the time you'd be doing this NAND prices will have fallen again. There'd need to be a standard for this, though...

    MrS
  • mark53916 - Friday, February 24, 2012 - link


    As late as 2010 SLCs typically had 10 year retention time when new, down to
    about 1 year as cells got reprogrammed and the end of life was
    indicated for the device. (The number of erase cycles was
    also higher than now, but had been decreasing for a few years prior
    also.)

    I don't know about new cell retention time when new for SLC's
    now, but MLCs either show no spec or the retention time spec for NEW
    cells is about 18 months.

    For the various reasons mentioned in the article and earlier comments,
    the effect of MLCs is that speed has been reduced and data retention time
    is reduced and the fraction of long error correction time has increased
    dramatically.

    MLCs are not suitable for long term backups (and spinning drives were never
    good for more than 5 years EXPECTED powered off life).

    MLCs just get 2 times as much storage for the same price 18 months earlier.

    In the meantime, due to supply issues (capacity being used for MLC instead
    of SLC), SLC typically costs 8 times as much per GB compared
    to MLC, rather than less than 2 times as much. This amounts
    to about a 3 year delay in SLCs reaching a given price level.

    (MLC also typically comes with implementation side effects
    [interleaved data layout, in particular] that means that data in
    unchanged pages as seen outside of the SSD is rewritten
    because data was changed at the interleaved logical location,
    not because the SSD software decided that the data was getting
    "weak" and needed to be refreshed.)
  • Hulk - Friday, February 24, 2012 - link

    Timely, informative, well written, and just the right amount of technical detail.
    Really nice job.
  • valnar - Friday, February 24, 2012 - link

    I'm not sure who the target audience of TLC is. Is there really a group of people out there that is willing to sacrifice reliability and data integrity for price or capacity? I certainly wouldn't.

    It's bad enough that modern hard drives in the 2TB range have longevity problems. I don't want my SSD to be in the same boat, especially since that SSD tends to be the boot drive on most PC's.
  • foolsgambit11 - Friday, February 24, 2012 - link

    I'm assuming TLC is a subclass of MLC, and not actually distinct as it's laid out in this article. Before TLC came along, all MLC belonged to (what I'll call) the DLC subclass, yeah?

    SLC = Single level cell
    MLC = Multi level cell
    a. DLC = Dual level cell
    b. TLC = Triple level cell
