The NAND: Going Vertical, Not 3D (Yet)

Since we are dealing with a fixed number of NAND packages (unless you want to follow Mushkin's lead and build a dual-PCB design), there are two ways to increase capacity. One is to increase the capacity per die, which is what has been done in the past. This is the logical route as lithographies shrink (3Xnm was 32Gb, 2Xnm 64Gb, and now 1X/2Ynm 128Gb) because you can double the capacity per die while keeping the die area roughly the same as on the previous process node. However, increasing the capacity per die without a die shrink is inefficient: the end result is a large die, which is generally bad for yields, and that in turn is bad for financials. Since the industry is still in the middle of the transition to 128Gb dies (Samsung and IMFT have already moved; Toshiba/SanDisk and SK Hynix are still on 64Gb), moving to 256Gb dies this quickly was out of the question. I do not expect 256Gb dies until 2015/2016, and it may well be that manufacturers never go past 128Gb in planar NAND. We will see a big push for 3D NAND over the next few years, and I am not sure planar NAND will ever reach the point where a 256Gb die becomes beneficial.

Samsung SSD 840 EVO mSATA NAND Configurations

  Capacity               120GB    250GB    500GB    1TB
  Raw NAND Capacity      128GiB   256GiB   512GiB   1024GiB
  # of NAND Packages     2        2        4        4
  # of Dies per Package  4        8        8        16
  Capacity per Die       16GiB    16GiB    16GiB    16GiB
  Over-Provisioning      12.7%    9.1%     9.1%     9.1%
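The over-provisioning row falls straight out of the GiB-versus-GB mismatch: raw NAND comes in binary gibibytes while the advertised capacity is decimal gigabytes. A quick sketch (my own arithmetic, not Samsung's published method) reproduces the figures in the table:

```python
def over_provisioning(usable_gb: int, raw_gib: int) -> float:
    """Return over-provisioning as a fraction of raw NAND capacity."""
    raw_bytes = raw_gib * 2**30        # NAND capacity is binary (GiB)
    usable_bytes = usable_gb * 10**9   # advertised capacity is decimal (GB)
    return 1 - usable_bytes / raw_bytes

for usable, raw in [(120, 128), (250, 256), (500, 512), (1000, 1024)]:
    print(f"{usable}GB: {over_provisioning(usable, raw):.1%}")
# 120GB: 12.7%, the rest: 9.1% -- matching the table
```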

If you cannot increase the capacity per die, the only option left is to increase the die count. So far the practical limit has been eight dies, and with traditional packaging methods there is already some performance loss once you exceed four dies per package. That is due to the limits of the interconnects between the dies and the PCB: as you add more dies, signal integrity degrades and latency climbs rapidly.

Source: Micron

In order to achieve the 1TB capacity with only four NAND packages, Samsung had to squeeze sixteen NAND dies into one package. To my surprise, when I started researching Samsung's 16-die NAND, I found out that it's actually nothing new. Their always-so-up-to-date NAND part number decoder from August 2009 already mentions a 16-die MLC configuration, and I managed to find TechInsights' report on the 512GB SSD used in the 2012 Retina MacBook Pro, complete with x-ray shots of a 16-die NAND package. That is an SSD 830 based drive, so I circled back to check the NAND used in the 512GB SSD 830, and it indeed uses sixteen 32Gb dies per package as well.
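The arithmetic checks out, assuming the drive carries no raw capacity beyond the 512GiB implied by its label. A quick sanity check of the TechInsights numbers:

```python
# Sixteen 32Gb (4GB) dies per package implies 64GB per package,
# so a 512GB drive needs eight such packages.
gb_per_die = 32 / 8              # 32 gigabits = 4 gigabytes
dies_per_package = 16
package_capacity_gb = gb_per_die * dies_per_package
packages_needed = 512 / package_capacity_gb
print(package_capacity_gb, packages_needed)  # 64.0 8.0
```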

Courtesy of TechInsights

I also made a diagram based on the x-ray shot, since the shot itself is not exactly clear unless you know what you're looking for.

Unfortunately I couldn't find any good x-ray shots of other manufacturers' NAND to see whether Samsung's packaging method is different, which would explain their ability to ship a 16-die package with no significant performance loss. However, what I was able to find suggests that others use similar packaging (i.e. an inclined stack of dies with interconnects running down both sides). Samsung is also very tight-lipped about their NAND and the technologies involved, so I've not been able to get any details out of them. Anand is meeting with their SSD folks at CES, and there is hope that he will be able to convince them to give us at least a brief overview.

I suspect this is not strictly a hardware matter but a software one too. In the end, the problem comes down to signal integrity and latency, both of which can be overcome with high-quality engineering. The two are actually related: poor signal integrity means more errors, which in turn increases latency because it's up to the ECC engine to fix the errors, and the more errors there are, the longer that obviously takes. With an effective combination of DSP and ECC (and a bunch of other acronyms), it's possible to stack more dies without sacrificing performance. Samsung's control over the silicon is a huge help here -- ECC needs to be built into the hardware to be efficient, and since it's up to Samsung to decide how much die area and how many resources to devote to ECC, they can make it happen.
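The error-rate/latency relationship can be put in a toy model (my own illustration with made-up numbers, not Samsung's design): expected read latency is the raw flash read time plus the ECC correction penalty weighted by how often a read actually needs correcting.

```python
def avg_read_latency_us(base_us: float, error_prob: float,
                        ecc_penalty_us: float) -> float:
    """Expected latency when a fraction of reads detours through ECC."""
    return base_us + error_prob * ecc_penalty_us

# Hypothetical figures: 50us flash read, 40us per correction pass.
clean_stack = avg_read_latency_us(50.0, 0.01, 40.0)  # good signal integrity
noisy_stack = avg_read_latency_us(50.0, 0.20, 40.0)  # tall, noisy die stack
print(clean_stack, noisy_stack)  # 50.4 58.0
```

The point of the sketch: keeping the error probability low through better packaging and DSP, or making the correction pass cheap in hardware, are two routes to the same end.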

Comments

  • ahar - Thursday, January 9, 2014 - link

    Can we also have one for the article? ;)
    "...so the number's you're seeing here..."
  • Unit Igor - Saturday, January 11, 2014 - link

    Tell me please, Kristian: would the EVO 120GB mSATA have any advantage over the EVO 250GB mSATA in battery life when you compare power consumption vs. disk busy times and MB/s? I use my ultrabook only for mail, sometimes watching movies and surfing. I don't need more than 120GB of SSD, but I am willing to buy the 250GB if it would give me more battery life. What I wanted to see in your benchmark is MobileMark 2012, because mSATA is for laptops and that is where battery life plays a big role.
  • guidryp - Thursday, January 9, 2014 - link

    "endurance is fine for consumer usage"

    Thanks for your opinion, but I'll stick with MLC.

    Do you also think Multi-TB HDDs are fine for consumer use? Since HDDs went over 1TB, they have been failing/wearing out for me regularly. I am sure you can find some theoretical numbers that say these are "fine for consumer usage" as well.

    There is a big trend to bigger sizes but lower reliability. That trend can get stuffed.

    Samsung's advantage of being the only TLC player strikes me as a reason to avoid Samsung, so I can avoid TLC and the decreasing endurance that goes with it.
  • Kristian Vättö - Thursday, January 9, 2014 - link

    That's just your experience; it's not proof that over-1TB hard drives are less reliable. We can't go out and start claiming that they are less reliable unless we have some concrete proof of that (failures on our end, statistics, etc.).

    The same applies to TLC. All we have is the P/E cycle number, and frankly it gives us a pretty good estimate of the drive's lifespan, and those numbers suggest that the endurance of TLC is completely fine for consumer usage. Or do you think our calculations are incorrect?
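The back-of-the-envelope form of that lifespan calculation looks like this (a sketch of the usual method; the 1,000 P/E rating for TLC and the workload figures are assumptions for illustration, not Samsung's specs):

```python
def endurance_years(capacity_gb: float, pe_cycles: int,
                    gb_written_per_day: float,
                    write_amplification: float) -> float:
    """Years until the rated P/E cycles are exhausted."""
    total_writes_gb = capacity_gb * pe_cycles
    return total_writes_gb / (gb_written_per_day * write_amplification * 365)

# 250GB TLC drive, 1,000 P/E cycles, 20GB/day of host writes, WAF of 3:
print(round(endurance_years(250, 1000, 20, 3), 1))  # 11.4 (years)
```

Even with these fairly pessimistic workload assumptions, the rated endurance comfortably outlasts a typical consumer upgrade cycle.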
  • MrSpadge - Thursday, January 9, 2014 - link

    And add to that that the P/E cycles are usually conservatively estimated by manufacturers. The SSD-burn-tests at XS sometimes exceed the ratings significantly.
  • guidryp - Thursday, January 9, 2014 - link

    I think if you examine any aggregate source of reviews like Newegg, you will see a significant drop in drive satisfaction due to early failures since drives went over 1TB. So it isn't just some personal fluke that half of my >1TB drives have failed or worn out so far.

    I am really sick of this trend of declining reliability being sold as good enough. If TLC is "good enough," I will take MLC with 3X "good enough" unless we are talking about 1/3 the price for TLC.

    Weren't the Samsung 840s failing in days for Anand last year?

    Unlike reviewers, I use my products until they fail, so reliability matters a LOT, and is something that is going in the wrong direction IMO.
  • Kristian Vättö - Thursday, January 9, 2014 - link

    Reliability is not the same as endurance. TLC has lower endurance, that's a fact, but it's not less reliable. Endurance is something you can predict (in the end all electronics have a finite lifespan), but reliability you cannot, since a lot besides the NAND can fail. I would claim that today's SSDs are much more reliable than the SSDs we had two years ago -- there haven't been any widespread issues with current drives (compared to e.g. early SandForce drives).

    Yes, we had a total of three 840 and 840 Pros that failed but that was on pre-production firmware. The retail units shipped with a fixed firmware.

    This isn't a new trend. Historically we can go back all the way to the 1920s, when light bulb companies started rigging their products so the lifespan would be shorter, which would in turn increase sales. Is it fair? Of course not. Do all companies do it? Yes.

    I do see your point, but I think you're exaggerating. Even TLC SSDs will easily outlive the computer as a whole, since the system will become obsolete in a matter of years anyway if it's not updated.
  • gandergray - Saturday, January 25, 2014 - link

    For information concerning hard drive failure rates that is more objective, please see the following article: http://www.extremetech.com/extreme/175089-who-make... .
  • althaz - Thursday, January 9, 2014 - link

    TLC is NOT a trade off in reliability, but a tradeoff in longevity.

    Longevity is measured in write-cycles and with heavy consumer loads TLC drives will still last for many years.
  • bsd228 - Thursday, January 9, 2014 - link

    Other than the fact that they both store data, SSDs and HDDs have nothing in common, so it's silly to presume a problem that isn't really what you think it is in the first place. HDDs got dirt cheap as we crossed the TB threshold, and diligent QA went with it. You want 2TB for $80? You're going to get a higher defect rate. And going to 4 or 5 platters just adds failure points, but the razor-thin margins are the real culprit here.

    In contrast, a bigger SSD just means either more chips, or higher density ones. But 16 chips is nothing new, and since there are no mechanical parts, nothing to worry about. Aside from OCZ, the SSD track record for reliability has been pretty solid, and Samsung (and Intel) far better than that. If you want to stick to 256G in your laptop out of a silly fear of TLC, you're just hurting yourself. The Anand guys have already shown how overstated the wear issue has become.
