The Real Issue

While I was covering MWC, a real issue with OCZ's SSDs erupted back home: OCZ had aggressively moved to high-density 25nm IMFT NAND and, as a result, was shipping product under the Vertex 2 name that was significantly slower than it used to be. Storage Review did a great job jumping on the issue right away.

Let's look at what caused the issue first.

When IMFT announced the move to 25nm it mentioned a doubling in NAND capacity per die. At 25nm you could now fit 64Gbit of MLC NAND (8GB) on a single die, twice what you could get at 34nm. With twice the density in the same die area, costs could come down considerably.


An IMFT 25nm 64Gbit (8GB) MLC NAND die

Remember, NAND manufacturing is no different from microprocessor manufacturing. Cost savings aren't realized on day one because yields are usually higher on the older, more mature process, and newer wafers are usually more expensive as well. So although you get a ~2x density improvement going to 25nm, your yields are lower and your wafers are more expensive than they were at 34nm. Even Intel only managed a price decrease of at most $110 going from the X25-M G2 to the SSD 320.
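
To put rough numbers on the economics, here's a back-of-the-envelope sketch in Python. Every figure in it (wafer cost, die count, yield) is a hypothetical placeholder of mine, not an IMFT or Intel number; the point is only that a ~2x density gain gets eaten into by lower early yields and pricier wafers.

```python
# Hypothetical cost-per-GB comparison between a mature node and a new, denser node.
# None of these numbers come from IMFT/Intel; they only illustrate how density,
# yield, and wafer cost interact.

def cost_per_gb(wafer_cost, dies_per_wafer, yield_rate, gb_per_die):
    good_dies = dies_per_wafer * yield_rate
    return wafer_cost / (good_dies * gb_per_die)

# 34nm-class node: mature yields, cheaper wafers, 4GB (32Gbit) MLC die
mature = cost_per_gb(wafer_cost=3000, dies_per_wafer=600, yield_rate=0.90, gb_per_die=4)

# 25nm-class node: same dies per wafer assumed for simplicity, 8GB (64Gbit) die,
# but lower early yields and a more expensive wafer
new = cost_per_gb(wafer_cost=3600, dies_per_wafer=600, yield_rate=0.65, gb_per_die=8)

print(f"34nm-class: ${mature:.2f}/GB")  # ~$1.39/GB
print(f"25nm-class: ${new:.2f}/GB")     # ~$1.15/GB - cheaper, but nowhere near 2x cheaper
```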

OCZ was eager to shift to 25nm. Last year SandForce was the first company to demonstrate 25nm Intel NAND on an SSD at IDF, so clearly the controller support was there. As soon as it had the opportunity to, OCZ began migrating the Vertex 2 to 25nm NAND.

SSDs are a lot like GPUs: they are very wide, parallel beasts. While a GPU has a huge array of parallel cores, SSDs are made up of arrays of NAND die working in parallel. Most controllers have 8 channels they can use to talk to NAND devices in parallel, but each channel can often have multiple NAND die active at once.
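
As a rough mental model (mine, not SandForce's actual scheduling, and the per-die throughput figure is invented), aggregate throughput scales with how many die the controller can keep busy at once:

```python
# Toy throughput model for a multi-channel SSD controller. The per-die figure is
# invented for illustration; real controllers interleave far more cleverly than this.

CHANNELS = 8                 # e.g. an SF-1200 class controller
PER_DIE_WRITE_MBPS = 20      # hypothetical sustained program throughput per die

def aggregate_write_mbps(total_dies):
    # Each channel talks to one die at a time, but it can interleave commands
    # across several die behind it, hiding NAND program latency.
    dies_per_channel = max(1, total_dies // CHANNELS)
    busy_dies = min(total_dies, CHANNELS * dies_per_channel)
    return busy_dies * PER_DIE_WRITE_MBPS

print(aggregate_write_mbps(16))  # 16 die, 2 per channel -> 320 MB/s in this toy model
print(aggregate_write_mbps(8))   #  8 die, 1 per channel -> 160 MB/s in this toy model
```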


A Corsair Force F120 using 34nm IMFT NAND

Double the NAND density per die and you can guess what happened next: performance went down considerably at certain capacity points. The most impacted were the smaller capacity drives, e.g. the 60GB Vertex 2. Remember that the SF-1200 is only an 8-channel controller, so it technically needs just eight NAND devices to be fully populated. However, within a single NAND device multiple die can be active concurrently, and in the first 25nm 60GB Vertex 2s there was only one die per NAND package, leaving the controller with fewer die to interleave across. The end result was significantly reduced performance in some cases, yet OCZ failed to change the speed ratings on the drives themselves.
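
To put numbers on the 60GB case specifically (the 64GB raw NAND figure is my assumption; a 60GB SandForce drive carries extra raw flash for spare area and RAISE):

```python
# Die count behind a 60GB SandForce drive, assuming ~64GB of raw NAND on board.

RAW_NAND_GB = 64   # assumption: 60GB usable + spare area/RAISE overhead
CHANNELS = 8

for node, gb_per_die in [("34nm", 4), ("25nm", 8)]:
    dies = RAW_NAND_GB // gb_per_die
    print(f"{node}: {dies} die total, {dies / CHANNELS:.0f} per channel")

# 34nm: 16 die total, 2 per channel -> two die to interleave behind each channel
# 25nm:  8 die total, 1 per channel -> no interleaving left, hence the performance drop
```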

The matter is complicated by the way SandForce's NAND redundancy works. The SF-1000 series controllers have a feature called RAISE that allows your drive to keep working even if a single NAND die fails. The controller accomplishes this redundancy by writing parity data across all NAND devices in the SSD. Should one die fail, the lost data is reconstructed from the remaining data + parity and mapped to a new location in NAND. As a result, total drive capacity is reduced by the size of a single NAND die. With twice the density per NAND die in these early 25nm drives, usable capacity was also reduced when OCZ made the switch with Vertex 2.
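
SandForce has never published RAISE's internals, so the sketch below is only the generic XOR-parity idea it is usually compared to (RAID-5-like, striped across die), not OCZ's or SandForce's actual implementation. It shows why one die's worth of capacity disappears and how a lost die's data can be rebuilt:

```python
# Conceptual XOR-parity sketch across NAND die. This is NOT SandForce's actual
# RAISE implementation (which is undocumented); it only demonstrates the
# redundancy/capacity trade-off described above.

from functools import reduce

def make_parity(stripes):
    """XOR same-offset bytes from every die to produce one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*stripes))

def rebuild(survivors, parity):
    """Reconstruct a failed die's data from the surviving die plus parity."""
    return make_parity(survivors + [parity])

# Toy example: 4 "die", each holding 4 bytes at the same logical offset
dies = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd", b"\x05\x06\x07\x08"]
parity = make_parity(dies)   # stored in NAND, consuming one die's worth of capacity

lost = dies.pop(1)           # simulate die #1 failing
recovered = rebuild(dies, parity)
assert recovered == lost     # the lost data comes back from the remaining data + parity
print("recovered:", recovered.hex())
```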

The end result was that you could buy a 60GB Vertex 2 with lower performance and less available space without even knowing it.


A 120GB Vertex 2 using 25nm Micron NAND

After a dose of public backlash OCZ agreed to let end users swap 25nm Vertex 2s for 34nm drives; they would simply have to pay the difference in cost. OCZ realized that was yet another mistake and eventually allowed the swap for free (thankfully no one was ever charged), which is what should have been done from the start. OCZ went one step further and stopped using 64Gbit NAND in the 60GB Vertex 2, although the earlier drives still exist in the channel since no recall was issued.

OCZ ultimately took care of those users who were left with a drive that was slower (and had less capacity) than they thought they were getting. But the problem was far from over.

Comments

  • pfarrell77 - Sunday, April 10, 2011 - link

    Great job Anand!
  • ARoyalF - Wednesday, April 6, 2011 - link

    For keeping them honest!
  • magreen - Wednesday, April 6, 2011 - link

    Intro page: "It's also worth nothing that 3000 cycles is at the lower end for what's industry standard..."

    I can't figure out your intent here. Is it worth noting or is it worth nothing?
  • Anand Lal Shimpi - Wednesday, April 6, 2011 - link

    Noting, not nothing. Sorry :)

    Take care,
    Anand
  • magreen - Wednesday, April 6, 2011 - link

    Hey, it was nothing.

    :)
  • vol7ron - Wednesday, April 6, 2011 - link

    Lmao. Magreen, I like how you addressed that.
  • Shark321 - Thursday, April 7, 2011 - link

    On many workstations in my company we have daily SSD writes of at least 20 GB, and this is not really exceptional. One hibernation in the evening writes 8 GB (the amount of RAM) to the SSD. And no, Windows does not write only the used RAM, but the whole 8 GB. One of the features of Windows 8 will be that Windows no longer writes the whole RAM content when hibernating. Windows 7 disables hibernation by default on systems with >4GB of RAM for that very reason! Several of the workstations use RAM disks, which write a 2 or 3 GB image on shutdown/hibernate. Since we use VMware heavily, 1-2 GB is written constantly throughout the day as snapshots. Add some backup snapshots of Visual Studio products to that and you have another 2 GB.

    Writing 20 GB a day is nothing unusual, and this happens on at least 30 workstations. Some may even go to 30-40 GB.

    Only 3000 write cycles per cell is the reason we had several complete SSD failures: three from OCZ, one Corsair, one Intel.
  • Pessimism - Thursday, April 7, 2011 - link

    Yours is a usage scenario that would benefit more from running a pair of drives, one SSD and one large conventional hard drive. The conventional drive could handle all your giant writes (slowness won't matter because you are hitting shut down and walking away), and the SSD could hold Windows and the applications themselves.
  • Shark321 - Friday, April 8, 2011 - link

    HDD slowness does matter! A lot! Loading a VMware snapshot on a Raptor HDD takes at least 15 seconds, compared to about 6-8 with an SSD. Shrinking the image once a month takes about 30 minutes on an SSD and 3 hours on an HDD!

    Since time is money, HDDs are not an option, except as a backup medium.
  • Per Hansson - Friday, April 8, 2011 - link

    How can you be so sure it is due to the 20GB writes per day?
    If you run out of NAND cycles the drives should not fail outright (which is what I take you to mean by your description).
    When an SSD runs out of write cycles you will have (for consumer drives, if memory serves) about one year before data retention is no longer guaranteed.

    What that means is that the data will be readable, but not writeable.
    This of course does not in any way mean that drives could not fail in some other way, like controller failure or the like.

    Intel has a failure rate of ca. 0.6%, Corsair ca. 2%, and OCZ ca. 3%.

    http://www.anandtech.com/show/4202/the-intel-ssd-5...
