The news of Samsung's SSD 840 EVO read performance degradation started circulating around the Internet about a month ago. Shortly after this, Samsung announced that it had found a fix and that a firmware update would be released on October 15th. Samsung kept its promise and delivered the update yesterday through its website.

The fix is actually a bit more than just a firmware update. Because the bug specifically affects the read speed of old data, simply flashing the firmware isn't enough: the data on the drive has to be rewritten for the changes in the new firmware to take effect. Thus the fix comes in the form of a separate tool, which Samsung calls Performance Restoration Software.

For now the tool is limited to the 840 EVO (both 2.5" and mSATA) and only works under Windows. An OS-independent tool for Mac and Linux users will be available later this month, but there is currently no word on whether the 'vanilla' 840 and the OEM versions will get the update. Samsung told me that it has only seen the issue in the 840 EVO, although user reports suggest that the 'vanilla' 840 is affected as well. I'll provide an update as soon as I hear more from Samsung.

The performance restoration process itself is simple and requires no input from the user once started. The tool first updates the firmware and asks you to shut down the system once the update has completed. On the next startup, the tool runs the actual three-step restoration process; unfortunately, I don't have any further information about what these steps do. What I do know is that all data on the drive will be rewritten, so the process can take a while depending on how much data is stored on the drive. The process isn't destructive if it completes successfully, but since updating firmware always carries some risk of data loss, I strongly recommend making sure that you have an up-to-date backup of your data before starting.
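Because every stored byte gets rewritten, the run time scales roughly with the amount of data on the drive. The back-of-the-envelope sketch below assumes a single read-and-rewrite pass at a constant throughput; Samsung has not published the tool's actual rate, so `rewrite_mb_per_s` (and the 200 MB/s in the example) is a placeholder you would have to supply, and `estimate_restore_minutes` is a hypothetical helper.

```python
def estimate_restore_minutes(used_gb: float, rewrite_mb_per_s: float) -> float:
    """Rough run-time estimate for rewriting all stored data once.

    Assumptions (not from Samsung): one pass over the used data at a constant,
    user-supplied effective rewrite rate; decimal units (1 GB = 1000 MB).
    """
    if used_gb < 0 or rewrite_mb_per_s <= 0:
        raise ValueError("used_gb must be >= 0 and rewrite_mb_per_s > 0")
    used_mb = used_gb * 1000
    return used_mb / rewrite_mb_per_s / 60


if __name__ == "__main__":
    # Example: 600 GB stored, assumed 200 MB/s effective rewrite rate
    print(f"{estimate_restore_minutes(600, 200):.0f} minutes")
```

Real run time will also depend on the restoration steps themselves, which Samsung hasn't detailed, so treat the result as an order-of-magnitude guide only.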

The restoration tool does have a few limitations, though. First, it requires at least 10% free space or it won't run at all, and the only way around this limit is to delete files or move them to another drive before running the tool. Second, only the NTFS file system is supported at this stage, so Mac and Linux users will have to wait for the DOS version of the tool, which is scheduled to be available by the end of this month. Third, the tool doesn't support RAID arrays, meaning that if you are running two or more 840 EVOs in a RAID array, you'll need to delete the array and switch back to AHCI mode before the tool can be run. Any hardware encryption (TCG Opal 2.0 and eDrive) must be disabled as well.
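The four requirements above can be written down as a simple pre-flight checklist. This is a hypothetical helper (`DriveState` and `preflight_problems` are illustrative names, not part of Samsung's tool), and it assumes the 10% free-space rule is measured against the drive's total capacity.

```python
from dataclasses import dataclass


@dataclass
class DriveState:
    """Hypothetical snapshot of a drive; field names are illustrative only."""
    capacity_gb: float
    used_gb: float
    filesystem: str
    in_raid: bool
    hw_encryption_enabled: bool


def preflight_problems(d: DriveState) -> list[str]:
    """Return the reasons the restoration tool would refuse to run (empty list = OK)."""
    if d.capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    problems = []
    free_gb = d.capacity_gb - d.used_gb
    required_free_gb = 0.10 * d.capacity_gb  # assumption: 10% of total capacity
    if free_gb < required_free_gb:
        problems.append(
            f"free at least {required_free_gb - free_gb:.1f} GB more (10% free space required)"
        )
    if d.filesystem.upper() != "NTFS":
        problems.append("only NTFS is supported by the Windows tool")
    if d.in_raid:
        problems.append("delete the RAID array and switch to AHCI mode")
    if d.hw_encryption_enabled:
        problems.append("disable hardware encryption (TCG Opal 2.0 / eDrive)")
    return problems


if __name__ == "__main__":
    drive = DriveState(capacity_gb=500, used_gb=470, filesystem="NTFS",
                       in_raid=False, hw_encryption_enabled=False)
    for p in preflight_problems(drive):
        print("-", p)
```

For the example drive (500 GB with 470 GB used), only the free-space check fails, asking for another 20 GB to be cleared.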

Regarding driver and platform support, the tool supports both Intel and AMD chipsets and storage drivers as well as Microsoft's native AHCI driver. The only limitation concerns AMD storage drivers: the driver must be the latest version, or alternatively you can temporarily switch to the Microsoft driver by uninstalling the AMD driver. Samsung has a detailed installation guide that goes through the driver switch process along with the rest of the performance restoration process.

Explaining the Bug

Given how widespread the issue is, there has been quite a bit of speculation about what causes the read performance to degrade over time. I didn't officially post my theory here, although I did tweet about it and also mentioned it in the comments of the original news post. My theory turned out to be pretty much spot on, as Samsung has finally disclosed some details about the source of the bug.

As most of you likely know already, NAND works by storing a charge in the floating gate. The amount of charge determines the cell's threshold voltage (its voltage state), which in turn translates to the bit output. Reading a cell works by applying a read voltage to the control gate and sensing whether the cell conducts; the read voltage is stepped up until the cell responds, which reveals the voltage state the cell is in.
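To make the sensing step concrete, here is a toy model of reading one TLC cell (3 bits, 8 voltage states). The seven reference voltages are invented round numbers, and the Gray-coded state-to-bits mapping is a common textbook choice, not Samsung's actual mapping. Real controllers also sense only the reference levels a given page needs rather than sweeping all of them; the linear sweep here simply mirrors the description above.

```python
# Hypothetical read reference voltages (volts) separating 8 TLC states.
# Real values are vendor-specific and not public.
READ_REFS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]


def cell_conducts(vt: float, gate_v: float) -> bool:
    """A cell turns on once the applied gate voltage exceeds its threshold voltage."""
    return gate_v > vt


def read_state(vt: float, refs=READ_REFS) -> int:
    """Step the read voltage upward until the cell responds; the step index is the state."""
    for state, ref in enumerate(refs):
        if cell_conducts(vt, ref):
            return state
    return len(refs)  # never conducted: highest voltage state


def state_to_bits(state: int) -> str:
    """Gray-coded mapping (assumption): adjacent states differ in exactly one bit."""
    return format(state ^ (state >> 1), "03b")


if __name__ == "__main__":
    for vt in (0.2, 1.2, 3.9):
        s = read_state(vt)
        print(vt, s, state_to_bits(s))
```

The Gray coding is why a small drift into a neighboring state typically corrupts only one of the three bits, which ECC can then correct.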


However, the cell charge is affected by multiple variables over time. Electron leakage through the tunnel oxide gradually reduces the cell charge and may result in a change in the voltage state. Neighboring cells also have an impact through cell-to-cell interference in the form of floating gate coupling, which is at its strongest when a neighboring (or just a nearby) cell is programmed. That interference shifts the cell's voltage state, and the effect becomes stronger over time if the cell isn't erased and reprogrammed for a long time (i.e. more neighbor cell programs = more interference = bigger shift in cell charge).
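A toy model of the two effects just described: leakage pulls a cell's threshold voltage down as the data ages, while programming nearby cells nudges it up. The linear coefficients are made up purely for illustration; real drift is nonlinear and depends on temperature, wear and other factors. The state window [1.0, 1.5) V is likewise a hypothetical example.

```python
def drifted_vt(vt0: float, months: float, neighbor_programs: int,
               leak_v_per_month: float = 0.02,
               coupling_v_per_program: float = 0.0005) -> float:
    """Toy threshold-voltage drift model (all coefficients invented for illustration).

    - Leakage through the tunnel oxide lowers Vt over time (linear here for simplicity).
    - Floating-gate coupling from programming nearby cells raises the apparent Vt.
    """
    leakage = leak_v_per_month * months
    interference = coupling_v_per_program * neighbor_programs
    return vt0 - leakage + interference


def in_state_window(vt: float, lower_ref: float, upper_ref: float) -> bool:
    """True if the cell still sits between the read references bounding its state."""
    return lower_ref <= vt < upper_ref


if __name__ == "__main__":
    # Cell programmed mid-window (1.25 V) in a hypothetical [1.0, 1.5) V state
    print(in_state_window(drifted_vt(1.25, 6, 0), 1.0, 1.5))   # True
    print(in_state_window(drifted_vt(1.25, 15, 0), 1.0, 1.5))  # False: drifted below 1.0 V
```

Once the voltage leaves its window, a read with unchanged references returns the wrong state, which is what the management algorithm has to anticipate.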

Because cell voltage drift is inherent to NAND, all SSDs and other NAND-based devices use NAND management algorithms that take these changes into account. The algorithms are designed to adjust for shifts in the voltage states based on these variables (in reality there are far more than the two I mentioned above) so that the cell can be read and programmed efficiently.
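One simple form of such compensation is to shift the read reference voltages by the drift the controller expects for data of a given age. The sketch below models only leakage, uses invented voltages and rates, and is not Samsung's algorithm, which has not been disclosed; `compensated_refs` is a hypothetical name.

```python
BASE_REFS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # hypothetical TLC read references (V)


def read_state(vt: float, refs: list[float]) -> int:
    """Index of the first reference voltage at which the cell conducts."""
    for state, ref in enumerate(refs):
        if ref > vt:
            return state
    return len(refs)


def compensated_refs(months: float, expected_leak_v_per_month: float = 0.02,
                     base_refs: list[float] = BASE_REFS) -> list[float]:
    """Shift every reference down by the leakage expected for data this old (assumed linear)."""
    shift = expected_leak_v_per_month * months
    return [r - shift for r in base_refs]


if __name__ == "__main__":
    vt_now = 1.25 - 0.02 * 15  # cell written in state 2, after 15 months of modeled leakage
    print(read_state(vt_now, BASE_REFS))             # 1: misread with stale references
    print(read_state(vt_now, compensated_refs(15)))  # 2: correct after compensation
```

If the expected drift in such a model is wrong, the compensated references miss the cells just as the stale ones do, and the controller has to fall back on read-retry.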

In the case of the 840 EVO, there was an error in the algorithm that resulted in an aggressive read-retry process when reading old data. TLC NAND needs more sophisticated NAND management because its voltage states are packed more closely together (eight states per cell versus four for MLC). At the same time, the wear-leveling algorithms need to be as efficient as possible (i.e. write as little as possible to save P/E cycles), which is why the bug only exists on the 840 and 840 EVO, Samsung's TLC-based drives. I suspect that the algorithm didn't properly take the change in cell voltage into account, which resulted in incorrect read reference points, so the read had to be repeated multiple times before the cell returned the correct value. Since each repeated read takes additional time, user performance suffered as a result.
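Samsung hasn't described the exact failure, but the cost of read-retry is easy to illustrate. In the generic sketch below (not Samsung's firmware), a controller reads a page with its current reference voltages, checks whether ECC could correct the result, and if not, retries with the references shifted by the next entry of a retry table. The voltages, retry offsets, ECC limit, and 100 µs read time are all invented for illustration. When the references are badly off, every retry adds another full read time.

```python
REFS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # hypothetical TLC read references (V)


def read_state(vt: float, refs: list[float]) -> int:
    """Index of the first reference voltage at which the cell conducts."""
    for state, ref in enumerate(refs):
        if ref > vt:
            return state
    return len(refs)


def read_page_with_retry(vts, true_states, retry_offsets, ecc_limit, t_read_us=100.0):
    """Re-read a page with shifted references until the errors fit within the ECC limit.

    In a simulation we can count misread cells directly (a stand-in for bit errors);
    a real controller only learns whether ECC decoding succeeded.
    Returns (attempts, total_latency_us, success).
    """
    for attempt, offset in enumerate(retry_offsets, start=1):
        refs = [r + offset for r in REFS]
        errors = sum(read_state(v, refs) != s for v, s in zip(vts, true_states))
        if errors <= ecc_limit:
            return attempt, attempt * t_read_us, True
    n = len(retry_offsets)
    return n, n * t_read_us, False


if __name__ == "__main__":
    states = list(range(8))
    fresh = [0.25 + 0.5 * s for s in states]   # cells programmed mid-window
    aged = [v - 0.3 for v in fresh]            # uniform downward drift (made up)
    table = [0.0, -0.02, -0.04, -0.06, -0.08]  # hypothetical retry table
    print(read_page_with_retry(fresh, states, table, ecc_limit=0))  # (1, 100.0, True)
    print(read_page_with_retry(aged, states, table, ecc_limit=0))   # (4, 400.0, True)
```

In this toy example the aged page still reads correctly, but only after four reads, so its latency is four times that of fresh data: the data is intact, yet reads are slower.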

Unfortunately, I don't have an 840 EVO that meets the criteria for the bug (i.e. a drive with several-month-old data), so I couldn't test more than the restoration process itself (which was smooth, by the way). However, PC Perspective's and The Tech Report's tests confirm that the tool restores performance to the original speeds. It's too early to say whether the update fixes long-term performance, but Samsung assured me that the update does fix the NAND management algorithm and should thus be a permanent fix.

The EVO has been the most popular retail SSD so far, so it's great to see Samsung providing a fix in such a short time. None of the big SSD manufacturers have been able to avoid widespread bugs (remember the 8MB bug in the Intel SSD 320 and the 5,000-hour bug in the Crucial m4?) and I have to give Samsung credit for handling this well. In the end, this bug never resulted in data loss, so it was more of an annoyance than a real threat.

95 Comments

  • Montago - Saturday, October 18, 2014

    My 500 GB vanilla Samsung 840 is just as bad :(

    I really hope they release a fix; alternatively, I could file a warranty claim.
  • Oxford Guy - Friday, October 17, 2014

    "With TLC NAND more sophisticated NAND management is needed due to the closer distribution of the voltage states. At the same time the wear-leveling algorithms need to be as efficient as possible (i.e. write as little as possible to save P/E cycles)

    None of the big SSD manufacturers have been able to avoid widespread bugs ... and I have to give Samsung credit for handling this well."

    This is a problem that comes from using TLC NAND, which is simply inferior to MLC. Unrelated bugs aren't so relevant. Samsung is using all sorts of tricks to try to maintain good performance with an inferior type of NAND.
  • Romberry - Friday, October 17, 2014

    I suggest you go back into the archives here at AnandTech and do some reading on TLC. This whole "inferior type of NAND" stuff is basically without merit for consumer level (or even pro-level, so long as we aren't talking massive continuous I/O such as with a database server) devices.
  • Oxford Guy - Friday, October 17, 2014

    Wrong. It's fundamentally inferior to MLC just as MLC is inferior to SLC in terms of latency, longevity, voltage requirements, and so on. The only thing the more layered NAND types offer is less cost due to higher density. Since MLC is not expensive to make in terms of cost vs. capacity when compared with TLC, TLC is a solution in need of a problem.

    Samsung's many schemes to milk performance out of TLC are being exposed. First there is the abysmal steady state performance of the original 840 120 and now this.
  • Romberry - Saturday, October 18, 2014

    You're arguing semantics. A Buick Riviera is fundamentally inferior for going really fast in comparison to a Formula One car, but it is not fundamentally inferior when used for a suitable purpose. TLC is not "fundamentally inferior" to SLC or MLC for use in consumer-class drives, or even regular "business-class" drives (not enterprise/server) for that matter.
  • alacard - Saturday, October 18, 2014

    Of course it is inferior, and your analogy is flawed because it only takes speed into account and not durability.

    A better one would be three cars: one made out of steel with a V6 and a supercharger (SLC), one made out of aluminum with a V4 and a supercharger (MLC), and one made out of wood with a V4 (TLC). Sooner rather than later (when compared to the others), the wood of the TLC automobile is going to rot or buckle under the stress of usage and the car will fall apart. TLC is slower, weaker, and more prone to error than either of the others, which means it is less suitable for any workload compared to the others. And any stress you do put on the drive lowers the endurance and increases the potential for errors far faster than equivalent levels of stress placed on the others. This is not semantics, this is reality.

    Now you can do some tricks to increase the speed of TLC, like using system RAM for cache or making a portion of the drive act as a supercharged SLC, but in that case you're using SLC for speed, not TLC, and the reason you do that is because TLC is inferior and you're trying to mask that.

    I have a 1TB EVO. I'm using it right now to write this to you. But I don't fool myself into thinking I bought something that's faster or more reliable than it actually is. You might do the same.
  • simonpschmitt - Friday, October 17, 2014

    I am getting tired of all the TLC bashing here.
    TLC is the main reason (consumer) SSDs are so cheap that I can now even recommend SSDs for sub-$400 systems. That is a huge win for the average user.
    I can see that TLC needs more sophisticated algorithms than MLC (just as MLC needs better ones than SLC). But the manufacturers are getting there, and so far without a really bad (read: data loss) bug.
    If you really think TLC is crap, pay the premium for an MLC or SLC drive and be happy.
  • hojnikb - Friday, October 17, 2014

    Joke's on you, there are plenty of MLC drives that are cheaper than TLC drives :)
  • Oxford Guy - Friday, October 17, 2014

    Yeah... some people need to follow slickdeals. There have been plenty of deals on MLC drives from reputable makers.
  • hojnikb - Friday, October 17, 2014

    Yeah. The MX100, for example, even had an MSRP quite a bit lower than what TLC drives were selling for. So yeah, TLC won't magically make SSDs cheaper.
