Concluding Remarks

My Saturday plans went haywire, thanks to the DS414j going belly-up. However, I did end up proving that, as long as the disks are functional, it is possible to recover data easily from a Synology RAID-5 volume by connecting the drives to a PC and using UFS Explorer. Users wanting more of a challenge can use Ubuntu and mdadm for the same purpose. In my case, the data was in a SHR (Synology Hybrid RAID) volume with 1-disk redundancy, but since the disks were all of the same size, it was effectively RAID-5.
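
For readers who want to try the Ubuntu route, here is a minimal sketch of the mdadm / LVM steps. The device names are assumptions: the example takes the four member disks to appear as /dev/sdb through /dev/sde once connected to the PC, and uses the typical Synology layout (data array on md2, logical volume at /dev/vg1000/lv); these should be verified against /proc/mdstat and the vgscan output on the actual disks before mounting anything.

    # Install the software RAID and LVM userspace tools (SHR volumes sit on LVM)
    sudo apt-get install mdadm lvm2

    # Let mdadm read the on-disk metadata and reassemble the arrays read-only
    sudo mdadm --assemble --scan --readonly
    cat /proc/mdstat                      # the data array is usually the largest one
    sudo mdadm --detail /dev/md2          # array name assumed; confirm from /proc/mdstat

    # Activate the logical volume and mount it read-only before copying data off
    sudo vgscan && sudo vgchange -ay
    sudo mkdir -p /mnt/recovery
    sudo mount -o ro /dev/vg1000/lv /mnt/recovery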

Lessons that I learned from my data recovery experience:

  • Have access to a PC with multiple spare SATA ports, preferably hot-swap capable
  • Back up data written to a NAS frequently (if possible, in real-time)
  • Have access to a high-capacity DAS (with more free space than the largest NAS volume that you may have to recover)
  • Avoid encrypting shared folders and/or volumes, if possible
  • Prefer straightforward RAID-x volumes over customized RAID implementations (note: customized need not necessarily mean proprietary) and/or automatic RAID-level management schemes (such as Synology's SHR, Seagate's SimplyRAID, or Netgear's X-RAID2)
  • In critical environments, run two NAS units in high availability (HA) mode

Things I would like from the NAS vendors' side (Synology already ticks most of these):

  • Don't use proprietary RAID / hardware RAID for consumer NAS units
  • Instead of (or in addition to) supplying backup software, provide licensed versions of data recovery software such as UFS Explorer (or supply one developed in-house for Windows / Mac / Linux)
  • Provide official documentation for recovering data using a PC in case of NAS hardware failure (using either commercial software such as UFS Explorer or open-source tools like TestDisk)

Synology alone is not to blame for this situation. If QNAP's QSync had worked properly, I could have simply reinitialized the NAS instead of going through the data recovery process. That said, QSync still worked much better for this purpose than Synology's Cloud Station, which was the primary reason our configuration used a share on the DS414j as the target folder for QSync in the first place.

In any case, I would like to stress that this anecdotal data point in no way reflects the reliability of Synology's NAS units. I ran a DS211+ 24x7 without issues for three years before retiring it. More recently, our Synology DS1812+ has been running 24x7 as a syslog server for the past year. The DS414j that failed on me had been in operation for less than two months, which I put down to the 'infant mortality' region of the reliability engineering 'bathtub curve'. Synology provides a 2-year warranty on the DS414j, so end users affected by such hardware issues are covered. One just needs to make sure that the data on the NAS is backed up frequently.

Comments

  • zodiacfml - Saturday, August 23, 2014 - link

    Still that hard to restore files from a NAS? Vendors should develop a better way.
  • colecrowder - Saturday, August 23, 2014 - link

    At work we recently had 2 drives error on a Synology RAID 5. It wasn't quite total failure of two drives, but it crashed the volume. It's a 30+ TB system for our film digitization business, a 13-disk RAID (1812+ and 513 expansion), and we've tried just about everything to recover, to no avail. UFS didn't help, and an expert in the Ubuntu method we hired couldn't fix it either. Lesson learned: back everything up every night! Did find this useful guide for situations like ours, though:

    http://community.spiceworks.com/how_to/show/24731-...
  • ganeshts - Saturday, August 23, 2014 - link

    Shame about the lost data, but the link is definitely interesting.

    In case of drive failures within accepted limits (1 for RAID-5, 2 for RAID-6), the NAS itself should be able to rebuild the array. If more drives are lost, RAID rebuild software can't help, since the data is actually missing and there is no parity left to recover it.

    That said, if there is a drive failure as well as a NAS failure, I would personally make sure to image the remaining live drives onto some other storage before attempting recovery using software, rather than trying to recover from the live disks themselves.
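
    A minimal sketch of that imaging step with GNU ddrescue, assuming the surviving member disk appears as /dev/sdb and the destination DAS is mounted at /mnt/das (both names are placeholders):

        # Copy the raw disk to an image file, keeping a map of any unreadable sectors
        sudo apt-get install gddrescue
        sudo ddrescue -d -r3 /dev/sdb /mnt/das/sdb.img /mnt/das/sdb.map

        # The image can then be attached read-only as a loop device and handed to
        # the recovery software instead of the original disk
        sudo losetup --find --show --read-only /mnt/das/sdb.img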
  • Navvie - Monday, September 1, 2014 - link

    30TB RAID5? Please tell me you replaced that array with something more suitable.
  • DNABlob - Sunday, August 24, 2014 - link

    Good article & research.

    These days for my personal stuff, I use cloud backups (CrashPlan) and a single disk or striped pair (space and/or performance). If quick recovery is imperative, I'll employ something like AeroFS to sync data between two hosts on the same LAN. Pretty decent setup if you don't need to maintain meta-data like owner & ACLs.

    I'll spare you a long diatribe about software RAID5 and how a partial stripe write can silently corrupt data on a crash. As far as I can tell, this isn't fixed in Linux's RAID implementation. At the time, Sun was very proud of its ZFS / RAID-Z implementation, which fixed the partial-write problem. For light write workloads, partial stripe writes are unlikely, but still a very real risk.

    https://blogs.oracle.com/bonwick/en_US/entry/raid_...
  • KAlmquist - Monday, August 25, 2014 - link

    In reply to DNABlob: As far as I know, no released version of the Linux RAID has had a problem with silent data corruption, so there is no need for a fix.

    The author of the article you've linked acknowledges that RAID 5 can be implemented correctly in software when he writes, "There are software-only workarounds for this, but they're so slow that software RAID has died in the marketplace." It is true that an incorrect implementation of RAID 5 could result in silent data corruption, but the same thing can be said of any software, including ZFS. ZFS includes checksums on all data, but those checksums don't do any good if a careless programmer has neglected to call the code that verifies the checksums.
  • elFarto - Sunday, August 24, 2014 - link

    The reason your mdadm commands weren't working is that you were pointing them at the disks themselves, not at their partitions.
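
    For example (partition numbers are an assumption; on Synology disks the data array members are typically the fifth partition on each drive):

        sudo mdadm --examine /dev/sdb     # whole disk: typically "No md superblock detected"
        sudo mdadm --examine /dev/sdb5    # member partition: reports the array UUID, level, etc.
        sudo mdadm --assemble --readonly /dev/md2 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5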
  • mannyvel - Monday, August 25, 2014 - link

    What the article shows is that if your device uses a Linux RAID implementation, you can get the data off your drives with free or commercial tools when the device goes belly up. While useful, you could have done the same thing by buying a new device and dropping your drives in - correct?

    This isn't really data recovery where your RAID craps out because of a two-drive failure or some other condition that whacks your data. This is recovery due to an enclosure failure. Show me a recovery where your RAID dies, not where your enclosure dies.
  • Lerianis - Friday, September 5, 2014 - link

    Not always. Some machines are so badly designed that they INITIALIZE (wipe the drives) when old drives with data are put into them.
  • crashplan - Thursday, September 18, 2014 - link

    True. Anyways great article.
