79 Comments

  • sleewok - Monday, July 21, 2014 - link

    Based on my experience with the WD Red drives I'm not surprised you had one fail that quickly. I have a 5 disk (2TB Red) RAID6 setup with my Synology Diskstation. I had 2 drives fail within a week and another 1 within a month. WD replaced them all under warranty. I had a 4th drive seemingly fail, but it seemed to fix itself (I may have run a disk check). I simply can't recommend WD Red if you want a reliable setup.
  • Zan Lynx - Monday, July 21, 2014 - link

    If we're sharing anecdotal evidence, I have two 2TB Reds in a small home server and they've been great. I run a full btrfs scrub every week and never find any errors.

    Infant mortality is a common issue with electronics. In the past I had two Seagate 15K SCSI drives that both failed in the first week. Does that mean Seagate sucks?
  • icrf - Monday, July 21, 2014 - link

    I had a lot of trouble with 3 TB Green drives, had 2 or 3 early failures in an array of 5 or 6, and one that didn't fail, but silently corrupted data (ZFS was good for letting me know about that). Once all the failures were replaced under warranty, they all did fine.

    So I guess test a lot, keep on top of it for the first few months or year, and make use of their pretty painless RMA process. WD isn't flawless, but I'd still use them.
  • Anonymous Blowhard - Monday, July 21, 2014 - link

    >early failure
    >Green drives

    Completely unsurprised here, I've had nothing but bad luck with any of those "intelligent power saving" drives that like to park their heads if you aren't constantly hammering them with I/O.

    Big ZFS fan here as well, make sure you're on ECC RAM though as I've seen way too many people without it.
  • icrf - Monday, July 21, 2014 - link

    I'm building a new array and will use Red drives, but I'm thinking of going btrfs instead of zfs. I'll still use ECC RAM. Did on the old file server.
  • spazoid - Monday, July 21, 2014 - link

    Please stop this "ZFS needs ECC RAM" nonsense. ZFS does not have any particular need of ECC RAM that every other filer doesn't.
  • Anonymous Blowhard - Monday, July 21, 2014 - link

    I have no intention of arguing with yet another person who's totally wrong about this.
  • extide - Monday, July 21, 2014 - link

    You both are partially right, but the fact is that non-ECC RAM on ANY file server can cause corruption. ZFS does a little bit more "processing" on the data (checksums, optional compression, etc) which MIGHT expose you to more issues due to bit flips in memory. Still, if you are getting frequent memory errors, you should be replacing the bad stick; good memory does not really have frequent bit errors (unless you live in a nuclear power station or something!)

    FWIW, I have a ZFS machine with a 7TB array, and run scrubs at least once a month, preferably twice. I have had it up and running in its current state for over 2 years and have NEVER seen even a SINGLE checksum error according to zpool status. I am NOT using ECC RAM.

    In a home environment, I would suggest ECC RAM, but in a lot of cases people are re-using old equipment, and many times it is a desktop class CPU which won't support ECC, which means moving to ECC ram might require replacing a lot of other stuff as well, and thus cost quite a bit of money. Now, if you are buying new stuff, you might as well go with an ECC capable setup as the costs aren't really much more, but that only applies if you are buying all new hardware. Now for a business/enterprise setup yes, I would say you should always run ECC, and not only on your ZFS servers, but all of them. However, most of the people on here are not going to be talking about using ZFS in an enterprise environment, at least the people who aren't using ECC!

    tl;dr -- Non-ECC is FINE for home use. You should always have a backup anyways, though. ZFS by itself is not a backup, unless you have your data duplicated across two volumes.
  • alpha754293 - Monday, July 21, 2014 - link

    The biggest problem I had with ZFS is its total lack of data recovery tools. If your array bites the dust (two non-rotating drives) on a striped zpool, you're pretty much hosed. (The array won't start up). So you can't just do a bit read in order to recover/salvage whatever data's still on the magnetic disks/platters of the remaining drives, and there's nothing that has told me that you can clone the drive (including its UUID) in its entirety in order to "fool" the system into thinking that it's the EXACT same drive (when it's actually been replaced) so that you can spin up the array/zpool again in order to begin the data extraction process.

    For that reason, ZFS was dumped back in favor of NTFS (because if an NTFS array goes down, I can still bit-read the drives, and salvage the data that's left on the platters). And I HAD a Premium Support Subscription from Sun (back when it was still Sun), and even they TOLD me that they don't have ANY data recovery tools like that. And they couldn't tell me the procedure for cloning the dead drives either (including its UUID).

    Btrfs was also ruled out for the same technical reasons. (Zero data recovery tools available should things go REALLY farrr south.)
  • name99 - Tuesday, July 22, 2014 - link

    "because if an NTFS array goes down, I can still bit-read the drives, and salvage the data that's left on the platters"
    Are you serious? Extracting info from a bag of random sectors was a reasonable thing to do from a 1.44MB floppy disk, it is insane to imagine you can do this from 6TB or 18 TB or whatever of data.
    That's like me giving you a house full of extremely finely shredded then mixed up paper, and you imagining you can construct a million useful documents from it.
  • alpha754293 - Monday, July 21, 2014 - link

    Yeah....while the official docs say you "need" ECC, the truth is - you really don't. It's nice, and it'll help to mitigate like bit-flip errors and stuff like that, but I mean...by that point, you're already passing PBs of data through the array/zpool before it's even noticeable. And part of that has to do with the fact that it does block-by-block checksumming, which means that given the nature of how people run their systems, it'll probably reduce your ERRs even further, but you might be talking like a third of what's already an INCREDIBLY small percentage.

    A system will NEVER complain if you have ECC RAM (and have ECC enabled, because my servers have ECC RAM, but I've always disabled ECC in the BIOS), but it isn't going to NOT start up if you have ECC RAM with ECC disabled.

    And so far, I haven't seen ANY discernible evidence that suggests that ECC is an absolute must when running ZFS, and you can SAY that I am wrong, but you will also need to back that statement up with evidence/data.
  • npz - Tuesday, July 22, 2014 - link

    The reason why you need ECC with ZFS, BTRFS, ReFS (if checksum is enabled), is because what happens if you encounter a bit flip while doing the first checksum calculation that will be stored on disk?

    If you are doing a 2 way mirror, you now have two permanently "bad" blocks. That makes it unrecoverable, even though the disks are perfectly fine and the blocks are actually fine.
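
    The scenario above can be sketched in a few lines. This is only a toy model, not how ZFS, btrfs or ReFS actually lay out or verify blocks: SHA-256 stands in for the filesystem checksum, and the helper names (`checksum`, `flip_bit`) are made up for the illustration.

```python
import hashlib

def checksum(block: bytes) -> str:
    # Stand-in for a filesystem block checksum (assumption: SHA-256).
    return hashlib.sha256(block).hexdigest()

def flip_bit(block: bytes, bit_index: int) -> bytes:
    # Simulate a single-bit RAM error in a copy of the block.
    buf = bytearray(block)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

good = b"important data block" * 8

# The bit flip hits the buffer while the checksum is being calculated,
# so the stored checksum describes data that was never written.
stored_sum = checksum(flip_bit(good, 5))

# Both sides of the mirror receive the intact data and the same bad checksum.
mirror = [(good, stored_sum), (good, stored_sum)]

# On read, neither copy verifies, so there is no "good" side to repair from.
verified = [checksum(data) == s for data, s in mirror]
print(verified)  # [False, False]
```

    Both copies hold perfectly good data, yet both fail verification, which is the "two permanently bad blocks" case described above.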
  • m0du1us - Friday, July 25, 2014 - link

    @npz Agreed, the whole point of using ZFS, etc. is to utilize the extended data security features of the FS. For instance file check-summing, data-deduplication and block indexing, which all rely on correct check-sums. On a new system build, ECC is a must for a reliable file system.
  • AlmaFather - Monday, July 28, 2014 - link

    Some information:

    http://forums.freenas.org/index.php?threads/ecc-vs...
  • Samus - Monday, July 21, 2014 - link

    The problem with power saving "green" style drives is the APM is too aggressive. Even Seagate, who doesn't actively manufacture a "green" drive at a hardware level, uses firmware that sets aggressive APM values in many low end and external versions of their drives, including the Barracuda XT.

    This is a completely unacceptable practice because the drives are effectively self-destructing. Most consumer drives are rated at 250,000 load/unload cycles and I've racked up 90,000 cycles in a matter of MONTHS on drives with heavy IO (seeding torrents, SQL databases, Exchange servers, etc.)

    hdparm is a tool that lets you send ATA commands to a drive and disable APM (by setting the value to 255), overriding the firmware value. At least until the next power cycle...
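
    For anyone who wants to script this, here is a rough sketch that wraps the `hdparm -B 255` call. The device path `/dev/sdX` is a placeholder and the function names are made up; it assumes a Linux box with hdparm installed and root privileges, and some drives simply ignore or reject the setting.

```python
import subprocess
from typing import List

def apm_disable_command(device: str) -> List[str]:
    # "-B 255" asks the drive to turn Advanced Power Management off.
    return ["hdparm", "-B", "255", device]

def disable_apm(device: str) -> None:
    # Needs root; raises CalledProcessError if hdparm exits non-zero.
    subprocess.run(apm_disable_command(device), check=True)

# Only print the command here; call disable_apm() on a real device node.
print(" ".join(apm_disable_command("/dev/sdX")))
```

    Since the setting does not survive a power cycle, something like this would have to run at every boot to stick.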
  • name99 - Tuesday, July 22, 2014 - link

    I don't know if this is the ONLY problem.
    My most recent (USB3 Seagate 5GB) drive consistently exhibited a strange failure mode where it frequently seemed to disconnect from my Mac. Acting on a hunch I disabled the OSX Energy Saver "Put hard disks to sleep when possible" setting, and the problem went away. (And energy usage hasn't gone up because the Seagate drive puts itself to sleep anyway.)

    Now you're welcome to read this as "Apple sux, obviously they screwed up" if you like. I'd disagree with that interpretation given that I've connected dozens of different disks from different vendors to different macs and have never seen this before. What I think is happening is Seagate is not handling a race condition well --- something like "Seagate starts to power down, half-way through it gets a command from OSX to power down, and it mishandles this command and puts itself into some sort of comatose mode that requires power cycling".

    I appreciate that disk firmware is hard to write, and that power management is tough. Even so, it's hard not to get angry at what seems like pretty obvious incompetence in the code coupled to an obviously not very demanding test regime.
  • jay401 - Tuesday, July 22, 2014 - link

    > Completely unsurprised here, I've had nothing but bad luck with any of those "intelligent power saving" drives that like to park their heads if you aren't constantly hammering them with I/O.

    I fixed that the day I bought mine with the wdidle utility. No more excessive head parking, no excessive wear. I've had 3 2TB Greens and 2 3TB Greens with no issues so far (thankfully). Currently running a pair of 4TB Reds, but have not seen any excessive head parking showing up in the SMART data with those.
  • chekk - Monday, July 21, 2014 - link

    Yes, I just test all new drives thoroughly for a month or so before trusting them. My anecdotal evidence across about 50 drives is that they are either DOA, fail in the first month or last for years. But hey, YMMV.
  • icrf - Monday, July 21, 2014 - link

    My anecdotal experience is about the same, but I'd extend the early death window a few more months. I don't know that I've gone through 50 drives, but I've definitely seen a couple dozen, and that's the pattern. One year warranty is a bit short for comfort, but I don't know that I care much about 5 years over 3.
  • Guspaz - Tuesday, July 22, 2014 - link

    I've had a bunch of 2TB greens in a ZFS server (15 of them) for years and none of them have failed. I expected them to fail, and I designed the setup to tolerate two to four of them failing without data loss, but... nothing.
  • Wixman666 - Tuesday, July 22, 2014 - link

    The WD Green and other drives fail more often because they are not intended for NAS use. They lack the anti-vibration mechanism, so they shake apart.
  • anandtech_user01 - Monday, July 21, 2014 - link

    Given my own previous experiences, I wouldn't trust high density 3.5" drives from any manufacturer, period. Of course, lots of other people do (or somehow feel forced to). As for WD RED, they can be better than the competition in respect to being able to switch off the head-parking every 10 seconds (with the wdidle3.exe DOS utility). Whereas I've got 2 Seagate Momentus 7200.2/4 and that's impossible with them. Looking at the specs for the 2.5" RED it's pretty much just a re-branded WD Black from the previous year. And those ones have been out for a few years, and are reliable / do have a good reputation. Whereas I've heard that the 3.5" RED is something more like a re-worked / tweaked 3.5" WD Green. Not so sure about those ones.
  • kmi187 - Monday, July 21, 2014 - link

    Anecdotal evidence, while it should not be discarded, is rather irrelevant since the scale is rather low. Now if you look at some statistics from data-centers, it shows Seagate as a clear winner when it comes to failure rates. In other words, they aren't doing too well.

    This data is much more reliable since it's data from a lot of hard drives, so it paints a much clearer picture.

    I build custom PCs for a computer store and it's just not fun when you have a kickass system that went out the door and 2 months later you have to tell the client, yeah sorry man but the drive died, we are going to have to reinstall your PC. If that starts to happen on a regular basis, you know you have to look for alternatives.
  • romrunning - Monday, July 21, 2014 - link

    Why wouldn't you build w/SSDs as your base drive, and only use spinning disks as secondary storage? It would seem that you would have better reliability that way.
  • asmian - Monday, July 21, 2014 - link

    Agree! It clearly isn't a "kickass" custom system if there's no SSD as boot drive. I pity these customers if they are being sold something as "special" without that as a basic building block these days. :( Please tell us where you work so we can avoid your store.
  • erple2 - Tuesday, July 22, 2014 - link

    Because managing two disks is a total waste of time and resources. If you're already going to a shop to have a machine built for you, then you've realized that your time is worth more to you than the inconvenience of rolling your own machine. Therefore, it stands to reason that you would also not want to be bothered with having to juggle installations to SSD vs. HDD. I have a setup like that in a laptop, and I hate having to figure out what applications I should put on the SSD vs the HDD. Just give me something that works. That and my time is worth far more than the cost spent trying to juggle applications on and off the SSD when it fills up.

    That having been said, ssd is pretty cheap now, so I'm not sure why you wouldn't put in a 500 gb to 1 tb ssd in a higher end build.
  • Samus - Monday, July 21, 2014 - link

    I actually haven't had a drive from Seagate, WD or Hitachi fail in years. The last one was a 7200.10 1.5TB (~2008)

    I have more WD Reds in deployment than any other drive and they've all been great. However, the largest capacity I've rolled out are 2TB models.
  • iLovefloss - Monday, July 21, 2014 - link

    Their older Barracuda drives sure did.

    http://blog.backblaze.com/2014/01/21/what-hard-dri...

    Of course, they've gotten better.

    http://www.hardware.fr/articles/920-6/disques-durs...

    Still, no matter how you look at it, those "Green" drives tend to fail more often than their other counterparts.
  • cm2187 - Tuesday, July 22, 2014 - link

    Same here. A bunch of 4TB Reds and I never managed to make them work in a hardware RAID array (LSI and Adaptec). Same symptoms as in the article. Drive marked as failed in the array but works well as standalone. Still had some problems, though much less, in a soft array (Synology). Hitachi desktop drives behave much better in a hardware RAID.
  • LoneWolf15 - Friday, July 25, 2014 - link

    I've been running Reds (mix of 2TB and 3TB, as I'm slowly migrating capacity) in an HP SmartArray P222 in an HP Microserver Gen8 for some time now. 6TB RAID-5 array and no failures.

    I will admit, I haven't tried the 4TB models for RAID, but do have one in a USB3 MacAlly external enclosure for backing up the box (Server 2012R2 Standard).
  • JohnMD1022 - Thursday, July 24, 2014 - link

    Actually, yes.

    I have seen too many bad Seagate drives to use or recommend.

    At one point, I had 9 bad Seagates in my shop at the same time.

    In addition, their customer service leaves a lot to be desired.
  • comomolo - Monday, July 21, 2014 - link

    Have you actually read the article?

    It's clearly written that the drive DID NOT fail. The drive in question passed all the tests and ran perfectly fine by itself on a PC. The author states this looks like a compatibility issue with QNAP's server.
  • GTVic - Monday, July 21, 2014 - link

    A lot of people claim the failure is related to shipping methods, particularly blaming Newegg on this. Proper shipping = reliable drive. I'd believe that sooner than "WD Red sucks" comments.
  • Wixman666 - Tuesday, July 22, 2014 - link

    I have a sea of WD Red hard drives out in the field at various customer locations. I've only ever had one fail.
  • romrunning - Monday, July 21, 2014 - link

    I don't understand - if you had both WD Red and WD Red Pro drives (according to your other quick note on these new drive models), why didn't you review the WD Red Pro?
  • ganeshts - Monday, July 21, 2014 - link

    As I wrote in the pipeline section, the WD Red Pro review will come next week.

    This is for the 6 TB capacity.

    The 4TB versions' review will include the WD Red Pro (sometime next week)
  • continuum - Saturday, July 26, 2014 - link

    http://forums.storagereview.com/index.php/topic/36...

    Claims there's an early model issue on the regular WD Reds causing them to be invalid? But that's the only site I've heard of claiming this...
  • Rythan - Monday, July 21, 2014 - link

    I've gone through this article a couple of times - where are the idle and load power numbers?
  • ganeshts - Monday, July 21, 2014 - link

    I will add them later tonight (along with the missing He6 benchmark numbers).
  • romrunning - Tuesday, July 22, 2014 - link

    Ah... It just seems that some of your numbers in this face-off would change if the WD drive was 7200rpm instead of 5400rpm. Perhaps that would affect your conclusion as well. But I suppose if you didn't get a 6TB WD Red Pro drive, then it's a moot point.
  • ganeshts - Tuesday, July 22, 2014 - link

    There is no 6 TB WD Red Pro out in the market. The Pro version tops out at 4 TB (for now) - 800 GB x 5 platters
  • harshw - Monday, July 21, 2014 - link

    This week I had a LaCie 5Big NAS Pro barf on my 4TB Seagate NAS HDDs. Reformatting and re-testing them with sector scans revealed nothing. But the LaCie would claim that one disk was bad. Of course LaCie also claims the 4TB NAS HDDs are completely compatible.

    But to have a 16TB array die after re-synching 80% and having to start from scratch ... yeah it plain sucks.

    So yes, it is best to look at evidence from the field and not just rely on manufacturer's recommended & compatible lists. And it's not just WD Red ...
  • Hrel - Monday, July 21, 2014 - link

    It's frustrating that despite the rapid growth in the NAS industry hard drive prices have remained largely stagnant. 4TB drives are basically at the same price points they were a year ago. It used to be if a drive was released at $200 a year later it was $100 or less.

    I'm still waiting for 4TB drives to drop to the $100 mark before I make the jump.

    What happened to all those 12TB hard drives that we were supposed to be seeing? I specifically remember an article on AnandTech like 1+ years ago talking about how 12TB hard drives would be a reality "a year from today". More than a year later, we're talking about 6TB drives. Very upsetting.
  • Beany2013 - Tuesday, July 22, 2014 - link

    The floods in Thailand a few years ago set things back - we're only now seeing the manufacturers get their primary build locations back up to full speed not just in manufacturing existing gear, but developing new stuff.

    We've had 2TB drives for *years*, but 3TB and above are the results of the 'HDD Homelands' getting back up to speed, as I understand it from my work's disty/channel contacts, at least. Mebbe one for a pipeline article on the stagnation of HDD capacity, staffers?
  • extide - Monday, July 21, 2014 - link

    When are we going to get 4k native drives!! I hate this stupid 512b emulation crap!
  • Zan Lynx - Monday, July 21, 2014 - link

    I am pretty sure 4K native drives are already out there. I recall a Linux Kernel message thread discussing testing 4K drives and there was a tool to turn off 512B emulation.

    If you want them to turn off emulation, I doubt that will happen. It's too easy to leave the code in the firmware.
  • edlee - Monday, July 21, 2014 - link

    Seagate wipes the floor with all other manufacturers when it comes to enterprise products.

    That being said, they are expensive as shit, so I bought WD Red for my home NAS, but use Seagate in my office server for reliability.
  • jabber - Monday, July 21, 2014 - link

    Liabilities waiting to happen.
  • asmian - Monday, July 21, 2014 - link

    Completely agree! The two enterprise-class drives are just about OK to use in arrays (and the Helium tech of the HGST looks very interesting, I hope they bring that to smaller drives as well) but the WD Red at that size is a crazy proposition. See my calculation about the risks of rebuilding arrays with those at http://anandtech.com/comments/8273/western-digital...

    Anybody building large arrays with these consumer-class 6TB Reds is a fool.
  • bsd228 - Monday, July 21, 2014 - link

    Asmian - you may be taking that URE value too literally. I find it very hard to believe that enterprise drives are exactly 10x as good as consumer drives. When the number is so round as 1x10^14 or 1x10^15, it makes me believe them the same way I do the MTBF values. Consider how many premium or enterprise products we see where the only difference is a software setting activating a feature.
  • jabber - Tuesday, July 22, 2014 - link

    Quality of HDDs is plummeting. The mech drive makers have lost interest, they know the writing is on the wall. Five years ago it was rare to see an HDD fail at less than 6 months old. But now I regularly get drives in with bad sectors/failed mechanics that are less than 6-12 months old.

    I personally don't risk using any drives over a terabyte for my own data.
  • asmian - Tuesday, July 22, 2014 - link

    You're not seriously suggesting that WD RE drives are the same as Reds/Blacks or whatever colour but with a minor firmware change, are you? If they weren't significantly better build quality to back up the published numbers I'm sure we'd have seen a court case by now, and the market for them would have dried up long ago.

    On the subject of my rebuild failure calculation, I wonder whether that is exactly what happened to the failing drive in the article: an unrecoverable bit read error during an array rebuild, making the NAS software flag the drive as failed or failing, even though the drive subsequently appears to perform/test OK. Nothing to do with compatibility, just the verification of their unsuitability for use in arrays due to their size increasing the risk of bit read errors occurring at critical moments.
  • NonSequitor - Tuesday, July 22, 2014 - link

    It's more likely that they are binned than that they are manufactured differently. Think of it this way: you manufacture a thousand 4TB drives, then you take the 100 with the lowest power draw and vibration. Those are now RE drives. Then the rest become Reds.

    Regarding the anecdotes of users with several grouped early failures: I tend to blame some of that on low-dollar Internet shopping, and some of it on people working on hard tables. It takes very little mishandling to physically damage a hard drive, and even if the failure isn't immediate, a flat spot in a bearing will eventually lead to serious failure.
  • Iketh - Tuesday, July 22, 2014 - link

    LOL no
  • m0du1us - Friday, July 25, 2014 - link

    @NonSequitor This is exactly how enterprise drives are chosen, as well as using custom firmware.
  • LoneWolf15 - Friday, July 25, 2014 - link

    Aren't most of our drives fluid-dynamic bearing rather than ball bearing these days?
  • asmian - Wednesday, July 23, 2014 - link

    Just in case anyone is still denying the inadvisability of using these 6TB consumer-class Red drives in a home NAS, or any RAID array that's not ZFS, here's the maths:

    6TB is approx 0.5 x 10^14 bits. That means if you read the entire disk (as you have to do to rebuild a parity or mirrored array from the data held on all the remaining array disks) then there's a 50% chance of a disk read error for a consumer-class disk with 1 in 10^14 unrecoverable read error rate (check the maker's specs). Conversely, that means there's a 50% chance that there WON'T be a read error.

    Let's say you have a nice 24TB RAID6 array with 6 of these 6TB Red drives - four for data, two parity. RAID6, so good redundancy right? Must be safe! One of your disks dies. You still have a parity (or two, if it was a data disk that died) spare, so surely you're fine? Unfortunately, the chance of rebuilding the array without ANY of the disks suffering an unrecoverable read error is: 50% (for the first disk) x 50% (for the second) x 50% (for the third) x 50% (for the fourth) x 50% (for the fifth). Yes, that's a ** 3.125% ** chance of rebuilding safely. Most RAID controllers will barf and stop the rebuild on the first error from a disk and declare it failed for the array. Would you go to Vegas to play those odds of success?

    If those 6TB disks had been Enterprise-class drives (say WD RE, or the HGST and Seagates reviewed here) specifically designed and marketed for 24/7 array use, they have a 1 in 10^15 unrecoverable error rate, an order of magnitude better. How does the maths look now? Each disk now has a 5% chance of erroring during the array rebuild, or a 95% chance of not. So the rebuild success probability is 95% x 95% x 95% x 95% x 95% - that's about 77.4% FOR THE SAME SIZE OF DISKS.

    Note that this success/failure probability is NOT PROPORTIONAL to the size of the disk and the URE rate - it is a POWER function that squares, then cubes, etc. given the number of disks remaining in the array. That means that using smaller disks than these 6TB monsters is significant to the health of the array, and so is using disks with much better URE figures than consumer-class drives, to an enormous extent as shown by the probability figure above.

    For instance, suppose you'd used an eight-disk RAID6 of 6TB Red drives to get the same 24TB array in the first example. Very roughly your non-error probability per disk full read is now 65%, so the probability of no read errors over a 7-disk rebuild is roughly 5%. Better than 3%, but not by much. However, all other things being equal, using far smaller disks (but more of them) to build the same size of array IS intrinsically safer for your data.

    Before anyone rushes to say none of this is significant compared to the chance of a drive mechanically failing in other ways, sure, that's an ADDITIONAL risk of array failure to add to the pretty shocking probabilities above. Bottom line, consumer-class drives are intrinsically UNSAFE for your data at these bloated multi-terabyte sizes, however much you think you're saving by buying the biggest available, since the build quality has not increased in step with the technology cramming the bits into smaller spaces.
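
    For anyone who wants to plug in their own numbers, the arithmetic in this comment can be sketched as below. It reproduces the comment's own simplification (chance of an error per disk = bits read x URE rate, with 6TB rounded to 0.5 x 10^14 bits), so treat it as an illustration of that model rather than a statement about real drives; the function name is made up.

```python
def rebuild_success(disk_bits, ure_rate, disks_read):
    """Chance that every surviving disk is read in full without a URE."""
    # Linear approximation used in the comment: P(error per disk) = bits * rate.
    p_disk_ok = 1 - disk_bits * ure_rate
    return p_disk_ok ** disks_read

DISK_BITS = 0.5e14  # the comment's rounded figure for a 6TB disk

consumer = rebuild_success(DISK_BITS, 1e-14, 5)    # 1 in 10^14 URE rate
enterprise = rebuild_success(DISK_BITS, 1e-15, 5)  # 1 in 10^15 URE rate

print(f"consumer:   {consumer:.1%}")    # roughly 3.1%
print(f"enterprise: {enterprise:.1%}")  # roughly 77.4%
```

    Changing `disks_read` or `DISK_BITS` shows how quickly the result moves with array width and disk size under this model.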
  • asmian - Wednesday, July 23, 2014 - link

    Apologies for proofing error: "For instance, suppose you'd used an eight-disk RAID6 of 6TB Red drives" - obviously I meant 4TB drives.
  • KAlmquist - Wednesday, July 23, 2014 - link

    "6TB is approx 0.5 x 10^14 bits. That means if you read the entire disk (as you have to do to rebuild a parity or mirrored array from the data held on all the remaining array disks) then there's a 50% chance of a disk read error for a consumer-class disk with 1 in 10^14 unrecoverable read error rate (check the maker's specs)."

    What you are overlooking is that even though each sector contains 4096 bytes, or 32768 bits, it doesn't follow that to read the contents of the entire disk you have to read the contents of each sector 32768 times. To the contrary, to read the entire disk, you only have to read each sector once.

    Taking that into account, we can recalculate the numbers. A 6 TB (5.457 TiB) drive contains 1,464,843,750 sectors. If the probability of an unrecoverable read error is 1 in 10^14, and the probability of a read error on one sector is independent of the probability of a read error in any other sector, then the probability of getting a read error at some point when reading the entire disk is 0.00146%. I suspect that the probability of getting a read error in one sector is probably not independent of the probability of getting a read error in any other sector, meaning that the 0.00146% figure is too high. But sticking with that figure, it gives us a 99.99268% probability of rebuilding safely.

    I don't know of anyone who would dispute that the correct way for a RAID card to handle an unrecoverable read error is to calculate the data that should have been read, try to write it to the disk, and remove the disk from the array if the write fails. (This assumes that the data can be computed from data on the other disks, as is the case in your example of rebuilding a RAID 6 array after one disk has been replaced.) Presumably a lot of RAID card vendors assume that unrecoverable read errors are rare enough that the benefits of doing this right, rather than just assuming that the write will fail without trying, are too small to be worth the cost.
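
    The recalculation above can be checked with a short sketch. It follows this comment's reading of the spec (one chance of a URE per 4096-byte sector read, sectors independent) and assumes the same five-disk rebuild as the RAID6 example being replied to; the variable names are made up.

```python
import math

DISK_BYTES = 6 * 10**12   # a 6 TB drive
SECTOR_BYTES = 4096
P_SECTOR_ERROR = 1e-14    # assumption: the URE rate applies per sector read

sectors = DISK_BYTES // SECTOR_BYTES

# P(at least one error in a full read) = 1 - (1 - p)^sectors,
# written with log1p/expm1 so the tiny probabilities keep their precision.
p_disk_error = -math.expm1(sectors * math.log1p(-P_SECTOR_ERROR))

# Five surviving disks must each be read in full without an error.
p_rebuild_ok = (1 - p_disk_error) ** 5

print(sectors)                 # 1464843750
print(f"{p_disk_error:.5%}")   # about 0.00146%
print(f"{p_rebuild_ok:.5%}")   # about 99.99268%
```

    The gap between this result and the 3.125% figure comes entirely from whether the quoted URE rate is read as per bit or per sector.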
  • asmian - Wednesday, July 23, 2014 - link

    That makes sense IF (and I don't know whether it is) the URE rate is independent of the number of bits being read. If you read a sector you are reading a LOT of bits. You are suggesting that you would get 1 single URE event on average in every 10^14 sectors read, not in every 10^14 BITS read... which is a pretty big assumption and not what the spec seems to state. I'm admittedly suggesting the opposite extreme, where the chance of a URE is proportional to the number of bits being read (which seems more logical to me). Since you raise this possibility, I suspect the truth is likely somewhere in the middle, but I don't know enough about how UREs are calculated to make a judgement. Hopefully someone else can weigh in and shed some light on this.

    Ganesh has said that previous reviews of the Red drives mention they are masking the UREs by using a trick: "the drive hopes to tackle the URE issue by silently failing / returning dummy data instead of forcing the rebuild to fail (this is supposed to keep the RAID controller happy)." That seems incredibly scary if it is throwing bad data back in rebuild situations instead of admitting it has a problem, potentially silently corrupting the array. That for me would be a total deal-breaker for any use of these Red drives in an array, yet again NOT mentioned in the review, which is apparently discussing their suitability for just that... <sigh>
  • NonSequitor - Thursday, July 24, 2014 - link

    So something isn't adding up here. I have a set of nine Red 3TB drives in a RAID6. They are scrubbed - a full rebuild - once a month. They have been in service for a year and a half with no drive failures. Since it's RAID6, if one drive returns garbage it will be spotted by the second parity drive. Obviously they can't be returning bad data, or they would have been failed out long ago. The array was previously made of 9 1TB Greens, scrubbed monthly, and over two and a half years I had a total of two drive failures, one hard, one a SMART pre-failure.
  • asmian - Friday, July 25, 2014 - link

    Logically, I think this might well be due to the URE masking on these Red drives - something the Green drives weren't doing. You've been lucky that with a non-degraded RAID6 you've always had that second parity drive that has perhaps enabled silent repair by the controller when a URE occurred. I've been pondering more about this and here's what I have just emailed to Ganesh, with whom I've been having a discussion...

    --------------------

    On 24/07/2014 03:45, Ganesh T S wrote:
    > Ian,
    >
    > Irrespective of the way URE needs to be interpreted, the points you
    > raise are valid within a certain scope, and I have asked WD for their
    > inputs. I will ping Seagate too, since their new NAS HDD models are
    > expected to launch in late August.

    Thanks. This problem is a lot more complicated than it looks, I think, and than a single URE figure might suggest. But the other "feature" of these Red drives is also extremely concerning to me now. I am a programmer and often think in low-level algorithms, and everything about this URE masking seems wrong in the likely usage scenarios. Please help me with my logic train here - correct me if I'm wrong. Let's assume we have an array of these Red drives. Irrespective of the chance of a URE while the array is rebuilding or in use, let's assume one occurs. You have stated WD's info that the URE is masked and the disk simply returns dummy info for the read.

    If you were in a RAID5 and you were rebuilding after you've lost a drive, that MUST mean the array is now silently corrupted, right? The drive has masked the read failure and reported no error, and the RAID controller/software has no way to detect the dummy data. Critically, having a back-up doesn't help if you don't know that you NEED to repair or restore damaged files, and without a warning, due to simple space constraints a good backup will likely be over-written with one that now contains the corrupted data... so all ways round you are screwed. Having a masked URE in this situation is worse than having one reported, as you have no chance to take any remedial action.

    If you are in RAID6 and you lost a drive, then you still have a parity check to confirm the stripe data with while rebuilding. But if there's a masked URE and dummy data, then how will the controllers react? I presume they ALWAYS test the stripe data with the remaining parity or parities for consistency... so at that point the controller MUST throw a major rebuild error, right? However, they cannot determine which drive gave the bad data - just that the parity is incorrect - unless it's a parity disc that errored and one parity calculates correctly while the other is wrong. If they knew WHICH disc had UREd then they could easily exclude it from the parity calculation and rebuild that dummy data on the fly with the spare parity, but the masking makes that impossible. The rebuild must fail at this point. At least you have your backups... hopefully.

    Obviously the above situation will also be the same for a RAID5 array in normal usage or a degraded RAID6. Checking reads with parity, a masked URE means a failure with no way to recover. If you have an unmasked URE at least the drive controller can exclude the erroring disc for that stripe and just repair data silently using the remaining redundancy, knowing exactly where the error has come from. After all, it's logically just an EXPECTED disk event with a statistically low chance of happening, not necessarily an indication of impending disk failure unless it is happening frequently on the same disc. The only issue will be the astronomically unlikely chance of another URE occurring in the same stripe on another disc.
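
The difference between a reported and a masked URE can be sketched with plain XOR parity, as used for single-parity (RAID5-style) stripes. This is a toy model with three data blocks and one parity block, not how any particular controller is implemented; `xor_blocks` and the sample data are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus their XOR parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: disk 1 REPORTS a URE. The position of the bad block is known,
# so it can be rebuilt from the other blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])          # True

# Case 2: disk 1 MASKS the URE and returns dummy data with no error.
dummy = b"\x00\x00\x00\x00"
stripe = [data[0], dummy, data[2]]
mismatch = xor_blocks(stripe) != parity
print(mismatch)                    # True: parity check fails...

# ...but parity alone cannot say which block is wrong. Assuming each disk
# in turn is the bad one yields a different, self-consistent "repair".
candidates = [
    xor_blocks([blk for j, blk in enumerate(stripe) if j != i] + [parity])
    for i in range(len(stripe))
]
print(candidates)
```

Only the candidate for disk 1 matches the original data, and nothing in the stripe itself tells the controller to pick that one. In a degraded array with no spare parity left, the dummy block would simply be folded into the rebuild with no mismatch to notice at all.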

    Fundamentally, a masked URE means you get bad data without any explanation of why a disc is returning it, which gives no information to the user (or to the RAID controller so it can take action or warn the user appropriately). For me, that's catastrophic. It all really depends on what the controllers do when they discover parity errors and UREs in rebuild situations and how robust their recovery algorithms are - an unmasked URE does not NEED to be a rebuild-killer for RAID6, as thank G*d you had a second redundancy disc...

    Anyway, the question will be whether these new huge drives will, as you say, accumulate empirical evidence from users that array failures are happening more and more frequently. Without information from reviews like yours that warn against their use in RAID5 (or mirrors) despite the marketing as NAS products, the thought-experiment above suggests the most likely scenario is extremely widespread silent array corruption. I stand by my comment that this URE masking should be a total deal-breaker in considering them for home array usage. Better a disk that at least tells you it's errored.
    Reply
  • NonSequitor - Tuesday, July 29, 2014 - link

    That still doesn't add up - there are no unmasked UREs in my situation, as I'm using Linux software RAID. It is set to log any errors or reconstructions, and there are literally none. One of these arrays has a dozen scrubs on it now with no read errors whatsoever. Reply
  • m0du1us - Friday, July 25, 2014 - link

    @asmian This is why we run 28 disk arrays minimum for SAN. If you really want RAID6 to be reliable, you need A LOT of disks, no matter the size. Larger disks decrease your odds of successfully rebuilding the array; that just means you need more disks. Also, you should never build a RAID6 array with fewer than 5 disks. At 5 disks, you get no protection from a disk failure during rebuild. At 12 disks, you can have 2 spares and 2 failures during rebuild before losing data. Reply
  • NonSequitor - Friday, July 25, 2014 - link

    Your comment makes no sense. A five disk RAID-6 is N+2. A 28 disk RAID-6 is N+2. More disks will decrease reliability. Our storage vendor advised keeping our N+2 array sizes between 12 and 24 disks, for instance: 12 disks as a minimum before it started impacting performance, 24 disks as a maximum before failure risks started to get too high. Our production experience has borne this out. Bigger arrays are actually treated as sets of smaller arrays. Reply
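
The point about N+2 can be checked with a simple binomial model. This is only a sketch: it assumes every disk fails independently with the same probability during some exposure window, and the 3% figure is a made-up illustration, not a measured failure rate. The function name `p_array_loss` is likewise hypothetical.

```python
from math import comb

def p_array_loss(n_disks, p_fail, tolerated=2):
    """Probability that more than `tolerated` of `n_disks` fail in the window.

    Assumes independent failures with identical per-disk probability `p_fail`.
    """
    p_survive = sum(
        comb(n_disks, k) * p_fail**k * (1 - p_fail) ** (n_disks - k)
        for k in range(tolerated + 1)
    )
    return 1 - p_survive

P_FAIL = 0.03  # hypothetical per-disk failure probability for the window

for n in (5, 12, 24, 28):
    print(f"{n:2d} disks, N+2: P(array loss) = {p_array_loss(n, P_FAIL):.4%}")
```

With the tolerated failure count fixed at two, the loss probability rises steadily as disks are added, which is consistent with capping array width and building larger pools out of several smaller N+2 sets.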
  • LoneWolf15 - Friday, July 25, 2014 - link

    asmian, it totally depends on what you are building the array for, and how you are building it.

    Myself, I'd use them for home or perhaps small business, but in a RAID 10 or RAID 6. Then I'd have some guaranteed security. That said, if I needed constant write performance (e.g., multiple IP cameras) I'd use WD RE, and if I wanted enterprise level performance, I'd use enterprise drives.

    That, and I realize that RAID != backup strategy. I have a RAID-5 at home; but I sure as heck have a backup, too.
    Reply
  • tuxRoller - Monday, July 21, 2014 - link

    Hi Ganesh,

    Would you mind listing the max power draw of these drives? That is, how much power is required during spin-up?
    Reply
  • WizardMerlin - Tuesday, July 22, 2014 - link

    If you're not going to show the results on the same graph then for the love of god don't change the scale of the axis between graphs - makes quick comparison completely impossible. Reply
  • ganeshts - Tuesday, July 22, 2014 - link

    I have tried that before, and the problem is that once all the drives are on the same scale, it becomes really difficult to track absolute values for some of them. I have been down that road and decided that the issues it caused outweigh the effort it takes readers (including me and my colleagues) to glance left and right at the axes to see what the absolute numbers are. Reply
  • Iketh - Tuesday, July 22, 2014 - link

    what? why would they become difficult to "track absolute values" ?? Reply
  • ganeshts - Tuesday, July 22, 2014 - link

    I have seen graphs in preparation where some set of values are just dwarfed by the higher values from other result sets. What you see is basically a set of clustered points at the base (very close to the X axis), while there is a nicer graph with clearly spaced out values for the other result set in the middle of the graph. Trust me, I have done this and it doesn't look nice or present meaningful information to the readers. Reply
  • Fjodor2000 - Tuesday, July 22, 2014 - link

    Are there any measurements and comparison of the noise levels for these new WD HDDs?

    Particularly for the WD Red 5 and 6 TB HDDs this would be interesting to know, and how they compare to the 4 platter WD Red 4 TB HDD.
    Reply
  • stevenrix - Tuesday, July 22, 2014 - link

    Any kind of hard drive will fail sooner or later, but the quality of hard drives has been horrible for the last 5 years or maybe more, which is unfortunately expected since bigger hard drives tend to have a lower MTBF. The professional drives seem to fail faster than consumer drives from what I've seen: a 1 TB 3.5-inch 15K SAS drive can fail just about 2 months after purchase, because the platters spin faster and there are no other technical enhancements on these drives compared to consumer drives.
    Out of probably 200 drives I've used over the last 2 decades, only one never failed, and it is a WD 73 GB EIDE drive. After seeing so many failures, I decided to use RAID levels and back up my data on tape.
    There is also one important point: replacement drives in the US are refurbished most of the time, while that's not the case in Europe, unless this has changed. Once you get a replacement, you can be sure that the drive will fail in less than 1 year, so those companies are defeating the purpose by replacing a drive with an old one that has been refurbished to meet satisfactory criteria. That is not good customer service, to say nothing of how monopolistic those companies are. A little more competition would be good for us consumers.
    The only company I've been satisfied with so far is HGST.

    Reply
  • jabber - Wednesday, July 23, 2014 - link

    I live in Europe. Had a WD 75GB Raptor fail on me after a couple of years. The replacement was marked as 'Recertified'.

    That replacement is still going nearly 7 years later as one of my stunt drives. You know, the one or two drives you have lying around that get tossed into all sorts of projects. Just keeps on trucking.
    Reply
  • Jorsher - Thursday, July 24, 2014 - link

    A lot of people seem to assume that new is better than "refurbished."

    What they don't realize is that refurbished products tend to go through a much more thorough testing process than brand new products. This is just based on a couple of consumer electronics companies I've worked at, with the largest being LG. New products go through a relatively quick test, and in many cases they only test a few out of each batch. Products that are sent back under warranty are first put through a thorough test to determine what problems they have; then it's decided whether they are worth repairing, they are repaired, and they are put through another test -- every single one.

    Since those experiences, I now buy factory-refurbished when possible.
    Reply
  • hwnmafia02 - Friday, July 25, 2014 - link

    Man those companies provided you with such wonderful hardware and you can't even include the logos for each in the thank you section? Come on, now... Reply
  • Elmeransa - Saturday, July 26, 2014 - link

    Running 8 x 3TB WD Reds in RAID5 for 2 years now, no issues at all. Reply
  • Haravikk - Monday, July 28, 2014 - link

    One thing that wasn't mentioned was noise. There were several reasons for my choosing WD Reds for my Synology NAS (a DS212j with two 3 TB disks simply concatenated for capacity, since it's already a redundant backup), but my main decision factors were that the WD Reds were a reasonable price (not too far above Greens) and are also exceptionally quiet; except for when they're initially spinning up or spinning down I can't even hear them over the sound of the NAS' fan, and the fan is barely audible to begin with. Meanwhile I've known enterprise drives that are very noisy, as well as various desktop disks that sound like someone pouring gravel (during normal operation). Reply
  • swansearecovery - Monday, July 28, 2014 - link

    This post includes a great article on hard drives, truly brilliant. After reading the information given I have no second thought on that you people are also very good experts in handling the cases of Data Recovery from Hard Drive as well.

    http://www.swanseadatarecovery.co.uk/hard-drive-re...
    Reply
