Hard drives

One of the most frequently asked questions I hear is 'what's the most reliable hard drive?'  The answer to this question is straightforward - the one that's backed up frequently.  Home file servers can be backed up with a variety of devices, from external hard drives to cloud storage.  As a general guideline, RAID enhances performance but it is not a backup solution.  Some RAID configurations (such as RAID 1) provide increased reliability, but others (such as RAID 0) actually decrease reliability.  A detailed discussion of different kinds of disk arrays is not within the scope of this guide, but the Wikipedia page is a good place to start your research if you're unfamiliar with the technology.
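To put rough numbers on why RAID 1 helps reliability while RAID 0 hurts it, here is a minimal sketch.  It assumes each drive fails independently with the same annual probability, and it ignores rebuild windows and correlated failures, so treat it as an illustration rather than a prediction.  The 5% figure and the function names are hypothetical.

```python
def raid0_failure_probability(p: float, n: int) -> float:
    """Chance a striped array loses data in a year: any one of n drives failing is fatal."""
    return 1.0 - (1.0 - p) ** n


def raid1_failure_probability(p: float, n: int) -> float:
    """Chance a mirrored array loses data in a year: all n copies must fail."""
    return p ** n


if __name__ == "__main__":
    p = 0.05  # hypothetical annual failure probability for a single drive
    print(f"single drive:      {p:.4f}")
    print(f"RAID 0 (2 drives): {raid0_failure_probability(p, 2):.4f}")
    print(f"RAID 1 (2 drives): {raid1_failure_probability(p, 2):.4f}")
```

Even under these generous assumptions, neither array protects against accidental deletion, fire, or theft, which is why RAID is not a substitute for backups.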

As for hard drive reliability, every hard drive can fail.  While some models are more likely to fail than others, there are no authoritative studies that combine controlled conditions with large sample sizes.  Most builders have preferences - but anecdotes do not add up to data.  Many variables affect a drive's long-term reliability: shipping conditions, PSU quality, temperature patterns, and of course, specific make and model quality.  Unfortunately, as consumers we have little control over shipping and handling conditions until we get a drive in our own hands.  We also generally don't have much insight into a specific hard drive model's quality, or even a manufacturer's general quality.  However, we can control PSU quality and temperature patterns, and we can use S.M.A.R.T. monitoring tools.

One of the most useful studies on hard drive reliability was presented by Pinheiro, Weber, and Barroso at the 2007 USENIX Conference on File and Storage Technologies.  Their paper, 'Failure Trends in a Large Disk Drive Population', relied on data gleaned from Google.  So while the controls are not perfect, the sample size is enormous, and it's about as informative as any research on disk reliability.  The PDF is widely available on the web and is definitely worth a read if you haven't already seen it and have the time (it's short at only 12 pages, with many graphs and figures).  In sum, they found that SMART errors are generally indicative of impending failure - especially scan errors, reallocation counts, offline reallocation counts, and probational counts.  The take-home message: if one of your drives reports a SMART error, you should probably replace it, and send it in for a warranty replacement if it's still covered.  If one of your drives reports multiple SMART errors, you should almost certainly replace it as soon as possible.

From Pinheiro, Weber, and Barroso 2007.  Of all failed HDDs, more than 60% had reported a SMART error. 
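For readers who want to check these counters themselves, the smartmontools command `smartctl -A` prints a drive's attribute table.  The sketch below parses that kind of table and reports any watched counter with a nonzero raw value.  The sample text is made up for illustration, and the choice of attribute IDs 5, 197, and 198 is our assumption of a rough match to the reallocation and probational counts discussed in the paper; attribute names and meanings vary by vendor.

```python
from typing import Dict

# Assumed mapping: 5 = reallocated sectors, 197 = pending ("probational")
# sectors, 198 = offline uncorrectable sectors.  Vendors differ.
WATCHED_IDS = {5, 197, 198}

# Hypothetical excerpt in the layout printed by `smartctl -A`.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   038   045   000    Old_age   Always       -       38 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def nonzero_watched_attributes(report: str) -> Dict[str, int]:
    """Return {attribute name: raw value} for watched attributes with raw value > 0."""
    flagged: Dict[str, int] = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


if __name__ == "__main__":
    for name, raw in nonzero_watched_attributes(SAMPLE_REPORT).items():
        print(f"warning: {name} has raw value {raw}")
```

In practice the report text would come from running `smartctl -A` against a real device, and any nonzero result is a prompt to verify your backups and plan a replacement.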

Pinheiro, Weber, and Barroso also showed how temperature affects failure rates.  They found that drives operating at low temperatures (i.e. less than 75F/24C) actually have the highest failure rates by far, even greater than drives operating at 125F/52C.  This is likely an irrelevant point to many readers, but for those of us who live farther north and like to keep our homes at less than 70F/21C in the winter, it's an important reminder that colder is not always better for computer hardware.  Of use to everyone, the study showed that failure rates are lowest around 104F/40C, across a band from about 95F/35C to 113F/45C.

From Pinheiro, Weber, and Barroso 2007.  AFR: Annualized Failure Rate - higher is worse!

Given the range of temperatures at which hard drives appear to function most reliably, it may take some experimentation in any given case to find an ideal layout for a home file server's hard drives.
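While experimenting, it can help to log each drive's reported temperature and compare it against the band described above.  The following is a small sketch; the 35C and 45C bounds come from the figures quoted in this article, while the function names and labels are our own.

```python
LOW_C = 35.0   # lower edge of the band quoted above (95F)
HIGH_C = 45.0  # upper edge of the band quoted above (113F)


def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def classify_drive_temperature(deg_c: float) -> str:
    """Label a drive temperature relative to the 35C-45C band."""
    if deg_c < LOW_C:
        return "cool"
    if deg_c > HIGH_C:
        return "warm"
    return "in range"


if __name__ == "__main__":
    for reading_f in (75.0, 104.0, 125.0):
        reading_c = fahrenheit_to_celsius(reading_f)
        print(f"{reading_f:.0f}F = {reading_c:.1f}C: {classify_drive_temperature(reading_c)}")
```

A drive that consistently reads "cool" or "warm" is a candidate for moving to a different bay or adjusting fan placement.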

So rather than answering which specific hard drive models are the most reliable, we recommend you do everything you can to prevent catastrophic failure by using quality PSUs, maintaining optimal temperatures, and paying attention to SMART utilities.  With sample sizes as small as the handful of drives in a home file server, the most important factor in long-term HDD reliability is probably luck.

Pragmatically, low-rpm 'green' drives are the most cost-effective storage drives.  Note that many of the low-rpm drives are not designed to operate in a RAID configuration - be sure to research specific models.  The largest drives currently available are 3TB, which can now be found for as little as $110.  The second-largest capacity drives at 2TB generally offer the best $/GB ratio, and can regularly be found for $70 (and less when on sale or after rebate).  1TB drives are fine if you don't need much space, and can sometimes be found for as little as $40.
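The $/GB comparison is simple arithmetic, sketched below with the prices quoted in this section.  It assumes 1TB = 1000GB and uses the low-end street prices above, which will drift over time; the dictionary and function names are ours.

```python
# Capacity in GB and the low-end prices quoted in this article (USD).
DRIVES = {
    "3TB": (3000, 110.0),
    "2TB": (2000, 70.0),
    "1TB": (1000, 40.0),
}


def cost_per_gb(capacity_gb: int, price_usd: float) -> float:
    """Price divided by capacity, in dollars per gigabyte."""
    return price_usd / capacity_gb


def best_value(drives: dict) -> str:
    """Return the label of the drive with the lowest cost per gigabyte."""
    return min(drives, key=lambda label: cost_per_gb(*drives[label]))


if __name__ == "__main__":
    for label, (capacity_gb, price_usd) in DRIVES.items():
        print(f"{label}: ${cost_per_gb(capacity_gb, price_usd):.4f}/GB")
    print("best value:", best_value(DRIVES))
```

Swapping in current prices makes it easy to recheck which capacity is the sweet spot before buying.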


152 Comments


  • HMTK - Monday, September 5, 2011 - link

    Inferior as in PITA for rebuilds and stuff like that. On my little Proliant Microserver I use the onboard RAID because I'm too cheap to buy something decent and it's only my backup machine (and domain controller, DHCP, DNS server) but for lots of really important data I'd look for a true RAID card with an XOR processor and some kind of battery protection: on the card or a UPS.
  • fackamato - Tuesday, September 6, 2011 - link

    I've used Linux MD software RAID for 2 years now, running 7x 2TB 5400 rpm "green" drives, and never had an issue. (except one Samsung drive which died after 6 months).

    This is on an Atom system. It took roughly 24h to rebuild to the new drive (CPU limited of course), while the server was happily playing videos in XBMC.
  • Sivar - Tuesday, September 6, 2011 - link

    This is not true in my experience.
    Hardware RAID cards are far, far more trouble than software RAID when using non-enterprise drives.

    The reason:
    Nearly all hard drives have read errors, sometimes frequently.
    This usually isn't a big deal: The hard drive will just re-read the same area of the drive over and over until it gets the data it needs, then probably mark the trouble spot as bad, remapping it to spare area.

    The problem is that consumer hard drives are happy to spend a LONG time rereading the trouble spot. Far longer than most hardware RAID cards need to decide the drive is not responding and drop it -- a perfectly good drive.

    For "enterprise" SATA drives, often the *only* difference, besides price, is that enterprise drives have a firmware flag set to limit their error recovery time, preventing them from dropping unless they have a real problem. Look up "TLER" for more information.

    Hardware RAID cards generally assume they are using enterprise drives. With RAID software it varies, but in Linux and Windows Server 2008R2 at least, I've never had a good drive drop. This isn't to say it can't happen, of course.

    ------------------------------

    For what it's worth, I recommend Samsung drives for home file servers. The 2TB Samsung F4 has been excellent. Sadly, Samsung is selling its HDD business.

    I expressly do not recommend the Western Digital GP (Green) series, unless you can find older models from before TLER was expressly disabled in the firmware (even as an option).
  • Havor - Sunday, September 4, 2011 - link

    HighPoint RocketRAID 2680 SGL PCI-Express x4 SATA / SAS (Serial Attached SCSI) Controller Card

    In stock.
    Now: $99.00

    http://www.newegg.com/Product/Product.aspx?Item=N8...

    Screw software raid, and then there are many cards with more options like online array expansion.
  • Ratman6161 - Tuesday, September 6, 2011 - link

    For home use, a lot/most people are probably not going to build a file server out of all new components. We are mostly recycling old stuff. My file server is typically whatever my old desktop system was. So when I built my new i7-2600K system, my old Core 2 Quad 6600 desktop system became my new server. But...the old P35 motherboard in this system doesn't have RAID and has only 4 SATA ports. It does have an old IDE Port. So it got my old IDE CD-ROM, and three hard drives that were whatever I had laying around. Had I wanted RAID though, I would probably get a card.

    Also, as to OS; A lot of people for use as a home file server are not going to need ANY "server" os. If you just need to share files between a couple of people, any OS you might run on that machine is going to give you the ability to do that. Another consideration is that a lot of services and utilities have special "server" versions that will cost you more. Example: I use Mozy for cloud backup but if I tried to do that on a Windows Server, it would detect that it was a server and want me to upgrade to the Mozy Pro product which costs more. So by running the "server" on an old copy of Windows XP, I get around that issue. Unless you really need the functionality for something, I'd steer clear of an actual "server" OS.
  • alpha754293 - Tuesday, September 6, 2011 - link

    @Rick83

    "MY RAID card recommendation is a mainboard with as many SATA ports as possible, and screw the RAID card."

    I think that's somewhat of a gross overstatement. And here's why:

    It depends on what you're going to be building your file server for, how much data you anticipate putting on it, and how important that data is. Like, would it be a big deal if you lost all of it? Some of it? A week's worth? A day's worth? (i.e. how fault tolerant ARE you?)

    For most home users, that's likely going to be like pictures, music, and videos. With 3 TB drives at about $120 a pop (upwards of $170 a pop), do you really NEED a dedicated file server? You can probably just set up an older, low-powered machine with a Windows share and that's about it.

    @Rick83/PCTC2

    I think that when you're talking about rebuild rates, it depends on what RAID level you were running. Right now, I've got a 27 TB RAID5 server (30 TB raw, 10 * 3TB, 7200 rpm Hitachi SATA-3 on Areca ARC-1230 12-port SATA-II PCIe x8 RAID HBA); and it was going to take 24 hours using 80% background initialization or 10 hours with foreground initialization. So I would imagine that if I had to rebuild the entire 27 TB array; it's going to take a while.

    re: SW vs. HW RAID
    I've had experience with both. First was onboard SAS RAID (LSI1068E), then ZFS on 16*500 GB Hitachi 7200 rpm SATA on Adaptec 21610 (16-port SATA RAID HBA), and now my new system. Each has its merits.

    SW RAID - pros:
    It's cheap. It's usually relatively easy to set up. They work reasonably well (most people probably won't be able to practically tell the difference in performance). It's cheap.

    SW RAID - cons:
    As I've experienced, twice; if you don't have backups, you can be royally screwed. Unless you've actually TRIED transplanting a SW RAID array, it SOUNDS easy, but it almost never is. A lot of the time, there are a LOT of things happening/running in the background that are transparent to the end user, so if you try to transplant it, it doesn't always work. And if you've ever tried transplanting a Windows install (even without RAID); you'll know that.

    There's like the target, the LUN, and a bunch of other things that tell the system about the SW RAID array.

    It's the same with ZFS. In fact, ZFS is maybe a little bit worse because I think there was like a 56-character tag that each hard drive gets as a unique ID. If you pulled a drive out from one of the slots and swapped it with another, haha...watch ZFS FREAK out. Kernel panics are sooo "rampant" that they had a page that told you how to clear the ZFS pool cache to stop the endless kernel panic (white screen of death) loop. And then once you're back up and running, you had to remount the ZFS pool. Scrub it, to make sure no errors, and then you're back up.

    Even Sun's own premium support says that in the event of a catastrophic failure with SW RAID, restore your data from back-ups. And if that server WAS your backup server -- well...you're SOL'd. (Had that happen to me TWICE because I didn't export and down the drives before pulling them out.)

    So that's that. (Try transplanting a Windows SW RAID....haha...I dare you.) And if you transplanted a single Windows install enough times, eventually you'll fully corrupt the OS. It REALLLY hates it when you do that.

    HW RAID - pros:
    Usually it's a lot more resilient. A lot of them have memory caches and some of them even have backup battery modules that help store the write intent operations in the event of a power failure so that at next power-up, it will complete the replay.* (*where/when supported). It's to prevent data corruption in the event that say you are in the middle of copying something onto the server, but then the power dies. It's more important with automated write operations, but since most people kinda slowly pick and choose what they put on the server anyways, that's usually not too bad. You might remember where it left off and pick it up from there manually.

    It's usually REALLY REALLY fast because it doesn't have OS overhead.

    ZFS was a bit of an exception because it waits until a buffer of operations is full before it actually executes the disk. So, you can get a bunch of 175 MB/s bursts (onto a single 2.5" Fujitsu 73 GB 10krpm SAS drive), but your clients might still be reporting 40 MB/s. On newer processors, it effectively was idle. On an old Duron 1800, it would register 14% CPU load doing the same thing.

    HW RAID - cons:
    Cost. Yes, the controllers are expensive. But you can also get some older systems/boards with onboard (HW RAID) (like LSI based controllers), but they work.

    With a PCIe x8 RAID HBA, even with PCIe 1.0 slots, each lane is 2 Gbps (250 MB/s) in each direction. So an 8-lane PCIe 1.0 card can do 16 Gbps (2 GB/s) in each direction, or 32 Gbps (4 GB/s) counting both directions. SATA-3 is only good to 6 Gbps (750 MB/s raw, about 600 MB/s after 8b/10b encoding overhead). The highest I'm hitting with my new 27 TB server is just shy of the 800 MB/s mark. Sustained read is 381 MB/s (limited by SATA-II connector interface). It's the fastest you can get without PCIe SSD cards. (And as far as I know, you CAN'T RAID the PCIe SSD cards. Not yet anyways.)
  • Brutalizer - Friday, September 9, 2011 - link

    It doesn't sound like I have the same experience of ZFS as you.

    For instance, your hw-raid ARECA card, is it in JBOD mode? You know that hw-raid cards screw ZFS seriously?

    I have pulled disks and replaced them without problems, you claim you had problems? I have never heard of such problems.

    I have also pulled out every disk, and inserted them again in other slots and everything worked fine. No problem. It helps to do a "zpool export" and "import" also.

    I don't understand all your problems with ZFS. Something is wrong; you should be able to pull out disks and replace them without problems. ZFS is designed for that. I don't understand why you don't succeed.
  • plonk420 - Sunday, September 4, 2011 - link

    friend has had good luck with a $100ish 8xSATAII PCI-X Supermicro card (no raid). he uses lvm in ubuntu server. i think they have some PCI-e cards in the same price range, too.

    i got a cheapish server-grade card WITH raid (i had to do some heavy research to see if it was compatible with linux), however it seems there's no SMART monitoring on it (at least in the drive manager GUI; i'm a wuss, obviously).
  • nexox - Wednesday, September 7, 2011 - link

    Well, there are about a million replies here, but I think I've got some information that others have missed:

    1) Motherboard SATA controllers generally suck. They're just no good. I don't know why this site insists on benchmarking SSDs with them. They tend to be slow and handle errors poorly. Yes, I've tested this a fair amount.

    2) Hardware RAID has its positives and negatives, but generally it's not necessary, at least in Linux with mdraid - I can't speak for Windows.

    So what do you do with these facts? You get a quality Host Bus Adapter (HBA). These cards generally provide basic raids (0, 1), but mostly they just give you extra SAS/SATA ports, with decent hardware. I personally like the LSI HBAs (since LSI bought most of the other storage controller companies), which come in 3gbit and 6gbit SAS/SATA, on PCI-Express x4 and x8, with anywhere from 4 to 16 ports. 8 lanes of PCI-Express 2.0 will support about 4GB/s read, which should be enough. And yes, SAS controllers are compatible with SATA devices.

    Get yourself an LSI card for your storage drives, use on board SATA for your boot drives (software raid1,) and run software raid5 for storage.

    Of course this means you can't use an Atom board, since they generally don't have PCI-e, and even the Brazos boards only offer PCI-e 4x (even if the slots look like a 16x.)

    For some reason SAS HBAs are some kind of secret, but they're really the way to go for a reliable, cheap(ish) system. I have a $550 (at the time) 8 port hardware raid card, which is awesome (managed to read from a degraded 8 disk raid5, cpu limited at 550MB/s, on relatively old and slow 1TB drives, which isn't going to happen with software raid), but when I build my next server (or cluster - google ceph) I will be going with software raid on a SAS HBA.
  • marcus77 - Saturday, October 6, 2012 - link

    I would recommend euroNAS http://www.euronas.com as the OS because it would give you more flexibility (you can decide which hw to use and can upgrade it easily).

    Raid controllers don't always make sense - especially when it comes to recovery (multiple drive failures) software raid is much more powerful than most raid controllers.

    If you wish to use many drives you will need an additional controller - LSI makes pretty good HBAs - they don't provide raid functionality but have many ports for the drives. You could use it in combination with software raid. http://www.lsi.com/products/storagecomponents/Page...

    If you are looking for a real HW raid controller - I would recommend Adaptec - they have very good Linux support, which is mostly used with storage servers
