Feature Set Comparison

Enterprise hard drives come with features such as real-time linear and rotational vibration correction, dual-stage actuators to improve head positioning accuracy, multi-axis shock sensors to detect and compensate for shock events, and dynamic fly-height technology to increase data access reliability. These drives also expose some interesting firmware aspects through their SATA controller, but, before looking into those, let us compare the specifications of the ten drives being considered today. Even though we gathered this data for all ten drives, the table below presents only two side by side at a time for usability reasons.

Comparative HDD Specifications

Aspect                                      WD4001FFSX                      WD4001FFSX
Interface                                   SATA 6 Gbps                     SATA 6 Gbps
Sector Size / AF                            512E                            512E
Rotational Speed                            7200 RPM                        7200 RPM
Cache                                       64 MB                           64 MB
Rated Load / Unload Cycles                  600 K                           600 K
Non-Recoverable Read Errors / Bits Read     < 1 in 10^14                    < 1 in 10^14
MTBF                                        1 M hours                       1 M hours
Rated Workload                              ~180 TB/yr                      ~180 TB/yr
Operating Temperature Range                 5 to 60 °C                      5 to 60 °C
Acoustics (Seek Average)                    34 dBA                          34 dBA
Physical Parameters                         14.7 x 10.16 x 2.61 cm, 750 g   14.7 x 10.16 x 2.61 cm, 750 g
Warranty                                    5 years                         5 years
Price (in USD, as-on-date)                  $260                            $260

HD Tune Pro provides a high-level overview of the various supported SATA features.

A brief description of some of the SATA features is provided below:

  • S.M.A.R.T.: Most readers are familiar with the SMART (Self-Monitoring, Analysis and Reporting Technology) feature, which provides drive parameters that can serve as reliability indicators. Some of these include the reallocated sector count (an indication of bad blocks), spin retry count (an indication of problems with the spindle motor), command timeout (an indication of issues with the power supply or data cable), temperature, power-on hours, etc. (a short sketch of reading these attributes follows this list).
  • 48-bit Address: The first ATA standard specified 24 bits for the logical block address (sector), which was later updated to 28 bits. Using 28 bits, one could address only up to 137.4 GB (2^28 * 2^9 bytes), which capped SATA drive capacity. In 2003, an update to the standard allowed 48 bits for the LBA to get past this limit (a worked example of the arithmetic follows this list). No modern SATA drive comes without support for 48-bit addresses.
  • Read Look-Ahead: Drives supporting this feature keep reading ahead even after the current command is completed. The data is transferred to the buffer for faster response to the host in the case of sequential accesses.
  • Write Cache: This feature is pretty much self-explanatory, with data being stored in the buffer prior to being committed to the platters. Since buffered data vanishes if power is lost, there is a risk of data loss; the feature can be disabled by the end user.
  • Host Protected Area (HPA): Drives supporting this feature have some sectors hidden from the OS. It is usually used by manufacturers to store recovery data, but users can also 'hide' data by allocating sectors to the HPA.
  • Device Configuration Overlay (DCO): Drives supporting this feature can report modified drive parameters to the host.
  • Security Mode: Drives supporting this feature can help protect themselves from unauthorized access or the setting of new passwords (by freezing such functions). Readers might have encountered frozen security settings for SSDs while trying to secure erase them.
  • Automatic Acoustic Management: AAM was declared obsolete in the 2010 ATA standards revision. On supported disks, it enables reduction of the noise that arises from fast seek operations. In general, configuring the AAM value to something low results in a quiet but slow disk, while a high value results in a loud but fast disk.
  • Power Management: Support for this feature enables drives to follow specific power management state transitions via commands from the host. Supported modes include IDLE, SLEEP and STANDBY.
  • Advanced Power Management (APM): This feature allows setting a value that controls disk spin-down as well as head-parking frequency. Some disks have proprietary commands for this functionality (for example, the WDIDLE tool from Western Digital can be used with the Green drives).
  • Interface Power Management: Drives supporting this feature allow for fine-tuning of power consumption by being aware of the various interface power modes such as PHY Ready, Partial and Slumber (in decreasing order of power consumption). Transitions from a higher power mode to a lower one usually happen after some period of inactivity, and can be either host-initiated (HIPM) or device-initiated (DIPM). Note that these refer to the SATA interface and not the device itself; as such, they are complementary to the power management feature mentioned earlier.
  • Power-up in Standby: This SATA feature allows drives to be powered up into the Standby state to minimize inrush current at power-up and allow the host to sequence the spin-up of devices. This is particularly useful for NAS units and RAID environments. Desktop drives usually come with this feature disabled, but there are jumper settings on the drive to enable controlled spin-up via the ATA-standard spin-up commands. For drives targeting NAS units, Power-up in Standby is enabled by default.
  • SCT Tables: The SMART Command Transport (SCT) tables feature extends the SMART protocol and provides additional information about the drive when requested by the host.
  • Native Command Queuing (NCQ): This is an extension to the SATA protocol that allows drives to reorder received commands for more optimal operation.
  • TRIM: This is a well known feature for readers familiar with SSDs. It is not relevant to any of the drives being discussed today.
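To make the S.M.A.R.T. entry above concrete, here is a minimal sketch of pulling the reliability indicators it mentions using smartmontools' smartctl. This assumes a Linux host with smartmontools installed; /dev/sda is a placeholder device path and the attribute labels are smartctl's usual names:

    # Sketch: read selected SMART attributes via smartctl (smartmontools).
    # Assumes Linux, root privileges, and /dev/sda as a placeholder device.
    import subprocess

    WATCHLIST = {
        "Reallocated_Sector_Ct",  # indication of bad blocks
        "Spin_Retry_Count",       # indication of spindle motor problems
        "Command_Timeout",        # indication of power supply / cable issues
        "Temperature_Celsius",
        "Power_On_Hours",
    }

    out = subprocess.run(["smartctl", "-A", "/dev/sda"],
                         capture_output=True, text=True).stdout

    # smartctl -A rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED
    #                   WHEN_FAILED RAW_VALUE
    for row in out.splitlines():
        fields = row.split()
        if len(fields) >= 10 and fields[1] in WATCHLIST:
            print(f"{fields[1]:<22} raw value: {fields[9]}")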
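The capacity ceilings behind the 48-bit address entry are simple arithmetic, easy to verify:

    # 28-bit vs. 48-bit LBA capacity ceilings with 512-byte (2^9) sectors.
    SECTOR_BYTES = 2 ** 9

    for lba_bits in (28, 48):
        capacity = (2 ** lba_bits) * SECTOR_BYTES
        print(f"{lba_bits}-bit LBA: {capacity / 10**9:,.1f} GB "
              f"({capacity / 10**12:,.1f} TB)")
    # 28-bit LBA: 137.4 GB (0.1 TB)
    # 48-bit LBA: 144,115,188.1 GB (144,115.2 TB)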

We get a better idea of the supported features using FinalWire's AIDA64 system report. The table below summarizes the extra information generated by AIDA64 (that is not already provided by HD Tune Pro).

Comparative HDD Features

Aspect                           WD4001FFSX             WD4001FFSX
DMA Setup Auto-Activate          Supported, Disabled    Supported, Disabled
Extended Power Conditions        Supported, Disabled    Supported, Disabled
Free-Fall Control                Not Supported          Not Supported
General Purpose Logging          Supported, Enabled     Supported, Enabled
In-Order Data Delivery           Not Supported          Not Supported
NCQ Priority Information         Supported              Supported
Phy Event Counters               Supported              Supported
Release Interrupt                Not Supported          Not Supported
Sense Data Reporting             Not Supported          Not Supported
Software Settings Preservation   Supported, Enabled     Supported, Enabled
Streaming                        Supported, Disabled    Supported, Disabled
Tagged Command Queuing           Not Supported          Not Supported
Comments

  • shodanshok - Sunday, August 10, 2014 - link

    It is not a single post. It is a lengthy discussion of 18 different posts. Let me forward you to the first post: http://marc.info/?l=linux-raid&m=1406709331293...

    When used in a single-parity scheme, no RAID implementation or file system is immune to UREs that happen during a rebuild. What ZFS can do is catch when a disk suddenly returns garbage, which with other filesystems normally results in silent data corruption.

    But UREs are NOT silent corruption. They happen when the disk cannot read the requested block and gives you a "sorry, I can't read that" message.

    Regards.
  • asmian - Sunday, August 10, 2014 - link

    >But UREs are NOT silent corruption.

    They are if you are using WD Red drives, which Ganesh has previously said are using URE masking to play nicer with RAID controllers. They issue dummy data and no error instead of a URE. This, and the serious implications of it especially with single parity RAID (mirror/RAID5), is NOT mentioned in this comparative article, which is shocking.

    To reiterate: if a RAID5 array (or a degraded RAID6) has a masked URE, there is no way to know which disk the error came from. And if the controller is NOT continuously checking parity against all reads (for speed), then the dummy data will be passed through without any error being raised at all. Worse, since you don't know there has been a read error, you will assume your data is OK to back up, so you will likely overwrite good old backups with corrupt data; since space for multiple copies is likely to be at a premium, any backup mitigation strategy is screwed.

    Given the fact that these are 4 TB consumer-class drives with 1 in 10^14 URE numbers, the chance of a URE when rebuilding is very high, which is why these Red drives are extremely unsafe in RAID implementations that do NOT check parity continuously. I already ran the numbers in a previous post, although they haven't been verified - Ganesh said he was seeking clarification from the manufacturers. Bottom line: caveat emptor if you risk your data to these drives, with or without RAID or a backup strategy.
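    For the record, here is a back-of-the-envelope sketch of that calculation, assuming the spec-sheet rate of 1 URE per 10^14 bits read and full end-to-end reads of 4 TB drives during a rebuild:

        # Probability of at least one URE while reading n surviving 4 TB
        # drives end-to-end during a rebuild (spec rate: 1 per 1e14 bits).
        import math

        URE_PER_BIT = 1e-14
        DRIVE_BITS = 4e12 * 8  # one 4 TB drive, in bits

        def p_ure(surviving_drives: int) -> float:
            bits_read = surviving_drives * DRIVE_BITS
            # 1 - (1 - p)^n, via log1p/expm1 for numerical stability
            return -math.expm1(bits_read * math.log1p(-URE_PER_BIT))

        for n in (1, 3, 5):
            print(f"{n} surviving drive(s): {p_ure(n):.0%} chance of a URE")
        # 1 -> ~27%, 3 -> ~62%, 5 -> ~80%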
  • shodanshok - Sunday, August 10, 2014 - link

    Can you provide a reference about URE masking? I carefully read the WD Red specs (http://www.wdc.com/wdproducts/library/SpecSheet/EN... and nowhere do they mention anything similar to what you are referring to. Are you sure you are not confusing URE with TLER?

    After all, I find it extremely difficult to believe that a hard drive would intentionally return bad data instead of a URE.

    The only product range where I can _very remotely_ find a similar thing useful is the WD Purple (DVR) series: being often used as simple "video storage" in a single-disk configuration, masking a URE would not lead to big problems. However, the proper solution even there is to implement a configurable SCT ERC or TLER.

    Regards.
  • asmian - Sunday, August 10, 2014 - link

    > I find it extremely difficult to believe that a hard drive would intentionally return bad data instead of a URE.

    Ganesh wrote to me: "As discussed in earlier WD Red reviews, the drive hopes to tackle the URE issue by silently failing / returning dummy data instead of forcing the rebuild to fail (this is supposed to keep the RAID controller happy)."
  • shodanshok - Sunday, August 10, 2014 - link

    This seems more like the functionality of TLER rather than some form of URE masking. Anyway, if the Red drives really do intentionally return garbage instead of a read error, they should absolutely be avoided.

    Ganesh, can you clarify this point?
  • asmian - Sunday, August 10, 2014 - link

    A quick search back through previous WD Red drive reviews reveals nothing immediately. Ganesh ran a large article on Red firmware differences that covered configurable TLER behaviour, which is about dropping erroring drives out of an array quickly so that the array parity or other redundancy can take over and provide the data that the drive can't immediately retrieve, but nothing like this was mentioned.

    However, in http://www.anandtech.com/show/6083/wd-introduces-r... the author Jason Inofuentes wrote: "They've also included error correction optimizations to prevent a drive from dropping out of a RAID array while it chases down a piece of corrupt data. The downside is that you might see an artifact on the screen briefly while streaming a movie, the upside is that you won't have playback pause for a few seconds, or for good depending on your configuration, while the drive drops off the RAID to fix the error."

    That sounds like what Ganesh has said, although I can't see anything in his articles mentioning it. It may be a complete misunderstanding of the TLER behaviour, though. The problem with the behaviour described above is that it assumes that the data is not important, something that will only manifest as a little unnoticed corruption while watching a video file. But what if it happens while you're copying data to your backup array? What if it's not throwaway data, but critical data and you now have no idea that it's corrupt or unrecoverable on the disk so you NEED that last good backup you took... I don't think ANYONE is (or should be) as casual as that about the intrinsic VALUE of their data - why bother with parity/mirror RAID otherwise? If the statement is correct, it's extremely concerning. If not, it needs correcting urgently.
  • Zan Lynx - Monday, August 11, 2014 - link

    To me that sounds like a short TLER setting. The description says nothing about whether the drive returns an error or not. It may very well be the playback software receiving the error but continuing playback.
  • asmian - Monday, August 11, 2014 - link

    But a short TLER is designed specifically to allow the array parity/redundancy to kick in immediately and provide the missing data by reconstruction. There wouldn't BE any bad data returned (unless there was no array redundancy). So, as described, this is NOT anything to do with short TLER. It is about the drive not returning an error when it can't read data successfully (i.e., a URE), and issuing dummy data instead. The fundamental issue is that without an error being raised, neither the array hardware/software nor the user can take any action to remedy the data failure, whether that's restoring the bad data from backup or even flagging the drive to see if this is a pattern indicative of likely failure.

    There are some comments about it in that article which try to explain the scope (it seems to be limited to some ATA commands), but not in sufficient detail for me or most average users who don't know what ATA commands are sent by specific applications or the file system, and they certainly didn't answer my questions and misgivings.
  • shodanshok - Monday, August 11, 2014 - link

    Hi, it seems more as a short TLER timeout rather than URE masking. Ganesh, can you clarify?
  • ganeshts - Saturday, August 23, 2014 - link

    Yes, shodanshok is right; the TLER feature in these NAS drives is a shorter timeout rather than URE masking. asmian's quote of our private e-mail exchange was later clarified, but the conversation didn't get updated here:

    1. When a URE happens, the hard drive returns an error code back to the RAID controller (in the case of software RAID, it sends the error back to the CPU). The error code can be used to gauge what exactly happened. A fairly detailed list can be found here: http://en.wikipedia.org/wiki/Key_Code_Qualifier - a URE corresponds to a medium error with the key code description "Medium Error - unrecovered read error".

    2. Upon recognition of a URE, it is up to the RAID controller to decide what needs to be done. Systems usually mark the sector as bad and try to remap it; it is then populated with data recovered using the other drives in the RAID array. It all depends on the vendor implementation, but since most off-the-shelf NAS vendors use mdadm, I think the behaviour will be similar for all of those.

    3. TLER just refers to a quicker return of the error code back to the controller rather than 'hanging' for a long time. The latter behaviour might cause the RAID controller to mark the whole disk as bad when only a single sector has a URE.
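    For those who want to check or adjust this on their own drives, here is a minimal sketch using smartmontools' SCT ERC support. The device path is a placeholder, and not every drive exposes these timeouts:

        # Sketch: query (and optionally set) the SCT ERC / TLER timeouts
        # via smartctl. /dev/sda is a placeholder device path.
        import subprocess

        # Current read/write recovery timeouts (reported in 100 ms units)
        print(subprocess.run(["smartctl", "-l", "scterc", "/dev/sda"],
                             capture_output=True, text=True).stdout)

        # To set both timeouts to 7 seconds, so the drive reports the error
        # quickly instead of stalling the array, one would run:
        #   smartctl -l scterc,70,70 /dev/sda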
