Performance - Raw Drives

Prior to evaluating the performance of the drives in a NAS environment, we wanted to check up on the best-case performance of the drives by connecting them directly to a SATA 6 Gbps port. Using HD Tune Pro 5.50, we ran a number of tests on the raw drives. The following screenshots present the results for the various drives in an easy-to-compare manner. Note that some of the screenshots are from the previous roundup where we used HD Tune Pro 5.0.

Sequential Reads:

Sequential Writes:

Random Reads:

Random Writes:

Miscellaneous Reads:

Miscellaneous Writes:

 


62 Comments


  • shodanshok - Sunday, August 10, 2014 - link

    It is not a single post. It is a lengthy discussion of 18 different posts. Let me forward you to the first post: http://marc.info/?l=linux-raid&m=1406709331293...

    When used in a single parity scheme, no RAID implementation or file system is immune to UREs that happen during rebuild. What ZFS can do is catch when a disk suddenly returns garbage, which with other filesystems normally results in silent data corruption.

    But UREs are NOT silent corruption. They happen when the disk cannot read the requested block and gives you a "sorry, I can't read that" message.

    Regards.
  • asmian - Sunday, August 10, 2014 - link

    >But URE's are NOT silent corruption.

    They are if you are using WD Red drives, which Ganesh has previously said are using URE masking to play nicer with RAID controllers. They issue dummy data and no error instead of a URE. This, and the serious implications of it especially with single parity RAID (mirror/RAID5), is NOT mentioned in this comparative article, which is shocking.

    To reiterate: if a RAID5 array (or a degraded RAID6) has a masked URE, there is no way to know which disk the error came from. And if the controller is NOT continuously checking parity against all reads for speed then the dummy data will be passed through without any error being raised at all. Worse, since you don't know there has been a read error, you will assume your data is OK to backup, so you will likely overwrite good old backups with corrupt data, since space for multiple copies is likely to be at a premium, so any backup mitigation strategy is screwed.

    Given the fact that these are 4TB consumer class drives with 1 in 10^14 URE numbers, the chance of a URE when rebuilding is very high, which is why these Red drives are extremely unsafe in RAID implementations that do NOT check parity continuously. I already ran the numbers in a previous post, although they haven't been verified - Ganesh said he was seeking clarification from the manufacturers. Bottom line: caveat emptor if you risk your data to these drives, with or without RAID or a backup strategy.
  • shodanshok - Sunday, August 10, 2014 - link

    Can you provide a reference about URE masking? I carefully read the WD Red specs (http://www.wdc.com/wdproducts/library/SpecSheet/EN...) and nowhere do they mention anything similar to what you are referring to. Are you sure you are not confusing URE with TLER?

    After all, I find it extremely difficult to believe that a hard drive will intentionally return bad data instead of a URE.

    The only product range where I can _very remotely_ see a similar thing being useful is the WD Purple (DVR) series: being often used as simple "video storage" in single disk configuration, masking a URE will not lead to big problems. However, the proper solution here is to implement a configurable SCTERC or TLER.

    Regards.
  • asmian - Sunday, August 10, 2014 - link

    > I find it extremely difficult to believe that a hard drive will intentionally return bad data instead of a URE.

    Ganesh wrote to me: "As discussed in earlier WD Red reviews, the drive hopes to tackle the URE issue by silently failing / returning dummy data instead of forcing the rebuild to fail (this is supposed to keep the RAID controller happy)."
  • shodanshok - Sunday, August 10, 2014 - link

    This seems more like the functionality of TLER, rather than some form of URE masking. Anyway, if the Red drive really does intentionally return garbage instead of a read error, it should absolutely be avoided.

    Ganesh, can you clarify this point?
  • asmian - Sunday, August 10, 2014 - link

    A quick search back through previous WD Red drive reviews reveals nothing immediately. Ganesh ran a large article on Red firmware differences that covered configurable TLER behaviour, which is about dropping erroring drives out of an array quickly so that the array parity or other redundancy can take over and provide the data that the drive can't immediately retrieve, but nothing like this was mentioned.

    However, in http://www.anandtech.com/show/6083/wd-introduces-r... the author Jason Inofuentes wrote: "They've also included error correction optimizations to prevent a drive from dropping out of a RAID array while it chases down a piece of corrupt data. The downside is that you might see an artifact on the screen briefly while streaming a movie, the upside is that you won't have playback pause for a few seconds, or for good depending on your configuration, while the drive drops off the RAID to fix the error."

    That sounds like what Ganesh has said, although I can't see anything in his articles mentioning it. It may be a complete misunderstanding of the TLER behaviour, though. The problem with the behaviour described above is that it assumes that the data is not important, something that will only manifest as a little unnoticed corruption while watching a video file. But what if it happens while you're copying data to your backup array? What if it's not throwaway data, but critical data and you now have no idea that it's corrupt or unrecoverable on the disk so you NEED that last good backup you took... I don't think ANYONE is (or should be) as casual as that about the intrinsic VALUE of their data - why bother with parity/mirror RAID otherwise? If the statement is correct, it's extremely concerning. If not, it needs correcting urgently.
  • Zan Lynx - Monday, August 11, 2014 - link

    To me that sounds like a short TLER setting. The description says nothing about whether the drive returns an error or not. It may very well be the playback software receiving the error but continuing playback.
  • asmian - Monday, August 11, 2014 - link

    But a short TLER is designed specifically to allow the array parity/redundancy to kick in immediately and provide the missing data by reconstruction. There wouldn't BE any bad data returned (unless there was no array redundancy). So as described this is NOT anything to do with short TLER. It is about the drive not returning an error when it can't read data successfully (ie. a URE), and issuing dummy data instead. The fundamental issue is that without an error being raised, neither the array hardware/software nor the user can take any action to remedy the data failure, whether that's restoring the bad data from backup or even highlighting the drive to see if this is a pattern indicative of likely failure.

    There are some comments about it in that article which try to explain the scope (it seems to be limited to some ATA commands), but not in sufficient detail for me or most average users who don't know what ATA commands are sent by specific applications or the file system, and they certainly didn't answer my questions and misgivings.
  • shodanshok - Monday, August 11, 2014 - link

    Hi, it seems more like a short TLER timeout rather than URE masking. Ganesh, can you clarify?
  • ganeshts - Saturday, August 23, 2014 - link

    Yes, shodanshok is right; the TLER feature in these NAS drives is a shorter timeout rather than URE masking. Ian's quote of my exchange in a private e-mail was later clarified, but the conversation didn't get updated here:

    1. When a URE happens, the hard drive returns an error code back to the RAID controller (in the case of devices with software RAID, it sends the error back to the CPU). The error code can be used to gauge what exactly happened. A fairly detailed list can be found here: http://en.wikipedia.org/wiki/Key_Code_Qualifier : a URE corresponds to a medium error with this key code description: "Medium Error - unrecovered read error"

    2. Upon recognition of a URE, it is up to the RAID controller to decide what needs to be done. Systems usually mark the sector as bad and try to remap it. It is then populated with data recovered using the other drives in the RAID array. It all depends on the vendor implementation. Since most off-the-shelf NAS vendors use mdadm, I think the behaviour will be similar for all of those.

    3. TLER just refers to a quicker return of the error code back to the controller rather than 'hanging' for a long time. The latter behaviour might cause the RAID controller to mark the whole disk as bad when we have a URE for only one sector.
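
Two points from the thread can be sketched in a few lines of Python: the rebuild-risk arithmetic behind the "1 in 10^14" figure, and the recovery path described in the numbered steps above, where a reported medium error lets the array rebuild the block from the remaining members. This is only an illustrative sketch, not mdadm's or any vendor's actual code. It assumes the URE rate is per bit read and that errors are independent, and the names `p_ure_during_read`, `MediumError`, `read_member` and `read_with_recovery` are made up for this example.

```python
import math
from functools import reduce


def p_ure_during_read(bytes_read, ure_rate_per_bit=1e-14):
    """Probability of at least one URE while reading `bytes_read` bytes.

    Assumes independent errors at a fixed per-bit rate (a simplification).
    Computes 1 - (1 - rate)^bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))


class MediumError(Exception):
    """Hypothetical stand-in for a drive reporting an unrecovered read error."""


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_member(stripe, index, bad_members):
    """Simulated drive read: members listed in `bad_members` report a URE."""
    if index in bad_members:
        raise MediumError(f"unrecovered read error on member {index}")
    return stripe[index]


def read_with_recovery(stripe, index, bad_members=frozenset()):
    """Read one block; on a reported URE, rebuild it from the other members.

    `stripe` holds the data blocks plus one XOR parity block. If a second
    member also fails, the MediumError propagates: single parity cannot
    recover from two missing blocks.
    """
    try:
        return read_member(stripe, index, bad_members)
    except MediumError:
        others = [
            read_member(stripe, i, bad_members)
            for i in range(len(stripe))
            if i != index
        ]
        return xor_blocks(others)


if __name__ == "__main__":
    # Rebuilding a 4-drive RAID 5 of 4 TB disks reads the 3 surviving drives.
    p = p_ure_during_read(3 * 4e12)
    print(f"Chance of at least one URE during rebuild: {p:.0%}")

    data = [b"AAAA", b"BBBB", b"CCCC"]
    stripe = data + [xor_blocks(data)]
    print(read_with_recovery(stripe, 1, bad_members={1}))
```

The recovery path only works because the drive reports the error. If a drive returned dummy data with no error, `read_with_recovery` would pass it through untouched, which is the scenario the thread was worried about.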
