Doubling Theoretical Performance: RAID-0

For those of you who are already familiar with RAID and how it works, go ahead and skip to the benchmarks; these next two pages are designed to serve as brief introductions to the two most common forms of RAID on the desktop: RAID-0 and RAID-1.

Otherwise known as striping, RAID-0 is the only performance-enhancing form of RAID that we'll be talking about in this article. The premise behind striping is simple. Data being written to the array is split into "stripes", generally 16KB to 256KB in size, with each stripe being written to a different drive in the array. For example, say we were dealing with a 2-drive RAID-0 array with a stripe size of 128KB and we wanted to write 256KB of data; drive 0 would get the first 128KB of data written to it, and drive 1 would get the remaining 128KB.
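
To make the mechanics concrete, here is a minimal sketch of the write path (in Python, purely illustrative; the round-robin layout and 128KB stripe size mirror the example above, and the "drives" are just in-memory lists, not any real controller's interface):

    STRIPE_SIZE = 128 * 1024  # 128KB stripes, as in the example above

    def stripe_write(data, drives):
        # Split the incoming data into stripe-sized chunks and hand each
        # chunk to the next drive in round-robin order (RAID-0, no parity).
        for i in range(0, len(data), STRIPE_SIZE):
            chunk = data[i:i + STRIPE_SIZE]
            drives[(i // STRIPE_SIZE) % len(drives)].append(chunk)

    # Writing 256KB to a two-drive array: drive 0 receives the first 128KB,
    # drive 1 the second 128KB, and the two transfers can proceed in parallel.
    drives = [[], []]
    stripe_write(b"x" * 256 * 1024, drives)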




[Diagram: Writing to a single hard disk]

[Diagram: Writing to a two-disk RAID-0 array]


Here, you can see that the write performance of RAID-0 can be almost double that of a single drive, since twice as much data gets written at the same time. The higher write performance is obtained at the expense of some controller overhead, since the RAID controller has to handle splitting up data into stripes before sending it to the drives themselves - but with modern day microprocessors being as fast as they are, the overhead is usually thought of as negligible.

Reading works the exact same way, but in reverse. Say that we want to read that same 256KB of data back; we pull one stripe from drive 0 and the other stripe from drive 1. The read is now completed in half the time, theoretically doubling performance.
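
The read side of the same toy model is simply the reverse, pulling stripes from the drives in round-robin order and stitching them back together (again a hedged sketch under the same assumptions, not real controller code; in hardware the per-drive reads overlap in time, which is where the speedup comes from):

    STRIPE_SIZE = 128 * 1024
    # Two toy "drives", each holding its 128KB stripe of a 256KB file.
    drives = [[b"A" * STRIPE_SIZE], [b"B" * STRIPE_SIZE]]

    def stripe_read(drives, length):
        # Pull stripes from each drive in round-robin order until the
        # requested amount of data has been reassembled.
        out = bytearray()
        index = 0
        while len(out) < length:
            out += drives[index % len(drives)][index // len(drives)]
            index += 1
        return bytes(out[:length])

    assert stripe_read(drives, 256 * 1024) == b"A" * STRIPE_SIZE + b"B" * STRIPE_SIZE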

We are careful to use the word "theoretical" because the performance advantages of RAID-0 disappear quickly if we're not dealing with ideal situations like the ones we just described. If too large of a stripe size is used, most transfers end up smaller than a single stripe and are serviced by only one drive, so the parallelism (and with it the performance advantage) of RAID-0 is lost; too small of a stripe size splits every request across the drives and results in excess overhead, reducing the performance improvement of the striped array.

We have seen in the past that for most desktop applications, the largest stripe size that a desktop RAID controller will offer is usually the best choice for performance. With Intel's ICH5/6, that translates into a 128KB stripe size, which for our comparison is what we decided to go with. The other stripe size options didn't offer any better performance for our desktop test suite.

The main downside to RAID-0, other than cost, is reliability. The capacity of a RAID-0 array is the sum of the capacities of all of its members; so, two 100GB drives in a RAID-0 array will give you one array with a 200GB total capacity. Unfortunately, if you lose any one of the drives in the array, all of your data is lost and isn't recoverable. Since two drives are working in tandem and are both necessary to hold your data, you effectively halve the mean time between failures by moving to a two-drive RAID-0 array.
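
The arithmetic behind that last claim is easy to sanity-check. Assuming, as a simplification, that the drives fail independently and identically, the array survives a given period only if every member survives it, so the array's failure probability is one minus the product of the individual survival probabilities; under a constant failure rate this works out to the array's MTBF being half the single-drive MTBF. A small sketch:

    def array_failure_probability(per_drive_failure_prob, n_drives):
        # A RAID-0 array of n independent, identical drives fails if any
        # one member fails, i.e. unless every member survives.
        return 1.0 - (1.0 - per_drive_failure_prob) ** n_drives

    # Example: if each drive has a 5% chance of dying within some period,
    # a two-drive stripe has a 9.75% chance of losing its data over the
    # same period, slightly less than (but close to) double the risk.
    print(array_failure_probability(0.05, 2))  # 0.0975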


126 Comments


  • mdrohn - Tuesday, July 06, 2004 - link

    "Now im sure ill get a roasting from statisticians for not following the rules exactly however as has already been mentioned previously, the notion that by buying 2 of something will halve its chances of enjoying a useful life is just nonsense in individual cases."

    I'm not a statistician, nor do I play one on TV ;) But similarly to WaltC, you are misunderstanding the fact that in a RAID 0 setup, if ONE member drive fails, the whole array fails IN ITS FUNCTION AS AN ARRAY. What you say is true--the individual life of a single drive is not affected by how many drives you own. But when we are talking about an ARRAY of drives, the operating life of each individual drive in the array is not what is at issue. What is relevant is ARRAY failure, not DRIVE failure.

    Let's say you have two drives in a RAID 0 array. One drive fails and the other drive remains in perfect working order. You can reformat the surviving drive and keep using it as long as it continues to function. But you have lost the data on the ENTIRE ARRAY because in RAID 0 there is no redundancy and no backup, and you need both drives working in order to access the data on the array.
    Reply
  • mdrohn - Tuesday, July 06, 2004 - link

    "Thus the chance of failure for a RAID 0 array is the probability at any given time that *_one_* of the component drives will fail. Assuming all the disks are identical, that chance is equal to the failure probability of one drive, multiplied by the number of drives."

    OK remind me never to pull formulae out of my butt on a holiday weekend. The actual probability of failure for a RAID 0 array with n members is as follows:

    fRAID0 = 1 - (1-fa)(1-fb)(1-fc)...(1-fn)

    Where fa, fb, fc, etc are the individual chances of failure for each array member.

    The question we are asking, "what is the chance that at least one component drive in a RAID 0 array will fail?" is mathematically identical to asking, "what is the complement (opposite) of the chance that none of the component drives will fail?" The chance that a drive will not fail is the complement of the drive's chance to fail, or 1-fa. The probability that multiple independent events will occur simultaneously ("none of the drives will fail") is the product of those chances. So the probability that the drives do NOT all survive (that is, that at least one of them fails) is the complement of that product.
    Reply
  • MadAd - Tuesday, July 06, 2004 - link

    There's lies, damn lies, and statistics.

    The problem with probabilities is that they are a general model for making assumptions, not meant to replicate real-world events.

    If I get 1 raffle ticket from a raffle of 100, then the probability is 1:100 that I will win. If I buy 2 tickets, then that's 2:100, or a 1 in 50 chance that I will win. However, in the worst case there are still 98 other tickets that could be drawn from the hat before one of mine, and the 1 in 50 figure will only be realistic if we do lots and lots of raffles and calculate the results as a set.

    As far as MTBF is concerned, I would say that a way to more realistically plot the likelihood of failure of multiple units would be to analyse the values within the range of MTBF results, to 2 s.d. (2 standard deviations cover 95% of results).

    E.g. if MTBF is, say, 60 months, and 95% of the results fall within the 55 to 65 month range, then while one drive is likely to last 60 months, either of 2 drives should last at least 57.5 months.

    Of course there's a chance that you get a dodgy one that fails in 10 months; that doesn't make it wrong, just that it was one that fell outside the 95% level on the curve.

    Now I'm sure I'll get a roasting from statisticians for not following the rules exactly; however, as has already been mentioned previously, the notion that buying 2 of something will halve its chances of enjoying a useful life is just nonsense in individual cases.
    Reply
  • masher - Tuesday, July 06, 2004 - link

    #80 says:
    > Sending the two seek commands versus one should
    > add negligeable time. The actual seeks would be
    > done concurrently. The rotational latencies on
    > each drive is independent. Therefore the time
    > to locate the data should be very close to the
    > same as for a single drive.

    The latencies for each drive are independent, yes...that's the very reason the overall latency is higher. Simple statistics. I'll give you a somewhat simplified explanation of why.

    A seek request sent to a single drive finds the disk in a random position, evenly distributed between (best_case_latency) and (worst_case_latency). The mean latency is therefore (best+worst)/2.

    Add a second drive to the picture now. On the average, half the time it will be faster than the first drive at a given request, and half the time slower. In the first case, the ARRAY speed is limited by the first drive. In the second case, the array is limited by disk two, which will be randomly distributed between (worst-best)/2 and (worst). The average in this case is therefore (3w-b)/4.

    Probability of first case = (1/2)
    Probability of second case = (1/2)

    Overall mean = (1/2)(w+b)/2 + (1/2)(3w-b)/4 = (5w+b)/8.

    Assuming best case=0 and worst case=1, you get a mean seek for a single disk of 50%, and a mean seek for a two-disk array of 62%.
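
    A quick Monte Carlo sketch of this scenario (seek latency uniform between best and worst case, with the array waiting on the slower of two independent drives) is an easy way to check the effect. The exact expectation for the slower of two uniform draws is 2/3 of the range, a bit above the simplified 62% figure, but the qualitative conclusion that a striped pair has worse average access latency than a single drive holds either way:

        import random

        def mean_seek(n_drives, trials=100_000):
            # Average effective seek latency (0 = best case, 1 = worst case)
            # when a request must wait for the slowest of n independent drives.
            return sum(max(random.random() for _ in range(n_drives))
                       for _ in range(trials)) / trials

        print(mean_seek(1))  # ~0.50 for a single drive
        print(mean_seek(2))  # ~0.67 for a two-drive stripe (2/3 in the limit)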
    Reply
  • mdrohn - Monday, July 05, 2004 - link

    WaltC says:

    "(3)Because RAID 0 employs two drives to form one combined drive, the probability of a RAID 0 drive failure is exactly twice as high as it is for a single drive."

    Nighteye2 is correct. The above quote contains a fundamental misstatement and does not correctly represent why RAID 0 multiplies failure rate. WaltC's entire ensuing argument is logically correct, but because it is based on the wrong premise it is not relevant to RAID 0 failure rates. The quote should have read as follows:

    "Because RAID 0 employs two drives to form one combined drive, the probability of a RAID 0 *_ARRAY_* failure is exactly twice as high as it is for a single drive."

    Having multiple disks in a RAID 0 array does not, as WaltC correctly says, affect an individual disk's chance of failure. But what is relevant to this subject is the failure of the array as a whole. Since in RAID 0 the component drives are linked together without any redundancy or backup, losing one component means that the entire array fails. Thus the chance of failure for a RAID 0 array is the probability at any given time that *_one_* of the component drives will fail. Assuming all the disks are identical, that chance is equal to the failure probability of one drive, multiplied by the number of drives.

    Let's take the car analogy. In WaltC's example the two cars are independent, autonomous vehicles. To make it a proper analogy to RAID 0, the two cars would have to be functionally linked so that they operate as one. Let's say you welded the two cars together side by side with steel bars to make one supervehicle. Then if the tires gave out on any one of the two component cars, the entire supervehicle would be stuck.
    Reply
  • Nighteye2 - Monday, July 05, 2004 - link

    If a single HD has a 50% chance of failing in 5 years, a RAID 0 array with 2 of those drives has a 50% chance of failing in about 4 years, depending on the distribution of the failure probability function.
    Reply
  • Nighteye2 - Monday, July 05, 2004 - link

    #84, you should study failure theory better. RAID 0 in fact *does* double the chance of failure at any given time. However, this does not mean the MTBF is halved, because disk failure chances are time-dependent, and increase over time.
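
    A numerical sketch of that point (purely illustrative: drive lifetimes drawn from a Weibull distribution with an assumed wear-out shape parameter of 3, scaled so a single drive has a 50% chance of failing by 5 years) shows a two-drive stripe reaching the same 50% failure mark at roughly 4 years, not the 2.5 years a naive halving of the single-drive figure would suggest:

        import random, statistics

        SHAPE = 3.0                              # wear-out style failure curve (assumed)
        SCALE = 5.0 / (0.6931 ** (1 / SHAPE))    # single-drive median life = 5 years

        def drive_lifetime():
            return random.weibullvariate(SCALE, SHAPE)

        singles = [drive_lifetime() for _ in range(100_000)]
        arrays = [min(drive_lifetime(), drive_lifetime()) for _ in range(100_000)]

        print(statistics.median(singles))  # ~5.0 years
        print(statistics.median(arrays))   # ~4.0 years: the instantaneous failure
                                           # chance doubles, but median array life
                                           # is not simply halved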
    Reply
  • Pumpkinierre - Sunday, July 04, 2004 - link

    #84, Even though I agree with some of your comments on the testing, the fact is that Anand was looking at RAID 0 from the viewpoint of the desktop user/gamer, which is the target audience of the AT website. So it is legitimate for him to use the tests that are relevant to and understood by this target audience for testing HDD performance in both single and RAID combinations, rather than specific HDD performance tests. He reaches similar conclusions to storagereviews.com's assessment of RAID use in the desktop environment. However, I feel the criticism about the failure to test other controllers (even if limited to onboard controllers) and RAID 1 performance is valid.

    With regards to the likelihood of failure of a component, it must be recognised that all processes in nature are stochastic (probability-based). This is at the core of quantum mechanics. So all components have an associated probability of failure. That probability is lessened by better manufacturing, quality control, newness, etc., but is always present. Naturally, the longer you use the HDD the greater the probability of failure due to wear etc., but it still is possible for it to fail in the first year (and this does happen). The warranty period doesn't mean your HDD is not going to fail, it means they will replace it if it fails. The laws of probability are clear: if you have two components with associated probabilities of failure, you must ADD the two probabilities if you want the probability of ANY ONE of them failing. So, in the case of using two new HDDs, RAID 0 has double the probability of you losing your data compared to a single HDD.

    The consequence of the above (and having lost a HDD at 3 yrs and 1 day!) means to me, along with many other desktop users who fail to back up (despite having burners) because of laziness, that the oft-forgotten RAID 1 ought to be the prime candidate for the desktop. Here the probabilities are refined to simultaneous failure of the HDDs on any PARTICULAR day of the 3-year warranty period, which is a different probability to failure of EITHER of the discs over the WHOLE 3 years. Naturally, when one disc fails in RAID 1, the desktop user gets off her butt and backs up on the day prior to any repair. The fact that RAID 1 ought to be better at reads than even RAID 0 (see my previous posts) is even greater reason to adopt this mode for the desktop (where writes are less used), but has been ignored by the IT community.


    Reply
  • TheCimmerian - Sunday, July 04, 2004 - link

    Thanks for DV capture stuff, PrinceGaz.

    Reply
  • WaltC - Sunday, July 04, 2004 - link

    There are so many basic errors in this article that it's difficult to know just where to start, but I'll wing it...;)

    From the article:

    "The overall SYSMark performance graph pretty much says it all - a slight, but completely unnoticeable, performance increase, thanks to RAID-0, is what buying a second drive will get you."

    Heh...;) Next time you review a 3d card you could use all of the "real world" benchmarks you selected for this article and conclude that there's "no difference in performance" between a GF4 and a 6800U, or an R8500 and an x800PE, too...;) That would be, of course, because none of these "real world" benchmarks you selected (Sysmark, Winstone, etc.) was created for the specific purpose of measuring 3d gpu performance. Rather, they measure things other than 3d-card performance, and so the kind of 3d card you install would have minimal to no impact at all on these benchmark scores. Likewise, in this case, it's the same with hard drive performance relative to the functions measured by the "real world" benchmarks you used.

    Basically, overall Sysmark scores, for instance, may include possibly 10% (or less) of their weight in measuring the performance of the hard drive arrangements in the system tested. So, even if the mb/sec read from hard disk for RAID 0 is *double* that of normal single-drive IDE in the tested system, because these benchmarks spend 90% or more of their time in the cpu and system ram doing things other than testing HD performance, they may reflect only a tiny, near-insignificant increase in overall performance between RAID 0 and single-drive IDE systems--which is exactly what you report.

    But that's because all of the "real world" benchmarks you used here are designed to tell you little to nothing specifically about hard-drive performance, just as they are not suitable for use in evaluating performance differences between 3d gpus, either. Your conclusions as I quoted them above, to the effect that these "real world" benchmark results prove that RAID 0 has no impact on "real world" performance, are therefore invalid. The problem is that the software you used doesn't specifically attempt to measure the real-world read & write performance of RAID 0, or even the performance of single-drive IDE for that matter, much less provide any basis from which to compare them and draw the conclusions you've reached.

    I'd recommend at this point that you return to your own article and carefully read the descriptions of the "real world" benchmarks you used, as quoted by you (verbatim in your article, direct from the purveyors of these "real world" benchmarks), and search for even one of them which declares: "The express purpose of this benchmark is to measure, in terms of mbs/sec, the real-world read and write performance of hard drives and their associated controllers." None of the "real-world" benchmarks you used make such a declaration of purpose, do they?

    Next, although I consider this really a minor footnote in comparison to the basic flaw in your review method here and the inaccuracies resulting in the inappropriate conclusions you've reached, I have to second what others have said in response to your article: if your intent is actually to at some point measure hard drive and controller read/write performance and to then draw conclusions and make general recommendations, be mindful that just as there are differences in performance among hard drives made by competing companies, there are also differences between the hard drive controllers different companies make, and this certainly applies to both standard single-drive IDE controllers as well as to RAID controllers. So I think you want to avoid drawing blanket conclusions based merely on even the appropriate testing for a single manufacturer's hard drive controller, regardless of whether it's a RAID controller or something else. One size surely doesn't fit all.

    As to your conclusions in this article, again, I'm also really surprised that you didn't logically consider their ramifications, apparently. I'm surprised it didn't occur to you that if it were true that RAID 0 had no impact on read/write drive performance, it would also have to be true that Intel, nVidia (and all the other core-logic chip and HD-controller manufacturers to which this applies), not to mention controller manufacturers like Promise, are just wasting their time and throwing good money after bad in their development and deployment of RAID 0 controllers.

    I think you'll have to agree that this is an illogical proposition, and that all of these manufacturers clearly believe their RAID 0 implementations have a definite performance value over standard single-drive IDE--else the only kind of RAID development we'd see is RAID mirroring for the purpose of concurrent backup.

    In reading some of the responses in this thread, it's obvious that a lot of your readership really doesn't understand the real purpose of RAID 0, and views it as a "marketing gimmick" of some ill-defined and vague nature that in reality does nothing and provides no performance advantages over standard IDE controller support. I think it's unfortunate that you haven't served them in providing them with worthwhile information in this regard, but instead are merely echoing many of the myths that persist as to RAID 0, myths based in ignorance as opposed to knowledge. My opinion as to the value of RAID 0 is as follows:

    For years, ever since the first hard drives emerged, the chief barrier and bottleneck to hard drive performance has always been found within hard drives themselves, in the mechanisms that have to do with how hard drives work--platters, heads, rotational rate, platter size and density, etc. The bottleneck to IDE hard drive performance, measured in mbs/sec read & write performance, has actually never been the host-bus interface for the drive, and even today the vintage ATA100 bus interface is on average 2x+ faster than the fastest mass-market IDE drives you can buy, which average 30-50mbs/sec in sustained reads from the platters.

    Drives can "burst" today right up to the ceiling of the host-bus interface they support, but these transfer speeds only pertain to data in the drive's cache transferring to the host bus and do not apply to drive data which must be retrieved from the drive because it isn't in the cache--which is when we drop back to the maximums currently possibly with platter technology--30-50mbs/sec depending on the drive.

    Increases in platter density and rotational speeds, and increases in the amount of onboard cache in hard drives, have been the way that hard drive performance has traditionally improved. At a certain point--say 7,200 rpms for platter rotation--an equilibrium of sorts is reached in terms of economies of scale in the manufacture of hard drives, and pushing the platter rotational speed beyond that point--to 10,000 rpms and up-- results in marked diminishing returns both in price and performance, and the price of hard drives then begins to skyrocket in cost per megabyte (thermal issues and other things also escalate to further complicate things.) So the bottom line for mass-market IDE drives in terms of ultimate maximum performance is drawn both by cost and by the current SOA technical ceilings in hard drive manufacturing.

    Enter RAID 0 as a relatively inexpensive, workable, and reliable solution to the performance--and capacity--bottlenecks imposed in single-drive manufacturing. With RAID 0, striped according to the average file size that best fits the individual user's environment, it's fairly common to see read speeds (and sometimes write, too) in mbs/sec go to *double* what is possible with either of the drives in the RAID 0 setup when run individually on a standard IDE controller, regardless of the host-bus interface.

    At home I've been running a total of 4 WD ATA100 100GB PATA drives for the last couple of years. Two of them--the older 2MB-cache versions--I run singly on IDE 0 as M/S through the onboard IDE controller, and the other two are 8MB-cache WD ATA100 100GB drives running in RAID 0 from a PCI Promise TX2K RAID controller as a single 200GB drive, out of which I have created several partitions.

    From the standpoint of Windows the two drives running through the Promise controller in RAID 0 are transparent and indistinguishable from the operation and management of a single 200GB physical hard drive. What I get from it is a 200GB drive with read/write performance up to double the speed possible with each single drive, a 200GB RAID 0 drive utilizing 16MB of onboard drive cache, and I get a 200GB hard drive which formats and partitions and behaves just like an actual 200GB single drive but which costs significantly less (but not, to be fair, if I include the cost of the RAID controller--but I'm willing to pay it for performance ceilings just not possible with a current 200GB single IDE drive.)

    Here are some of the common myths about such a setup that I hear:

    (1) The RAID 0 performance benefit is a red herring because you don't always get double the performance of a single drive. It's so silly to say that, imo, since single-drive performance isn't consistent, either, as much depends on the platter location of the data in a single drive as to the speed at which it can be read, and so on, just as it does in a RAID drive. What's important to RAID 0 performance, and is certainly no red herring, is that read/write drive performance is almost always *higher* than the same drive run in single-drive operation on IDE, and can reach double the speed at various times, especially if the user has selected the proper stripe size for his personal environment.

    (2) RAID 0 is unsafe for routine use because the drives aren't mirrored. The fact is that RAID 0 is every bit as safe and secure as normal single-drive IDE use, as those aren't mirrored, either (which you'd think ought to be common sense, right?)...;) As with single-drive use, the best way to protect your RAID 0 drive data is to *back it up* to reliable media on a regular basis.

    On a personal note, one of my older WD's at home died a couple of weeks ago of natural causes--WD's diagnostic software showed the drive unable to complete both SMART diagnostic checks, so I know the drive is completely gone. The failed drive was my IDE Primary slave, not one of the RAID drives. Apart from what I had backed up, I lost all the data on it, of course. Proves conclusively that single-drive operation is no defense against data loss...;)

    OTOH, in two+ years of daily RAID 0 operation, I have yet to lose data in any fashion from it, and have never had to reformat a RAID 0 drive partition because of data loss, etc. It has consistently functioned as reliably as my single IDE drives, and indeed my IDE single-drive failure was the first such failure I've had in several years with a hard drive, regardless of controller.

    If people would think rationally about it they'd understand that the drives connected to the RAID controller are the same drives when connected individually to the standard IDE controller, and work in exactly the same way. The RAID difference is a property of the controller, not the drive, and since the drives are the same, the probability of failure is exactly the same for a physical drive connected to a RAID controller and the same drive connected to an IDE controller. There's just no difference.

    (3)Because RAID 0 employs two drives to form one combined drive, the probability of a RAID 0 drive failure is exactly twice as high as it is for a single drive. This is another of those myths that circulates through rumor because people simply don't stop to think it through. While it is true that the addition of a second drive, whether it's added on the Primary IDE channel as a slave, or constitutes the second drive in a RAID 0 configuration, elevates the chance that "a drive" will fail slightly above the chance of failure presented by a single drive--since you now have two drives running instead of one--does this mean you now have increased the probability that a drive will fail by 100%? If you think about it that makes no sense because...

    If I install a single drive which, just for the sake of example, is of sufficient quality that I can reasonably expect it to operate daily for three years, and then I add another drive of exactly the same quality, how can I rationally expect both drives to operate reliably for anything less than three years, since the reliability of either drive is not diminished in the least merely by the addition of another drive just like it? I mean, how does it follow that adding in a second drive just like the first suddenly means I can expect a drive failure in 18 months, instead of three years?...;) Adding a second drive does not diminish the quality of the first, since the second drive is exactly like the first and is of equal quality, and hence both drives should theoretically be equal in terms of longevity.

    But the rumor mongering about RAID 0 is that adding in a second drive somehow means that the theoretical operational reliability of *each* drive is magically reduced by 50%...;) That's nonsense of course, since component failure is entirely an individual affair, and is not affected at all by the number of such components in a system. The best way to project component reliability, then, is not by the number of like components in a system, but rather by the *quality* of each of those components when considered individually. Considering components in "pairs," or in "quads," etc., tells us nothing about the likelihood that "a component" among them will fail.

    Look at the converse as proof: If I have two drives connected to IDE 0 as m/s, and I expect each of those drives to last for three years, does it follow logically that if I remove the slave drive that I increase the projected longevity of the master drive to six years?...;) Of course not--the projected longevity is the same, whether it's the master drive alone, or master and slave combined, because projected component longevity is calculated completely on an individual basis, and is unaffected entirely by the number of such components in a system. The fact is that I could remove the slave drive and the next day the master could fail...;) But that failure would have had nothing whatever to do with the presence or absence of the second drive.

    Putting it another way, does it follow that one 512mb DIMM in a system will last twice as long as two 512mb DIMMs in that system? If I have one floppy drive is it reasonable to expect that adding another just like it will cut the projected longevity of each floppy in half? If I have a motherboard with four USB ports, does it follow that by disabling three of them the theoretical longevity of the remaining USB port will be quadrupled? No? Well, neither does it follow that enabling all four ports will quarter the projected longevity of any one of them, either.

    Consider as well the plight of the hard drive makers if the numerical theory of failure likelihood had legs: if it was true that as the number of like components increases the odds for the failure of each of them increases by 100%, irrespective of individual component quality, then assembly-line manufacturing of the type our civilization depends on would have been impossible, since after manufacturing x-number of widgets they would all begin to fail...;)

    One last example: my wife and I each bought new cars in '98. Both cars included four factory-installed tires meeting the road. Flash forward four years--and I had replaced my wife's entire set of tires with an entirely different make of tire, because with her factory tires she suffered two tread separations while driving--no accidents though as she was very fortunate, and the other two constantly lost air inexplicably. All the difference with the new set. As for my factory tires, however, I'm still driving on them today, with tread to spare, and never a blow-out or leak since '98. The cars weigh nearly the same (mine is actually about 500lbs heavier), the cars are within 5,000 miles of each other in total mileage, and neither of us is lead-footed. Additionally, I serviced both cars every 3,000 miles with an oil change and tire rotation, balancing, inflation, etc.

    The stark variable between us, as it turned out, was that my factory-installed tires were of a much higher quality than her factory-installed tires, as I discovered when replacing hers. It's yet another example in reality of how the number of like components in a system is far less important than the quality of those components individually, when making projections as to when any single component among them might fail.

    Anyway, I think it would be nice if we could move into the 21st century when talking about RAID 0, and realize that crossing ourselves, throwing salt over a shoulder, or avoiding walking under ladders won't add anything in the way of longevity to our individual components, nor will this behavior in any way serve to reduce that longevity, which is endemic to the quality of the component, regardless of number. Given time, all components will fail, but when they fail, they always fail individually, and being one of many has nothing to do with it, but being crappy has everything to do with it, which is the point to remember...;)
    Reply
