Doubling Theoretical Performance: RAID-0

If you are already familiar with RAID and how it works, go ahead and skip to the benchmarks; these next two pages serve as brief introductions to the two most common forms of RAID on the desktop: RAID-0 and RAID-1.

Otherwise known as striping, RAID-0 is the only performance-enhancing form of RAID that we'll be talking about in this article. The premise behind striping is simple. Data being written to the array is split into "stripes", generally 16KB to 256KB in size, with successive stripes written to alternating drives in the array. For example, say we were dealing with a 2-drive RAID-0 array with a stripe size of 128KB and we wanted to write 256KB of data; drive 0 would get the first 128KB of data written to it, and drive 1 would get the remaining 128KB.
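
As a rough illustration of the mapping just described, here is a minimal Python sketch. The function name `stripe_layout` and its parameters are our own for illustration and do not come from any RAID controller's firmware or API; it assumes the simplest possible layout, where stripes rotate across the drives in order.

```python
def stripe_layout(offset, length, stripe_size, num_drives):
    """Split a logical transfer into (drive, drive_offset, chunk_length) pieces.

    Assumes a plain round-robin RAID-0 layout (an illustrative assumption,
    not the behavior of any specific controller).
    """
    chunks = []
    end = offset + length
    while offset < end:
        stripe_index = offset // stripe_size   # which stripe of the array we are in
        within = offset % stripe_size          # position inside that stripe
        drive = stripe_index % num_drives      # stripes rotate across the drives
        # Each drive stores every num_drives-th stripe back to back.
        drive_offset = (stripe_index // num_drives) * stripe_size + within
        chunk = min(stripe_size - within, end - offset)
        chunks.append((drive, drive_offset, chunk))
        offset += chunk
    return chunks


KB = 1024
# The example from the text: 256KB written to a 2-drive array with 128KB stripes.
for drive, drive_offset, chunk in stripe_layout(0, 256 * KB, 128 * KB, 2):
    print(f"drive {drive}: {chunk // KB}KB at drive offset {drive_offset // KB}KB")
```

For the 256KB example this yields one 128KB chunk on drive 0 and one 128KB chunk on drive 1, both at the start of their respective drives.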

[Figure: Writing to a single hard disk]

[Figure: Writing to a two-disk RAID-0 array]

Here, you can see that the write performance of RAID-0 can be almost double that of a single drive, since twice as much data gets written at the same time. The higher write performance comes at the expense of some controller overhead, since the RAID controller has to split the data into stripes before sending it to the drives themselves; with modern-day microprocessors being as fast as they are, though, that overhead is usually considered negligible.

Reading works exactly the same way, but in reverse. Say that we want to read that same 256KB of data back; we pull one stripe from drive 0 and the other stripe from drive 1 at the same time. The read is now completed in half the time, theoretically doubling performance.

We are careful to use the word "theoretical" because the performance advantages of RAID-0 disappear quickly if we're not dealing with ideal situations like the ones we just described. If the stripe size is too large, the performance advantages of RAID-0 can be lost, while too small a stripe size can result in excess overhead, reducing the performance improvement of the striped array.
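
One way to see why stripe size matters is to count how many drives a single request actually touches. The sketch below is only an illustration under the same simple round-robin layout assumption; `drives_touched` is a hypothetical helper of our own, and it says nothing about the per-request overhead of any real controller.

```python
def drives_touched(offset, length, stripe_size, num_drives):
    """Count the drives involved in one request (length must be > 0).

    Assumes stripes rotate across drives in order (illustrative assumption).
    """
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    stripes_spanned = last_stripe - first_stripe + 1
    # Consecutive stripes land on different drives until we wrap around.
    return min(stripes_spanned, num_drives)


KB = 1024
# A 64KB request on a 2-drive array, with a large and a small stripe size.
print(drives_touched(0, 64 * KB, 128 * KB, 2))  # fits inside one stripe
print(drives_touched(0, 64 * KB, 16 * KB, 2))   # spans several stripes
```

With 128KB stripes the 64KB request sits on a single drive, so striping gives it no parallelism; with 16KB stripes both drives take part, but the request is broken into four pieces that the controller has to manage.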

We have seen in the past that for most desktop applications, the largest stripe size that a desktop RAID controller offers is usually the best choice for performance. With Intel's ICH5/6, that translates into a 128KB stripe size, which is what we used for our comparison. The other stripe size options didn't offer any better performance for our desktop test suite.

The main downside to RAID-0, other than cost, is reliability. The size of a RAID-0 array is the sum of the capacities of all of its members; so, two 100GB drives in a RAID-0 array will give you one array with a 200GB total capacity. Unfortunately, if you lose any one of the drives in the array, all of your data is lost and isn't recoverable. Since two drives are working in tandem and are both necessary to hold your data, you effectively halve the mean time between failures by moving to a two-drive RAID-0 array.
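
The halving can be worked through with a small Python sketch. It rests on two simplifying assumptions of our own that the article does not state: the drives fail independently, and each has a constant failure rate (an exponential lifetime model). The function names and the 500,000-hour rating are hypothetical values chosen for illustration, not measured figures.

```python
import math


def array_mtbf(drive_mtbf_hours, num_drives):
    """Mean time to the first drive failure in a RAID-0 array.

    Assumes independent drives with a constant failure rate; the failure
    rates then add up, so the mean time divides by the drive count.
    """
    return drive_mtbf_hours / num_drives


def data_loss_probability(drive_mtbf_hours, num_drives, hours):
    """Chance that at least one drive (and so the whole array) fails within `hours`."""
    return 1.0 - math.exp(-num_drives * hours / drive_mtbf_hours)


DRIVE_MTBF = 500_000      # hypothetical rating in hours, for illustration only
ONE_YEAR = 24 * 365

print(array_mtbf(DRIVE_MTBF, 1), array_mtbf(DRIVE_MTBF, 2))
print(data_loss_probability(DRIVE_MTBF, 1, ONE_YEAR))
print(data_loss_probability(DRIVE_MTBF, 2, ONE_YEAR))
```

Under these assumptions the two-drive array's mean time to failure is exactly half that of a single drive, and its chance of data loss over a short period is a little under twice as high.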


126 Comments

  • Pumpkinierre - Thursday, July 08, 2004 - link

    Wrong again, mdrohn, Australia to be exact, but I was using an Oxford dictionary. The hyperdictionary.com (USA based I think as it advertises cheap dental insurance for US residents) gives redundancy as:

    Definition: [n] repetition of an act needlessly
    [n] the attribute of being superfluous and unneeded; "the use of industrial robots created redundancy among workers"
    [n] (electronics) a system design that duplicates components to provide alternatives in case one component fails
    [n] repetition of messages to reduce the probability of errors in transmission

    Which agrees with both our definitions. Yours is more correct electronically. Hyperdictionary has a more extensive electronic description but doesnt add much more to the above electronic definition:
    http://www.hyperdictionary.com/dictionary/redundan...

    I'm at odds with that electronic meaning of redundancy. After all the language came before the electronics.
  • mdrohn - Thursday, July 08, 2004 - link

    Heh, I guess that means you are writing from Britain, Pumpkinierre. That special meaning of 'redundancy' in the workplace context of someone losing their job is unknown here in the USA. In fact I'd never heard it in my life until watching 'The Office' on DVD last month ;) We call that layoffs or downsizing here.

    The electronics/systems meaning I posted was also taken straight from a dictionary.
  • Pumpkinierre - Thursday, July 08, 2004 - link

    Your job becomes redundant when you are no longer of use. You then get a redundancy payout based on the years worked etc.. So I think they initially used the term to describe drives that were no longer of use as they had been replaced by newer bigger drives. My dictionary has redundant as meaning superfluous which is more like your definition, #100, and I suppose you could regard a backup drive as such....until your main drive lets go. So, I dont like the usage of redundancy for duplexed or mirrored drives.
  • mdrohn - Wednesday, July 07, 2004 - link

    "Redundancy means of no further use."

    Actually, 'redundant' more precisely means 'exceeding what is required' or 'exactly duplicating the function or meaning of another', which is an important distinction.

    'Redundancy' in an electronics or systems context means 'incorporating extra components that perform the same function in order to cope with failures and errors'. Thus RAID 0 is not, strictly speaking, a 'redundant' array of disks despite its RAID name, since every drive in RAID 0 records different data. RAID 1 is classic redundancy--all the drives in a RAID 1 array are reading and writing exactly the same data.
  • Pumpkinierre - Wednesday, July 07, 2004 - link

    I dont know about that latency increase with RAID. Seek times dont seem to be much affected in reviews I've seen. If the controller reads the drives simultaneously then there shouldnt be much effect on latency.

    (will it make it to the 6th page?!)
  • masher - Wednesday, July 07, 2004 - link

    > I'm fairly certain that the performance
    > advantages of having 4 or 5 striped drives are
    > likely to be a lot better than just 2...

    No, not for ATA drives. You're still limited to the max bandwidth of 133MB/s (150 for some SATA implementations), so beyond 3 drives you don't get the full transfer rate of each drive. Plus, the latency gets worse the more drives you add...with 5 or more drives, your mean latency is essentially your max latency of any single drive.

    So a larger array is a repeat of the 2-drive situation. Its much faster in the rare case of a disk-bound app transferring huge files...and no faster (or possibly slightly slower) the rest of the time.

    You are right on one thing though. Cheaper (translation: slower) disks would tend to look a bit better here. Not a huge difference, but the slower the disk, the more likely the app is to be disk-bound.
  • kapowaz - Wednesday, July 07, 2004 - link

    Perhaps the review ought to have pointed out what RAID stands for: Redundant Array of *Inexpensive* Disks. The idea is to improve the performance or reliability of a system by using many smaller/cheaper disks compared to using a single expensive disk. Often this isn't the case (I doubt anyone would say the 15krpm disks in modern servers are 'inexpensive'), but the origin behind the technology remains applicable today.

    Maybe a better test would be to take some cheap disks and see how well they perform. Also, am I not right in thinking that SATA RAID allows for more than just two devices? I'm fairly certain that the performance advantages of having 4 or 5 striped drives are likely to be a lot better than just 2...
  • masher - Wednesday, July 07, 2004 - link

    Umm, MadAd...The "array" of drives can't fail unless a drive in it fails...and it will always fail if one of its drives does. Its just a logical grouping, not a separate entity.

    Furthermore, MTBF is the wrong statistic to use here. MTTF is the relevant one.

    Obviously the raid controller itself could fail, but this is outside the scope of the argument. And such a failure is highly unlikely to impact data in any case.
  • MadAd - Tuesday, July 06, 2004 - link

    "But when we are talking about an ARRAY of drives, the operating life of each individual drive in the array is not what is at issue. What is relevant is ARRAY failure, not DRIVE failure."

    Are you also trying to say that if the array fails without a drive failing then thats still down to the drives MTBF? wouldnt that be an array MTBF?

    If a drive fails in Raid0 then of course we expect the array will fail. If a drive does not fail but the array fails (and you can reuse the drive) then thats nothing to do with the drives MTBF is it? The drives not failed, its still got service life, the array failed. You'll need a different way to measure the chance of an array failure since (unless its connected with a drive failure) its nothing to do with the expected longevity of the components that we measure by drive manufacturers MTBF figures of service life.
  • Pumpkinierre - Tuesday, July 06, 2004 - link

    Yes that's correct #93. Its the data that counts. The probability of failure of one drive in a Raid0 OR Raid1 over a given period is the same. For two drive Raids, this is double the probability of a single drive failure over the same period if all drives are the same at start of functioning(ie same prob. of failure). In Raid0, probability of LOSS OF DATA corresponds to this doubled single drive failure probability. However, in Raid1, the parameters change. Here, it is the probability of both drives failing on the SAME day in a given period (assuming backup can be completed in a day). This probability is much, much lower than a single HDD or Raid0 data loss probability which is ANY day of a given period.
    This makes Raid1 the superior Raid for desktop use despite the apparent loss of capacity. With cheap 160GB around, I dont think that's a problem (I got a 120 and its not a third full and I dont backup because I'm lazy and evil). Read requests in Raid1 ought to be faster than Raid0 as variable size virtual striping could be carried out on this raid format. Unfortunately, they used to stripe Raid1 but dont anymore relegating it to the duplexing or mirroring role. Reads apparently are only improved in modern Raid1 when a simultaneous multiple read requests are initiated. Here the controller's extra buffering and ability to read the Raid drives simultaneously at different locations helps out. Once again good for Servers where this is a common requirement but not good for desktops where a striped read would be of far greater use for the speed it brings. We really need Arnie on this one- the broom and the Gatling!

    Redundancy means of no further use. A backup drive isnt of no use. So redundancy doesnt mean backup despite how some people use the term to describe Raid1. RAID which stands (I teenk) for Redundant Array of Independent Drives was initially a method of combining older (hence smaller) drives into one big drive. That saved them from being thrown out ie redundant.
