Western Digital's Raptors in RAID-0: Are two drives better than one?
by Anand Lal Shimpi on July 1, 2004 12:00 PM EST
Doubling Theoretical Performance: RAID-0
For those of you who are already familiar with RAID and how it works, go ahead and skip to the benchmarks; these next two pages are designed to serve as brief introductions to the two most common forms of RAID on the desktop: RAID-0 and RAID-1.
Otherwise known as striping, RAID-0 is the only performance-enhancing form of RAID that we'll be talking about in this article. The premise behind striping is simple. Data being written to the array is split into "stripes", generally 16KB to 256KB in size, with consecutive stripes being written to alternating drives in the array. For example, say we were dealing with a 2-drive RAID-0 array with a stripe size of 128KB and we wanted to write 256KB of data; drive 0 would get the first 128KB of data written to it, and drive 1 would get the remaining 128KB.
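The mapping from data to drives described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (the write starts at the beginning of the array and stripes are handed out round-robin); the function name `stripe_layout` is made up for this example and does not correspond to any controller's actual firmware or API.

```python
def stripe_layout(total_bytes, stripe_size, num_drives):
    """Split a write starting at logical offset 0 into per-drive chunks.

    Returns a list of (drive_index, offset_on_drive, length) tuples.
    Hypothetical helper: stripes are assigned round-robin across drives.
    """
    chunks = []
    logical = 0
    while logical < total_bytes:
        stripe_number = logical // stripe_size   # which stripe this data falls in
        drive = stripe_number % num_drives       # round-robin drive choice
        row = stripe_number // num_drives        # how many stripes deep on that drive
        length = min(stripe_size, total_bytes - logical)
        chunks.append((drive, row * stripe_size, length))
        logical += length
    return chunks


KB = 1024
# The example from the text: 256KB written to a 2-drive array with 128KB stripes.
for drive, offset, length in stripe_layout(256 * KB, 128 * KB, 2):
    print(f"drive {drive}: {length // KB}KB at drive offset {offset // KB}KB")
```

With the values from the example, each drive receives a single 128KB chunk; a larger write simply keeps alternating between the drives.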
Writing to a single hard disk
Writing to a two-disk RAID-0 array
Here, you can see that the write performance of RAID-0 can be almost double that of a single drive, since twice as much data gets written at the same time. The higher write performance is obtained at the expense of some controller overhead, since the RAID controller has to split the data into stripes before sending it to the drives themselves - but with modern-day microprocessors being as fast as they are, the overhead is usually thought of as negligible.
Reading works the exact same way, but in reverse. Say that we want to read that same 256KB of data back; we pull one stripe from drive 0 and the other stripe from drive 1. The read is now completed in half the time, theoretically doubling performance.
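To make the "half the time" claim concrete, here is a rough model of an ideal transfer. It is a sketch under stated assumptions, not a measurement: the 60MB/s sustained rate is a made-up figure, the drives are assumed to work fully in parallel, and seek time, rotational latency and controller overhead are all ignored. The name `ideal_transfer_time` is hypothetical.

```python
import math


def ideal_transfer_time(total_bytes, stripe_size, num_drives, drive_rate):
    """Seconds for an idealized transfer across a striped array.

    Assumes drives run fully in parallel at drive_rate bytes/second and
    ignores seek time, rotational latency and controller overhead.
    """
    stripes = math.ceil(total_bytes / stripe_size)
    per_drive_bytes = [0] * num_drives
    remaining = total_bytes
    for s in range(stripes):
        length = min(stripe_size, remaining)
        per_drive_bytes[s % num_drives] += length  # round-robin assignment
        remaining -= length
    # The transfer finishes when the busiest drive finishes.
    return max(per_drive_bytes) / drive_rate


KB = 1024
RATE = 60e6  # assumed sustained rate in bytes/second, purely illustrative

single = ideal_transfer_time(256 * KB, 128 * KB, 1, RATE)
raid0 = ideal_transfer_time(256 * KB, 128 * KB, 2, RATE)
print(f"256KB request speedup: {single / raid0:.1f}x")

# A request smaller than one stripe only ever touches one drive.
small_single = ideal_transfer_time(64 * KB, 128 * KB, 1, RATE)
small_raid0 = ideal_transfer_time(64 * KB, 128 * KB, 2, RATE)
print(f"64KB request speedup: {small_single / small_raid0:.1f}x")
```

In this model the 256KB request is split evenly and finishes in half the time, while the 64KB request lands entirely on one drive and gains nothing from the second.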
We are careful to use the word "theoretical" because the performance advantages of RAID-0 disappear quickly once we leave ideal situations like the ones we just described. If the stripe size is too large, most requests fit within a single stripe and end up being serviced by one drive, so the advantage of RAID-0 is lost; if the stripe size is too small, the extra overhead of splitting requests into many pieces reduces the performance improvement of the striped array.
We have seen in the past that for most desktop applications, the largest stripe size that a desktop RAID controller will offer is usually the best choice for performance. With Intel's ICH5/6, that translates into a 128KB stripe size, which for our comparison is what we decided to go with. The other stripe size options didn't offer any better performance for our desktop test suite.
The main downside to RAID-0, other than cost, is reliability. The size of a RAID-0 array is the sum of all of its members; so, two 100GB drives in a RAID-0 array will give you one array with a 200GB total capacity. Unfortunately, if you lose any one of the drives in the array, all of your data is lost and isn't recoverable. Since two drives are working in tandem and are both necessary to hold your data, you effectively halve the mean time between failures by moving to a two-drive RAID-0 array.
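The capacity and reliability arithmetic above can be written out as a small calculation. This is a simplified sketch: it assumes equal-size drives that fail independently at a constant rate, in which case the failure rates of the members add and the array's mean time to failure is the single-drive figure divided by the number of drives. The 1,000,000-hour drive rating is a placeholder, not a figure from this article, and the function names are hypothetical.

```python
def raid0_capacity_gb(drive_size_gb, num_drives):
    """Total capacity of a RAID-0 array built from equal-size drives."""
    return drive_size_gb * num_drives


def raid0_mttf_hours(drive_mttf_hours, num_drives):
    """Mean time to failure of the whole array.

    Assumes independent drives with a constant failure rate. The array
    is lost when ANY member fails, so the members' failure rates add.
    """
    array_failure_rate = num_drives * (1.0 / drive_mttf_hours)
    return 1.0 / array_failure_rate


DRIVE_MTTF = 1_000_000  # hours; placeholder value for illustration only

print(f"capacity: {raid0_capacity_gb(100, 2)}GB")
print(f"single drive: {raid0_mttf_hours(DRIVE_MTTF, 1):,.0f} hours")
print(f"two-drive RAID-0: {raid0_mttf_hours(DRIVE_MTTF, 2):,.0f} hours")
```

Under these assumptions the two-drive array has twice the capacity and half the expected time to failure of a single drive, which is the trade-off the paragraph above describes.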
timw - Thursday, July 8, 2004
This isn't really anything new. As someone else mentioned, seek time and cache size with the right firmware optimizations are the most important. RAID 0 won't be able to improve that, and may actually be slower than a single drive in many instances. If you don't believe what Anandtech has to say, take a look at the latest article at storagereview.com.
Pumpkinierre - Thursday, July 8, 2004
Wrong again, mdrohn, Australia to be exact, but I was using an Oxford dictionary. The hyperdictionary.com (USA based I think as it advertises cheap dental insurance for US residents) gives redundancy as:
Definition: [n] repetition of an act needlessly
[n] the attribute of being superfluous and unneeded; "the use of industrial robots created redundancy among workers"
[n] (electronics) a system design that duplicates components to provide alternatives in case one component fails
[n] repetition of messages to reduce the probability of errors in transmission
Which agrees with both our definitions. Yours is more correct electronically. Hyperdictionary has a more extensive electronic description but doesn't add much more to the above electronic definition:
I'm at odds with that electronic meaning of redundancy. After all the language came before the electronics.
mdrohn - Thursday, July 8, 2004
Heh, I guess that means you are writing from Britain, Pumpkinierre. That special meaning of 'redundancy' in the workplace context of someone losing their job is unknown here in the USA. In fact I'd never heard it in my life until watching 'The Office' on DVD last month ;) We call that layoffs or downsizing here.
The electronics/systems meaning I posted was also taken straight from a dictionary.
Pumpkinierre - Thursday, July 8, 2004
Your job becomes redundant when you are no longer of use. You then get a redundancy payout based on the years worked etc. So I think they initially used the term to describe drives that were no longer of use as they had been replaced by newer bigger drives. My dictionary has redundant as meaning superfluous which is more like your definition, #100, and I suppose you could regard a backup drive as such....until your main drive lets go. So, I don't like the usage of redundancy for duplexed or mirrored drives.
mdrohn - Wednesday, July 7, 2004
"Redundancy means of no further use."
Actually, 'redundant' more precisely means 'exceeding what is required' or 'exactly duplicating the function or meaning of another', which is an important distinction.
'Redundancy' in an electronics or systems context means 'incorporating extra components that perform the same function in order to cope with failures and errors'. Thus RAID 0 is not, strictly speaking, a 'redundant' array of disks despite its RAID name, since every drive in RAID 0 records different data. RAID 1 is classic redundancy--all the drives in a RAID 1 array are reading and writing exactly the same data.
Pumpkinierre - Wednesday, July 7, 2004
I don't know about that latency increase with RAID. Seek times don't seem to be much affected in reviews I've seen. If the controller reads the drives simultaneously then there shouldn't be much effect on latency.
(will it make it to the 6th page?!)
masher - Wednesday, July 7, 2004
> I'm fairly certain that the performance
> advantages of having 4 or 5 striped drives are
> likely to be a lot better than just 2...
No, not for ATA drives. You're still limited to the max bandwidth of 133MB/s (150 for some SATA implementations), so beyond 3 drives you don't get the full transfer rate of each drive. Plus, the latency gets worse the more drives you add...with 5 or more drives, your mean latency is essentially your max latency of any single drive.
So a larger array is a repeat of the 2-drive situation. It's much faster in the rare case of a disk-bound app transferring huge files...and no faster (or possibly slightly slower) the rest of the time.
You are right on one thing though. Cheaper (translation: slower) disks would tend to look a bit better here. Not a huge difference, but the slower the disk, the more likely the app is to be disk-bound.
kapowaz - Wednesday, July 7, 2004
Perhaps the review ought to have pointed out what RAID stands for: Redundant Array of *Inexpensive* Disks. The idea is to improve the performance or reliability of a system by using many smaller/cheaper disks compared to using a single expensive disk. Often this isn't the case (I doubt anyone would say the 15krpm disks in modern servers are 'inexpensive'), but the origin behind the technology remains applicable today.
Maybe a better test would be to take some cheap disks and see how well they perform. Also, am I not right in thinking that SATA RAID allows for more than just two devices? I'm fairly certain that the performance advantages of having 4 or 5 striped drives are likely to be a lot better than just 2...
masher - Wednesday, July 7, 2004
Umm, MadAd...The "array" of drives can't fail unless a drive in it fails...and it will always fail if one of its drives does. It's just a logical grouping, not a separate entity.
Furthermore, MTBF is the wrong statistic to use here. MTTF is the relevant one.
Obviously the raid controller itself could fail, but this is outside the scope of the argument. And such a failure is highly unlikely to impact data in any case.
MadAd - Tuesday, July 6, 2004
"But when we are talking about an ARRAY of drives, the operating life of each individual drive in the array is not what is at issue. What is relevant is ARRAY failure, not DRIVE failure."
Are you also trying to say that if the array fails without a drive failing then that's still down to the drive's MTBF? Wouldn't that be an array MTBF?
If a drive fails in RAID 0 then of course we expect the array will fail. If a drive does not fail but the array fails (and you can reuse the drive) then that's nothing to do with the drive's MTBF, is it? The drive's not failed, it's still got service life, the array failed. You'll need a different way to measure the chance of an array failure since (unless it's connected with a drive failure) it's nothing to do with the expected longevity of the components that we measure by drive manufacturers' MTBF figures of service life.