Disk strategies

With magnetic disks, there are two strategies to get good OLTP or mail server performance. The "traditional way" is to combine a number of 15000RPM SAS "spindles", all working in parallel. The more "rebellious way" or "Google way" is to use a vast number of cheaper SATA drives. This last strategy is based on the observation that although SATA drives come with higher access times, you can buy more SATA spindles than SAS spindles for the same price. While Google opted for desktop drives, we worked with what we had in the lab: 16 enterprise 1TB Western Digital drives. Since these are one of the fastest 7200RPM drives that can be found on the market, it should give you a good idea what an array with lots of SATA drives can do compared to one with fewer fast spinning SAS drives.

SSDs add a new strategy: if space is not your primary problem, you can trade in storage space for huge amounts of random I/O operations per second, requiring fewer but far more expensive drives to obtain the same performance. SSDs offer superb read access times but slightly less impressive write access times.

As Anand has pointed out, a cheap SSD controller can really wreak havoc on writing performance, especially in a server environment where many requests are issued in parallel. EMC solved this with their high-end Enterprise Flash Disks, produced by STEC, which can store up to 400GB and come with a controller with excellent SRAM caches and a super capacitor. The super capacitor enables the controller to empty the relatively large DRAM caches and write the date to the flash storage in the event of a sudden power failure.

Intel went for the midrange market, and gave its controller less cache (16MB). The controller is still intelligent and powerful enough to crush the competition with the cheap JMicron JMF602-controllers. We check out the SLC version, the Intel X25-E SLC 32GB.

The newest Intel Solid State Disks with their access times of 0.075 ms and 0.15W power consumption could change the storage market for OLTP databases. However, the SLC drives have a few disadvantages compared to the best SAS drives out there:

  • No dual ports
  • The price per GB is 13 times higher

You can see the summary in the table below.

Enterprise Drive Pricing
Drive Interface Capacity Pricing Price per GB
Intel X25-E SLC SATA 32GB $415-$470 $13
Intel X25-E SLC SATA 64GB $795-$900 $12
Seagate Cheetah 15000RPM SAS 300GB $270-$300 $0.90
Western Digital 1000FYPS SATA 1000GB $190-$200 $0.19

If you really need capacity, SATA or even SAS drives are probably the best choice. On the other hand, if you need spindles to get more I/O per second, it will be interesting to see how a number of SAS or SATA drives compares to the SLC drives. The most striking advantages of the Intel X25-E SLC drive are extremely low random access times, almost no power consumption at idle, low power consumption at full load, and high reliability.

Enterprise Drive Specifications
Drive Read Access Time Write Access Time Idle Power Full Power MTBF
(hours)
Intel X25-E SLC 32GB 0.075 ms 0.085 ms 0.06 W 2.4 W 2 million
Intel X25-E SLC 64GB 0.075 ms 0.085 ms 0.06 W 2.6 W 2 million
Seagate Cheetah 15000RPM 5.5 ms (*) 6 ms 14.3 W 17 W 1.4 million
Western Digital 1000FYPS 13 ms (**) n/a 4 W 7.4 W 1.2 million

(*) 5.5 ms = 3.5 ms seek time + 2 ms latency (rotation)
(**) 13 ms = 8.9 ms seek time + 4.1 ms latency (rotation)

Reliability testing is outside the scope of this document, but if only half of the Intel claims are true, the x25-E SLC drives will outlive the vast majority of magnetic disks. First is the 2 million MTBF specification, which is far better than the best SAS disks on the market (1.6 million hour MTBF). Intel also guarantees that if the X25-E performs 7000 8KB random access per second, consisting of 66% reads and 33% writes, the drive will continue to do so for 5 years! That is 2.9TB of written data per day, and it can sustain this for about 1800 days. That is simply breathtaking as no drive has to sustain that kind of IOPS 24 hours per day for such a long period.

Index Configuration and Benchmarking Setup
POST A COMMENT

66 Comments

View All Comments

  • marraco - Wednesday, March 25, 2009 - link

    The comparison is not fair, but can be fairer:

    If the RAID of SATA/SAS disks is restricted to the same storage capacity than the SSD, limiting the partition to the fastest external tracks/cilynders, the latency is significantly reduced, and average read/write speed is significantly increased, so

    PLEASE, PLEASE, PLEASE

    Repeat the benchmarcks, but with short stroking for magnetic disks.
    Reply
  • JohanAnandtech - Friday, March 27, 2009 - link

    May I ask what the difference with the fact that we created a relatively small partition across our RAID-5 raidset? Also, you can imagine that our 23 GB database was at the outer tracks of the disks. I have to verify, but that seems logical.

    This kind of testing should give the same effects as short stroking. I personally think Short stroking can not be good for your actuator, while a small partition should be no problem.
    Reply
  • marraco - Friday, March 27, 2009 - link

    See this link.
    http://www.tomshardware.com/reviews/short-stroking...">http://www.tomshardware.com/reviews/short-stroking...

    Clearly, you results are orders of magnitude than those showed on that benchmark.

    As I understand, short stroking increase actuator health, because reduces physical acceleration on the actuator.

    Anything necessary, is to use a small partition on the fastest external track.

    you utilized a raid 0 of 16 disks, with less than 1000 gb/second.

    On Tomshardware, a raid of only 4 disk achieved average (not maximun) 1400 to 1600 Mb/s. (of course, the test are not the same; for that reason, I ask for new test)

    About the RAID 5: I would love to see RAID 0.

    I are interesed on comparing a fast SSD as the intels, (or OCZ Vostro/Summit), with what can be achieved at the same cost, with magnetic media, if the partition size is restricted to the same total capacity than the SSD.

    Anyway, thanks for the article. Good work.

    So good, I want to see more :)
    Reply
  • marraco - Sunday, April 05, 2009 - link

    Please, tell me you are preparing such article :) Reply
  • JohanAnandtech - Tuesday, April 07, 2009 - link

    We are investigating the issue. I like to have some second opinions before I start heavy benchmarking on THG article. They tend to be sensational... Reply
  • araczynski - Wednesday, March 25, 2009 - link

    wow, color me impressed. all the more reason to upgrade everything to gigabit and fiber. Reply
  • BailoutBenny - Tuesday, March 24, 2009 - link

    Can we get any updates on the future of chalcogenide glass (phase change) based drive technologies? IBM's Millipede and other MEMS probe storage devices? Any word about Intel and STMicroelectronics' shipments of PRAM samples to customers that happened last year? What do the rumor mills say? Are these technologies proving viable? It is difficult to formulate a coherent picture for these technologies without being an industry insider. Reply
  • Black Jacque - Tuesday, March 24, 2009 - link

    RAID 5 in Action

    ... However, it is rarely if ever used for any serious application.

    You are obviously not a SAN Admin or know too much about enterprise level storage.

    RAID 5 is the mainstay of block-level storage systems by companies like EMC.

    In addition, the article mentions STEC EFDs used by EMC. On the EMC CLARiiON line, those EFDs are provisioned in RAID 5 groups.


    Reply
  • spikespiegal - Wednesday, March 25, 2009 - link

    [quote]RAID 5 is the mainstay of block-level storage systems by companies like EMC. [/quote]

    Which thus explains why in this day in age I see so many SANs blowing entire volumes and costing days of restoration when the room temp gets a few degrees above ambient.

    Corrupted RAID 5 arrays have cost me more lost enterprise data than all the non-RAID client side disks I've ever replaced; iSeries, all brands of x386, etc. EMC has a great script to account for this in which they always blame the drives first, then only when cornered by an enraged CIO will they admit it's their controllers. Been there...done that...for over a decade in many different industries.

    If you haven't been burned by RAID 5, or dare claim a drive controller in RAID 5 mode has a better MTBF than the drives it's hosting, then it's time to quite your day job at the call center in India. RAID 5 saves you the cost of one drive every four, which was logical in 1998 but not today. At least span across multiple redundant controllers in RAID 10 or something....
    Reply
  • JohanAnandtech - Tuesday, March 24, 2009 - link

    I fear you misread that sentence:

    "RAID 0 is good way to see how adding more disks scales up your writing and reading performance. However, it is rarely if ever used for any serious application."

    So we are talking about RAID-0 not RAID-5.
    http://it.anandtech.com/IT/showdoc.aspx?i=3532&...">http://it.anandtech.com/IT/showdoc.aspx?i=3532&...

    Reply

Log in

Don't have an account? Sign up now