  • shady28 - Sunday, November 15, 2009 - link

    I would really have liked to see single-drive performance of 15K SAS drives vs. SSDs. The cost of a SAS controller ($60) plus a 15K 150GB drive ($110-$160) is less than any of the high-end SSDs, and about the same as a low-end SSD. A 15K drive is a viable option, but it is very difficult to tell which is the best choice when all the results are RAID configurations and database IOPS.
  • newriter27 - Tuesday, May 05, 2009 - link

    What was the Queue Depth setting used with IOmeter? Was it maintained consistently?

    Also, how come no response times?

  • mikeblas - Friday, April 17, 2009 - link

    Intel has posted a firmware upgrade for their SSD drives which tries to address the write leveling problem. The patch improves matters, somewhat, but the overall performance level from the drives is still completely unacceptable for production applications.

    You can find it here:
  • Lifted - Sunday, April 12, 2009 - link

    I like it!
  • turrican2097 - Monday, March 30, 2009 - link

    Please mention or correct this in your article.
    1) You should mention that the price per GB is 65x higher than that of the 1TB drives, since you chose to include them.
    2) Your WD is a poor-performing 5400RPM Green Power drive:
    3) If you make such a strong point of how much faster SSDs are than platters, you can't pick the best SSD and then use the hard drives you happen to have lying around the lab. Pick Velociraptors or WD RE3 7200RPM and then Seagate 15K7.

    Thank you
  • mutantmagnet - Monday, April 06, 2009 - link

    It's irrelevant. Raptors don't outperform SAS which are better in terms of performance for the GB paid for. There's no need to belittle them when they are clearly aware of the type of point you are making and went beyond it.

    So far I've found these recent SSD articles to be a fun and worthwhile read; and the comments have been invaluable, even if some people sound a little too aggressive in making their points.
  • virtualgeek - Friday, March 27, 2009 - link

    Just wanted to point this out - we are now shipping these 200GB and 400GB SLC-based STEC drives in EMC Symmetrix, CLARiiON and Celerra. These are the 2nd full generation of EFDs.

    Gang - this IS the future of performance-oriented storage (not implying it will be EMC-unique - it won't be - everyone will do it - from the high end to the low end) - only a matter of time (we're currently at the point where they are 1/3 the acquisition cost to hit a given IOPS workload - and they have dropped by a factor of 4x in ONE YEAR).

    With Intel and Samsung entering the market full force - the price/performance/capacity curve will continue to accelerate.
  • ms0815 - Friday, March 27, 2009 - link

    Since modern graphics cards crack passwords more than 10 times faster than a CPU, wouldn't they also make great RAID controllers with their massively parallel design?
  • Casper42 - Thursday, March 26, 2009 - link

    I would have liked to have seen 2 additional drives tossed into the mix on this one.

    1) The Intel X25-M - Because I think it would serve as a good middleground between the SAS Drives and the E model. Cheaper/GB but still gets you a much faster Random Read result and I'm sure a slightly faster Random Write as well.

    2) 2.5" SAS Drives - Because mainstream servers like HP and Dell seem to be going more and more this direction. I don't know many Fortune 500s using Supermicro. 2.5" SAS goes up to 72GB for 15K and 300GB for 10K currently. Though I am hearing that 144GB 15K models are right around the corner.

    Thanks for an interesting article!
  • MrSAballmer - Thursday, March 26, 2009 - link

    SDS with ATA!
  • marraco - Wednesday, March 25, 2009 - link

    The comparison is not fair, but can be fairer:

    If the RAID of SATA/SAS disks is restricted to the same storage capacity as the SSD, by limiting the partition to the fastest outer tracks/cylinders, latency is significantly reduced and average read/write speed is significantly increased. So:

    Repeat the benchmarks, but with short stroking for the magnetic disks.
  • JohanAnandtech - Friday, March 27, 2009 - link

    May I ask how that differs from what we did, which was to create a relatively small partition across our RAID-5 raidset? Also, you can imagine that our 23 GB database was on the outer tracks of the disks. I have to verify, but that seems logical.

    This kind of testing should have the same effect as short stroking. I personally think short stroking cannot be good for your actuator, while a small partition should be no problem.
  • marraco - Friday, March 27, 2009 - link

    See this link.

    Clearly, your results differ by orders of magnitude from those shown in that benchmark.

    As I understand it, short stroking improves actuator health, because it reduces the physical acceleration on the actuator.

    All that is necessary is to use a small partition on the fastest outer tracks.

    You used a RAID 0 of 16 disks and got less than 1000 MB/s.

    On Tomshardware, a RAID of only 4 disks achieved an average (not maximum) of 1400 to 1600 Mb/s. (Of course, the tests are not the same; for that reason, I am asking for a new test.)

    About the RAID 5: I would love to see RAID 0.

    I am interested in comparing a fast SSD such as the Intels (or OCZ Vertex/Summit) with what can be achieved at the same cost with magnetic media, if the partition size is restricted to the same total capacity as the SSD.

    Anyway, thanks for the article. Good work.

    So good, I want to see more :)
  • marraco - Sunday, April 05, 2009 - link

    Please tell me you are preparing such an article :)
  • JohanAnandtech - Tuesday, April 07, 2009 - link

    We are investigating the issue. I'd like to have some second opinions before I start heavy benchmarking based on a THG article. They tend to be sensational...
  • araczynski - Wednesday, March 25, 2009 - link

    wow, color me impressed. all the more reason to upgrade everything to gigabit and fiber.
  • BailoutBenny - Tuesday, March 24, 2009 - link

    Can we get any updates on the future of chalcogenide glass (phase change) based drive technologies? IBM's Millipede and other MEMS probe storage devices? Any word about Intel and STMicroelectronics' shipments of PRAM samples to customers that happened last year? What do the rumor mills say? Are these technologies proving viable? It is difficult to formulate a coherent picture for these technologies without being an industry insider.
  • Black Jacque - Tuesday, March 24, 2009 - link

    RAID 5 in Action

    ... However, it is rarely if ever used for any serious application.

    You are obviously not a SAN admin, nor do you know much about enterprise-level storage.

    RAID 5 is the mainstay of block-level storage systems by companies like EMC.

    In addition, the article mentions STEC EFDs used by EMC. On the EMC CLARiiON line, those EFDs are provisioned in RAID 5 groups.

  • spikespiegal - Wednesday, March 25, 2009 - link

    [quote]RAID 5 is the mainstay of block-level storage systems by companies like EMC. [/quote]

    Which thus explains why in this day and age I see so many SANs blowing entire volumes and costing days of restoration when the room temp gets a few degrees above ambient.

    Corrupted RAID 5 arrays have cost me more lost enterprise data than all the non-RAID client side disks I've ever replaced; iSeries, all brands of x386, etc. EMC has a great script to account for this in which they always blame the drives first, then only when cornered by an enraged CIO will they admit it's their controllers. Been there...done that...for over a decade in many different industries.

    If you haven't been burned by RAID 5, or dare claim a drive controller in RAID 5 mode has a better MTBF than the drives it's hosting, then it's time to quit your day job at the call center in India. RAID 5 saves you the cost of one drive in every four, which was logical in 1998 but not today. At least span across multiple redundant controllers in RAID 10 or something....
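
    To put numbers on the capacity trade-off between the two layouts, here is a minimal Python sketch. It only does the textbook arithmetic (one drive's worth of parity for RAID 5, half the drives for RAID 10); the function names are made up for illustration, and it says nothing about controller reliability, which is the real complaint above.

```python
def usable_drives(level: str, n: int) -> int:
    """Return how many drives' worth of capacity an n-drive array exposes."""
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n - 1  # one drive's worth of space holds parity
    if level == "raid10":
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return n // 2  # every drive is mirrored
    raise ValueError(f"unknown RAID level: {level}")


def overhead_fraction(level: str, n: int) -> float:
    """Fraction of raw capacity spent on redundancy."""
    return 1 - usable_drives(level, n) / n


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, usable_drives("raid5", n), usable_drives("raid10", n))
```

    With four drives, RAID 5 exposes three drives of capacity and RAID 10 exposes two, so the saving is one drive per four-drive group.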
  • JohanAnandtech - Tuesday, March 24, 2009 - link

    I fear you misread that sentence:

    "RAID 0 is good way to see how adding more disks scales up your writing and reading performance. However, it is rarely if ever used for any serious application."

    So we are talking about RAID-0 not RAID-5.

  • Rasterman - Monday, March 23, 2009 - link

    Since the controller is the bottleneck for SSDs and you have very fast CPUs, did you try testing a full software RAID array, leaving the controllers out of it altogether?
  • Snarks - Sunday, March 22, 2009 - link

    reading the comments made my brain asplode D:!

    Damn it, it's way too late for this!
  • pablo906 - Saturday, March 21, 2009 - link

    I've loved the stuff you put out for a long, long time. This is another piece of quality work. I definitely appreciate the work you put into this stuff. I was thinking about how I was going to build the storage back end for a small/medium virtualization platform, and this is definitely swaying some of my previous ideas. It really seems like an EMC enclosure may be in our future instead of something built by me on a 24-port Areca card.

    I don't know what all the hubbub was about at the beginning of the article, but I can tell you that I got what I needed. I'd like to see some follow-ups on server storage and definitely more RAID 6 info. Any chance you can do some serious RAID card testing? That enclosure you have is perfect for it (I've built some pretty serious storage solutions out of those and 24-port Areca cards), and I'd really like to see different cards and different configurations, numbers of drives, array types, etc. tested.
  • rbarone69 - Friday, March 20, 2009 - link

    Great work on these benchmarks. I have found very few other sources that provided me with the answers to my questions regarding exactly what you tested here (DETAILED ENOUGH FOR ME). This report will be referenced when we size some of the smaller (~40-50GB but heavily read) central databases we run within our enterprise.

    It saddens me to see people that simply will NEVER be happy, no matter what you publish to them for no cost to them. Fanatics have their place but generally cost organizations much more than open minded employees willing to work with what they have available.
  • JohanAnandtech - Saturday, March 21, 2009 - link

    Thanks for your post. A "thumbs up" post like yours is the fuel that Tijl and I need to keep going :-). Definitely appreciated!

  • classy - Friday, March 20, 2009 - link

    Nice work, and no question SSDs are truly great performers, but I don't see them being mainstream in the enterprise world for several more years. One, no one knows how reliable they are; they are not tried and tested. Two and three go hand in hand: capacity and cost. With the need for more and more storage, the cost of SSDs makes them somewhat of a one-trick pony: a lot of speed, but cost prohibitive. Just at our company we are looking at a separate data domain just for storage. When you start talking about the need for several terabytes, SSD just isn't going to be considered. It's the future, but until they drastically drop in cost and increase in capacity, their adoption will be minimal at best. I don't think speed trumps capacity in the enterprise world right now.
  • virtualgeek - Friday, March 27, 2009 - link

    They are well past being "untried" in the enterprise - and we are now shipping 400GB SLC drives.
  • gwolfman - Friday, March 20, 2009 - link

    [quote]Our Adaptec controller is clearly not taking full advantage of the SLC SSD's bandwidth: we only see a very small improvement going from four to eight disks. We assume that this is a SATA related issue, as eight SAS disks have no trouble reaching almost 1GB/s. This is the first sign of a RAID controller bottleneck.[/quote]
    I have an Adaptec 3805 (the generation before the one you used) that I used to test four of OCZ's first SSDs when they came out, and I noticed this same issue. I went through a lengthy support ticket cycle and got little help and no explanation. I was left thinking it was the firmware, as 2 SAS drives had a higher throughput than the 4 SSDs.
  • supremelaw - Friday, March 20, 2009 - link

    For the sake of scientific inquiry primarily, but not exclusively,
    another experimental "permutation" I would also like to see is
    a comparison of:

    (1) 1 x8 hardware RAID controller in a PCI-E 2.0 x16 slot

    (2) 1 x8 hardware RAID controller in a PCI-E 1.0 x16 slot

    (3) 2 x4 hardware RAID controllers in a PCI-E 2.0 x16 slot

    (4) 2 x4 hardware RAID controllers in a PCI-E 1.0 x16 slot

    (5) 2 x4 hardware RAID controllers in a PCI-E 2.0 x4 slot

    (6) 2 x4 hardware RAID controllers in a PCI-E 1.0 x4 slot

    (7) 4 x1 hardware RAID controllers in a PCI-E 2.0 x1 slot

    (8) 4 x1 hardware RAID controllers in a PCI-E 1.0 x1 slot

    * if x1 hardware RAID controllers are not available,
    then substitute x1 software RAID controllers instead,
    to complete the experimental matrix.

    If the controllers are confirmed to be the bottlenecks
    for certain benchmarks, the presence of multiple I/O
    processors -- all other things being more or less equal --
    should tell us that IOPs generally need more horsepower,
    particularly when solid-state storage is being tested.

    Another limitation to face is that x1 PCI-E RAID controllers
    may not work in multiples installed in the same motherboard
    e.g. see Highpoint's product here:

    Now, add different motherboards to the experimental matrix
    above, because different chipsets are known to allocate
    fewer PCI-E lanes even though slots have mechanically more lanes
    e.g. only x4 lanes actually assigned to an x16 PCI-E slot.


  • supremelaw - Friday, March 20, 2009 - link

    More complete experimental matrix (see shorter matrix above):

    (1) 1 x8 hardware RAID controller in a PCI-E 2.0 x16 slot

    (2) 1 x8 hardware RAID controller in a PCI-E 1.0 x16 slot

    (3) 2 x4 hardware RAID controllers in a PCI-E 2.0 x16 slot

    (4) 2 x4 hardware RAID controllers in a PCI-E 1.0 x16 slot

    (5) 1 x8 hardware RAID controller in a PCI-E 2.0 x8 slot

    (6) 1 x8 hardware RAID controller in a PCI-E 1.0 x8 slot

    (7) 2 x4 hardware RAID controllers in a PCI-E 2.0 x8 slot

    (8) 2 x4 hardware RAID controllers in a PCI-E 1.0 x8 slot

    (9) 2 x4 hardware RAID controllers in a PCI-E 2.0 x4 slot

    (10) 2 x4 hardware RAID controllers in a PCI-E 1.0 x4 slot

    (11) 4 x1 hardware RAID controllers in a PCI-E 2.0 x1 slot

    (12) 4 x1 hardware RAID controllers in a PCI-E 1.0 x1 slot

  • JohanAnandtech - Friday, March 20, 2009 - link

    If you happen to find out more, please mail me. I think this might have something to do with the inner workings of SATA being slightly less efficient than the SAS protocol, and with the Adaptec firmware, but we have to do a bit of research on this.

    Thanks for sharing.
  • supremelaw - Friday, March 20, 2009 - link

    Found this searching for "SAS is bi-directional":


    > Remember, one of the main differences between (true) SAS and SATA is
    > that in the interface, SAS is bi-directional, while SATA can only
    > send Data in one direction at a time... More of a bottle-neck than
    > often thought.

    Thus, even though an x1 PCI-E lane has a theoretical
    bandwidth of 250MB/sec IN EACH DIRECTION, the SATA
    protocol may be preventing simultaneous transmission
    in both directions.
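
    For reference, the theoretical per-direction figures can be tabulated with a few lines of Python. This is only a sketch: the 250MB/sec per lane for PCI-E 1.0 is the figure quoted above, the 500MB/sec per lane for PCI-E 2.0 is the commonly cited doubling, and the function name is made up for illustration. Real controllers will deliver less than this.

```python
# Theoretical bandwidth per lane, per direction, in MB/s.
PER_LANE_MB_S = {"1.0": 250, "2.0": 500}


def slot_bandwidth_mb_s(generation: str, lanes: int) -> int:
    """Theoretical one-way bandwidth of a PCI-E link with `lanes` active lanes."""
    if lanes not in (1, 2, 4, 8, 16):
        raise ValueError("unexpected lane count")
    return PER_LANE_MB_S[generation] * lanes


if __name__ == "__main__":
    for generation in ("1.0", "2.0"):
        for lanes in (1, 4, 8, 16):
            print(f"PCI-E {generation} x{lanes}: "
                  f"{slot_bandwidth_mb_s(generation, lanes)} MB/s per direction")
```

    Note that `lanes` should be the number of lanes the chipset actually assigns, which can be fewer than the mechanical size of the slot.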

    I hope this helps.

  • supremelaw - Friday, March 20, 2009 - link


    [begin quoteS]

    Among the offerings are a 16GB RIMM module and an 8GB RDIMM module. The company introduced 50nm 2Gb DDR3 for PC applications last September. The 16GB modules operate at 1066Mbps and allow for a total memory density of 192GB in a dual socket server.


    In late January 2009, Samsung announced an even higher density DRAM chip at 4Gb that needs 1.35 volts to operate and will be used in 16GB RDIMM modules as well as other applications for desktop and notebook computers in the future. The higher density 4Gb chips can run at 1.6Gbps with the same power requirements as the 2Gb version running at 1066Mbps.

    [end quote]
  • supremelaw - Friday, March 20, 2009 - link

    The ACARD ANS-9010 can be populated with 8 x 2GB DDR2 DIMMS;
    and, higher density SDRAM DIMMS should be forthcoming from
    companies like Samsung in the coming months (not coming years).

    4GB DDR2 DIMMs are currently available from G.SKILL, Kingston
    et al. e.g.:

    I would like to see a head-to-head comparison of Intel's SSDs
    with various permutations of ACARD's ANS-9010, particularly
    now that DDR2 is so cheap.

    Lifetime warranties anybody? No degradation in performance either,
    after high-volume continuous WRITEs.

    Random access anybody? You have heard of the Core i7's triple-
    channel memory subsystems, yes? starting at 25,000 MB/second!!

    Yes, I fully realize that SDRAM is volatile, and flash SSDs are not:
    there are solutions for that problem, some of which are cheap
    and practical e.g. dedicate an AT-style PSU and battery backup
    unit to power otherwise volatile DDR2 DIMMs.

    Heck, if the IT community cannot guarantee continuous
    AC input power, where are we after all these years?

    Another alternative is to bulk up on the server's memory
    subsystem, e.g. use 16GB DIMMs from MetaRAM, and implement
    a smart database cache using SuperCache from SuperSpeed LLC:

    Samsung has recently announced larger density SDRAM chips,
    not only for servers but also for workstations and desktops.
    I predict that these higher density modules will also
    show up in laptop SO-DIMMs, before too long. There are a
    lot of laptop computers in the world presently!

    After investigating the potential of ramdisks for myself
    and the entire industry, I do feel it is time that their
    potential be taken much more seriously and NOT confined
    to the "bleeding edge" where we "wild enthusiasts"
    tend to spend a lot of our time.

    And, sadly, when I attempted to share some original ideas and
    drawings with Anand himself, AT THE VERY SAME TIME
    an attempt was made to defame me on the Internet --
    by blaming me for some hacker who had penetrated
    the homepage of our website. Because I don't know
    JAVASCRIPT, it took me a few days to isolate that
    problem, but it was fixed as soon as it was identified.

    Now, Anand has returned those drawings, and we're
    back where we started before we approached him
    with our ideas.

    I conclude that some anonymous person(s) did NOT
    want Anand seeing what we had to share with him.

    Your comments are most appreciated!

    Paul A. Mitchell, B.A., M.S.
    dba MRFS (Memory-Resident File Systems)
  • masterbm - Friday, March 20, 2009 - link

    It still looks like there is a bottleneck in the RAID controller in the 66% read / 33% write test. See the one-SSD versus two-SSD comparison.
  • mcnabney - Friday, March 20, 2009 - link

    The article does seem to skew at the end. The modeled database had to fit in a limited size for the SSD solution to really shine.

    So the database size that works well with SSD is one that requires more space than can fit into RAM and less than 500GB when the controllers top-out. That is a pretty tight margin and potentially can run into capacity issues as databases grow.

    Also, in the closing cost chart there is no cost per GB per year line. The conclusion indicates that you can save a couple thousand with a 500GB eight-SSD solution versus a twenty-SAS solution, but how many eight-SSD servers will you need to buy to equal the capacity and functionality of one SAS server?

    I would have liked to see the performance/cost of a server that could hold the same size database in RAM.
  • JohanAnandtech - Friday, March 20, 2009 - link

    "So the database size that works well with SSD is one that requires more space than can fit into RAM and less than 500GB when the controllers top-out. That is a pretty tight margin and potentially can run into capacity issues as databases grow. "

    That is a good and pretty astute observation, but a few remarks. First of all, EMC and others use Xeons as storage processors. Nothing is going to stop the industry from using more powerful storage processors for the SLC drives than the popular IOP348.

    Secondly, 100 GB SLC (and more) drives are only a few months away.

    Thirdly, AFAIK keeping the database in RAM does not mean that a transactional database does not have to write to the storage system, if you want a fully ACID compliant database.

  • mikeblas - Friday, March 20, 2009 - link

    I can't seem to find anything about the hardware where the tests were run. We're told the database is 23 gigs, but we don't know the working set size of the data that the benchmark touches. Is the server running this test already caching a lot in memory? How much?

    A system that can hold the data in memory is great--when the data is in memory. When the cache is cold, then it's not doing you any good and you still need to rely on IOPS to get you through.
  • JohanAnandtech - Friday, March 20, 2009 - link

    Did you miss the info about the buffer pool being 1 GB large? I can try to get you the exact hit rate, but I can tell you that, AFAIK, the test goes over the complete database (thus 23 GB), so the hit rate of the 1 GB buffer pool will be pretty low.
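
    As a rough sanity check on "pretty low", here is a small Python sketch. It assumes the benchmark touches the 23 GB uniformly at random, which is an assumption and not something measured here; a skewed workload would hit the buffer pool more often. The function name is made up for illustration.

```python
def uniform_hit_rate(cache_gb: float, data_gb: float) -> float:
    """Expected cache hit rate if every page is equally likely to be read."""
    if cache_gb < 0 or data_gb <= 0:
        raise ValueError("sizes must be positive")
    return min(1.0, cache_gb / data_gb)


if __name__ == "__main__":
    rate = uniform_hit_rate(1, 23)
    print(f"expected hit rate: {rate:.1%}")
```

    Under that assumption, only about one read in 23 would be served from the buffer pool, so the disks would be doing nearly all of the work.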

    Also look at the scaling of the SAS drives: from 1 SAS to 8, we get 4 times better performance. That would not be possible if the database was running from RAM.
  • mikeblas - Friday, March 20, 2009 - link

    I read it, but it doesn't tell the whole story. The scalability you see in the results would be attainable if a large cache were not being filled, or if the cache was being dumped within the test.

    The graphs show transactions per second, which tell us how well the database is performing. The charts don't show us how well the drives are performing; to understand that, we'd need to see IOPS, among some other information. I'd expect that the maximum IOPS in the test is not nearly the maximum IOPS of the drive array. Since the test wasn't run for any sustained time, the weaknesses of the drive are not being exposed.
  • JohanAnandtech - Friday, March 20, 2009 - link

    Ok, good feedback. On Monday, I'll check the exact length of the test (it is several minutes), and we do have a follow-up which shows you quite a bit of what is happening. Disk queue lengths are quite high, so that should also tell you that it is not just a "fill cache", "dump cache" thing. We did see this behavior with small databases though (2 GB etc.).

    Just give me a bit of time. After the Nehalem review, I'll explore these kinds of things. We also noticed that the deadline scheduler is the best for the SAS disks, but noop for the SSD. I'll explore the more in-depth stuff in a later article.
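
    For anyone who wants to check which I/O scheduler a Linux block device is using, here is a minimal Python sketch. It assumes the usual sysfs layout, where /sys/block/<device>/queue/scheduler lists the available schedulers and marks the active one in square brackets; the device name "sda" and the function names are just examples, and this is not necessarily how the tests above were configured.

```python
from pathlib import Path


def parse_active_scheduler(text: str) -> str:
    """Pick the bracketed entry out of a line such as 'noop deadline [cfq]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in: " + repr(text))


def active_scheduler(device: str) -> str:
    """Read the active scheduler for a block device (Linux only)."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return parse_active_scheduler(path.read_text())


if __name__ == "__main__":
    print(active_scheduler("sda"))  # "sda" is an example device name
```

    Switching schedulers is normally done by writing the scheduler name to that same file as root, so results for deadline and noop can be compared on the same hardware.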
  • JarredWalton - Friday, March 20, 2009 - link

    As stated in several areas in the article, SSDs clearly don't make sense if you need a large database - which is why Google as an example wouldn't consider them at present. The current size requirements are quite reasonable (less than 512GB if you use 8x64GB SSDs... and of course you can bump that up to 16x64GB if necessary, though you'd have to run more SATA cards, use software RAID, or some other changes), but there will certainly be businesses that need more storage.

    However, keep in mind that some companies will buy larger SCSI/SAS drives and then partition/format them as a smaller drive in order to improve performance - i.e. if you only use 32GB on a 300GB disk, the seek times will improve because you only need to seek over a smaller portion of the platters - and transfer rates will improve because all of the data will be on the outer sectors.

    At one point I worked for a corporation that purchased the then top-of-the-line 15k 32GB disks and they were all formatted to 8GB. We had a heavily taxed database - hundreds of concurrent users working in a warehouse, plus corporate accesses and backups - but the total size of the database was small enough that we didn't need tons of storage. Interestingly enough, we ran the database on an EMC box that probably cost close to $1 million (using IBM POWER5 or POWER6 servers that added another couple million I think). I wonder if they have looked at switching to SSDs instead of SCSI/SAS now? Probably not - they'll just do whatever IBM tells them they should do!
  • virtualgeek - Friday, March 20, 2009 - link

    The key is that as a general statement:

    Lowest capital cost optimization happens at the application tier (query optimization).

    Next lowest capital cost optimization happens at the database tier (proper up-front DB design)

    Next lowest capital cost optimization happens by adding RAM to the database tier.

    Next lowest capital cost optimization happens by adding database server horsepower or storage performance (depending on what is the gate to performance).

    But - in various cases, sometimes the last option is the only one (for lots of reasons - legacy app, database structure is extremely difficult to change, etc).
  • icrf - Friday, March 20, 2009 - link

    "Capital cost" is a bit of a misnomer. It tends to be far cheaper to buy some memory than pay a DBA to tune queries.
  • Dudler - Friday, March 20, 2009 - link


    Thx for the article,

    1. But your cost comparison is only valid until you have to buy new disks. It would be interesting to have an estimate of how long the SSDs would survive in a server environment, since they would be written to a lot. Even with all the wear levelling algorithms their lifespan may be short. Would the SAS disks live longer?
    2. How did you test the write speed/latency? In the great article by Anand it was pretty clear that the performance of SSD's started to degrade when they got full and they wrote many small blocks. Did you simulate a "used" drive or only "fresh secure erase" it beforehand?
  • JarredWalton - Friday, March 20, 2009 - link

    Intel states a higher MTBF for their SSD than for any conventional HDD. We have no real way of determining how long they will truly last, but Intel suggests they will last for five years running 24/7 doing constant 67% read/33% write operations. Check back in five years and we'll let you know how they're doing. ;-)

    As for the degraded performance testing, remember that Anand's article showed the X25 was the least prone to degraded performance, and the X25-E is designed to be even better than the X25-M. Anand didn't test new performance of the X25-E, but even in the degraded state it was still faster than any other SSD with the exception of the X25-M in its "new" state. Given the nature of the testing, I would assume that the drives are at least partially (and more likely fully) degraded in Johan's benchmarks - I can't imagine he spent the time to secure erase all the SSDs before each set of benchmarks, but I could be wrong.
  • IntelUser2000 - Sunday, March 22, 2009 - link

    Unfortunately, I'd have to agree with mikeblas even for X25-E.

    Here, look at this site:

    The point in that article is that the SSD can outperform a similarly priced RAID 10 setup by 5x, but due to data loss risks they have to turn off the write cache, which degrades the X25-E's performance to 1/5 and leaves it at the same level.
  • JohanAnandtech - Monday, March 23, 2009 - link

    We will look into this, but the problem does not seem to occur with the ext3 filesystem. Could this be something XFS specific?

    It is suggested here that this is the case:

    We'll investigate the issue.

    That is good feedback, but just as reviewers should be cautious about jumping to conclusions, readers should be too. The blog test was quick, as the blogger admits; this does not mean that the X25-E is not reliable. Also notice that he says he should also test with a BBU-enabled RAID card, something we did.

  • IntelUser2000 - Monday, March 23, 2009 - link

    Thanks for the reply. While I'm not an expert on the settings for servers, they do have a point.

    Intel's IOP results are done with write cache on. Several webpages and posts have said they turn off write caches to prevent data loss.

    I have an X25-M under Windows XP. In a simple CrystalDiskMark run, my random 4K write result goes from 35MB/s to 4MB/s when the write cache is disabled in the disk settings. Of course, I have NO reason to turn write caching off.
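
    Those throughput numbers are easier to compare with the IOPS figures in the article once converted. A minimal Python sketch, assuming 1 MB = 1024 KB and that every operation is exactly one 4K block (the function name is made up for illustration):

```python
def mb_s_to_iops(throughput_mb_s: float, block_kb: float) -> float:
    """Convert a throughput in MB/s to operations per second for a fixed block size."""
    if block_kb <= 0:
        raise ValueError("block size must be positive")
    return throughput_mb_s * 1024 / block_kb


if __name__ == "__main__":
    print(mb_s_to_iops(35, 4))  # write cache enabled
    print(mb_s_to_iops(4, 4))   # write cache disabled
```

    Under those assumptions, 35MB/s works out to 8960 random writes per second and 4MB/s to 1024.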

    It's not something to be ignored. If this is true, the X25-E is really only suitable for extreme enthusiast PCs rather than for servers as Intel claims.
  • JarredWalton - Monday, March 23, 2009 - link

    No enterprise setup I've ever encountered runs without a hefty backup power system, so I don't think it's as critical a problem as some suggest. If power fails and the UPS doesn't kick in to help out, you're in a world of hurt regardless.

    That said, there was one time where one of the facility operations team did some "Emergency Power Off" testing at my old job. Unfortunately, they didn't put the system into test mode correctly, so when they hit the switch to "test" the system, the whole building went dark!

    LOL. You never saw the poop hit the fan so hard! My boss was getting reamed for letting anyone other than the computer people into the datacenter; meanwhile we're trying to get everything back up and running, and the GM of the warehouse is wondering how this all happened.

    That last one is easy to answer: your senior FacOps guy somehow forgot to put the warehouse into test mode. That's hard to do since it's listed as the second or third step in the test procedures. Not surprisingly, he was in a hurry because the testing was supposed to be done two weeks earlier and somehow slipped through the cracks. Needless to say, FacOps no longer got to hold the key that would allow them to "test" that particular item.

    Bottom line, though, is that in almost four years of working at that job, that was the only time where we lost power to the datacenter unexpectedly. Since we were running an EMC box for storage, we also would have had their "super capacitor" to allow the cache to be flushed to flash, resulting in no data loss.
  • JarredWalton - Monday, March 23, 2009 - link

    Judging by the content and the comments on that blog, it seems as though there are some software specific settings that may be causing problems (i.e. specifically barriers/nobarriers is mentioned several times). The end result appears to be that a single X25-E is capable of matching a RAID 10 disk in performance, but at a higher cost? I don't know, as I don't see any specific hardware listed and he only mentions receiving one drive.

    He did figure out a workaround to the issue by modifying the parameters, but concludes the performance isn't worth the cost. However, if a single X25-E matches a RAID 10 setup, what happens on RAID 10 X25-E? Also, what about power? Johan shows TCO with power favoring SSD by a significant margin. Even if performance is equal, if power is greatly in favor of SSD you might want to go that route.

    Of course, the bigger question is whether this is software or hardware related. As one user puts it:

    "Sorry, but you realize that nobarrier is the likely cause for the data loss, right? With barriers XFS fsync (but not necessarily ext3 fsync) would wait for a write barrier on the log commit, and thus also for the data. O_SYNC might be different though. Basically you specified the “please go unsafe but faster” option and then complain that it is actually unsafe. I would recommend to do the power off test without nobarriers but write cache on. -Andi"

    The response: "I wrote that in post. With barrier and write cache we have 50 writes / s, which I consider “not just slower” but disaster which I would not put on production system."

    Sounds to me like software/configuration problems more than anything. If he can get 1200 write/s with safe function, but only 50 with what should be a usable setting, something is wrong. More details on what hardware/software was used would be nice, naturally.
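
    The 1200 versus 50 writes/s figures in this exchange are rates of synchronous (flushed) writes. A minimal Python sketch of how such a rate could be measured follows. The file name, record size, and duration are hypothetical and are not taken from the blog post; the number it reports depends heavily on the filesystem, mount options such as barriers, and the drive's write cache.

```python
import os
import tempfile
import time


def fsync_writes_per_second(path, record=b"x" * 512, duration=1.0):
    """Count write+fsync cycles completed on `path` within `duration` seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    count = 0
    start = time.perf_counter()
    try:
        while time.perf_counter() - start < duration:
            os.write(fd, record)
            os.fsync(fd)  # ask the OS to push this write to stable storage
            count += 1
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    # Hypothetical scratch file; point it at the volume under test instead.
    with tempfile.TemporaryDirectory() as tmp:
        rate = fsync_writes_per_second(os.path.join(tmp, "fsync_test.bin"))
        print(f"{rate:.0f} synchronous writes/s")
```

    Running the same script with and without barriers on the same volume would show whether the gap the blog author reports is reproducible on other hardware.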
  • virtualgeek - Friday, March 20, 2009 - link

    We've done a lot of work on this at EMC - and STEC drives have been put through the same wringers (and meet the same specs) we demand of enterprise FC and SATA drives from the 3rd parties. They are SLC-based, like the Intel X25-E, but have some of the differences noted at the tail end of the article. We continue to work with Intel (and others) - it's only goodness to have more vendors in that space.

    We've deployed a LOT of Enterprise Flash Disk or EFD (what we call this "sub category" of Enterprise-class solid state disk)

    BUT - I can say with authority that the MTBF issue is being used at this point as FUD. MLC is a different story - which is why, although it's absolutely an option in the consumer space, today it's not for these applications.
  • mikeblas - Friday, March 20, 2009 - link

    You would assume, sure. But we don't know, either way.

    The testing done here is naive. First, RAID5 is the wrong RAID. Because of the way it behaves (even with a good, hardware card) it's not really a good idea for a performant database system. RAID10 is generally the way to go.

    Next, databases run 24x7. The testing done here started, ended, and that was that. At a site that depends on their database system, they probably rewrite all the data in the database very often--between once per day and once per week, say.

    If this test was intended to be meaningful, it would have run the test constantly, showing a graph of performance over time. That takes too much effort for a free review site, I suppose--when we did that exercise in house, it wasn't even a couple of hours before the Intel drives were bricked, unusable.
  • JohanAnandtech - Friday, March 20, 2009 - link

    "The testing done here is naive. First, RAID5 is the wrong RAID. Because of the way it behaves (even with a good, hardware card) it's not really a good idea for a performant database system. RAID10 is generally the way to go."

    You would be amazed how many people are running DB systems with RAID-5. And we performed the database tests with RAID-10.

    "If this test was intended to be meaningful, it would have run the test constantly, showing a graph of performance over time. "

    It is a good suggestion, but that doesn't mean our testing is not meaningful. Considering how many times we performed the benchmarks, it is clear that we were not using a "virgin SLC". The numbers you are seeing are the measurements we took after a few days of testing. (Especially the RAID-5 ones)

    We'll try out a very long test, but these SLC drives are quite a bit more robust than a typical MLC drive, also when it comes to performance degradation. Let us not blow this out of proportion: Anand measured a 10% performance degradation on an MLC drive. That is hardly an issue when you get 13 times more performance than one of the best SAS drives.
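
    As a rough worked example of that last point, the two figures can be combined in a few lines of Python. This takes both numbers at face value and assumes the 10% degradation (measured on an MLC drive) would apply directly to the 13x advantage, which the comment itself does not establish.

```python
def degraded_speedup(baseline_speedup, degradation):
    """Speedup over the reference drive after a fractional performance loss."""
    return baseline_speedup * (1.0 - degradation)


if __name__ == "__main__":
    # 13x over a fast SAS drive, minus a 10% degradation (assumed to combine).
    print(round(degraded_speedup(13.0, 0.10), 1))  # 11.7
```

    Under those assumptions the degraded drive would still be well over ten times faster than the SAS reference.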
  • mikeblas - Friday, March 20, 2009 - link

    > You would be amazed how many people are running DB systems with RAID-5.

    Probably not. People do dumb stuff all the time; it doesn't surprise me anymore. I mean, something subtle like using RAID-5 instead of RAID-10 on a database server is an easy mistake to make. I can be surprised at deeper dumbness, though.

    Anyway, I don't see how the number of people making a mistake justifies the same mistake in a review.

    > And we performed the database tests with RAID-10.

    I don't see any RAID 10 results in the SQL Server SQLIO test results.

    > but that doesn't mean our testing is not meaningful.

    Of course it does. The test doesn't stress the biggest concern with the drives in enterprise applications; it also indicates that the tester doesn't understand how the drives work.

    On write-leveling SSDs, write requests take a variable amount of time; they take longer the more writes the drive has seen recently. You're worried about the drive being "virgin" or not. That's not the issue; far past the loss of "virginity", past the time degradation is first noticed, the Intel drives take many hundreds of milliseconds to perform write operations. They take so long they might even fall off the bus, and might be flagged by the RAID controller as failed. The problems show themselves after being exposed to high IOPS rates. The problem is that not only does the response time increase; the latency increases, too. Eventually, the latency overcomes the ability to keep up with the incoming rate, and the device effectively fails.
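
    The failure mode described here, where service time grows until the drive can no longer keep up with the arrival rate, can be illustrated with a simple backlog model. This is a minimal sketch with hypothetical numbers; it is not a model of the Intel firmware, and the per-second service times are invented for illustration only.

```python
def backlog_over_time(arrival_iops, service_ms_by_second):
    """Return the number of queued writes at the end of each second.

    arrival_iops: writes arriving per second (held constant).
    service_ms_by_second: per-write service time in ms during each second.
    """
    backlog = 0.0
    history = []
    for service_ms in service_ms_by_second:
        capacity = 1000.0 / service_ms  # writes the device can finish this second
        backlog = max(0.0, backlog + arrival_iops - capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical: 2000 writes/s arrive while service time climbs to 300 ms.
    service_times_ms = [0.25, 0.4, 1.0, 50.0, 300.0]
    for second, queued in enumerate(backlog_over_time(2000, service_times_ms), 1):
        print(f"t={second}s queued={queued:.0f}")
```

    As long as capacity exceeds the arrival rate the queue stays empty; once service time crosses 1000 / arrival_iops milliseconds, the backlog grows every second and never drains.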

    You can promptly demonstrate this to yourself with longer, more aggressive tests. (We did that, and we also spoke with the Intel support engineers. Other sites document similar problems. We were using Intel's MLC drives.)

    Point is, though, that the article is about enterprise applications, but fails to adequately simulate a large class of enterprise applications. Running a database benchmark for just a few minutes doesn't adequately stress the drives. This makes it meaningless; it's not telling readers anything more than the consumer-level reviews have, since it's not stressing the drive in the way enterprise applications would use it.

    Spinning drives eat sustained high IOPS rates up, particularly enterprise class drives, which are engineered for such applications. SSDs fail, or exhibit erratic performance that makes the predictability and reliability guarantees required of enterprise applications impossible to deliver. They're not 10% slower as you claim; they're 100% slower, or they're DNF -- or however you want to represent divide-by-zero slower.
  • virtualgeek - Friday, March 20, 2009 - link

    Gang - you can't do a test on an MLC drive, and compare it to an SLC test - it's totally different.

    RAID-5 configs for EFDs in CLARiiON and DMX arrays are not uncommon at all, and through much testing - did absolutely fine.

    The traditional RAID penalty logic of rotating media and parity RAID write impact is not entirely applicable here either.

    There are LOADS of detailed performance tests at different workloads here:

    I posted links to docs with the big database workloads and exchange.

    Literally - we've been doing this for more than a year (shipping STEC-based EFDs into enterprises). The comments are partially right - but not all write-levelling algorithms are the same, and not all SSDs have the same internal architecture.
  • mikeblas - Friday, March 20, 2009 - link

    Britney Spears albums aren't uncommon, either. But that doesn't mean they are any good, and "did fine" is far from "optimal".
  • RagingDragon - Thursday, March 26, 2009 - link

    For enterprise systems "optimal" is sufficient performance at the lowest possible price, not highest possible performance at any cost.

    For a given amount of storage, RAID5 requires fewer disks (and thus costs less) than RAID10, so if RAID5 can provide sufficient performance it is more optimal than RAID10. For workloads where RAID10 provides adequate performance, but RAID5 does not, obviously RAID10 is more optimal. And for workloads where RAID10 cannot deliver the required performance there are in-memory databases.
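
    The disk-count argument can be made concrete with a short Python sketch. It assumes a single array of equal-size disks with no hot spares, and the 256 GB capacity target is a hypothetical figure; the 32 GB disk size matches the drives discussed in this thread.

```python
import math


def disks_needed(usable_gb, disk_gb, level):
    """Minimum number of equal-size disks for one array with the given usable capacity."""
    data_disks = math.ceil(usable_gb / disk_gb)
    if level == "raid5":
        # One disk's worth of capacity holds parity; RAID 5 needs at least 3 disks.
        return max(3, data_disks + 1)
    if level == "raid10":
        # Every data disk is mirrored; RAID 10 needs at least 4 disks.
        return max(4, 2 * data_disks)
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("raid5", "raid10"):
        print(level, disks_needed(usable_gb=256, disk_gb=32, level=level))
```

    For 256 GB usable on 32 GB disks this gives 9 disks for RAID5 and 16 for RAID10, which is the cost gap the comment refers to.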
  • RagingDragon - Thursday, March 26, 2009 - link

    Also, the high end systems virtualgeek refers to have far more RAM cache and processing power than any RAID card, so experience with RAID cards may not be applicable to them.
  • JarredWalton - Friday, March 20, 2009 - link

    You make a lot of claims, but as far as I can tell you have not tested with the enterprise X25-E, which costs three times as much as the X25-M. Intel wouldn't release something for the enterprise at that price without at least trying to make it handle the situation properly.

    As for the testing, Johan *benchmarks* a test run that lasts several minutes. That doesn't mean that the test was only run for several minutes, but rather that the final benchmark score is from a test run of a couple minutes (120 seconds to be exact). Knowing how Johan tests, retests, changes tests, etc. often dozens of times in the course of writing an article, this is definitely not a "meaningless" test result. Rather, it is a look at the best we can do with simulating a real world environment without actually doing everything in the real world (because the real world doesn't usually lend itself to repeatable benchmarks).

    Do the drives have long-term reliability issues? Can performance drop off substantially in certain situations? Anand suggests that it can happen with the X25-M and pretty much every other SSD out there, but I don't know if he actually tested the same thing with X25-E. It sounds as though it's a possibility it will occur after certain event sequences (involving high I/O levels), but you'd really have to test with the enterprise class drives to know for sure. Hopefully Anand and Johan can work on that a bit to verify whether or not that problem exists; if it does, I'm fairly confident that Intel will update the firmware to fix the issue - otherwise, as you say the SSDs would be useless in the enterprise.
  • mikeblas - Friday, March 20, 2009 - link

    I'm not sure how you can tell what I have or haven't tested. We tested both the X25-E and X25-M models. The issue isn't with MLC or SLC technology, but with the load-leveling algorithm the drive uses. As far as I can tell, the algorithm is fundamentally the same across both models.

    While they've got a great track record, Intel isn't beyond shipping products with defects. I fully agree that it's remarkable that they bill the product as enterprise-ready, when it so clearly isn't.

    Running a test for 120 seconds tells us something, but at this point it doesn't tell us much more than what the other reviews have told us. We get a little more information here because of the RAID configuration, but we don't get enough details about the test methodology to have something repeatable or verifiable.

    It doesn't, however, meet its claim of telling us about real-world performance. In my testing, it was a trivial matter to brick the drives after testing with Intel's own IOMeter, as well as with SQLIO, using suites we had put together using rates and ratios we had observed in our production systems.

    Perhaps, since then, Intel has mitigated the problem with better firmware. At the time--only a few months ago--they were unable to provide an upgrade or a remedy when we talked with them. But I think that, if you do a test that tries to mimic real-world use both in volume and duration, you should see similar results. Even if you don't, we will learn a lot more about the drives than this review tells us.
  • JohanAnandtech - Sunday, March 22, 2009 - link

    So basically a review is meaningless because it doesn't redo the Quality Assurance and validation tests that are done by the manufacturer? You cannot honestly expect me to perform these, and they would be "meaningless" anyway as I would probably need to perform this on a batch of a few hundred drives to get some statistically meaningful information.

    We have been testing these drives for weeks now, deleting 23 GB databases on 32 GB drives and creating them again, performing tests for hours and hours. We are currently running 8-hour-long tests on MS Exchange and others for our Nehalem EP Review. I have not seen the behavior you describe. EMC does not see the behavior you describe. So you were probably unfortunate to get a few bad drives... That is what the 3 year guarantee is for.

  • The0ne - Monday, March 23, 2009 - link

    While I don't disagree with what Mikeblas is trying to say, it really isn't AnandTech's place to run the same tests that manufacturers are doing or should be doing. Asking AnandTech to prove or come up with MTBFs and “real” life server tests isn’t valid. One key reason for this is time.

    For example, do we expect the banks that install and use our POS products to repeat the same type of testing we have? Of course not; it’s redundant and time consuming, and there would be no benefit for anyone. You either trust the Manufacturer’s specifications or you don’t. Seriously, you don’t want to run something for so long that eventually the product itself starts to exhibit failures and defects that the test itself has introduced. The result you get wouldn’t be accurate.

    And while you can trust the specs, that doesn’t mean there aren’t defects in the products. There will always be defects. And just because you see certain issues with your test doesn’t necessarily mean they’re not isolated to just the drives you have. Heck, it might be the way your tests are set up, run, or analyzed.

    I believe this discussion is really pointless. If you, Mikeblas, aren’t happy with the amount of time involved in the testing, that’s your choice. But please don’t complain about something that is, to me, subjective: how you go about doing your test.
  • JarredWalton - Friday, March 20, 2009 - link

    I figured you didn't use the X25-E the moment you said, "We were using Intel's MLC drives." AFAIK, X25-E is SLC and not MLC, so you tested the non-enterprise version. Does the problem exist on the X25-E or not? Well, I can't answer that, and hopefully Johan will investigate the issue further. Then again, as virtualgeek states below, EMC has done extensive testing with SLC drives and hasn't seen the issues you're discussing.

    Anyway, this particular article was in the works for weeks if not months, so I seriously dispute any claims that it is meaningless and doesn't tell people anything new. Johan tried to do the tests in as real-world of a setup as possible. However, it's like a 60 second FRAPS run for testing a game's performance: that tells us more or less the same as if we did a 600 second or 6 hour FRAPS test, in that the average frame rates should scale similarly. What it doesn't tell is performance in every possible situation - i.e. level X may be more CPU or GPU intensive than level Y - but it gives us a repeatable baseline measurement.

    At any rate, your comments are noted. Feel free to email Johan to discuss your testing and his findings further. If there's a problem that he missed, I'm sure he'll be interested in looking into the matter further. After all, he does this stuff (i.e. server configuration and testing) for work outside AnandTech, and if he's making a bad decision he would certainly want to know.
  • mikeblas - Saturday, March 21, 2009 - link

    Sorry; that was a typo. We used both models, and reproduced the problem I describe on both models.

    We were in direct contact with Intel, who provided the same utility they've given to other people to try and reset the drives. For some drives, it worked; one of the drives was unrecoverable. It's entirely possible we had a bad batch of six or eight drives, or that the firmware has been redesigned since. I suppose it's also possible that some incompatibility between the controllers we used and the drives exists, too. But the controllers were the ones we'd use in some production applications, and given the issues, we're just not going to move to SSDs for our online systems.

    It's great that you stand behind your authors, but just because an article was in the works for a long time doesn't mean it's good or worthwhile. I just can't find the logic there.

    Similarly, I'm not sure why you'd conclude that testing a game is the same as testing enterprise-level hardware. I guess that shows the fundamental flaw in both your argument about spending time on the article, and the article itself. A bad starting assumption makes a bad result, no matter how much time you spend on actually doing the work. Since the bad assumption has tainted the process, the end result will be compromised no matter how long you spend on it.

    In this particular case, it should be foreseeable that the results from a short test run are not scalable to long-term, high-scale usage. At the very least, it should be obvious that there's the strong potential for difference. This is because of the presence of write-leveling technology. What are the differences between spinning media and the SSD under test? Seek time, transfer rate, volatility, power draw, physical shock resistance, and the write-leveling algorithm. One way to stress the write-leveling algorithm is to try to overwhelm it; throw a high amount of IOPS at it for a sustained duration. What happens? Throw writes at it, only, and see what happens.

    In a production OLTP system--what you purport to be simulating--the data drive is write-mostly. A well-tuned OLTP database in steady-state operation is going to do the majority of its reads from cache. (Otherwise, it's not going to be fast enough.) All of its writes are going to disk, however, because they have to be transactionally committed. When one of these drives sees a 100% workload with a very high throughput, how does it behave compared to a spinning drive? Is the problem with tests that didn't reveal problems simply that they didn't try testing the drive in this application? That they didn't adequately simulate this application?

    I'm not sure email would reveal anything I haven't already posted here.
  • JarredWalton - Saturday, March 21, 2009 - link

    My comment re: gaming tests is that you don't need to test the game for hours to know how it behaves. Similarly, you don't need to collect results from a one week test run to know how the database setup behaves. You certainly should run for more than 120 seconds to "break in" the drives. Once that is done, you don't need to run the tests for hours at a time. A snapshot of performance for 120 seconds should look like a snapshot for two hours or two days... as long as there isn't some problem that causes performance to eventually collapse, which is what you assert.
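
    Whether a 120 second snapshot is representative of a longer run can be checked from a log of I/O completion times. The following Python sketch splits such a log into consecutive 120 second windows and reports the average IOPS in each. The input format (a list of completion timestamps in seconds) is an assumption for illustration; it is not the output format of IOMeter or SQLIO.

```python
from collections import Counter


def iops_per_window(completion_times_s, window_s=120):
    """Average IOPS in each consecutive window, given I/O completion timestamps in seconds."""
    if not completion_times_s:
        return []
    # Bucket each completion into the window it falls in.
    counts = Counter(int(t // window_s) for t in completion_times_s)
    last_window = max(counts)
    return [counts.get(i, 0) / window_s for i in range(last_window + 1)]


if __name__ == "__main__":
    # Hypothetical log: steady 2 IOPS for 240 s, then only two completions afterwards.
    log = [i * 0.5 for i in range(480)] + [250.0, 300.0]
    print(iops_per_window(log))
```

    If every window reports roughly the same figure, a single 120 second snapshot is a fair summary; a collapse like the one in the hypothetical log would show up as a sharp drop in the later windows.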

    You seem to have the opinion that all he did was slap in the drives, run a 120 second test, and call it quits. I know from past experience that he put together a test system, designed tests to stress the system, ran some of the tests for hours (or days or weeks) to verify things were running as expected, and then after all that he collected data to show a snapshot of how the setup performed. At that point, whether you collect data for a short time or a long time doesn't really matter, because the drives should be well seasoned (i.e. degraded) as far as they'll go.

    It is entirely possible that in the several months since you performed your tests, things have changed. It is also possible that some other hardware variable is to blame for the performance issues you experienced. A detailed list of the hardware you ran would certainly help in looking into things from our end - what server, SATA controller, drives, etc. did you utilize? What firmware version on the drives? What was the specific setup you used for SQLIO or IOMeter? Those are all useful pieces of information, as opposed to the following:

    "You can promptly demonstrate this to yourself with longer, more aggressive tests. (We did that, and we also spoke with the Intel support engineers. Other sites document similar problems. We were using Intel's MLC drives.)"

    Links and specifics for sites that show the data you're talking about would be great, since a few quick Google searches turned up very little of use. I see non-RAID tests, Linux tests, tests using NVIDIA RAID controllers (ugh...), RAID 0 running desktop applications, etc. With further searching, I'm sure there's an article somewhere showing RAID 0, 5, and 10 performance with server hardware, but since you already know where it is a quick post with direct links would be a lot more useful.
