AnandTech Home IT Portal Home Increase Font Size Decrease Font Size Change Page Size
SSD versus Enterprise SAS and SATA disks
SSD versus Enterprise SAS and SATA disks
Date: March 20th, 2009
Topic: IT Computing
Manufacturer: Intel
Author: Johan De Gelas
Buy the Kingston SNE125-S2/32GB 32GB SSDNow
Blank
 Newegg $379.99
 Dell Home $549.99
30.00 rebate
 PC Connection $429.95
 
 

Introduction

The introduction of "enterprise SATA" disks a few years ago was an excellent solution for all the companies craving storage space. With capacities up to 1TB per drive, "RAID Enabled" SATA disks offer huge amounts of magnetic disk space with decent reliability. Magnetic disks have been a very cheap solution if you want storage space, with prices at 20 cents per gigabyte (and falling). Performance is terrible, however, with seek times and latency adding a few milliseconds over faster and more expensive alternatives (i.e. SCSI). That's 10,000 to 100,000 times slower than the speed of CPUs and RAM, where access times are expressed in nanoseconds. Even worse is the fact that seek times and latency have been improving at an incredibly slow pace. According to several studies[1], the time (seek time + latency) to get one block of random information has only improved by a factor 2.5 over the last decade while bandwidth has been improved a tenfold. Meanwhile, CPUs have become over 60 times faster! (As a quick point of reference, ten years ago state-of-the-art servers were running 450MHz Xeon processors with up to 2MB of L2 cache.)

The result of this lopsided performance improvements is a serious performance bottleneck, especially for OLTP databases and mail servers that are accessed randomly. A complex combination of application caches, RAID controller caches, hard disk caches, and RAID setups can partially hide the terrible performance shortcomings of the current hard disks, but note the word "partially". Caches will not always contain the right data and it has taken a lot of research and software engineering to develop database management systems that produce many independent parallel I/O threads. Without a good I/O thread system you would not even be able to use RAID setups to increase disk performance.

Let's face it: buying, installing, and powering lots of fast spinning disks just to meet the Monday morning spike on mail servers and transactional applications is frequently a waste of disk space, power, and thus money. In addition, it's not just hard disk space and power that make this a rather expensive and inefficient way to solve the ancient disk seek time problem: you need a UPS (Uninterruptible Power Source) to protect all those disks from power failures, plus expensive software and man-hours to manage all those disks. The Intel X25-E, an SLC SSD drive, holds the potential to run OLTP applications at decent speeds much simpler. Through a deep analysis of its low level and real world performance, we try to find out when this new generation of SSDs will make sense. Prepare for an interesting if complex story….

Disk strategies   Next Page

 
  Index

Tools Share
Find lowest prices Find the lowest prices
Digg   del.icio.us   E-mail  
Print This Article Print this article  

66 Comments - Last by shady28, 5 days ago
Username:
Password:
Cost strategy by Dudler, 245 days ago
Hi,

Thx for the article,

1. But your cost comparison is only valid until you have to buy new disks. Would be interesting to have an assumption on how long the SSD's would survive i a server environment, since they would be written to a lot. Even with all the wear levelling algorithms their lifespan may be short. Would the SAS disks live longer?
2. How did you test the write speed/latency? In the great article by Anand it was pretty clear that the performance of SSD's started to degrade when they got full and they wrote many small blocks. Did you simulate a "used" drive or only "fresh secure erase" it beforehand?

Reply
RE: Cost strategy by JarredWalton, 245 days ago
Intel states a higher MTBF for their SSD than for any conventional HDD. We have no real way of determining how long they will truly last, but Intel suggests they will last for five years running 24/7 doing constant 67% read/33% write operations. Check back in five years and we'll let you know how they're doing. ;-)

As for the degraded performance testing, remember that Anand's article showed the X25 was the least prone to degraded performance, and the X25-E is designed to be even better than the X25-M. Anand didn't test new performance of the X25-E, but even in the degraded state it was still faster than any other SSD with the exception of the X25-M in its "new" state. Given the nature of the testing, I would assume that the drives are at least partially (and more likely fully) degraded in Johan's benchmarks - I can't imagine he spent the time to secure erase all the SSDs before each set of benchmarks, but I could be wrong.

Reply
RE: Cost strategy by mikeblas, 245 days ago
You would assume, sure. But we don't know, either way.

The testing done here is naieve. First, RAID5 is the wrong RAID. Because of the way it behaves (even with a good, hardeware card) it's not really a good idea for a performant database system. RAID10 is generally the way to go.

Next, Databases run 24x7. The testing done here started, ended, and that was that. At a site that depends on their database system, they probably rewrite all the data in the database very often--between once per day and once per week, say.

If this test was intended to be meaningful, it would have run the test constantly, showing a graph of performance over time. That takes too much effort for a free review site, I suppose--when we did that exercise in house, it wasn't even a couple of hours before then Intel drives were bricked, unusable.

Reply
RE: Cost strategy by JohanAnandtech, 245 days ago
"The testing done here is naieve. First, RAID5 is the wrong RAID. Because of the way it behaves (even with a good, hardeware card) it's not really a good idea for a performant database system. RAID10 is generally the way to go. "

You would be amazed how many people are running DB systems with RAID-5. And we performed the database system with RAID-10.

"If this test was intended to be meaningful, it would have run the test constantly, showing a graph of performance over time. "

It is a good suggestion, but that doesn't mean our testing is not meaningful. Considering how many times we performed the benchmarks, it is clear that we were not using a "virgin SLC". The numbers you are seeing are the measurements we took after a few days of testing. (Especially the RAID-5 ones)

We'll try out a very long test, but these SLC drives are quite a bit more robust than a typical MLC drive, also when it comes to performance degradation. Let us not blown this out of proportion, Anand measured a 10% performance degradation on an MLC drive. That is hardly an issue, when you get 13 times more performance than one of the best SAS drives.


Reply
RE: Cost strategy by mikeblas, 245 days ago
>
You would be amazed how many people are running DB systems with RAID-5.

Probably not. People do dumb stuff all the time; it doesn't surprise me anymore. I mean, something subtle like using RAID-5 instead of RAID-10 on a database server is an easy mistake to make. I can be surprised at deeper dumbness, though.

Anyway, I don't see how the number of people making a mistake justifies the same mistake in a review.

> And we performed the database system with RAID-10.

I don't see any RAID 10 results in the SQL Server SQLIO test results.

> but that doesn't mean our testing is not meaningful.

Of course it does. The test doesn't stress the biggest concern with the drives in enterprise applications; it also indicates that the tester doesn't understand how the drives work.

On write-leveling SSDs, write requests take a variable amount of time; they take longer the more writes the drive has seen recently. You're worried about the drive being "virgin" or not. That's not the issue; far past the loss of "virginity", past the time degredation is first noticed, the Intel drives take many hundreds of milliseconds to perform write operations. They take so long they might even fall off the bus, and might be flagged by the RAID controller as failed. The problems show themselves after being exposed to high IOPS rates. The problem it that not only the response time increases, the latency increases, too. Eventually, the latency overcomes the ability to keep up with the incoming rate, and the device effectively fails.

Anyone can promptly demonstrate this to yourself with longer, more aggressive tests. (We did that, and we also spoke with the Intel support engineers. Other sites document similar problems. We were using Intels' MLC drives.)

Point is, though, that the article is about enterprise applications, but fails to adequately simulate a large class of enterprise applications. Running a database benchmark for just a few minutes doesn't adequately stress the drives. This makes it meaningless; it's not telling readers anything more than the consumer-level reviews have, since it's not stressing the drive in the way enterprise applications would use it.

Spinning drives eat sustained high IOPS rates up, particularly enterprise class drives, which are engineered for such application. SSDs fail, or exhibit erratic performance that makes the predictability an reliability guarantees required of enterprise applications impossible to deliver. They're not 10% slower as you claim; they're 100% slower, or they're DNF -- or however you want to represent divide-by-zero slower.

Reply
RE: Cost strategy by JarredWalton, 245 days ago
You make a lot of claims, but as far as I can tell you have not tested with the enterprise X25-E, which costs three times as much as the X25-M. Intel wouldn't release something for the enterprise at that price without at least trying to make it handle the situation properly.

As for the testing, Johan *benchmarks* a test run that lasts several minutes. That doesn't mean that the test was only run for several minutes, but rather that the final benchmark score is from a test run of a couple minutes (120 seconds to be exact). Knowing how Johan tests, retests, changes tests, etc. often dozens of times in the course of writing an article, this is definitely not a "meaningless" test result. Rather, it is a look at the best we can do with simulating a real world environment without actually doing everything in the real world (because the real world doesn't usually lend itself to repeatable benchmarks).

Do the drives have long-term reliability issues? Can performance drop off substantially in certain situations? Anand suggests that it can happen with the X25-M and pretty much every other SSD out there, but I don't know if he actually tested the same thing with X25-E. It sounds as though it's a possibility it will occur after certain event sequences (involving high I/O levels), but you'd really have to test with the enterprise class drives to know for sure. Hopefully Anand and Johan can work on that a bit to verify whether or not that problem exists; if it does, I'm fairly confident that Intel will update the firmware to fix the issue - otherwise, as you say the SSDs would be useless in the enterprise.

Reply
RE: Cost strategy by mikeblas, 245 days ago
I'm not sure how you can tell what I have or haven't tested. We tested both the X25-E and X25-M models. The issue isn't with MLC or SLC technology, but with the load-leveling algorithm the drive uses. As far as I can tell, the algorithm is fundamentally the same across both models.

While they've got a great track record, Intel isn't beyond shipping products with defects. I fully agree that it's remarkable that they bill the product as enterprise-ready, when it so clearly isn't.

Running a test for 120 seconds tells us something, but at this point it doesn't tell us much more than what the other reviews have told us. We get a little more information here because of the RAID configuration, but we don't get enough details about the test methodology to have something repeatable or verifiable.

It doesn't, however, meet its claim of telling us about real-world performance. In my testing, it was a trivial matter to brick the drives after testing with Intel's own IOMeter, as well as with SQLIO, using suites we had put together using rates and ratios we had observed in our production systems.

Perhaps, since then, Intel has mitigated the problem with better firmware. At the time--only a few months ago--they were unable to provide an upgrade or a remedy when we talked with them. But I think that, if you do a test that tries to mimic real-world use both in volume and duration, you should see similar results. Even if you don't, we will learn a lot more about the drives than this review tells us.

Reply
RE: Cost strategy by JarredWalton, 245 days ago
I figured you didn't use the X25-E the moment you said, "We were using Intels' MLC drives." AFAIK, X25-E is SLC and not MLC, so you tested the non-enterprise version. Does the problem exist on the X25-E or not? Well, I can't answer that, and hopefully Johan will investigate the issue further. Then again, as virtualgeek states below, EMC has done extensive testing with SLC drives and hasn't seen the issues you're discussing.

Anyway, this particular article was in the works for weeks if not months, so I seriously dispute any claims that it is meaningless and doesn't tell people anything new. Johan tried to do the tests in as real-world of a setup as possible. However, it's like a 60 second FRAPS run for testing a game's performance: that tells us more or less the same as if we did a 600 second or 6 hour FRAPS test, in that the average frame rates should scale similarly. What it doesn't tell is performance in every possible situation - i.e. level X may be more CPU or GPU intensive than level Y - but it gives us a repeatable baseline measurement.

At any rate, your comments are noted. Feel free to email Johan to discuss your testing and his findings further. If there's a problem that he missed, I'm sure he'll be interested in looking into the matter further. After all, he does this stuff (i.e. server configuration and testing) for work outside AnandTech, and if he's making a bad decision he would certainly want to know.

Reply
RE: Cost strategy by mikeblas, 244 days ago
Sorry; that was a typo. We used both models, and reproduced the problem I describe both models.

We were in direct contact with Intel, who provided the same utility they've given to other people to try and reset the drives. For some drives, it worked; one of the drives were unrecoverable. It's entirely possible we had a bad batch of six or eight drives, or that the firmware has been redesigned since. I suppose it's also possible that some incompatibility between the controllers we used and the drives exists, too. But the controllers were the ones we'd use in some production applications, and given the issues, we're just not going to move to SSDs for our online systems.

It's great that you stand behind your authors, but just because an article was in the works for a long time doesn't mean it's good or worthwhile. I just can't find the logic there.

Similarly, I'm not sure why you'd conclude that testing a game is the same as testing enterprise-level hardware. I guess that shows the fundamental flaw in both your argument about spending time on the article, and the article itself. A bad starting assumption makes a bad result, no mater how much time you spend on actually doing the work. Since the bad assumption has tainted the process, the end result will be compromised no matter how long you spend on it.

In this particular case, it should be foreseeable that the results from a short test run is not scalable to long-term, high-scale usage. At the very least, it should be obvious that there's the strong potential for difference. This is because of the presence of write-leveling technology. What are the differences between spinning media and a the SSD under test? Seek time, transfer rate, volatility, power draw, physical shock resistance, and the write-leveling algorithm. One way to stress the write-leveling algorithm is to try to overwhelm it; throw a high amount of IOPS at it for a sustained duration. What happens? Throw writes at it, only, and see what happens.

In a production system OLTP system--what you purport to be simulating--the data drive is write-mostly. A well-tuned OLTP database in steady-state operation is going to do the majority of its reads from cache. (Otherwise, it's not going to be fast enough.) All of its writes are going to disk, however, because they have to be transactionally committed. When one of these drives sees a 100% workload with a very high throughput, how does it behave compared to a spinning drive? Is the problem with tests that didn't reveal problems simply that they didn't try testing the drive in this application? That they didn't adequately simulate this application?

I'm not sure email would reveal anything I haven't already posted here.


Reply
RE: Cost strategy by JarredWalton, 244 days ago
My comment re: gaming tests is that you don't need to test the game for hours to know how it behaves. Similarly, you don't need to collect results from a one week test run to know how the database setup behaves. You certainly should run for more than 120 seconds to "break in" the drives. Once that is done, you don't need to run the tests for hours at a time. A snapshot of performance for 120 seconds should look like a snapshot for two hours or two days... as long as there isn't some problem that causes performance to eventually collapse, which is what you assert.

You seem to have the opinion that all he did was slap in the drives, run a 120 second test, and call it quits. I know from past experience that he put together a test system, designed tests to stress the system, ran some of the tests for hours (or days or weeks) to verify things were running as expected, and then after all that he collected data to show a snapshot of how the setup performed. At that point, whether you collect data for a short time or a long time doesn't really matter, because the drives should be well seasoned (i.e. degraded) as far as they'll go.

It is entirely possible that in the several months since you performed your tests, things have changed. It is also possible that some other hardware variable is to blame for the performance issues you experienced. A detailed list of the hardware you ran would certainly help in looking into things from our end - what server, SATA controller, drives, etc. did you utilize? What firmware version on the drives? What was the specific setup you used for SQLIO or IOMeter? Those are all useful pieces of information, as opposed to the following:

"Anyone can promptly demonstrate this to yourself with longer, more aggressive tests. (We did that, and we also spoke with the Intel support engineers. Other sites document similar problems. We were using Intels' MLC drives.)"

Links and specifics for sites that show the data you're talking about would be great, since a few quick Google searches turned up very little of use. I see non-RAID tests, Linux tests, tests using NVIDIA RAID controllers (ugh...), RAID 0 running desktop applications, etc. With further searching, I'm sure there's an article somewhere showing RAID 0, 5, and 10 performance with server hardware, but since you already know where it is a quick post with direct links would be a lot more useful.

Reply
Comments Page 1 of 7

Free Forrester Risk Management Report
Demystifying Enterprise Risk Management. Download Free With Registration.
DOWNLOAD vWire Today - FREE TRIAL
Take Control of Your Virtual Infrastructure. Manage VI Data & Prevent Problems.
Report Unlicensed Business Software Use
Earn Up to $1 Million by Reporting Unlicensed Software Use. Fill Out Our Form!
Download Microsoft Visual Studio ® Team System
Streamline Dev processes, Reduce time to market. Try Microsoft Visual Studio Team System, FREE!
Supermicro Barebone Servers
We Carry Everything Supermicro. Low Price, Top Service, FREE Shipping, and more.




Latest news by
DailyTech

 November 20, 2009

Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank

 November 19, 2009

Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank




pipeboost
Copyright © 1997-2009 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.
Click Here for Advertising Information