Measuring How Long Your Intel SSD Will Last

Earlier in this review I talked about Intel's SMART attributes that allow you to accurately measure write amplification for a given workload. If you fire up Intel's SSD Toolbox or any tool that lets you monitor SMART attributes you'll notice a few fields of interest. I mentioned these back in our 710 review, but the most important for our investigation here are E2 (226), E3 (227) and E4 (228).

The raw value of attribute E2, when divided by 1024, gives you an accurate report of the amount of wear on your NAND since the last timer reset. In this case we're looking at an Intel X25-M G2 (the earliest drive to support E2 reporting) whose E2 value is 9755. Dividing that by 1024 gives us 9.526% (the field is only accurate to three decimal places).
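
If you want to do that conversion programmatically, the arithmetic is trivial. Here's a minimal sketch; the helper name is mine, and the 9755 input is simply the X25-M G2 example above:

    def e2_to_wear_percent(raw_e2: int) -> float:
        # The raw E2 value is a fixed-point number with an implied divisor of 1024.
        return raw_e2 / 1024.0

    print(f"{e2_to_wear_percent(9755):.3f}%")  # 9.526%, matching the example above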

I mentioned that this data is only accurate since the last timer reset; that's where the value stored in E4 comes into play. By issuing a SMART EXECUTE OFFLINE IMMEDIATE command with subcommand 40h to the drive you'll reset the timer in E4 and the data stored in E2 and E3. The data in E2/E3 will then reflect the wear incurred since you reset the timer, giving you a great way of measuring write amplification for a specific workload.

How do you reset the E4 timer? I've always used smartmontools to do it. Download the appropriate binary from SourceForge and execute the following command:

smartctl -t vendor,0x40 /dev/hdX

where X is the drive whose counter you're trying to reset (e.g. hda, hdb, hdc, etc.).

Doing so will reset the E4 counter to 65535; the counter then starts over at 0 and counts up in minutes. After the first 60 minutes you'll get valid data in E2/E3. While E2 gives you an indication of how much wear your workload puts on the NAND, E3 gives you the percentage of IO operations that are reads since you reset E4. E3 is particularly useful for determining, at a quick glance, how write-heavy your workload is. Remember, it's the process of programming/erasing a NAND cell that is most destructive - read-heavy workloads are generally fine on consumer grade drives.
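
If you'd rather script the reset-and-measure cycle than run smartctl by hand, something along these lines works. This is only a sketch under a few assumptions of mine: smartmontools is installed, you're running with root privileges, the device node is /dev/sda, and the drive reports the timed workload attributes as 226 (E2), 227 (E3) and 228 (E4).

    import re
    import subprocess

    DEVICE = "/dev/sda"  # adjust to the drive whose counters you want to reset

    def reset_workload_timer(device: str) -> None:
        # Issue the SMART EXECUTE OFFLINE IMMEDIATE subcommand 0x40 that resets
        # the E2/E3/E4 workload counters (requires root privileges).
        subprocess.run(["smartctl", "-t", "vendor,0x40", device], check=True)

    def read_workload_attributes(device: str) -> dict:
        # Parse the attribute table printed by `smartctl -A` and return the raw
        # values of the timed workload attributes (226 = E2, 227 = E3, 228 = E4).
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True, check=True).stdout
        raw = {}
        for line in out.splitlines():
            m = re.match(r"\s*(22[678])\s+\S+.*\s(\d+)\s*$", line)
            if m:
                raw[int(m.group(1))] = int(m.group(2))
        return raw

    if __name__ == "__main__":
        attrs = read_workload_attributes(DEVICE)  # assumes all three attributes are reported
        print("E2 raw: %d -> %.3f%% media wear" % (attrs[226], attrs[226] / 1024))
        print("E3 raw: %d (indicates the read share of the workload)" % attrs[227])
        print("E4 raw: %d (minutes since the timer was reset)" % attrs[228])

Remember to give the timer at least 60 minutes of runtime before trusting the E2/E3 readings.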

I reset the workload timer (E4) on all of the Intel SSDs that supported it and ran a loop of our MS SQL Weekly Maintenance benchmark that resulted in 320GB of writes to each drive. I then measured wear on the NAND (E2) and used that to calculate how many TB we could write to these drives, using this workload, before we'd theoretically wear out their NAND. The results are below:

Drive Lifespan - MS SQL Weekly Maintenance Benchmark
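
For reference, the extrapolation behind these lifespan figures is a simple proportion: if a known amount of host writes consumed a given fraction of the drive's rated wear, scale up to 100%. A sketch of that math follows; the E2 reading in it is illustrative only, not one of the measured results.

    def estimated_lifetime_tb(host_writes_gb: float, e2_raw: int) -> float:
        # Extrapolate total host writes before NAND wear-out from the E2 wear
        # accumulated by a known amount of writes.
        wear_percent = e2_raw / 1024                   # raw E2 / 1024 = % of rated wear used
        total_writes_gb = host_writes_gb * (100.0 / wear_percent)
        return total_writes_gb / 1024                  # GB -> TB

    # 320GB of benchmark writes against a hypothetical E2 reading of 40 (~0.039% wear):
    print("%.0f TB" % estimated_lifetime_tb(320, 40))  # 800 TB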

There are a few interesting takeaways from this data. For starters, Intel's SSD 710 uses high endurance MLC (aka eMLC or MLC-HET), which is good for a significant increase in p/e cycles. With tens of thousands of p/e cycles per NAND cell, the Intel SSD 710 offers nearly an order of magnitude better endurance than the Intel SSD 320. Part of this endurance advantage is delivered through an incredible amount of spare area. Remember that although the 710 featured here is a 200GB drive, it actually has 320GB of NAND on board. If you set aside a similar amount of spare area on the 320 you'd get a measurable increase in endurance. We actually see an example of that in the gains the 300GB SSD 320 offers over the 160GB drive. Both drives are subjected to the same-sized workload (just under 60GB), but the 300GB drive has much more unused area to use for block recycling. The result is an 85% increase in estimated drive lifespan for an 87.5% increase in drive capacity.

What does this tell us about how long these drives would last? If all they were doing was running this workload once a week, even the 160GB SSD 320 would be just fine. In reality our SQL server does far more than this, but even then we'd likely be OK with a consumer drive. Note that the 800TB of writes for the 160GB 320 is well above the 15TB Intel rates the drive for. The difference here is that Intel is calculating lifespan based on 4KB random writes with a very high write amplification. If we work backwards from these numbers for the MLC drives you end up with around 4,000 - 5,000 p/e cycles. In reality, even 25nm Intel NAND lasts longer than it's rated for, so what you're seeing is that this workload has a very low write amplification on these drives thanks to its access pattern and its small size relative to the capacity of these drives.
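
The back-of-the-envelope math for that last point looks something like the following. The write amplification of 1.0 is my assumption for a largely sequential, low-WA workload, not a measured figure.

    def implied_pe_cycles(lifetime_host_writes_tb: float, nand_capacity_gb: float,
                          write_amplification: float) -> float:
        # Back-calculate the p/e cycles implied by an estimated lifetime figure.
        nand_writes_gb = lifetime_host_writes_tb * 1024 * write_amplification
        return nand_writes_gb / nand_capacity_gb

    # ~800TB of host writes on a 160GB drive at a write amplification near 1
    # works out to roughly 5,000 cycles, in line with the range mentioned above.
    print(round(implied_pe_cycles(800, 160, 1.0)))  # 5120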

Every workload is going to be different, but what may have been a brutal consumer of IOs in the past may still be right at home on consumer SSDs in your server.

Comments

  • ckryan - Thursday, February 9, 2012 - link

    Very true. And again, many 60/64GB drives could do 1PB with an entirely sequential workload. Under such conditions, most non-SF drives typically experience a WA of 1.10 to 1.20.

    Reality has a way of biting you in the ass, so in reality, be conservative and reasonable about how long a drive will last.

    No one will throw a parade if a drive lasts 5 years, but if it only lasts 3 you're gonna hear about it.
  • ckryan - Thursday, February 9, 2012 - link

    The 40GB 320 failed with almost 700TB, not 400. Remember though, the workload is mostly sequential. That particular 320 40GB also suffered a failure of what may have been an entire die last year, and just recently passed on to the SSD afterlife.

    So that's pretty reassuring. The X25-V is right around 700TB now, and it's still chugging along.
  • eva2000 - Thursday, February 9, 2012 - link

    Would be interesting to see how the consumer drives in these tests fare, and what their life expectancy would be, if they are configured with >40% over-provisioning.
  • vectorm12 - Thursday, February 9, 2012 - link

    Thanks for the insight into this subject Anand.

    However, I am curious as to why controller manufacturers haven't come up with a controller to manage cell wear across multiple drives without RAID.

    Basically, throw more drives at the problem. As you would to some extent be mirroring most of your P/E cycles in a traditional RAID, I feel there should be room for an extra layer of management. For instance, having a traditional RAID 1 between two drives and keeping another one or two as "hot spares" for when cells start to go bad.

    After all, if you deploy SSDs in RAID you're likely to be subjecting them to a similar if not identical number of P/E cycles. This would force you to proactively switch out drives (naturally most would anyway) in order to guarantee you won't be subjected to a massive, collective failure of drives risking loss of data.

    Proactive measures are the correct way of dealing with this issue, but in all honesty I love "set and forget" systems more than anything else. If a drive has exhausted its NAND I'd much rather get an email from a controller telling me to replace the drive and that it's already handled the emergency by allocating data to a spare drive.

    Also, I'm still seeing the 320's 8MB bug despite running the latest firmware in a couple of servers hosting low access-rate files, for some strange reason. It seems as though they behave fine as long as they are constantly stressed, but leave them idle for too long and things start to go wrong. Have you guys observed anything like this behavior?
  • Kristian Vättö - Thursday, February 9, 2012 - link

    I've read some reports of the 8MB bug persisting even after the FW update. Your experience sounds similar - problems start to occur when you power off the SSD (i.e. power cycling). A guy I know actually bought the 80GB model just to try this out but unfortunately he couldn't make it repeatable.
  • vectorm12 - Monday, February 13, 2012 - link

    Unfortunately I'm in the same boat. 320s keep failing left and right (up to three now), all running the latest firmware. However, the issues aren't directly related to power cycles as these drives run 24/7 without any downtime.

    I've made sure drive spin-down is deactivated, as well as all other power management features I could think of. I've also moved the RAIDs from Adaptec controllers to the integrated SAS controllers and still had a third drive fail.

    I've actually switched out the remaining 320s for Samsung 830s now to see how they behave in this configuration.
  • DukeN - Thursday, February 9, 2012 - link

    One with RAID'd drives, whether on a DAS or a high end SAN?

    Would love to see how 12 SSDs in (for argument's sake) an MSA1000 compare to 12 15K SAS drives.

    TIA
  • ggathagan - Thursday, February 9, 2012 - link

    Compare in what respect?
  • FunBunny2 - Thursday, February 9, 2012 - link

    Anand:

    I've been thinking about the case of using an SSD, which has a calculable (sort of, as this piece describes) lifespan, as swap (Linux context). Have you done (and I can't find), or are you considering doing, such an experiment? From a multi-user, server perspective, the bang for the buck might be very high.
  • varunkrish - Thursday, February 9, 2012 - link

    I have recently seen 2 SSDs fail without warning, and they are currently not detected at all. While I love the performance gains from an SSD, the lower noise and the cooler operation, I feel you have to be more careful about storing critical data on an SSD as recovery is next to impossible.

    I would love to see an article which addresses SSDs from this angle.
