Original Link: https://www.anandtech.com/show/2739



Introduction

The introduction of "enterprise SATA" disks a few years ago was an excellent solution for all the companies craving storage space. With capacities up to 1TB per drive, "RAID Enabled" SATA disks offer huge amounts of magnetic disk space with decent reliability. Magnetic disks have been a very cheap way to get storage space, with prices at 20 cents per gigabyte (and falling). Performance is terrible, however: seek times and rotational latency add up to several milliseconds, and the faster, more expensive alternatives (i.e. SCSI) only shave a few milliseconds off that. That is 10,000 to 100,000 times slower than CPUs and RAM, where access times are expressed in nanoseconds. Even worse is the fact that seek times and latency have been improving at an incredibly slow pace. According to several studies[1], the time (seek time + latency) to get one block of random information has only improved by a factor of 2.5 over the last decade, while bandwidth has improved tenfold. Meanwhile, CPUs have become over 60 times faster! (As a quick point of reference, ten years ago state-of-the-art servers were running 450MHz Xeon processors with up to 2MB of L2 cache.)

The result of these lopsided performance improvements is a serious bottleneck, especially for OLTP databases and mail servers that are accessed randomly. A complex combination of application caches, RAID controller caches, hard disk caches, and RAID setups can partially hide the terrible performance shortcomings of current hard disks, but note the word "partially". Caches will not always contain the right data, and it has taken a lot of research and software engineering to develop database management systems that produce many independent parallel I/O threads. Without a good I/O thread system you would not even be able to use RAID setups to increase disk performance.

Let's face it: buying, installing, and powering lots of fast spinning disks just to meet the Monday morning spike on mail servers and transactional applications is frequently a waste of disk space, power, and thus money. In addition, it's not just hard disk space and power that make this a rather expensive and inefficient way to solve the ancient disk seek time problem: you need a UPS (Uninterruptible Power Supply) to protect all those disks from power failures, plus expensive software and man-hours to manage them. The Intel X25-E, an SLC SSD, holds the potential to run OLTP applications at decent speeds with a far simpler setup. Through a deep analysis of its low-level and real-world performance, we try to find out when this new generation of SSDs makes sense. Prepare for an interesting if complex story…



Disk strategies

With magnetic disks, there are two strategies to get good OLTP or mail server performance. The "traditional way" is to combine a number of 15000RPM SAS "spindles", all working in parallel. The more "rebellious way" or "Google way" is to use a vast number of cheaper SATA drives. This last strategy is based on the observation that although SATA drives come with higher access times, you can buy more SATA spindles than SAS spindles for the same price. While Google opted for desktop drives, we worked with what we had in the lab: 16 enterprise 1TB Western Digital drives. Since these are among the fastest 7200RPM drives on the market, they should give you a good idea of what an array with lots of SATA drives can do compared to one with fewer fast spinning SAS drives.

SSDs add a new strategy: if space is not your primary problem, you can trade in storage space for huge amounts of random I/O operations per second, requiring fewer but far more expensive drives to obtain the same performance. SSDs offer superb read access times but slightly less impressive write access times.

As Anand has pointed out, a cheap SSD controller can really wreak havoc on write performance, especially in a server environment where many requests are issued in parallel. EMC solved this with their high-end Enterprise Flash Disks, produced by STEC, which can store up to 400GB and come with a controller with excellent SRAM caches and a super capacitor. The super capacitor enables the controller to empty the relatively large DRAM caches and write the data to the flash storage in the event of a sudden power failure.

Intel went for the midrange market and gave its controller less cache (16MB). The controller is still intelligent and powerful enough to crush the competition built on the cheap JMicron JMF602 controllers. We check out the SLC version, the Intel X25-E SLC 32GB.

The newest Intel Solid State Disks with their access times of 0.075 ms and 0.15W power consumption could change the storage market for OLTP databases. However, the SLC drives have a few disadvantages compared to the best SAS drives out there:

  • No dual ports
  • The price per GB is 13 times higher

You can see the summary in the table below.

Enterprise Drive Pricing
Drive                      Interface   Capacity   Pricing      Price per GB
Intel X25-E SLC            SATA        32GB       $415-$470    $13
Intel X25-E SLC            SATA        64GB       $795-$900    $12
Seagate Cheetah 15000RPM   SAS         300GB      $270-$300    $0.90
Western Digital 1000FYPS   SATA        1000GB     $190-$200    $0.19

If you really need capacity, SATA or even SAS drives are probably the best choice. On the other hand, if you need spindles to get more I/O per second, it will be interesting to see how a number of SAS or SATA drives compares to the SLC drives. The most striking advantages of the Intel X25-E SLC drive are extremely low random access times, almost no power consumption at idle, low power consumption at full load, and high reliability.

Enterprise Drive Specifications
Drive                      Read Access Time   Write Access Time   Idle Power   Full Power   MTBF (hours)
Intel X25-E SLC 32GB       0.075 ms           0.085 ms            0.06 W       2.4 W        2 million
Intel X25-E SLC 64GB       0.075 ms           0.085 ms            0.06 W       2.6 W        2 million
Seagate Cheetah 15000RPM   5.5 ms (*)         6 ms                14.3 W       17 W         1.4 million
Western Digital 1000FYPS   13 ms (**)         n/a                 4 W          7.4 W        1.2 million

(*) 5.5 ms = 3.5 ms seek time + 2 ms latency (rotation)
(**) 13 ms = 8.9 ms seek time + 4.1 ms latency (rotation)

Reliability testing is outside the scope of this article, but if only half of Intel's claims are true, the X25-E SLC drives will outlive the vast majority of magnetic disks. First is the 2 million hour MTBF specification, which is far better than the best SAS disks on the market (1.6 million hours). Intel also guarantees that if the X25-E performs 7000 8KB random accesses per second, consisting of 66% reads and 33% writes, the drive will continue to do so for 5 years! That is 2.9TB of written data per day, sustained for about 1800 days. That is simply breathtaking, as no drive has to sustain that kind of IOPS 24 hours per day for such a long period.



Configuration and Benchmarking Setup

First, a word of thanks. The help of several people was crucial in making this review happen:

  • My colleague Tijl Deneut of the Sizing Server Lab, who spent countless hours together with me in our labs. Sizing Servers is an academic lab of Howest (University Ghent, Belgium).
  • Roel De Frene of "Tripple S" (Server Storage Solutions), which lent us a lot of interesting hardware: the Supermicro SC846TQ-R900B, 16 WD enterprise drives, an Areca 1680 controller and more. S3S is a European company that focuses on servers and storage.
  • Deborah Paquin of Strategic Communications, inc. and Nick Knupffer of Intel US.

As mentioned, S3S sent us the Supermicro SC846TQ-R900B, which you can turn into a massive storage server. The server features a 900W (1+1) power supply to power a dual Xeon ("Harpertown") motherboard and up to 24 3.5" hot-swappable drive bays.


We used two different controllers to avoid letting the controller color this review too much. When you are using up to eight SLC SSDs in RAID 0, where each disk can push up to 250 MB/s through the RAID card, it is clear that the HBA can make a difference. Our two controllers are:

Adaptec 5805 SATA-II/SAS HBA
Firmware 5.2-0 (16501) 2009-02-18
1200MHz IOP348 (Dual-core)
512MB 533MHz/ECC (Write-Back)

ARECA 1680 SATA/SAS HBA
Firmware v1.46 2009-1-6
1200MHz IOP348 (Dual-core)
512MB 533MHz/ECC (Write-Back)

Both controllers use the same I/O CPU and more or less the same cache configuration, but the firmware still makes a difference, as you will see later. Below you can see the inside of our storage server, featuring:

  • 1x quad-core Xeon E5420 2.5GHz or X5470 3.3GHz ("Harpertown"), depending on the test
  • 4x2GB 667MHz FB-DIMM (the photo shows it equipped with 8x2GB)
  • Supermicro X7DBN mainboard (Intel 5000P "Blackford" Chipset)
  • Windows 2003 SP2

The small 2.5" SLC drives are plugged in the large 3.5" cages:


We used the following disks:

  • Intel SSD X25-E SLC SSDSA2SH032G1GN 32GB
  • WDC WD1000FYPS-01ZKB0 1TB (SATA)
  • Seagate Cheetah 15000RPM 300GB ST3300655SS (SAS)

Next is the software setup.



IOMeter/SQLIO Software Setup

Before we start with the "closer to the real world" OLTP tests, we decided to measure the disk component of database performance with IOMeter and SQLIO. For these tests, we started with RAID 0 as we didn't want our RAID controller to be the bottleneck. Some benchmark scenarios showed that our hopes were partly in vain: the RAID controller can still be the bottleneck, as you will see shortly. We selected a 64KB stripe size as we assumed the intended use was a database application that has to perform both sequential and random reads/writes.

As we test with SQLIO, Microsoft's I/O stress tool for MS SQL Server 2005, it is important to know that when the SQL Server database accesses the disks randomly, it does so in blocks of 8KB. Sequential accesses (read-ahead) can use I/O sizes from 16KB up to 1024KB, so we used a stripe size of 64KB as a decent compromise. All tests are done with a Disk Queue Length (DQL) of 2 per drive (16 for an eight drive array). DQL indicates the number of outstanding disk requests plus the requests currently being serviced for a particular disk. A DQL that averages 2 per drive or more means that the disk system is the bottleneck.

Next we aligned our testing partition at a 64KB offset with the diskpart tool. There has been some discussion on the ideal alignment (512KB, one block?), but even Intel is not sure yet, so we chose the relatively safe 64KB boundary (which aligns with 4KB pages).
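The reasoning behind that choice is simply that 64KB is a multiple of both the 4KB page size and our 64KB stripe size, so no 8KB or 64KB I/O ever straddles a boundary. Here is a trivial sketch of that check; the page and stripe values are assumptions based on the setup described above, not vendor-confirmed internals.

# Check that a partition offset is aligned to both the flash page size and the
# RAID stripe size. The 4KB page and 64KB stripe values are assumptions based
# on the setup described in the text.
def is_aligned(offset_bytes: int, page: int = 4096, stripe: int = 64 * 1024) -> bool:
    return offset_bytes % page == 0 and offset_bytes % stripe == 0

print(is_aligned(64 * 1024))  # True: the 64KB offset we set with diskpart
print(is_aligned(63 * 512))   # False: the old 63-sector (31.5KB) default offset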

To keep the number of benchmarks reasonable, we use the following:

  • RAID 0, RAID 10, or RAID 5; stripe size 64KB (always)
  • Adaptive Read Ahead and Write-Back always configured on the RAID controller
  • NTFS, 64KB cluster size
  • Access block size 8KB (Random) and 64KB (Sequential)

This should give you a good idea of how we tested. The controllers available only support eight drives each. As we wanted to test the "use more cheaper SATA disks" philosophy, we used two controllers combined with Microsoft's software RAID to test 16 drives. Software RAID is at least as fast as hardware RAID here, as it runs on the mighty 2.5GHz quad-core Xeon instead of the small dual-core Intel IOP348 storage processor.

To make the graphs easier to read, we made all SATA disk measurements orange, all SAS disk measurements green, and all SSD measurements blue.



I/O Meter Performance

IOMeter is an open source (originally developed by Intel) tool that can measure I/O performance in almost any way you can imagine. You can test random or sequential accesses (or a combination of the two), read or write operations (or a combination of the two), in blocks from a few KB to several MB. Whatever the goal, IOMeter can generate a workload and measure how fast the I/O system performs.

First, we evaluate the best scenario a magnetic disk can dream of: purely sequential access to a 20GB file. We are forced to use a relatively small file as our SLC SSD drives are only 32GB. Again, that's the best scenario imaginable for our magnetic disks, as we use only the outer tracks that have the most sectors and thus the highest sustained transfer rates.

IOMeter Sequential Read

The Intel SLC SSD delivers more than it promised: we measure 266 MB/s instead of the promised 250 MB/s. Still, purely sequential loads do not make the expensive and small SSD disks attractive: it takes only two SAS disks or four SATA disks to match one SLC SSD. As the SAS disks are 10 times larger and the SATA drives 30 times, it is unlikely that we'll see a video streaming fileserver using SSDs any time soon.

Our Adaptec controller is clearly not taking full advantage of the SLC SSD's bandwidth: we only see a very small improvement going from four to eight disks. We assume that this is a SATA related issue, as eight SAS disks have no trouble reaching almost 1GB/s. This is the first sign of a RAID controller bottleneck. However, you can hardly blame Adaptec for not focusing on reaching the highest transfer rates with RAID 0: it is a very rare scenario in a business environment. Few people use a completely unsafe eight drive RAID 0 set and it is only now that there are disks capable of transferring 250 MB/s and more.

The 16 SATA disks reach the highest transfer rate with two of our Adaptec controllers. To investigate the impact of the RAID controller a bit further, we attached four of our SLC drives to one Adaptec controller and four to another. First is a picture of the setup, and then the results.


IOMeter Sequential Read, 2 versus 1 RAID controllers

The results are quite amazing: performance improves by more than 60% with four SSDs on each of two controllers compared to eight X25-E SSDs on one controller. We end up with a RAID system capable of transferring 1.2GB/s.



I/O Meter Performance (Cont'd)

Next, we test with sequential reads and writes. Some storage processors (for example, NetApp's) write sequentially even if the original writes are random, so it is interesting to see how the disks cope with a mixed read/write scenario.

IOMeter Sequential 66% Read and 33% write

Where four SAS disks could read almost as fast as eight SATA disks, once we mix reads and writes the SAS disks are slightly slower than the SATA disks. That is not very surprising: both the SAS and SATA disks use four platters, which means the WD 1TB disk has a much higher data density, negating the higher RPM of the SAS drive. Since the accesses are still sequential, areal density wins out.

As we have stated before, SSDs are especially attractive for mail and OLTP database servers. The real test consists mostly of random reads and writes. Typically, there are about twice as many reads as writes, so we used a 66% random read and 33% random write scenario to mimic OLTP database performance.

IOMeter Random 66% Read and 33% write

The superiority of the Intel SSD drives is simply astonishing. Even eight of the fastest SAS drives are not enough to keep up with one (!) SLC SSD drive. The high seek time of our Western Digital (8.9 ms) also kills performance: 16 drives are slower than four 15000RPM SAS drives. The eight drive score of the Western Digital setup gives us an idea of how many SATA drives you need. It will take about 26-30 SATA drives to get the performance of eight SAS drives… and it will probably take about 40 SATA drives to beat one SLC SSD disk! The more your applications read and/or write randomly, the worse the "get a lot of cheap SATA spindles" plan becomes.
 
Did you notice something weird in the results? Good, we are glad you are paying attention :-). We'll explain this once we get to the RAID 5 tests. No? Get a good cup of coffee and look again at the benchmark chart...
 


SQLIO Performance

SQLIO is a tool provided by Microsoft that can determine the I/O capacity of a given disk subsystem. It simulates, to some degree, how MS SQL Server 2000/2005 accesses the disk subsystem. The following tests are all done with RAID 0. We ran tests for as long as 1000 seconds, but there was very little difference from our standard 120 second runs; thus, all tests run for 120 seconds.

SQLIO Sequential Read (64KB)

There are no real surprises here: SQLIO confirms our findings with IOMeter.

SQLIO Sequential Write (64KB)

Intel promised a 170 MB/s transfer rate when writing sequentially, and our measurements show that the drive is capable of delivering even more. Again, we see a RAID controller limitation pop up when we use eight drives. Let's look at the random read numbers.

SQLIO Random Read (8KB)

While our IOMeter tests showed that the eight SAS drives could come close to a single SLC SSD drive when performing a mix of random reads and writes, this is no longer the case when we perform only random reads. This is clearly the best-case scenario for the SLC drive and it completely crushes any magnetic disk competition.

SQLIO Random Write (8KB)

Random writes are slightly slower than random reads. Still, this kind of performance is nothing short of amazing as this used to be a weak point of SSDs.



RAID 5 in Action

RAID 0 is a good way to see how adding more disks scales up your write and read performance. However, it is rarely if ever used for any serious application. What happens if we use RAID 5?

RAID 5 IOMeter Sequential Read

We see a picture very similar to our RAID 0 numbers. The SAS protocol is more efficient than the SATA interface. Four SLC SSDs already come close to the highest bandwidth our Adaptec HBA is capable of extracting from SATA disks; add four more and you bump into a 700 MB/s bottleneck. At first, our eight drive results look weird, as they are more than twice as fast as the four drive results. In the case of four drives, each stripe our controller writes consists of three data blocks and one parity block. With eight drives, a stripe holds seven data blocks and still only one parity block. So we can expect a maximum scaling of 2.3 (7/3), which explains the "better than doubling" in performance.
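As a quick sanity check on that 7/3 figure, here is a minimal sketch. It assumes streaming throughput in RAID 5 scales with the number of data (non-parity) blocks per stripe, which is exactly the reasoning used above.

# Best-case RAID 5 streaming scaling: each full stripe across n drives holds
# (n - 1) data blocks plus 1 parity block, so useful bandwidth is roughly
# proportional to (n - 1).
def data_blocks_per_stripe(n_drives: int) -> int:
    return n_drives - 1

four_drives = data_blocks_per_stripe(4)   # 3 data blocks per stripe
eight_drives = data_blocks_per_stripe(8)  # 7 data blocks per stripe
print(f"Maximum scaling from 4 to 8 drives: {eight_drives / four_drives:.2f}x")  # ~2.33x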

RAID 5 IOMeter Sequential 66% Read and 33% write

Once we interleave some writing with reading, something remarkable happens: eight SLC SSDs are (slightly) slower than four. We have no real explanation for this, other than that it might have something to do with the choices Adaptec made when writing the firmware. We tried the Areca 1680 controller, which uses the same I/O processor (dual-core Intel IOP348 at 1.2GHz) but different firmware.


The Areca 1680 controller is almost 20% slower with four SLC SSD drives in RAID 0, but scales well from four to eight drives (+78%). In comparison, the Adaptec controller is only 19% faster with four additional drives. In RAID 5 the difference is even more dramatic: adding four extra drives (for a total of eight) leads to a performance decrease on the Adaptec card, while the same step gives a decent 47% scale-up on the Areca 1680. Thus, we conclude that the Adaptec firmware is likely to blame. We used both the 1.44 (August 2008) and 1.46 (February 2009) firmware, and although the latest firmware gave a small boost, we made the same observations.



More RAID 5

The Random 66% read + 33% write is still the most interesting scenario, as it mimics an OLTP database.

RAID 5 IOMeter Random 66% Read and 33% write

Again, we get some weird results. We suspected that the cause of this phenomenon was different from the one in the previous benchmark: with eight drives, we assumed that for each write the RAID controller has to perform several additional reads to calculate the parity block, which would make writes more costly than with four drives.

Update: We made the wrong assumption here. Reading the old parity block and the old data block is enough; you don't need seven reads. Two reads followed by two writes suffice for a small write on an eight drive RAID 5 array. Thanks to Tobias for pointing this out. So we are not completely sure where this performance decrease (from four to eight drives) comes from.
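To make the corrected picture concrete: a small random write to RAID 5 costs two reads plus two writes (a write penalty of four) regardless of the number of drives, while RAID 10 only needs two writes. The sketch below shows how that penalty caps the random-write IOPS of an array; the per-drive IOPS figure is purely illustrative, not a measured X25-E number.

# RAID small-write penalty: how many physical I/Os one host write costs.
# RAID 5: read old data + old parity, write new data + new parity (4 I/Os).
# RAID 10: write the block and its mirror (2 I/Os). RAID 0: 1 I/O.
WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4}

def array_random_write_iops(n_drives: int, iops_per_drive: float, level: str) -> float:
    # Simplified model: total drive IOPS divided by the per-write I/O cost.
    return n_drives * iops_per_drive / WRITE_PENALTY[level]

ILLUSTRATIVE_IOPS = 3000  # hypothetical per-drive figure, not a measured value
for level in ("RAID 0", "RAID 10", "RAID 5"):
    print(f"{level}: ~{array_random_write_iops(8, ILLUSTRATIVE_IOPS, level):,.0f} IOPS")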
 
Once again, we tried out the Areca 1680 to provide us with more insight.

While it is true that the Adaptec shows some bad scaling in RAID 0, both controllers fail to show any decent performance increase beyond four disks in RAID 5. This only confirms our suspicion: the SLC drives read and write at such blazing speeds that the RAID controller is swamped by the parity reads and writes that accompany each random write. This is not an issue with the magnetic disks, as they perform the same task at much lower speeds and thus generate far fewer parity operations per second. We may assume that the RAID controller is holding back the SSD array, but proof is better than conjecture. We attached four drives to each controller and made the 3.3GHz Xeon the RAID controller (i.e. software RAID in Windows 2003):

RAID 5 with Eight X25-E SSDs
                        One IOP348 Controller,   Two Controllers,
                        Hardware RAID            Software RAID
Sequential Read         688 MB/s                 1257 MB/s
Sequential Read/Write   221 MB/s                 257 MB/s
Random Read/Write       169 MB/s                 235 MB/s

Once we use the Xeon as the storage processor, the bottleneck is gone (or at least less narrow). Sequential read performance is twice as high as with four disks, and although the scaling of random read/write performance is nothing to write home about (19%), at least it is not negative anymore.

Should we blame Adaptec? No. The bandwidth and number crunching power available on our Adaptec card far outstrip the demands of any array of eight magnetic disks (the maximum number of drives this controller supports). You might remember from our older storage articles that even the much more complex RAID 6 calculations were no problem for modern storage CPUs. However, the superior performance of the Intel X25-E drives makes long forgotten bottlenecks rear their ugly heads once again.



SQL Server and RAID 5

The RAID 5 IOMeter results were interesting and peculiar enough to warrant another testing round with SQLIO. First, we start with the least useful but "pedal to the metal" benchmark: sequential reads and writes.

RAID 5 SQLIO Sequential Read (64KB)

RAID 5 SQLIO Sequential Write (64KB)

Although hardly surprising, both results are another confirmation that the SLC drives are limited by the SATA interface and the RAID controller combination.

RAID 5 SQLIO Random Read (8KB)

Random reads perform as expected. The SLC SSD drives completely annihilate the magnetic disk competition.

RAID 5 SQLIO Random Write (8KB)

Random writes in RAID 5 are not only a complete disaster, they also confirm our theory that adding more X25-E SLC drives does not help as the storage processor cannot deliver the necessary RAID-5 processing that the SLC drives demand. The more drives you add, the worse the random writing performance becomes.



Testing in the Real World

As interesting as the SQLIO and IOMeter results are, those benchmarks focus solely on the storage component. In the real world, we care about the performance of our database, mail, or fileserver. The question is: how does this amazing I/O performance translate into performance we really care about like transactions or mails per second? We decided to find out with 64-bit MySQL 5.1.23 and SysBench on SUSE Linux SLES 10 SP2.

We used a 23GB database and carefully optimized the my.cnf configuration file. Our goal is to get an idea of performance for a database that cannot fit completely in main memory (i.e. cache), and specifically how such a setup reacts to the fast SLC SSDs. The InnoDB buffer pool, which contains data pages, adaptive hash indexes, insert buffers, and locks, is set to 1GB. That is indeed rather small, as most servers contain 4GB to 32GB (or even more) and MySQL advises you to use up to 80% of your RAM for this buffer. Our test machine has 8GB of RAM, so we should have used 6GB or more. However, we really wanted our database to be about 20 times larger than our buffer pool to simulate a large database that only partially fits in the memory cache. With our 32GB SLC SSDs, using a 6.5GB buffer pool and a 130GB database was not an option. Hence the slightly artificial limitation of our buffer pool size.
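For reference, here is a minimal my.cnf sketch of the InnoDB settings described in this section. Only the 1GB buffer pool and the flush-on-commit setting (discussed below) come from our configuration; the other values are typical companion settings listed as assumptions, not our exact tuning.

[mysqld]
# Deliberately small buffer pool so that the 23GB database is ~20x larger than the cache
innodb_buffer_pool_size        = 1G
# Full ACID durability: flush the log to disk at every commit (see below)
innodb_flush_log_at_trx_commit = 1
# Typical companion settings; assumptions, not our exact tuning
innodb_log_file_size           = 256M
innodb_flush_method            = O_DIRECT
innodb_file_per_table          = 1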

We let SysBench perform all kinds of inserts and updates on this 23GB database. As we want to be fully ACID compliant, our database is configured with:

innodb_flush_log_at_trx_commit = 1

After each transaction is committed, there is a "pwrite" followed by an immediate flush to the disk. So the actual transaction time is influenced by the disk write latency even if the disk is nowhere near its limits. That is an extremely interesting case for SSDs to show their worth. We came up with four test configurations:

  1. "Classical SAS": We use six SAS disks for the data and two for the logs.
  2. "SAS data with SSD logging": perhaps we can accelerate our database by simply using very fast log disks. If this setup performs much better than "classical SAS", database administrators can boost the performance of their OLTP applications with a small investment in log SSDs. From an "investment" point of view, this would definitely be more interesting than having to replace all your disks with SSDs.
  3. "SSD 6+2": we replace all of our SAS disks with SSDs. We stay with two SSDs for the logs and six disks for the data.
  4. "SSD data with SAS logging": maybe we just have to replace our data disks, and we can keep our logging disks on SAS. This makes sense as logging is sequential.

Depending on how many random writes we have, RAID 5 or RAID 10 might be the best choice. We did a test with SysBench on six Intel X25-E SSDs. The logs are on a RAID 0 set of two SSDs to make sure they are not the bottleneck.

SysBench OLTP (MySQL 5.1.23) on 6 SSD data

As RAID 10 is about 23% faster than RAID 5, we placed the database on a RAID 10 LUN.

SysBench OLTP (MySQL 5.1.23)

Transactional logs are written in a sequential and synchronous manner. Since SAS disks are capable of delivering very respectable sequential data rates, it is not surprising that replacing the SAS "log disks" with SSDs does not boost performance at all. However, placing your database data files on an Intel X25-E is an excellent strategy. One X25-E is 66% faster than eight (!) 15000RPM SAS drives. That means if you don't need the capacity, you can replace about 13 SAS disks (8 x 1.66) with one SSD and get the same performance. You can keep the SAS disks as your log drives, as they are a relatively cheap way to obtain good logging performance.



Energy Consumption

For our performance testing we used a 3.3GHz (120W TDP) Xeon X5470; we admit to being a bit paranoid, and we wanted the CPU to have plenty of processing power in reserve. In purely storage related tasks, the CPU never exceeded 15% load, even with software RAID. Only SysBench was capable of pushing it up to 80%, but if we want to measure the power consumption of our SC-836TQ storage enclosure, the SysBench value is unrealistic: in most cases, the server will run the database and perform the transactions, while the storage enclosure attached to the server performs only the I/O processing. Therefore we measure the power consumption of our storage enclosure using IOMeter, with a more sensible 80W 2.5GHz Xeon E5420 CPU. High performance enclosures (such as those of EMC) also use Xeons to perform the I/O processing.

The SC-836TQ uses one Ablecom PWS-902-1R 900W 75A power supply, one Xeon E5420 "Harpertown", 4x2GB 667MHz FB-DIMM, and one Adaptec 5085 RAID controller. "Full Load" means that the storage enclosure is performing the IOMeter Random Read/Write tests. The difference between sequential reads and random writes is only a few watts (with both SSD and SAS).

Drive Power Consumption (Watts)
Configuration                 Idle      Full Load   Idle           Full Load      Idle          Full Load
                              (Total)   (Total)     (Drives Only)  (Drives Only)  (per Drive)   (per Drive)
8 x SSD X25-E                 257       275         6              24             0.75          3
4 x SSD X25-E                 254       269         3              18             0.75          4.5
8 x SAS (Seagate)             383       404         132            153            16.5          19.125
4 x SAS (Seagate)             316       328         65             77             16.25         19.25
No disks (one system disk)    251       n/a         n/a            n/a            n/a           n/a
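The per-drive columns are simply derived from the wall-socket measurements: subtract the 251W baseline of the enclosure without data disks and divide by the number of drives. A minimal sketch of that derivation follows; note that it attributes the entire difference to the drives and ignores power supply efficiency losses, just as the table does.

# Derive per-drive power from measured wall-socket values (watts).
BASELINE_W = 251  # enclosure with only its system disk, idle

measurements = {
    # name: (drive count, total idle W, total full-load W)
    "8 x SSD X25-E":     (8, 257, 275),
    "4 x SSD X25-E":     (4, 254, 269),
    "8 x SAS (Seagate)": (8, 383, 404),
    "4 x SAS (Seagate)": (4, 316, 328),
}

for name, (drives, idle_total, load_total) in measurements.items():
    idle_per_drive = (idle_total - BASELINE_W) / drives
    load_per_drive = (load_total - BASELINE_W) / drives
    print(f"{name}: {idle_per_drive:g} W idle, {load_per_drive:g} W full load per drive")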

While the Intel SLC X25-E consumes almost nothing in idle (0.06W), the reality is that the drive is attached to a RAID controller. That RAID controller consumes a little bit of energy to keep the connection to the idle drive alive. Still, the fact that eight SLC drives need 129W less power than eight SAS drives while offering 3 to 13 times better OLTP performance is a small revolution in storage land.

Let us do a small thought experiment. Assume that you have a 100GB database that is performance limited. Our SysBench benchmark showed that eight SLC X25-E drives perform at least three times (up to 13 times) better than ten 15000RPM SAS drives. You need at least 30 SAS drives to achieve the same performance as the SSDs. We'll ignore the fact that you would probably need another enclosure for the 30 drives and simply look at the costs associated with an eight SLC SSD setup versus a 30 drive 15000RPM SAS setup.

We base our KWh price on the US Department of Energy numbers which states that on average 1 KWh costs a little more than 10 cents[2]; the real price is probably a bit higher, but that's close enough. It is important to note that we add 50% more power to account for the costs of air conditioning for removing the heat that the disks generate. We assume that the drives are working eight hours under full load and 16 under light load.

TCO Comparison
                               X25-E      SAS 15000RPM   Comment
Power per drive (W)            1.5        17.375         16 hours idle, 8 hours full load
Years                          3          3
KWh per drive (3 years)        38.88      450.36         360 days, 24 hours
Number of drives               8          30             Based on SysBench performance measurements
Total KWh for disks            311.04     13510.8
Cooling (50%)                  155.52     6755.4         To remove heat from the array
Total KWh in datacenter        466.56     20266.2        Disk power + cooling
Price per KWh                  $0.10      $0.10
Total power costs (3 years)    $46.656    $2026.62
TCA                            $6400      $6000          Eight 64GB SLC drives at $800; thirty 15000RPM SAS drives at $200
Savings                        $1579.964

If you use six drives for the RAID 10 data LUN (and two drives for the logs), you need the 64GB SLC drives; that is why we use those in this calculation. Note that our calculation is somewhat biased in favor of the SAS drives: the SLC drives probably spend much more time at idle than the SAS drives, and it is very likely that even 30 SAS drives won't be able to keep up with our eight SSDs. Even with this bias, the conclusion is crystal clear: if you are not space limited but you are performance limited, SSDs are definitely the better deal and will save you quite a bit of money as they lower the TCO.
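For readers who want to plug in their own electricity rate, duty cycle, or drive prices, the sketch below reproduces the table above. The inputs are the per-drive power figures measured earlier and the drive counts, prices, and assumptions stated in the text (16 hours idle / 8 hours full load, 360 days per year, 3 years, 50% cooling overhead, $0.10 per kWh).

# Rough three-year TCO: acquisition cost plus electricity (including 50% cooling overhead).
def drive_tco(n_drives, idle_w, load_w, price_per_drive,
              years=3, days_per_year=360, kwh_price=0.10, cooling_factor=0.5):
    avg_w = (16 * idle_w + 8 * load_w) / 24              # 16h idle + 8h full load per day
    kwh_per_drive = avg_w * 24 * days_per_year * years / 1000
    total_kwh = kwh_per_drive * n_drives * (1 + cooling_factor)
    return n_drives * price_per_drive + total_kwh * kwh_price

ssd = drive_tco(8, idle_w=0.75, load_w=3.0, price_per_drive=800)      # eight 64GB X25-E
sas = drive_tco(30, idle_w=16.5, load_w=19.125, price_per_drive=200)  # thirty 15000RPM SAS
print(f"SSD: ${ssd:,.2f}  SAS: ${sas:,.2f}  savings: ${sas - ssd:,.2f}")  # savings ~ $1580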



Conclusion

System administrators with high-end, ultra mission-critical applications will still look down their noses at the Intel X25-E Extreme SLC drive: it is not dual ported (SATA interface) and it does not have a "super capacitor" to allow the controller to write its 16MB cache to the flash array in the event of a sudden power outage. For those people, the Enterprise Flash Drives (EFD) of EMC, with capacities up to 400GB, make sense despite prices ten times as high as the Intel X25-E SLC drive.

For the rest of us, probably 90% of the market, the Intel X25-E is nothing short of amazing: it offers 3 to 13 times better OLTP performance at less than a tenth of the power consumption of classical SAS drives. Frankly, we no longer see any reason to buy SAS or FC drives for performance critical OLTP databases unless the database sizes are really huge. If you find yourself using lots of spindles while most of your hard disk space sits empty, Intel's SLC SSDs make a lot more sense.

However, be aware that these ultra fast storage devices push the bottleneck higher up in the storage hierarchy. The current storage processors seem to have trouble scaling well from four to eight drives. We witnessed negative scaling only in some extreme cases, for example 100% random writes in RAID 5, and it is unlikely that you will see this kind of behavior in the real world. Still, the trend is clear: scaling will be poor if you attach 16 or more SLC SSDs to products like the Adaptec 51645 and especially the 52445. Those RAID controllers allow you to attach up to 24 drives, but the storage processor is the same as on our Adaptec 5805 (IOP348 at 1.2GHz). We think it is best to attach no more than eight SLC drives per IOP348, especially if you are planning to use the more processor intensive RAID levels like RAID 5 and 6. Intel and others had better come up with faster storage processors soon, because these fast SLC drives make the limits of the current generation of storage processors painfully clear.

Our testing also shows that choosing the "cheaper but more SATA spindles" strategy only makes sense for applications that perform mostly sequential accesses. Once random access comes into play, you need two to three times more SATA drives - and there are limits to how far you can improve performance by adding spindles. Finally, to get the best performance out of your transactional applications, RAID 10 is still king, especially with the Intel X25-E.

References

[1] Dave Fellinger, "Architecting Storage for Petascale Clusters". http://www.ccs.ornl.gov/workshops/FallCreek07/presentations/fellinger.pdf

[2] US Department of Energy. Average retail price of electricity to ultimate customers by end-use sector, by state. http://www.eia.doe.gov/cneaf/electricity/epm/table5_6_a.html
