Motherboards Memory Storage Cases/Cooling/PSUs IT Computing Displays Mobile Mac CPUs & Chipsets Video Digital Cameras Linux Gadgets Systems Trade Shows Guides Home Increase Font Size Decrease Font Size Change Page Size
MySQL and the power of Intel SSDs
MySQL and the power of Intel SSDs
Date: November 9th, 2008
Author: Johan De Gelas
 
 

More and more of the database vendors are talking about the wonders that SSD can do for transactional (OLTP) databases. So I read Anand's latest SSD article with more than usual interest. If many of the cheaper MLC SSD's write small blocks 20 times slower than a decent harddrive, these SSD's are an absolute nightmare for OLTP databases. 
 
In our last Dunnington review, we showed our latest virtualization test which includes 4 concurrent OLTP ("Sysbench") tests on four separate MySQL 5.1.23 databases in four ESX virtual machines. We were fairly confident that our 6 disks RAID-0 for data and 1 separate disk for logging were capable of keeping up. After all, each disk is a 300 GB Cheetah Seagate at 15000 rpm, probably one of the fastest (mechanical) disks on this planet as it can deliver up to 400 I/O per second (and 125 MB/s sequential data rate).
 
But it is better to be safe than to be sorry. We did extensive monitoring with IOstat (on a "native" SLES 10 SP2) and found the following numbers on the disk that performs the logging transactions: 
  • queue length is about 0.22 (More than 2 indicates that the harddisk can not keep up)
  • typical average I/O latency is 0.23 ms (90%), with about 10% spikes of 7 to 12 ms (we measure the average over the past 2 seconds) 
That reassured us that our transaction log disk was not a bottleneck. On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel x25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there is no indication whatsoever that we were bottlenecked by our log disk.
 
So we had to delve a little deeper. I first thought that as long as the harddisk is not the bottleneck, the number of transactions would be more or less the same with a faster disk. It turned out that I was somewhat wrong. 

In MySQL each user thread can issue a write when the transaction is commited . More importantly is a completely serial, there doesn't seem to be a separate log I/O thread which would allow our user thread to "fire" a disk operation "and forget". As we want to be fully ACID compliant our database is configured with
 innodb_flush_log_at_trx_commit = 1
 
So after each transaction is committed, there is a "pwrite" first, then followed by a flush to the disk. So the actual transactions performance is also influenced by the disk write latency even if the disk is nowhere near it's limits.
 
We still have to investigate this further but this seems to go a bit against the typical sizing advice that is given for OLTP databases: make sure your log disks achieve a certain numbers of I/Os or put otherwise: "make sure you have enough spindles". That doesn't seem to paint the complete picture: as each write to disk action seems to be in the "critical speed path" of your transaction, each individual access latency seems to influence performance.
 
We monitored the same Sysbench benchmark on our Intel X25-M disk: 
  • Queue length is lower: 0.153 (but 0.2 was already very low)
  • typical access latency: an average 0.1 with very few spikes of 0.5 ms.
  • 1900 instead of 1400 tr/s
 So our conclusion so far seems to be that in case of MySQL OLTP, sizing for IO/s seems to be less important than the individual write latency. To put it more blunt: in many cases even tens of of spindles will not be able to beat one SSD as each individual disk spindle has a relatively high latency. We welcome your feedback!
 
 
 



33 Comments
Username:
Password:
that´s perfectly expectable by bollux78, 376 days ago
seems like software will never be optimized enough, in this case for the SSDs write latencies, and in other cases for the raw throughput of HDDs. We will always find new bottlenecks and weak points, as development will never cover all aspects possible.

Reply
RE: that´s perfectly expectable by JohanAnandtech, 375 days ago
The main objective of my Blogpost was not to point out a "weakness" of MySQL. More importantly was the fact that we were surprised to find that the mindset of "make sure you have enough spindles" for your transactions logs is not really accurate.

It is also important for our CPU tests: as CPUs get faster, it gets harder to keep the CPU load high enough to make a meaningful comparison. It is so horribly easy to screw up servertesting :-).

Reply
Is this MySQL spesific? by Devzero, 375 days ago
Is this a problem only on MySQL or would MSSQL server/Oracle/DB2/.. have the same problems?

Reply
SLC vs. MLC by davepermen, 375 days ago
I'd like to see how the Intel SLC SSD would perform.

Reply
RE: SLC vs. MLC by JohanAnandtech, 375 days ago
I am not able to answer that question accurately, but my first estimate would be that the difference would be rather small. I don't really see how SLC would improve the already spectacular low write latencies much more.

Reply
RE: SLC vs. MLC by JohanAnandtech, 374 days ago
I stand corrected: apparantely: SLC drives have up to 3 times lower write latency.

Reply
an anonymous CPU for now????? by CU, 375 days ago
Is this a new AMD cpu? I thought all the Core i7 stuff had already been released.

Reply
RE: an anonymous CPU for now????? by JarredWalton, 375 days ago
Probably not for the server market... I can't say for sure.

Reply
RE: an anonymous CPU for now????? by RagingDragon, 375 days ago
The desktop i7's have been released, but the not the server variants haven't been released yet. And of course, AMD's Shanghai is coming soon too, so we can't be completely certain which processor it is... ;)

Reply
surprising by rmlarsen, 375 days ago
Yes, the combination of low latency and non-volatility of SSDs is really a game changer for database applications.

I does trouble me, that you seem surprised to (re-)discover that low write latency to the WAL is absolutely critical to the performance of a transactional database. In fact, the proof is right in your own numbers:

quote: "* typical average I/O latency is 0.23 ms (90%), with about 10% spikes of 7 to 12 ms

That reassured us that our transaction log disk was not a bottleneck"

No, that shows exactly that your disk latency is the limit: If these number are in the right ball park, the average latency is at least (0.9*0.23 + 0.1*7) ~= 0.9 ms, which limits the number of transactions per second to ~1100. Your performance is limited by the 10% of transactions that actually incur a disk related latency.

Anyhow, thanks for posting these measurements.


Reply
RE: surprising by JohanAnandtech, 375 days ago
I am trying to understand every step of the transaction, but I don't think it is that simple.

I can not imagine that a thread would actually wait for the disk to say "it is finished" ? After all, for the thread, the disk is just a block device it writes too, and there is no further communication?

The surprise was that the current advice of "get enough spindles" is not accurate. I would have expected that as long as the disks can keep up, the user thread would sent their transaction to the disk and forget about it. Kind of an async operation.

Secondly, I would have expected our BBU protected RAID card which is a really fast card(*) to take care of disk latency as long as the disks can keep up.

(*)http://it.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3414&p=7


Reply
RE: surprising by rmlarsen, 375 days ago
I think your assumption about asynchronous writes is correct for the data part (on some databases), but not for the transaction log data. To ensure ACID compliance the thread writing the transaction log HAS to wait for the disk controller to say "finished". I believe that is what the "flush" part of "innodb_flush_log_at_trx_commit" refers to.

As you and others point out, the controller cache can mostly hide the latency, but if the cache runs full your thread will have to wait for one or more rotational periods of the disk (4ms). This appears to happen ~10% of the time for you. As you increase the size of the controller cache and/or the rate at which the controller can flush it to disk (which depends on both transfer speed and latency of the disk(s) itself), the fraction of transactions having to wait drops, as you also see in your experiment. I wonder if allocating more than one spindle to the transaction log file (in your non-SSD setup) would actually help in this regard?

Reply
RE: surprising by JohanAnandtech, 374 days ago
"I wonder if allocating more than one spindle to the transaction log file (in your non-SSD setup) would actually help in this regard? "

Do you have a suggestion on how this could be done best? RAID-10 with a stripe size of 16 KB (= log writes)?

Reply
RE: surprising by rmlarsen, 374 days ago
Sorry, but I don't have concrete experience with this setup, so your guess is probably better than mine. Anything involving RAID-1 (and by extension RAID-10) might not be optimal since the controller has to wait for both disks in the pair to commit a block and therefore the average latency is higher. I think you'll just have to experiment...RAID-0, RAID-10, different block sizes etc.

RAID-0 is probably not acceptable from a reliability standpoint, but at least it should give you an idea whether anything can be gained by going this route. I look forward to hearing what you find.

Reply
Raid controller with battery write cache ? by Goto 10, 375 days ago
I would Imagine the raid controller would hide the write latency as long you have full write cache with battery backup running.
Since the price has dropped on "real" raid controllers and the cache is now 256/512 MB per card, going with with 2 controllers, 1 for data, and 1 for logging. This should give you the write speed you need, but it wouldn't really help with the low access speed of the reads if it's not cached though.
I think you would get alot more if you swapped to x25-M on the data part than on the logging part.

But I'm eagerly waiting for a full SSD / Nehalem database article, hopefully it will change the battlefield like AMD did with it's opterons when they where launched and got to fight the P4 based Xeons.


Reply
RE: Raid controller with battery write cache ? by JohanAnandtech, 375 days ago
"I think you would get alot more if you swapped to x25-M on the data part than on the logging part. "

I don't think so. Average latency on the data disk is much lower, as the RAID controller is accessing the disks a lot more sequential. We are seeing very low activity (a few KB/s) and 50 MB/s at the end of the test. I am pretty sure those 50 MB/s are very sequential writes to our RAID-0 of 6 disks. That RAID-0 set (capable of 500 MB/s probably *) would not have any problem with a sequential 50 MB/s. I don't see how the SSD could make a difference there. In this case more spindles do help.

(*) http://it.anandtech.com/IT/showdoc.aspx?i=3147&p=7 (gives a rough idea)

Reply
MLC wear by torsteinowich, 375 days ago
Are MLC flash devices even worth considering for database applications with their 10000 cycle erase limit? I'm no database expert, but 10000 sounds low for a database application.

Even SLCs might not live that long on databases with much write activity. For mostly read access applications SSDs are great, though.


Reply
RE: MLC wear by JohanAnandtech, 375 days ago
Well, the MLC was the only one that was available to me, you are probably right. I do think the SLC "wear levelling" is capable of making the drive survive longer than a mechanical disk, especially Intel quality wear levelling.

But I was quite surprised to hear how many IT people (*) were using these cheap MLC drives for databases and quite happy about it. They might be in for a bad surprise.

(*) Just "hear say", no real statistical data :-)

Reply
RE: MLC wear by RagingDragon, 375 days ago
Well, hopefully they backup their databases regularly. No matter what HD or SSD they use.

Reply
RE: MLC wear by vol7ron, 372 days ago
Intel's MLC has a wear-leveling controller that reduces wear of a cell and extends the hard disk life. Because the location of where the data is stored doesn't matter (unlike a hard disk, where closer is faster) the algorithm makes sure to use the whole drive. Additionally, it reduces the amount of times a cell is used, to minimize erases.

Notice first that it is not the re-writes that kill the MLCs, it is the erase and then write again. So long as a cell is not erased, it will not be damaged. I'm not sure how this will affect the storage, but if they're using a good logging system, no DELETEs/UPDATEs will occur, only inserts with adjustments, which would mean that data is hardly ever being changed (minus the indexes and logs).


Reply
exact setup by hieuu, 375 days ago
hi

please explain your exact setup / drive configuration / controller configuration.

was the log drive after swap on the same controller? what was your caching scheme.

thanks

Reply
RE: exact setup by JohanAnandtech, 374 days ago
This should answer your question:

Innodbpool of 950 MB, 258 MB Database (so all indexes are cached), Adaptec 5805 RAID card with 512 MB of DDR-2.

http://it.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3414&p=7

Basically the SSD was plugged into the backplane of our Supermicro server. So it is running on a relatively low end LSI 1068 controller.

Reply
First sentence does not make sense by ggordonliddy, 375 days ago
> The last time, more and more of the database vendors are talking
> about the wonders that SSD can do for transactional (OLTP) databases.

That statement does not make sense. Did you mean "were talking"?

Reply
RE: First sentence does not make sense by JohanAnandtech, 374 days ago
Does it make sense now? :-)

Reply
RE: First sentence does not make sense by ggordonliddy, 374 days ago
Yes, thanks.

Reply
RAM disk performance? by overzealot, 375 days ago
If the performance improved significantly with low latency, would running the log on a RAMdrive increase performance? Might be worth checking while you're testing.

Reply
RE: RAM disk performance? by RagingDragon, 375 days ago
Yeah, why not use a RAM drive for benchmarking? While it might not be a good idea for a production database (*), it would maximize the chances of a CPU bottleneck - and thus a good CPU test. Rather like benchmarking games at minimum resolution and detail on top of the line graphics cards.

* Gigabyte make battery backed RAM disks, and there are at least a few enterprise grade systems too, for example:

http://www.superssd.com/products/ramsan-440/

Reply
RE: RAM disk performance? by RagingDragon, 375 days ago
For benchmarking, I don't think the expense of the enterprise system would be justified (but if you can convince a vendor to "donate" one for you benchmarking, so much the better :D). However, something like the cheap Gigabyte cards or just, a server with tons of RAM and a simple operating system RAM drive should be OK for benchmarking (i.e. no serious consequences if you loose power/reboot and have to restore DB).

Reply
If your interested in further MySQL Benchmarks on SSD by bigdbahead, 374 days ago
I have been doing MySQL benchmarks on SSd for the past several months. I do normally use sysbench, but find DBT2 a little more reliable in terms of mirroring real world oltp performance. Reguardless take a look here: http://www.bigdbahead.com/ for some interesting MySQL SSD stuff.

Reply
Low latency and high throughput by jand, 374 days ago
I had a system where we have large number of files with sizes ~60kB
The disk subsystem could not run at full speed due to the 'bursty' nature of the file writes. If it is a large file chunk, it is a sequential write so everything goes much faster.

With that as comparison, the transaction logging is essentially bursty traffic and relatively smaller compared to data writes hence benefits more from lower latency than higher throughput.

You can use a ramdisk for the transaction log as you will have high throughput and low latency. That will be your gold standard for comparisons.


Reply
Cheap MLC? by vol7ron, 372 days ago
Intel does not have a cheap MLC. If $600+ is cheap, you must be crazy. Other manufacturers produce cheap MLCs, which cost under $200 for the same size.

Those cheap MLCs also have random-write problems are not adequate for anything really. Intel has produced one of the best MLCs, almost equivalent in performance to a SLC, for slightly less money. It is not good to identify one of the best MLCs, call it cheap, and compare it to top of the line hard disk systems. That misleads the reader into thinking the actual cheap MLCs (like OCZ or MTrons) are being used, which is not the case.

Reply
Try MFT with the drive by mafj, 371 days ago
To really test the SSD performance one would need driver implementing Managed Flash Technology.
http://managedflash.com/
It can in practice boost write efficiency to nearly meet read one (10k IOPS) on SSD drives not mentioning improving SSD's life.
You can try MFT drivers evaluation version on the given site.



Reply
Great info - Raid 10 w\ SSD?? by cgsaben, 366 days ago
Great test, this is exactly why I visit this site. I am an architect for a large online backup company, these articles are great resources.

I realize you might not have many samples, but it would be great to see what results you would get, if you setup your raid 10 with SSD disks. I don't plan to use a single stranded SSD for critical data storage in my environment.

Reply
Comments Page 1 of 1





AnandTech.com Blog Categories
All categories
Anand's Macdates
Anand's Theater Construction
Anand's Updates
Cases and Power Supplies
CeBIT 2008
CES 2008
Computex 2009
Derek Decanted
Eddie's Got Game
Gary's First Looks
IT Computing general
Jarred's Musings
Kris's Corner
Raja's Ramblings
Rob's Experiences...
Ryan's Ramblings
Virtualization
What's New with Wes
Blank
Blank

Blank

Latest news by
DailyTech

 November 20, 2009

Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank

 November 19, 2009

Blank
Blank
Blank
Blank
Blank
Blank
Blank
Blank


more Blogs Discussions



pipeboost
Copyright © 1997-2009 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.
Click Here for Advertising Information