More and more of the database vendors are talking about the wonders that SSD can do for transactional (OLTP) databases. So I read Anand's latest SSD article with more than usual interest. If many of the cheaper MLC SSD's write small blocks 20 times slower than a decent harddrive, these SSD's are an absolute nightmare for OLTP databases. 
In our last Dunnington review, we showed our latest virtualization test which includes 4 concurrent OLTP ("Sysbench") tests on four separate MySQL 5.1.23 databases in four ESX virtual machines. We were fairly confident that our 6 disks RAID-0 for data and 1 separate disk for logging were capable of keeping up. After all, each disk is a 300 GB Cheetah Seagate at 15000 rpm, probably one of the fastest (mechanical) disks on this planet as it can deliver up to 400 I/O per second (and 125 MB/s sequential data rate).
But it is better to be safe than to be sorry. We did extensive monitoring with IOstat (on a "native" SLES 10 SP2) and found the following numbers on the disk that performs the logging transactions: 
  • queue length is about 0.22 (More than 2 indicates that the harddisk can not keep up)
  • typical average I/O latency is 0.23 ms (90%), with about 10% spikes of 7 to 12 ms (we measure the average over the past 2 seconds) 
That reassured us that our transaction log disk was not a bottleneck. On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel x25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there is no indication whatsoever that we were bottlenecked by our log disk.
So we had to delve a little deeper. I first thought that as long as the harddisk is not the bottleneck, the number of transactions would be more or less the same with a faster disk. It turned out that I was somewhat wrong. 

In MySQL each user thread can issue a write when the transaction is commited . More importantly is a completely serial, there doesn't seem to be a separate log I/O thread which would allow our user thread to "fire" a disk operation "and forget". As we want to be fully ACID compliant our database is configured with
 innodb_flush_log_at_trx_commit = 1
So after each transaction is committed, there is a "pwrite" first, then followed by a flush to the disk. So the actual transactions performance is also influenced by the disk write latency even if the disk is nowhere near it's limits.
We still have to investigate this further but this seems to go a bit against the typical sizing advice that is given for OLTP databases: make sure your log disks achieve a certain numbers of I/Os or put otherwise: "make sure you have enough spindles". That doesn't seem to paint the complete picture: as each write to disk action seems to be in the "critical speed path" of your transaction, each individual access latency seems to influence performance.
We monitored the same Sysbench benchmark on our Intel X25-M disk: 
  • Queue length is lower: 0.153 (but 0.2 was already very low)
  • typical access latency: an average 0.1 with very few spikes of 0.5 ms.
  • 1900 instead of 1400 tr/s
 So our conclusion so far seems to be that in case of MySQL OLTP, sizing for IO/s seems to be less important than the individual write latency. To put it more blunt: in many cases even tens of of spindles will not be able to beat one SSD as each individual disk spindle has a relatively high latency. We welcome your feedback!



View All Comments

  • Devzero - Monday, November 10, 2008 - link

    Is this a problem only on MySQL or would MSSQL server/Oracle/DB2/.. have the same problems? Reply
  • bollux78 - Sunday, November 9, 2008 - link

    seems like software will never be optimized enough, in this case for the SSDs write latencies, and in other cases for the raw throughput of HDDs. We will always find new bottlenecks and weak points, as development will never cover all aspects possible. Reply
  • JohanAnandtech - Monday, November 10, 2008 - link

    The main objective of my Blogpost was not to point out a "weakness" of MySQL. More importantly was the fact that we were surprised to find that the mindset of "make sure you have enough spindles" for your transactions logs is not really accurate.

    It is also important for our CPU tests: as CPUs get faster, it gets harder to keep the CPU load high enough to make a meaningful comparison. It is so horribly easy to screw up servertesting :-).

Log in

Don't have an account? Sign up now