After the exhaustive building and testing process, we've found several areas where we could have improved the original build.

Improved CPU

When we initially chose our hardware, we assumed we would not need much CPU, since we are not doing any kind of parity calculation on our storage. What we neglected to account for was the checksumming that ZFS performs to maintain data integrity. That checksumming consumes significantly more processor time than we had anticipated: many tests used 70% or more of the CPU, and at that level of utilization we believe there is significant I/O contention. Our next ZFS-based storage system will probably be built on a dual-socket platform with higher-clocked (and possibly higher core count) CPUs, giving additional headroom for checksumming and for CPU-hungry features like deduplication and compression. The CPU shortfall is not noticeable when testing at gigabit Ethernet speeds, but in additional benchmarking over 20Gbps InfiniBand we were able to max out the CPU in the ZFS server well before approaching the limits of the 20Gbps network.
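
Both features are ordinary ZFS properties that can be enabled per dataset, and per-CPU load is easy to watch while a benchmark runs. A minimal sketch for OpenSolaris/Nexenta, assuming a hypothetical pool named tank:

    # Compression and deduplication are dataset properties; both add CPU load
    zfs set compression=lzjb tank
    zfs set dedup=on tank

    # Watch per-CPU utilization at 5-second intervals while a benchmark runs
    mpstat 5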

More Memory

Going into this project, we did not really know how much main memory the ZFS SAN would need, or how much the system would benefit from having more of it. After running tests on smaller datasets that fit entirely into main memory, we decided that our next build would have 48GB of RAM or more. As a general rule, ZFS benefits from as much RAM as you can afford to give it. The ARC (main memory) cache on both Nexenta and OpenSolaris works extremely well when the dataset fits entirely in RAM, and the performance gains from a large main memory are huge; we saw random-read numbers in the hundreds of thousands of IOPS when working entirely out of main memory. At some point you will hit diminishing returns, though. If your working set fits into main memory and is mostly reads, more RAM for the ARC will significantly improve performance. On the flip side, if your workload is mostly writes, adding 48GB of RAM or more may not give you any noticeable advantage.
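
A quick way to gauge whether your working set fits in RAM is to read the ARC kernel statistics. A rough sketch for OpenSolaris/Nexenta; the arcstats counters below are the standard kstat names, but worth verifying on your build:

    # Current ARC size, target size, and maximum size (bytes)
    kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max

    # Hit/miss counters indicate how well the working set fits in main memory
    kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses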

SAS Drives

We thought ZFS's advanced software could overcome some of the inherent limitations of slow spindle speeds, and it did, up to a point. ZFS on OpenSolaris was able to outperform the Promise M610i at essentially the same price point, but we feel we still left a lot of performance on the table. The next time we deploy a ZFS server, we plan to use 15k RPM SAS drives instead of 7200 RPM SATA drives for primary storage. We suspect we could have easily doubled the performance of our ZFS box in certain tests with 15k RPM SAS drives. The downside of SAS drives is increased cost and decreased capacity, but those tradeoffs are worthwhile for us if we can double the IOPS, especially on write operations, where every transaction has to be committed to disk as quickly as possible. Reads may not improve as much, since many reads are already served from SSD storage; having SAS drives feed the SSDs would probably not increase overall performance unless your working set is large enough to exceed the total capacity of the SSD drives.
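
Since we are not using any parity, the pool is built from striped mirrors, and a SAS build would use the same layout. A minimal sketch with hypothetical device names for six 15k RPM SAS drives:

    # Three two-way mirrors striped together; trades capacity for random IOPS
    zpool create tank \
        mirror c1t0d0 c1t1d0 \
        mirror c1t2d0 c1t3d0 \
        mirror c1t4d0 c1t5d0

    # Confirm the layout
    zpool status tank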

SSD Drives

In the ZFS project, we used SLC-style SSDs for the ZIL and MLC-style SSDs for the L2ARC. If the price of MLC SSDs continues to fall, we will eventually omit the L2ARC and simply use MLC SSDs for all of the primary storage. When that day comes, we will also need multiple SAS controllers and a much faster CPU in each ZFS box to keep up with all of the I/O those drives can deliver. Our only concern is wear leveling on the MLC drives and their ability to sustain writes over an extended period of time. Only time will tell whether the drives can handle sustained writes in an L2ARC role or as primary storage.
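
Both roles are assigned with ordinary zpool commands. A minimal sketch against a hypothetical pool named tank, with made-up device names:

    # Mirrored SLC SSDs as the dedicated ZIL (log) device; writes are latency-critical
    zpool add tank log mirror c2t0d0 c2t1d0

    # MLC SSDs as L2ARC (cache) devices; these are striped and need no redundancy
    zpool add tank cache c2t2d0 c2t3d0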

If you decide to use MLC SSDs for primary storage instead of SATA or SAS hard drives, you don't need cache drives: with all of the storage drives already being very fast SSDs, there is nothing to gain from also running an L2ARC. You would still want to run SLC SSDs as ZIL devices, though, since that reduces write wear on the MLC SSDs used for data storage.
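
A pool built that way would look something like this sketch (hypothetical device names; MLC SSDs as the data mirrors, mirrored SLC SSDs as the log, and no cache devices):

    zpool create ssdpool \
        mirror c3t0d0 c3t1d0 \
        mirror c3t2d0 c3t3d0 \
        log mirror c4t0d0 c4t1d0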

If you plan to attach a lot of SSDs, remember to use multiple SAS controllers. The SAS controller on the motherboard in our ZFS Build project is based on the LSI 1068e chipset. We could not find specific numbers for the integrated controller, but another LSI 1068-based standalone card, the LSI SAS3080X-R, can sustain 140,000 IOPS. With enough SSDs you could actually saturate the SAS controller. As a rough rule of thumb, you may want one additional SAS controller for every 24 MLC-style SSDs. Of course, we have not tested with 24 MLC SSDs, so that number could be higher or lower, but based on our initial performance numbers and the perceived performance of our SAS controller, we believe 24 is a good starting point.
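
Whether the controller or the disks are the bottleneck shows up in the I/O statistics. Two useful views on OpenSolaris/Nexenta, assuming a pool named tank:

    # Per-vdev IOPS and bandwidth, refreshed every 5 seconds
    zpool iostat -v tank 5

    # Extended per-device statistics (service times and %busy) to spot saturation
    iostat -xnz 5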

Comments

  • diamondsw2 - Tuesday, October 5, 2010 - link

    You're not doing your readers any favors by conflating the terms NAS and SAN. NAS devices (such as what you've described here) are Network Attached Storage, accessed over Ethernet, and usually via fileshares (NFS, CIFS, even AFP) with file-level access. SAN is Storage Area Network, nearly always implemented with Fibre Channel, and offers block-level access. About the only gray area is that iSCSI allows block-level access to a NAS, but that doesn't magically turn it into a SAN with a storage fabric.

    Honestly, given the problems I've seen with NAS devices and the burden a well-designed one will put on a switch backplane, I just don't see the point for anything outside the smallest installations where the storage is tied to a handful of servers. By the time you have a NAS set up *well* you're inevitably going to start taxing your switches, which leads to setting up dedicated storage switches, which means... you might as well have set up a real SAN with 8Gbps fibre channel and been done with it.

    NAS is great for home use - no special hardware and cabling, and options as cheap as you want to go - but it's a pretty poor way to handle centralized storage in the datacenter.
  • cdillon - Tuesday, October 5, 2010 - link

    The terms NAS and SAN have become rightfully mixed, because modern storage appliances can do the jobs of both. Add some FC HBAs to the above ZFS storage system and create some FC Targets using Comstar in OpenSolaris or Nexenta and guess what? You've got a "SAN" box. Nexenta can even do active/active failover and everything else that makes it worthy of being called a true "Enterprise SAN" solution.

    I like our FC SAN here, but holy cow is it expensive, and it's not getting any cheaper as time goes on. I foresee iSCSI via plain 10G Ethernet and also FCoE (which is 10G Ethernet + FC sharing the same physical HBA and data link) completely taking over the Fibre Channel market within the next decade, which will only serve to completely erase the line between "NAS" and "SAN".
  • mbreitba - Tuesday, October 5, 2010 - link

    The systems as configured in this article are block level storage devices accessed over a gigabit network using iSCSI. I would strongly consider that a SAN device over a NAS device. Also, the storage network is segregated onto a separate network already, isolated from the primary network.

    We also backed this device with 20Gbps InfiniBand, but had issues getting the IB network stable, so we did not include it in the article.
  • Maveric007 - Tuesday, October 5, 2010 - link

    I find iSCSI is closer to a NAS than a SAN, to be honest. The performance gap between iSCSI and a true SAN is much larger than the gap between iSCSI and a NAS.
  • Mattbreitbach - Tuesday, October 5, 2010 - link

    iSCSI is block-based storage, NAS is file-based. The transport used is irrelevant. We could use iSCSI over 10GbE or over InfiniBand, which would increase performance significantly and probably exceed what the most expensive 8Gb FC gear can deliver.
  • mino - Tuesday, October 5, 2010 - link

    You are confusing the NAS vs. SAN terminology with the interconnects terminology and vice versa.

    SAN, NAS, DAS ... are abstract methods of how a data client accesses the stored data.
    --Network Attached Storage (NAS), per definition, is a file/entity-based data storage solution.
    - - - It is _usually_but_not_necessarily_ connected to a general-purpose data network.
    --Storage Area Network (SAN), per definition, is a block-access-based data storage solution.
    - - - It is _usually_but_not_necessarily_THE_ dedicated data network.

    Ethernet, FC, InfiniBand, ... are physical data conduits; they are the ones that define which PERFORMANCE class a solution belongs in.

    iSCSI, SAS, FC, NFS, CIFS ... are logical conduits; they are the ones that define which FEATURE class a solution belongs in.

    Today, most storage appliances allow for multiple ways to access the data, many of them simultaneously.

    Therefore, presently:

    Calling a storage appliance, of whatever type, a "SAN" is pure jargon.
    - It has nothing to do with the device "being" a SAN per se
    Calling an appliance, of whatever type, a "NAS" means it is/will be used in the NAS role.
    - It has nothing to do with the device "being" a NAS per se.
  • mkruer - Tuesday, October 5, 2010 - link

    I think there needs to be a new term called SANNAS or snaz short for snazzy.
  • mmrezaie - Wednesday, October 6, 2010 - link

    Thanks, I learned a lot.
  • signal-lost - Friday, October 8, 2010 - link

    Depends on the hardware sir.

    My iSCSI DataCore SAN pushes 20k IOPS for the same reason their ZFS box does (RAM caching).

    Fibre Channel SANs will always outperform iSCSI run over crappy switching.
    Currently Fibre Channel maxes out at 8Gbps in most arrays. Even with MPIO, you're better off with an iSCSI system and 10/40Gbps Ethernet if you do it right. Much cheaper, and you don't have to learn an entirely new networking model (Fibre Channel or InfiniBand).
  • MGSsancho - Tuesday, October 5, 2010 - link

    While it's technically a SAN, you can easily make it a NAS with a simple "zfs set sharesmb=on", as I am sure you are aware.
