ZFS - Building, Testing, and Benchmarking

Author: Matt Breitbach

by Matt Breitbach on October 5, 2010 4:33 PM EST

After the exhaustive building and testing process, we've found several areas where we could have improved the original build.

Improved CPU

When we initially decided which hardware components to use, we thought we would not need very much CPU. While we are not doing any type of parity with our storage, we neglected to account for the checksumming that ZFS does to maintain data integrity. This checksumming consumes significantly more processor time than we had originally anticipated: many tests used 70% or more of the CPU, and we believe that at utilization this high there is significant IO contention. Our next ZFS-based storage system will probably be built on a dual-socket platform with higher-clocked CPUs (and possibly more cores), giving additional headroom for the checksumming and making room for more advanced features that consume CPU resources, such as deduplication and compression. The shortfall is not noticeable at gigabit Ethernet speeds. In additional benchmarking over 20Gbps InfiniBand, however, we were able to max out the CPU in the ZFS server well before we approached the limits of the 20Gbps network.
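To get a rough feel for the checksum overhead described above, the following Python sketch times SHA-256 (one of the checksum algorithms ZFS offers) over an in-memory buffer and estimates how many cores it would take to checksum data arriving at a given link speed. It is only an illustration: the function names, the buffer size, and the use of Python's hashlib are our assumptions, and the in-kernel ZFS checksum code will perform differently.

```python
import hashlib
import math
import time


def checksum_throughput_mb_s(buffer_mb=64, block_kb=128):
    """Measure single-core SHA-256 throughput in MB/s over zero-filled blocks.

    block_kb=128 mirrors the default ZFS recordsize; buffer_mb is arbitrary.
    """
    block = bytes(block_kb * 1024)
    blocks = (buffer_mb * 1024) // block_kb
    start = time.perf_counter()
    for _ in range(blocks):
        hashlib.sha256(block).digest()
    elapsed = time.perf_counter() - start
    return buffer_mb / elapsed


def cores_needed(link_gbps, per_core_mb_s):
    """Cores needed just to checksum data arriving at full line rate."""
    link_mb_s = link_gbps * 1000 / 8  # Gbps -> MB/s, rough decimal conversion
    return math.ceil(link_mb_s / per_core_mb_s)


if __name__ == "__main__":
    rate = checksum_throughput_mb_s()
    print(f"SHA-256: {rate:.0f} MB/s on one core")
    for gbps in (1, 20):
        print(f"{gbps} Gbps link: about {cores_needed(gbps, rate)} core(s)")
```

The absolute numbers depend entirely on the machine and the checksum algorithm, but the shape of the result matches what we saw: a load that is trivial at gigabit speeds can occupy several cores at 20Gbps.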

More Memory

Going into this project, we did not really know how much main memory we would need in the ZFS SAN, or how well the system would perform with more main memory. After doing some tests on smaller datasets that fit entirely into main memory, we decided that our next build would have 48GB of RAM or more. As a general rule, ZFS will benefit from as much RAM as you can afford to give it. The ARC (the main-memory cache) in both Nexenta and OpenSolaris works very well when the dataset fits entirely in it, and the performance gains from large amounts of main memory are huge, although at some point you will run into diminishing returns. If you're working with a dataset that fits into main memory and is mainly reads, having more memory for the ARC will significantly improve performance. We saw hundreds of thousands of IOPS for random reads when working purely out of main memory. On the flip side, if your workload is mainly writes, then adding 48GB of RAM or more may not give you any noticeable performance advantage.
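The effect of the ARC on read performance, including the diminishing returns, can be sketched with a simple weighted-average model. Everything in it is an assumption made for illustration: the latencies are round numbers rather than measurements from our system, reads are treated as uniformly random over the working set, all RAM is counted as ARC, and the function names are hypothetical.

```python
def arc_hit_ratio(working_set_gb, arc_gb):
    """Best-case hit ratio, assuming uniformly random reads over the working set."""
    if working_set_gb <= 0:
        raise ValueError("working_set_gb must be positive")
    return min(1.0, arc_gb / working_set_gb)


def effective_read_latency_us(hit_ratio, ram_latency_us=1.0, disk_latency_us=8000.0):
    """Average read latency when hit_ratio of the reads are served from RAM."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ram_latency_us + (1.0 - hit_ratio) * disk_latency_us


if __name__ == "__main__":
    # Hypothetical 40 GB working set, served with increasing amounts of ARC.
    for ram_gb in (12, 24, 48, 96):
        ratio = arc_hit_ratio(working_set_gb=40, arc_gb=ram_gb)
        latency = effective_read_latency_us(ratio)
        print(f"{ram_gb:3d} GB ARC: hit ratio {ratio:.2f}, "
              f"average read latency {latency:,.1f} us")
```

Once the hit ratio reaches 1.0, adding more memory changes nothing in this model, which is the diminishing-returns point mentioned above. The model says nothing about writes.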

SAS drives

We thought ZFS's advanced software could overcome some of the inherent problems with slow spindle speeds, and it did, up to a point. ZFS on OpenSolaris was able to outperform the Promise M610i at basically the same price point. However, we feel we left a lot of performance on the table. Next time we deploy a ZFS server, we plan to use 15k RPM SAS drives instead of 7200 RPM SATA drives as the primary storage. We suspect that we could have easily doubled the performance of our ZFS box in certain tests by using 15k RPM SAS drives. The downside of the SAS drives will be increased cost and decreased capacity, but those tradeoffs will be worthwhile for us if we can double the IOPS, especially on write operations where all transactions have to be committed to disk as quickly as possible. Reads may not be affected as much, since many of the reads already come from SSD storage; having SAS drives feed the SSDs would probably not increase overall performance unless your working set is large enough to exceed the total capacity of the SSDs.
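The expectation of roughly doubled IOPS can be sanity-checked with the common rule of thumb that a single spindle delivers about 1 / (average seek time + average rotational latency) random IOPS. The sketch below applies it; the seek times are assumed typical values for each drive class, not figures for the drives in our build, and the function names are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def random_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for one spindle: 1 / (seek + rotational latency)."""
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


if __name__ == "__main__":
    # Seek times are assumed typical values, not measurements from the article.
    drives = {"7200 RPM SATA": (7200, 8.5), "15k RPM SAS": (15000, 3.5)}
    for name, (rpm, seek_ms) in drives.items():
        print(f"{name}: ~{random_iops(rpm, seek_ms):.0f} IOPS per drive")
```

With these assumed seek times, the 15k RPM drive comes out at a little more than twice the per-spindle IOPS of the 7200 RPM drive, which is consistent with our estimate.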

SSD Drives

In the ZFS project, we used SLC-style SSD drives for the ZIL and MLC-style SSD drives for the L2ARC. If the price of MLC-style SSD drives continues to fall, we will eventually omit the L2ARC and simply use MLC-style SSD drives for all of the primary storage. When that day comes, we will also need multiple SAS controllers and a much faster CPU in each ZFS box to keep up with all of the IO that it will be able to deliver. Our only concern would be the wear leveling on the MLC drives and their ability to sustain writes over an extended period of time. Only time will tell whether the drives can handle sustained writes in an L2ARC role or a primary storage role.

If you decide to use MLC SSD drives for actual storage instead of using SATA or SAS hard drives, then you don’t need to use cache drives. Since all of the storage drives would already be ultra fast SSD drives, there would be no performance gained from also running cache drives. You would still need to run SLC SSD drives for ZIL drives, though, as that would reduce wear on the MLC SSD drives that were being used for data storage.

If you plan to attach a lot of SSD drives, remember to use multiple SAS controllers. The SAS controller in the motherboard for our ZFS Build project is based on the LSI 1068e chipset. We could not find specific numbers for our integrated SAS controller, but a standalone card based on the LSI 1068, the LSI SAS3080X-R, is able to sustain 140,000 IOPS. If you use enough SSD drives, you could actually saturate the SAS controller. As a general rule of thumb, you may want one additional SAS controller for every 24 MLC-style SSD drives. We have not tested with 24 MLC-style SSDs, so that number could be higher or lower, but based on our initial performance numbers and the perceived performance of our SAS controller, we believe that 24 is a good starting point.
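The rule of thumb above is simple division, sketched here in Python. The 140,000 IOPS figure is the one quoted for the LSI SAS3080X-R; the per-drive figure of 5,800 IOPS is a hypothetical value chosen only to show how a number like 24 drives per controller can fall out of the arithmetic, and the function names are ours.

```python
import math


def drives_per_controller(controller_iops, per_drive_iops):
    """Number of drives one controller can serve before it becomes the bottleneck."""
    return controller_iops // per_drive_iops


def controllers_needed(drive_count, controller_iops, per_drive_iops):
    """Minimum number of controllers so that none is oversubscribed."""
    per_controller = max(1, drives_per_controller(controller_iops, per_drive_iops))
    return math.ceil(drive_count / per_controller)


if __name__ == "__main__":
    CONTROLLER_IOPS = 140_000  # LSI SAS3080X-R figure quoted above
    PER_DRIVE_IOPS = 5_800     # hypothetical sustained IOPS for one MLC SSD

    print("Drives per controller:",
          drives_per_controller(CONTROLLER_IOPS, PER_DRIVE_IOPS))
    for count in (12, 24, 48):
        needed = controllers_needed(count, CONTROLLER_IOPS, PER_DRIVE_IOPS)
        print(f"{count} SSDs: {needed} controller(s)")
```

Substituting a measured per-drive figure for your own SSDs and workload will move the break-even point up or down accordingly.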

102 Comments

Mattbreitbach - Tuesday, October 5, 2010 - link
Indeed you can, which is one of the most exciting parts about using software based storage appliances. Nexenta really excels in this area, offering iSCSI, NFS, SMB, and WebDAV with simple mouse clicks.
MGSsancho - Tuesday, October 5, 2010 - link
or a single command!
FransUrbo - Wednesday, January 11, 2012 - link
Would be really nice to see how ZoL compares. It's in no way optimized yet (current work is on getting the core functionality stable - which IMHO it is) so it would have no chance against OpenSolaris or Nexenta, but hopefully it's comparable to the Promise rack.

http://zfsonlinux.org/
gfg - Tuesday, October 5, 2010 - link
NAS is extremely cost effective in a data center if a large majority of NFS/CIFS users are more interested in capacity than performance. NDMP can be very efficient for backups, and the snapshot and multi-protocol aspects of NAS systems are fairly easy to manage. Some of the larger vendor NAS systems can support 100+ TB per NAS fairly effectively.
bhigh - Wednesday, October 6, 2010 - link
Actually, OpenSolaris and Nexenta can act as a SAN device using COMSTAR. You can attach to them with iSCSI, FC, InfiniBand, etc. and use any zvols as raw SCSI targets.
JGabriel - Wednesday, October 6, 2010 - link
Also, "Testing and Benchmarking"?

Doesn't that mean the same thing and isn't it redundant? See what I did there?

.
Fritzr - Thursday, October 7, 2010 - link
This is similar to the NAS<>SAN argument. They are used in a similar manner, but have very different purposes.

Testing. You are checking to see if the item's performance meets your needs & looking for bugs or other problems, including documentation and support.

Benchmarking. You are running a series of test sets to measure the performance. Bugs & poor documentation/support may abort some of the measuring tools, but that simply goes into the report of what the benchmarks measured.

Or in short:
Test==does it work?
Benchmark==What does it score on standard performance measures?
lwatcdr - Friday, October 8, 2010 - link
I am no networking expert so please bear with me.
What are the benefits of a SAN over local drives and/or a NAS?
I would expect a NAS to have better performance since it would send less data over the wire than a SAN if they both had the same physical connection.
A local drive/array I would expect to be faster than a SAN since it will not need to go through a network.
Does it all come down to management? I can see the benefit of having your servers boot over the network and having all your drives in one system. If you set up the servers to boot over the network it would be really easy to replace a server.
Am I missing something or are the gains all a matter of management?
JohanAnandtech - Sunday, October 10, 2010 - link
Most of the time, a NAS performs worse than a similar SAN, since there is a file system layer on the storage side. A SAN only manages blocks and thus has fewer layers, which makes it more efficient.

A local drive array is faster, but is less scalable and depending on the setup, it is harder to give a large read/write cache: you are limited by the amount of RAM your cache controller supports. In a software SAN you can use block based caches in the RAM of your storage server.

Management advantages over Local drives are huge: for example you can plug a small ESXi/Linux flash drive which only contains the hypervisor/OS, and then boot everything else from a SAN. That means that chances are good that you never have to touch your server during its lifetime and handle all storage and VM needs centrally. Add to that high availability, flexibility to move VMs from one server to another and so on.
lwatcdr - Monday, October 11, 2010 - link
But that layer must be executed somewhere. I thought that the decrease in data sent over the physical wire would make up for the extra software cost on the server side.
Besides you would still want a NAS even with a SAN for shared data. I am guessing that you could have a NAS served data from the SAN if you needed shared directories.
I also assume that since most SANs are on a separate storage network, the SAN is mainly used to provide storage to servers, and then the servers provide data to clients on the LAN.
The rest of it seems very logical to me in a large setup. I am guessing that if you have a really high performance data base server that one might use a DAS instead of SAN or dedicate a SAN server just to the database server.
Thanks I am just trying to educate myself on SANs vs NAS vs DAS.
Since I work at a small software development firm, our server setup is much simpler than the average data center, so I don't get to deal with this level of hardware often.
However I am thinking that maybe we should build a SAN and storage network just for our rack.
