ZFS - Building, Testing, and Benchmarking

by Matt Breitbach on October 5, 2010 4:33 PM EST

Other Cool ZFS Features

There are several features that we have not touched on in this article but that are worth mentioning, simply because they are enterprise features available with both OpenSolaris and Nexenta. The Promise M610i cannot compete with any of them.

Block-Level Deduplication - ZFS can employ block-level deduplication, which is to say it can detect identical blocks and keep only one copy of the data. This can significantly reduce storage costs, and possibly improve performance when circumstances allow. One group that recently deployed a Nexenta instance had originally configured the system for 2TB of storage. They were using 1.4TB at the time and wanted room to grow. By enabling deduplication they were able to shrink the space actually used on the drives to just under 800GB. Deduplication also has implications for random access. If multiple copies of the same data are spread all over a hard drive, the drive has to seek to find each copy. If the data is stored in only one place, you can potentially reduce the number of seeks your drives have to perform to retrieve it.
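To make the idea concrete, here is a minimal sketch of block-level deduplication in Python. It is not how ZFS implements the feature internally; the 4-byte block size, the use of SHA-256 as the block fingerprint, and the names `dedup_store` and `rebuild` are all assumptions chosen to keep the example small.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny, illustrative block size (an assumption)


def dedup_store(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}      # fingerprint -> block contents, stored only once
    block_map = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # only the first copy is kept
        block_map.append(fingerprint)
    return store, block_map


def rebuild(store, block_map) -> bytes:
    """Reassemble the original data from the block map."""
    return b"".join(store[fp] for fp in block_map)


data = b"AAAABBBBAAAACCCCAAAABBBB"  # 6 logical blocks, 3 of them unique
store, block_map = dedup_store(data)
ratio = len(block_map) / len(store)
print(f"{len(block_map)} logical blocks, {len(store)} stored, ratio {ratio:.2f}x")
```

By the same arithmetic, going from 1.4TB of logical data to just under 800GB on disk, as in the deployment described above, corresponds to a deduplication ratio of roughly 1.75x.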

Compression - ZFS also offers native compression, similar to gzip. This allows you to save space at the expense of CPU and memory usage. For a system that is simply used for archiving data, this could be a great money and space saver. For a system that is actively used as a database server, compression may not be the best idea.
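The space-versus-CPU trade can be illustrated with a short sketch. It uses Python's standard `zlib` module (the DEFLATE algorithm that gzip is built on) as a stand-in and does not exercise ZFS itself; the sample payloads and the `compress_report` helper are assumptions made for illustration.

```python
import os
import time
import zlib


def compress_report(payload: bytes, level: int = 6):
    """Return (compressed size in bytes, seconds of CPU-bound work spent)."""
    start = time.perf_counter()
    packed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == payload  # compression is lossless
    return len(packed), elapsed


# Archive-style data: highly repetitive, so it compresses very well.
repetitive = b"2010-10-05 INFO backup job completed successfully\n" * 2000
# Random bytes stand in for data that is already compressed or encrypted.
incompressible = os.urandom(len(repetitive))

for name, payload in (("repetitive", repetitive), ("incompressible", incompressible)):
    size, seconds = compress_report(payload)
    print(f"{name}: {len(payload)} -> {size} bytes in {seconds * 1000:.2f} ms")
```

The repetitive payload shrinks dramatically, while the random payload costs CPU time and saves nothing, which is the reason compression suits archival data better than a busy database.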

Snapshot Shipping - OpenSolaris and Nexenta also offer snapshot shipping. This allows you to snapshot the entire storage array and back it up via SSH to a remote server. Once you ship the initial snapshot, only incremental data changes are shipped, so you can conserve bandwidth while still replicating your data to a remote location. Keep in mind that this is not block-level replication but a point-in-time snapshot, so any data written after the snapshot is taken is not shipped to the remote system with it.
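As a rough sketch of what snapshot shipping looks like at the command level, the Python below only builds and prints the `zfs send` / `zfs receive` pipelines; it does not run them. The dataset `tank/data`, the host `backup-host`, the target `backup/data`, and the snapshot names are hypothetical, and the sketch covers a single dataset rather than an entire array.

```python
import shlex


def ship_cmd(dataset, snap, remote_host, remote_dataset, previous=None):
    """Build a shell pipeline that ships one snapshot over SSH.

    With previous=None the full snapshot is sent; otherwise only the
    changes between the previous snapshot and this one are sent (-i).
    """
    send = ["zfs", "send"]
    if previous is not None:
        send += ["-i", f"{dataset}@{previous}"]
    send.append(f"{dataset}@{snap}")
    receive = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    # Quote each argument so unusual names cannot break the shell command.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in part) for part in (send, receive)
    )


# Hypothetical names, for illustration only.
full = ship_cmd("tank/data", "monday", "backup-host", "backup/data")
incremental = ship_cmd(
    "tank/data", "tuesday", "backup-host", "backup/data", previous="monday"
)
print(full)
print(incremental)
```

Each snapshot would first be created with `zfs snapshot tank/data@monday` and so on. Anything written after `tuesday` is taken stays on the source until a later snapshot is created and shipped.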

102 Comments

MGSsancho - Tuesday, October 5, 2010 - link
I haven't tried this myself yet, but how about using 8KB blocks and jumbo frames on your network? Possibly lower throughput, from padding to fill the 9KB frame, in exchange for lower latency? I have no idea, as this is just a theory. Dudes in the #opensolaris IRC channel have always recommended 128K or 64K depending on the data.
solori - Wednesday, October 20, 2010 - link
One easy way to check this would be to export the pool from OpenSolaris and directly import it to NexentaStor and re-test. I think you'll find that the differences - as your benchmarks describe - are more linked to write caching at the disk level than partition alignment.

NexentaStor is focused on data integrity, and tunes for that very conservatively. Since SATA disks are used in your system, NexentaStor will typically disable disk write cache (write hit) and OpenSolaris may typically disable device cache flush operations (write benefit). These two feature differences can provide the benchmark differences you're seeing.

Also, some "workstation" tuning includes the disabling of ZIL (performance benefit). This is possible - but not recommended - in NexentaStor but has the side effect of risking application data integrity. Disabling the ZIL (in the absence of SLOG) will result in synchronous writes being committed only with transaction group commits - similar performance to having a very fast SLOG (lots of ARC space helpful too).
fmatthew5876 - Tuesday, October 5, 2010 - link
I'd be very interested to see how FreeBSD ZFS benchmark results would compare to Nexenta and Open Solaris.
mbreitba - Tuesday, October 5, 2010 - link
We have benchmarked FreeNAS's implementation of ZFS on the same hardware, and the performance was abysmal. We've considered looking into the latest releases of FreeBSD but have not completed any of that testing yet.
jms703 - Tuesday, October 5, 2010 - link
Have you benchmarked FreeBSD 8.1? There were a huge number of performance fixes in 8.1.

Also, when was this article written? OpenSolaris was killed by Sun on August 13th, 2010.
mbreitba - Tuesday, October 5, 2010 - link
There was a lot of work on this article just prior to the official announcement. The development of the Illumos foundation and subsequent OpenIndiana has been so rapidly paced that we wanted to get this article out the door before diving into OpenIndiana and any other OpenSolaris derivatives. At a later date, we will probably add more content about the demise of OpenSolaris and the open source alternatives that have started popping up.
MGSsancho - Tuesday, October 5, 2010 - link
Not to mention that projects like illumos are currently not recommended for production, and are currently only meant as a base for other distros (OpenIndiana). Then there is Solaris 11, due soon. I'll try out the express version when it's released.
cdillon - Tuesday, October 5, 2010 - link
FreeNAS 0.7.x is still using FreeBSD 7.x, and the ZFS code is a bit dated. FreeBSD 8.x has newer ZFS code (v15). Hopefully very soon FreeBSD 9.x will have the latest ZFS code (v24).
piroroadkill - Tuesday, October 5, 2010 - link
This is relevant to my interests, and I've been toying with the idea of setting up a ZFS based server for a while.

It's nice to see the features it can use when you have the hardware for it.
cgaspar - Tuesday, October 5, 2010 - link
You say that all writes go to a log in ZFS. That's just not true. Only synchronous writes below a certain size go into the log (either built into the pool, or a dedicated log device). All writes are held in memory in a transaction group, and that transaction group is written to the main pool at least every 10 seconds by default (in OpenSolaris - it used to be 30 seconds, and still is in Solaris 10 U9). That's tunable, and commits will happen more frequently if required, based on available ARC and data churn rate. Note that _all_ writes go into the transaction group - the log is only ever used if the box crashes after a synchronous write and before the txg commits.
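As a rough illustration of the write path described here, the following Python toy model keeps every write in an in-memory transaction group, additionally records synchronous writes in an intent log, and replays that log only after a crash. It is a sketch under stated assumptions, not ZFS code: the `ToyPool` class and its methods are hypothetical, and the size threshold for logged writes and the timed commit interval are deliberately left out.

```python
class ToyPool:
    """Toy model of the txg/intent-log write path (not real ZFS code)."""

    def __init__(self):
        self.txg = []   # open transaction group, held in memory
        self.log = []   # intent log on stable storage (pool or dedicated SLOG)
        self.disk = []  # data committed to the main pool

    def write(self, data, sync=False):
        self.txg.append(data)      # every write joins the transaction group
        if sync:
            self.log.append(data)  # sync writes are also logged before returning

    def commit_txg(self):
        """Periodic commit: flush the transaction group to the main pool."""
        self.disk.extend(self.txg)
        self.txg.clear()
        self.log.clear()           # logged records are no longer needed

    def crash_and_recover(self):
        """Memory is lost; the log is read back only in this case."""
        self.txg.clear()
        self.disk.extend(self.log)
        self.log.clear()


pool = ToyPool()
pool.write(b"async-1")
pool.write(b"sync-1", sync=True)
pool.crash_and_recover()  # crash before the transaction group commits
print(pool.disk)          # only the synchronous write survives
```

In the normal case `commit_txg` runs first and the log is never read, which matches the point that the log matters only when the box crashes between a synchronous write and the next commit.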

Now for the caution - you have chosen SSDs for your SLOG that don't have a backup power source for their onboard caches. If you suffer power loss, you may lose data. Several SLC SSDs have recently been released that have a supercapacitor or other power source sufficient to write cached data to flash on power loss, but the current Intel lineup doesn't have it. I believe the next generation Intel SSDs will.
