ZFS - Building, Testing, and Benchmarking
by Matt Breitbach on October 5, 2010 4:33 PM EST - Posted in
- IT Computing
- Linux
- NAS
- Nexenta
- ZFS
If you are in the IT field, you have no doubt heard a lot of great things about ZFS, the file system originally introduced by Sun in 2004. ZFS has features that make it an exciting part of a solid SAN solution. For example, ZFS can use SSDs to cache blocks of data; that feature is called the L2ARC. A ZFS file system is built on top of a storage pool made up of multiple devices, and it can be shared through iSCSI, NFS, and CIFS/Samba.
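As a rough sketch of what that looks like in practice (pool name and device paths here are purely illustrative, not from our build), creating a mirrored pool with an SSD as L2ARC and sharing it might look like:

```shell
# Create a mirrored storage pool from two disks (device names are hypothetical)
zpool create tank mirror c0t0d0 c0t1d0

# Add an SSD as an L2ARC read-cache device
zpool add tank cache c0t2d0

# Create a file system in the pool and share it over NFS and CIFS/SMB
zfs create tank/data
zfs set sharenfs=on tank/data
zfs set sharesmb=on tank/data

# Create a block volume (zvol) that can be exported as an iSCSI LUN
zfs create -V 100G tank/iscsivol
```

Note that the same pool can back file shares and iSCSI block volumes simultaneously, which is part of what makes ZFS attractive for SAN duty.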
We need a lot of reliable storage to host low cost websites at No Support Linux Hosting. In the past, we have used Promise iSCSI solutions for SAN-based storage. The Promise SAN solutions are reliable, but they tend to run out of disk I/O long before they run out of disk space. As a result, we have been intentionally under-utilizing our current SAN boxes. We decided to investigate other storage options this year in an effort to improve the performance of our storage without letting costs get completely out of hand.
We decided to spend some time really getting to know OpenSolaris and ZFS. Our theory was that we could build a custom ZFS-based server for roughly the same price as the Promise M610i SAN, and that the ZFS-based SAN could outperform the M610i at that price point. If our theory proved right, we would use the ZFS boxes in future deployments. We also tested the most popular OpenSolaris-based storage solution, Nexenta, on the same hardware. We decided to blog about our findings and progress at ZFSBuild.com, so others could benefit from anything we learned throughout the project.
102 Comments
L. - Wednesday, March 16, 2011 - link
1) Too bad you already have the 15k drives.
2) I wanted to say this earlier, but I'm quite confident that SLC is NOT required for a SLOG device, as with current wear leveling, unless you actually write more than <MLC disk capacity> / day there is no way you'll ever need the SLC's extended durability.
3) Again, MLC SSD's, good stuff
4) Yes again
5) not too shabby
6) Why use 15k or 7k2 rpm drives in the first place?
All in all nice project, just too bad you have to start from used equipment.
In my view, you can easily trash both your similar system and Anandtech's test system and simply go for what the future is going to be anyway :
Raid-10 MLC drives, 48+RAM, 4 CPU's (yes those MLC's are going to perform so much faster you will need this - quite a fair chance you'll need AMD stuff on that as 4-socket is their place) and mainly and this is the hardest part, sata 6 Gb/s * many with a controller that can actually handle the bandwidth.
Overall you'd get a much simpler, faster and cleaner solution (might need to upgrade your networking though to match with the rest).
L. - Wednesday, March 16, 2011 - link
Of course, 6 months later... it's not the same equation ;) Sorry for the necro.
B3an - Tuesday, October 5, 2010 - link
I like seeing stuff like this on Anand. It's a shame it doesn't draw as much interest as even the poor Apple articles.
Tros - Tuesday, October 5, 2010 - link
Actually, I was just hoping to see a ZFS vs HFS+ comparison for the higher-end Macs. But with the given players (Oracle, Apple), I don't know if the drivers will ever be officially released.
Taft12 - Wednesday, October 6, 2010 - link
Doesn't it? This interests me greatly, and judging by the number of comments it is as popular as any article about the latest video or desktop CPU tech.
greenguy - Wednesday, October 6, 2010 - link
I have to say, kudos to you Anand for featuring an article about ZFS! It is truly the killer app for filesystems right now, and nothing else is going to come close to it for quite some time. What use is performance if you can't automatically verify that your data (and the system files that tell your system how to manipulate that data) is what it was the last time you checked? You picked up on the benefits of the SSD (low latency) before anyone else, so it is no wonder you figured out the benefits of ZFS earlier than most of your compatriots as well. Well done.
elopescardozo - Tuesday, October 5, 2010 - link
Hi Matt,
Thank you for the extensive report. There are a few unexpected numbers in your testing results. I find the difference between Nexenta and OpenSolaris hard to understand, unless it is due to misalignment of the IO in the case of Nexenta.
A zvol (the basis for an iSCSI volume) is created on top of the ZFS pool with a certain block size. I believe the default is 8kB. Next you initialize the volume and format it with NTFS. By default the NTFS structure starts at sector 63 (sixty three, not a typo!), which means that every other 4kB cluster (the NTFS allocation size) falls over a zvol block boundary. That has a serious impact on performance. I saw a report of a 70% improvement after proper alignment.
Is it possible that the Open Solaris and Nexenta pools were different in this respect, either because of different zvol block size (e.g. 8kB for Nexenta, 128kB for Open Solaris – larger blocks means less “boundary cases”) or differences in how the volumes were initialized and formatted?
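The arithmetic behind that misalignment is easy to check (a quick sketch assuming 512-byte sectors; the offsets are computed, not taken from the article's test setup):

```shell
# Legacy NTFS default: partition data starts at sector 63
offset=$((63 * 512))        # 32256 bytes from the start of the zvol
echo $(( offset % 4096 ))   # 3584 -> 4kB clusters do not start on 4kB boundaries
echo $(( offset % 8192 ))   # 7680 -> likewise misaligned to 8kB zvol blocks

# A 1 MiB partition offset (sector 2048) aligns to both sizes
offset=$((2048 * 512))      # 1048576 bytes
echo $(( offset % 4096 ))   # 0
echo $(( offset % 8192 ))   # 0
```

With the sector-63 offset, no cluster boundary ever coincides with a zvol block boundary, so small writes can touch two zvol blocks instead of one, which is consistent with the large gains reported from realignment.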
mbreitba - Tuesday, October 5, 2010 - link
It's possible that sector alignment could be a problem, but I believe that in the build we tested, the default block size was set to 128kB, which was identical to OpenSolaris. If that has changed, then we should re-test with the newest build to see if that makes any difference.
cdillon - Tuesday, October 5, 2010 - link
Windows Server 2008 aligns all newly created partitions at 1MB, so his NTFS block access should have been properly aligned by default.
Mattbreitbach - Tuesday, October 5, 2010 - link
I was unaware that Windows 2008 correctly aligned NTFS partitions now. Thanks for the info!