After the exhaustive building and testing process, we've found several areas where we could have improved the original build.

Improved CPU

When we initially chose the hardware components, we assumed the CPU load would be light because we are not doing any type of parity with our storage. What we neglected to account for was the checksumming that ZFS does to maintain data integrity, which consumes significantly more processor time than we had anticipated: many tests used 70% or more of the CPU, and we believe that at such high utilization there is significant I/O contention. Our next ZFS-based storage system will probably be built on a dual-socket platform with higher-clocked (and possibly higher-core-count) CPUs, giving additional headroom for checksumming and for CPU-hungry features such as deduplication and compression. CPU contention is not a noticeable problem at gigabit Ethernet speeds, but in additional benchmarking over 20Gbps InfiniBand we were able to max out the CPU in the ZFS server well before we approached the limits of the 20Gbps network.
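
For reference, the features in question are ordinary pool/dataset properties. A minimal sketch (the pool name "tank" is hypothetical) of checking the checksum setting, enabling the CPU-hungry options, and watching pool activity during a benchmark looks like this:

# checksumming is on by default; fletcher4 is the usual algorithm on recent builds
zfs get checksum tank

# deduplication and compression add further CPU load - enable only with headroom to spare
zfs set compression=on tank
zfs set dedup=on tank

# watch per-vdev activity while a benchmark runs
zpool iostat -v tank 5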

More Memory

Going into this project, we did not really know how much main memory the ZFS SAN would need, or how much the system would benefit from more of it. After running tests on smaller datasets that fit entirely into main memory, we decided that our next build would use 48GB of RAM or more. As a general rule, ZFS benefits from as much RAM as you can afford to give it: the ARC (main memory cache) on both Nexenta and OpenSolaris performs extremely well when the dataset fits entirely in cache, and the gains from a large ARC are huge, although at some point you will run into diminishing returns. If your dataset fits into main memory and your workload is mostly reads, more memory for the ARC will significantly improve performance; we saw random-read numbers in the hundreds of thousands of IOPS when working entirely out of main memory. On the flip side, if your workload is mostly writes, adding 48GB of RAM or more may not give you any noticeable performance advantage.
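
For reference, a quick way to see how large the ARC has grown on an OpenSolaris or Nexenta box is to query the kernel statistics; the cap shown in the last line is only an example value:

# current ARC size and the maximum it may grow to, in bytes
kstat -p zfs:0:arcstats:size
kstat -p zfs:0:arcstats:c_max

# to cap the ARC, add a line like this to /etc/system and reboot (0xC00000000 = 48GB)
set zfs:zfs_arc_max = 0xC00000000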

SAS drives

We thought ZFS's advanced software could overcome some of the inherent problems with slow spindle speeds, and it did, up to a point. ZFS on OpenSolaris was able to outperform the Promise M610i at essentially the same price point, but we feel we still left a lot of performance on the table. Next time we deploy a ZFS server, we plan to use 15k RPM SAS drives instead of 7200 RPM SATA drives for primary storage; we suspect 15k RPM SAS drives could have easily doubled the performance of our ZFS box in certain tests. The downside of SAS drives is increased cost and decreased capacity, but those tradeoffs are worthwhile for us if we can double the IOPS, especially on write operations, where all transactions have to be committed to disk as quickly as possible. Reads may not be affected as much, since many reads are already served from SSD storage, and having SAS drives feed the SSDs would probably not increase overall performance unless your working set is large enough to exceed the total capacity of the SSDs.
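
As a rough back-of-envelope illustration (typical drive specifications, not measured values): a 7200 RPM SATA drive averages about 8.5 ms of seek time plus 4.2 ms of rotational latency (half a revolution at 7200 RPM), or roughly 75-80 random IOPS per drive, while a 15k RPM SAS drive averages about 3.5 ms of seek plus 2.0 ms of rotational latency, or roughly 175-185 random IOPS per drive - a little better than double.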

SSD Drives

In the ZFS project, we used SLC SSDs for the ZIL and MLC SSDs for the L2ARC. If the price of MLC SSDs continues to fall, we will eventually omit the L2ARC and simply use MLC SSDs for all of the primary storage. When that day comes, we will also need multiple SAS controllers and a much faster CPU in each ZFS box to keep up with all of the I/O they can deliver. Our only concern is wear leveling on the MLC drives and their ability to sustain writes over an extended period; only time will tell whether the drives can handle sustained writes in an L2ARC or primary storage role.
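
For context, attaching these devices to an existing pool is a one-line operation each; the pool and device names below are hypothetical:

# add an SLC SSD as a dedicated log (ZIL) device
zpool add tank log c2t0d0

# add an MLC SSD as an L2ARC cache device
zpool add tank cache c2t1d0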

If you decide to use MLC SSDs for primary storage instead of SATA or SAS hard drives, you don't need cache drives: since all of the storage is already ultra-fast SSD, there is no performance to be gained from also running an L2ARC. You would still want SLC SSDs for the ZIL, though, as that reduces write wear on the MLC SSDs used for data storage.

If you plan to attach a lot of SSDs, remember to use multiple SAS controllers. The SAS controller on the motherboard in our ZFS Build project is based on the LSI 1068e chipset. We could not find specific numbers for the integrated controller, but another LSI 1068-based standalone card, the LSI SAS3080X-R, is able to sustain 140,000 IOPS, so with enough SSDs you could actually saturate the SAS controller. As a general rule of thumb, you may want one additional SAS controller for every 24 MLC-style SSDs. We have not tested with 24 MLC SSDs, so that number could be higher or lower, but based on our initial performance numbers and the perceived performance of our SAS controller, we believe 24 is a good starting point.
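
As a quick sanity check on that rule of thumb: 140,000 IOPS divided by 24 drives works out to roughly 5,800 IOPS per drive, which is roughly in line with what MLC SSDs of this generation sustain for random writes, so around 24 drives is where the controller, rather than the drives, becomes the bottleneck.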

Comments

  • Penti - Wednesday, October 6, 2010 - link

    And a viable alternative still isn't available. How are Nexenta and the community supposed to get driver support and support for new hardware now that Oracle has closed the development kernel (SXDE is closed source)? Maybe, just maybe, they can use the retail Solaris 11 kernel if it's released in a form that can be combined with the existing software and distro. They aren't going to develop it themselves, and the vendors have no reason to give the code/drivers to anybody but Oracle. Continuing the OpenSolaris kernel means creating a new operating system. It means you won't get the latest ZFS updates and tools any more, at least not until they land in the normal S11 release, and it means you can't expect the latest driver updates and so on either. You can continue to use it on today's hardware, but tomorrow it might be useless; you might not find working configurations.

    It's not clear that Nexenta can actually develop their own operating system rather than just a distro; it means they eventually have to create their own OS with their own kernel, their own drivers and so on. And it's not clear how much code Oracle will let slip out, only that they will keep it under wraps until official releases. It is clear, however, that there won't be any distro for them to base it on, and any and all forks would be totally dependent on what Nexenta (Illumos) manage to do. It will quickly get outdated without updates flowing in all the time, and those came from Sun.
  • andersenep - Wednesday, October 6, 2010 - link

    OpenIndiana/Illumos runs the same latest and greatest pool/zfs versions as the most recent Solaris 10 update.

    Work continues on porting newer pool/ZFS versions to FreeBSD which has plenty of driver support (better than OpenSolaris ever did).

    A stated goal of the Illumos project is to maintain 100% binary compatibility with Solaris. If Oracle decides to break that compatibility, intentionally or not, it will truly become a fork. Development will still continue.

    Even if no further development is made on ZFS, it's still an absolutely phenomenal filesystem. How many years now has Apple been using HFS+? FAT is still around in everything. If all development on ZFS stopped today, it would still remain an absolutely viable filesystem for many years to come. There is nothing else currently out there that even comes close to its feature set.

    I don't see how ZFS being under Oracle's control makes it any worse than any other open source filesystem. The source is still out there, and people are free to do what they want with it within the CDDL terms.

    This idea that just because the OpenSolaris DISTRO has been discontinued, that everything that went into it is no longer viable is silly. It is like calling Linux dead because Mandriva is dead.
  • Guspaz - Wednesday, October 6, 2010 - link

    Thanks for mentioning OpenIndiana. I've been eagerly awaiting IllumOS to be built into an actual distribution to give me an upgrade path for my home OpenSolaris file server, and I look forward to upgrading to the first stable build of OpenIndiana.

    I'm currently running a dev build of OpenSolaris since the realtek network driver was broken in the latest stable build of OpenSolaris (for my chipset, at least).
  • Mattbreitbach - Wednesday, October 6, 2010 - link

    I believe all of the current Hypervisors support this. Hyper-V does, as does XenServer. I have not done extensive testing with ESXi, but I would imagine that it supports it also.
  • joeribl - Wednesday, October 6, 2010 - link

    "Nexenta is to OpenSolaris what OpenFiler or FreeNAS is to Linux."

    FreeNAS has always been FreeBSD based, not Linux. It does however provide ZFS support.
  • Mattbreitbach - Wednesday, October 6, 2010 - link

    I should have caught that - thanks for the info. I've edited the article to reflect as such.
  • vermaden - Wednesday, October 6, 2010 - link

    ... with deduplication and other features. You can grab an ISO build or a VirtualBox appliance here: http://blog.vx.sk/archives/9-Pomozte-testovat-ZFS-...

    It would be great to see how FreeBSD performs (8.1 and 9-CURRENT) on that hardware. I can help you configure FreeBSD for these tests if you would like; for example, by default FreeBSD does not enable AHCI mode for SATA drives, which increases random performance a lot.
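
    For example, on FreeBSD 8.x enabling the newer AHCI driver is just a loader.conf entry and a reboot (a minimal sketch, assuming a GENERIC kernel without ahci compiled in):

    # add to /boot/loader.conf, then reboot; SATA disks will reattach as ada(4) devices
    ahci_load="YES"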

    Anyway, great article about ZFS performance on nice piece of hardware.
  • Mattbreitbach - Wednesday, October 6, 2010 - link

    In Hyper-V it is called a differencing disk - you have a parent disk that you build and do not modify. You then create a "differencing disk" that uses the parent disk as its source and writes any changes out to the differencing disk. This way you can maintain all core OS files in one image and write any changes out to child disks. This allows the storage system to cache the core OS components once, and any access to those components comes directly from the cache.

    I believe that Xen calls it a differencing disk also, but I do not currently have a Xen Hypervisor running anywhere that I can check quickly.
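
    For what it's worth, one way to create a differencing VHD outside the Hyper-V Manager GUI on Server 2008 R2 is diskpart; the paths here are just placeholders:

    diskpart
    DISKPART> create vdisk file="D:\VMs\child.vhd" parent="D:\VMs\parent.vhd"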
  • gea - Wednesday, October 6, 2010 - link

    new: Version 0.323
    napp-it ZFS appliance with web UI and online installer for NexentaCore and OpenIndiana

    Napp-it, a project to build a free "ready to run" ZFS web and NAS appliance with web UI and online installer, now supports NexentaCore and OpenIndiana (the free successor to OpenSolaris) as of version 0.323. With its online installer, you will have your ZFS server running with all services and tools within minutes.

    Features
    NAS file server with AFP (incl. Time Machine and zero config), SMB with ACLs, AD support and users/groups
    SAN server with iSCSI (COMSTAR) and NFS for Xen or VMware ESXi
    Web server, FTP
    Database server
    Backup server
    newest ZFS features (highest security with parity and copy-on-write, deduplication, RAID-Z3, unlimited snapshots via Windows Previous Versions, working ACLs, online pool test with data refresh, hybrid pools, expandable data pools: simply add controllers or disks, ...)

    included Tools:
    bonnie (pool performance test)
    iperf (network performance test)
    midnight commander
    ndmpcopy (backup)
    rsync
    smartmontools
    socat
    unzip

    Management:
    remote via Web-UI and Browser

    Howto with NexentaCore:
    1. insert NexentaCore CD and install
    2. login as root and enter:

    wget -O - www.napp-it.org/nappit | perl

    During the first installation you have to enter a MySQL password and select Apache with the space key.

    Howto with OpenIndiana (free successor of OpenSolaris):
    1. Insert OpenIndiana CD and install
    2. login as admin, open a terminal and enter su to get root permissions and enter:

    wget -O - www.napp-it.org/nappit | perl

    AFP-Server is currently installed only on Nexenta.

    that's all, no step 3!
    You can now remotely manage this Mac/PC NAS appliance via browser.

    Details
    www.napp-it.org

    running Installation
    www.napp-it.org/pop_en.html
  • Mattbreitbach - Wednesday, October 6, 2010 - link

    Very neat - I am installing OpenIndiana on our hardware right now and will test out the Napp-it application.
