Benchmarks

After running our tests on the ZFS system (under both Nexenta and OpenSolaris) and the Promise M610i, we came up with the following results. All graphs have IOPS on the Y-axis and disk queue length on the X-axis.
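For readers who want to reproduce a similar IOPS-versus-queue-depth sweep without IOMeter, the sketch below shows the general shape of such a test in Python, using a 4k random-read workload as an example. It is a simplified stand-in rather than our actual test definition: the target path, block size, run time, and queue depths are assumptions, and it reads through the OS page cache instead of raw devices, so absolute numbers will not match the graphs here.

```python
# Minimal IOPS-vs-queue-depth sketch (illustrative only; not the IOMeter
# workload used in this article). Assumes TARGET is a large pre-existing
# test file on the volume you want to exercise.
import os
import random
import threading
import time

TARGET = "/tank/iotest/testfile"   # assumed path; substitute your own
BLOCK_SIZE = 4096                  # 4k accesses, the smallest size we tested
RUN_SECONDS = 30
QUEUE_DEPTHS = [1, 2, 4, 8, 16, 32, 64, 128, 256]

def worker(fd, size, stop, counter, lock):
    """Issue random 4k reads until told to stop, then record the count."""
    blocks = size // BLOCK_SIZE
    done = 0
    while not stop.is_set():
        offset = random.randrange(blocks) * BLOCK_SIZE
        os.pread(fd, BLOCK_SIZE, offset)
        done += 1
    with lock:
        counter[0] += done

fd = os.open(TARGET, os.O_RDONLY)
size = os.lseek(fd, 0, os.SEEK_END)

for depth in QUEUE_DEPTHS:
    stop, lock, counter = threading.Event(), threading.Lock(), [0]
    # Approximate "queue depth" with one synchronous reader per outstanding I/O.
    threads = [threading.Thread(target=worker, args=(fd, size, stop, counter, lock))
               for _ in range(depth)]
    for t in threads:
        t.start()
    time.sleep(RUN_SECONDS)
    stop.set()
    for t in threads:
        t.join()
    print(f"queue depth {depth:4d}: {counter[0] / RUN_SECONDS:10.0f} IOPS")

os.close(fd)
```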

4k Sequential Reads

 

In the 4k Sequential Read test, we see that the OpenSolaris and Nexenta systems both outperform the Promise M610i by a significant margin as the disk queue depth increases. This is a direct effect of the L2ARC cache. Interestingly enough, the OpenSolaris and Nexenta systems seem to trend identically, but the Nexenta system is measurably slower than the OpenSolaris system. We are unsure why, as they are running on the same hardware and the build of Nexenta we ran was based on the same build of OpenSolaris that we tested. We contacted Nexenta about this performance gap, but they did not have an explanation. One hypothesis is that the Nexenta software uses more memory for components like the web GUI, leaving less RAM available for the ARC than on a stock OpenSolaris install.
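One way to probe that hypothesis on a live system is to compare ARC and L2ARC statistics on the two installs. The snippet below is a rough sketch of how that comparison might be scripted by parsing Solaris kstat output; the counters it reads (size, c_max, hits, misses, l2_hits, l2_misses) are standard arcstats fields, but verify the exact names on the build you are running.

```python
# Rough ARC/L2ARC inspection sketch for an OpenSolaris/Nexenta box.
# Parses `kstat -p zfs:0:arcstats`; field names should be double-checked
# against your build before trusting the numbers.
import subprocess

def read_arcstats():
    """Return the zfs:0:arcstats kstats as a {statistic_name: int} dict."""
    out = subprocess.run(["kstat", "-p", "zfs:0:arcstats"],
                         capture_output=True, text=True, check=True).stdout
    stats = {}
    for line in out.splitlines():
        key, _, value = line.partition("\t")
        name = key.rsplit(":", 1)[-1]      # zfs:0:arcstats:size -> size
        try:
            stats[name] = int(value)
        except ValueError:
            pass                           # skip non-numeric fields such as crtime
    return stats

arc = read_arcstats()
gib = 1024 ** 3
print(f"ARC size      : {arc['size'] / gib:6.2f} GiB (max {arc['c_max'] / gib:.2f} GiB)")
print(f"ARC hit rate  : {arc['hits'] / (arc['hits'] + arc['misses']):6.1%}")
l2_total = arc.get('l2_hits', 0) + arc.get('l2_misses', 0)
if l2_total:
    print(f"L2ARC hit rate: {arc['l2_hits'] / l2_total:6.1%}")
```

If the Nexenta box consistently reports a smaller ARC ceiling than the OpenSolaris box on identical hardware, that would support the memory-pressure explanation.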

 

4k Random Write

 

In the 4k Random Write test, again the OpenSolaris and Nexenta systems come out ahead of the Promise M610i. The Promise box is nearly flat, an indicator that it reaches the limits of its hardware quite quickly. The OpenSolaris and Nexenta systems write faster as the disk queue increases. This seems to indicate a better re-ordering of data that makes the writes more sequential before they reach the disks.
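The benefit of that re-ordering is easy to see with a back-of-the-envelope model. The sketch below is purely illustrative (it is not how ZFS actually schedules I/O): it compares the total head travel needed to service a queue of random 4k writes in arrival order versus sorted (elevator) order, which is roughly the advantage a deeper queue hands to a system that can batch and sequence its writes.

```python
# Illustrative only: why reordering a deep queue of random writes reduces
# head movement. Not a model of ZFS transaction groups.
import random

DISK_BLOCKS = 250_000_000          # roughly 1 TB of 4k blocks (assumed)

def total_seek_distance(offsets):
    """Sum of absolute jumps between consecutive block addresses."""
    distance, position = 0, 0
    for offset in offsets:
        distance += abs(offset - position)
        position = offset
    return distance

random.seed(42)
for queue_depth in (1, 8, 32, 128, 256):
    pending = [random.randrange(DISK_BLOCKS) for _ in range(queue_depth)]
    arrival = total_seek_distance(pending)
    reordered = total_seek_distance(sorted(pending))   # elevator-style ordering
    print(f"queue depth {queue_depth:3d}: reordering cuts head travel to "
          f"{reordered / arrival:5.1%} of arrival order")
```

The deeper the queue, the more the sorted order approaches a single sweep across the disk, which is consistent with the ZFS systems pulling further ahead as queue depth grows.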

  

4k Random 67% Write 33% Read

 

The 4k Random 67% Write 33% Read test again gives the edge to the OpenSolaris and Nexenta systems, while the Promise M610i is nearly flatlined. This is most likely a result of both write re-ordering and the very effective L2ARC caching.

  

4k Random Reads

 

4k Random Reads again come out in favor of the OpenSolaris and Nexenta systems.  While the Promise M610i does increase its performance as the disk queue increases, it's nowhere near the levels of performance that the OpenSolaris and Nexenta systems can deliver with their L2ARC caching.

  

8k Random Read

 

8k Random Reads indicate a similar trend to the 4k Random Reads with the OpenSolaris and Nexenta systems outperforming the Promise M610i.  Again, we see the OpenSolaris and Nexenta systems trending very similarly but with the OpenSolaris system significantly outperforming the Nexenta system.

  

8k Sequential Read

 

8k Sequential Reads have the OpenSolaris and Nexenta systems trailing at the first data point, then running away from the Promise M610i at higher queue depths. It's interesting to note that the Nexenta system outperforms the OpenSolaris system at several of the data points in this test.

   

8k Random Write

 

8k Random Writes play out like most of the other tests we've seen, with the OpenSolaris and Nexenta systems taking top honors and the Promise M610i trailing. Again, OpenSolaris beats out Nexenta on the same hardware.

  

8k Random 67% Write 33% Read

 

8k Random 67% Write 33% Read again favors the OpenSolaris and Nexenta systems, with the Promise M610i trailing. While the OpenSolaris and Nexenta systems start off nearly identically for the first five data points, at a disk queue of 24 or higher the OpenSolaris system steals the show.

  

16k Random 67% Write 33% Read

 

16k Random 67% Write 33% Read gives us a familiar picture. OpenSolaris and Nexenta both soundly beat the Promise M610i at higher queue depths. Again we see the pattern of the OpenSolaris and Nexenta systems trending nearly identically, but with the OpenSolaris system outperforming the Nexenta system at all data points.

  

16k Random Write

 

16k Random Write shows the Promise M610i starting off faster than the Nexenta system and nearly on par with the OpenSolaris system, but quickly flattening out. The Nexenta box again trends upward, but cannot keep up with the OpenSolaris system.

  

16k Sequential Read

 

The 16k Sequential Read test is the first test in which the Promise M610i outperforms OpenSolaris and Nexenta at all data points. The OpenSolaris and Nexenta systems both trend upwards at the same rate, but cannot catch the M610i.

  

16k Random Read

 

The 16k Random Read test goes back to the pattern we've been seeing, with the OpenSolaris and Nexenta systems running away from the Promise M610i. Again the OpenSolaris system takes top honors, with the Nexenta system trending similarly but never reaching the same level of performance.

  

32k Random 67% Write 33% Read

 

32k Random 67% Write 33% Read has the OpenSolaris system on top, with the Promise M610i in second place and the Nexenta system trailing everything. We're not really sure what to make of this, as we expected the Nexenta system to follow a pattern similar to what we had seen before.

  

32k Random Read

 

32k Random Read has the OpenSolaris system running away from everything else. In this test the Nexenta system and the Promise M610i are very similar, with the Nexenta system edging out the Promise M610i at the highest queue depths.

  

32k Sequential Read

 

32k Sequential Reads proved to be a strong point for the Promise M610i. It outperformed the OpenSolaris and Nexenta systems at all data points. Clearly there is something in the Promise M610i that helps it excel at 32k sequential reads.

 

32k Random Write

 

  

32k Random Writes have the OpenSolaris system on top again, with the Promise M610i in second place and the Nexenta system trailing far behind. All of the graphs trend similarly, with little dips and rises, but never moving far from the initial reading.

After all the tests were done, we sat down, took a hard look at the results, and tried to formulate some ideas about how to interpret the data. We discuss this in our conclusion.

Comments

  • cdillon - Tuesday, October 5, 2010

    I've been working on getting the additional parts necessary to build a similar system out of a slightly used HP DL380 G5 with a bunch of 15K SAS drives and an MSA20 shelf full of 750GB SATA drives. Here's what I'm going to be doing a little differently from what you've done:

    1) More CPU (already there, it has dual Xeon X5355 if I recall correctly)

    2) Two mirrored OCZ Vertex2 EX 50GB drives for the SLOG device (the ZIL write cache). Even though the Vertex2 claims a highly impressive 50,000 random-write IOPS, the ZIL is written sequentially, and the Vertex2 EX claims to sustain 250MB/sec writes, so it should make a very good SLOG device.

    3) Two OCZ Vertex2 100G (the cheaper MLC models) for L2ARC.

    4) The SSDs will be put on a separate SAS HBA card from the HDDs to prevent I/O starvation due to the HBA I/O queue filling up because of the relatively slow I/O service-times of the HDDs.

    5) Quad Gigabit Ethernet or 10G Ethernet link. The latter will require an upgrade to our datacenter switches, which is probably going to happen soon anyway.
  • mbreitba - Tuesday, October 5, 2010

    I would love to see performance results for your setup. The IOMeter ICF file that we have linked to in the article would help you run the exact same tests as we ran if you would be interested in running them.
  • cdillon - Tuesday, October 5, 2010

    I forgot to mention it might also be running FreeBSD (which I'm very familiar with) rather than Nexenta or OpenSolaris, but I'm just kind of playing it by ear. I may try all three. The goal is for it to eventually become a production storage server, but I'm going to do a bit of experimentation first. I still haven't gotten around to ordering the SSDs and the extra SAS HBAs, so it'll be a while before I have any benchmarks for you.
  • Maveric007 - Tuesday, October 5, 2010

    You should throw Linux into the mix. You'll find your performance will increase over the other selections ;)
  • MGSsancho - Tuesday, October 5, 2010

    ZFS on Linux is terrible. Also, ZFS on FreeBSD is decent, but recent ZFS features such as deduplication and iSCSI are not available on FreeBSD. Just grab a copy of the latest build of OpenSolaris (134) and compile it from build 157, use Solaris 10 (got to pay now), or use one of the mentioned Nexenta distros.

    From personal experience, use fast SSD drives. I made the mistake of using a pair of the Intel 40GB Value drives for a home box with 8 x 1.5TB drives. Terrible performance. Yes, it is cool for latency, but I can't get more than 40MB/s from it. I have tried using them just for ZIL or just for L2ARC and performance is abysmal. Get the fastest possible drives you can afford.

    Matt, have you tested with, for example, Realtek NICs (don't, pain in the ass), Intel desktop NICs (stable), or the fancier server-grade NICs that advertise iSCSI offload? Also, have you tried using dedup/compression for increased performance/space savings? It will use up lots of memory for indexes, but if your CPUs and network are fast enough, less I/O hits the disks. I hear it has worked, assuming you have the memory, CPU, and network. One last bit: try the Sun 40Gb/s InfiniBand cards? I know they will work with Solaris 10 and OpenSolaris, and thus I would assume Nexenta. You might want to check the hardware compatibility list for your IB card.

    Cheers
  • Mattbreitbach - Tuesday, October 5, 2010

    We have not tested with any NICs other than the Intel GbE NICs onboard the blade. We considered using an iSCSI offload NIC for the ZFS system, but given the cost of such cards we could not justify using them.

    As for deduplication: we have recently tested deduplication on Nexenta and the results were abysmal. Most tests showed above 90% CPU utilization while delivering far lower IOPS. I believe that deduplication could help performance, but only if you have an insane amount of CPU available. With the checksumming and deduplication running, our 5504 was simply not able to keep up. By increasing the core count, adding a second processor, and increasing the clock speed, it might be able to keep up, but after you spend that much additional capital on CPUs and better motherboards, you could instead increase your spindle count, switch to SAS drives, or simply add another storage unit for marginally more money.
  • MGSsancho - Tuesday, October 5, 2010

    From my personal experience, I could not agree more about deduplication. 33% on each core of my Phenom II for a home setup is insane. For some things like Exchange Server it is best to let the application decide what should be cached, but deduplication really makes sense for tier-three storage, nightly backups, or maybe a small dev box. Also, the drives themselves matter; you want to use the ones that are geared for RAID setups, as it allows the system to communicate with them better. I won't name a particular vendor, but the current 'green' 5400 RPM 2TB drives are terrible for ZFS: http://pastebin.com/aS9Zbfeg (not my setup) is from a nightly backup array used at a webhosting facility. Sure, they have great throughput, but all those errors after a few hours.
  • andersenep - Tuesday, October 5, 2010

    I use WD green drives in my home OpenSolaris NAS. I have 2 raidz vdevs of 4 drives each (initially I used mirrors, but wanted more space). I can serve 720p content to two laptops and my Xstreamer simultaneously without a hiccup...I guess it depends on your needs, but for a home media server, I have absolutely no complaints with the 'green' drives. Weekly scrubs for 1 yr plus with no issues. I did have to replace a scorpio on my mirrored rpool after 6 months. I am quite happy with my setup.
  • solori - Wednesday, October 20, 2010

    As a Nexenta partner, we see these issues all the time. Deduplication is not an apples-to-apples feature. The system build-out and deduplication set (affecting DDT size) are both unique factors.

    With ZFS deduplication, RAM/ARC and L2ARC become critical components for performance. Deduplication tables that spill to disk (i.e. will not fit into memory) will cause serious performance issues. Likewise, the deduplication hash function and verify options will impact performance.

    For each application, doing the math on spindle count (power, cost, space, etc.) versus effective deduplication is always best. Note that deduplication does not need to be enabled pool-wide, and that - like in compression where it is wasteful to compress pre-compressed data - data with low deduplication rates should not be allowed to dominate a deduplication-enabled pool/folder.

    Deduplication of 15K primary storage seems contradictory, but that type of storage has the highest $/TB factor and spindle count for any given capacity target. By allocating deduplication to target folders/zvols, performance and capacity can be optimized for most use cases. Obviously, data sets that are write-heavy and sensitive to storage latency are not good candidates for deduplication or inline compression.

    If you do the math, the cost of SSD augmentation of 7200 RPM SAS pools is very competitive against similar-capacity 15K pools. The benefit of SSD augmentation (i.e. L2ARC and ZIL->SLOG where synchronous writes dominate performance profiles) is higher IOPS potential for random I/O workloads (where the 7200 RPM disks suffer most). In fact, contrasting 600GB SAS 15K to 2TB SAS 7200, you approach an economic point where 7200 RPM disks favor mirror groups over 15K raidz groups - again, given the same capacity goals.

    The real beauty of ZFS storage - whether it be OpenSolaris/Illumos or Nexenta/Stor/Core - is that mixing 15K and 7200 RPM pools within the same system is very easy and effective to do. With the proper SAS controllers and JBOD/RBOD combinations, you can limit 15K applications to a small working set and commit bulk resources to augmented 7200 RPM spindles in robust raidz2 groups (i.e. watch your MTTDL versus raidz).

    It is important to note that ZFS was not designed with the "home user" in mind. It can be very memory and CPU/thread hungry and easily outstrip a typical hobbyist's setup. A proper enterprise setup will include 2P quad-core and RAM stores suited to the target workload. Since ZFS was designed for robust threading, the more "hardware" threads it has at its disposal, the more efficient it is. Snapshots are "free" in ZFS (the copy-on-write nature of ZFS means writes cost the same with or without snapshots), but data integrity (checksums) and compression/deduplication are not.
  • Mattbreitbach - Wednesday, October 20, 2010

    Excellent comments! Thank you for your input.

    As you noted, we found deduplication to be beyond the reach of our system. With proper tuning and component selection, I think it could be used very well (and I have talked to several people who have had very good experiences with it). For the average home user it's probably beyond the scope of what they would want for their storage.
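To put a rough number on the memory pressure described in the deduplication discussion above, here is a back-of-the-envelope sizing sketch. The ~320 bytes per dedup-table entry is a commonly cited rule of thumb rather than a guarantee, and the pool size and average block size below are placeholder assumptions; substitute your own figures.

```python
# Back-of-the-envelope ZFS dedup table (DDT) sizing sketch.
# 320 bytes/entry is a widely quoted approximation, not an exact value.
BYTES_PER_DDT_ENTRY = 320          # assumed in-core size of one DDT entry
POOL_USED_TB = 10                  # placeholder: amount of deduped data
AVG_BLOCK_SIZE_KB = 64             # placeholder: average block size of that data

blocks = (POOL_USED_TB * 1024**4) / (AVG_BLOCK_SIZE_KB * 1024)
ddt_bytes = blocks * BYTES_PER_DDT_ENTRY

print(f"~{blocks / 1e6:.0f} million blocks -> ~{ddt_bytes / 1024**3:.1f} GiB of DDT")
# If the DDT cannot stay in ARC (or at least L2ARC), writes start incurring
# extra random reads to fetch table entries, which fits the high-CPU,
# low-IOPS behaviour reported in the comments above.
```

With those placeholder numbers the table alone approaches 50 GiB, which makes it clear why a system with limited RAM and CPU struggles once deduplication is enabled.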
