Benchmarks

After running our tests on the ZFS system (under both Nexenta and OpenSolaris) and the Promise M610i, we came up with the following results.  All graphs show IOPS on the Y-axis and disk queue length on the X-axis.
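Because the graphs report IOPS rather than throughput, results at different block sizes are not directly comparable at a glance. As a rough aid, IOPS can be converted to throughput by multiplying by the block size. The sketch below is not part of our test harness; the function name and the 10,000 IOPS sample figure are illustrative assumptions, not measured results.

```python
def iops_to_mib_per_s(iops: float, block_size_kib: int) -> float:
    """Convert an IOPS figure to throughput in MiB/s for a given block size."""
    return iops * block_size_kib / 1024.0


# Hypothetical example: the same IOPS figure at each block size used in the tests.
for block_kib in (4, 8, 16, 32):
    print(f"{block_kib}k: {iops_to_mib_per_s(10_000, block_kib):.1f} MiB/s")
```

The same IOPS number therefore represents eight times as much data moved at 32k as it does at 4k, which is worth keeping in mind when reading the larger-block graphs.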

4k Sequential Reads

 

In the 4k Sequential Read test, the OpenSolaris and Nexenta systems both outperform the Promise M610i by a significant margin as the disk queue increases.  This is a direct effect of the L2ARC cache.  Interestingly, the OpenSolaris and Nexenta systems trend identically, but the Nexenta system is measurably slower than the OpenSolaris system.  We are unsure why, as both run on the same hardware and the build of Nexenta we ran was based on the same build of OpenSolaris that we tested.  We contacted Nexenta about this performance gap, but they did not have an explanation.  One hypothesis is that the Nexenta software uses more memory for things like the web GUI, leaving less ARC available to the Nexenta solution than to a regular OpenSolaris solution.

 

4k Random Write

 

In the 4k Random Write test, the OpenSolaris and Nexenta systems again come out ahead of the Promise M610i.  The Promise box is nearly flat, an indicator that it reaches the limits of its hardware quite quickly.  The OpenSolaris and Nexenta systems write faster as the disk queue increases.  This seems to indicate better re-ordering of data to make the writes more sequential on the disks.
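To show why a deeper queue gives more room for re-ordering, here is a toy model. It is only a sketch under simplifying assumptions and does not represent how ZFS actually schedules writes: requests are block addresses on a single disk, each batch of `queue_depth` requests is sorted before being serviced, and cost is measured as total head travel. The function name and the request counts are made up for illustration.

```python
import random


def total_seek_distance(addresses, queue_depth):
    """Sum head travel when requests are served in sorted batches of queue_depth."""
    position = 0
    travelled = 0
    for start in range(0, len(addresses), queue_depth):
        # Only requests that are queued together can be re-ordered together.
        batch = sorted(addresses[start:start + queue_depth])
        for address in batch:
            travelled += abs(address - position)
            position = address
    return travelled


random.seed(0)  # fixed seed so repeated runs give the same numbers
requests = [random.randrange(1_000_000) for _ in range(960)]
for depth in (1, 8, 32):
    print(depth, total_seek_distance(requests, depth))
```

In this model a queue depth of 1 leaves nothing to sort, so every request pays a full random seek, while larger batches turn most of the travel into a single sweep across the disk.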

  

4k Random 67% Write 33% Read

 

The 4k 67% Write 33% Read test again gives the edge to the OpenSolaris and Nexenta systems, while the Promise M610i is nearly flat.  This is most likely a result of both re-ordering writes and the very effective L2ARC caching.

  

4k Random Reads

 

4k Random Reads again come out in favor of the OpenSolaris and Nexenta systems.  While the Promise M610i does increase its performance as the disk queue increases, it's nowhere near the levels of performance that the OpenSolaris and Nexenta systems can deliver with their L2ARC caching.

  

8k Random Read

 

8k Random Reads indicate a similar trend to the 4k Random Reads with the OpenSolaris and Nexenta systems outperforming the Promise M610i.  Again, we see the OpenSolaris and Nexenta systems trending very similarly but with the OpenSolaris system significantly outperforming the Nexenta system.

  

8k Sequential Read

 

 8k Sequential reads have the OpenSolaris and Nexenta systems trailing at the first data point, and then running away from the Promise M610i at higher disk queues.  It's interesting to note that the Nexenta system outperforms the OpenSolaris system at several of the data points in this test.

   

8k Random Write

 

  8k Random writes play out like most of the other tests we've seen with the OpenSolaris and Nexenta systems taking top honors, with the Promise M610i trailing.  Again, OpenSolaris beats out Nexenta on the same hardware.

  

8k Random 67% Write 33% Read

 

8k Random 67% Write 33% Read again favors the OpenSolaris and Nexenta systems, with the Promise M610i trailing.  While the OpenSolaris and Nexenta systems start off nearly identical for the first 5 data points, at a disk queue of 24 or higher the OpenSolaris system steals the show.

  

16k Random 67% Write 33% Read

 

 16k Random 67% Write 33% Read gives us a show that we're familiar with.  OpenSolaris and Nexenta both soundly beat the Promise M610i at higher disk queues.  Again we see the OpenSolaris and Nexenta systems trending nearly identically, with the OpenSolaris system outperforming the Nexenta system at all data points.

  

16k Random Write

 

 16k Random write shows the Promise M610i starting off faster than the Nexenta system and nearly on par with the OpenSolaris system, but quickly flattening out.  The Nexenta box again trends higher, but cannot keep up with the OpenSolaris system.

  

16k Sequential Read

 

 The 16k Sequential Read test is the first test in which the Promise M610i outperforms OpenSolaris and Nexenta at all data points.  The OpenSolaris and Nexenta systems both trend upwards at the same rate, but cannot catch the M610i.

  

16k Random Read

 

The 16k Random Read test goes back to the same pattern that we've been seeing, with the OpenSolaris and Nexenta systems running away from the Promise M610i.  Again we see the OpenSolaris system take top honors with the Nexenta system trending similarly, but never reaching the performance metrics seen on the OpenSolaris system.

  

32k Random 67% Write 33% Read

 

 32k Random 67% Write 33% read has the OpenSolaris system on top, with the Promise M610i in second place, and the Nexenta system trailing everything.  We're not really sure what to make of this, as we expected the Nexenta system to follow similar patterns to what we had seen before.

  

32k Random Read

 

 32k Random Read has the OpenSolaris system running away from everything else.  On this test the Nexenta system and the Promise M610i are very similar, with the Nexenta system edging out the Promise M610i at the highest queue depths.

  

32k Sequential Read

 

 32k Sequential Reads proved to be a strong point for the Promise M610i.  It outperformed the OpenSolaris and Nexenta systems at all data points.  Clearly there is something in the Promise M610i that helps it excel at 32k Sequential Reads.  

 

32k Random Write

 

  

32k random writes have the OpenSolaris system on top again, with the Promise M610i in second place, and the Nexenta system trailing far behind.  All of the graphs trend similarly, with little dips and rises, but not ever moving much from the initial reading. 

 After all the tests were done, we had to sit down and take a hard look at the results and try to formulate some ideas about how to interpret this data.  We will discuss this in our conclusion.

102 Comments

  • Exelius - Wednesday, October 6, 2010

    I think you identified the strong issue between SATA and SAS drives, but there's no real reason you can't do both: in fact, this is common practice. I don't know what the distribution for AT is so I may be wrong, but often a relatively small amount of your data is accountable for a large portion of your random writes. Why not store that data permanently on the SSDs?

    For everything else, the cost per GB difference between SATA and SAS is too much to ignore. Once you start talking about adding SAS drives to this, you're moving out of the same class as the Promise device. I've used the Promise vTrak M series (and actually, the M610i specifically) and it's about the cheapest iSCSI SAN device you can get while still being a "real" iSCSI device. It's also at least a 5 year old product and is growing long in the tooth; I don't know that it's appropriate to compare it with a brand new, performance tuned monster.

    But once you introduce SAS into the equation, the chassis itself becomes a much smaller percentage of cost. You go from $140 a drive to close to $400. You also start competing with EqualLogic, HP, etc. and given the need you expressed to add more RAM and CPU, there's definitely some stiff competition from higher-end, more modern products than the M610i.

    I guess at the end of the day, while the performance numbers are impressive compared to the M610i, I don't know that the M610i is the device I would use if I was interested in performance. The Promise M610i's strength is price and capacity. Given that the M610i is INFINITELY easier to set up and maintain, that has to factor in to the cost as well. The M610i is often used as a staging target for disk-disk-tape backups; it actually has some throughput issues in a number of scenarios so it's not appropriate for all situations. It just depends on where your needs and bottlenecks are.

    I'd rather have seen a comparison with a device such as an EqualLogic or StorageWorks array; because once you upgrade the ZFS box, add labor and support costs into the equation, they do become more appealing in the $10k range (and the fact that you can rather easily add more spindles to an existing array.)
  • Mattbreitbach - Wednesday, October 6, 2010

    You make some strong points.

    1 - Our storage system is not used at Anandtech in any way - I am involved in an entirely separate entity whose only affiliation with Anandtech is that we've written an article reviewing our hardware in our environment. As such, I have no idea what Anandtech's storage needs look like. In our environment we currently use fixed size VHDs for our VM storage, so there is no real way to put small writes on SSDs and static content on slower spindles. We need to maintain performance across the entire data set.

    2 - The Vtrak M610i is about 3 years old from what I can gather from their press releases. We purchased our first Vtrak M610i at about that time. http://www.promise.com/news_room/news.aspx?m=615&a...
    While it may be getting a bit older, it is still available for purchase, and is still a relatively inexpensive way to build a high-capacity SAN device. The reason that it was compared in this article is that it is what we are currently using and replacing. While the controller and chassis are different from our ZFS monster, the drives in the chassis are identical, and the price points are very similar.

    3 - We would have loved to compare it to a current generation EqualLogic unit, but we did not have one on hand to test. If we ever happen to get one we will definitely run the numbers against it.

    4 - The Promise system has a lot going for it in the ease of setup and use department, and I am currently working on an article that goes in depth on that. Promise also has several new products available that lower the price point (VessRAID) and expand the options that you have available. I hope to get one of those units to test and possibly deploy in the near future. They also have an enterprise-grade head end (Vtrak S3000) that looks promising.

    Overall, this article was mainly about the ZFS system, what is possible, and how it performed against our current infrastructure. I am hopeful that we can expand what we have on hand to test with and provide broader comparisons in the future, but there is only so far a budget will stretch for getting hardware to simply test.
  • Exelius - Thursday, October 7, 2010

    I know it's at least 4 years old -- I purchased one at least that long ago. But point taken; I haven't kept up with Promise beyond the vTrak M after getting a budget to higher-end units (I still used the vTrak Ms for cheap storage.)

    And if your data set is large enough to require this many spindles, you might benefit from optimizing it a bit on the front-end... for example, build your VMs to split the VHDs so the high-write data is stored elsewhere. No idea if this would be of benefit for your environment (that's what test labs are for) but it's a strategy most shops with high-volume, high-transaction datasets have to periodically look at as the performance gulf between big, cheap drives and small, fast ones keeps increasing.

    Given the size of your environment, EqualLogic or StorageWorks would probably be willing to let you use a demo unit for a little while. Don't know that they wouldn't make you sign an NDA regarding the benchmarks, but you'd at least be able to do it internally... Plus, IMO, there's a massive benefit to having an enterprise support contract when you have a controller failure (which I'm actually surprised hasn't been an issue with the single controller design of the Promise M610)

    All told; still a good article -- you generally don't see stuff this thorough posted on the Internet. There are just so many possibilities in this space that it's hard not to nitpick. :)
  • JonBendtsen - Thursday, October 7, 2010

    I think it could be interesting to see performance benchmarks without the L2ARC to see how much value it really has.
  • binarycrusader - Thursday, October 7, 2010

    Management of the drive LEDs for faulty drives, etc. is available with the right hardware; it's unfortunate that it's not well supported on a wide variety of systems, but it does exist.

    As for SMTP notification (and other kinds) of faulty hardware, etc. that should be available depending on the build of OpenSolaris you're using and whether fault management aware drivers are available for your hardware. See 'man fmadm', 'man fmd' and 'man smtp-notify' for more information.

    Ultimately, users looking for a polished storage system with graphical management tools, etc. are encouraged to look at Oracle's Sun Open Storage servers which address many of the complaints listed in the article. Yes, I'm aware you're trying to build your own systems here, but it should be obvious why all of the nice tools aren't given away for free.
  • pburdine - Friday, October 8, 2010

    I haven't installed OpenSolaris yet, but when I am using Solaris 10 with ZFS, it does come with a website manager to manage many of the Sun/Oracle applications. Did you try https://localhost:6789?
  • murdmath - Friday, October 8, 2010

    Great article. Very informative. I am excited for your review of the Promise M610i SAN.

    Mat
  • Brutalizer - Monday, October 11, 2010

    First of all, there is only ONE single reason to use ZFS: it protects your data whereas other storage solutions might corrupt your data (including enterprise storage solutions)!

    See here how common file systems such as Ext3, JFS, XFS, ReiserFS, NTFS, etc might corrupt your data:
    http://www.zdnet.com/blog/storage/how-microsoft-pu...

    All the rest of the ZFS features such as snapshots, easy administration, etc are just icing on the cake. If ZFS had only protection of data and no other features, I would still use ZFS..

    See here how Raid-5 does not protect your data. In fact, Raid-6 is not better and also may corrupt your precious data. Google "data corruption raid-6"
    http://www.baarf.com/

    See here how ZFS does protect your data:
    http://www.zdnet.com/blog/storage/zfs-data-integri...
    http://queue.acm.org/detail.cfm?id=1317400
    There is a reason ZFS eats CPU (it does checksumming and protects your data), whereas all the other filesystems do not protect your data (rudimentary checksumming).

    ZFS has end-to-end checksumming! That means ZFS will compare the data in RAM with the data on disk - are they equal? Other storage solutions do not do that - they only check data within a realm. But when data passes from one realm to the next it may be corrupted (RAM down to disk controller down to disk). There may be bugs in hardware or software within a realm. And the data are never compared: "the data XYZ in RAM, is it still XYZ on disk?" - this check is never made (unless you use ZFS).

    Regarding dedup. If you get slow performance of dedup, it is only because dedup requires huge amounts of RAM. You need something like 2GB RAM for each TB disk. If you have less RAM, dedup will be sloooow. If you have much RAM, dedup will be fast.

    Another advantage of ZFS (there are many) is that ZFS is OS agnostic! You can insert your zfs raid into another OS or computer without any problems! Try that with a hardware raid - impossible.

    Another advantage of ZFS is that there is no "fsck"! Instead you do "zpool scrub" every week, while your raid is alive and running. fsck requires you to shut down the raid to validate it.

    Hardware raid is just a cpu with some software running on it. It is better to move that software to the CPU where you have many cores and GB of RAM and you can easily patch it. In the future, hardware raid will die. Software raid like ZFS will rule.

    Regarding BTRFS, if you read the mailing lists, you see that people lose data all the time with BTRFS. In the future it might be good, but it will take at least another 5 years until we reach that stage. By then ZFS will have developed even further.
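
The end-to-end check described in the comment above can be illustrated with a short sketch. It is not ZFS's on-disk format or code path, only a minimal model of the idea: a checksum is computed when the data is written, kept apart from the data itself, and verified again when the data is read back. SHA-256 is used here for illustration, and the function names are hypothetical.

```python
import hashlib


def write_block(data: bytes):
    """Return the stored block and a checksum kept separately from it."""
    return bytearray(data), hashlib.sha256(data).digest()


def read_block(stored: bytearray, checksum: bytes) -> bytes:
    """Return the data only if it still matches the checksum taken at write time."""
    if hashlib.sha256(bytes(stored)).digest() != checksum:
        raise IOError("checksum mismatch: block changed after it was written")
    return bytes(stored)


block, checksum = write_block(b"XYZ")
assert read_block(block, checksum) == b"XYZ"

block[0] ^= 0x01  # simulate a single flipped bit somewhere between RAM and disk
try:
    read_block(block, checksum)
except IOError as error:
    print(error)
```

Because the checksum travels separately from the block, corruption introduced anywhere along the path shows up on read instead of being returned silently.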
  • solori - Friday, October 22, 2010

    Regarding ZFS and ubiquity: ZFS is only version compatible. As ZFS' capabilities are updated, the blanket statement that "any ZFS-speaking OS can mount a ZFS volume" just isn't going to ring true. In fact, many distributions porting ZFS are still behind in ZFS version.

    As in most "backward compatible" entities, newer versions of ZFS will almost always be compatible with older versions, but the older version will not be able to mount a more recent version. Therefore, you could have a Mac port that can't read a BSD port for instance.

    Also, since ZFS is modular, one OS vendor could include a "highly proprietary" inline encryption or compression algorithm that is not (or not strictly) open in nature. This leads to subsequent OS-based divergence if they fail to include the necessary libraries that are not a part of ZFS itself.

    However, and for the most part, ZFS should be regarded as version compatible regardless of the OS. Another great reason to use JBOD or discrete disk setups: complete portability of storage pools.
  • Hrel - Monday, October 11, 2010

    Why is ZFS not the only file system in use today? I completely forgot about this until this article. I remember first reading about it and thinking "this'll probably be in everything in a couple years" so I put it out of my mind. I am upset this is not the file system everything uses.
