CloudFounders: No More RAID

CloudFounders, a startup of former Terremark and SUN execs, also leverages a flash cache, but the building blocks are very different. Just like Nutanix, there is a VSA (Virtual Storage Appliance) that tries to make the best out of a local flash cache. The cool thing is that the backend, the second tier of storage, is not a traditional RAID based volume. The backend is either an object storage initiator that links to Amazon S3 or a storage device based upon erasure encoding called Distribute Storage System (DSS). Let's start with the DSS backend.

DSS is an object oriented storage system that uses “Bitspread”, an advanced and flexible erasure encoding system developed by the people of Amplidata. Amplidata is a startup with a mix of Belgian and US based infrastructure experts. Some of the directors are working for both CloudFounders and Amplidata. But there is a solid technical reason why CloudFounders chose to go with the Amplidata storage system. Bitspread is meant to be the “big storage alternative” to RAID.

As you probably know or have experienced hands-on, the current RAID implementations—RAID 5 and RAID 6—have reached their limitations now that we have terabyte disks. A few terabyte disks in RAID 5 can take days to rebuild. The result is that the RAID array performance and reliability is heavily degraded. RAID 6 is more reliable (although hardly 100%) but is not exactly a good performer for writes, which is another reason why VDI does not work well on a low end or midrange SAN.

“Bitspread” erasure encoding, also called Forward Error Correction Code (FEC), encodes data in “check-blocks”. The beauty is that you can configure the durability policy. In other words you can choose over how many disks these check-blocks should be spread and how many check-blocks you can lose before it becomes a problem. For example you could ask it to spread the datablock over 18 drives and tell the codec to make sure you can recover the original datablock from 12 check-blocks. So it's only if you lose more than 6 drives at once that you lose your data. As the codec requires only 12 of the check blocks to rebuild the original data object, a failure of two drives does not mean the rebuild should happen urgently. The rebuild can be done in the background at a very slow pace while the reliability stays high. You can also have the check-blocks spread over several storage modules, ensuring that you even survive a failure of a complete disk enclosure.

Bitspread: original data (yellow) is split up, encoded with high redundancy (green) and then spread over many disks and enclosures.

For those who are not convinced that the small startup Amplidata is onto something: Intel and Dr. Sam Siewert of Trellis Logic explain in this paper why it can even be mathematically proven that the Reed-Solomon based erasure codes of RAID 6 are a dead end road for large storage systems. The paper concludes:

"Amplida's Bitspread is an efficient, scalable and practical alternative to the stop-gap of combined RAID levels like 6+1."

And that is exactly the reason why CloudFounders chose to build their storage system on the Amplidata backend.

The DSS based on “Bitspread” works with objects and is thus not a block device. A volume driver must be installed that converts the DSS into a block device. This way the hypervisor can connect to an iSCSI target that is running on top of the volume driver, as an iSCSI target requires a block device and does not recognize the format of the DSS.

Bitspread is a lot more CPU intensive and needs more storage room than traditional RAID algorithms. To reduce the CPU impact, Amplidata leverages the SSE 4.2 capabilities of the latest Xeons. As Bitspread copes so well with disk failures, it is natural to use relatively slow SATA disks, which negates the capacity disadvantage compared to RAID 6. Decent media transfer can still be achieved as the DSS typically spreads the check-blocks over many disks.

Nutanix: No More SAN CloundFounders: Cloud Storage Router
Comments Locked

60 Comments

View All Comments

  • Jammrock - Monday, August 5, 2013 - link

    Great write up, Johan.

    The Fusion-IO ioDrive Octal was designed for the NSA. These babies are probably why they could spy on the entire Internet without ever running low on storage IO. Unsurprisingly that bit about the Octal being designed for the US government is no longer on their site :)
  • Seemone - Monday, August 5, 2013 - link

    I find the lack of ZFS disturbing.
  • Guspaz - Monday, August 5, 2013 - link

    Yeah, you could probably get pretty far throwing a bunch of drives into a well configured ZFS box (striped raidz2/3? Mirrored stripes? Balance performance versus redundancy and take your pick) and throwing some enterprise SSDs in front of the array as SLOG and/or L2ARC drives.

    In fact, if you don't want to completely DIY, as many enterprises don't, there are companies selling enterprise solutions doing exactly this. Nexenta, for example (who also happen to be one of the lead developers behind modern opensource ZFS), sell enterprise software solutions for this. There are other companies that sell hardware solutions based on this and other software.
  • blak0137 - Monday, August 5, 2013 - link

    Another option for this would be to go directly to Oracle with their ZFS Storage Appliances. This gives companies the very valuable benefit of having hardware and software support from the same entity. They also tend to undercut the entrenched storage vendors on price as well.
  • davegraham - Tuesday, August 6, 2013 - link

    *cough* it may be undercut on the front end but maintenance is a typical Oracle "grab you by the chestnuts" type thing.
  • Frallan - Wednesday, August 7, 2013 - link

    More like "grab you by the chestnuts - pull until they rips loose and shove em up where they don't belong" - type of thing...
  • davegraham - Wednesday, August 7, 2013 - link

    I was being nice. ;)
  • equals42 - Saturday, August 17, 2013 - link

    And perhaps lock you into Larry's platform so he can extract his tribute for Oracle software? I think I've paid for a week of vacation on Ellison's Hawaiian island.

    Everybody gets their money to appease shareholders somehow. Either maintenance, software, hardware or whatever.
  • Brutalizer - Monday, August 5, 2013 - link

    Discs have grown bigger, but not faster. Also, they are not safer nor more resilient to data corruption. Large amounts of data will have data corruption. The more data, the more corruption. NetApp has some studies on this. You need new solutions that are designed from the ground up to combat data corruption. Research papers shows that ntfs, ext, etc and hardware raid are vulnerable to data corruption. Research papers also show that ZFS do protect against data corruption. You find all papers on wikipedia article on zfs, including papers from NetApp.
  • Guspaz - Monday, August 5, 2013 - link

    It's worth pointing out, though, that enterprise use of ZFS should always use ECC RAM and disk controllers that properly report when data has actually been written to the disk. For home use, neither are really required.

Log in

Don't have an account? Sign up now