Ecosystem Status: Users and Use Cases

The software changes required, both in drive firmware and in operating system and application support, will keep zoned SSDs in the datacenter for the foreseeable future. Most of the early interest and adoption will come from the largest cloud computing companies, which have the resources to optimize their software stacks from top to bottom for zoned storage. But a lot of the software work has already been done: code written for host-managed SMR hard drives or open-channel SSDs can be extended fairly easily to also support zoned SSDs. That includes both applications and filesystem drivers that have already been modified to work on devices that do not allow in-place modification of data.

Linux kernel version 5.9 will add ZNS support to the NVMe driver, plugging into the existing zoned block device framework. Several Linux filesystems either already support running directly on zoned devices, or such support has been developed but not yet merged into a stable kernel release. The device mapper framework already includes a target (dm-zoned) that emulates a regular block device on top of a zoned device like a ZNS SSD, so unmodified filesystems and applications can still be used. Western Digital has also released a userspace library to help applications interact directly with zoned devices without going through one of the kernel's filesystems on the device.
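To make the kernel interface a bit more concrete, here is a minimal C sketch (not drawn from any particular project) that lists the zones of a zoned block device using the BLKREPORTZONE ioctl from <linux/blkzoned.h>. It assumes a kernel built with zoned block device support; the default device path and the 64-zone batch size are placeholder choices.

/* zone_report.c: minimal sketch of querying zones on a zoned block device.
 * Assumes CONFIG_BLK_DEV_ZONED is enabled in the kernel. Build with:
 *   cc -o zone_report zone_report.c
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

#define ZONE_BATCH 64   /* zones requested per ioctl call in this example */

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/nvme0n1";  /* placeholder path */
    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    size_t bufsz = sizeof(struct blk_zone_report)
                 + ZONE_BATCH * sizeof(struct blk_zone);
    struct blk_zone_report *rep = calloc(1, bufsz);

    rep->sector = 0;              /* start reporting from the first zone */
    rep->nr_zones = ZONE_BATCH;   /* kernel overwrites this with zones returned */
    if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("BLKREPORTZONE"); return 1; }

    for (unsigned i = 0; i < rep->nr_zones; i++) {
        const struct blk_zone *z = &rep->zones[i];
        printf("zone %3u: start %llu, len %llu, write pointer %llu, cond %u\n",
               i, (unsigned long long)z->start, (unsigned long long)z->len,
               (unsigned long long)z->wp, z->cond);
    }

    free(rep);
    close(fd);
    return 0;
}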

Only a few applications have publicly released support for ZNS SSDs. The Ceph clustered storage system has a backend that supports zoned storage, including ZNS SSDs. Western Digital has developed a zoned storage backend for the RocksDB key-value database (itself used by Ceph), but those patches are still a work in progress. Samsung has released a cross-platform library for accessing NVMe devices, with support for ZNS SSDs, and has written its own RocksDB backend on top of that library.

As with host-managed SMR hard drives, most production use of ZNS (at least early on) will happen behind the scenes in large datacenters. Because ZNS gives the host system a great degree of control over data placement on the SSD, it allows competing workloads to be isolated from each other effectively. That makes it easier to guarantee storage QoS in multi-tenant cloud environments, but the relative lack of zone-aware software means there isn't much demand for such a hosting environment yet.
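To illustrate the basic discipline all of these zone-aware backends have to follow, here is a hypothetical C helper (the function name and geometry arguments are made up for this sketch) that rewrites one zone: it resets the zone with BLKRESETZONE, then writes strictly sequentially from the zone start, because in-place overwrites inside a zone are not allowed. Real backends add details this sketch skips, such as O_DIRECT, block-aligned buffers, and the Zone Append command.

/* rewrite_zone(): hypothetical helper showing the reset-then-append write
 * discipline for one zone of a zoned block device. fd must be open for
 * writing on the zoned device; sector counts are in 512-byte units. */
#include <linux/blkzoned.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int rewrite_zone(int fd, unsigned long long zone_start_sector,
                 unsigned long long zone_len_sectors,
                 const void *buf, size_t len)
{
    /* Step 1: reset the zone, which discards its contents and moves the
     * write pointer back to the start of the zone. */
    struct blk_zone_range range = {
        .sector     = zone_start_sector,
        .nr_sectors = zone_len_sectors,
    };
    if (ioctl(fd, BLKRESETZONE, &range) < 0) {
        perror("BLKRESETZONE");
        return -1;
    }

    /* Step 2: write sequentially starting at the write pointer (now the zone
     * start). Overwriting earlier offsets or skipping ahead within the zone
     * would be rejected by the device. */
    off_t offset = (off_t)zone_start_sector * 512;
    if (pwrite(fd, buf, len, offset) != (ssize_t)len) {
        perror("pwrite");
        return -1;
    }
    return 0;
}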

The most enthusiastic and prolific supporter of ZNS and zoned storage in general has been Western Digital, which stands to benefit from the overlap between ZNS and SMR hard drives. But it is very much a multi-vendor effort. The ZNS standard lists authors from all the other major NAND flash manufacturers (Samsung, Intel, Micron, SK Hynix, Kioxia), controller vendors (Microchip), cloud computing hyperscalers (Microsoft, Alibaba), and other familiar names like Seagate, Oracle and NetApp.

Longtime zoned SSD provider Radian Memory recently published a case study conducted with IBM Research. They ported an existing software-based log-structured storage system to run on Radian's non-standard zoned SSDs, and measured significant improvements to throughput, QoS and write amplification compared to running on a block storage SSD.

Most SSD vendors have not yet announced production models supporting ZNS (Radian Memory being the exception), so it's hard to tell what market segments, capacities and form factors will be most common among ZNS SSDs. The most compelling opportunity is probably for ZNS-only QLC based drives with reduced DRAM and overprovisioning, but the earliest models to market will probably be more conventional hardware configurations with updated firmware supporting ZNS.

Overall, ZNS is one of the next steps toward using SSDs in a way that matches how they are actually designed, rather than bolting flash storage onto hard drive conventions. It is a promising new feature that looks likely to see more widespread adoption than previous efforts like open-channel SSDs, and its cost and capacity advantages should be more significant than what SMR hard drives have offered relative to CMR hard drives.

45 Comments

  • Carmen00 - Friday, August 7, 2020 - link

    Fantastic article, both in-depth and accessible, a great primer for what's coming up on the horizon. This is what excellence in tech journalism looks like!
  • Steven Wells - Saturday, August 8, 2020 - link

    Agree with @Carmen00. Super well written. Fingers crossed that one of these “Not a rotating rust emulator” architectures can get airborne. As long as the flash memory chip designers are unconstrained to do great things to reduce cost generation to generation, with the SSD maintaining the fixed abstraction, I’m all for this.
  • Javier Gonzalez - Friday, August 7, 2020 - link

    Great article Billy. A couple of pointers to other parts of the ecosystem that are being upstreamed at the moment are:

    - QEMU support for ZNS emulation (several patches posted in the mailing list)
    - Extensions to fio: Currently posted and waiting for stabilizing support for append in the kernel
    - nvme-cli: Several patches for ZNS management are already merged

    Also, a comment on xZTL: it is intended to be used with several LSM-based databases. We ported RocksDB as a first step, but other DBs are being ported on top. xZTL gives the necessary abstractions for the DB backend to be pretty thin - you can see the RocksDB HDFS backend as an example.

    Again, great article!
  • Billy Tallis - Friday, August 7, 2020 - link

    Thanks for the feedback, and for your presentations that were a valuable source for this article!
  • Javier Gonzalez - Friday, August 7, 2020 - link

    Happy to hear that it helped.

    Feel free to reach out if you have questions on a follow-up article :)
  • jabber - Friday, August 7, 2020 - link

    And for all that, it will still slow to Kbps speeds and take two hours when copying a 2GB folder full of KB-sized microfiles.

    We now need better, more efficient file systems, not just better hardware.
  • AntonErtl - Friday, August 7, 2020 - link

    Thank you for this very interesting article.

    It seems to me that ZNS strikes the right abstraction balance:

    It leaves wear leveling to the device, which probably does know more about wear and device characteristics, and the interface makes the job of wear leveling more straightforward than the classic block interface does.

    A key-value store would cover a significant part of what a file system does, and it seems to me that after all these years there is still enough going on in this area that we do not want to bake it into drive firmware.
  • Spunjji - Friday, August 7, 2020 - link

    Everything up to the "Supporting Multiple Writers" section seemed pretty universally positive... and then it all got a bit hazy for me. Kinda seems like they introduced a whole new problem, there?

    I guess if this isn't meant to go much further than enterprise hardware then it likely won't be much of an issue, but still, that's a pretty keen limitation.
  • Spunjji - Friday, August 7, 2020 - link

    Great article, by the way. Realised I didn't mention that, but I really appreciate the perspective that's in-depth but not too-in-depth for the average tech-head 😁
  • AntonErtl - Saturday, August 8, 2020 - link

    As long as a zone is not divided between file systems or direct-access databases, it is natural that writes are synchronized and sequenced. And talking to the SSD through one NVMe/PCIe interface means that all writes (even to multiple zones) are sent to the drive in sequence.

    OTOH, you have software and hardware with synchronous interfaces (waits for some feedback before sending the next request), and in such a setting doing everything through one thread costs throughput.

    So you can either design everything to work with asynchronous interfaces (e.g., SCSI tagged command queuing), at least at all single-thread levels, or you design synchronous interfaces that work with multiple threads. The "write it somewhere, and then tell where you wrote" approach seems to be along the latter lines. What's the status of asynchronous interfaces for NVMe?
