Ecosystem Status: Users and Use Cases

The software changes required, both in SSD firmware and in operating system and application support, will keep zoned SSDs in the datacenter for the foreseeable future. Most of the early interest and adoption will come from the largest cloud computing companies, which have the resources to optimize their software stacks top to bottom for zoned storage. But a lot of the software work has already been done: software targeting host-managed SMR hard drives or open-channel SSDs can be extended fairly easily to also support zoned SSDs. This includes both applications and filesystem drivers that have been modified to work on devices that do not allow in-place modification of data.

Linux kernel version 5.9 will update the NVMe driver with ZNS support, which plugs into the existing zoned block device framework. Multiple Linux filesystems either already support running directly on zoned devices, or such support has been developed but not yet merged into a stable kernel release. The device mapper framework already includes a component that emulates a regular block device on top of a zoned device like a ZNS SSD, so unmodified filesystems and applications can be used. Western Digital has released a userspace library to help applications interact directly with zoned devices without using one of the kernel's filesystems on the device.
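For a sense of what that common framework looks like from userspace, here is a minimal sketch (ours, not from any of the projects above) that uses the kernel's generic zoned block device ioctl interface from <linux/blkzoned.h> to dump the layout of the first few zones. The device path is a placeholder; the same information is available from the blkzone report utility in util-linux. The point is that a ZNS SSD shows up through the same interface the kernel already uses for host-managed SMR drives.

/*
 * Minimal sketch: report the first few zones of a zoned block device
 * (e.g. a ZNS SSD) through the kernel's generic zoned block device ioctls.
 * Device path is a placeholder; error handling is kept minimal.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/nvme0n1";  /* placeholder path */
    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    unsigned int nr = 8;  /* only ask for the first 8 zones */
    struct blk_zone_report *rep = calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
    rep->sector = 0;      /* start reporting from the beginning of the device */
    rep->nr_zones = nr;

    if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("BLKREPORTZONE"); return 1; }

    for (unsigned int i = 0; i < rep->nr_zones; i++) {
        struct blk_zone *z = &rep->zones[i];
        printf("zone %u: start=%llu len=%llu wp=%llu type=%u cond=%u\n",
               i, (unsigned long long)z->start, (unsigned long long)z->len,
               (unsigned long long)z->wp, (unsigned)z->type, (unsigned)z->cond);
    }

    free(rep);
    close(fd);
    return 0;
}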

Only a few applications have publicly released support for ZNS SSDs. The Ceph clustered storage system has a backend that supports zoned storage, including ZNS SSDs. Western Digital has developed a zoned storage backend for the RocksDB key-value database (itself used by Ceph), but the patches are still a work in progress. Samsung has released a cross-platform library for accessing NVMe devices, with support for ZNS SSDs. They've written their own RocksDB backend using this library. As with host-managed SMR hard drives, most production use of ZNS (at least early on) will be behind the scenes in large datacenters. Because ZNS gives the host system a great degree of control over data placement on the SSD, it allows for good isolation of competing tasks. This makes it easier to ensure good storage performance QoS in multi-tenant cloud environments, but the relative lack of zone-aware software means there isn't much demand for such a hosting environment yet.
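To make that isolation argument concrete, here is a sketch of host-controlled placement; it is an illustration of the principle, not anything shipped by the vendors above. If the host dedicates whole zones to each tenant, reclaiming one tenant's space is a single zone reset that never triggers garbage collection touching another tenant's data. The one-zone-per-tenant mapping, the zone size, and the device path are all assumptions.

/*
 * Sketch of host-controlled data placement: each tenant owns its own zone,
 * so freeing a tenant's data is a single zone reset that cannot cause
 * write amplification or GC activity in any other tenant's zones.
 * Zone size, tenant-to-zone mapping, and device path are assumed values.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

#define ZONE_SECTORS (2ULL * 1024 * 1024)   /* assumed zone size: 1 GiB in 512-byte sectors */

/* Rewind the write pointer of the zone assigned to a tenant, discarding its data. */
static int reset_tenant_zone(int fd, unsigned int tenant_id)
{
    struct blk_zone_range range = {
        .sector     = (unsigned long long)tenant_id * ZONE_SECTORS,  /* tenant -> zone */
        .nr_sectors = ZONE_SECTORS,
    };
    return ioctl(fd, BLKRESETZONE, &range);
}

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR);  /* placeholder ZNS device path */
    if (fd < 0) { perror("open"); return 1; }

    /* Tenant 3 is being deprovisioned: wipe only its zone; all others are untouched. */
    if (reset_tenant_zone(fd, 3) < 0)
        perror("BLKRESETZONE");

    close(fd);
    return 0;
}

In practice a tenant would own a set of zones managed by something like the zoned RocksDB backends mentioned above, but the principle is the same: the host, not the drive, decides whose data shares an erase unit.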

The most enthusiastic and prolific supporter of ZNS and zoned storage in general has been Western Digital, which stands to benefit from the overlap between ZNS and SMR hard drives. But it is very much a multi-vendor effort. The ZNS standard lists authors from all the other major NAND flash manufacturers (Samsung, Intel, Micron, SK Hynix, Kioxia), controller vendors (Microchip), cloud computing hyperscalers (Microsoft, Alibaba), and other familiar names like Seagate, Oracle and NetApp.

Longtime zoned SSD provider Radian Memory recently published a case study conducted with IBM Research. They ported an existing software-based log-structured storage system to run on Radian's non-standard zoned SSDs, and measured significant improvements to throughput, QoS and write amplification compared to running on a block storage SSD.

Most SSD vendors have not yet announced production models supporting ZNS (Radian Memory being the exception), so it's hard to tell which market segments, capacities and form factors will be most common among ZNS SSDs. The most compelling opportunity is probably for ZNS-only, QLC-based drives with reduced DRAM and overprovisioning, but the earliest models to market will likely be more conventional hardware configurations with updated firmware that adds ZNS support.

Overall, ZNS is one of the next steps toward using SSDs in the way they are actually designed, rather than bolting them onto a storage model inherited from hard drives. It is a promising new feature. It looks likely to see more widespread adoption than previous efforts like open-channel SSDs, and the cost and capacity advantages should be more significant than what SMR hard drives have offered relative to CMR hard drives.

Comments

  • FreckledTrout - Thursday, August 6, 2020 - link

    Like most things, it's the cost. I bet the testing alone is prohibitive to backport this into older SSDs.
  • xenol - Thursday, August 6, 2020 - link

    Bingo. Testing and support costs something. Though I suppose they could release it for older drives under a no-support provision.

    Except depending on who tries this, I'm sure it's inevitable someone will break something and complain that they're not getting support.
  • DigitalFreak - Thursday, August 6, 2020 - link

    Why spend the money to make a retroactive firmware, when you can just sell the user a new drive with the updated spec? If someone cares enough about this, they'll shell out the $$$ for a new drive.
  • IT Mamba - Monday, December 14, 2020 - link

    Easier said than done.

    https://www.manntechnologies.net
  • Grizzlebee11 - Thursday, August 6, 2020 - link

    I wonder how this will affect Optane performance.
  • Billy Tallis - Thursday, August 6, 2020 - link

    Optane has no reason to adopt a zoned model, because the underlying 3D XPoint memory supports in-place modification of data.
  • name99 - Saturday, August 8, 2020 - link

    Does it really? I know Intel made a big deal about this, but isn't the reality (not that it changes your point, but getting the technical details right)
    - the minimum Optane granularity unit is a 64B line (which, admittedly, is effectively the same as DRAM, but DRAM could be smaller if necessary, Optane???)

    - the PRACTICAL Optane granularity unit (which is what I am getting at in terms of "in-place"), giving 4x the bandwidth, is 256B.

    Yeah, I'm right. Looking around I found this
    https://www.usenix.org/system/files/fast20-yang.pd...
    which says "the 3D-XPoint physical media access granularity is 256 bytes" with everything that flows from that: need for write combining buffers, RMW if you can't write-combine, write amplification power/lifetime concerns, etc etc.

    So, sure, you don't have BIG zones/pages like flash -- but it's also incorrect (both technically, and for optimal use of the technology) to suggest that it's "true" random access, as much so as DRAM.

    It remains unclear to me how much of the current disappointment around Optane DIMM performance, eg
    https://www.extremetech.com/computing/288855-repor...
    derives from this. Certainly the Optane-targeted algorithms and new file systems I was reading about, say, 5 years ago, when Intel was promising essentially "flash density, RAM performance", seemed very much optimized for "true" random access with no attempt at clustering larger than a cache line.
    Wouldn't be the first time Intel advertising department's lies landed up tanking a technology because of the ultimate gap between what was promised (and designed for) vs what was delivered...
  • MFinn3333 - Sunday, August 9, 2020 - link

    Um... Optane DIMMs have not disappointed anybody with their performance.

    https://www.storagereview.com/review/supermicro-su...

    https://arxiv.org/pdf/1903.05714.pdf Shows just how
  • brucethemoose - Thursday, August 6, 2020 - link

    Optane is byte addressable like DRAM and fairly durable, isn't it? I don't think this "multi kilobyte zoned storage" approach would be any more appropriate than the spinning rust block/sector model.

    Then again, running Optane over PCIe/NVMe always seemed like a waste to me.
  • FunBunny2 - Friday, August 7, 2020 - link

    "Optane is byte addressable like DRAM and fairly durable, isn't it?"

    yes, and my first notion was that Optane would *replace* DRAM/HDD/SSD in a 'true' 64-bit-address single-level storage space. although slower than DRAM, such an architecture would write program variables as they change directly to 'storage' without all that data migration. completely forgot that current CPUs use many levels of buffers between registers and durable storage. iow, there's really no byte-addressed update in today's machines.

    back in the 70s and early 80s, TI (and some others, I think) built machines that had no data registers in/on the cpu, all instructions happened in main memory and all data was written directly in memory and then to disc. the morphing to load/store architectures with scads of buffering means that optimum use of an Optane store with such an architecture looks to be a waste of time until/if cpu architecture writes data based on transaction scope of applications, not buffer fill.
