Comparison With Other Storage Paradigms

Zoned storage is just one of several efforts to let software make its IO more SSD-friendly and to reduce unnecessary overhead from the block storage abstraction. The NVMe specification already includes a collection of features that allow software to issue writes with sizes and alignment appropriate for the SSD, plus features like Streams and NVM Sets that help keep unrelated IO from landing in the same erase block. When supported by both the SSD and the host software, these features can provide most of the QoS benefits ZNS offers, but they are less effective at preventing write amplification. Applications that go out of their way to serialize writes (e.g. log-structured databases) can expect low write amplification, but only if the filesystem (or another layer of the IO stack) doesn't fragment or reorder those writes. Another downside is that these features are individually optional, so applications must be prepared to run on SSDs that support only a subset of the features they want.
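
As a concrete illustration of the kind of hinting these NVMe features enable, the sketch below uses Linux's per-file write-lifetime hints (available since kernel 4.13), which the kernel can map onto NVMe Streams on drives that support them. The file name and the choice of hint value are illustrative assumptions, and this is a minimal sketch rather than a complete program.

    /* Sketch: tag a file's writes with a lifetime hint so the kernel and SSD
     * can keep short-lived data separate from long-lived data (mapped to
     * NVMe Streams on drives that support them). Assumes Linux >= 4.13;
     * "journal.log" and the SHORT hint are illustrative choices. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>

    #ifndef F_SET_RW_HINT               /* older libc headers may lack these */
    #define F_SET_RW_HINT 1036          /* F_LINUX_SPECIFIC_BASE + 12 */
    #endif
    #ifndef RWH_WRITE_LIFE_SHORT
    #define RWH_WRITE_LIFE_SHORT 2
    #endif

    int main(void)
    {
        int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) { perror("open"); return 1; }

        uint64_t hint = RWH_WRITE_LIFE_SHORT;  /* frequently rewritten data */
        if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
            perror("fcntl(F_SET_RW_HINT)");    /* hint is advisory; a failure is non-fatal */

        /* ... subsequent writes to fd carry the lifetime hint ... */
        return 0;
    }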

The Open Channel SSD concept has been tried in several forms. Compared to ZNS, Open Channel SSDs place even more responsibility on the host software (such as wear leveling), which has hindered adoption even though capable hardware has been available from several vendors. The LightNVM Open Channel SSD specification and its associated projects have now been discontinued in favor of ZNS and other standard NVMe features, which can provide all of the benefits and functionality of the Open Channel 2.0 specification while placing fewer requirements on host software (and slightly more on SSD firmware). The remaining vendor-specific open channel SSD specifications will probably be retired when the current hardware implementations reach end of life.

ZNS and Open Channel SSDs can both be seen as modifications to the block storage paradigm, in that they keep the concept of a linear space of fixed-size Logical Block Addresses. Another recently approved NVMe TP adds a command set for Key-Value Namespaces, which abandons the fixed-size LBA concept entirely. Instead, the drive acts as a key-value database, storing objects of potentially variable size identified by keys of a few bytes. This storage abstraction looks nothing like how the underlying flash memory works, but KV databases are extremely common in the software world, and a KV SSD allows most of such a database's functionality to be offloaded from the CPU to the SSD. Implementing a KV database directly in the SSD firmware avoids much of the overhead of layering a KV database on top of block storage that in turn sits on a traditional Flash Translation Layer, so this is another viable route to making IO more SSD-friendly. KV SSDs don't have the cost advantages that a ZNS-only SSD can offer, but for some workloads they can provide similar performance and endurance benefits while saving some CPU time and RAM in the process.
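
To make the contrast with block addressing concrete, the declarations below sketch roughly what a key-value namespace interface looks like from the host's side. All of the names (kv_namespace, kv_store, and so on) and the 16-byte key limit are hypothetical stand-ins for illustration, not a real driver API; a real deployment would go through the NVMe KV command set or a vendor library.

    /* Hypothetical sketch of a key-value namespace interface as seen by the
     * host. Every name here is illustrative, not a real driver API. */
    #include <stddef.h>

    #define KV_MAX_KEY_LEN 16               /* assumed small fixed key-size limit */

    typedef struct kv_namespace kv_namespace;   /* opaque handle to a KV namespace */

    /* Store a variable-sized value under a short binary key. */
    int kv_store(kv_namespace *ns,
                 const void *key, size_t key_len,
                 const void *value, size_t value_len);

    /* Read the value for a key into buf; the actual value size is returned
     * through value_len_out so callers can detect truncation. */
    int kv_retrieve(kv_namespace *ns,
                    const void *key, size_t key_len,
                    void *buf, size_t buf_len,
                    size_t *value_len_out);

    /* Remove a key and let the SSD reclaim the flash it occupied. */
    int kv_delete(kv_namespace *ns, const void *key, size_t key_len);

The notable thing about this abstraction is that no LBA appears anywhere in the interface: the drive decides where each object lives and tracks it by key, which is exactly the bookkeeping a host-side KV database would otherwise have to do on top of block storage.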

Comments

  • jeremyshaw - Monday, August 10, 2020 - link

    The early 70s and 80s timeframe saw CPUs and memory scaling at roughly the same rate, year to year. After a while, memory advanced a whole lot slower, necessitating the multiple tiers of memory we have now, from L1 cache to HDD. Modern CPUs didn't become lots of SRAM with an attached ALU just because CPU designers love throwing their transistor budget into measly megabytes of cache. They became that way simply because the other tiers of memory and storage are just too slow.
  • WorBlux - Wednesday, December 22, 2021 - link

    Modern CPUs have instructions that let you skip the cache, and then there was SPARC with streaming accelerators, where you could unleash a true vector/CUDA-style instruction directly against a massive chunk of memory.
  • Arbie - Thursday, August 6, 2020 - link

    An excellent article; readable and interesting even to those (like me) who don't know the tech but with depth for those who do. Right on the AT target.
  • Arbie - Thursday, August 6, 2020 - link

    And - I appreciated the "this is important" emphasis so I knew where to pay attention.
  • ads295 - Friday, August 7, 2020 - link

    +1 all the way
  • batyesz - Thursday, August 6, 2020 - link

    UltraRAM is the next big step in the computer market.
  • tygrus - Thursday, August 6, 2020 - link

    The first 512-byte sectors I remember go back to the days of IBM XT compatibles, 5¼ inch floppies, 20MB HDDs, MS-DOS, and FAT12 & FAT16. That's well over 30 years of baggage to carry around. The industry has since moved to 32-bit file systems and 4KB blocks/clusters or larger (e.g. 64- or 128-bit addresses and 2MB blocks/clusters are possible).

    Saving small files or fragments in large blocks wastes space, but smaller blocks mean more locations to manage, with longer addresses taking up more space and more processing.

    Management becomes more complex to overcome the quirks of HW & increased capacities.
  • WaltC - Tuesday, August 11, 2020 - link

    Years ago, just for fun, I formatted a HDD with 1k clusters because I wanted to see how much of a slowdown the increased overhead would create. The slowdown was quite pronounced, more than I expected, and I quickly jumped back to 4k clusters. That was many years ago; I can't even recall what version of Windows I was using at the time...;)
  • Crazyeyeskillah - Thursday, August 6, 2020 - link

    I'll ask the dumb question no one else has posted:
    What kind of performance numbers will this equate to?

    Cheers
  • Billy Tallis - Thursday, August 6, 2020 - link

    There are really too many variables and too little data to give a good answer at this point. Some applications will be ill-suited to running on zoned storage and may not gain any performance. Even for applications that are a good fit for zoned storage, the most important benefits may be to latency/QoS metrics, which are less straightforward to interpret than throughput.

    The Radian/IBM Research case study mentioned near the end of the article claims a 65% improvement in throughput and a 22x improvement in a tail latency metric for a Sysbench MySQL test. Those are probably close to best-case numbers.
