Comparison With Other Storage Paradigms

Zoned storage is just one of several efforts to enable software to make its IO more SSD-friendly and to reduce unnecessary overhead from the block storage abstraction. The NVMe specification already has a collection of features that allow software to issue writes with appropriate sizes and alignment for the SSD, and features like Streams and NVM Sets to help ensure unrelated IO doesn't land in the same erase block. When supported by the SSD and host software, these features can provide most of the QoS benefits ZNS can achieve, but they aren't as effective at preventing write amplification. Applications that go out of their way to serialize writes (e.g. log-structured databases) can expect low write amplification, but only if the filesystem (or another layer of the IO stack) doesn't introduce fragmentation or reorder commands. Another downside is that these features are individually optional, so applications must be prepared to run on SSDs that support only a subset of the features the application wants.
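To illustrate the kind of host-side discipline these features reward, here is a minimal sketch of how a log-structured application might pad its records so every write lands on the SSD's preferred write granularity. The 16 KiB unit size is a hypothetical example chosen for illustration; a real application would query the drive's reported preferred write granularity and alignment rather than hard-coding a value.

```python
# Sketch: sizing and aligning writes to match an SSD's internal write
# granularity, as a log-structured application might. The unit size below
# is a hypothetical stand-in for what a real drive would report.

INDIRECTION_UNIT = 16 * 1024  # hypothetical preferred write granularity


def pad_to_unit(record: bytes, unit: int = INDIRECTION_UNIT) -> bytes:
    """Pad a serialized record so each write is a whole number of units."""
    remainder = len(record) % unit
    if remainder:
        record += b"\x00" * (unit - remainder)
    return record


def append_log(log: bytearray, record: bytes) -> int:
    """Append an aligned record to the log; return its byte offset."""
    offset = len(log)
    log.extend(pad_to_unit(record))
    return offset


log = bytearray()
off = append_log(log, b"key=value")  # short record still occupies one full unit
assert off == 0 and len(log) == INDIRECTION_UNIT
```

The padding trades some capacity for predictability: because every write starts and ends on a unit boundary, the drive never has to read-modify-write a partially filled unit on the application's behalf.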

The Open Channel SSD concept has been tried in several forms. Compared to ZNS, Open Channel SSDs put even more requirements on the host software (such as wear leveling), which has hindered adoption. However, capable hardware has been available from several vendors. The LightNVM Open Channel SSD specification and associated projects have now been discontinued in favor of ZNS and other standard NVMe features, which can provide all of the benefits and functionality of the Open Channel 2.0 specification while placing fewer requirements on host software (but slightly more on SSD firmware). The other vendor-specific open channel SSD specifications will probably be retired when the current hardware implementations reach end of life.

ZNS and Open Channel SSDs can both be seen as modifications to the block storage paradigm, in that they continue to use the concept of a linear space of fixed-size Logical Block Addresses. Another recently approved NVMe TP adds a command set for Key-Value Namespaces, which completely abandons the fixed-size LBA concept. Instead, the drive acts as a key-value database, storing objects of potentially variable size, identified by keys of a few bytes. This storage abstraction looks nothing like how the underlying flash memory works, but KV databases are very common in the software world. A KV SSD allows such a database's functionality to be almost completely offloaded from the CPU to the SSD. Implementing a KV database directly in the SSD firmware avoids much of the overhead of implementing a KV database on top of block storage that itself sits atop a traditional Flash Translation Layer, so this is another viable route to making IO more SSD-friendly. KV SSDs don't really have the cost advantages that a ZNS-only SSD can offer, but for some workloads they can provide similar performance and endurance benefits, and save some CPU time and RAM in the process.
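The abstraction a KV namespace presents to the host can be sketched with a simple in-memory stand-in: variable-size values addressed by short keys rather than fixed-size LBAs. The class below mimics the general shape of the KV command set's Store/Retrieve/Delete operations; the 16-byte key limit reflects the spec's short-key design, but treat the exact bound and the method names here as illustrative assumptions, not the real NVMe interface.

```python
# Illustrative stand-in for the host-side view of a KV namespace:
# objects of arbitrary size, addressed by short keys instead of LBAs.
# This is a toy in-memory model, not the actual NVMe KV command set.

MAX_KEY_LEN = 16  # KV keys are only a few bytes long (assumed bound)


class KVNamespace:
    def __init__(self) -> None:
        self._store: dict[bytes, bytes] = {}

    def store(self, key: bytes, value: bytes) -> None:
        if not 1 <= len(key) <= MAX_KEY_LEN:
            raise ValueError("key must be 1-16 bytes")
        # On a real KV SSD, the firmware decides where the value lives;
        # the host never sees a block address.
        self._store[key] = value

    def retrieve(self, key: bytes) -> bytes:
        return self._store[key]

    def delete(self, key: bytes) -> None:
        del self._store[key]


ns = KVNamespace()
ns.store(b"user:42", b'{"name": "example"}')  # value size is arbitrary
assert ns.retrieve(b"user:42") == b'{"name": "example"}'
```

Note what is absent: no offsets, no sector sizes, no free-space management on the host side. That is exactly the work a KV SSD's firmware absorbs, which is where the CPU and RAM savings come from.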


  • FreckledTrout - Thursday, August 6, 2020 - link

    Like most things it's the cost. I bet the testing alone is prohibitive to backport this into older SSD drives.
  • xenol - Thursday, August 6, 2020 - link

    Bingo. Testing and support costs something. Though I suppose they could release it for older drives under a no-support provision.

    Except depending on who tries this, I'm sure it's inevitable someone will break something and complain that they're not getting support.
  • DigitalFreak - Thursday, August 6, 2020 - link

    Why spend the money to make a retroactive firmware, when you can just sell the user a new drive with the updated spec? If someone cares enough about this, they'll shell out the $$$ for a new drive.
  • IT Mamba - Monday, December 14, 2020 - link

    Easier said than done.

  • Grizzlebee11 - Thursday, August 6, 2020 - link

    I wonder how this will affect Optane performance.
  • Billy Tallis - Thursday, August 6, 2020 - link

    Optane has no reason to adopt a zoned model, because the underlying 3D XPoint memory supports in-place modification of data.
  • name99 - Saturday, August 8, 2020 - link

    Does it really? I know Intel made a big deal about this, but isn't the reality (not that it changes your point, but getting the technical details right)
    - the minimum Optane granularity unit is a 64B line (which, admittedly, is effectively the same as DRAM, but DRAM could be smaller if necessary, Optane???)

    - the PRACTICAL Optane granularity unit (which is what I am getting at in terms of "in-place"), giving 4x the bandwidth, is 256B.

    Yeah, I'm right. Looking around I found this
    https://www.usenix.org/system/files/fast20-yang.pd...
    which says "the 3D-XPoint physical media access granularity is 256 bytes" with everything that flows from that: need for write combining buffers, RMW if you can't write-combine, write amplification power/lifetime concerns, etc etc.

    So, sure, you don't have BIG zones/pages like flash -- but it's also incorrect (both technically, and for optimal use of the technology) to suggest that it's "true" random access, as much so as DRAM.

    It remains unclear to me how much of the current disappointment around Optane DIMM performance, eg
    https://www.extremetech.com/computing/288855-repor...
    derives from this. Certainly the Optane-targeted algorithms and new file systems I was reading say 5 years ago, when Intel was promising essentially "flash density, RAM performance" seemed very much optimized for "true" random access with no attempts at clustering larger than a cache line.
    Wouldn't be the first time Intel advertising department's lies landed up tanking a technology because of the ultimate gap between what was promised (and designed for) vs what was delivered...
  • MFinn3333 - Sunday, August 9, 2020 - link

    Um... Optane DIMMs have not disappointed anybody in their performance.

    https://www.storagereview.com/review/supermicro-su...

    https://arxiv.org/pdf/1903.05714.pdf Shows just how
  • brucethemoose - Thursday, August 6, 2020 - link

    Optane is byte addressable like DRAM and fairly durable, isn't it? I don't think this "multi kilobyte zoned storage" approach would be any more appropriate than the spinning rust block/sector model.

    Then again, running Optane over PCIe/NVMe always seemed like a waste to me.
  • FunBunny2 - Friday, August 7, 2020 - link

    "Optane is byte addressable like DRAM and fairly durable, isn't it?"

    yes, and my first notion was that Optane would *replace* DRAM/HDD/SSD in a 'true' 64 bit address single level storage space. although slower than DRAM, such an architecture would write program variables as they change direct to 'storage' without all that data migration. completely forgot that current cpu use many levels of buffers between registers and durable storage. iow, there's really no byte addressed update in today's machines.

    back in the 70s and early 80s, TI (and some others, I think) built machines that had no data registers in/on the cpu, all instructions happened in main memory and all data was written directly in memory and then to disc. the morphing to load/store architectures with scads of buffering means that optimum use of an Optane store with such an architecture looks to be a waste of time until/if cpu architecture writes data based on transaction scope of applications, not buffer fill.
