How to Enable NVMe Zoned Namespaces

Hardware changes for ZNS

At a high level, enabling ZNS on most drives already on the market requires only a firmware update. ZNS doesn't put any new requirements on SSD controllers or other hardware components, so the feature can be implemented on existing drives with firmware changes alone.
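
Once the firmware is in place, the host can confirm that a drive is actually exposed as a zoned block device. As a rough illustration, the sketch below reads the Linux block layer's sysfs attributes that describe zoned support, assuming a reasonably recent kernel and a drive that has enumerated as nvme0n1 (a hypothetical device name).

```python
from pathlib import Path

def zoned_info(disk: str = "nvme0n1") -> dict:
    """Read Linux block-layer sysfs attributes describing zoned-device support.

    'zoned' reports "none", "host-aware", or "host-managed";
    'chunk_sectors' is the zone size in 512-byte sectors;
    'nr_zones' is the number of zones on the device.
    """
    queue = Path("/sys/block") / disk / "queue"
    return {attr: (queue / attr).read_text().strip() if (queue / attr).exists() else None
            for attr in ("zoned", "chunk_sectors", "nr_zones")}

if __name__ == "__main__":
    # On a conventional (non-zoned) drive this prints {'zoned': 'none', ...}.
    print(zoned_info("nvme0n1"))
```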

Where hardware starts to matter is when an SSD is designed to support only zoned namespaces. First and foremost, a ZNS-only SSD doesn't need anywhere near as much overprovisioning as a traditional enterprise SSD. ZNS SSDs are still responsible for performing wear leveling, but this no longer requires a large spare area for the garbage collection process. Used properly, ZNS allows the host software to avoid almost all of the circumstances that would lead to write amplification inside the SSD. Enterprise SSDs commonly use overprovisioning ratios of up to 28% (800GB usable per 1024GB of flash on typical 3 DWPD models), and ZNS SSDs can expose almost all of that capacity to the host system without compromising the ability to deliver high sustained write performance. ZNS SSDs still need some reserve capacity (for example, to cope with failures that crop up in flash memory as it wears out), but Western Digital says we can expect ZNS to allow roughly a factor of 10 reduction in overprovisioning ratios.
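
To make the arithmetic concrete, the short sketch below works through the overprovisioning figures quoted above. The ZNS numbers are simply an assumption derived from the roughly 10x reduction Western Digital suggests, not a specification for any particular drive.

```python
def overprovisioning_ratio(raw_gb: float, usable_gb: float) -> float:
    """Spare flash as a fraction of user-visible capacity: (raw - usable) / usable."""
    return (raw_gb - usable_gb) / usable_gb

# Conventional 3 DWPD enterprise drive: 1024GB of flash, 800GB exposed to the host.
conventional_op = overprovisioning_ratio(1024, 800)    # 0.28 -> 28%

# Hypothetical ZNS drive with the same flash, assuming the ~10x smaller
# overprovisioning ratio suggested above (about 2.8% spare).
zns_op = conventional_op / 10
zns_usable_gb = 1024 / (1 + zns_op)                    # ~996GB exposed to the host

print(f"conventional OP: {conventional_op:.1%}, ZNS OP: {zns_op:.1%}, "
      f"ZNS usable capacity: {zns_usable_gb:.0f} GB")
```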

Better control over write amplification also means QLC NAND is a more viable option for use cases that would otherwise require TLC NAND. Enterprise storage workloads often lead to write amplification factors of 2-5x. With ZNS, the SSD itself causes virtually no write amplification, and well-designed host software can avoid adding much of its own, so the overall effect is a boost to drive lifespan that offsets the lower endurance of QLC (and of any denser NAND beyond QLC) relative to TLC. Even in a ZNS SSD, QLC NAND is still fundamentally slower than TLC, but the same near-elimination of background data management within the SSD means a QLC-based ZNS SSD can probably compete with TLC-based traditional SSDs on QoS metrics even if its total throughput is lower.
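
The endurance trade-off can be sketched with back-of-the-envelope numbers. The P/E cycle ratings below are assumed round figures for illustration (they are not from this article), but they show how a near-unity write amplification factor lets a QLC ZNS drive close most of the lifetime gap to TLC sitting behind a conventional FTL.

```python
def host_writes_tb(raw_capacity_tb: float, pe_cycles: int, waf: float) -> float:
    """Total host data (TB) writable before wear-out:
    raw flash capacity * rated program/erase cycles / write amplification factor."""
    return raw_capacity_tb * pe_cycles / waf

# Assumed round endurance ratings: TLC ~3000 P/E cycles, QLC ~1000.
tlc_conventional = host_writes_tb(4, 3000, waf=3.0)   # conventional FTL, WAF in the 2-5x range
qlc_zns          = host_writes_tb(4, 1000, waf=1.1)   # ZNS drive with near-unity WAF

print(f"4TB TLC drive @ WAF 3.0: {tlc_conventional:,.0f} TB of host writes")
print(f"4TB QLC ZNS drive @ WAF 1.1: {qlc_zns:,.0f} TB of host writes")
```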

The other major hardware change enabled by ZNS is a drastic reduction in DRAM requirements. The Flash Translation Layer (FTL) in traditional block-based SSDs requires about 1GB of DRAM for every 1TB of NAND flash. This is used to store the address mapping or indirection tables that record the physical NAND flash memory address that is currently storing each Logical Block Address (LBA). The 1GB per 1TB ratio is a consequence of the FTL managing the flash with a granularity of 4kB. Right off the bat, ZNS gets rid of that requirement by letting the SSD manage whole zones that are hundreds of MB each. Tracking which physical NAND erase blocks comprise each zone now requires so little memory that it could be done with on-controller SRAM even for SSDs with tens of TB of flash. ZNS doesn't completely eliminate the need for SSDs to include DRAM, because the metadata that the drive needs to store about each zone is larger than what a traditional FTL needs to store for each LBA, and drives are likely to also use some DRAM for caching writes - more on this later.
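
The DRAM arithmetic is easy to verify. The sketch below reproduces the 1GB-per-1TB rule of thumb for a 4kB-granularity mapping table and contrasts it with zone-granularity tracking; the 512MiB zone size and 8-byte entry size are illustrative assumptions rather than figures from any specific drive.

```python
def mapping_table_bytes(capacity_bytes: int, unit_bytes: int, entry_bytes: int) -> int:
    """Size of a flat translation table: one entry per mapped unit."""
    return (capacity_bytes // unit_bytes) * entry_bytes

KIB, MIB, TIB = 1 << 10, 1 << 20, 1 << 40

# Conventional FTL: one ~4-byte physical address per 4KiB logical block.
page_map = mapping_table_bytes(1 * TIB, 4 * KIB, 4)     # ~1 GiB per 1 TiB of flash

# ZNS: one entry per zone (512MiB zones and 8-byte entries assumed here).
zone_map = mapping_table_bytes(1 * TIB, 512 * MIB, 8)   # ~16 KiB per 1 TiB of flash

print(f"4KiB-granularity map: {page_map / MIB:.0f} MiB per TiB")
print(f"zone-granularity map: {zone_map / KIB:.0f} KiB per TiB")
```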

Comments

  • jeremyshaw - Monday, August 10, 2020 - link

    The early 70s and 80s timeframe saw CPUs and memory scaling roughly the same, year to year. After a while, memory advanced a whole lot slower, necessitating the multiple tiers of memory we have now, from L1 cache to HDD. Modern CPUs didn't become lots of SRAM with an attached ALU just because CPU designers love throwing their transistor budget into measly megabytes of cache. They became that way simply because the other tiers of memory and storage are just too slow.
  • WorBlux - Wednesday, December 22, 2021 - link

    Modern CPUs have instructions that let you skip the cache, and then there was SPARC with streaming accelerators, where you could unleash a true vector/CUDA-style instruction directly against a massive chunk of memory.
  • Arbie - Thursday, August 6, 2020 - link

    An excellent article; readable and interesting even to those (like me) who don't know the tech but with depth for those who do. Right on the AT target.
  • Arbie - Thursday, August 6, 2020 - link

    And - I appreciated the "this is important" emphasis so I knew where to pay attention.
  • ads295 - Friday, August 7, 2020 - link

    +1 all the way
  • batyesz - Thursday, August 6, 2020 - link

    UltraRAM is the next big step in the computer market.
  • tygrus - Thursday, August 6, 2020 - link

    The first 512-byte sectors I remember go back to the days of IBM XT compatibles, 5¼ inch floppies, 20MB HDDs, MSDOS, and FAT12 & FAT16. That's well over 30 years of baggage to carry around. File systems later moved to 32-bit addressing and 4KB or larger blocks/clusters (e.g. 64- or 128-bit addresses and 2MB blocks/clusters are possible).

    It wastes space to store small files and fragments in large blocks, but it also wastes resources to track more locations (smaller blocks), with longer addresses taking up more space and processing.

    Management becomes more complex to overcome the quirks of HW & increased capacities.
  • WaltC - Tuesday, August 11, 2020 - link

    Years ago, just for fun, I formatted an HD with 1k clusters because I wanted to see how much of a slowdown the increased overhead would create--I remember it being quite pronounced, and I quickly jumped back to 4k clusters. I was surprised at how much of a slowdown it created. That was many years ago--I can't even recall what version of Windows I was using at the time...;)
  • Crazyeyeskillah - Thursday, August 6, 2020 - link

    I'll ask the dumb questions no one else has posted:
    What kind of performance numbers will this equate to?

    Cheers
  • Billy Tallis - Thursday, August 6, 2020 - link

    There are really too many variables and too little data to give a good answer at this point. Some applications will be really ill-suited to running on zoned storage, and may not gain any performance. Even for applications that are a good fit for zoned storage, the most important benefits may be to latency/QoS metrics, which are less straightforward to interpret than throughput.

    The Radian/IBM Research case study mentioned near the end of the article claims 65% improvement to throughput and 22x improvement to some tail latency metric for a Sysbench MySQL test. That's probably close to best-case numbers.
