Samsung has announced a new prototype key-value SSD that is compatible with the first industry standard API for key-value storage devices. Earlier this year, the Object Drives working group of the Storage Networking Industry Association (SNIA) published version 1.0 of the Key Value Storage API Specification. Samsung has added support for this new API to their ongoing key-value SSD project.

Most hard drives and SSDs expose their storage capacity through a block storage interface, where the drive stores fixed-size blocks (typically 512 bytes or 4kB) that are identified by Logical Block Addresses, usually 48 or 64 bits wide. Key-value drives extend that model so that a drive can support variable-sized keys instead of fixed-sized LBAs, and variable-sized values instead of fixed 512B or 4kB blocks. This allows a key-value drive to be used more or less as a drop-in replacement for software key-value databases like RocksDB, and as a backend for applications built atop key-value databases.
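
To make the contrast concrete, here is a minimal Python sketch of the two interface styles, using in-memory dictionaries in place of real hardware. The class and method names (`BlockDevice`, `KeyValueDevice`, `store`, `retrieve`, `delete`) are hypothetical and chosen for illustration; they are not taken from the SNIA specification or from Samsung's software.

```python
class BlockDevice:
    """Toy block device: fixed-size blocks addressed by an integer LBA."""

    def __init__(self, num_blocks, block_size=4096):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._blocks = {}  # LBA -> bytes; unwritten blocks read back as zeros

    def _check_lba(self, lba):
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"LBA {lba} out of range")

    def write(self, lba, data):
        self._check_lba(lba)
        # The caller must pad or split data to match the block size.
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._blocks[lba] = bytes(data)

    def read(self, lba):
        self._check_lba(lba)
        return self._blocks.get(lba, bytes(self.block_size))


class KeyValueDevice:
    """Toy key-value device: variable-sized keys map to variable-sized values."""

    def __init__(self):
        self._pairs = {}  # key bytes -> value bytes

    def store(self, key, value):
        self._pairs[bytes(key)] = bytes(value)

    def retrieve(self, key):
        return self._pairs[bytes(key)]  # raises KeyError if the key is absent

    def delete(self, key):
        self._pairs.pop(bytes(key), None)


# Block interface: the application pads a 5-byte payload to a full block.
blk = BlockDevice(num_blocks=8, block_size=512)
blk.write(3, b"hello".ljust(512, b"\x00"))

# Key-value interface: the same payload is stored as-is under a chosen key.
kv = KeyValueDevice()
kv.store(b"user:42", b"hello")
```

With the block interface, the mapping from application objects to LBAs and the padding to block boundaries are the application's job; with the key-value interface, that bookkeeping moves into the device.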

Key-value SSDs have the potential to offload significant work from a server's CPUs when used to replace a software-based key-value database. More importantly, moving the key-value interface into the SSD itself means it can be tightly integrated with the SSD's flash translation layer, cutting out the overhead of emulating a block storage device and layering a variable-sized storage system on top of that. This means key-value SSDs can operate with much lower write amplification and higher performance than software key-value databases, with only one layer of garbage collection in the stack instead of one in the SSD and one in the database.
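
A rough way to see why stacked layers hurt: when a database rewrites data during its own garbage collection and the SSD then rewrites it again internally, the write amplification factors multiply. The sketch below assumes that simple multiplicative model, and the factors used are made-up illustrative numbers, not measurements of any Samsung drive or of RocksDB.

```python
def stacked_write_amplification(*layer_factors):
    """Multiply per-layer write amplification factors into an end-to-end factor."""
    total = 1.0
    for factor in layer_factors:
        if factor <= 0:
            raise ValueError("write amplification factors must be positive")
        total *= factor
    return total


# Hypothetical numbers for illustration only.
software_kv_on_block_ssd = stacked_write_amplification(4.0, 2.0)  # database layer, then SSD FTL
single_layer_kv_ssd = stacked_write_amplification(2.0)            # one garbage-collection layer

print(software_kv_on_block_ssd)  # 8.0
print(single_layer_kv_ssd)       # 2.0
```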

Samsung has been working on key-value SSDs for quite a while, and they have been publicly developing open-source software to support KV SSDs for over a year, including the basic libraries and drivers needed to access KV SSDs as well as a sample benchmarking tool and a Ceph backend. The prototype drives they have previously discussed have been based on their PM983 datacenter NVMe drives with TLC NAND, using custom firmware to enable the key-value interface. Those drives support key lengths from 4 to 255 bytes and value lengths up to 2MB, and it is likely that Samsung's new prototype is based on the same hardware platform and retains similar size limits.
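
Software targeting such a drive would need to respect those limits before issuing a write. The following Python sketch checks a key/value pair against the sizes quoted above for the earlier prototypes. The function name `check_kv_sizes` is hypothetical, "2MB" is assumed here to mean 2 × 1024 × 1024 bytes, and no minimum value length is assumed; the actual limits of the new prototype may differ.

```python
MIN_KEY_LEN = 4                    # bytes, per the earlier prototypes
MAX_KEY_LEN = 255                  # bytes, per the earlier prototypes
MAX_VALUE_LEN = 2 * 1024 * 1024    # assumption: "2MB" read as 2 MiB


def check_kv_sizes(key, value):
    """Raise ValueError if the key or value falls outside the quoted limits."""
    if not MIN_KEY_LEN <= len(key) <= MAX_KEY_LEN:
        raise ValueError(
            f"key length {len(key)} outside {MIN_KEY_LEN}..{MAX_KEY_LEN} bytes"
        )
    if len(value) > MAX_VALUE_LEN:
        raise ValueError(
            f"value length {len(value)} exceeds {MAX_VALUE_LEN} bytes"
        )


check_kv_sizes(b"user:42", b"some payload")  # passes silently
```

A value larger than the limit would have to be split across several keys by the application, which is one place where a key-value drive is not quite a drop-in replacement for a software store.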

Samsung's Platform Development Kit software for key-value SSDs originally supported their own software API, but now additionally supports the vendor-neutral SNIA standard API. The prototype drives are currently available to companies that are interested in developing software to use KV SSDs. Samsung's KV SSDs probably will not move from prototype status to mass production until after the corresponding key-value command set extension to NVMe is finalized, so that KV SSDs can be supported without needing a custom NVMe driver. The SNIA standard API for key-value drives is a high-level, transport-agnostic API that can support drives using NVMe, SAS or SATA interfaces, but each of those protocols needs to be extended with key-value support.

48 Comments

  • submux - Saturday, September 07, 2019

    Back end object storage can be unstructured and manageable, but the front end can be relational and enforce relationships.

    Then you get the scalability advantages of object storage rather than ISAM and you also get the advantages of SQL RDBMS.
  • jordanclock - Thursday, September 05, 2019

    I'm curious as to why you would call relational databases "real" databases?
  • FunBunny2 - Thursday, September 05, 2019

    the simple answer: RDBMS control the data independent of the clients, all of these other flat-file analogs leave control in the client, which is a problem.
  • satai - Sunday, September 08, 2019

    Client/server architecture has very little to do with relationality.
  • FunBunny2 - Sunday, September 08, 2019

    "Client/server architecture has very little to do with relationality."

    it's not relationality, per se, that's at issue. it just happens that, these days, SQL (relational, sort of) databases constitute 99.44% of datastores which are controlled by a central TPM. CICS, which is, believe it or don't, still around even on linux, is separate from the datastore. RDBMS/SQL engines incorporate the datastore. before Codd and the RM, both IDMS and IMS (again, still around) were/are transaction control engines for their datastores.

    doing transaction control from the client(s) is the disaster waiting to happen. that's what was figured out 50 years ago. kiddie koders fresh out of koder skool have no clue about data. they just want a sinecure pounding out LoC.
  • prisonerX - Thursday, September 05, 2019

    Relational databases are built from key-value stores (generally b-trees) so your comment doesn't make much sense.

    What does make sense is moving KV stores closer to the hardware to improve performance, so this product is a great idea.
  • cosmotic - Thursday, September 05, 2019

    The index might be a b-tree but the storage of the data almost definitely isn't a b-tree.
  • lkcl - Thursday, September 05, 2019

    https://github.com/LMDB/sqlightning

    sqlite3 uses btree for its data. replacing the btree algorithm with LMDB resulted in the "insert" test completing 1,000 times faster. this is due to a unique feature of LMDB's "insert-at-end" capability.

    synchronous sequential and random writes were 25% better. asynchronous sequential writes about 15% worse. async random writes about 1% better. random reads 80% better. sequential reads 85% better. etc.
  • lkcl - Friday, September 06, 2019

    https://blog.biokoda.com/post/133121776825/actordb...

    "How - SQLite

    Lets start with some basics. A basic unit of storage for a database is a page. Pages are generally 4k or 8k. An SQLite file is a sequence of pages one after another."
  • FunBunny2 - Sunday, September 08, 2019

    "A basic unit of storage for a database is a page"

    yes and no. the unit of storage depends on the hardware and OS. could be a page or extent or a row. it just depends on what the OS supports and engine writers opt for. whether a write is implemented as a full-page re-write on a row change (update/delete/insert) is up to the engine writer and the OS capabilities. RM/SQL semantics make it the row (strictly speaking the set which may be a join, which itself may resolve as one row), and someday, Codd willing, all RDBMS will support that.
