NVMe vs AHCI: Another Win for PCIe

Improving performance is never just about hardware. Faster hardware only helps until you hit the limits of the software, and ultimately more efficient software is needed to take full advantage of it. This applies to SSDs as well: PCIe increases the potential bandwidth dramatically, but to take full advantage of the faster physical interface we need a software interface that is optimized specifically for SSDs and PCIe.

AHCI (Advanced Host Controller Interface) dates back to 2004 and was designed with hard drives in mind. While that doesn't rule out SSDs, AHCI is optimized for high-latency rotating media rather than low-latency non-volatile storage. As a result AHCI can't take full advantage of SSDs, and since the future is in non-volatile storage (like NAND and MRAM), the industry needed a software interface that does away with AHCI's limits.

The result is NVMe, short for Non-Volatile Memory Express. It was developed by an industry consortium with over 80 members, with development directed by giants like Intel, Samsung, and LSI. NVMe is built specifically for SSDs and PCIe, and as software interfaces usually live for at least a decade before being replaced, NVMe was designed to meet the industry's needs as we move to future memory technologies (e.g. we'll likely see RRAM and MRAM enter the storage market before 2020).

                      NVMe                              AHCI
Latency               2.8 µs                            6.0 µs
Maximum Queue Depth   Up to 64K queues with             Up to 1 queue with
                      64K commands each                 32 commands each
Multicore Support     Yes                               Limited
4KB Efficiency        One 64B fetch                     Two serialized host DRAM
                                                        fetches required

Source: Intel

The biggest advantage of NVMe is its lower latency. This is mostly due to a streamlined storage stack and the fact that NVMe requires no register reads to issue a command. AHCI requires four uncacheable register reads per command, which adds roughly 2.5 µs of latency.
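To put the "one 64B fetch" row in the table above into concrete terms, below is a simplified C sketch of a 64-byte NVMe submission queue entry, following the layout described in the NVMe 1.x specification. The field names and comments are illustrative rather than taken from any particular driver; the host fills one of these slots in a memory-resident queue and then performs a single doorbell write to notify the controller, instead of the serialized register reads AHCI requires.

```c
/* Simplified sketch of a 64-byte NVMe submission queue entry (NVMe 1.x
 * layout). Field names are illustrative; consult the specification for
 * the exact definition of each dword. */
#include <stdint.h>

struct nvme_sq_entry {
    uint32_t cdw0;   /* opcode, fused-operation flags, command identifier */
    uint32_t nsid;   /* namespace identifier */
    uint64_t rsvd;   /* reserved */
    uint64_t mptr;   /* metadata pointer */
    uint64_t prp1;   /* data pointer: physical region page entry 1 */
    uint64_t prp2;   /* data pointer: PRP entry 2 or pointer to a PRP list */
    uint32_t cdw10;  /* command specific, e.g. starting LBA (low) for a read */
    uint32_t cdw11;  /* command specific, e.g. starting LBA (high) */
    uint32_t cdw12;  /* command specific, e.g. number of logical blocks */
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};  /* 4+4+8+8+8+8+6*4 = 64 bytes -- the single 64B fetch from the table */
```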

Another important improvement is support for multiple queues and higher queue depths. Multiple queues ensure that the CPU can be used to its full potential and that IOPS are not bottlenecked by a single core (a rough sketch of the per-core queue idea follows below).

Source: Microsoft
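As a rough illustration of how the multi-queue model maps onto a multicore host, the following hypothetical C sketch allocates one submission queue per CPU core. It is not based on any real driver, all names are invented, and a real implementation would also create completion queues, map the device's doorbell registers, and handle interrupts.

```c
/* Hypothetical sketch of per-core NVMe submission queues: each core owns
 * its own queue, so commands can be issued in parallel without contending
 * on the single 32-entry queue that AHCI forces every core to share. */
#include <stdint.h>
#include <stdlib.h>

struct sq_slot { uint8_t bytes[64]; };  /* one 64-byte NVMe command */

struct sq {
    struct sq_slot *slots;  /* submission queue ring in host memory */
    uint32_t tail;          /* next free slot; advanced via a doorbell write */
    uint32_t depth;         /* NVMe allows up to 64K commands per queue */
};

/* Allocate one submission queue per core (completion queues, doorbell
 * mapping, and interrupt setup are omitted from this sketch). */
struct sq *alloc_per_core_queues(int cores, uint32_t depth)
{
    struct sq *queues = calloc(cores, sizeof(*queues));
    if (!queues)
        return NULL;
    for (int i = 0; i < cores; i++) {
        queues[i].slots = calloc(depth, sizeof(struct sq_slot));
        queues[i].depth = depth;
    }
    return queues;
}
```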

Obviously enterprise is the biggest beneficiary of NVMe because its workloads are so much heavier and SATA/AHCI can't provide the necessary performance. Nevertheless, the client market benefits from NVMe too, just not as much. As I explained on the previous page, even moderate improvements in performance result in increased battery life, and that's what NVMe will offer. Thanks to lower latency the disk usage time will decrease, which results in more time spent at idle and thus increased battery life. There can also be corner cases where the better queue support helps with performance.

Source: Intel

With future non-volatile memory technologies and NVMe, the overall latency can be cut to one fifth of the current ~100µs, and that's an improvement that will be noticeable in everyday client usage too. Currently I don't think any of the client PCIe SSDs support NVMe (enterprise has been faster at adopting it), but the SF-3700 will once it's released later this year. Driver support for both Windows and Linux already exists, so it's now up to the SSD OEMs to release compatible drives.

Comments

  • Kristian Vättö - Tuesday, March 18, 2014 - link

    Bear in mind that SATA-IO is not just some random organization that does standards for fun - it consists of all the players in the storage industry. The current board has members from Intel, Marvell, HP, Dell, SanDisk etc...
  • BMNify - Thursday, March 20, 2014 - link

    Indeed, and yet it's now clear these and the other design-by-committee organizations are no longer fit for purpose, producing far too little far too late...

    ARM IP = the current generic CoreLink CCN-508, which can deliver up to 1.6 terabits per second of sustained usable system bandwidth with a peak bandwidth of 2 terabits per second (256 GigaBytes/s), scaling all the way up to 32 processor cores in total.

    Intel IP (QPI) = Intel's Knights Landing Xeon Phi, due in 2015, with its antiquated QPI interconnect and its expected ultra short-reach (USR) interconnection at only up to 500MB/s of data throughput, seems a little/lot short on real data throughput by then...
  • Hrel - Monday, March 17, 2014 - link

    Cost: Currently PCI-E SSDs are inexplicably expensive. If this is gonna be the same way, it won't sell no matter how many PCI-E lanes Intel builds into its chipset. My main concern with using the PCI-E bus is cost. Can someone explain WHY those cost so much more? Is it just the niche market or is there an actual legitimate reason for it? Like, are PCI-E controllers THAT much harder to create than SATA ones?

    I very much doubt that's the case. If it is, then I guess prices will drop as that gets easier, but for now they've priced themselves out of competition.

    Why would I buy a 256GB SSD on PCI-E for $700 when I can buy a 256GB SSD on SATA for $120? That shit makes absolutely no sense. I could see like a 10-30% price premium, no more.
  • BMNify - Tuesday, March 18, 2014 - link

    "Can someone explain WHY those cost so much more?"
    greed...
    due mostly to not invented here is the reason we are not yet using a version of everspin's MRAM 240 pin, 64MByte DIMM with x72 configuration with ECC for instance http://www.everspin.com/image-library/Everspin_Spi...

    it can be packaged for any of the above forms M2 etc too rathe than have motherboard vendors put extra ddr3 ram slots decicated to this ddr3 slot compatable everspin MRAM today with the needed extra ddr3 ram controllers included in any CPU/SoC....

    rather than licence this existing (for 5 years ) commercial MRAM product and collaborate together to make and improve the yield and help them shrink it down to 45nm to get it below all of today's dram fastest speeds etc they all want an invented here product and will make the world markets wait for no good reason...
  • Kristian Vättö - Tuesday, March 18, 2014 - link

    Because most PCIe SSDs (the Plextor M6e being an exception) are just two or four SATA SSDs sitting behind a SATA to PCIe bridge. There is added cost from the bridge chip and the additional controllers, although the main reason is the laws of economics. Retail PCIe SSDs are low volume because SATA is still the dominant interface, and that increases production costs for the OEMs. Low order quantities are also more expensive for the retailers.

    In short, OEMs are just trying to milk enthusiasts with PCIe drives, but once we see PCIe entering the mainstream market, you'll no longer have to pay extra for them (e.g. the SF-3700 combines SATA and PCIe in a single chip, so PCIe isn't more expensive with it).
  • Ammohunt - Thursday, March 20, 2014 - link

    Disappointed there wasn't a SAS offering compared; 6Gb SAS != 6Gb SATA
  • jseauve - Thursday, March 20, 2014 - link

    Awesome computer
  • westfault - Saturday, March 22, 2014 - link

    "The SandForce, Marvell, and Samsung designs are all 2.0 but at least OCZ is working on a 3.0 controller that is scheduled for next year."

    When you say OCZ is developing a PCIe 3.0 controller, do you mean that they were working on one before they were purchased by Toshiba, or was this announced since they were acquired by Toshiba? I understand that Toshiba has kept the OCZ name, but is it certain that they have continued all R&D from before OCZ's bankruptcy?
  • dabotsonline - Monday, April 28, 2014 - link

    Roll on SATAe with PCIe 4.0, let alone 3.0 next year!
  • MRFS - Tuesday, January 20, 2015 - link

    I've felt the same way about SATAe and PCIe SSDs --
    kludgy and expensive, respectively.

    Given the roadmaps for PCIe 3.0 and 4.0, it makes sense to me, imho,
    to "sync" SATA and SAS storage with 8G and 16G transmission clocks
    and the 128b/130b "jumbo frame" now implemented in the PCIe 3.0 standard.

    Ideally, end users will have a choice of clock speeds, perhaps with pre-sets:
    6G, 8G, 12G and 16G.

    In actual practice now, USB 3.1 uses a 10G clock and 128b/132b jumbo frame:

    max headroom = 10G / 8.25 bits per byte = 1.212 GB/second.

    132 bits / 16 bytes = 8.25 bits per byte, using the USB 3.1 jumbo frame

    To save a lot of PCIe motherboards, which are designed for expansion,
    PCIe 2.0 and 3.0 expansion slots can be populated with cards
    which implement 8G clocks and 128b/130b jumbo frames.

    That one evolutionary change should put pressure on SSD manufacturers
    to offer SSDs with support for both features.

    Why "SATA-IV" does not already sync with PCIe 3.0 is anybody's guess.

    We tried to discuss this with the SATA-IO folks many moons ago,
    but they were quite committed to their new SATAe connector. UGH!
