NVMe vs AHCI: Another Win for PCIe

Improving performance is never just about hardware. Faster hardware only helps until software becomes the bottleneck; ultimately, more efficient software is needed to take full advantage of it. This applies to SSDs as well. With PCIe the potential bandwidth increases dramatically, and to take full advantage of the faster physical interface we need a software interface that is optimized specifically for SSDs and PCIe.

AHCI (Advanced Host Controller Interface) dates back to 2004 and was designed with hard drives in mind. While that doesn't rule out SSDs, AHCI is optimized for high-latency rotating media rather than low-latency non-volatile storage. As a result, AHCI can't take full advantage of SSDs, and since the future is in non-volatile storage (such as NAND and MRAM), the industry had to develop a software interface that removes the limits of AHCI.

The result is NVMe, short for Non-Volatile Memory Express. It was developed by an industry consortium with over 80 members, with development directed by giants like Intel, Samsung, and LSI. NVMe is built specifically for SSDs and PCIe, and since software interfaces usually live for at least a decade before being replaced, NVMe was designed to meet the industry's needs as we move to future memory technologies (for example, we'll likely see RRAM and MRAM enter the storage market before 2020).

                      NVMe                                       AHCI
Latency               2.8 µs                                     6.0 µs
Maximum Queue Depth   Up to 64K queues with 64K commands each    Up to 1 queue with 32 commands each
Multicore Support     Yes                                        Limited
4KB Efficiency        One 64B fetch                              Two serialized host DRAM fetches required

Source: Intel

The biggest advantage of NVMe is its lower latency. This is mostly due to a streamlined storage stack and the fact that NVMe requires no register reads to issue a command. AHCI requires four uncacheable register reads per command, which results in ~2.5µs of additional latency.
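As a rough sanity check on those numbers, here is a small back-of-the-envelope sketch (hypothetical: the ~625ns per-read cost is simply Intel's ~2.5µs figure divided by four, not a measured value):

    # Back-of-the-envelope model of per-command issue overhead (Python sketch).
    # Assumption: each uncacheable register read costs ~625 ns, derived by
    # dividing Intel's ~2.5 us figure by the four reads AHCI needs.
    UNCACHEABLE_READ_NS = 2500 / 4

    def ahci_issue_overhead_ns(reads_per_command=4):
        """AHCI: four serialized, uncacheable register reads per command."""
        return reads_per_command * UNCACHEABLE_READ_NS

    def nvme_issue_overhead_ns():
        """NVMe: the command goes into a submission queue in host memory and a
        single doorbell register write notifies the controller -- no register
        reads on the issue path."""
        return 0.0

    print(f"AHCI issue overhead: ~{ahci_issue_overhead_ns():.0f} ns")
    print(f"NVMe issue overhead: ~{nvme_issue_overhead_ns():.0f} ns")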

Another important improvement is support for multiple queues and higher queue depths. Multiple queues ensure that the CPU can be used to its full potential and that IOPS are not bottlenecked by a single-core limitation.
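To illustrate the difference, here is a deliberately simplified sketch (not real driver code; names and behavior are illustrative) of an AHCI-style single shared queue versus NVMe-style per-core submission queues:

    import threading
    from collections import deque

    class AhciModel:
        """One 32-command queue shared by every core, guarded by a single lock."""
        def __init__(self):
            self.lock = threading.Lock()
            self.queue = deque(maxlen=32)

        def submit(self, core_id, command):
            with self.lock:                       # every core serializes here
                self.queue.append((core_id, command))

    class NvmeModel:
        """Each core gets its own submission queue (the spec allows up to 64K
        queues, each up to 64K commands deep)."""
        def __init__(self, num_cores):
            self.queues = {c: deque(maxlen=65536) for c in range(num_cores)}

        def submit(self, core_id, command):
            self.queues[core_id].append(command)  # no cross-core contention

    nvme = NvmeModel(num_cores=4)
    nvme.submit(core_id=2, command="READ 4KB @ LBA 0x1000")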

Source: Microsoft

Obviously, the enterprise market is the biggest beneficiary of NVMe because its workloads are much heavier and SATA/AHCI can't provide the necessary performance. Nevertheless, the client market does benefit from NVMe, just not as much. As I explained on the previous page, even moderate improvements in performance result in increased battery life, and that's what NVMe will offer. Thanks to lower latency, disk usage time will decrease, which results in more time spent at idle and thus increased battery life. There can also be corner cases where the better queue support helps with performance.

Source: Intel

With future non-volatile memory technologies and NVMe, the overall latency can be cut to one fifth of the current ~100µs, and that's an improvement that will be noticeable in everyday client usage too. Currently I don't think any client PCIe SSDs support NVMe (enterprise has been faster at adopting NVMe), but the SF-3700 will once it's released later this year. Driver support for both Windows and Linux exists already, so it's now up to SSD OEMs to release compatible SSDs.

Comments

  • mkozakewich - Friday, March 14, 2014

    Ooh, or what if we had actual M.2 slots on desktop motherboards that could take a ribbon to attach 2.5" PCIe SSDs?
  • phoenix_rizzen - Thursday, March 13, 2014

    Yeah. Seems strange that they wouldn't re-use the M.2 or mSATA connector for this. Why take up 2 complete SATA slots, and add an extra connector? What are they doing with the SATA connectors when running in SATAe mode?

    It almost would have made sense to make a cable that plugged into <whatever> at the drive end and just slotted into a PCIe x1 or x2 or x4 slot on the mobo, skipping the dedicated slot entirely. Then they wouldn't need that hokey power dongle off the drive connector.
  • frenchy_2001 - Friday, March 14, 2014

    They were looking for backward compatibility with current storage and in that context, the decision makes sense. No need to think about how to plug it, it just slots right where the rest of the storage goes and can even accept its predecessor.
    It's a desktop/server/storage centric product, not really meant for laptop/portable.

    But I agree its place is getting squeezed between full PCIe (already used in data centers) and miniPCIe/M.2 used in portables. Since the requirement is already 2x PCIe lanes (like the others), it will be hard to use for lots of storage: you cannot fit 24 of those in a rack (which is how most servers use SATA/SAS), as few servers have 48 lanes of PCIe sitting around unused. So it seems reserved for desktops/workstations, and those can easily use PCIe storage...
  • phoenix_rizzen - Friday, March 14, 2014

    Yeah, until you try to connect more than 2 of those to a motherboard. And good luck getting that to work on a mini-ATX/micro-ATX board. Why use up two whole SATA ports and still use an extra port for the PCIe side of it?

    How are you going to make add-in controller cards for 4+ drives? There's no room for 4 of those connectors anywhere. And trying to do a multi-lane setup like SFF-8087 for this would be ridiculous.

    The connector is dumb, no matter how you look at it. Especially since it doesn't support power.
  • jasonelmore - Saturday, March 15, 2014

    It looks like the only reason to be excited about this connector is for taking older hard drives in the 2.5" or 3.5" form factor and putting them on a faster bus.

    Other than that, other solutions exist and they do it quicker and with less power. It's just a solution to let people use old hardware longer.
  • phobos512 - Thursday, March 13, 2014

    It's not an assumption. The cabling adds distance to the signal path, which increases latency. Electrons don't travel at infinite speed; merely the speed of light (in a vacuum; in a cable it is of course reduced).
  • ddriver - Thursday, March 13, 2014

    You might be surprised how negligible the effect of the speed of electrons is on the total overall latency.
  • Khenglish - Thursday, March 13, 2014

    It's negligible.

    The worst cables carry a signal at 66% of the speed of light, with the best over 90%. If we take the worst case scenario of 66% we get this:

    speed of light = 3*10^8 m/s
    1m / (.66 * 3*10^8 m/s) = 5ns per meter

    If we have a really long 5m cable that's 25ns. Kristian says it takes 115us to read a page. You never read less than 1 page at a time.

    25ns/115us = .0217% for a long 5m cable. Completely insignificant latency impact.
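The same arithmetic as a short script, for reference (the 66% velocity factor and 115µs page-read figure are taken from the comment above; everything else is illustrative):

    # Cable propagation delay versus NAND page-read latency (Python sketch).
    SPEED_OF_LIGHT_M_S = 3e8      # m/s
    VELOCITY_FACTOR = 0.66        # worst-case cable, per the comment above
    PAGE_READ_US = 115            # page-read latency quoted in the article

    cable_length_m = 5
    delay_ns = cable_length_m / (VELOCITY_FACTOR * SPEED_OF_LIGHT_M_S) * 1e9
    fraction = delay_ns / (PAGE_READ_US * 1e3)

    print(f"{cable_length_m} m cable adds ~{delay_ns:.0f} ns "
          f"({fraction:.4%} of a {PAGE_READ_US} us page read)")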
  • willis936 - Thursday, March 13, 2014

    The real latency number to look at is the one cited on the nvme page: 2.8us. It's not so negligible then. It does affect control overhead a good deal.

    Also I have a practical concern about channel loss. You can't just slap a PCIe lane onto a 1m cable. PCIe is designed to ride a vein of traces straight to a socket, straight to a card. You're now increasing the length of those traces, still putting it through a socket, and now putting it through a long, low-cost cable. Asking for more than 1.5GB/s might not work as planned going forward.
  • DanNeely - Thursday, March 13, 2014

    Actually you can. Pcie cabling has been part of the spec since 2007; and while there isn't an explicit max length in the spec, at least one vendor is selling pcie2.0 cables that are up to 7m long for passive versions and 25m for active copper cables. Fiberoptic 3.0 cables are available to 300m.
