NVMe vs AHCI: Another Win for PCIe

Improving performance is never just about hardware. Faster hardware only helps until you run into the limits of the software, and ultimately more efficient software is needed to take full advantage of it. This applies to SSDs as well: PCIe increases the potential bandwidth dramatically, but to take full advantage of the faster physical interface we also need a software interface that is optimized specifically for SSDs and PCIe.

AHCI (Advanced Host Controller Interface) dates back to 2004 and was designed with hard drives in mind. While that doesn't rule out SSDs, AHCI is optimized for high-latency rotating media rather than low-latency non-volatile storage. As a result, AHCI can't take full advantage of SSDs, and since the future is in non-volatile storage (like NAND and MRAM), the industry needed a software interface that removes the limits of AHCI.

The result is NVMe, short for Non-Volatile Memory Express. It was developed by an industry consortium of more than 80 members, with development directed by giants like Intel, Samsung, and LSI. NVMe is built specifically for SSDs and PCIe, and since software interfaces usually live for at least a decade before being replaced, it was designed to meet the industry's needs as we move to future memory technologies (we'll likely see RRAM and MRAM enter the storage market before 2020).

                     NVMe                       AHCI
Latency              2.8 µs                     6.0 µs
Maximum Queue Depth  Up to 64K queues with      Up to 1 queue with
                     64K commands each          32 commands each
Multicore Support    Yes                        Limited
4KB Efficiency       One 64B fetch              Two serialized host
                                                DRAM fetches required

Source: Intel

The biggest advantage of NVMe is its lower latency. This is mostly due to a streamlined storage stack and the fact that NVMe requires no register reads to issue a command. AHCI requires four uncacheable register reads per command, which adds roughly 2.5 µs of latency.
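
To make the register-read difference concrete, here is a minimal C sketch of the two submission paths. It is an illustration under simplified assumptions, not real driver code: the register offsets and function names are placeholders, and the real AHCI path performs four such uncacheable reads per command (the ~2.5 µs works out to roughly 0.6 µs per read).

    /* Illustrative sketch only -- not actual driver code. Register offsets,
     * argument types, and function names are simplified assumptions. */
    #include <stdint.h>

    /* AHCI-style path: the driver performs uncacheable MMIO reads (e.g. the
     * port's SACT and CI registers) before it can issue a command; each read
     * stalls the CPU until the device responds. The real path does four such
     * reads per command. */
    static void ahci_issue(volatile uint32_t *port, uint32_t slot)
    {
        uint32_t sact = port[0x34 / 4];      /* MMIO read: PxSACT           */
        uint32_t ci   = port[0x38 / 4];      /* MMIO read: PxCI             */
        (void)sact;
        port[0x38 / 4] = ci | (1u << slot);  /* finally set the issue bit   */
    }

    /* NVMe-style path: the 64-byte command has already been written into a
     * submission queue in ordinary cacheable host DRAM; the only MMIO access
     * is a single posted doorbell write, which does not stall the CPU. */
    static void nvme_issue(volatile uint32_t *sq_doorbell, uint32_t new_tail)
    {
        *sq_doorbell = new_tail;             /* one write, zero MMIO reads  */
    }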

Another important improvement is support for multiple queues and higher queue depths. Multiple queues ensure that the CPU can be used to its full potential and that IOPS are not bottlenecked by the limits of a single core.
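
A rough way to picture the multi-queue difference in data-structure terms: AHCI gives the whole system one 32-entry queue that every core must serialize on, whereas NVMe lets the OS give each core its own submission/completion queue pair. The C sketch below is a simplified model of that layout; the names and per-core depth are assumptions, not taken from any real driver.

    #include <stdint.h>
    #include <pthread.h>

    #define AHCI_QUEUE_DEPTH 32     /* one queue, 32 commands (NCQ limit)        */
    #define NVME_QUEUE_DEPTH 1024   /* per-core depth; the spec allows up to 64K */

    /* AHCI model: a single queue shared by all cores, so every submission
     * has to take the same lock -- a global contention point. */
    struct ahci_model {
        pthread_mutex_t lock;
        uint32_t        slots[AHCI_QUEUE_DEPTH];
    };

    /* NVMe model: each core owns a submission/completion queue pair living
     * in host DRAM, so cores can submit commands and reap completions
     * without touching any shared state. */
    struct nvme_queue_pair {
        uint8_t  sq[NVME_QUEUE_DEPTH][64];   /* 64-byte submission entries  */
        uint8_t  cq[NVME_QUEUE_DEPTH][16];   /* 16-byte completion entries  */
        uint32_t sq_tail, cq_head;
    };

    struct nvme_model {
        struct nvme_queue_pair *per_core;    /* one pair per CPU core       */
        unsigned                num_cores;
    };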

Source: Microsoft

Obviously enterprise is the biggest beneficiary of NVMe because its workloads are much heavier and SATA/AHCI can't provide the necessary performance. The client market benefits from NVMe as well, just not as much. As I explained on the previous page, even moderate improvements in performance result in increased battery life, and that's what NVMe will offer. Thanks to the lower latency, the drive spends less time actively servicing IO, which results in more time spent at idle and thus increased battery life. There can also be corner cases where the better queue support helps with performance.
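
As a back-of-the-envelope illustration of that race-to-idle argument, the short program below compares the energy used to move a fixed amount of data inside a fixed time window. All of the numbers are hypothetical (they are not measurements of any particular drive); the point is only that a faster drive can draw more power while active yet use less energy overall because it idles sooner.

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical figures chosen only to illustrate the argument:
         * a fixed 1000 MB workload inside a 10-second window. */
        const double workload_mb = 1000.0;
        const double window_s    = 10.0;
        const double idle_w      = 0.05;   /* assumed idle power in watts */

        /* "SATA" drive: 500 MB/s at 3 W; "PCIe" drive: 1000 MB/s at 4 W. */
        const double t_sata = workload_mb / 500.0;    /* 2.0 s active */
        const double t_pcie = workload_mb / 1000.0;   /* 1.0 s active */
        const double e_sata = 3.0 * t_sata + idle_w * (window_s - t_sata);
        const double e_pcie = 4.0 * t_pcie + idle_w * (window_s - t_pcie);

        /* Prints roughly 6.40 J vs 4.45 J: the faster drive finishes the
         * same work with less total energy despite its higher active power. */
        printf("SATA-class: %.2f J   PCIe-class: %.2f J\n", e_sata, e_pcie);
        return 0;
    }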

Source: Intel

With future non-volatile memory technologies and NVMe, the overall latency can be cut to one fifth of the current ~100 µs, an improvement that will be noticeable in everyday client usage too. Currently I don't think any client PCIe SSDs support NVMe (enterprise has been faster to adopt it), but the SF-3700 will once it's released later this year. Driver support for both Windows and Linux already exists, so it's now up to SSD OEMs to release compatible drives.

Comments

  • Khenglish - Thursday, March 13, 2014

    That 2.8 uS you found is driver interface overhead from an interface that doesn't even exist yet. You need to add this to the access latency of the drive itself to get the real latency.

    Real world SSD read latency for tiny 4K data blocks is roughly 900us on the fastest drives.

    It would take an 18000 meter cable to add even 10% to that.
  • willis936 - Thursday, March 13, 2014

    Show me a consumer phy that can transmit 8Gbps over 100m on cheap copper and I'll eat my hat.
  • Khenglish - Thursday, March 13, 2014

    The problem with long cables is attenuation, not latency. Cables can only be around 50 m long before you need a repeater.
  • mutercim - Friday, March 14, 2014

    Electrons have mass, they can't ever travel at the speed of light, no matter the medium. The signal itself would move at the speed of light (in vacuum), but that's a different thing.

    /pedantry
  • Visual - Friday, March 14, 2014

    It's a common misconception, but electrons don't actually need to travel the length of the cable for a signal to travel through it.
    In layman's terms, you don't need to send an electron all the way to the other end of the cable, you just need to make the electrons that are already there react in a certain way as to register a required voltage or current.
    So a signal is a change in voltage, or a change in the electromagnetic fields, and that travels at the speed of light (no, not in vacuum, in that medium).
  • AnnihilatorX - Friday, March 14, 2014

    Just to clarify, it is like pushing a tube full of tennis balls from one end. Assuming the tennis balls are all rigid so deformation is negligible, the 'cause and effect' making the tennis ball on the other end move will travel at speed of light.
  • R3MF - Thursday, March 13, 2014

    having 24x PCIe 3.0 lanes on AMD's Kaveri looks pretty far-sighted right now.
  • jimjamjamie - Thursday, March 13, 2014

    if they got their finger out with a good x86 core the APUs would be such an easy sell
  • MrSpadge - Thursday, March 13, 2014

    Re: "Why Do We Need Faster SSDs"

    Your power consumption argument ignores one fact: if you use the same controller, NAND, and firmware, it costs you x Wh to perform a read or write operation. If you simply increase the interface speed and hence perform more of these operations per unit of time, you also increase the energy required per unit of time, i.e. power consumption. In your example the faster SSD wouldn't continue to draw 3 W with the faster interface: assuming a 30% throughput increase, a power draw of 4 W would be reasonable.

    Obviously there are also system components actively waiting for that data. So if the data arrives faster (due to lower latency & higher throughput) they can finish the task quicker and race to sleep. This counterbalances some of the increase in actual NAND power draw, but won't negate it completely.
  • Kristian Vättö - Thursday, March 13, 2014

    "If you simply increase the interface speed and hence perform more of these operations per time, you also increase the energy required per time, i.e. power consumption."

    The number of IO operations is a constant here. A faster SSD does not mean that the overall number of operations will increase because ultimately that's up to the workload. Assuming that is the same in both cases, the faster SSD will complete the IO operations faster and will hence spend more time idling, resulting in less power drawn in total.

    Furthermore, a faster SSD does not necessarily mean higher power draw. As the graph on page one shows, PCIe 2.0 increases baseline power consumption by only 2% compared to SATA 6Gbps. Given that SATA 6Gbps is a bottleneck in current SSDs, more processing power (and hence more power) is not required to make a faster SSD. You are right that it may result in higher NAND power draw, though, because the controller will be able to take better advantage of parallelism (more NAND in use = more power consumed).

    I understand the example is not perfect as in the real world the number of variables is through the roof. However, the idea was to debunk the claim that PCIe SSDs are just a marketing trick -- they are that too, but ultimately there are gains that will reach the average user as well.
