NVMe vs AHCI: Another Win for PCIe

Improving performance is never just about hardware. Faster hardware only helps until you hit the limits of the software, and ultimately more efficient software is needed to take full advantage of it. This applies to SSDs as well. With PCIe the potential bandwidth increases dramatically, and to take full advantage of the faster physical interface we need a software interface that is optimized specifically for SSDs and PCIe.

AHCI (Advanced Host Controller Interface) dates back to 2004 and was designed with hard drives in mind. While that doesn't rule out SSDs, AHCI is optimized for high-latency rotating media rather than low-latency non-volatile storage. As a result AHCI can't take full advantage of SSDs, and since the future is in non-volatile storage (like NAND and MRAM), the industry had to develop a software interface that removes the limits of AHCI.

The result is NVMe, short for Non-Volatile Memory Express. It was developed by an industry consortium of over 80 members, with development directed by giants like Intel, Samsung, and LSI. NVMe is built specifically for SSDs and PCIe, and since software interfaces usually live for at least a decade before being replaced, NVMe was designed to meet the industry's needs as we move to future memory technologies (e.g. we'll likely see RRAM and MRAM enter the storage market before 2020).

                        NVMe                                AHCI
Latency                 2.8 µs                              6.0 µs
Maximum Queue Depth     Up to 64K queues with               Up to 1 queue with
                        64K commands each                   32 commands each
Multicore Support       Yes                                 Limited
4KB Efficiency          One 64B fetch                       Two serialized host DRAM
                                                            fetches required

Source: Intel

The biggest advantage of NVMe is its lower latency. This is mostly due to a streamlined storage stack and the fact that NVMe requires no register reads to issue a command. AHCI requires four uncacheable register reads per command, which adds ~2.5µs of latency.
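
To make that concrete, here is a minimal, hypothetical C sketch contrasting the two submission paths. The register names (the NVMe submission queue tail doorbell, AHCI's PxCI and PxTFD) come from the public specifications, but the accessors and the flow are simplified illustrations, not real driver code.

#include <stdint.h>

/* Hypothetical MMIO helpers for illustration only. */
static inline uint32_t mmio_read32(volatile uint32_t *reg)  { return *reg; }
static inline void mmio_write32(volatile uint32_t *reg, uint32_t v) { *reg = v; }

/* NVMe: copy the 64-byte command into the submission queue (ordinary host
   memory), then ring the queue's tail doorbell with a single posted MMIO
   write -- no register reads are needed to issue the command. */
void nvme_submit(volatile uint32_t *sq_tail_doorbell, uint64_t *sq_slot,
                 const uint64_t cmd[8], uint32_t new_tail)
{
    for (int i = 0; i < 8; i++)
        sq_slot[i] = cmd[i];                   /* 64B command in host DRAM */
    mmio_write32(sq_tail_doorbell, new_tail);  /* posted write, no stall   */
}

/* AHCI (greatly simplified): the driver reads port registers such as PxTFD
   and PxCI before setting the command-issue bit, and every uncacheable MMIO
   read stalls the CPU -- Intel puts the total penalty at roughly 2.5 µs. */
void ahci_submit(volatile uint32_t *px_tfd, volatile uint32_t *px_ci, int slot)
{
    while (mmio_read32(px_tfd) & 0x88)         /* wait for BSY/DRQ to clear  */
        ;
    if (mmio_read32(px_ci) & (1u << slot))     /* make sure the slot is free */
        return;
    mmio_write32(px_ci, 1u << slot);           /* issue the command          */
}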

Another important improvement is support for multiple queues and higher queue depths. Multiple queues ensure that every CPU core can be used to its full potential and that IOPS is not bottlenecked by the performance of a single core.

Source: Microsoft
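
As a rough illustration of the multi-queue model described above, here is a hypothetical C sketch in which every CPU core owns its own submission queue and doorbell, so cores never contend for a shared queue the way they do with AHCI's single 32-command queue. The structure names and sizes are made up for the example; NVMe itself allows up to 64K queues with 64K entries each.

#include <stdint.h>

#define NUM_CORES    8
#define QUEUE_DEPTH  1024            /* NVMe allows up to 64K entries per queue */

struct nvme_queue_pair {
    uint64_t sq[QUEUE_DEPTH][8];     /* 64-byte submission queue entries */
    uint16_t sq_tail;
    volatile uint32_t *sq_doorbell;  /* this queue's own doorbell register */
};

/* One queue pair per core: core N only touches queue_pairs[N], so no
   cross-core locking is required and completions can be steered back to
   the core that issued the I/O. */
static struct nvme_queue_pair queue_pairs[NUM_CORES];

void submit_on_core(int core, const uint64_t cmd[8])
{
    struct nvme_queue_pair *qp = &queue_pairs[core];
    for (int i = 0; i < 8; i++)
        qp->sq[qp->sq_tail][i] = cmd[i];
    qp->sq_tail = (uint16_t)((qp->sq_tail + 1) % QUEUE_DEPTH);
    *qp->sq_doorbell = qp->sq_tail;  /* ring only this core's doorbell */
}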

Obviously enterprise is the biggest beneficiary of NVMe because the workloads are so much heavier and SATA/AHCI can't provide the necessary performance. The client market benefits from NVMe as well, just not as much. As I explained on the previous page, even moderate improvements in performance result in increased battery life, and that's what NVMe will offer: thanks to lower latency the disk is busy for less time, which means more time spent at idle and thus longer battery life. There can also be corner cases where the better queue support helps with performance.

Source: Intel

With future non-volatile memory technologies and NVMe, overall latency can be cut to a fifth of the current ~100µs, an improvement that will be noticeable in everyday client usage too. Currently I don't think any client PCIe SSDs support NVMe (enterprise has been faster to adopt it), but the SF-3700 will once it's released later this year. Driver support for both Windows and Linux already exists, so it's now up to SSD OEMs to release compatible drives.

131 Comments


  • SunLord - Thursday, March 13, 2014 - link

    I like the idea, but they should have rolled their own custom connector instead of twisting the SATA connector to meet their needs - it looks stupid. A custom high-density connector and cable designed specifically for the task would make far more sense than this hodgepodge, but I guess they needed to cut corners to "keep costs down" on something already aimed at the high end, which is even stupider. A nice clean high-density interface with a SATA adapter would have been far better.
  • androticus - Thursday, March 13, 2014 - link

    Ugh. What an immensely cumbersome and kludgy design.
  • asuglax - Thursday, March 13, 2014 - link

    Kristian, I completely agree with your final thoughts. I would actually take it a step further and say that Intel should do away with the DMI interface and corresponding PCH entirely; they should limit the I/O off the processor to as many PCI-e lanes as possible, 3 DisplayPort outputs (which can be exposed as dual-mode), and however many memory channels. Enterprise could additionally have QPI. I would like to see I/O controllers embedded into the physical interconnects, where PCI-e could be routed to the interconnects and however many USB, SATA, or other connections could be switched and exposed through those devices (I suppose it could be argued that this would be a PCH in itself, only connected through PCI-e instead of DMI). Security measures (such as TPM functionality) should be built into all components and, while operating independently, be able to communicate with one another through the presented I/O channels.
  • fteoath64 - Saturday, March 15, 2014 - link

    @asuglax: Intel is known for this and has always done it: provide small incremental additions to the processor and chipset features so they can spin out as many iterations of SKUs as possible over a period of time. If they make a radical change, they risk not being able to manage the incremental changes they wanted. It is a strategy that allows for a large variety of product units, hence expanding the market for themselves. Lately you can see that they have reduced the number of CPU SKUs while expanding the mobile SKUs. This is possible since they are the majority leader in both market segments, and it allows them to maximise profits with minimal changes to production. It is a different strategy for AMD and a completely different one yet for the ARM SoC vendors. Intel's strategy seems like it is coercing the market to move to a place and pace they want. The ARM guys just give their best shot on every product they have, so we get a lot more than we pay for.
    You just cannot teach an old dog new tricks.
  • Babar Javied - Thursday, March 13, 2014 - link

    This SATA 3.2 really doesn't make a lot of sense to me, and others seem to agree from what I've read in the comments. Is this supposed to be a temporary thing, or the middleman before we get to the good stuff like SATA 4.0? Is that the reason why it's called SATA 3.2?

    So here is a genuine question: why not just use Thunderbolt? It is owned by Intel, and they could implement it into their next chipset(s). Also, Thunderbolt uses PCIe lanes, so it is plenty fast without wasting lanes. Sure, the controller and cables are expensive, but once they start to be mass produced the prices should come down, as is common with electronics.

    It seems to me that SATA is going through a lot of trouble to bring 3.2 when it is only marginally better. I also get the feeling that SSDs are going to get even faster by using more channels (the current standard is 8) and NAND chips (the current standard is 16) as they become the new standard in storage. Of course the transition from HDD to SSD is not going to happen overnight, but it is going to happen, and I get the feeling that 750MB/s is going to become a bottleneck very quickly.

    And finally, by switching to Thunderbolt we would also help kickstart the adoption of that standard and hopefully see it flourish, allowing us to daisy-chain monitors, storage drives (SSDs and HDDs), external graphics cards and so much more.
  • SirKnobsworth - Thursday, March 13, 2014 - link

    There's no point to implementing Thunderbolt internally, which is what SATAe is for. For external purposes you can already buy Thunderbolt SSDs.
  • SittingBull - Thursday, March 13, 2014 - link

    I don't feel like you have proven that there is any need for these faster hard drive interfaces, as you hoped to in the title of your article. The need for, let alone the desire for, higher-resolution video is anything but proven by anyone I know of. 4K video offers only dubious benefits, as only very large displays can show the difference between it and 1080p, i.e. 70 or 80 inches! The wider colour gamut would be nice but is not really compelling, and those are the only benefits I am aware of. I seriously doubt that the TV or electronics industry is going to be able to sell the 4K idea to the public as a whole. Even 720p is not shown to be lacking until we get into displays larger than 50 inches.

    It is always nice to read up on the tech of the future and I thank you for explaining the SATAe and other interfaces that are in the works. Eventually these advances will be implemented but I can't see it happening until there is some sort of substantial demand, and your entire article is built on the premise that we will need the bandwidth to support 4k video quite soon. But we don't ... :(
  • BMNify - Sunday, March 16, 2014 - link

    SittingBull, perhaps you should stick your head out of the Native American Law Students' offices and look to your alumni of the Indian Institute of Science for inspiration in the tech world today.

    It's clear and public knowledge that the NHK/BBC R&D years of UHD development (http://www.bbc.co.uk/rd/blog/2013/06/defining-the-...), now ratified by the International Telecommunication Union, are the minimum baseline for any new SoC design to adhere to and comply with IF they want to actually reuse their current UHD IP for the longest time scales...

    The main point - unless the PR is trying to cover up, by omission, the fact that they don't actually comply with the new Rec. 2020 - is that the real colour space gives better colour coverage by using 10 bits per pixel for UHD-1 consumer-grade panels, and later 12-bit UHD-2 panels for the 8192×4320 [8K] consumer in 4 years or so.

    To put it simply: antiquated Rec. 709 (HDTV and below) 8-bit pseudocolour = only 256 bands of usable colour.

    Rec. 2020 real colour space at 10 bits per pixel = 1000+ bands of usable colour, so you get far less banding in lower-bit-rate encodes/decodes and more compression for a given bit rate, i.e. better visual quality at a smaller size.

    As it happens, NHK announced they are to give another UHD-2/8K broadcast demo at the coming NAB Show: "Japanese public broadcaster NHK is planning to give a demonstration of "8K" resolution content over a single 6MHz-bandwidth UHF TV channel at the National Association of Broadcasters (NAB) Show coming up in Las Vegas, Nevada, April 5 to 10."
    In order to transmit the 8K signal, whose raw data requirement is 16 times greater than an HDTV signal, it was necessary to deploy additional technologies. These include ultra multi-level orthogonal frequency domain multiplex (OFDM) transmission and dual-polarized multiple input multiple output (MIMO) antennas, in addition to image data compression. The broadcast uses 4096-point QAM modulation and MPEG-4 AVC H.264 video coding.

    We could also have a debate about how Qualcomm and other Cortex vendors might finally provide the needed UHD-2 data throughput at far lower power, with either integrated JEDEC Wide IO2 (25.6GB/s / 51.2GB/s) or Hybrid Memory Cube 2.5D interposer-based architectures, and MRAM in-line computation, etc.

    Did you notice how the ARM SoC with its current NoC (network on chip) can already beat today's QPI real-life data throughput (1Tb/s, 2Tb/s, etc.) at far lower power, never mind the slower MCI mentioned above? They only need to bring that NoC capability to the external interconnect to take advantage of it across any number of IO ports.
  • Popskalius - Friday, March 14, 2014 - link

    I haven't even taken my Asus z87 Plus out of its shrink wrap and it's becoming obsolete.
  • SittingBull - Friday, March 14, 2014 - link

    I just put together my own system with an Asus Z87 Plus motherboard, an i7 4770K, 16 GB of RAM and an SSD. It is not and will not be obsolete anytime in the near future, i.e. for at least 3 years. Worry not. There isn't anything on the horizon our systems won't be able to deal with.
