Sequential Read Performance

Sequential operations still make up a substantial portion of enterprise storage workloads. Among the most IO-heavy workloads we run in AnandTech's own infrastructure are our traffic and ad stats processing routines. These tasks run both daily and weekly, and both create tremendous IO load for our database servers. Profiling the workload reveals an access pattern that's largely sequential in nature.

We'll start with a look at sequential read performance. Here we fill the drive multiple times with sequential data and then read it back for a period of 3 minutes. I'm reporting performance in MB/s as well as average latency over a range of queue depths.
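If you want to run a similar sweep yourself, a minimal sketch using fio follows. To be clear, this is an illustration rather than our exact methodology: the device path, 128KB block size, and JSON field names are assumptions, fio's JSON layout varies by version, and pointing fio at a raw device destroys its contents.

    # Sweep sequential read (or write) bandwidth and average latency across
    # a range of queue depths using fio's JSON output.
    import json
    import subprocess

    DEVICE = "/dev/nvme0n1"  # hypothetical target device

    def run_sweep(rw="read", block_size="128k", queue_depths=(1, 2, 4, 8, 16, 32)):
        for qd in queue_depths:
            out = subprocess.run(
                ["fio", "--name=seq", f"--filename={DEVICE}", f"--rw={rw}",
                 f"--bs={block_size}", "--ioengine=libaio", "--direct=1",
                 f"--iodepth={qd}", "--runtime=180", "--time_based",
                 "--output-format=json"],
                capture_output=True, text=True, check=True).stdout
            job = json.loads(out)["jobs"][0][rw]
            bw_mbs = job["bw"] / 1024              # fio reports bandwidth in KiB/s
            lat_us = job["lat_ns"]["mean"] / 1000  # mean completion latency (fio 3.x)
            print(f"QD{qd:>2}: {bw_mbs:8.1f} MB/s, {lat_us:8.1f} us average latency")

    run_sweep("read")

The preconditioning step (filling the drive with sequential data beforehand) is omitted above. First up, bandwidth figures: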

The biggest takeaway from this graph is just how much parallelism Intel manages to extract from each transfer, even at a queue depth of 1. The P3700 delivers more than 1GB/s of bandwidth at QD1. That's more than double any of the other competitors here, and equal to the combined performance of 3.7 SATA-based Intel SSD DC S3700s. Note that if you force the P3700 into its higher-power 25W operating mode, Intel claims peak performance hits 2.8GB/s, compared to the 2.2GB/s we show here.

With NVMe you get a direct path to the PCIe controller, and in any well-designed system the storage will communicate directly with a PCIe controller on the CPU's die. With a much lower-overhead interface and protocol stack, the result should be substantially lower latency. The graph below looks at average latency across the queue depth sweep:

The P3700 also holds a nice latency advantage here. You'll be able to see just how low the absolute latencies are in a moment, but for now we can look at the behavior of the drives vs. queue depth. The P3700's latencies stay mostly flat up to a queue depth of 16; it's only at QD32 and beyond that we see latencies increase further. The comparison to the SATA-based S3700 is hilarious: the P3700's IO latency at QD32 is lower than the S3700's at QD8.
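The shape of these curves follows from Little's law: sustained bandwidth equals queue depth times transfer size divided by average latency. While the drive still has spare internal parallelism, bandwidth scales with queue depth and latency stays flat; once bandwidth saturates, latency must grow roughly linearly with queue depth. A quick worked example (the 128KB transfer size is an assumption; the ~2.2GB/s ceiling is the figure from our bandwidth graph):

    # At a fixed bandwidth ceiling, Little's law sets the best-case average
    # latency at each queue depth: latency = qd * io_size / bandwidth.
    IO_SIZE = 128 * 1024  # bytes per transfer (assumed)
    PEAK_BW = 2.2e9       # bytes/s, the P3700's ceiling in our test

    for qd in (1, 2, 4, 8, 16, 32, 64, 128):
        latency_us = qd * IO_SIZE / PEAK_BW * 1e6
        print(f"QD{qd:>3}: best-case average latency ~{latency_us:7.1f} us")

This is also why a drive that extracts more bandwidth per transfer, as the P3700 does, posts lower latency at any given queue depth.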

The next graph removes the sole SATA option and looks at PCIe comparisons alone, including the native PCIe (non-NVMe) Micron P420m:

Micron definitely holds the latency advantage over Intel's design at higher queue depths. Remember that the P420m is also a native PCIe SSD controller; it just uses a proprietary host controller interface rather than NVMe.

Sequential Write Performance

Similar to our discussion around sequential read performance, sequential write performance is still a very valuable metric in the enterprise space. Large log processing can stress a drive's sequential write performance, and once again it's something we see in our own server environment.

Here we fill the drive multiple times with sequential data and then write sequentially for a period of 3 minutes. I'm reporting performance in MB/s as well as latency over a range of queue depths (the same sweep sketched earlier, with rw="write").

Once again we see tremendous performance at very low queue depths. At a queue depth of 1 the P3700 already performs better than any of the other drives here, delivering 1.3GB/s of sequential write performance. That's just insane performance at such a low queue depth. By QD4, the P3700 reaches its peak of roughly 1.9GB/s regardless of which power mode you operate it in.

The chart below shows average latency across the QD sweep:

The P3700 continues to do extremely well in the latency tests, although Intel's original PCIe SSD didn't do badly here either; its bandwidth was simply nowhere near as good. Another way to look at it is that Intel now delivers better latency than the original 910, at substantially higher bandwidth. Micron's P420m manages to land somewhere between a good SATA drive and the P3700.

The next chart just removes the SATA drive so we get a better look at the PCIe comparison:

Comments

  • will792 - Tuesday, June 3, 2014 - link

    How do you hardware RAID these drives?

    With SATA/SAS drives I can use LSI/Adaptec controllers and mirror/stripe/parity configurations to tune performance, reliability, and recoverability from drive failures.
  • iwod - Wednesday, June 4, 2014 - link

    While NVMe uses only a third of the CPU power, that's still quite a lot to achieve those IOPS, although consumer applications would (or should) hardly ever see numbers like these in real life.

    We really need PCI-E to get faster and gain more lanes. The Ultra M.2 promoted by ASRock was great: a direct CPU connection over 4x PCI-E 3.0, with lots and lots of headroom to work with, compared to the upcoming standard, which will easily get saturated by the time it arrives.
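    For context on the headroom iwod mentions: PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, so a CPU-attached x4 link of the kind Ultra M.2 uses has roughly 3.9 GB/s of raw bandwidth. A back-of-the-envelope sketch, ignoring packet-level protocol overhead:

        # Usable line rate of a PCIe 3.0 x4 link.
        GT_PER_SEC = 8e9       # PCIe 3.0: 8 gigatransfers/s per lane
        ENCODING = 128 / 130   # 128b/130b line encoding efficiency
        LANES = 4

        gb_per_sec = GT_PER_SEC * ENCODING / 8 * LANES / 1e9
        print(f"PCIe 3.0 x{LANES}: ~{gb_per_sec:.2f} GB/s")  # ~3.94 GB/s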
  • juhatus - Wednesday, June 4, 2014 - link

    You should really, really explore how to make this a bootable Win 8.1 drive on Z97. Is it possible or not? With M.2 support on Z97 it really shouldn't be a problem, should it?
  • Mick Turner - Wednesday, June 4, 2014 - link

    Was there any hint of a release date?
  • 7Enigma - Wednesday, June 4, 2014 - link

    Why is the 200GB S3700 being used as the comparison to this gigantic 1.6TB monster? Unless there is something I don't understand, it has always been the case that a larger drive (with more channels in use) performs significantly better than a smaller one. The S3700 line had an 800GB model; that one, IMO, would be more representative of the improvements in the P3700.
  • shodanshok - Wednesday, June 4, 2014 - link

    Hi Anand,
    I have some questions regarding the I/O efficiency graphs on the "CPU utilization" page.

    What performance counter did you watch when comparing CPU storage load?

    I ask because if you used the classical "I/O wait time" (common on Unix and Windows platforms), you were basically measuring the time the CPU spent waiting for storage, *not* its load.

    The point is that while the CPU is waiting for storage, it can schedule another ready-to-run thread. In other words, while it waits for storage, the CPU is free to do other work. If that is the case, you are measuring I/O performance, *not* I/O efficiency (IOPS per unit of CPU load).

    On the other hand, if you measured system time and IRQ time, the CPU load graphs are correct.

    Regards.
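    A minimal way to check the distinction shodanshok draws, at least on Linux, is to sample /proc/stat around the workload and report iowait separately from kernel/IRQ time. This is an illustrative sketch, not the methodology used in the review:

        # The aggregate "cpu" line of /proc/stat holds cumulative ticks:
        # user nice system idle iowait irq softirq ...
        # iowait is time a CPU sat idle while IO was outstanding; system +
        # irq + softirq is time actually spent executing kernel/interrupt code.
        import time

        def cpu_times():
            with open("/proc/stat") as f:
                fields = list(map(int, f.readline().split()[1:8]))
            kernel = fields[2] + fields[5] + fields[6]  # system + irq + softirq
            return kernel, fields[4], sum(fields)       # kernel, iowait, total

        kernel0, wait0, total0 = cpu_times()
        time.sleep(5)  # run the IO workload during this window
        kernel1, wait1, total1 = cpu_times()

        span = total1 - total0
        print(f"kernel+irq: {100 * (kernel1 - kernel0) / span:.1f}% (real CPU load)")
        print(f"iowait:     {100 * (wait1 - wait0) / span:.1f}% (waiting, not load)")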
  • Ramon Zarat - Wednesday, June 4, 2014 - link

    NET NEUTRALITY

    Please, share this video: https://www.youtube.com/watch?v=fpbOEoRrHyU

    I wrote an e-mail to the FCC, called them and left a message, and went on their website to file my comment. It took me five insignificant minutes. Do it too! Don't let those motherfuckers run over you! SHARE THIS VIDEO!!!!

    Submit your comments here http://apps.fcc.gov/ecfs/upload/begin?procName=14-... It's proceeding # 14-28

    #FUCKTHEFCC #netneutrality
  • underseaglider - Wednesday, June 4, 2014 - link

    Technological advancements improve the reliability and performance of the tools and processes we all use in our daily routines. Whether for professional or personal needs, technology allows us to perform our tasks more efficiently in most cases.
  • aperson2437 - Thursday, June 5, 2014 - link

    Sounds like once these SSDs get cheap, they're going to eliminate the aggravation of waiting forever for computers to do certain things, like loading big programs and games. I can't wait to get my hands on one. I'm super impatient when it comes to computers. Hopefully there will be some intense competition for these NVMe SSDs from Samsung and others, and prices will come down fast.
  • Shiitaki - Thursday, June 5, 2014 - link

    No, it was not out of necessity. SSDs have used SATA because their makers lacked vision, were lazy, or whatever other excuse. PCI Express has been around for years, and so has AHCI. There is no reason there couldn't be a single strap on a PCI Express card to switch between operating modes, like AHCI for older machines and whatever this new thing is. An SSD is largely a RISC computer that provides its functionality overwhelmingly in software. mSATA should never have existed; if you have to have a controller anyway, why not a PCI Express one? After all, SATA controllers connect to PCI Express anyway.

    SSDs could have been PCI Express back in 2008. Those early drives, however, were terrible and didn't need the bandwidth or latency, so there was no reason to. Manufacturers were too busy trying to get NAND flash working to worry about other concerns.

    Even now, most flash drives being sold aren't capable of saturating SATA 3 even on sequential reads. I'm going to jab Kingston again here about their dishonest V300, but Micron's M500 isn't pushing any limits either. Intel SSDs should be fast; that isn't news. They have been horribly overpriced; what is news is that the price is now justified.

    Why isn't the new spec internal Thunderbolt? Oh yeah, somebody's got to make money on licensing! Why make money producing products when it is so much easier to cash royalty checks? The last thing the PC industry needs is another standard for something that can already be done two other ways, plus a jobs program making adapters. Those two ways are PCI Express and Thunderbolt.

    At some point the hard drive should be replaced by a full-length PCI Express card that accepts NAND cards, with the user simply buying and adding cards as space is required. This can already be done with current technology; no reinventing the wheel required.
