Random Read Performance

Although sequential performance is important, the true staple of any multi-user server is an IO load that appears highly random. For our small-block random read test we first fill all drives sequentially, then perform one full drive write using a random 4KB pass at a queue depth of 128. We then perform a 3-minute random read run at each queue depth, plotting bandwidth and latency along the way.
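The article doesn't say which tool generates this workload, but the same sweep shape is easy to approximate with fio. The sketch below is just that: an approximation under assumed parameters (the device path, queue depth list, and JSON field names are assumptions, not the actual test configuration used here):

```python
import json
import subprocess

DEVICE = "/dev/nvme0n1"                      # assumption: a dedicated test drive, not a system disk
QUEUE_DEPTHS = [1, 2, 4, 8, 16, 32, 64, 128]

def random_read_run(qd):
    """One 3-minute 4KB random read run at a fixed queue depth, returning fio's summary stats."""
    cmd = [
        "fio", "--name=randread-qd%d" % qd,
        "--filename=%s" % DEVICE,
        "--rw=randread", "--bs=4k",
        "--ioengine=libaio", "--direct=1",
        "--iodepth=%d" % qd,
        "--runtime=180", "--time_based",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    read_stats = json.loads(result.stdout)["jobs"][0]["read"]
    # fio reports bandwidth in KiB/s and IOPS directly in its JSON output
    return qd, read_stats["bw"] / 1024.0, read_stats["iops"]

if __name__ == "__main__":
    for qd in QUEUE_DEPTHS:
        depth, mib_s, iops = random_read_run(qd)
        print("QD%-3d  %8.1f MiB/s  %10.0f IOPS" % (depth, mib_s, iops))
```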

Small-block random read operations have inherent limits when it comes to parallelism. In the case of all of the drives here, QD1 performance ends up around 20 - 40MB/s. The P3700 manages 36.5MB/s (~8900 IOPS) compared to 27.2MB/s (~6600 IOPS) for the SATA S3700. Even at a queue depth of 8 there's only a modest advantage to the P3700 from a bandwidth perspective (~77000 IOPS vs. ~58400 IOPS). Performance does scale incredibly well with increasing queue depths though. By QD16 we see the P3700 pull away, and by QD32 it delivers roughly 3.5x the performance of the S3700. There's a 70% advantage at QD32 compared to Intel's SSD 910, and that advantage grows to 135% at QD128.
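The MB/s and IOPS figures above are two views of the same number, and at QD1 they also pin down the average service time per IO (with only one IO outstanding, latency is simply 1/IOPS). A quick back-of-the-envelope check, assuming 4KB = 4096 bytes and MB = 10^6 bytes:

```python
def iops_from_bandwidth(mb_per_s, block_bytes=4096):
    """Convert bandwidth in MB/s (10^6 bytes) to IOPS for a fixed block size."""
    return mb_per_s * 1e6 / block_bytes

# QD1 figures quoted above
p3700_iops = iops_from_bandwidth(36.5)   # ~8900 IOPS
s3700_iops = iops_from_bandwidth(27.2)   # ~6600 IOPS

# At QD1 only one IO is ever outstanding, so average latency is just 1 / IOPS
print(1e6 / p3700_iops)  # ~112 us per 4KB read on the P3700
print(1e6 / s3700_iops)  # ~151 us per 4KB read on the S3700
```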

Micron's P420m is incredibly competitive, substantially outperforming the P3700 at the highest queue depth.

Random read latency is incredibly important for applications where response time matters. Even more important for these applications is keeping latency below a certain threshold; what we're looking for here is a flat curve across all queue depths:

And that's almost exactly what the P3700 delivers. While the average latency for Intel's SSD DC S3700 (SATA) skyrockets after QD32, the P3700 remains mostly flat throughout the sweep. It's only at QD128 that we see a bit of an uptick. Even the 910 shows bigger jumps at higher queue depths.

If we remove the SATA drive and look exclusively at PCIe solutions, we get a better idea of the P3700's low latencies:

In this next chart we'll look at some specific numbers. Here we've got average latency (expressed in µs) for 4KB reads at a queue depth of 32. This is the same data as in the charts above, just formatted differently:

Average Latency - 4KB Random Read QD32

The P3700's latency advantage over its SATA counterpart is huge. Compared to other PCIe solutions, the P3700 still leads, but definitely not by as large a margin. Micron's P420m comes fairly close.

Next up is average latency, but now at our highest tested queue depth: 128.

Average Latency - 4KB Random Read QD128

Micron's P420m definitely takes over here. Micron seems to have optimized the P420m for operation at higher queue depths, while Intel focused the P3700 a bit lower. The SATA-based S3700 is just laughable here; its average completion latency is over 1.6ms.

Looking at maximum latency is interesting from a workload perspective as well as from a drive architecture perspective. Latency-sensitive workloads tend to have a max latency they can't exceed, but at the same time a high max latency paired with a low average latency implies that the drive sees these max latencies infrequently. From an architectural perspective, consistent max latencies across the entire QD sweep give us insight into how the drive works at a lower level. It's during these max latency events that the drive's controller can schedule cleanup and defragmentation routines. I recorded max latency at each queue depth and present an average of all max latencies across the QD sweep (from QD1 to QD128). In general, max latencies remained consistent across the sweep.
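To be explicit about how that single chart value is built (the per-QD numbers below are placeholders, not measured data): take the worst-case completion time observed at each queue depth, then average those maxima across the sweep.

```python
# Placeholder per-queue-depth max latencies in milliseconds (illustrative only, not measured data)
max_latency_ms = {1: 2.1, 2: 2.4, 4: 2.2, 8: 3.0, 16: 2.8, 32: 4.1, 64: 3.7, 128: 4.4}

# The chart plots the mean of these per-QD maxima over the QD1-QD128 sweep
avg_of_maxima = sum(max_latency_ms.values()) / len(max_latency_ms)
print("average max latency across the sweep: %.1f ms" % avg_of_maxima)
```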

Max Latency - 4KB Random Read

The 910's max latencies never really get out of hand. Part of the advantage is that each of the 910's four controllers only ever sees a queue depth of 32, so no individual controller is ever stressed all that much. The S3700 is next up, with remarkably consistent performance here. The S3700's max latencies ranged from 2ms to 10ms, with no recognizable correlation to queue depth. Note the huge gap between max and average latency for the S3700 - it's an order of magnitude. These high-latency events are fairly rare.

The P3700 sees two types of long latency events: one that takes around 3ms and another that takes around 15ms. The result is a higher max latency than the other two Intel drives, but since its average latency is lower than both, these events are still fairly rare.

Micron's P420m runs the longest background task routine of anything here, averaging nearly 53ms. Whatever Micron is doing here, it seems consistent across all queue depths.

Random Write Performance

Now we get to the truly difficult workload: a steady-state 4KB random write test. We first fill the drive to capacity, then perform a 4KB (QD128) random write workload until we've written the drive's full capacity once more. We then run a 3-minute 4KB random write test across all queue depths, recording bandwidth and latency values. This gives us a good indication of steady-state performance, which should be where the drives end up over days/weeks/months of continued use in a server.
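As with the read sweep, the tooling isn't specified here; below is a minimal sketch of the two preconditioning passes described above, again using fio with an assumed device path. Both passes overwrite the entire drive, so this is strictly for a dedicated test device.

```python
import subprocess

DEVICE = "/dev/nvme0n1"   # assumption: dedicated test drive; both passes destroy its contents

# Pass 1: fill the drive to capacity with large sequential writes
subprocess.run(["fio", "--name=seqfill", "--filename=" + DEVICE,
                "--rw=write", "--bs=128k",
                "--ioengine=libaio", "--direct=1", "--iodepth=32"],
               check=True)

# Pass 2: write one full drive's worth of 4KB random writes at QD128 to reach steady state
subprocess.run(["fio", "--name=precondition", "--filename=" + DEVICE,
                "--rw=randwrite", "--bs=4k",
                "--ioengine=libaio", "--direct=1", "--iodepth=128"],
               check=True)

# The measured runs would then mirror the read sweep: a 3-minute randwrite job at each queue depth.
```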

Despite the more strenuous workload, the P3700 absolutely shines here. We see peak performance attained at a queue depth of 8 and it's sustained throughout the rest of the range.

Average latency is also class-leading - it's particularly impressive when you compare the P3700 to its SATA counterpart.

Average Latency - 4KB Random Write QD32

Average Latency - 4KB Random Write QD128

The absolute average latency numbers are particularly impressive. The P3700 at a queue depth of 128 can sustain 4KB random writes with IOs completing in 0.86ms on average.
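That 0.86ms figure also pins down the sustained throughput via Little's Law; a quick back-of-the-envelope check (derived from the number above, not read off the charts):

```python
queue_depth = 128
avg_latency_s = 0.86e-3                  # average 4KB random write completion time quoted above

# Little's Law: sustained IOPS = outstanding IOs / average completion time
iops = queue_depth / avg_latency_s       # ~149,000 IOPS
mb_per_s = iops * 4096 / 1e6             # ~610 MB/s of steady-state 4KB random writes
print("%.0f IOPS, roughly %.0f MB/s" % (iops, mb_per_s))
```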

Max Latency - 4KB Random Write

Comments

  • will792 - Tuesday, June 3, 2014 - link

    How do you hardware RAID these drives?

    With SATA/SAS drives I can use LSI/Adaptec controllers and mirror/striping/parity configuration to tune performance, reliability and drive failure recoverability.
  • iwod - Wednesday, June 4, 2014 - link

    While NVMe only uses a third of the CPU power, it is still quite a lot to achieve those IOPS. Although consumer applications would/should hardly see those numbers in real life.

    We really need PCI-E to get faster and gain more lanes. The Ultra M.2 promoted by ASRock was great: direct CPU connection, 4x PCI-E 3.0, lots and lots of headroom to work with. Compare that to the upcoming standards, which would easily get saturated by the time they arrive.
  • juhatus - Wednesday, June 4, 2014 - link

    You should really, really explore how to make this a bootable Win 8.1 drive on Z97. Is it possible or not? With M.2 support on Z97 it really shouldn't be a problem?
  • Mick Turner - Wednesday, June 4, 2014 - link

    Was there any hint of a release date?
  • 7Enigma - Wednesday, June 4, 2014 - link

    Why is the S3700 200GB drive being used as the comparison to this gigantic 1.6TB monster? Unless there is something I don't understand, it has always been the case that a larger drive (with more channels in use) performs significantly better than a smaller drive (with fewer channels). The S3700 had an 800GB model. That one IMO would be more representative of the improvements of the P3700.
  • shodanshok - Wednesday, June 4, 2014 - link

    Hi Anand,
    I have some question regarding the I/O efficiency graphs in the "CPU utilization" page.

    What performance counter did you watch when comparing CPU storage load?

    I'm asking you because if you use the classical "I/O wait time" (common on Unix and Windows platforms), you are basically measuring the time the CPU is waiting for storage, *not* its load.

    The point is that while the CPU is waiting for storage, it can schedule another readily-available thread. In other words, while it waits for storage, the CPU is free to do other work. If this is the case, it means that you are measuring I/O performance, *not* I/O efficiency (IOPS per CPU load).

    On the other hand, if you are measuring system time and IRQ time, the CPU load graphs are correct.

    Regards.
  • Ramon Zarat - Wednesday, June 4, 2014 - link

    NET NEUTRALITY

    Please, share this video: https://www.youtube.com/watch?v=fpbOEoRrHyU

    I wrote an e-mail to the FCC, called them and left a message and went on their website to fill my comment. Took me 5 insignificant minutes. Do it too! Don't let those motherfuckers run over you! SHARE THIS VIDEO!!!!

    Submit your comments here http://apps.fcc.gov/ecfs/upload/begin?procName=14-... It's proceeding # 14-28

    #FUCKTHEFCC #netneutrality
  • underseaglider - Wednesday, June 4, 2014 - link

    Technological advancements improve the reliability and performance of the tools and processes we all use in our daily routines. Whether for professional or personal needs, technology allows us to perform our tasks more efficiently in most cases.
  • aperson2437 - Thursday, June 5, 2014 - link

    Sounds like once these SSDs get cheap it is going to eliminate the aggravation of waiting for computers to do certain things like loading big programs and games forever. I can't wait to get my hands on one. I'm super impatient when it comes to computers. Hopefully, there will be some intense competition for these NVMe SSDs from Samsung and others and prices come down fast.
  • Shiitaki - Thursday, June 5, 2014 - link

    No, it was not out of necessity. SSDs have used SATA because they lacked vision, were lazy, or whatever other excuse. PCI Express has been around for years, and so has AHCI. There is no reason there isn't a single strap on a PCI Express card to change between operating modes, like AHCI for older machines and whatever this new thing is. An SSD is largely a RISC computer that overwhelmingly provides its functionality using software. mSATA should have never existed; if you have to have a controller anyway, why not PCI Express? After all, SATA controllers connect to PCI Express.

    SSDs could have been PCI Express in 2008. Those early drives, however, were terrible and didn't need the bandwidth or lower latency, so there was no reason. Manufacturers were too busy trying to get NAND flash working to bother worrying about other concerns.

    Even now, most flash drives being sold are not capable of saturating SATA 3 even on sequential reads. I'm going to jab Kingston again here about their dishonest V300, but Micron's M500 isn't pushing any limits either. Intel SSDs should be fast - this isn't news - and they have been horribly overpriced. What is news is that the price is now justified.

    Why isn't the new spec internal Thunderbolt? Oh yeah, they have to make money on licensing! Why make money producing products when it is so much easier to cash royalty checks? The last thing the PC industry needs is another standard to do something that can already be done two other ways - but then we need a jobs program making adapters. Those two ways are PCI Express and Thunderbolt.

    At some point the hard drive should be replaced by a full-length PCI Express card that accepts NAND cards, and the user simply buys and keeps adding cards as space is required. This can already be done with current technology; no reinventing the wheel required.
