Sequential Read Performance

Sequential operations still make up a substantial portion of enterprise storage workloads. One of the most IO-heavy workloads we run on AnandTech's own infrastructure is our traffic and ad stats processing. These routines run daily as well as weekly, and both create a tremendous IO load on our database servers. Profiling the workload reveals an access pattern that's largely sequential in nature.

We'll start with a look at sequential read performance. Here we fill the drive multiple times with sequential data and then read it back for a period of 3 minutes. I'm reporting performance in MB/s as well as latency over a range of queue depths. First up, bandwidth figures:

The biggest takeaway from this graph is just how much parallelism Intel manages to extract from each transfer even at a queue depth of 1. The P3700 delivers more than 1GB/s of bandwidth at QD1. That's more than double any of the other competitors here, and equal to the performance of roughly 3.7 SATA-based Intel SSD DC S3700s. Note that if you force the P3700 into its higher-power 25W operating mode, Intel claims peak performance hits 2.8GB/s, compared to the 2.2GB/s we show here.
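If you want a rough feel for this kind of queue depth sweep on your own hardware, the sketch below drives fio across a range of queue depths and pulls bandwidth and mean latency out of its JSON output. To be clear, this is not the harness behind the charts in this review, just a minimal stand-in for the methodology described above; the device path, block size and fio 3.x JSON field names are assumptions you'd adjust for your own setup.

```python
#!/usr/bin/env python3
"""Rough sequential QD sweep using fio -- a sketch, not the test suite used for this review."""
import json
import subprocess

DEVICE = "/dev/nvme0n1"  # assumption: a scratch drive you can safely test against
BLOCK_SIZE = "128k"      # assumption: a typical large sequential transfer size
RUNTIME = 180            # 3 minutes per data point, matching the methodology above

def run_point(qd: int, rw: str = "read") -> dict:
    """Run one fio data point at the given queue depth, return bandwidth and mean latency."""
    cmd = [
        "fio", "--name=seq-sweep",
        f"--filename={DEVICE}",
        f"--rw={rw}",               # "read" or "write", both sequential
        f"--bs={BLOCK_SIZE}",
        f"--iodepth={qd}",
        "--ioengine=libaio",
        "--direct=1",               # bypass the page cache
        "--time_based",
        f"--runtime={RUNTIME}",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    job = json.loads(result.stdout)["jobs"][0][rw]
    return {
        "qd": qd,
        "MBps": job["bw_bytes"] / 1e6,              # bandwidth in MB/s
        "avg_lat_us": job["lat_ns"]["mean"] / 1e3,  # mean completion latency in microseconds
    }

if __name__ == "__main__":
    # The drive is assumed to already be preconditioned (filled with sequential data).
    for qd in (1, 2, 4, 8, 16, 32, 64, 128):
        print(run_point(qd))
```

Running against a raw block device requires root; point it at a file on a mounted filesystem if you just want to sanity check the script.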

With NVMe you get a direct path to the PCIe controller, and in any well-designed system the storage will communicate directly with a PCIe controller on the CPU's die. With a much lower-overhead interface and protocol stack, the result should be substantially lower latency. The graph below looks at average latency across the queue depth sweep:

The P3700 also holds a nice latency advantage here. You'll be able to see just how low the absolute latencies are in a moment, but for now we can look at the behavior of the drives vs. queue depth. The P3700's latencies stay mostly flat up to a queue depth of 16; it's only after QD32 that we see latencies increase further. The comparison to the SATA-based S3700 is hilarious: the P3700's IO latency at QD32 is lower than the S3700's at QD8.
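For a sense of what these per-IO latencies look like in absolute terms, here's a small sketch that samples QD1 sequential-read latency directly with O_DIRECT, keeping exactly one IO outstanding at a time. It's illustrative only (Linux-specific, device path assumed), not how the numbers in these charts were gathered.

```python
#!/usr/bin/env python3
"""Sample QD1 sequential-read latency with O_DIRECT -- an illustrative Linux-only sketch."""
import mmap
import os
import statistics
import time

DEVICE = "/dev/nvme0n1"  # assumption: adjust to the drive under test (requires root)
BLOCK = 128 * 1024       # 128KB transfers
SAMPLES = 10_000

# O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
buf = mmap.mmap(-1, BLOCK)
fd = os.open(DEVICE, os.O_RDONLY | os.O_DIRECT)

latencies_us = []
offset = 0
for _ in range(SAMPLES):
    start = time.perf_counter()
    os.preadv(fd, [buf], offset)  # one outstanding IO at a time = queue depth 1
    latencies_us.append((time.perf_counter() - start) * 1e6)
    offset += BLOCK               # march sequentially through the device

os.close(fd)
latencies_us.sort()
print(f"mean: {statistics.mean(latencies_us):.1f} us, "
      f"p99: {latencies_us[int(0.99 * SAMPLES)]:.1f} us")
```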

The next graph removes the sole SATA option and looks at PCIe comparisons alone, including the native PCIe (non-NVMe) Micron P420m:

Micron definitely holds the latency advantage over Intel's design at higher queue depths. Remember that the P420m also happens to be a native PCIe SSD controller; it just uses a proprietary host controller interface.

Sequential Write Performance

Similar to our discussion around sequential read performance, sequential write performance is still a very valuable metric in the enterprise space. Large log processing can stress a drive's sequential write performance, and once again it's something we see in our own server environment.

Here we fill the drive multiple times with sequential data and then write sequentially to it for a period of 3 minutes. I'm reporting performance in MB/s as well as latency over a range of queue depths.
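The hypothetical fio wrapper sketched in the sequential read section covers this sweep as well; flipping the direction is all it takes. Writes are destructive, so only point it at a scratch device.

```python
# Reuses the hypothetical run_point() helper from the sequential read sketch above.
for qd in (1, 2, 4, 8, 16, 32, 64, 128):
    print(run_point(qd, rw="write"))
```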

Once again we see tremendous performance at very low queue depths. At a queue depth of 1 the P3700 already performs better than any of the other drives here, and delivers 1.3GB/s of sequential write performance. That's just insane performance at such a low queue depth. By QD4, the P3700 reaches peak performance at roughly 1.9GB/s regardless of what power mode you operate it in.

The chart below shows average latency across the QD sweep:

The P3700 continues to do extremely well in the latency tests, although Intel's original PCIe SSD didn't do so badly here either - its bandwidth was simply nowhere near as good. Another way to look at it is that Intel now delivers better latency than the original 910, at substantially higher bandwidths. Micron's P420m manages to land somewhere between a good SATA drive and the P3700.

The next chart just removes the SATA drive so we get a better look at the PCIe comparison:

 

Comments

  • 457R4LDR34DKN07 - Tuesday, June 3, 2014 - link

    No, they are 4x PCIe 2.5" SFF-8639 drives. Here is a good article describing the differences between SATA Express and 2.5" SFF-8639 drives:

    http://www.anandtech.com/show/6294/breaking-the-sa...
  • Qasar - Tuesday, June 3, 2014 - link

    ok.. BUT.. that's not what i asked.... will this type of drive, ie the NVMe type.. be on some other type of connection besides PCIe 4x ?? as i said :

    depending on ones usage... finding a PCIe slot to put a drive like this in.. may not be possible, specially in SLI/Crossfire... add the possibility of a sound card or raid card..

    cause one can quickly run out of PCIe slots, or have slots covered/blocked by other PCIe cards ... right now, for example. i have an Asus P6T and due to my 7970.. the 2nd PCIe 16 slot.. is unusable and the 3rd slot.. has a raid card in it.. on a newer board.. it may be different.. but still SLI/Crossfire.. can quickly cover up slots ... or block them ... hence.. will NVMe type drives also be on sata express ??
  • 457R4LDR34DKN07 - Wednesday, June 4, 2014 - link

    Right, and what I told you is that a 2.5" SFF-8639 version is also offered. You can probably plug it into a SATA Express connector but you will only realize 2x PCIe 3.0 speeds, i.e. 10Gb/s.
  • xdrol - Tuesday, June 3, 2014 - link

    It takes 5x 200 GB drives to match the performance of a 1.6 TB drive? That does not sound THAT good... Make it 8x and it's even.
  • Lonyo - Tuesday, June 3, 2014 - link

    Now make a motherboard with 8xPCIe slots to put those drives in.
  • hpvd - Tuesday, June 3, 2014 - link

    sorry only 7 :-(
    http://www.supermicro.nl/products/motherboard/Xeon...
    :-)
  • hpvd - Tuesday, June 3, 2014 - link

    some technical data for the lower capacity models can be found here:
    http://www.intel.com/content/www/us/en/solid-state...
    maybe this would be interesting to add to the article...
  • huge pile of sticks - Tuesday, June 3, 2014 - link

    but can it run crysis?
  • Homeles - Tuesday, June 3, 2014 - link

    It can run 1,000 instances of Crysis. A kilocrysis, if you will.
  • Shadowmaster625 - Tuesday, June 3, 2014 - link

    How is 200 µs considered low latency? What a joke. If Intel had any ambitions besides playing second fiddle to Apple and ARM, they would put the SSD controller on the CPU and create a DIMM-type interface for the NAND. Then they would have read latencies in the 1 to 10 µs range, and even less latency as they improve their caching techniques. It's true that you wouldn't be able to address more than a couple TB of NAND through such an interface, but it would be so blazing fast that it could be shadowed using SATA SSDs with very little perceived performance loss over the entire address space. Think big cache for NAND, call it L5 or whatnot. It would do for storage what L2 did for CPUs.
