Mixed Random Performance

Real-world storage workloads usually aren't pure reads or writes but a mix of both. Testing and graphing the full range of possible mixed I/O workloads is impractical: varying the proportion of reads vs writes, sequential vs random access, and block size leads to far too many configurations. Instead, we focus on the few scenarios most commonly cited by vendors, when they provide a mixed I/O performance specification at all. We tested a range of 4kB random read/write mixes at queue depth 32 (the maximum supported by SATA SSDs) and at QD128 to better stress the NVMe SSDs. This gives us a good picture of the maximum throughput these drives can sustain for mixed random I/O, but in many cases the queue depth will be far higher than necessary, so we can't draw meaningful conclusions about latency from this test. The test uses 8 threads at QD32 and 16 threads at QD128, which spreads the work over many CPU cores and, for NVMe drives, across the drive's multiple queues.
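A workload along these lines can be expressed as a fio job file. This is an illustrative sketch, not our exact test script: the device path is a placeholder, and it assumes the total queue depth is split evenly across threads (8 jobs at iodepth=4 for an aggregate QD32).

```ini
; Sketch: 4kB random I/O at a 70/30 read/write mix, aggregate QD32
; (hypothetical configuration; adjust filename and runtime for your setup)
[global]
ioengine=libaio
direct=1
bs=4k
rw=randrw
rwmixread=70
time_based=1
runtime=300
group_reporting=1

[mixed-qd32]
filename=/dev/nvme0n1
numjobs=8
iodepth=4
```

For the QD128 variant, the equivalent split would be 16 jobs at iodepth=8, and `rwmixread` can be swept from 100 down to 0 to produce the full mix curve.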

The full range of read/write mixes is graphed below, but we'll focus primarily on the 70% read, 30% write case that is commonly quoted in mixed I/O performance specs.

[Charts: 4kB Mixed Random Read/Write — Queue Depth 32 / Queue Depth 128]

A queue depth of 32 is only enough to saturate the slowest of these NVMe drives on a 70/30 mixed random workload; none of the high-end drives is being stressed enough. At QD128 we see a much wider spread of scores. The DERA and Memblaze 6.4TB drives have pulled past the Optane SSD for overall throughput, but the Samsung PM1725a can't come close to keeping up with them—its throughput is more on par with the DERA D5437 drives with relatively low overprovisioning. The high OP ratio on the DapuStor Haishen3 H3100 allows it to perform much better than any of the other drives with 8-channel controllers, and better than the Intel P4510 which has a 12-channel controller.

[Charts: 4kB Mixed Random Read/Write — Power Efficiency in MB/s/W and Average Power in W, at QD32 and QD128]

The DapuStor Haishen3 H3100 is the main standout on the power efficiency charts: at QD32 it's the only flash-based NVMe SSD that's more efficient than both of the SATA SSDs, and at QD128 it's getting close to the Optane SSD's efficiency score. Also at QD128 the two fastest 6.4TB drives have pretty good efficiency scores, but still quite a ways behind the Optane SSD: 15-18W vs 10W for similar performance.

[Charts: throughput across the full read/write mix range — QD32 / QD128]

Most of these drives have hit their power limit by the time the mix reaches about 30% writes. After that point, their performance steadily declines as the workload (and thus the power budget) shifts toward slower, more power-hungry write operations. This is especially true at the higher queue depth. At QD32 things look quite different for the DERA D5457 and Memblaze PBlaze5 C916: QD32 isn't enough to get close to their full read throughput, and they're actually able to deliver higher throughput for writes than for reads. That's not quite true of the Samsung PM1725a because its steady-state random write speed is so much slower, but it does see a slight increase in throughput toward the end of the QD32 test run as the mix approaches pure writes.

Aerospike Certification Tool

Aerospike is a high-performance NoSQL database designed for use with solid state storage. The developers of Aerospike provide the Aerospike Certification Tool (ACT), a benchmark that emulates the typical storage workload generated by the Aerospike database. This workload consists of a mix of large-block 128kB reads and writes, and small 1.5kB reads. When the ACT was initially released back in the early days of SATA SSDs, the baseline workload was defined to consist of 2000 reads per second and 1000 writes per second. A drive is considered to pass the test if it meets the following latency criteria:

  • fewer than 5% of transactions exceed 1ms
  • fewer than 1% of transactions exceed 8ms
  • fewer than 0.1% of transactions exceed 64ms

Drives can be scored based on the highest throughput they can sustain while satisfying the latency QoS requirements. Scores are normalized relative to the baseline 1x workload, so a score of 50 indicates 100,000 reads per second and 50,000 writes per second. Since this test uses fixed I/O rates, the queue depths experienced by each drive depend on its latency, and can fluctuate during the test run if the drive slows down temporarily for a garbage collection cycle. The test will give up early if it detects the queue depths growing excessively, or if the large-block I/O threads can't keep up with the random reads.
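The scoring arithmetic and pass/fail criteria above can be summarized in a few lines. This is an illustrative sketch with our own function names, not part of the ACT codebase:

```python
# ACT baseline (1x) workload as defined for early SATA SSDs.
BASELINE_READS_PER_SEC = 2000
BASELINE_WRITES_PER_SEC = 1000

def workload_for_score(score):
    """Read and write IOPS implied by an ACT score (a multiple of the 1x baseline)."""
    return score * BASELINE_READS_PER_SEC, score * BASELINE_WRITES_PER_SEC

def passes_act(pct_over_1ms, pct_over_8ms, pct_over_64ms):
    """ACT latency criteria: a drive passes if fewer than 5% of transactions
    exceed 1ms, fewer than 1% exceed 8ms, and fewer than 0.1% exceed 64ms."""
    return pct_over_1ms < 5.0 and pct_over_8ms < 1.0 and pct_over_64ms < 0.1

reads, writes = workload_for_score(50)  # (100000, 50000)
```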

We used the default settings for queue and thread counts and did not manually constrain the benchmark to a single NUMA node, so this test produced a total of 64 threads scheduled across all 72 virtual (36 physical) cores.

The usual runtime for ACT is 24 hours, which makes determining a drive's throughput limit a long process. For fast NVMe SSDs, this is far longer than necessary for drives to reach steady-state. In order to find the maximum rate at which a drive can pass the test, we start at an unsustainably high rate (at least 150x) and incrementally reduce the rate until the test can run for a full hour, then decrease the rate further if necessary to bring the drive under the latency limits.
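That search procedure amounts to a simple downward scan over rate multipliers. The sketch below captures the idea; `run_one_hour_test` is a hypothetical callable standing in for an actual one-hour ACT run that reports whether the drive stayed within the latency limits:

```python
def find_max_passing_rate(run_one_hour_test, start_rate=150, step=10, min_rate=1):
    """Decrease the ACT rate multiplier from an unsustainably high starting
    point until a one-hour run passes the latency QoS criteria.
    Returns the first passing rate, or None if no rate down to min_rate passes."""
    rate = start_rate
    while rate >= min_rate:
        if run_one_hour_test(rate):
            return rate
        rate -= step
    return None
```

In practice each probe costs an hour of wall-clock time, so a coarse step followed by a finer one keeps the total search manageable.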

[Chart: Aerospike Certification Tool Score]

The strict QoS requirements of this test keep a number of these drives from scoring as well as we would expect based on their throughput on our other tests. The biggest disappointment is the Samsung PM1725a, which is barely any faster than Samsung's newer 983 DCT. The PM1725a has no problem with outliers above the 8ms or 64ms thresholds, but it cannot get 95% of the reads to complete in under 1ms until the workload slows way down. This suggests that it is not as good as newer SSDs at suspending writes in favor of handling a read request. The DapuStor Haishen3 SSDs also underperform relative to comparable drives, which is a surprise given that they offered pretty good QoS on some of the pure read or write tests.

The Memblaze PBlaze5 C916 is the fastest flash SSD in this bunch, but only scores 60% of what the Optane SSD gets. The DERA SSDs that also use 16-channel controllers are the next fastest, though the 8TB D5437 is substantially slower than the 4TB model.

[Charts: Aerospike ACT — Power Efficiency in MB/s/W and Average Power in W]

Since the ACT test runs drives at the throughput where they offer good QoS rather than at their maximum throughput, the power draw from these drives isn't particularly high: the NVMe SSDs range from roughly 4-13 W. The top performers are also generally the most efficient drives on this test. Even though it is slower than expected, the DapuStor Haishen3 H3100 is the second most efficient flash SSD in this round-up, using just over half the power that the slightly faster Intel P4510 requires.


  • Billy Tallis - Tuesday, February 18, 2020 - link

    The enterprise market is more serious about QA and product qualification. Phison got a consumer PCIe 4.0 controller out in the middle of 2019, but the quick-and-dirty approach they took for that isn't useful in the enterprise market.

    There are lots of consumer and enterprise PCIe 4.0 SSDs coming down the pipeline, but at the moment there's very little that has completed qualification and is ready for widespread production deployment. That was even more true back in November when testing for this review started.
