Mixed Random Performance

Real-world storage workloads usually aren't pure reads or writes but a mix of both. It is impractical to test and graph the full range of possible mixed I/O workloads: varying the proportion of reads vs. writes, sequential vs. random access, and block sizes leads to far too many configurations. Instead, we focus on the few scenarios most commonly cited by vendors, when they provide a mixed I/O performance specification at all. We tested a range of 4kB random read/write mixes at queue depths of 32 and 128. This gives us a good picture of the maximum throughput these drives can sustain for mixed random I/O, but in many cases these queue depths are far higher than necessary, so we can't draw meaningful conclusions about latency from this test. As with our tests of pure random reads or writes, we use 32 (or 128) threads, each issuing one read or write request at a time. This spreads the work over many CPU cores, and for NVMe drives it also spreads the I/O across the drive's several queues.
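As a rough illustration, the QD32 mixed random test described above could be expressed as a fio job along these lines. The article doesn't name its benchmarking tool, so this job file, the device path, and the run length are all assumptions, not the actual test configuration:

```ini
; Hypothetical fio job approximating a 4kB 70/30 mixed random test at QD32:
; 32 jobs, each issuing one 4kB random read or write at a time.
[mixed-random-qd32]
filename=/dev/nvme0n1   ; test device (assumption)
direct=1                ; bypass the page cache
ioengine=libaio
rw=randrw               ; mixed random reads and writes
rwmixread=70            ; 70% reads, 30% writes
bs=4k
iodepth=1               ; one outstanding request per thread
numjobs=32              ; 32 threads -> effective queue depth of 32
runtime=60
time_based=1
group_reporting=1
```

Sweeping `rwmixread` from 100 down to 0 would reproduce the full range of mixes graphed below, and `numjobs=128` would give the QD128 variant.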

The full range of read/write mixes is graphed below, but we'll primarily focus on the 70% read, 30% write case that is a fairly common stand-in for moderately read-heavy mixed workloads.

[Graphs: 4kB Mixed Random Read/Write performance at Queue Depth 32 and Queue Depth 128]

At the lower queue depth of 32, the PBlaze5 drives have a modest performance advantage over the other flash-based SSDs, and the latest PBlaze5 C916 is the fastest. At the higher queue depth, the PBlaze5 SSDs in general pull way ahead of the other flash-based drives but the C916 no longer has a clear lead over the older models. The Intel P4510's performance increases slightly with the larger queue depth but the Samsung drives are already saturated at QD32.

[Graphs: 4kB Mixed Random Read/Write power efficiency (MB/s/W) and average power (W) at QD32 and QD128]

As usual, the latest PBlaze5 uses less power than its predecessors even before the 10W limit is applied, but on this test that doesn't translate into a clear win in overall efficiency. The Intel Optane SSD is the only drive that really stands out with great power efficiency here; by comparison, the TLC drives all score fairly close to each other, especially at the lower queue depth.

[Graphs: performance across the full range of read/write mixes at QD32 and QD128]

The 10W limit has a significant impact on the PBlaze5 C916 through almost all portions of the mixed I/O tests. With or without the power limit, the C916 performs lower than expected at the pure-read end of the test, but follows a more normal performance curve through the rest of the I/O mixes. The performance decline as more writes are added to the mix is relatively shallow, especially in the read-heavy half of the tests. The older PBlaze5 drives, with their more extreme overprovisioning, hold up a bit better than the C916 on the write-heavy half of the test.

Aerospike Certification Tool

Aerospike is a high-performance NoSQL database designed for use with solid state storage. The developers of Aerospike provide the Aerospike Certification Tool (ACT), a benchmark that emulates the typical storage workload generated by the Aerospike database. This workload consists of a mix of large-block 128kB reads and writes, and small 1.5kB reads. When the ACT was initially released back in the early days of SATA SSDs, the baseline workload was defined to consist of 2000 reads per second and 1000 writes per second. A drive is considered to pass the test if it meets the following latency criteria:

  • fewer than 5% of transactions exceed 1ms
  • fewer than 1% of transactions exceed 8ms
  • fewer than 0.1% of transactions exceed 64ms

Drives can be scored based on the highest throughput they can sustain while satisfying the latency QoS requirements. Scores are normalized relative to the baseline 1x workload, so a score of 50 indicates 100,000 reads per second and 50,000 writes per second. Since this test uses fixed I/O rates, the queue depths experienced by each drive depend on its latency, and can fluctuate during the test run if the drive slows down temporarily for a garbage collection cycle. The test will give up early if it detects the queue depths growing excessively, or if the large-block I/O threads can't keep up with the random reads.
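The scoring arithmetic above is simple enough to sketch directly. The function and variable names below are my own, not part of ACT itself; the baseline rates and latency thresholds come from the article:

```python
# ACT baseline 1x workload: 2000 reads/s and 1000 writes/s.
BASELINE_READS = 2000
BASELINE_WRITES = 1000

def act_rates(score):
    """I/O rates implied by an ACT score multiplier."""
    return BASELINE_READS * score, BASELINE_WRITES * score

def passes_latency_qos(pct_over_1ms, pct_over_8ms, pct_over_64ms):
    """Latency criteria a drive must satisfy to pass ACT."""
    return (pct_over_1ms < 5.0 and
            pct_over_8ms < 1.0 and
            pct_over_64ms < 0.1)

reads, writes = act_rates(50)
print(reads, writes)  # prints: 100000 50000
```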

We used the default settings for queue and thread counts and did not manually constrain the benchmark to a single NUMA node, so this test produced a total of 64 threads scheduled across all 72 virtual (36 physical) cores.

The usual runtime for ACT is 24 hours, which makes determining a drive's throughput limit a long process. For fast NVMe SSDs, this is far longer than necessary for drives to reach steady state. In order to find the maximum rate at which a drive can pass the test, we start at an unsustainably high rate (at least 150x) and incrementally reduce the rate until the test can run for a full hour, then decrease the rate further if necessary to get the drive under the latency limits.
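The search procedure amounts to walking the rate down from an unsustainable starting point until a run passes. A minimal sketch, collapsing the two stopping conditions (completing a full hour, and staying under the latency limits) into one hypothetical pass/fail callback:

```python
def find_max_passing_rate(run_act, start=150, step=10):
    """Walk the ACT score down from an unsustainably high starting
    rate until a run passes. `run_act(score)` is a stand-in for
    launching the benchmark at that rate and reporting pass/fail."""
    score = start
    while score > 0:
        if run_act(score):   # drive survived a full run at this rate
            return score
        score -= step        # back off and try a lower rate
    return 0

# Example with a fake drive that can sustain up to a 120x rate:
print(find_max_passing_rate(lambda s: s <= 120))  # prints: 120
```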

[Graph: Aerospike Certification Tool Score]

The performance of the PBlaze5 C916 on the Aerospike test is a bit lower than the older C900 delivered, but is still well above what the lower-endurance SSDs can sustain. Even with a 10W limit, the C916 is still able to sustain higher throughput than the Intel P4510.

[Graphs: Aerospike ACT power efficiency and average power (W)]

The power consumption of the C916 is lower than the C900's, but the efficiency score isn't improved because the performance drop roughly matched the power savings. The C916 is still more efficient than the competing drives on this test when its power consumption is unconstrained, but with the 10W limit its efficiency advantage is mostly eliminated.


13 Comments


  • Samus - Wednesday, March 13, 2019 - link

    That. Capacitor.
  • Billy Tallis - Wednesday, March 13, 2019 - link

    Yes, sometimes "power loss protection capacitor" doesn't need to be plural. 1800µF 35V Nichicon, BTW, since my photos didn't catch the label.
  • willis936 - Wednesday, March 13, 2019 - link

    That’s 3.78W for one minute if they’re running at the maximum voltage rating (which they shouldn’t and probably don’t), if anyone’s curious.
  • DominionSeraph - Wednesday, March 13, 2019 - link

    It's cute, isn't it?

    https://www.amazon.com/BOSS-Audio-CPBK2-2-Capacito...
  • takeshi7 - Wednesday, March 13, 2019 - link

    I wish companies made consumer PCIe x8 SSDs. It would be good since many motherboards can split the PCIe lanes x8/x8 and SLI is falling out of favor anyways.
  • surt - Wednesday, March 13, 2019 - link

    I bet 90% of motherboard buyers would prefer 2 x16 slots vs any other configuration so they can run 1 GPU and 1 very fast SSD. I really don't understand why the market hasn't moved in this direction.
  • MFinn3333 - Wednesday, March 13, 2019 - link

    Because SSD's have a hard time saturating 4x PCIe slots, 16x would just take up space for no real purpose.
  • Midwayman - Wednesday, March 13, 2019 - link

    Maybe, but it sucks that your GPU gets moved to 8x. 16/4 would be an easier split to live with.
  • bananaforscale - Thursday, March 14, 2019 - link

    Not really, GPUs are typically bottlenecked by local memory (VRAM), not PCIe.
  • Opencg - Wednesday, March 13, 2019 - link

    performance would not be very noticeable. and even in the few cases it would be, it would require more expensive cpus and mobos thus mitigating the attractiveness to very few consumers. and fewer consumers means even higher prices. we will get higher throughput but its much more likely with pci 4.0/5.0 than 2 16x
