Mixed Read/Write Performance

Workloads consisting of a mix of reads and writes can be particularly challenging for flash-based SSDs. When a write operation interrupts a string of reads, it blocks access to at least one flash chip for substantially longer than a read operation takes. This hurts the latency of any read operations that were waiting on that chip, and with enough write operations, throughput can be severely impacted. If the write command triggers an erase operation on one or more flash chips, the traffic jam is many times worse.

The occasional read interrupting a string of write commands doesn't necessarily cause much of a backlog, because writes are usually buffered by the controller anyway. But depending on how much unwritten data the controller is willing to buffer and for how long, a burst of reads could force the drive to begin flushing outstanding writes before they've all been coalesced into optimally sized writes.

Our first mixed workload test is an extension of what Intel describes in their specifications for throughput of mixed workloads. A total of 16 threads are used, each performing a mix of random reads and random writes at a queue depth of 1. Instead of just testing a 70% read mixture, the full range from pure reads to pure writes is tested at 10% increments.
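As a rough illustration of the sweep described above, the sketch below lists the eleven read/write mixes and builds a command line for each one. The text does not say which tool generated the workload; fio is used here purely as an assumption, and the block size, device path and phase length are hypothetical placeholders rather than the settings used for this review.

```python
from typing import List


def read_percentages(step: int = 10) -> List[int]:
    """Read share of each phase, from pure reads (100) down to pure writes (0)."""
    return list(range(100, -1, -step))


def fio_args(read_pct: int,
             threads: int = 16,              # from the text: 16 threads
             queue_depth: int = 1,           # from the text: queue depth of 1
             block_size: str = "4k",         # assumption: not stated in the text
             target: str = "/dev/nvme0n1",   # hypothetical device path
             runtime_s: int = 60) -> List[str]:  # assumption: phase length
    """Build (but do not run) an fio command line for one phase of the sweep."""
    return [
        "fio",
        f"--name=mixed-{read_pct}r",
        f"--filename={target}",
        "--direct=1",                  # bypass the page cache
        "--rw=randrw",                 # random reads mixed with random writes
        f"--rwmixread={read_pct}",     # percentage of operations that are reads
        f"--bs={block_size}",
        f"--iodepth={queue_depth}",
        f"--numjobs={threads}",
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Only prints the commands. Actually running them against a raw device
    # would overwrite whatever data it holds.
    for pct in read_percentages():
        print(" ".join(fio_args(pct)))
```

The 70% read point that Intel's specifications describe is just one of the eleven phases produced this way.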

Mixed Random Read/Write Throughput (chart; vertical axis in IOPS or MB/s)

The Intel Optane SSD DC P4800X is slightly faster than the Optane SSD 900p throughout this test, but either is far faster than the flash-based SSDs. Performance from the Optane SSDs isn't entirely flat across the test, but the minor decline in the middle is nothing to complain about. The Intel P3608 and Micron 9100 both show strong increases near the end of the test due to caching and combining writes.

Random Read Latency (charts: mean, median, 99th percentile, 99.999th percentile)

The mean latency graphs are simply the reciprocal of the throughput graphs above, but the latency percentile graphs reveal a bit more. The median latency of all of the flash SSDs drops significantly once the workload consists of more writes than reads, because the median operation is now a cacheable write instead of an uncacheable read. A graph of the median write latency would likely show writes to be competitive on the flash SSDs even during the read-heavy portion of the test.
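The reciprocal relationship between throughput and mean latency can be made concrete with Little's law. This is a minimal sketch, assuming that all 16 threads keep exactly one request in flight at all times so that 16 operations are always outstanding; the IOPS figure in the usage example is a made-up round number, not a measurement from this review.

```python
def mean_latency_us(iops: float, threads: int = 16, queue_depth: int = 1) -> float:
    """Mean latency in microseconds implied by a given throughput.

    Little's law: outstanding requests = throughput * mean latency,
    so mean latency = outstanding requests / throughput.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    outstanding = threads * queue_depth
    return outstanding / iops * 1e6


if __name__ == "__main__":
    # Hypothetical throughput, chosen only to show the arithmetic.
    print(mean_latency_us(500_000))
```

Under that assumption, halving the throughput doubles the mean latency, which is why the mean latency charts carry no information beyond the throughput chart.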

The 99th percentile latency chart shows that the flash SSDs have much better QoS on pure read or write workloads than on mixed workloads, but they still cannot approach the stable low latency of the Optane SSDs.

Aerospike Certification Tool

Aerospike is a high-performance NoSQL database designed for use with solid state storage. The developers of Aerospike provide the Aerospike Certification Tool (ACT), a benchmark that emulates the typical storage workload generated by the Aerospike database. This workload consists of a mix of large-block 128kB reads and writes, and small 1.5kB reads. When the ACT was initially released back in the early days of SATA SSDs, the baseline workload was defined to consist of 2000 reads per second and 1000 writes per second. A drive is considered to pass the test if it meets the following latency criteria:

  • fewer than 5% of transactions exceed 1ms
  • fewer than 1% of transactions exceed 8ms
  • fewer than 0.1% of transactions exceed 64ms
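The three criteria above can be expressed as a small check. This is only a sketch working on a plain list of transaction latencies in milliseconds; the function names are hypothetical, and ACT itself reports these figures in its own output format rather than through code like this.

```python
from typing import Sequence

# (latency threshold in ms, maximum allowed percentage of transactions over it)
ACT_LIMITS = ((1.0, 5.0), (8.0, 1.0), (64.0, 0.1))


def percent_over(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Percentage of transactions whose latency exceeds threshold_ms."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    slow = sum(1 for latency in latencies_ms if latency > threshold_ms)
    return 100.0 * slow / len(latencies_ms)


def passes_act(latencies_ms: Sequence[float]) -> bool:
    """True if the samples satisfy all three latency criteria."""
    return all(percent_over(latencies_ms, threshold) < limit
               for threshold, limit in ACT_LIMITS)


if __name__ == "__main__":
    # Made-up samples: 4% of transactions take 2ms, the rest take 0.5ms.
    samples = [2.0] * 40 + [0.5] * 960
    print(passes_act(samples))
```

A drive has to satisfy all three limits at once, so a drive with a good 1ms figure can still fail on a long tail.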

Drives can be scored based on the highest throughput they can sustain while satisfying the latency QoS requirements. Scores are normalized relative to the baseline 1x workload, so a score of 50 indicates 100,000 reads per second and 50,000 writes per second. We used the default settings for queue and thread counts and did not manually constrain the benchmark to a single NUMA node, so this test produced a total of 64 threads sharing a total of 32 CPU cores split across two sockets.
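The score normalization is simple multiplication against the 1x baseline. A minimal sketch, with hypothetical names:

```python
from typing import Tuple

BASE_READS_PER_SEC = 2000    # 1x baseline from the ACT definition
BASE_WRITES_PER_SEC = 1000


def act_load(score: float) -> Tuple[float, float]:
    """Return (reads per second, writes per second) for a given ACT score."""
    if score <= 0:
        raise ValueError("score must be positive")
    return score * BASE_READS_PER_SEC, score * BASE_WRITES_PER_SEC


if __name__ == "__main__":
    reads, writes = act_load(50)
    print(reads, writes)
```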

The usual runtime for ACT is 24 hours, which makes determining a drive's throughput limit a long process. In order to have results in time for this review, much shorter ACT runtimes were used. Fortunately, none of these SSDs take anywhere near 24h to reach steady state. Once the drives were in steady state, a series of 5-minute ACT runs was used to estimate the drive's throughput limit, and then ACT was run on each drive for two hours to ensure performance remained stable under sustained load.

Aerospike Certification Tool Throughput

ACT Transaction Latency

Drive                                            % over 1ms   % over 2ms
Intel Optane SSD DC P4800X 750GB                 0.82         0.16
Intel Optane SSD 900p 280GB                      1.53         0.36
Micron 9100 MAX 2.4TB                            4.94         0.44
Intel SSD DC P3700 1.6TB                         4.64         2.22
Intel SSD DC P3608 (single controller) 800GB     4.51         2.29

When held to a specific QoS standard, the two Optane SSDs deliver more than twice the throughput of any of the flash-based SSDs. More significantly, even at their throughput limit, they are well below the QoS limits: the CPU is actually the bottleneck at that rate, leading to overall transaction times that are far higher than the actual drive I/O time. Somewhat higher throughput could be achieved by tweaking the thread and queue counts in the ACT configuration. Meanwhile, the flash SSDs are all close to the 5% limit for 1ms transactions, but are so far under the limit for longer latencies that I've left those numbers out of the above table.

60 Comments

  • "Bullwinkle J Moose" - Thursday, November 09, 2017 - link

    Humor me.....

    How fast can you copy and paste a 100GB file from and to the same Optane SSD

    I don't believe your mixed mode results adequately demonstrate the internal throughput

    At least not until you demonstrate a direct comparison
  • Billy Tallis - Thursday, November 09, 2017 - link

    Your concept of "internal throughput" has no basis in reality. File copies (on a filesystem that does not do copy-on-write) require the file data to be read from the SSD into system DRAM, then written back to the SSD. There are no "copy" commands in the NVMe command set.
  • "Bullwinkle J Moose" - Thursday, November 09, 2017 - link

    "There are no "copy" commands in the NVMe command set."
    ---------------------------------------------------------------------------------
    That might be fixed with a few more onboard processors in the future but does not answer my question

    How fast can you copy/paste 100GB on THAT specific drive?
  • "Bullwinkle J Moose" - Thursday, November 09, 2017 - link

    Better yet, I'd like you to GUESS how fast it can copy and paste based on your mixed mode analysis and then go measure it
  • Lord of the Bored - Friday, November 10, 2017 - link

    How will a new processor change that there is no way to tell the drive to do what you want? We don't trust storage devices to "do what I mean", because the cost of a mistake is too high. No device anyone should be using will say "it looks like they're writing back the data they just read in, I'mma ignore the input and duplicate it from the cache to save time." Especially since they can't know if the data is changed in advance.

    Barring a new interface standard, it will take exactly as long to copy a file to another location on the same drive as it will to read the file and then write the file, because that is the only provision within the NVMe command set.
  • "Bullwinkle J Moose" - Friday, November 10, 2017 - link

    "How will a new processor change that there is no way to tell the drive to do what you want?"
    --------------------------------------------------------------------------------------------------------------------------
    I have no idea Mr Bored, the "problem" as outlined by Billy Tallis could be ignored COMPLETELY and a fix is not even needed if he would simply do the copy/paste test that I originally asked

    He won't of course, and continue his downward spiral into depression and resentment lashing out at anyone who dares say a bad word about the Spyware/Adware/Malware/Extortionware Platform that is Windows 10

    Check out the interview between Eli the Computer Guy and Baracules Nerdgasm from a year ago

    Poor depressed Eli is still trying to figure out how he can still have a future in a Microsoft World and Baracules is the happiest Guy on Earth

    Check out the Barnacules Videos on Windows Spyware and you can understand his happiness
    He does not let Microsoft dictate the framework of his business, life and future
    Youtube Search: Windows is Spyware (you will find him)

    Poor Billy is on that same sad downward slope that only leads to Suicide or Mass Murder

    Just pull yer head out of Nadella's Ass long enough to tell us how fast this drive can copy/paste 100GB

    It's not hard at all
    Even I could do it (just give me the drive)

    No need to re-invent the drive or come up with what ifs

    Easy-Peasy
    Now SMILE and repeat 3 times, Microsoft is the problem / not the solution
    Deep breath annnnnnd Relax

    There is a future if you make one!
  • Lord of the Bored - Friday, November 10, 2017 - link

    Your test is dumb, and is attempting to measure something that can't actually be measured that way.
  • "Bullwinkle J Moose" - Thursday, November 09, 2017 - link

    What would happen if Intel Colludes with AMD to implement this technology into onboard graphics instead of AMD's plan to use Flash in their graphics cards ?

    Seems to me like Internal throughput would be very important to the design
  • Samus - Thursday, November 09, 2017 - link

    That is also file system dependent. For example, in Mac OS High Sierra, you can copy and paste (duplicate) any size file instantly on any drive formatted with APFS.

    But your question of a block by block transfer of a file internally for a 100GB file would likely take 50 seconds if not factoring in file system efficiency.
  • cygnus1 - Thursday, November 09, 2017 - link

    That's not a copy of the file though. It's just a duplicate file entry referencing the same blocks. That and things like snapshots are possible thanks to the copy on write nature of that file system. But, if any of those blocks were to become corrupted, both 'copies' of the file are corrupt.
