Single-Threaded Performance

This next batch of tests measures how much performance can be driven by a single thread performing asynchronous I/O to produce queue depths ranging from 1 to 64. With these fast SSDs, the CPU can become the bottleneck at high queue depths and clock speed can have a major impact on IOPS.

4kB Random Reads

With a single thread issuing asynchronous requests, all of the SSDs top out around 1.2GB/s for random reads. What separates them is how high a queue depth is necessary to reach this level of performance, and what their latency is when they first reach saturation.
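To relate the roughly 1.2GB/s ceiling to the IOPS view of the same results, the conversion can be sketched as below. This is a back-of-the-envelope calculation, not part of the test suite. It assumes that 1GB/s means 10^9 bytes per second and that 4kB means 4096 bytes, neither of which the text states, and the function name is purely illustrative.

```python
# Assumptions (not stated in the article): 1 GB/s = 10**9 bytes/s, 4kB = 4096 bytes.
BLOCK_BYTES = 4 * 1024
THROUGHPUT_BYTES_PER_S = 1.2e9  # "around 1.2GB/s" from the text


def iops_from_throughput(bytes_per_s, block_bytes):
    """Convert a byte rate into I/O operations per second for a fixed block size."""
    return bytes_per_s / block_bytes


iops = iops_from_throughput(THROUGHPUT_BYTES_PER_S, BLOCK_BYTES)
budget_us = 1e6 / iops  # average time available per I/O for the single thread

print(f"{iops:,.0f} IOPS, about {budget_us:.2f} us per I/O")
# prints "292,969 IOPS, about 3.41 us per I/O"
```

Under those assumptions the single thread has to submit and complete one I/O roughly every 3.4 microseconds on average, which is why CPU clock speed matters at this level of performance.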

[Chart: Random Read Throughput. Selectable views: throughput (IOPS, MB/s) and latency (mean, median, 99th percentile, 99.999th percentile)]

For the Optane SSDs, the queue depth only needs to reach 4-6 in order to be near the highest attainable random read performance, and further increases in queue depth only add to the latency without improving throughput. The flash-based SSDs require queue depths well in excess of 32. Even long after the Optane SSDs have reached saturation and latency has begun to climb, the Optane SSDs continue to offer better QoS than the flash SSDs.
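Why extra queue depth only adds latency once throughput is flat follows from Little's law (requests in flight = throughput x mean latency). The sketch below is an idealized illustration rather than measured data. It assumes throughput stays pinned at the roughly 1.2GB/s ceiling, with 1GB/s taken as 10^9 bytes per second and 4kB as 4096 bytes, and the function name is hypothetical.

```python
# Assumption: saturated throughput of ~1.2 GB/s (10**9 bytes/s) at 4096-byte blocks.
SATURATED_IOPS = 1.2e9 / 4096


def mean_latency_us(queue_depth, iops):
    """Little's law rearranged: mean latency = requests in flight / throughput."""
    return queue_depth / iops * 1e6


for qd in (4, 8, 16, 32, 64):
    latency = mean_latency_us(qd, SATURATED_IOPS)
    print(f"QD{qd:<2} -> ~{latency:.0f} us mean latency at saturation")
```

In this model, doubling the queue depth past the saturation point doubles the mean latency and buys no additional throughput.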

4kB Random Writes

The Optane SSDs offer the best single-threaded random write performance, but the margins are much smaller than for random reads, thanks to the write caches on the flash-based SSDs. The flash SSDs have random write latencies that are only 2-3x higher than the Optane SSD's latency, and the throughput advantage of the Optane SSD at saturation is less than 20%.

[Chart: Random Write Throughput. Selectable views: throughput (IOPS, MB/s) and latency (mean, median, 99th percentile, 99.999th percentile)]

The Optane SSDs saturate around QD4, where the CPU becomes the bottleneck, and the flash-based SSDs follow suit between QD8 and QD16. Once all the drives are saturated at about the same throughput, the Optane SSDs offer far more consistent performance.

128kB Sequential Reads

With the large 128kB block size, the sequential read test doesn't hit a CPU/IOPS bottleneck like the random read test above. The Optane SSDs saturate at the rated throughput of about 2.4-2.5GB/s while the Micron 9100 MAX and the Intel P3608 scale to higher throughput.
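A rough calculation shows why the larger block size sidesteps the CPU/IOPS limit. The sketch below compares the I/O rate needed to sustain the Optane SSDs' rated 2.4-2.5GB/s at 128kB against the rate implied by the roughly 1.2GB/s single-threaded ceiling at 4kB. The unit conventions (1GB/s as 10^9 bytes per second, 1kB as 1024 bytes) and the helper name are assumptions for illustration.

```python
# Assumptions: 1 GB/s = 10**9 bytes/s, 1 kB = 1024 bytes.
BLOCK_4K = 4 * 1024
BLOCK_128K = 128 * 1024

# I/O rate at which the single thread became the bottleneck in the 4kB test.
cpu_bound_iops = 1.2e9 / BLOCK_4K


def needed_iops(bytes_per_s, block_bytes):
    """I/O operations per second required to sustain a byte rate at a block size."""
    return bytes_per_s / block_bytes


for gb_per_s in (2.4, 2.5):
    rate = needed_iops(gb_per_s * 1e9, BLOCK_128K)
    share = rate / cpu_bound_iops
    print(f"{gb_per_s} GB/s at 128kB needs ~{rate:,.0f} IOPS ({share:.1%} of the 4kB ceiling)")
```

Even at full sequential speed the thread issues only a small fraction of the I/Os per second that it handled in the 4kB test, so the drive rather than the CPU sets the limit.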

[Chart: Sequential Read Throughput. Selectable views: throughput (IOPS, MB/s) and latency (mean, median, 99th percentile, 99.999th percentile)]

The Optane SSDs reach their full sequential read speed at QD2, and the flash-based SSDs don't catch up until well after QD8. The 99th and 99.999th percentile latencies of the Optane SSDs are more than an order of magnitude lower when the drives are operating at their respective saturation points.

128kB Sequential Writes

Write caches again allow the flash-based SSDs to approach the write latency of the Optane SSDs, albeit at lower throughput. The Optane SSDs quickly exceed their specified 2GB/s sequential write throughput while the flash-based SSDs have to sacrifice low latency in order to reach high throughput.

[Chart: Sequential Write Throughput. Selectable views: throughput (IOPS, MB/s) and latency (mean, median, 99th percentile, 99.999th percentile)]

As with sequential reads, the Optane SSDs reach saturation at a mere QD2, while the flash-based SSDs need until around QD8 to scale up to full throughput. By the time the flash-based SSDs reach their maximum speed, their latency has at least doubled.

58 Comments

  • Elstar - Thursday, November 9, 2017

    I thought the whole point of 3D XPoint memory would be that it is DIMM friendly too.
    1) When will we see this in DIMM form?
    2) Would the DIMM version also need/have the reserved/spare capacity?
    3) Why is this spare capacity even needed? 1/6th seems like a potentially high internal failure rate (or other fundamental problem.)
  • PeachNCream - Thursday, November 9, 2017

    It seems like current 3D XPoint doesn't have enough endurance yet to sit in a DIMM slot unless it's gonna be just a storage drive in DIMM form factor. That and because we're only just now seeing early enterprise and retail products, I bet that we're gonna need another generation or two before we get DIMM versions. :(
  • Billy Tallis - Thursday, November 9, 2017

    Intel hasn't said much about 3D XPoint DIMMs this year, other than to promise we'll hear more next year.

    It's not clear how much fault tolerance Intel is trying to build in to the Optane SSDs with the extra capacity. A bit of it is necessary to support sector formats with protection information (e.g. 16 extra bytes per 512B sector, or 128B extra for each 4kB sector). Beyond that, there needs to be room for the drive's internal data structures, which aren't as complicated as a flash translation layer but still impose some space overhead. The rest is probably taken by a fairly simple erasure coding/ECC scheme, because it's almost impossible to do LDPC at the speed necessary for this drive. (That's also why DIMMs use simple ECC instead of more space-efficient codes.)
  • woggs - Thursday, November 9, 2017

    Most all intel SSDs have a parity die, so one full die likely provides internal raid protection of data. The rest is for ECC, internal system information, media management and mapping out defects... Impossible to know which of these is driving the actual spare implemented. I count 14 packages, so 1/14th (7%) is already the internal parity. 16% is big relative to nand consumer SSDs but comparable to enterprise. Doesn't seem particularly out of line or indicative of something wrong.
  • CheapSushi - Thursday, November 9, 2017

    Micron is probably working on the DIMM version.
  • woggs - Tuesday, November 14, 2017

    Intel is working on DIMMs... "Now, of course, SSDs are important, but in the long run, Intel also wants to have Optane 3D XPoint memory slot into the same sockets as DDR4 main memory, and Krzanich brought a mechanical model of an Optane DIMM to show off." https://www.nextplatform.com/2015/10/28/intel-show...
  • MajGenRelativity - Thursday, November 9, 2017

    I enjoyed the review. Keep up the good work!
  • melgross - Thursday, November 9, 2017

    I’m curious as to how this will perform when PCI 4 is out next year. That is, one with a PCI 4 interface. How throughput limiting is PCI 3 for this right now?
  • MajGenRelativity - Thursday, November 9, 2017

    It shouldn't be that limiting, as PCIe 3.0 x4 allows for a higher throughput than 2.4 GB/s. There could be some latency improvements (probably small), but I don't think throughput is the issue.
  • woggs - Thursday, November 9, 2017

    If the question is "could a drive be made to saturate gen 4?" then, yes, of course, if intel chooses to do so. Will require a whole drive. Latency is a more interesting question because that is what 3dxp is really providing. QD1 latency is <10us (impressive!). I don't expect that to improve since it should be limited by the 3dxp itself. The PCIe and driver overhead is probably 5us of that. Maybe gen 4 will improve that part.