Performance VS Transfer Size

Intel's 3D XPoint memory is not bound by the page and erase block structure of NAND flash. Intel hasn't disclosed what the native word size of the 3D XPoint memory array is, but the Optane SSD DC P4800X as a whole is optimized for 4kB or larger transfers. Real-world I/O isn't always constrained to just the 4kB and 128kB transfer sizes most of our synthetic benchmarks use, so the next several tests look at single-threaded QD1 performance of transfers ranging from a single 512-byte sector up to 1MB blocks. For random reads and writes, the impact of issuing commands with suboptimal alignment is also considered. Depending on how a drive is organized internally, unaligned accesses can significantly increase the amount of work the controller needs to do.
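To make the test matrix concrete, here is a minimal sketch of how such a sweep could be laid out. It is not the actual benchmark harness. The power-of-two spacing between 512B and 1MB, the helper names (`block_sizes`, `random_offset`, `units_touched`) and the use of 4kB as the internal unit are assumptions for illustration only. The last helper shows why a misaligned access can cost more: the same 4kB transfer overlaps two 4kB units once it starts on an odd 512B boundary.

```python
import random

SECTOR = 512             # smallest transfer tested, in bytes
PREFERRED = 4096         # alignment the drive is optimized for, per the text
MAX_BLOCK = 1024 * 1024  # largest transfer tested (1MB)


def block_sizes():
    """Transfer sizes from 512B to 1MB; power-of-two spacing is an assumption."""
    sizes = []
    size = SECTOR
    while size <= MAX_BLOCK:
        sizes.append(size)
        size *= 2
    return sizes


def random_offset(block, align, span, rng):
    """Pick a random offset that is a multiple of `align` and leaves room for `block`."""
    slots = (span - block) // align + 1
    return rng.randrange(slots) * align


def units_touched(offset, length, unit=PREFERRED):
    """Count the `unit`-sized chunks that an access overlaps (hypothetical cost model)."""
    first = offset // unit
    last = (offset + length - 1) // unit
    return last - first + 1


if __name__ == "__main__":
    rng = random.Random(0)
    span = 1 << 30  # hypothetical 1GB test region
    for block in block_sizes():
        aligned = random_offset(block, block, span, rng)     # block-aligned case
        unaligned = random_offset(block, SECTOR, span, rng)  # 512B-aligned case
        print(block, units_touched(aligned, block), units_touched(unaligned, block))
```

Whether the extra unit matters in practice depends on how the drive is organized internally, which Intel has not disclosed.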

[Chart: Random Read, block-aligned and 512B-aligned; vertical axis linear or logarithmic]

The random read performance of the Optane SSDs is completely beyond the reach of the flash-based SSDs, even for large block sizes. Even when the reads aren't aligned to the block size or to the drive's preferred 4kB alignment, performance holds up well until the block size reaches 128kB, the largest the Optane SSD can transfer from a single command. At that point, the Optane SSDs slow down slightly before performance continues to grow at a more moderate pace.
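The 128kB per-command limit means a larger request has to be issued as several commands. The sketch below shows that splitting in its simplest form. It assumes the host or driver does the splitting and that chunks are cut at fixed 128kB steps from the start of the request. The function name `split_request` is hypothetical and does not come from any real driver.

```python
MAX_TRANSFER = 128 * 1024  # per-command limit cited in the text, in bytes


def split_request(offset, length, max_transfer=MAX_TRANSFER):
    """Split one request into (offset, length) commands no larger than max_transfer."""
    commands = []
    end = offset + length
    while offset < end:
        chunk = min(max_transfer, end - offset)
        commands.append((offset, chunk))
        offset += chunk
    return commands


if __name__ == "__main__":
    # A 1MB read becomes eight 128kB commands under these assumptions.
    for cmd in split_request(0, 1024 * 1024):
        print(cmd)
```

At QD1 each extra command adds its own fixed overhead, which is one plausible reason throughput grows more slowly past 128kB.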

[Chart: Random Write, block-aligned and 512B-aligned; vertical axis linear or logarithmic]

The random write performance of the Optane SSDs is not much better than the Intel P3700, especially for writes smaller than 4kB. The Intel P3608 and Micron 9100 MAX are slower across the entire range of block sizes. For unaligned writes of larger block sizes, the Optane SSDs hit a hard limit at around 1.3GB/s while properly aligned writes can approach 2GB/s for large blocks.

 

[Chart: Sequential Read; vertical axis linear or logarithmic]

The Intel P3700 delivers a slightly higher QD1 sequential read throughput for small block sizes, likely due to the controller caching whole NAND pages in RAM. At the larger block sizes more typically used for sequential I/O, the Optane SSDs are on top, and many of the flash-based SSDs cannot reach full performance without going beyond QD1.

[Chart: Sequential Write; vertical axis linear or logarithmic]

None of the SSDs perform particularly well for writes smaller than 4kB, but the Optane SSDs do have a clear advantage. As transfer size grows, the Optane SSDs pick up speed faster than the flash-based SSDs. The Intel P3608 is the first to start hitting a performance ceiling, while the Micron 9100 is almost able to catch up to the Optane SSDs.


58 Comments


  • Elstar - Thursday, November 9, 2017

    I thought the whole point of 3D XPoint memory would be that it is DIMM friendly too.
    1) When will we see this in DIMM form?
    2) Would the DIMM version also need/have the reserved/spare capacity?
    3) Why is this spare capacity even needed? 1/6th seems like a potentially high internal failure rate (or other fundamental problem.)
  • PeachNCream - Thursday, November 9, 2017

    It seems like current 3D XPoint doesn't have enough endurance yet to sit in a DIMM slot unless it's gonna be just a storage drive in DIMM form factor. That and because we're only just now seeing early enterprise and retail products, I bet that we're gonna need another generation or two before we get DIMM versions. :(
  • Billy Tallis - Thursday, November 9, 2017

    Intel hasn't said much about 3D XPoint DIMMs this year, other than to promise we'll hear more next year.

    It's not clear how much fault tolerance Intel is trying to build in to the Optane SSDs with the extra capacity. A bit of it is necessary to support sector formats with protection information (eg. 16 extra bytes per 512B sector, or 128B extra for each 4kB sector). Beyond that, there needs to be room for the drive's internal data structures, which aren't as complicated as a flash translation layer but still impose some space overhead. The rest is probably taken by a fairly simple erasure coding/ECC scheme, because it's almost impossible to do LDPC at the speed necessary for this drive. (That's also why DIMMs use simple ECC instead of more space-efficient codes.)
  • woggs - Thursday, November 9, 2017

    Most all intel SSDs have a parity die, so one full die likely provides internal raid protection of data. The rest is for ECC, internal system information, media management and mapping out defects... Impossible to know which of these is driving the actual spare implemented. I count 14 packages, so 1/14th (7%) is already the internal parity. 16% is big relative to nand consumer SSDs but comparable to enterprise. Doesn't seem particularly out of line or indicative of something wrong.
  • CheapSushi - Thursday, November 9, 2017

    Micron is probably working on the DIMM version.
  • woggs - Tuesday, November 14, 2017

    Intel is working on DIMMs... "Now, of course, SSDs are important, but in the long run, Intel also wants to have Optane 3D XPoint memory slot into the same sockets as DDR4 main memory, and Krzanich brought a mechanical model of an Optane DIMM to show off." https://www.nextplatform.com/2015/10/28/intel-show...
  • MajGenRelativity - Thursday, November 9, 2017

    I enjoyed the review. Keep up the good work!
  • melgross - Thursday, November 9, 2017

    I’m curious as to how this will perform when PCI 4 is out next year. That is, one with a PCI 4 interface. How throughput limiting is PCI 3 for this right now?
  • MajGenRelativity - Thursday, November 9, 2017

    It shouldn't be that limiting, as PCIe 3.0 x4 allows for a higher throughput than 2.4 GB/s. There could be some latency improvements (probably small), but I don't think throughput is the issue
  • woggs - Thursday, November 9, 2017

    If the question is "could a drive be made to saturate gen 4?" then, yes, of course, if intel chooses to do so. Will require a whole drive. Latency is a more interesting question because that is what 3dxp is really providing. QD1 latency is <10us (impressive!). I don't expect that to improve since it should be limited by the 3dxp itself. The PCIe and driver overhead is probably 5us of that. Maybe gen 4 will improve that part.
