Performance vs. Transfer Size

Intel's 3D XPoint memory is not bound by the page and erase block structure of NAND flash. Intel hasn't disclosed what the native word size of the 3D XPoint memory array is, but the Optane SSD DC P4800X as a whole is optimized for 4kB or larger transfers. Real-world I/O isn't always constrained to just the 4kB and 128kB transfer sizes most of our synthetic benchmarks use, so the next several tests look at single-threaded QD1 performance of transfers ranging from a single 512-byte sector up to 1MB blocks. For random reads and writes, the impact of issuing commands with suboptimal alignment is also considered. Depending on how a drive is organized internally, unaligned accesses can significantly increase the amount of work the controller needs to do.
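To make the alignment point concrete, here is a minimal sketch of the sweep of transfer sizes and of how many internal units a transfer overlaps. The 4kB unit is only an assumption taken from the drive's stated 4kB optimization; Intel has not disclosed the native word size, and the function names are hypothetical.

```python
# Hypothetical internal unit: the drive is optimized for 4kB transfers,
# but the true native word size of 3D XPoint is undisclosed.
ASSUMED_UNIT = 4096

# Transfer sizes from a single 512-byte sector up to 1MB, doubling each step.
TRANSFER_SIZES = [512 << i for i in range(12)]


def units_touched(offset: int, length: int, unit: int = ASSUMED_UNIT) -> int:
    """Count the unit-sized blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // unit
    last = (offset + length - 1) // unit
    return last - first + 1


def is_aligned(offset: int, unit: int = ASSUMED_UNIT) -> bool:
    """True when the starting offset falls on a unit boundary."""
    return offset % unit == 0
```

Under this assumption, a 4kB transfer that starts on a 4kB boundary overlaps one unit, while the same transfer shifted by a single 512-byte sector overlaps two, which is the kind of extra controller work an unaligned access can cause.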

Random Read (charts: block-aligned and 512B-aligned transfers)

The random read performance of the Optane SSDs is completely beyond the reach of the flash-based SSDs, even for large block sizes. Even when the reads aren't aligned to the block size or to the drive's preferred 4kB alignment, performance is still great until the block size reaches 128kB, the largest transfer the Optane SSD can handle in a single command. At that point, the Optane SSDs slow down slightly before performance continues to grow at a more moderate pace.
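The 128kB per-command limit means anything larger has to be issued as several commands. The sketch below shows one simple way such a split could look; the function name is hypothetical, and the way the host driver actually divides requests is an assumption rather than something measured here.

```python
# Largest transfer the Optane SSD accepts in one command, per the text above.
MAX_TRANSFER = 128 * 1024


def split_request(offset: int, length: int, max_transfer: int = MAX_TRANSFER):
    """Split one logical transfer into (offset, size) commands of at most max_transfer bytes."""
    commands = []
    end = offset + length
    while offset < end:
        size = min(max_transfer, end - offset)
        commands.append((offset, size))
        offset += size
    return commands
```

With this split, a 1MB transfer becomes eight 128kB commands, so beyond 128kB each additional doubling of block size adds more commands rather than making a single command larger.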

Random Write (charts: block-aligned and 512B-aligned transfers)

The random write performance of the Optane SSDs is not much better than that of the Intel P3700, especially for writes smaller than 4kB. The Intel P3608 and Micron 9100 MAX are slower across the entire range of block sizes. For unaligned writes of larger block sizes, the Optane SSDs hit a hard limit at around 1.3GB/s, while properly aligned writes can approach 2GB/s for large blocks.
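Because these tests run at QD1 with a single thread, only one command is in flight at a time, so throughput and average per-command latency are two views of the same number. The helper names below are hypothetical, and the sketch assumes 1GB/s means 10^9 bytes per second.

```python
def qd1_latency_us(throughput_bytes_per_s: float, block_size: int) -> float:
    """Average per-command latency in microseconds implied by QD1 throughput."""
    return block_size / throughput_bytes_per_s * 1e6


def qd1_iops(throughput_bytes_per_s: float, block_size: int) -> float:
    """Commands completed per second at the given throughput and block size."""
    return throughput_bytes_per_s / block_size


# Illustrative inputs: the ~1.3GB/s unaligned ceiling and the ~2GB/s aligned
# figure quoted above, both evaluated for 1MB blocks (the block size is an assumption).
ONE_MB = 1024 * 1024
unaligned_us = qd1_latency_us(1.3e9, ONE_MB)
aligned_us = qd1_latency_us(2.0e9, ONE_MB)
```

Read this way, the gap between the two ceilings corresponds to each large unaligned write taking noticeably longer to complete than an aligned write of the same size.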

Sequential Read (chart)

The Intel P3700 delivers slightly higher QD1 sequential read throughput for small block sizes, likely due to the controller caching whole NAND pages in RAM. At the larger block sizes more typically used for sequential I/O, the Optane SSDs are on top, and many of the flash-based SSDs cannot reach full performance without going beyond QD1.

Sequential Write (chart)

None of the SSDs perform particularly well for writes smaller than 4kB, but the Optane SSDs do have a clear advantage. As transfer size grows, the Optane SSDs pick up speed faster than the flash-based SSDs. The Intel P3608 is the first to start hitting a performance ceiling, while the Micron 9100 is almost able to catch up to the Optane SSDs.

58 Comments

  • tuxRoller - Friday, November 10, 2017 - link

    Since this is for enterprise, the os vendor would be the one responsible (so, yes, third party) and one of the reasons why you pay them ridiculous support fees is for them to be your single point of contact for most issues.
  • tuxRoller - Friday, November 10, 2017 - link

    Very nice write-up.
    Might it be possible for us to get an idea of the difference in cell access times by running a couple tests on a loop device, and, even better, purely dram-based storage accessed over pcie?
  • Pork@III - Friday, November 10, 2017 - link

    Has no normal only speed test? What are these creepy creations of this vc that?
  • romrunning - Friday, November 10, 2017 - link

    Are there any tests of the 4800X in a virtual host? Either Hyper-V or ESX, running multiple server OS clients with a variety of workloads. With the kind of low latency shown, I'd love to see how much more responsive Optane is compared to all flash storage like a P3608. Sort of a "rising tide floats all ships" kind of improvement, I hope.
  • Klimax - Sunday, November 12, 2017 - link

    That's nice review. How about some test using Windows too. (Aka something with more advanced I/O subsystem)
  • Billy Tallis - Monday, November 13, 2017 - link

    I'm not sure what you mean. Nobody seriously considers the Windows I/O system to be more advanced than what Linux provides. Even Intel's documentation states that the best latency they can get out of the Optane SSD on Windows is a few microseconds slower than on the Linux NVMe driver, and on Linux a few more microseconds can be saved using SPDK.
  • tuxRoller - Tuesday, November 14, 2017 - link

    "Advanced" may be the wrong way to look at it because ntkrnl can perform both sync and async operations, while Linux is essentially a sync-based kernel (the limitations surrounding its aio system are legendary). However, by focusing on doing that one thing well the block subsystem has become highly optimized for enterprise workloads.
    Btw, is there any chance you could run that block system (and nvme protocol, if possible) overhead test i asked about?
