Mixed Read/Write Performance

Workloads consisting of a mix of reads and writes can be particularly challenging for flash-based SSDs. When a write operation interrupts a string of reads, it blocks access to at least one flash chip for a period of time that is substantially longer than a read operation takes. This hurts the latency of any read operations that were waiting on that chip, and with enough write operations, throughput can be severely impacted. If the write command triggers an erase operation on one or more flash chips, the traffic jam is many times worse.

The occasional read interrupting a string of write commands doesn't necessarily cause much of a backlog, because writes are usually buffered by the controller anyway. But depending on how much unwritten data the controller is willing to buffer, and for how long, a burst of reads could force the drive to begin flushing outstanding writes before they've all been coalesced into optimally sized writes.
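
To make the mechanism concrete, the toy simulation below models a single flash chip with a FIFO command queue and shows how one erase-plus-program pair in the middle of a read stream inflates the latency of every queued read behind it. The timings are illustrative assumptions, not measurements of any particular drive.

```python
# Toy model of one flash chip with a FIFO command queue. Timings are
# illustrative assumptions (roughly 100us read, 1ms program, 5ms erase),
# not measurements of any real drive.
READ_US, PROGRAM_US, ERASE_US = 100, 1000, 5000
COST = {"read": READ_US, "write": PROGRAM_US, "erase": ERASE_US}

def chip_latencies(ops):
    """ops: (arrival_us, kind) tuples sorted by arrival. Returns latencies."""
    busy_until = 0
    out = []
    for arrival, kind in ops:
        start = max(arrival, busy_until)          # wait for the chip to free up
        busy_until = start + COST[kind]
        out.append((kind, busy_until - arrival))  # queueing + service time
    return out

# A stream of reads arriving every 200us, with an erase+program pair
# injected at t=1000us. Every read queued behind the erase waits for
# milliseconds instead of microseconds.
ops = sorted([(t, "read") for t in range(0, 2000, 200)]
             + [(1000, "erase"), (1001, "write")])
for kind, lat in chip_latencies(ops):
    print(f"{kind:5s} latency {lat:5d} us")
```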

These interference effects still apply to the Optane SSD's 3D XPoint memory, but with greatly reduced severity: 3D XPoint writes in place with no erase cycle, and its writes complete far faster than flash program operations. How much an incoming write disrupts pending reads thus depends mostly on how the Optane SSD's controller manages access to the 3D XPoint memory.

Queue Depth 4

Our first mixed workload test is an extension of what Intel describes in their specifications for mixed-workload throughput. A total queue depth of 16 is achieved using four worker threads, each performing a mix of random reads and random writes at a queue depth of four. Instead of testing just a 70% read mixture, the full range from pure reads to pure writes is tested in 10% increments.
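
A rough sketch of how such a sweep can be scripted is shown below. This is a hypothetical harness, not the tool used for our testing; the device path is a placeholder, each worker keeps only one I/O outstanding for brevity (the real test keeps four per worker), and a production harness would bypass the page cache with O_DIRECT and aligned buffers, as fio does.

```python
# Hypothetical sketch of the mixed random I/O sweep; not the review's
# actual test tool. Four workers issue 4kB random reads or writes, with
# the read fraction swept from 100% to 0% in 10% steps. Each worker
# keeps one I/O outstanding for brevity (the real test keeps four, for
# a total queue depth of 16), and the page cache is not bypassed here.
import os, random, threading, time

DEV = "/dev/nvme0n1"        # placeholder device path
BLOCK = 4096                # 4kB transfers
SPAN = 64 * 2**30           # test over the first 64GiB of the drive
SECONDS = 30                # duration of each mix

def worker(fd, read_pct, results):
    deadline = time.monotonic() + SECONDS
    ios = 0
    while time.monotonic() < deadline:
        offset = random.randrange(SPAN // BLOCK) * BLOCK
        if random.random() * 100 < read_pct:
            os.pread(fd, BLOCK, offset)
        else:
            os.pwrite(fd, os.urandom(BLOCK), offset)
        ios += 1
    results.append(ios)

fd = os.open(DEV, os.O_RDWR)
for read_pct in range(100, -1, -10):
    results = []
    threads = [threading.Thread(target=worker, args=(fd, read_pct, results))
               for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(f"{read_pct:3d}% reads: {sum(results) / SECONDS:9.0f} IOPS")
os.close(fd)
```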

[Graph: Mixed Random Read/Write Throughput (IOPS / MB/s)]

The Optane SSD's throughput does indeed show the bathtub curve shape that is common for this sort of mixed workload test, but the sides are quite shallow and the minimum (at 40% reads/60% writes) is still 83% of the peak throughput (which occurs with the all-reads workload). While the Optane SSD operates near 2GB/s, the flash SSDs spend most of the test only slightly above 500MB/s. When the portion of writes increases to 70%, the two flash SSDs begin to diverge: the Intel P3700 loses almost half its throughput and recovers only a little of it during the remainder of the test, while the Micron 9100 begins to accelerate and comes much closer to the Optane SSD's level of performance.

[Graph: Random Read Latency (mean, median, 99th percentile, 99.999th percentile)]

The median latency curves for the two flash SSDs show a substantial drop when the median operation switches from a read to a cacheable write. The P3700's median latency even briefly drops below that of the Optane SSD, though the Optane SSD is handling several times the throughput at that point. The 99th and 99.999th percentile latencies for the Optane SSD are relatively flat after jumping slightly when writes are first introduced to the mix. The flash SSDs have far higher 99th and 99.999th percentile latencies through the middle of the test, but far fewer outliers during the pure-read and pure-write phases.
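
For reference, the percentile curves plotted here are plain order statistics over the recorded per-I/O completion times. The sketch below, using synthetic latency samples (illustrative numbers only), shows how the metrics are computed and why credible 99.999th percentile figures require very long runs.

```python
# The latency percentiles plotted above are order statistics over the
# recorded per-I/O completion times. The sample data here is synthetic:
# ~99% fast reads plus ~1% slow outliers stuck behind writes/erases.
import random, statistics

random.seed(0)
samples = [abs(random.gauss(100, 10)) for _ in range(990_000)]
samples += [abs(random.gauss(3000, 500)) for _ in range(10_000)]

def percentile(sorted_data, pct):
    """Nearest-rank percentile of an ascending-sorted list."""
    rank = min(len(sorted_data) - 1, int(len(sorted_data) * pct / 100))
    return sorted_data[rank]

samples.sort()
print(f"{'mean':9s}{statistics.fmean(samples):8.0f} us")
for pct in (50, 99, 99.999):
    print(f"{f'p{pct}':9s}{percentile(samples, pct):8.0f} us")
# Note: resolving the 99.999th percentile credibly requires millions of
# I/Os; with 1M samples, only about 10 of them lie above that mark.
```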

Adding Writes to a Drive that is Reading

The next mixed workload test takes a different approach and is loosely based on the Aerospike Certification Tool. The read workload is constant throughout the test: a single thread performing 4kB random reads at QD1. Threads performing 4kB random writes at QD1, each throttled to 100MB/s, are added to the mix until the drive's throughput is saturated. As the write workload gets heavier, random read throughput drops and read latency increases.
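
The structure of that test can be sketched as follows. This is a hypothetical harness loosely mirroring the description above, not the actual Aerospike Certification Tool; the device path is a placeholder, and the writers' 100MB/s throttle is implemented with simple token-bucket pacing, one of several reasonable choices.

```python
# Hypothetical sketch of the ACT-style test, not the real Aerospike
# Certification Tool: one unthrottled QD1 random-read thread, plus N
# writer threads each paced to 100 MB/s of 4kB random writes.
import os, random, threading, time

DEV = "/dev/nvme0n1"          # placeholder device path
BLOCK, SPAN = 4096, 64 * 2**30
WRITE_RATE = 100 * 10**6      # bytes/second per writer thread

def rand_offset():
    return random.randrange(SPAN // BLOCK) * BLOCK

def reader(fd, stop, latencies):
    while not stop.is_set():
        t0 = time.monotonic()
        os.pread(fd, BLOCK, rand_offset())
        latencies.append(time.monotonic() - t0)

def throttled_writer(fd, stop):
    # Token-bucket pacing: issue however many writes are needed to stay
    # at WRITE_RATE bytes/second, then sleep briefly.
    start, written = time.monotonic(), 0
    while not stop.is_set():
        while written < (time.monotonic() - start) * WRITE_RATE:
            os.pwrite(fd, os.urandom(BLOCK), rand_offset())
            written += BLOCK
        time.sleep(0.001)

fd = os.open(DEV, os.O_RDWR)  # a real harness would use O_DIRECT
for n_writers in range(0, 9):  # keep adding writers until saturation
    stop, lats = threading.Event(), []
    threads = [threading.Thread(target=reader, args=(fd, stop, lats))]
    threads += [threading.Thread(target=throttled_writer, args=(fd, stop))
                for _ in range(n_writers)]
    for t in threads: t.start()
    time.sleep(60)
    stop.set()
    for t in threads: t.join()
    mean_us = 1e6 * sum(lats) / len(lats)
    print(f"{n_writers} writers: {len(lats)/60:8.0f} read IOPS, "
          f"mean read latency {mean_us:6.0f} us")
os.close(fd)
```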

The three SSDs have very different capacities for random write throughput: the Intel P3700 tops out around 400MB/s, the Micron 9100 can sustain 1GB/s, and the Intel Optane SSD DC P4800X can sustain almost 2GB/s. The Optane SSD's average read latency increases by a factor of five, but that is still low enough to provide about 25k read IOPS (at QD1, IOPS is simply the reciprocal of mean latency, so 25k IOPS corresponds to a mean read latency of roughly 40µs). The flash SSDs both experience read latency growing by an order of magnitude as write throughput approaches saturation. Even though the Intel P3700 has a much lower capacity for random writes, it provides slightly lower random read latency at its saturation point than the Micron 9100. When the two flash SSDs are compared under the same write load, the Micron 9100 provides far more random read throughput.

Comments

  • ddriver - Friday, April 21, 2017 - link

    *450 ns, by which I mean lower by 450 ns. And the current xpoint controller is nowhere near hitting the bottleneck of PCIE. It would take a controller that is at least 20 times faster than the current one to even get to the point where PCIE is a bottleneck. And even faster to see any tangible benefit from connecting xpoint directly to the memory controller.

    I'd rather have some nice 3D SLC (better than xpoint in literally every aspect) on PCIE for persistent storage and RAM in the dimm slots. Hyped as superior, xpoint is actually nothing but a big compromise. Peak bandwidth is too low even compared to NVME NAND, latency is way too high and endurance is way too low for working memory. Low queue depth performance is good, but credit there goes to the controller; such a controller would hit even better performance with SLC nand. Smarter block management could also double the endurance advantage SLC already has over xpoint.
  • mdriftmeyer - Saturday, April 22, 2017 - link

    ddriver is spot on. Just to clarify an earlier comment: he's correct, and IntelUser2000 is out of his league.
  • mdriftmeyer - Saturday, April 22, 2017 - link

    Spot on.
  • tuxRoller - Friday, April 21, 2017 - link

    We don't know how much slower the media is than dram right now.
    We know that using dram over nvme has similar (though much better worst-case) perf to this.
    See my other post regarding polling and latency.
  • bcronce - Saturday, April 22, 2017 - link

    Re-reading, I see it says "typical" latency is under 10us, placing it in spitting distance of DDR3/4. It's the 99.999th percentile that is 60us at QD1. At QD16, the 99.999th percentile is 140us. That means it takes only 140us to service 16 requests, i.e. under 9us each, which is pretty much the same as 10us.

    Read QD1 4KiB bandwidth is only about 500MiB/s, but at QD8 it's about 2GiB/s, which puts it on par with DDR4-2400.
  • ddriver - Saturday, April 22, 2017 - link

    "placing it in spitting distance of DDR3/4"

    I hope you do realize that dram latency is like 50 NANOseconds, and 1 MICROsecond is 1000 NANOseconds.

    So 10 us is actually 200 times as much as 50 ns. Thus making hypetane about 200 times slower in access latency. Not 200%, 200X.
  • tuxRoller - Saturday, April 22, 2017 - link

    Yes, the dram media is that fast but when it's exposed through nvme it has the latency characteristics that bcronce described.
  • wumpus - Sunday, April 23, 2017 - link

    That's only on a page hit. For the type of operations that 3dxpoint is looking at (4k or so) you won't find it on an open page, and thus it takes 2-3 times as long until the data is ready.

    That still leaves you with ~100x the latency. And we are still wondering whether losing the PCIe controller would make any significant difference to this number (one problem is that even if Intel/Micron magically fixed this, the endurance is only slightly better than SLC's, and the memory would quickly die if used as main memory).
  • ddriver - Sunday, April 23, 2017 - link

    Endurance for the initial batch, postulated from intel's warranty, would be around 30k PE cycles, and 50k for the upcoming generation. That's not "only slightly better than SLC", as SLC has 100k PE cycles of endurance. But the 100k figure is somewhat old, and endurance goes down with process node. So at a comparable process, SLC might be going down too, approaching 50k.

    It remains to be seen; the lousy industry is penny-pinching and producing artificial NAND shortages to milk people as much as possible, and pretty much all the wafers are going into TLC, some MLC, and, why oh why, QLC trash.

    I guess they are saving the best for last. 3D SLC will address the lower density: samsung currently has a 2TB MLC M.2, so 1TB is perfectly doable via 3D SLC. I am guessing samsung's Z-NAND will be exactly that - SLC making a long overdue comeback.
  • tuxRoller - Sunday, April 23, 2017 - link

    The endurance issue is, imho, the biggest concern right now.
