The Intel Optane Memory (SSD) Preview: 32GB of Kaby Lake Caching

by Billy Tallis on April 24, 2017 12:00 PM EST

Random Read

Random read speed is the most difficult performance metric for flash-based SSDs to improve on. There is very limited opportunity for a drive to do useful prefetching or caching, and parallelism from multiple dies and channels can only help at higher queue depths. The NVMe protocol reduces overhead slightly, but even high-end enterprise PCIe SSDs can struggle to offer random read throughput that would saturate a SATA link.

Real-world random reads are often blocking operations for an application, such as when traversing the filesystem to look up which logical blocks store the contents of a file. Opening even a non-fragmented file can require the OS to perform a chain of several random reads, and since each is dependent on the result of the last, they cannot be queued.
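
The cost of such a dependent chain can be illustrated with a minimal Python sketch. The function names and any latency figure passed in are hypothetical and chosen for illustration, not measurements from this review, and the queued case is idealized: it assumes the drive completes a full batch of independent reads in one latency period.

```python
def chain_time_us(num_reads: int, latency_us: float) -> float:
    """Total time for reads that must be issued one after another.

    Each read needs the result of the previous one, so the effective
    queue depth is 1 and the latencies simply add up.
    """
    return num_reads * latency_us


def queued_time_us(num_reads: int, latency_us: float, queue_depth: int) -> float:
    """Idealized time for independent reads issued queue_depth at a time.

    Assumption: every batch of up to queue_depth reads completes in a
    single latency period, which real drives only approximate.
    """
    batches = -(-num_reads // queue_depth)  # ceiling division
    return batches * latency_us
```

With a hypothetical 100µs read latency, a chain of four dependent reads takes four latency periods, while four independent reads at queue depth 4 would ideally take one. This is why low-queue-depth latency matters so much for this kind of access pattern.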

These tests were conducted on the Optane Memory as a standalone SSD, not in any caching configuration.

Queue Depth 1

Our first test of random read performance looks at the dependence on transfer size. Most SSDs focus on 4kB random access as that is the most common page size for virtual memory systems and it is a common filesystem block size. For our test, each transfer size was tested for four minutes and the statistics exclude the first minute. The drives were preconditioned to steady state by filling them with 4kB random writes twice over.
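
The timing rule described above (a four-minute run with the first minute discarded) can be sketched as follows. This is a minimal illustration, not the benchmark tooling used for this review; the function name and the representation of the data as a list of completion timestamps are assumptions.

```python
def steady_state_iops(completion_times_s, warmup_s=60.0, duration_s=240.0):
    """Average I/O operations per second, ignoring the warm-up window.

    completion_times_s: completion timestamps in seconds since the start
    of the run (hypothetical input format).
    Only completions in [warmup_s, duration_s) are counted.
    """
    counted = sum(1 for t in completion_times_s if warmup_s <= t < duration_s)
    return counted / (duration_s - warmup_s)
```

Discarding the first minute keeps any initial burst behavior out of the reported average, so the result reflects the three minutes that follow.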


The Optane Memory module manages to provide slightly higher performance than even the P4800X for small random reads, though it levels out at about half the performance for larger transfers. The Samsung 960 EVO starts out about ten times slower than the Optane Memory but narrows the gap in the second half of the test. The Crucial MX300 is behind the Optane Memory by more than a factor of ten through most of the test.

Queue Depth >1

Next, we consider 4kB random read performance at queue depths greater than one. A single-threaded process is not capable of saturating the Optane SSD DC P4800X with random reads so this test is conducted with up to four threads. The queue depths of each thread are adjusted so that the queue depth seen by the SSD varies from 1 to 16. The timing is the same as for the other tests: four minutes for each tested queue depth, with the first minute excluded from the statistics.
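
One way to spread a target queue depth across up to four threads is sketched below. The text above does not state how the per-thread queue depths were chosen, so the even split used here is an assumption, and `split_queue_depth` is a hypothetical helper rather than part of any benchmark tool.

```python
def split_queue_depth(total_qd: int, max_threads: int = 4) -> list[int]:
    """Divide a total queue depth across at most max_threads threads.

    Assumption: the load is split as evenly as possible, and no thread
    is started with a queue depth of zero.
    """
    if total_qd < 1 or max_threads < 1:
        raise ValueError("total_qd and max_threads must be at least 1")
    threads = min(total_qd, max_threads)
    base, extra = divmod(total_qd, threads)
    # The first `extra` threads each carry one additional outstanding request.
    return [base + 1 if i < extra else base for i in range(threads)]
```

Whatever the split, the per-thread values sum to the queue depth the SSD sees, so a total of 16 across four threads means four outstanding requests per thread.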

The SATA, flash NVMe and two Optane products are each clearly occupying different regimes of performance, though there is some overlap between the two Optane devices. Except at QD1, the Optane Memory offers lower throughput and higher latency than the P4800X. By QD16 the Samsung 960 EVO is able to exceed the throughput of the Optane Memory at QD1, but only with an order of magnitude more latency.


Comparing random read throughput of the Optane SSDs against the flash SSDs at low queue depths requires plotting on a log scale. The Optane Memory's lead over the Samsung 960 EVO is much larger than the 960 EVO's lead over the Crucial MX300. Even at QD16 the Optane Memory holds on to a 2x advantage over the 960 EVO and a 6x advantage over the MX300. Over the course of the test from QD1 to QD16, the Optane Memory's random read throughput roughly triples.


For mean and median random read latency, the two Optane drives are relatively close at low queue depths and far faster than either flash SSD. The 99th and 99.999th percentile latencies of the Samsung 960 EVO are only about twice as high as the Optane Memory while the Crucial MX300 falls further behind with outliers in excess of 20ms.
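
For readers unfamiliar with the four statistics compared here, the sketch below shows one way to compute them from a list of per-request latencies. It uses the nearest-rank percentile definition, which is an assumption; the review does not say which percentile method was used, and the function names are hypothetical.

```python
import math
import statistics


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending list of samples.

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def summarize(latencies_us):
    """Mean, median, 99th and 99.999th percentile of latency samples."""
    ordered = sorted(latencies_us)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": percentile(ordered, 99.0),
        "p99.999": percentile(ordered, 99.999),
    }
```

The 99.999th percentile is only meaningful with a very large sample count: with fewer than 100,000 samples it is simply the single slowest request, which is why the worst outliers dominate that figure.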

Random Write

Flash memory write operations are far slower than read operations. This is not always reflected in the performance specifications of SSDs because writes can be deferred and combined, allowing the SSD to signal completion before the data has actually moved from the drive's cache to the flash memory. Consumer workloads consist of far more reads than writes, but there are enough sources of random writes that they also matter to everyday interactive use. These tests were conducted on the Optane Memory as a standalone SSD, not in any caching configuration.

Queue Depth 1

As with random reads, we first examine QD1 random write performance of different transfer sizes. 4kB is usually the most important size, but some applications will make smaller writes when the drive has a 512B sector size. Larger transfer sizes make the workload somewhat less random, reducing the amount of bookkeeping the SSD controller needs to do and generally allowing for increased performance.


As with random reads, the Optane Memory holds a slight advantage over the P4800X for the smallest transfer sizes, but the enterprise Optane drive completely blows away the consumer Optane Memory for larger transfers. The consumer flash SSDs perform quite similarly in this steady-state test and are consistently about an order of magnitude slower than the Optane Memory.

Queue Depth >1

The test of 4kB random write throughput at different queue depths is structured identically to its counterpart random read test above. Queue depths from 1 to 16 are tested, with up to four threads used to generate this workload. Each tested queue depth is run for four minutes and the first minute is ignored when computing the statistics.

With the Optane SSD DC P4800X included on this graph, the two flash SSDs have barely perceptible random write throughput, and the Optane Memory's throughput and latency both fall roughly in the middle of the gap between the P4800X and the flash SSDs. The random write latency of the Optane Memory is more than twice that of the P4800X at QD1 and is close to the latency of the Samsung 960 EVO, while the Crucial MX300 starts at about twice that latency.


When testing across the range of queue depths and at steady state, the 525GB Crucial MX300 is always delivering higher throughput than the Samsung 960 EVO, but with substantial inconsistency at higher queue depths. The Optane Memory almost doubles in throughput from QD1 to QD2, and is completely flat thereafter while the P4800X continues to improve until QD8.


The Optane Memory and Samsung 960 EVO start out with the same median latency at QD1 and QD2 of about 20µs. The Optane Memory's latency increases linearly with queue depth after that due to its throughput being saturated, but the 960 EVO's latency stays lower until near the end of the test. The Samsung 960 EVO has relatively poor 99th percentile latency to begin with and is joined by the Crucial MX300 once it has saturated its throughput, while the Optane Memory's latency degrades gradually in the face of overwhelming queue depths. The 99.999th percentile latency of the flash-based consumer SSDs is about 300-400 times that of the Optane Memory.
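
The linear growth in latency once throughput is saturated follows from Little's law, which ties the average number of requests in flight to throughput and average latency. The sketch below is illustrative only: the function name is hypothetical, any throughput figure passed in is made up rather than measured, and the relation strictly describes mean latency rather than the median discussed above.

```python
def mean_latency_us(queue_depth: int, iops: float) -> float:
    """Mean latency in microseconds implied by Little's law.

    queue_depth = iops * latency, so latency = queue_depth / iops.
    Assumes the queue is kept full for the whole measurement.
    """
    return queue_depth * 1_000_000 / iops
```

If throughput stays pinned at some hypothetical saturated value, every doubling of queue depth doubles the implied mean latency, because the extra requests can only wait in line.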

110 Comments

BrokenCrayons - Monday, April 24, 2017 - link
A desktop Linux distro would fit nicely on it with room for local file storage. I've lived pretty happily with a netbook that had a 32GB compact flash card on a 2.5 inch SATA adapter that had Linux Mint 17.3 on it. The OS and default applications used less than 8GB of space. I didn't give it a swap partition since 2GB was more than enough RAM under Linux (system was idle at less than 200MB and I never saw it demand more than 1.2GB when I was multi-tasking). As such, there was lots of space to store my music, books, and pics of my cat.
ddriver - Monday, April 24, 2017 - link
And imagine how well DOS will run. And you have ample space for application and data storage. 32 gigs - that's what dreams were made of in the early 90s. Your music, books and cat pics are just icing on the cake. Let me guess, 64 kbit mp3s right?
BrokenCrayons - Monday, April 24, 2017 - link
I'm impressed at the level of your insecurity.
mkozakewich - Thursday, April 27, 2017 - link
I've made the decision to never read any comment with his name above, but sometimes I accidentally miss it.
DanNeely - Monday, April 24, 2017 - link
Looking at the size of it, I'm wondering why they didn't make a 48GB model that would fill up the 80mm stick fully. Or, and unless the 3xpoint dies fully fill the area in the packages make them slightly smaller to support the 2260 form factor (after accounting for the odds and ends at the end of the stick the current design it looks like it's just too big to fit on the smaller size).
CaedenV - Monday, April 24, 2017 - link
Once again, I have to ask.... who on earth is this product for?
So you have a cheap $300 laptop, which is going to have a terrible display, minimal RAM, and a small HDD or eMMC drive... are they expecting these users to spring for one of these drives to choke their CPU?

Maybe a more mainstream $5-900 laptop where price is still ultra competitive. What sales metric does this add to which will promote sales over a cheaper device with seemingly the same specs? Either it will have a SSD onboard already and the performance difference will be un-noticed, or it will have a large HDD and the end-user is going to scratch their heads wondering why 2 seemingly identical computers have 4GB of RAM and 1TB HDD, but one costs $100 more.

Ok, so maybe it is in the premium $1-2000 market. Intel says it isn't aiming at these devices, but they are Intel. Maybe they think a $1-2000 laptop is an 'affordable' mass-market device? Here you are talking about ultrabooks; super slim devices with SSDs... oh, and they only have 1 PCIe slot on board. Just add a 2nd one? Where are you going to put it? Going to add more weight? More thickness? A smaller battery? And even after you manage to cram the part in one of these laptops... what exactly is going to be the performance benefit? An extra half a second when coming out of sleep mode? Word opens in .5 sec instead of .8 sec? Yes, these drives are faster than SSDs... but we are way past the point of where software load times matter at all.

So then what about workstation laptops. That is where these look like they will shine. A video editing laptop, or desktop replacement. And for those few brave souls using such a machine with a single HDD or SSD this seems like it would work well... except I don't know anyone like that. These are production machines, which means RAID1 in case of HDD failure. And this tech does not work with RAID (even though I don't see why not... seems like they could easily integrate this into the RAID controller). But maybe they could use the drive as a 3rd small stand-alone render drive... but that only works in linux, not windows. So, nope, this isn't going to work in this market either.

And that brings us to the desktop market. For the same price/RAID concerns this product really doesn't work for desktops either, but the Optane SSDs coming out later this year sound interesting... but here we still have a pretty major issue;
SATA3 vs PCIe m.2 drives have an odd problem. On paper the m.2 drives benchmark amazingly well. And in production environments for rendering they also work really well. But for work applications and games people are reporting that there is little to no difference in performance. Intel is trying to make the claim that the issue is due to access time on the controllers, and that the extremely fast access time on Optane will finally get us past all that. But I don't think that is true. For work applications most of the wait time is either on the CPU or the network connection to the source material. The end-user storage is no longer the limiting factor in these scenarios. For games, much of the load time is in the GPU taking textures and game data and unpackaging them in the GPU's vRAM for use. The CPU and HDD/SSD are largely idle during this process. Even modern HDDs keep up pretty well with their SSD brethren on game load times. This leads me to believe that there is something else that is slowing down the whole process.

And that single bottleneck in the whole thing is Intel. It is their CPUs that have stopped getting faster. It is their RAM management that rather sucks and works the same speed no matter what your RAM is clocked at. It is the whole x86 platform that is stagnant and inefficient which is the real issue here. It is time for Intel to stop focusing on its next die-shrink, and start working on a new modern efficient instruction set and architecture that can take advantage of all this new tech! Backwards compatibility is killing the computer market. Time to make a clean break on the hardware side for a new way of doing things. We can always add software compatibility in an emulation layer so we can still use our old OSs and tools. Its going to be a mess, but we are at a point where it needs to be done.
Cliff34 - Monday, April 24, 2017 - link
It seems to me that this product doesn't really make sense for your average consumer. Let's assume you don't need to upgrade your hardware to use Optane memory as cache, why not just spend the money to get a faster and a bigger SSD drive?

If that's the case, wouldn't it be limited to only a few specific cases where someone really needs the Optane speed?
mkozakewich - Thursday, April 27, 2017 - link
An extra 4 GB of DDR4 seems to be $30-$40, so getting 16 GB of swap drive for the same price might be a good way to go.
I agree that using it for caching seems a little pointless.
zodiacfml - Monday, April 24, 2017 - link
Wow, strong at random perf where SSDs are weak. I guess this will be the drive for me. Next gen please.
p2131471 - Monday, April 24, 2017 - link
I wish you'd make interactive graphs for random reads. Or at least provide numbers in a table. Right now I can only approximate the exact values.
