The Intel Optane Memory (SSD) Preview: 32GB of Kaby Lake Caching

Author: Billy Tallis

by Billy Tallis on April 24, 2017 12:00 PM EST


Random Read

Random read speed is the most difficult performance metric for flash-based SSDs to improve on. There is very limited opportunity for a drive to do useful prefetching or caching, and parallelism from multiple dies and channels can only help at higher queue depths. The NVMe protocol reduces overhead slightly, but even high-end enterprise PCIe SSDs can struggle to offer random read throughput that would saturate a SATA link.

Real-world random reads are often blocking operations for an application, such as when traversing the filesystem to look up which logical blocks store the contents of a file. Opening even a non-fragmented file can require the OS to perform a chain of several random reads, and since each is dependent on the result of the last, they cannot be queued.

These tests were conducted on the Optane Memory as a standalone SSD, not in any caching configuration.

Queue Depth 1

Our first test of random read performance looks at the dependence on transfer size. Most SSDs focus on 4kB random access as that is the most common page size for virtual memory systems and it is a common filesystem block size. For our test, each transfer size was tested for four minutes and the statistics exclude the first minute. The drives were preconditioned to steady state by filling them with 4kB random writes twice over.
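The exclusion of the first minute from the statistics can be sketched roughly as follows. This is only an illustration of the idea, not the actual test harness: the function name `steady_state_throughput`, the `(completion time, bytes)` sample format, and the default 60-second warm-up and 240-second run length are assumptions chosen to match the description above.

```python
from typing import Iterable, Tuple


def steady_state_throughput(
    samples: Iterable[Tuple[float, int]],
    warmup_s: float = 60.0,
    duration_s: float = 240.0,
) -> float:
    """Return mean throughput in bytes per second over the measured window.

    samples: (completion time in seconds since test start, bytes transferred).
    Completions before warmup_s or at/after duration_s are ignored.
    The sample format is hypothetical, not taken from the real test suite.
    """
    if duration_s <= warmup_s:
        raise ValueError("duration_s must be greater than warmup_s")
    measured_bytes = sum(
        nbytes for t, nbytes in samples if warmup_s <= t < duration_s
    )
    # Divide by the length of the measured window only, not the whole run.
    return measured_bytes / (duration_s - warmup_s)
```

Dividing by the three measured minutes rather than the full four keeps the warm-up period from diluting the reported figure.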



The Optane Memory module manages to provide slightly higher performance than even the P4800X for small random reads, though it levels out at about half the performance for larger transfers. The Samsung 960 EVO starts out about ten times slower than the Optane Memory but narrows the gap in the second half of the test. The Crucial MX300 is behind the Optane Memory by more than a factor of ten through most of the test.

Queue Depth >1

Next, we consider 4kB random read performance at queue depths greater than one. A single-threaded process is not capable of saturating the Optane SSD DC P4800X with random reads so this test is conducted with up to four threads. The queue depths of each thread are adjusted so that the queue depth seen by the SSD varies from 1 to 16. The timing is the same as for the other tests: four minutes for each tested queue depth, with the first minute excluded from the statistics.
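One way to spread a target queue depth over a limited number of threads is sketched below. The exact per-thread split used for these tests is not stated, so the even-split policy and the helper name `split_queue_depth` are assumptions for illustration only.

```python
from typing import List


def split_queue_depth(total_qd: int, max_threads: int = 4) -> List[int]:
    """Split a total queue depth across at most max_threads worker threads.

    Assumed policy (not confirmed by the article): use one thread per
    outstanding request until max_threads is reached, then distribute the
    remaining depth as evenly as possible.
    """
    if total_qd < 1 or max_threads < 1:
        raise ValueError("total_qd and max_threads must be at least 1")
    threads = min(total_qd, max_threads)
    base, extra = divmod(total_qd, threads)
    # The first `extra` threads carry one additional outstanding request.
    return [base + 1 if i < extra else base for i in range(threads)]
```

Under this policy a total queue depth of 16 becomes four threads at QD4 each, and the per-thread depths always sum to the queue depth seen by the SSD.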

The SATA, flash NVMe and two Optane products are each clearly occupying different regimes of performance, though there is some overlap between the two Optane devices. Except at QD1, the Optane Memory offers lower throughput and higher latency than the P4800X. By QD16 the Samsung 960 EVO is able to exceed the throughput of the Optane Memory at QD1, but only with an order of magnitude more latency.



Comparing random read throughput of the Optane SSDs against the flash SSDs at low queue depths requires plotting on a log scale. The Optane Memory's lead over the Samsung 960 EVO is much larger than the 960 EVO's lead over the Crucial MX300. Even at QD16 the Optane Memory holds on to a 2x advantage over the 960 EVO and a 6x advantage over the MX300. Over the course of the test from QD1 to QD16, the Optane Memory's random read throughput roughly triples.


[Charts: random read latency by queue depth, shown as mean, median, 99th percentile, and 99.999th percentile]

For mean and median random read latency, the two Optane drives are relatively close at low queue depths and far faster than either flash SSD. The 99th and 99.999th percentile latencies of the Samsung 960 EVO are only about twice as high as the Optane Memory while the Crucial MX300 falls further behind with outliers in excess of 20ms.
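For readers who want to reproduce these four statistics from their own latency logs, a minimal sketch follows. It uses the nearest-rank percentile definition, which is an assumption; the article does not say which percentile method was used, and the function name `latency_summary` is hypothetical.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence


def latency_summary(latencies_us: Sequence[float]) -> Dict[str, float]:
    """Summarize per-I/O latencies (in microseconds).

    Percentiles use the nearest-rank method (an assumed choice).
    """
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_us)

    def nearest_rank(pct: float) -> float:
        # Smallest value such that at least pct percent of samples are <= it.
        rank = math.ceil(pct * len(ordered) / 100.0)
        return ordered[max(rank, 1) - 1]

    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "p99": nearest_rank(99.0),
        "p99.999": nearest_rank(99.999),
    }
```

Note that the 99.999th percentile only separates from the maximum once there are more than 100,000 samples, which is one reason each queue depth needs a multi-minute run.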

Random Write

Flash memory write operations are far slower than read operations. This is not always reflected in the performance specifications of SSDs because writes can be deferred and combined, allowing the SSD to signal completion before the data has actually moved from the drive's cache to the flash memory. Consumer workloads consist of far more reads than writes, but there are enough sources of random writes that they also matter to everyday interactive use. These tests were conducted on the Optane Memory as a standalone SSD, not in any caching configuration.

Queue Depth 1

As with random reads, we first examine QD1 random write performance of different transfer sizes. 4kB is usually the most important size, but some applications will make smaller writes when the drive has a 512B sector size. Larger transfer sizes make the workload somewhat less random, reducing the amount of bookkeeping the SSD controller needs to do and generally allowing for increased performance.



As with random reads, the Optane Memory holds a slight advantage over the P4800X for the smallest transfer sizes, but the enterprise Optane drive completely blows away the consumer Optane Memory for larger transfers. The consumer flash SSDs perform quite similarly in this steady-state test and are consistently about an order of magnitude slower than the Optane Memory.

Queue Depth >1

The test of 4kB random write throughput at different queue depths is structured identically to its counterpart random read test above. Queue depths from 1 to 16 are tested, with up to four threads used to generate this workload. Each tested queue depth is run for four minutes and the first minute is ignored when computing the statistics.

With the Optane SSD DC P4800X included on this graph, the two flash SSDs have barely perceptible random write throughput, and the Optane Memory's throughput and latency both fall roughly in the middle of the gap between the P4800X and the flash SSDs. The random write latency of the Optane Memory is more than twice that of the P4800X at QD1 and is close to the latency of the Samsung 960 EVO, while the Crucial MX300 starts at about twice that latency.



When testing across the range of queue depths and at steady state, the 525GB Crucial MX300 is always delivering higher throughput than the Samsung 960 EVO, but with substantial inconsistency at higher queue depths. The Optane Memory almost doubles in throughput from QD1 to QD2, and is completely flat thereafter while the P4800X continues to improve until QD8.


[Charts: random write latency by queue depth, shown as mean, median, 99th percentile, and 99.999th percentile]

The Optane Memory and Samsung 960 EVO start out with the same median latency at QD1 and QD2 of about 20µs. The Optane Memory's latency increases linearly with queue depth after that due to its throughput being saturated, but the 960 EVO's latency stays lower until near the end of the test. The Samsung 960 EVO has relatively poor 99th percentile latency to begin with and is joined by the Crucial MX300 once it has saturated its throughput, while the Optane Memory's latency degrades gradually in the face of overwhelming queue depths. The 99.999th percentile latency of the flash-based consumer SSDs is about 300-400 times that of the Optane Memory.
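The linear growth in latency once throughput is saturated follows from Little's law: the average number of outstanding requests equals throughput multiplied by average latency. A small sketch of that relationship is below; the function name `mean_latency_us` is hypothetical, and the 100,000 IOPS figure in the usage note is an illustrative round number, not a measured result.

```python
def mean_latency_us(queue_depth: int, iops: float) -> float:
    """Mean latency in microseconds implied by Little's law.

    queue_depth: average number of outstanding requests.
    iops: sustained completions per second at that queue depth.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    if iops <= 0:
        raise ValueError("iops must be positive")
    # latency = outstanding requests / completion rate, converted to microseconds
    return queue_depth * 1_000_000 / iops
```

With throughput pinned at a hypothetical 100,000 IOPS, `mean_latency_us(2, 100_000)` gives 20µs and `mean_latency_us(8, 100_000)` gives 80µs: every doubling of queue depth doubles the mean latency when the drive cannot complete requests any faster.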


110 Comments


Billy Tallis - Wednesday, April 26, 2017 - link
As long as you have Intel RST RAID disabled for NVMe drives, it'll be accessible as a standard NVMe device and available for use with non-Intel caching software.
fanofanand - Tuesday, April 25, 2017 - link
I came here to read ddriver's "hypetane" rants, and I was not disappointed!
TallestJon96 - Tuesday, April 25, 2017 - link
Too bad about the drive breaking.

As an enthusiast who is gaming 90% of the time with my pc, I don't think this is for me right now. I actually just bought a 960 evo 500gb to complement my 1 tb 840 evo. Overkill for sure, but I'm happy with it, even if the difference is sometimes subtle.

This technology really excites me. If they can get a system running with no Dram or Nand, and just use a large block of Xpoint, that could make for a really interesting system. Put 128 gb of this stuff paired with a 2c/4t mobile chip in a laptop, and you could get a really lean system that is fast for every day usage cases (web browsing, video watching, etc).

For my use case, I'd love to have a reason to buy it (no more loading times ever would be very futuristic) but it'll take time to really take off.
MrSpadge - Tuesday, April 25, 2017 - link
> no more loading times

Not going to happen, because there's quite some CPU work involved with loading things.
SanX - Tuesday, April 25, 2017 - link
Blahblahblah endurance, price, consumption, superspeed. Where are they? ROTFLOL At least don't show these shameful speeds if you opened your mouth this loud, Intel. No one will ever look at anything less than 3.5GB/s set by Samsung 960 Pro if you trolled about superspeeds.
cheshirster - Wednesday, April 26, 2017 - link
Is there any technical reasoning why this won't work with older CPU's?
I don't see this being any different than Intel RST.
KAlmquist - Thursday, April 27, 2017 - link
I think that Intel SRT caches reads, whereas the Optane Memory caches both reads and writes. My guess is that when Intel SRT places data in the cache, it doesn't immediately update the non-volatile lookup tables indicating where that data is stored. Instead, it probably waits until a bunch of data has been added, and then records the locations of all of the cached data. The reason for this would be that NAND can only be written in page units. If Intel were to update the non-volatile mapping table every time it added a page of data to the cache, that would double the amount of data written to the caching SSD.

If I'm correct, then with Intel SRT, a power loss can cause some of the data in the SSD cache to be lost. The data itself would still be there, but it won't appear in the lookup table, making it inaccessible. That doesn't matter because SRT only caches reads, so the data lost from the cache will still be on the hard drive.

In contrast, Optane Memory presumably updates the mapping table for cached data immediately, taking advantage of the fact that it uses a memory technology that allows small writes. So if you perform a bunch of 4K random writes, the data is written to the Optane storage only, resulting in much higher write performance than you would get with Intel SRT.

In short, I would guess that Optane Memory uses a different caching algorithm than Intel SRT; an algorithm that is only implemented in Intel's latest chipsets.

That's unfortunate, because if Optane Memory were supported using software drivers only (without any chipset support), it would be a very attractive upgrade to older computer systems. At $44 or $77, an Optane Memory device is a lot less expensive than upgrading to an SSD. Instead, Optane Memory is targeted at new systems, where the economics are less compelling.
mkozakewich - Thursday, April 27, 2017 - link
I would really like to see the 16GB Optane filled with system paging file (on a device with 2 or 4 GB of RAM) and then do some general system experience tests. This seems like the perfect solution: The system is pretty good about offloading stuff that's not needed, and pulling needed files into working memory for full speed; and the memory can be offloaded to or loaded from the Optane cache quickly enough that it shouldn't cause many slowdowns when switching between tasks. This seems like the best strategy, in a world where we're still seeing 'pro' devices with 4 GB of RAM.
Ugur - Monday, May 1, 2017 - link
I wish Intel would release Optane sticks/drives of 1-4TB sizes asap and sell them for 100-300 more than SSDs of same size immediately.
I'm kinda disappointed they do this type of tiered rollout where it looks like it'll take ages until i can get an Optane drive at larger sizes for halfway reasonable prices.
Please Intel, make it available asap, i want to buy it.
Thanks =)
abufrejoval - Monday, May 8, 2017 - link
Well the most important thing is that Optane is now a real product on the market, for consumers and enterprise customers. So some Intel senior managers don’t need to get fired or cross off items on their bonus score cards.

Marketing will convince the world that Optane is better, most importantly that only Intel can have it inside: No ARM, no Power no Zen based server shall ever have it.

For the DRAM-replacement variant, that exclusivity had a reason: Without proper firmware support, that won’t work and without special cache flushing instructions it would be too slow or still volatile.
Of course, all of that could be shared with the competition, but who wants to give up a practical monopoly, which no competition can contest in court before their money runs out.

For the PCIe variant Intel, chipset and OS dependencies are all artificial, but doesn’t that make things better for everyone? Now people can give up ECC support in cheap Pentiums and instead gain Optane support for a premium on CPUs and chipsets, which use the very same hardware underneath for production cost efficiency. Whoever can sell that, truly deserves their bonus!

Actually, I’d propose they be paid in snake oil.

For the consumer with a linear link between Optane and its downstream storage tier, it means the storage path has twice as many opportunities to fail. For the service technician it means he has four times as many test scenarios to perform. Just think on how that will double again, once Optane does in fact also come to the DIMM socket! Moore’s law is not finished after all! Yeah!

Perhaps Microsoft could be talked into creating a special Optane Edition which offers much better granularity for forensic data storage, and surely there would be plenty of work for security researchers, who just love to find bugs really, really deep down in critical Intel Firmware, which is designed for the lowest Total Cost of TakeOwnership in the industry!

Where others see crisis, Intel creates opportunities!
