Random Read

Random read speed is the most difficult performance metric for flash-based SSDs to improve on. There is very limited opportunity for a drive to do useful prefetching or caching, and parallelism from multiple dies and channels can only help at higher queue depths. The NVMe protocol reduces overhead slightly, but even high-end enterprise PCIe SSDs can struggle to offer random read throughput that would saturate a SATA link.

Real-world random reads are often blocking operations for an application, such as when traversing the filesystem to look up which logical blocks store the contents of a file. Opening even a non-fragmented file can require the OS to perform a chain of several random reads, and since each is dependent on the result of the last, they cannot be queued.
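That dependency chain can be sketched in a few lines. This is a toy illustration, not any real filesystem format: the file below stores a hand-built pointer chain where each block's first 8 bytes name the next block to read, mimicking metadata traversal (directory entry, inode, indirect block, data).

```python
import os, struct, tempfile

BLOCK = 4096
chain = [3, 1, 4]  # block indices visited in order; a stored 0 terminates

# Build a tiny file whose blocks form a pointer chain.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.truncate(BLOCK * 5)
    for cur, nxt in zip(chain, chain[1:] + [0]):
        f.seek(cur * BLOCK)
        f.write(struct.pack("<Q", nxt * BLOCK))

# The traversal is inherently serial: the target of read N+1 is not known
# until read N completes, so the reads cannot be issued in parallel and the
# effective queue depth stays at 1 no matter how deep the device's queues are.
fd = os.open(path, os.O_RDONLY)
offset, hops = chain[0] * BLOCK, 0
while True:
    data = os.pread(fd, BLOCK, offset)
    hops += 1
    offset = struct.unpack_from("<Q", data)[0]
    if offset == 0:
        break
os.close(fd)
os.remove(path)
print(hops)  # number of dependent reads performed
```

Each hop's latency adds directly to the time it takes to open the file, which is why QD1 latency matters so much for interactive workloads.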

These tests were conducted on the Optane Memory as a standalone SSD, not in any caching configuration.

Queue Depth 1

Our first test of random read performance looks at the dependence on transfer size. Most SSDs focus on 4kB random access as that is the most common page size for virtual memory systems and it is a common filesystem block size. For our test, each transfer size was tested for four minutes and the statistics exclude the first minute. The drives were preconditioned to steady state by filling them with 4kB random writes twice over.
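The shape of the test can be sketched as a timed QD1 read loop. This is a simplified stand-in for our actual test harness, with the durations shrunk so it runs quickly; a real run targets the raw block device for 240 seconds per transfer size with the first 60 seconds excluded.

```python
import os, random, statistics, tempfile, time

def random_read_test(path, bs, duration, warmup):
    """Issue QD1 random reads of `bs` bytes for `duration` seconds and
    return (mean latency, sample count), discarding samples taken in the
    first `warmup` seconds of the run."""
    nblocks = os.path.getsize(path) // bs
    fd = os.open(path, os.O_RDONLY)
    latencies = []
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        offset = random.randrange(nblocks) * bs
        t0 = time.perf_counter()
        os.pread(fd, bs, offset)
        t1 = time.perf_counter()
        if t0 - start >= warmup:      # statistics exclude the warm-up period
            latencies.append(t1 - t0)
    os.close(fd)
    return statistics.mean(latencies), len(latencies)

# Demo against a small temp file; a real run would open the raw block device.
with tempfile.NamedTemporaryFile() as f:
    f.write(os.urandom(1 << 20))      # 1 MiB of data to read from
    f.flush()
    mean_lat, n = random_read_test(f.name, 4096, duration=0.5, warmup=0.1)
print(n, mean_lat)
```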

Random Read

The Optane Memory module manages to provide slightly higher performance than even the P4800X for small random reads, though it levels out at about half the performance for larger transfers. The Samsung 960 EVO starts out about ten times slower than the Optane Memory but narrows the gap in the second half of the test. The Crucial MX300 is behind the Optane Memory by more than a factor of ten through most of the test.

Queue Depth >1

Next, we consider 4kB random read performance at queue depths greater than one. A single-threaded process is not capable of saturating the Optane SSD DC P4800X with random reads so this test is conducted with up to four threads. The queue depths of each thread are adjusted so that the queue depth seen by the SSD varies from 1 to 16. The timing is the same as for the other tests: four minutes for each tested queue depth, with the first minute excluded from the statistics.
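One plausible way to apportion the target queue depth across up to four worker threads is to divide it evenly; the exact split our harness uses is not spelled out above, so treat this as an assumed scheme for illustration.

```python
def thread_split(total_qd, max_threads=4):
    """Split a target queue depth across worker threads so that the SSD
    sees `total_qd` outstanding requests in aggregate (assumed even split)."""
    threads = min(total_qd, max_threads)
    base, extra = divmod(total_qd, threads)
    return [base + (1 if i < extra else 0) for i in range(threads)]

for qd in (1, 2, 4, 8, 16):
    print(qd, thread_split(qd))
```

At QD1 a single thread issues one request at a time; by QD16 four threads each keep four requests in flight.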

The SATA, flash NVMe and two Optane products are each clearly occupying different regimes of performance, though there is some overlap between the two Optane devices. Except at QD1, the Optane Memory offers lower throughput and higher latency than the P4800X. By QD16 the Samsung 960 EVO is able to exceed the throughput of the Optane Memory at QD1, but only with an order of magnitude more latency.

Random Read Throughput

Comparing random read throughput of the Optane SSDs against the flash SSDs at low queue depths requires plotting on a log scale. The Optane Memory's lead over the Samsung 960 EVO is much larger than the 960 EVO's lead over the Crucial MX300. Even at QD16 the Optane Memory holds on to a 2x advantage over the 960 EVO and a 6x advantage over the MX300. Over the course of the test from QD1 to QD16, the Optane Memory's random read throughput roughly triples.

Random Read Latency (Mean / Median / 99th Percentile / 99.999th Percentile)

For mean and median random read latency, the two Optane drives are relatively close at low queue depths and far faster than either flash SSD. The 99th and 99.999th percentile latencies of the Samsung 960 EVO are only about twice as high as the Optane Memory while the Crucial MX300 falls further behind with outliers in excess of 20ms.
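Percentile latencies like these are computed from the distribution of individual operation times rather than from averages. A minimal nearest-rank implementation, run here on synthetic data (an exponential distribution chosen only to show a long tail, not measured drive data):

```python
import math, random

def percentile(samples, p):
    """Nearest-rank percentile, the usual way latency QoS figures are quoted."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s))
    return s[max(k - 1, 0)]

# Synthetic latency samples (µs): a heavy-tailed distribution illustrates
# why the 99.999th percentile can sit far above the median.
random.seed(0)
lat = [random.expovariate(1 / 10) for _ in range(1_000_000)]
for p in (50, 99, 99.999):
    print(p, round(percentile(lat, p), 1))
```

Note that a 99.999th percentile figure needs on the order of a million samples to be meaningful, which is one reason each queue depth is run for minutes rather than seconds.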

Random Write

Flash memory write operations are far slower than read operations. This is not always reflected in the performance specifications of SSDs because writes can be deferred and combined, allowing the SSD to signal completion before the data has actually moved from the drive's cache to the flash memory. Consumer workloads consist of far more reads than writes, but there are enough sources of random writes that they also matter to everyday interactive use. These tests were conducted on the Optane Memory as a standalone SSD, not in any caching configuration.
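A toy model makes the deferral-and-combining behavior concrete. This is an illustrative sketch of a generic write-back cache, not a description of any specific drive's controller:

```python
class WriteBackCache:
    """Toy model of deferred writes: the host write is acknowledged as soon
    as the data is in cache, overwrites of a still-cached block are combined,
    and the flash is only touched when the cache is flushed."""
    def __init__(self):
        self.dirty = {}          # lba -> pending data
        self.flash_writes = 0

    def write(self, lba, data):
        self.dirty[lba] = data   # combines with any pending write to lba
        return "completed"       # signalled before data reaches flash

    def flush(self):
        self.flash_writes += len(self.dirty)
        self.dirty.clear()

c = WriteBackCache()
for _ in range(3):
    c.write(7, b"x" * 4096)      # three host writes to the same LBA
c.write(9, b"y" * 4096)
c.flush()
print(c.flash_writes)            # 2 flash writes for 4 host writes
```

Sustained random writes eventually fill the cache and force the drive to write at flash speed, which is why our steady-state preconditioning matters.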

Queue Depth 1

As with random reads, we first examine QD1 random write performance of different transfer sizes. 4kB is usually the most important size, but some applications will make smaller writes when the drive has a 512B sector size. Larger transfer sizes make the workload somewhat less random, reducing the amount of bookkeeping the SSD controller needs to do and generally allowing for increased performance.

Random Write

As with random reads, the Optane Memory holds a slight advantage over the P4800X for the smallest transfer sizes, but the enterprise Optane drive completely blows away the consumer Optane Memory for larger transfers. The consumer flash SSDs perform quite similarly in this steady-state test and are consistently about an order of magnitude slower than the Optane Memory.

Queue Depth >1

The test of 4kB random write throughput at different queue depths is structured identically to the random read test above. Queue depths from 1 to 16 are tested, with up to four threads used to generate the workload. Each tested queue depth is run for four minutes and the first minute is ignored when computing the statistics.

With the Optane SSD DC P4800X included on this graph, the two flash SSDs have barely perceptible random write throughput, and the Optane Memory's throughput and latency both fall roughly in the middle of the gap between the P4800X and the flash SSDs. The random write latency of the Optane Memory is more than twice that of the P4800X at QD1 and is close to the latency of the Samsung 960 EVO, while the Crucial MX300 starts at about twice that latency.

Random Write Throughput

When testing across the range of queue depths and at steady state, the 525GB Crucial MX300 is always delivering higher throughput than the Samsung 960 EVO, but with substantial inconsistency at higher queue depths. The Optane Memory almost doubles in throughput from QD1 to QD2, and is completely flat thereafter while the P4800X continues to improve until QD8.

Random Write Latency (Mean / Median / 99th Percentile / 99.999th Percentile)

The Optane Memory and Samsung 960 EVO start out with the same median latency at QD1 and QD2 of about 20µs. The Optane Memory's latency increases linearly with queue depth after that due to its throughput being saturated, but the 960 EVO's latency stays lower until near the end of the test. The Samsung 960 EVO has relatively poor 99th percentile latency to begin with and is joined by the Crucial MX300 once it has saturated its throughput, while the Optane Memory's latency degrades gradually in the face of overwhelming queue depths. The 99.999th percentile latency of the flash-based consumer SSDs is about 300-400 times that of the Optane Memory.

110 Comments

  • name99 - Tuesday, April 25, 2017 - link

    Why are you so sure you understand the technology? Intel has told us nothing about how it works.
    What we have are
    - a bunch of promises from Intel that are DRAMATICALLY not met
    - an exceptionally lousy (expensive, low capacity) product being sold.
    You can interpret these in many ways, but the interpretation that "Intel over promised and dramatically underdelivered" is certainly every bit as legit as the interpretation "just wait, the next version (which ships when?) will be super-awesome".

    If Optane is capable TODAY of density comparable to NAND, then why ship such a lousy capacity? And if it's not capable, then what makes you so sure that it can reach NAND density? Getting 3D-NAND to work was not a cheap exercise. Does Intel have the stomach (and the project management skills) to last till that point, especially given that the PoS that they're shipping today ain't gonna generate enough of a revenue stream to pay for the electric bill of the Optane team while they take however long they need to get out the next generation.
  • emn13 - Tuesday, April 25, 2017 - link

    Intel hasn't confirmed what it is, but AFAICT all the signs point to xpoint being phase-change ram, or at least very similar to it. Which still leaves a lot of wiggle room, of course.
  • ddriver - Tuesday, April 25, 2017 - link

    IIRC they have explicitly denied xpoint being PCM. But then again, who would ever trust a corporate entity, and why?
  • Cellar - Tuesday, April 25, 2017 - link

    Implying Intel would use only the revenue of Optane to fund their next generation of Optane. You forget how much profit they make milking out their Processors? *Insert Woody Harrelson wiping tears away with money gif*
  • name99 - Tuesday, April 25, 2017 - link

    Be careful. What he's criticizing is the HYPE (ie Intel's business plan for this technology) rather than the technology itself, and in that respect he is basically correct. It's hard to see what more Intel could have done to give this technology a bad name.

    - We start with the ridiculous expectations that were made for it. Most importantly the impression given that the RAM-REPLACEMENT version (which is what actually changes things, not a faster SSD) was just around the corner.

    - Then we get this attempt to sell to the consumer market a product that makes ZERO sense for consumers along any dimension. The product may have a place in enterprise (where there's often value in exceptionally fast, albeit expensive, particular types of storage), but for consumers there's nothing of value here. Seriously, ignore the numbers, think EXPERIENCE. In what way is the Optane+hard drive experience better than the larger SSD+hard drive or even large SSD and no hard drive experience at the same price points. What, in the CONSUMER experience, takes advantage of the particular strengths of Optane?

    - Then we get this idiotic power management nonsense, which reduces the value even further for a certain (now larger than desktop) segment of mobile computing.

    - And the enforced tying of the whole thing to particular Intel chipsets just shrinks the potential market even further. For example --- you know who's always investigating potential storage solutions and how they could be faster? Apple. It is conceivable (obviously in the absence of data none of us knows, and Intel won't provide the data) that a fusion drive consisting of, say, 4GB of Optane fused to an iPhone or iPad's 64 or 128 or 256GB could have advantages in terms of either performance or power. (I'm thinking particularly for power in terms of allowing small writes to coalesce in the Optane.)
    But Intel seems utterly uninterested in investigating any sort of market outside the tiny tiny market it has defined.

    Maybe Optane has the POTENTIAL to be great tech in three years. (Who knows since, as I said, right know what it ACTUALLY is is a secret, along with all its real full spectrum of characteristics).
    But as a product launch, this is a disaster. Worse than all those previous Intel disasters whose names you've forgotten like ViiV or Intel Play or the Intel Personal Audio Player 3000 or the Intel Dot.Station.
  • Reflex - Tuesday, April 25, 2017 - link

    Meanwhile in the server space we are pretty happy with what we've seen so far. I get that its not the holy grail you expected, but honestly I didn't read Intel's early info as an expectation that gen1 would be all things to all people and revolutionize the industry. What I saw, and what was delivered, was a path forward past the world of NAND and many of its limitations, with the potential to do more down the road.

    Today, in low volume and limited form factors it likely will sell all that Intel can produce. My guess is that it will continue to move into the broader space as it improves incrementally generation over generation, like most new memory products have done. Honestly the greatest accomplishment here is Intel and Micron finally introducing a new memory type, at production quantity, with a reasonable cost for its initial markets. We've spent years hearing about phase-change, racetrack, memrister, MRAM and on and on, and nobody has managed to introduce anything at volume since NAND. This is a major milestone, and hopefully it touches off a race between Optane and other technologies that have been in the permanent 3-5 year bucket for a decade plus.
  • ddriver - Tuesday, April 25, 2017 - link

    Yeah, I bet you are offering hypetane boards by the dozens LOL. But shouldn't it be more like "in the _servers that don't serve anyone_ space", since in order to take advantage of them low queue depth transfers and latencies, such a "server" would have to serve what, like a client or two?

    I don't claim to be a "server specialist" like you apparently do, but I'd say if a server doesn't have a good saturation, then either your business sucks and you don't have any clients, or you have more servers than you need and should cut back until you get a good saturation.

    To what kind of servers is it that beneficial to shave off a few microseconds of data access? And again, only in low queue depth loads? I'd understand if hypetane stayed equally responsive regardless of the load, but as the load increases we see it dwindling down to the performance of available nand SSDs. Which means you won't be saving on say query time when the system is actually busy, and when the system is not it will be snappy enough as it is, without magical hypetane storage. After all, servers serve networks, and even local networks are slow enough to completely mask out them "tremendous" SSD latencies. And if we are talking an "internet" server, then the network latency is much, much worse than that.

    You also evidently don't understand how the industry works. It is never about "the best thing that can be done", it is always about "the most profitable thing that can be done". As I've repeated many times, even NAND flash can be made tremendously faster, in terms of both latency and bandwidth, it is perfectly possible today and it has been technologically possible for years. Much like it has been possible to make cars that go 200 MPH, yet we only see a tiny fraction of the cars that are actually capable of that speed. There has been a small but steady market for mram, but that's a niche product, it will never be mainstream because of technological limitations. It is pretty much the same thing with hypetane, regardless of how much intel are trying to shove it to consumers in useless product forms, it only makes sense in an extremely narrow niche. And it doesn't owe its performance to its "new memory type" but to its improved controller, and even then, its performance doesn't come anywhere close to what good old SLC is capable of technologically as a storage medium, which one should not confuse with a complete product stack.

    The x25-e was launched almost 10 years ago. And its controller was very much "with the times" which is the reason the drive does a rather meager 250/170 mb/s. Yet even back then its latency was around 80 microseconds, with its "latest and greatest" hypetane struggling to beat that by a single order of magnitude 10 years later. Yet technologically the SLC PE cycle can go as low as 200 nanoseconds, which is 50 times better than hypetane and 400 times better than what the last pure SLC SSD controller was capable of.

    No wonder the industry abandoned SLC - it was and still is too good not only for consumers but also for the enterprise. Which begs the question, with the SLC trump card being available for over a decade why would intel and micron waste money on researching a new media. And whether they really did that, or simply took good old SLC, smeared a bunch of lies, hype and cheap PR on it to step forward and say "here, we did something new".

    I mean come on, when was the last time intel made something new? Oh that's right, back when they made netburst, and it ended up a huge flop. And then, where did the rescue come from? Something radically new? Nope, they got back to the same old tried and true, and improved instead of trying to innovate. Which is also what this current situation looks like.

    I can honestly think of no better reason to be so secretive about the "amazing new xpoint", unless it actually is neither amazing, nor new, nor xpoint. I mean if it is a "tech secret" I don't see how they shouldn't be able to protect their IP via patents, I mean if it really is something new, it is not like they are short on the money it will take to patent it. So there is no good reason to keep it such a secret other than the intent to cultivate mystery over something that is not mysterious at all.
  • eddman - Tuesday, April 25, 2017 - link

    This is what happens when people let their personal feelings get in the way.

    "Even if they cure cancer, they still suck and I hate them"
  • ddriver - Tuesday, April 25, 2017 - link

    Except it doesn't cure cancer. And I'd say it is always better to prevent cancer than to have the destructive treatment leave you a diminished being.
  • eddman - Tuesday, April 25, 2017 - link

    Just admit you have a personal hatred towards MS, intel and nvidia, no matter what they do, and be done with it. It's beyond obvious.
