Intel's new 3D XPoint non-volatile memory technology, which has been on the cards publicly for the last couple of years, is finally hitting the market as the storage medium for Intel's new flagship enterprise storage platform. The Intel Optane SSD DC P4800X is a PCIe SSD using the standard NVMe protocol, but the use of 3D XPoint memory instead of NAND flash memory allows it to deliver high throughput and much lower access latency than any other NVMe SSD.

3D XPoint

The potential significance of 3D XPoint memory is immense. When it was first publicly announced by Intel and Micron in 2015, 3D XPoint memory was presented as a fundamentally different storage technology from the flash memory that dominates the market. It is the first new truly mass market, high-density solid state storage medium to hit the market since NAND flash itself. It comes at a time when the NAND market is booming like never before, but also at a time when we know that there is a definite end of the line for NAND. The ongoing transition to 3D NAND flash is just a temporary postponement of the fundamental limitations of flash memory. Once NAND can no longer scale in density and cost-per-bit, it will fall to paradigm changes and next-generation memory technologies (one of which will be 3D XPoint) to continue to carry the industry forward. There are many other new memory technologies that may compete alongside flash memory and 3D XPoint in the coming years, but 3D XPoint is the one that's ready to go mainstream now.

In the near term, 3D XPoint is important because it offers a new set of performance tradeoffs entirely unlike NAND; tradeoffs that, for the right applications, can deliver performance far in excess of today's NAND products. By being able to read and write at the bit or word level - and not the 4K+ page level of NAND - 3D XPoint has the potential to deliver excellent performance across a wide range of workloads, but especially in minimally parallel workloads, which are common in the consumer and enterprise spaces.

The drawback is that 3D XPoint is more expensive than NAND, for reasons of time, production, and scope. It is also less dense, a choice that eases production in this first stage but adds to the cost. For now, it won't be able to replicate the sheer capacity and cost effectiveness that has made NAND storage so popular in all market segments. Because this is a first-generation version of the technology and production scale is limited, the first 3D XPoint products are being aimed at speciality and high-margin markets: enterprise performance, consumer caching, etc. Future products promised by Intel should add non-volatile DIMMs to the mix, and then later on, if everything goes to plan, a potential wholesale replacement of NAND flash (or at least a strong competitor).

The Intel Optane SSD DC P4800X

The new storage drive, and the focus of today's review, is the Intel Optane SSD DC P4800X. It uses a new NVMe controller Intel developed specifically for use with 3D XPoint memory. Where Intel's enterprise NVMe SSDs like the P3700 use a controller with 18 channels for interfacing to their flash memory, the Optane SSD's controller has only 7 channels. In order to achieve at least parity on peak performance, each of those channels has to provide much higher throughput than on a flash SSD, and it shows that each 3D XPoint memory die is delivering much higher performance than a die of flash memory.

The first capacity of the Optane SSD DC P4800X to ship, and the model we've tested here, offers a usable capacity of 375GB from a total of 28 3D XPoint memory dies (four per channel) for a raw capacity of 448GB. 3D XPoint memory has better endurance than NAND flash, but not enough to get away without wear leveling. The fine-grained accessibility of 3D XPoint memory gets rid of a lot of the wear leveling and write amplification headaches caused by flash pages and erase blocks being larger than the sector sizes exposed by the drives, but the drive still needs some spare area plus storage for error correction overhead, metadata for tracking the mapping between logical blocks and physical addresses, and potential replacement of bad sectors, similar to normal SSDs.
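The capacity figures above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in this review (7 channels, four dies per channel, 128Gb per die, 375GB usable); it assumes the raw and usable figures are expressed in the same decimal units, and the function name is made up for illustration.

```python
# Figures quoted in the review; the decimal-unit treatment is an assumption.
CHANNELS = 7
DIES_PER_CHANNEL = 4
DIE_CAPACITY_GBIT = 128
USABLE_GB = 375


def raw_capacity_gb(channels, dies_per_channel, die_gbit):
    """Raw capacity in GB: total gigabits divided by 8 bits per byte."""
    return channels * dies_per_channel * die_gbit / 8


raw_gb = raw_capacity_gb(CHANNELS, DIES_PER_CHANNEL, DIE_CAPACITY_GBIT)
reserved_gb = raw_gb - USABLE_GB          # spare area, ECC, mapping metadata, etc.
reserved_fraction = reserved_gb / raw_gb  # share of raw capacity not user-visible

print(f"raw: {raw_gb:.0f} GB, reserved: {reserved_gb:.0f} GB "
      f"({reserved_fraction:.1%} of raw)")
```

Under these assumptions roughly one sixth of the raw capacity is held back for spare area and overhead.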

As with most NVMe SSDs, the Optane SSD DC P4800X supports a configurable sector size. Out of the box it emulates 512B sectors for the sake of compatibility, but using the NVMe FORMAT command it can be switched to emulate 4kB sectors. The larger sector size reduces the amount of metadata the SSD controller has to juggle, so it usually allows for slightly higher performance. The NVMe FORMAT command is also the mechanism for triggering a secure erase of the entire drive, and for flash SSDs the format usually consists of little more than issuing block erase commands to the whole drive. 3D XPoint memory does not have large multi-megabyte erase blocks, so a low-level format of the Optane SSD needs to directly write to the entire drive, which takes about as long as filling it sequentially. Thus, while a 2.4TB flash SSD can perform a low-level format in just over 13 seconds, the 375GB Optane SSD DC P4800X takes six minutes and 47 seconds. This is long enough that unsuspecting software tools or SSD reviewers will give up and assume that the drive has locked up.
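To put the format times in perspective, the sketch below converts them into an implied rate of capacity processed per second. It uses only the figures quoted above (375GB in six minutes and 47 seconds, 2.4TB in roughly 13 seconds). The helper name is hypothetical, and the flash figure is not a real write speed, since a flash format mostly issues block erases.

```python
def implied_rate_gb_s(capacity_gb, seconds):
    """Capacity covered per second during a low-level format."""
    return capacity_gb / seconds


optane_format_s = 6 * 60 + 47                                # 407 seconds
optane_rate = implied_rate_gb_s(375, optane_format_s)        # drive is actually written
flash_rate = implied_rate_gb_s(2400, 13)                     # drive is only erased

print(f"Optane format: {optane_rate:.2f} GB/s of real writes")
print(f"Flash format:  {flash_rate:.0f} GB/s 'equivalent' (block erases, not writes)")
```

The Optane figure lands a little under 1 GB/s, which is consistent with the format being a full sequential write of the drive, while the flash figure is far beyond what a PCIe 3.0 x4 link could carry and so can only be an erase.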

Intel Optane SSD DC P4800X Specifications

Capacity | 375 GB | 750 GB | 1.5 TB
Form Factor | PCIe HHHL or 2.5" 15mm U.2
Interface | PCIe 3.0 x4 NVMe
Controller | Intel unnamed
Memory | 128Gb 20nm Intel 3D XPoint
Typical Latency (R/W) | <10µs
Random Read (4 kB) IOPS (QD16) | 550,000 | TBA | TBA
Random Read 99.999% Latency (QD1) | 60µs | TBA | TBA
Random Read 99.999% Latency (QD16) | 150µs | TBA | TBA
Random Write (4 kB) IOPS (QD16) | 500,000 | TBA | TBA
Random Write 99.999% Latency (QD1) | 100µs | TBA | TBA
Random Write 99.999% Latency (QD16) | 200µs | TBA | TBA
Mixed 70/30 (4 kB) Random IOPS (QD16) | 500,000 | TBA | TBA
Endurance | 30 DWPD
Warranty | 5 years (3 years during early limited release)
MSRP | $1520 | TBA | TBA
Release Date (HHHL) | March 19 | Q2 2017 | 2H 2017
Release Date (U.2) | Q2 2017 | 2H 2017 | 2H 2017

So far, Intel has only started shipping the 375GB Optane SSD DC P4800X to select customers, and they have not released detailed specifications for the larger capacities that will ship later this year.

It is worth noting that the performance specifications for the P4800X, as provided in the product specification sheets, cover a different set of metrics than Intel usually reports for their enterprise SSDs. Sequential performance is not mentioned at all, but the product brief has quite a bit to say about latency: average latency for QD1 reads and writes, and 99.999th percentile latency for both reads and writes at QD1 and QD16. The fact that Intel is publishing a five-nines QoS metric at all suggests that they plan to set a new standard for performance consistency.

The throughput claims are also remarkable: half a million IOPS or more for reads, writes and a 70/30 read/write mix. There are already drives on the market that can deliver more than 550k random read IOPS, but those SSDs are far larger than 375GB and they require very high queue depths to hit 550k IOPS. There are even a few multi-TB drives that can beat 500k random write IOPS, but they can't sustain that performance indefinitely. The Optane SSD DC P4800X is promising an unprecedented level of storage performance both in absolute terms and relative to its capacity, so it is interesting to see where Intel is going to lay down its line in the sand.

The P4800X will not really occupy the same niche as the multi-TB monsters that offer comparable throughput. With limited capacity but the highest level of performance, this Optane SSD most closely fits the role of SLC NAND based SSDs. SLC has disappeared from the SSD market as virtually all customers preferred to sacrifice a little bit of performance to double their capacity by using MLC NAND. One of the last high-performance SLC SSDs was the Micron P320h, a PCIe SSD from 2012 that slightly pre-dated NVMe and used 34nm SLC NAND flash. Anyone still using a P320h for its consistent low latency performance will be very interested in the P4800X. Outside of that niche, the Optane SSD will obviously be desirable for its raw throughput, but the low capacity may be problematic for some use cases.

One of the unique and most notable performance advantages of the Optane SSD DC P4800X is that it does not require extremely high queue depths to reach full throughput. Enterprise customers have long had to design their systems around the fact that getting full performance out of the fastest PCIe SSDs requires loading them down with queue depths of 128 or higher, sometimes requiring applications to use dozens of threads for I/O. In the client space achieving such queue depths is outright impossible, and in the enterprise space it doesn't happen for free. The P4800X's high performance at low queue depths makes it a much easier drive to get great real-world performance out of.
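The relationship between queue depth, throughput and latency can be illustrated with Little's Law (outstanding requests = throughput × mean latency). The sketch below is an idealized steady-state model, not a measurement: it plugs in the 550,000 IOPS at QD16 and sub-10µs typical latency from Intel's specifications, plus the QD128 figure mentioned above for flash drives. The function names are hypothetical.

```python
def mean_latency_us(queue_depth, iops):
    """Little's Law rearranged: mean latency (µs) = queue depth / throughput."""
    return queue_depth / iops * 1e6


def iops_at(queue_depth, latency_us):
    """Little's Law rearranged: throughput = queue depth / mean latency."""
    return queue_depth * 1e6 / latency_us


# Spec-sheet point for the P4800X: 550k random read IOPS at QD16.
optane_latency = mean_latency_us(16, 550_000)

# A hypothetical drive that needs QD128 to reach the same 550k IOPS.
high_qd_latency = mean_latency_us(128, 550_000)

# A single outstanding request at 10 µs per operation.
qd1_iops = iops_at(1, 10)

print(f"QD16  @ 550k IOPS -> {optane_latency:.1f} µs mean latency")
print(f"QD128 @ 550k IOPS -> {high_qd_latency:.1f} µs mean latency")
print(f"QD1   @ 10 µs     -> {qd1_iops:,.0f} IOPS")
```

The point of the model is that hitting the same IOPS at a lower queue depth necessarily means each request completes sooner, which is why low-queue-depth performance translates so directly into application responsiveness.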

Intel originally introduced 3D XPoint memory as having far higher write endurance than NAND flash, on the order of 1000x higher. The Optane SSD DC P4800X is rated for 30 drive writes per day (DWPD) for five years, although the current models shipping during this early limited availability period are only rated for three years, rather than the five years Intel expects to support for the full retail models. Intel says they're being extremely conservative with a new and unproven technology, but doing the math shows that 30 DWPD doesn't provide any endurance advantage over the most highly over-provisioned flash-based enterprise SSDs. In terms of total petabytes written, the P4800X only has four-fifths the endurance of the SLC-based Micron P320h. Even allowing for Intel's original comparisons possibly having been relative to lower-endurance contemporary MLC or TLC flash, it seems that this first generation of 3D XPoint memory is not as durable as originally planned. The headline number of 30 DWPD is aimed at alleviating that concern; however, for Intel to match its original intentions, the second and third generation parts will have to be a step up, and we look forward to testing them.
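"Doing the math" on the endurance rating looks roughly like the sketch below. It converts the 30 DWPD rating on the 375GB model into total petabytes written, assuming 365-day years and decimal units (1 PB = 1,000,000 GB). The P320h figure is not a quoted specification: it is simply what the four-fifths ratio stated above would imply.

```python
GB_PER_PB = 1_000_000  # decimal units; an assumption for this sketch


def lifetime_writes_pb(capacity_gb, dwpd, years, days_per_year=365):
    """Total rated writes in PB for a drive-writes-per-day (DWPD) rating."""
    return capacity_gb * dwpd * days_per_year * years / GB_PER_PB


pb_5yr = lifetime_writes_pb(375, 30, 5)   # full five-year rating
pb_3yr = lifetime_writes_pb(375, 30, 3)   # early limited-release rating

# Derived from the article's "four-fifths" comparison, not from a spec sheet.
implied_p320h_pb = pb_5yr / (4 / 5)

print(f"P4800X 375GB, 5 years: {pb_5yr:.1f} PB written")
print(f"P4800X 375GB, 3 years: {pb_3yr:.1f} PB written")
print(f"Implied P320h rating:  {implied_p320h_pb:.1f} PB written")
```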

Pricing

The MSRP for the 375GB P4800X is $1520, though it will be quite some time before it can readily be ordered from major online retailers. At slightly more than $4/GB, the P4800X will be almost twice as expensive per GB as Intel's next most pricey SSD, the P3608 (which is really two drives in one plus a PCIe switch). Compared to Intel's fastest single SSD (the P3700), the P4800X will be more than three times as expensive per GB. In the broader SSD market, $4/GB is not completely unprecedented, but most companies selling drives in this price range don't even pretend to have a retail price.
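The per-gigabyte figure follows directly from the MSRP and capacity. In the sketch below, the only inputs are the $1520 price and 375GB capacity; the P3608 and P3700 values are bounds implied by the "almost twice" and "more than three times" comparisons above, not quoted prices for those drives.

```python
MSRP_USD = 1520
CAPACITY_GB = 375

price_per_gb = MSRP_USD / CAPACITY_GB

# Bounds implied by the article's ratios, not actual list prices.
implied_p3608_floor = price_per_gb / 2    # "almost twice as expensive" -> P3608 costs a bit more than this
implied_p3700_ceiling = price_per_gb / 3  # "more than three times"     -> P3700 costs less than this

print(f"P4800X: ${price_per_gb:.2f}/GB")
print(f"P3608 implied to be somewhat above ${implied_p3608_floor:.2f}/GB")
print(f"P3700 implied to be below ${implied_p3700_ceiling:.2f}/GB")
```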

This Review

For this review of the Intel Optane SSD DC P4800X, we will first take a deeper dive into what 3D XPoint actually is. Then we will go through our testing suite for enterprise drives, testing Intel's claims on performance.

It is worth noting that there is no such thing as a general-purpose enterprise SSD. Enterprise storage workloads are far more varied than client workloads and it is impossible to make general statements about whether random or sequential performance is more important, what kind of mix of reads and writes to expect, or what queue depth is appropriate to test with. Real-world application benchmarks are difficult to construct and typically end up being far more narrowly applicable than we would hope. Our strategy for this review is to provide a very broad range of synthetic tests with the knowledge that not all results will be relevant to all use cases. Enterprise customers must know and understand their own workload. Since this is our first time testing anything with 3D XPoint memory, this review includes some new benchmarks that would probably not be applicable to a flash SSD review, making for some interesting numbers.


118 Comments


  • melgross - Tuesday, April 25, 2017 - link

    You're making the mistake those who know nothing make, which is surprising for you. This is a first generation product. It will get much faster, and much cheaper as time goes on. NAND will stagnate. You also have to remember that Intel never made the claim that this was as fast as RAM, or that it would be. The closest they came was to say that this would be in between NAND and RAM in speed. And yes, for some uses, it might be able to replace RAM. But that could be several generations down the road, in possibly 5 years, or so. Reply
  • tuxRoller - Sunday, April 23, 2017 - link

    I'm not sure i understand you.
    You talk about "pages", but, i hope, the reviewer was only using dio, so there would be no page cache.
    It's very unclear where you are getting this "~100x" number. Nvme connected dram has a plurality of hits around 4-6 us (depending on software) but it also has a distributed latency curve. However, i don't know what the latency is at the 99.999th percentile. The point is that even with dram's sub-100ns latency, it's still not staying terribly close to the theoretical min latency of the bus.
    Btw, it's not just the controller. A very large amount of latency comes from the block layer itself (amongst other things).
    Reply
  • Santoval - Tuesday, June 06, 2017 - link

    It is quite possible that Intel artificially weakened P4800X's performance and durability in order to avoid internal competition with their SSD division (they already did the same with Atoms). If your new technology is *too* good it might make your other more mainstream technology look bad in comparison and you could see a big drop in sales. Or it might have a "deflationary" effect, where their customers might delay buying in hope of lower prices later. This way they can also have a more clear storage hierarchy, business segment wise, where their mainstream products are good, and their niche ones are better but not too good.

    I am not suggesting that it could ever compete with DRAM, just that the potential of 3D XPoint technology might actually be closer to what they mentioned a year ago than the first products they shipped.
    Reply
  • albert89 - Friday, April 21, 2017 - link

    Intel won't be reducing the price of the optane but rather will be giving the average consumer a watered down version which will be charged at a premium but perform only slightly better than the top SSD. The conclusion? Another overpriced ripoff from Intel. Reply
  • TheinsanegamerN - Thursday, April 20, 2017 - link

    the fastest SSD on the consumer market is the 960 pro, which can hit 3.2GB/s read under certain circumstances.

    This is the equivalent of single channel DDR 400 from 2001, and DDR had far lower latencies to boot.

    We are a long, long way from replacing RAM with storage.
    Reply
  • ddriver - Friday, April 21, 2017 - link

    What makes the most impression is it took a completely different review format to make this product look good. No doubt strictly following intel's own review guidelines. And of course, not a shred of real world application. Enter hypetane - the paper dragon. Reply
  • ddriver - Friday, April 21, 2017 - link

    Also, bandwidth is only one side of the coin. Xpoint is 30-100+ times more latent than dram, meaning the CPU will have to wait 30-100+ times longer before it has data to compute, and dram is already too slow in this aspect, so you really don't want to go any slower.

    I see a niche for hypetane - ram-less systems, sporting very slow CPUs. Only a slow CPU will not be wasted on having to wait on working memory. Server CPUs don't really need to crunch that much data either, if any, which is paradoxical, seeing how intel will only enable avx512 on xeons, so it appears that the "amazingly fast" and overpriced hypetane is at home only in simple low end servers, possibly paired with them many core atom chips. Even overpriced, it will be kind of a decent deal, as it offers about 3 times the capacity per dollar as dram, paired with wimpy atoms it could make for a decent simple, low cost, frequent access server.
    Reply
  • frenchy_2001 - Friday, April 21, 2017 - link

    You are missing the usefulness of it entirely.
    Yes, it is a niche product.
    And I even agree, intel is hyping it and offering it for consumer with minimal benefit (beside intel's bottom line).
    But it realistically slots between NAND and DRAM.
    This review shows that it has lower latency than NAND and it has higher density than DRAM.
    This is the play.

    You say it cannot replace DRAM and for most usage (by far) you are right. However, for a small niche that works with very big data sets (like for finance or exploration), having more memory, although slower, will still be much faster than memory + swap (to a slower NAND storage).

    Let me repeat, this is a niche product, but it has its uses.
    Intel marketing is hyping it and trying to use it where its tradeoffs (particularly price) make little sense, but the technology itself is good (if limited).
    Reply
  • wumpus - Sunday, April 23, 2017 - link

    Don't be so sure that latency is keeping it from being used as [secondary] main memory. A 4GB machine can actually function (more or less) for office duty and some iffy gaming capability. I'd strongly suspect that a 4-8GB stack of HBM (preferably the low-cost 512 bit systems, as the CPU really only wants 512bit chunks of memory at a time) with the rest backed by 3dxpoint would still be effective at this high latency. Any improvement is likely to remove latency as something that would stop it (and current software can use the current stack [with PCIe connection] to work 3dxpoint as "swappable ram").

    The endurance may well keep this from happening (it is on par with SLC).

    The other catch is that this is a pretty steep change along the entire memory system. Expect Intel to have huge internal fights as to what the memory map should look like, where the HBM goes (does Intel pay to manufacture an expensive CPU module or foist it on down the line), do you even use HBM (if Ravenridge does, I'd expect that Intel would have to if they tried to use xpoint as main memory)? The big question is what would be the "cache line" of the DRAM memory: the current stack only works with 4k, the CPU "wants" 512 bits, HBM is closer to 4k. 4k looks like a no-brainer, but you still have to put a funky L5/buffer that deals with the huge cache line or waste a ton of [top level, not sure if L3 or L4] cache by giving it 4k cache lines.
    Reply
  • melgross - Tuesday, April 25, 2017 - link

    What is it with you and RAM? This isn't a RAM replacement for most any use. Intel hasn't said that it is. Why are you insisting on comparing it to RAM? Reply
