3D XPoint Refresher

Intel's 3D XPoint memory technology is fundamentally different from NAND flash. Intel has not disclosed any further low-level details since its initial joint announcement of the technology with Micron, so our analysis from 2015 is still largely relevant. The industry consensus is that 3D XPoint is something along the lines of a phase change memory or conductive bridging resistive RAM, but we won't know for sure until third parties put 3D XPoint memory under an electron microscope.

Even without knowing the precise details, the high-level structure of 3D XPoint confers some significant advantages and disadvantages relative to NAND flash or DRAM. 3D XPoint can be read or written at the bit or word level, which greatly simplifies random access and wear leveling compared to the multi-kB pages that NAND flash uses for read and program operations and the multi-MB blocks it uses for erase operations. Where DRAM requires a transistor for each memory cell, 3D XPoint isolates cells from one another by stacking each in series with a diode-like selector. This frees 3D XPoint to use a multi-layer structure, though not one that is as easy to manufacture as 3D NAND flash. The initial iteration of 3D XPoint uses just two layers and provides a per-die capacity of 128Gb, a step or two behind NAND flash but far ahead of the density of DRAM. 3D XPoint currently stores just one bit per memory cell, while today's NAND flash mostly stores two or three bits per cell. Intel has indicated that, with sufficient R&D, the technology can support more bits per cell to help raise density.
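
The write-granularity difference is worth making concrete. Below is a back-of-the-envelope sketch of write amplification for a small random write; the page and word sizes are assumptions chosen for illustration, not disclosed NAND or 3D XPoint specifications.

    # Rough write-granularity comparison (all sizes are illustrative assumptions).
    NAND_PAGE_BYTES = 16 * 1024   # multi-kB program granularity
    XPOINT_WORD_BYTES = 8         # hypothetical word-level write granularity

    def write_amplification(request_bytes: int, granularity_bytes: int) -> float:
        """Media bytes touched per logical byte written, ignoring controller tricks."""
        units = -(-request_bytes // granularity_bytes)  # ceiling division
        return units * granularity_bytes / request_bytes

    request = 64  # a 64-byte logical write
    # (NAND overwrites additionally require erasing multi-MB blocks, not modeled here.)
    print(f"NAND page program: {write_amplification(request, NAND_PAGE_BYTES):.0f}x")
    print(f"3D XPoint write:   {write_amplification(request, XPOINT_WORD_BYTES):.0f}x")

Under these assumptions, a tiny write touches an entire 16kB page on NAND but only a word on an in-place writable crosspoint cell, which is why wear leveling and random access get so much simpler.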

The general idea of a resistive memory cell paired with a selector and built at the intersections of word and bit lines is not unique to 3D XPoint memory. The term "crosspoint" has been used to describe several memory technologies with similar high-level architectures but different implementation details. As one Intel employee has explained, it is relatively easy to discover a material that exhibits hysteresis and thus has the potential to be used as a memory cell. The hard part is designing a memory cell and selector that are fast, durable, and manufacturable at scale. The greatest value in Intel's 3D XPoint technology is not the high-level design but the specific materials and manufacturing methods that make it practical. Some analysts have noted that the turning point for technologies such as 3D XPoint may well be the development of the selector itself, which is believed to be a Schottky diode or an ovonic selector.
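
To illustrate why the selector matters so much, here is a minimal model of reading one cell in a crosspoint array under a half-select biasing scheme. Every voltage, threshold, and resistance below is an assumption for illustration; none of these values have been disclosed for 3D XPoint.

    # Toy model of a crosspoint read; the selector's nonlinearity suppresses
    # "sneak path" currents through unselected cells. All values are illustrative.
    V_READ = 1.0         # full bias across the selected cell
    V_HALF = V_READ / 2  # bias seen by half-selected cells sharing a line
    V_TH = 0.7           # selector threshold voltage

    LOW_RES, HIGH_RES = 10e3, 1e6  # two resistance states encode one bit

    def cell_current(v_across: float, resistance: float) -> float:
        """Below the selector threshold, effectively no current flows."""
        return 0.0 if v_across < V_TH else v_across / resistance

    selected = cell_current(V_READ, LOW_RES)   # conducts: sensed as one state
    neighbor = cell_current(V_HALF, LOW_RES)   # blocked: no sneak current
    print(f"selected: {selected * 1e6:.0f} uA, half-selected: {neighbor} A")

Without that nonlinearity, current would leak through every low-resistance cell sharing the selected word and bit lines and swamp the sense signal, which is consistent with the analysts' point that the selector is the real turning point.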

In addition to the advantages that any resistive memory built on a crosspoint array can expect, Intel's 3D XPoint memory is supposed to offer substantially higher write endurance than NAND flash, along with much lower read and write latency. Intel has so far quantified the low-level performance of 3D XPoint only with rough order-of-magnitude comparisons against DRAM and NAND flash in general, so this test of the Optane SSD DC P4800X is the first chance to get some precise data. Unfortunately, we're only indirectly observing the capabilities of 3D XPoint, because the Optane SSD is still a PCIe SSD, with a controller translating the block-oriented NVMe protocol and providing wear leveling.

The only other Optane product Intel has announced so far is another PCIe SSD, but on an entirely different scale: the Optane Memory product for consumers uses just one or two 3D XPoint chips and is intended to serve as a 32GB cache device accelerating access to a mechanical hard drive or slower SATA SSD. Next year Intel will start talking about putting 3D XPoint on DIMMs, and by then, if not sooner, we should have more low-level information about 3D XPoint technology.

117 Comments

  • extide - Thursday, April 20, 2017 - link

    Queue depth is how many commands the computer has queued up for the drive. The computer can issue commands to the drive faster than the drive can service them -- so, for example, SATA can support a queue of up to 32 commands. Typical desktop use just doesn't generate enough traffic to queue up much work, so you usually sit at QD 1-2, maybe 4. Some server workloads can be higher, but even on a DB server, if you are seeing QDs of 16 I would say your storage is not fast enough for what you are trying to do, so being able to get good performance at low queue depths is truly a breakthrough.
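
    The relationship described here follows from Little's Law: achievable IOPS is roughly queue depth divided by average latency. A minimal sketch, with both latency figures assumed purely for illustration:

        # Little's Law: IOPS ~= queue depth / average service latency.
        # Both latencies below are assumptions for illustration only.
        def iops(queue_depth: int, latency_s: float) -> float:
            return queue_depth / latency_s

        for qd in (1, 2, 4, 16, 32):
            nand = iops(qd, 90e-6)    # assume ~90 us per NAND SSD read
            optane = iops(qd, 10e-6)  # assume ~10 us per Optane read
            print(f"QD{qd:>2}: NAND ~{nand:>9,.0f} IOPS, Optane ~{optane:>9,.0f} IOPS")

    At QD1 the assumed NAND drive is latency-bound near 11K IOPS while the lower-latency device is already near 100K, which is the sense in which low-queue-depth performance is the breakthrough.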
  • bcronce - Thursday, April 20, 2017 - link

    For file servers, it's not just the queue depth that's important, it's the number of queues. FreeBSD and OpenZFS developers have published a lot of blog posts and videos about the issues of scaling up servers, especially with regard to multi-core.

    SATA only supports 1 queue. NVMe supports up to ~65,000 queues with a depth of ~65,000 each. They're actually having trouble saturating high-end SSDs because their IO stack can't handle the throughput.

    If you have a lot of SATA drives, then you effectively have many queues, but if you want a single/few super fast device(s), like say L2ARC, you need to take advantage of the new protocol.
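
    A conceptual sketch of the single-queue versus per-core-queue difference described above; the queue counts and depths here are illustrative assumptions, not driver defaults:

        from collections import deque

        NUM_CORES = 8

        # SATA/AHCI: one shared queue of depth 32; every core contends on it.
        sata_queue = deque(maxlen=32)
        for core in range(NUM_CORES):
            sata_queue.append(f"READ from core {core}")  # serialized behind one lock

        # NVMe: up to ~64K queues of up to ~64K entries each. Drivers commonly
        # create one submission queue per core so cores never contend on submit.
        nvme_queues = {core: deque(maxlen=1024) for core in range(NUM_CORES)}

        def submit(core: int, command: str) -> None:
            nvme_queues[core].append(command)  # per-core path, no shared lock

        for core in range(NUM_CORES):
            submit(core, f"READ lba={core * 8}")
        print({core: len(q) for core, q in nvme_queues.items()})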
  • tuxRoller - Friday, April 21, 2017 - link

    The answer is something like the Linux kernel's block multiqueue layer (ongoing, still not the default for all devices, but it shouldn't take more than a few more cycles). It's been a massive undertaking and involved rewriting many drivers.

    https://lwn.net/Articles/552904/
  • Shadowmaster625 - Thursday, April 20, 2017 - link

    It is a pity Intel doesn't make video cards, because 16GB of this would go very well with 4GB of RAM and a decent memory controller. It would lower the overall cost and not impact performance at all.
  • ddriver - Friday, April 21, 2017 - link

    "It would lower the overall cost and not impact performance at all."

    Yeah, I bet. /s
  • Mugur - Friday, April 21, 2017 - link

    I think I read something like this when i740 was launched... :-)

    Sorry, couldn't resist. But the analogy stands.
  • ridic987 - Friday, April 21, 2017 - link

    "It would lower the overall cost and not impact performance at all."

    What? This stuff is around 50x slower than DRAM, which itself is reaching its limits in GPUs, hence features like delta color compression... Right now, when your GPU runs out of RAM it uses your system RAM as extra space; that is a far better system.
  • anynigma - Thursday, April 20, 2017 - link

    "Intel's new 3D XPoint non-volatile memory technology, which has been on the cards publically for the last couple of years"

    I think you mean "IN the cards". In this context, "ON the cards" makes it sound like we've all been missing out on 3D xPoint PCI cards for a "couple of years" :)
  • SaolDan - Thursday, April 20, 2017 - link

    I think he means it's been in the works publicly for a couple of years.
  • DrunkenDonkey - Thursday, April 20, 2017 - link

    A bit of a suggestion - can you divide (or provide in final thoughts) SSD reviews per consumer base? A desktop user absolutely does not care about sequential performance or QD16, or even writes for that matter (except for the odd time installing something). A database couldn't care less about sequential or low QD, etc. Giving the tables is good for the odd few % of readers that actually know what to look for; the rest just take a look at the end of the graph and come away with a stunningly wrong idea. Just a few comparisons tailored per use case would make it so easy for the masses. It was Anand that fought for this during the early SandForce days; he forced OCZ to reconsider their ways and tweak SSDs for real-world performance rather than graph-based performance, and got me as a follower. Let that not die in vain, and let those that lack the specific knowledge be informed. Just look at the comments and see how people interpret the results.
    I know this is enterprise grade SSD, but it is also a showcase for a new technology that will come in our hands soonish.
