Checking Intel's Numbers

The product brief for the Optane SSD DC P4800X provides a limited set of performance specifications and omits sequential throughput entirely. Latency and throughput targets are provided only for 4kB random reads, random writes, and a 70/30 mix of reads and writes.

This section has our results for how the Optane SSD measures up to Intel's advertised specifications and how the flash SSDs fare on the same tests. The rest of this review provides deeper analysis of how these drives perform across a range of queue depths, transfer sizes, and read/write mixes.

4kB Random Read at a Queue Depth of 1 (QD1)
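| Drive | Throughput (MB/s) | Throughput (IOPS) | Latency mean (µs) | Latency median (µs) | Latency 99th (µs) | Latency 99.999th (µs) |
| --- | --- | --- | --- | --- | --- | --- |
| Intel Optane SSD DC P4800X 375GB | 413.0 | 108.3k | 8.9 | 9 | 10 | 37 |
| Intel SSD DC P3700 800GB | 48.7 | 12.8k | 77.9 | 76 | 96 | 2768 |
| Micron 9100 MAX 2.4TB | 35.3 | 9.2k | 107.7 | 104 | 117 | 306 |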

Intel's queue depth 1 specifications are expressed in terms of latency, and a throughput specification at QD1 would be redundant: with only one request in flight, throughput is essentially the inverse of latency. Intel specifies a "typical" latency of less than 10µs, and most QD1 random reads on the Optane SSD take 8 or 9µs; even the 99th percentile latency is only 10µs.
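To illustrate why a QD1 throughput figure adds little, here is a minimal sketch that estimates IOPS from mean latency using Little's law (requests in flight = throughput × latency). The function name `predicted_iops` is our own for illustration; the latency and IOPS figures are the QD1 random read results from the table above. The estimate is an upper bound that ignores any gap between one request completing and the next being issued.

```python
# Little's law sketch: IOPS ≈ queue_depth / mean_latency.
# This is an idealized estimate, not the method used to produce the review's numbers.

def predicted_iops(mean_latency_us: float, queue_depth: int = 1) -> float:
    """Estimate IOPS assuming exactly `queue_depth` requests are always in flight."""
    return queue_depth * 1_000_000 / mean_latency_us

# (mean latency in µs, measured IOPS) for 4kB random reads at QD1
qd1_reads = {
    "Intel Optane SSD DC P4800X": (8.9, 108_300),
    "Intel SSD DC P3700": (77.9, 12_800),
    "Micron 9100 MAX": (107.7, 9_200),
}

for drive, (latency_us, measured) in qd1_reads.items():
    estimate = predicted_iops(latency_us)
    print(f"{drive}: estimated {estimate / 1000:.1f}k IOPS, measured {measured / 1000:.1f}k IOPS")
```

The estimates land within a few percent of the measured IOPS for all three drives, which is why the latency specification alone is sufficient at QD1.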

The 99.999th percentile target is less than 60µs, which the Optane SSD beats by a wide margin. Overall, the Optane SSD passes with ease. The flash SSDs are 8-12x slower on average, and the 99.999th percentile latency of the Intel P3700 is far worse, at around 75x slower.
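The slowdown factors quoted above are simple ratios of the table entries. A short sketch of the arithmetic, with dictionary keys and the `slowdown` helper chosen by us for illustration:

```python
# QD1 random read latencies in µs, taken from the table in this section.
optane = {"mean": 8.9, "p99999": 37}
flash = {
    "Intel SSD DC P3700": {"mean": 77.9, "p99999": 2768},
    "Micron 9100 MAX": {"mean": 107.7, "p99999": 306},
}

def slowdown(flash_us: float, optane_us: float) -> float:
    """How many times longer the flash SSD takes than the Optane SSD."""
    return flash_us / optane_us

ratios = {
    drive: {metric: slowdown(values[metric], optane[metric]) for metric in optane}
    for drive, values in flash.items()
}

for drive, r in ratios.items():
    print(f"{drive}: mean {r['mean']:.1f}x, 99.999th percentile {r['p99999']:.1f}x slower")
```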

4kB Random Read at a Queue Depth of 16 (QD16)
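| Drive | Throughput (MB/s) | Throughput (IOPS) | Latency mean (µs) | Latency median (µs) | Latency 99th (µs) | Latency 99.999th (µs) |
| --- | --- | --- | --- | --- | --- | --- |
| Intel Optane SSD DC P4800X 375GB | 2231.0 | 584.8k | 25.5 | 25 | 41 | 81 |
| Intel SSD DC P3700 800GB | 637.9 | 167.2k | 93.9 | 91 | 163 | 2320 |
| Micron 9100 MAX 2.4TB | 517.5 | 135.7k | 116.2 | 114 | 205 | 1560 |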

The Optane SSD's QD16 random read throughput is 584.8k IOPS, about 6% above the official specification of 550k IOPS. Its 99.999th percentile latency is 81µs, well under the target of less than 150µs. The flash SSDs are 3-5x slower on most metrics, but 20-30x slower at the 99.999th percentile of latency.
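As a quick check of the tail latency results against Intel's read targets, the sketch below computes how much of each latency budget is left unused. The `headroom_pct` helper and the dictionary layout are our own; the measured values and targets are the ones quoted in this section.

```python
# 99.999th percentile random read latency: (measured µs, Intel's target µs).
tail_latency = {"QD1": (37, 60), "QD16": (81, 150)}

def headroom_pct(measured_us: float, target_us: float) -> float:
    """Share of the latency budget left unused, in percent (negative means a miss)."""
    return (target_us - measured_us) / target_us * 100

for queue_depth, (measured, target) in tail_latency.items():
    status = "pass" if measured < target else "fail"
    print(f"{queue_depth}: {measured}µs vs <{target}µs target, "
          f"{status}, {headroom_pct(measured, target):.0f}% headroom")
```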

4kB Random Write at a Queue Depth of 1 (QD1)
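| Drive | Throughput (MB/s) | Throughput (IOPS) | Latency mean (µs) | Latency median (µs) | Latency 99th (µs) | Latency 99.999th (µs) |
| --- | --- | --- | --- | --- | --- | --- |
| Intel Optane SSD DC P4800X 375GB | 360.6 | 94.5k | 8.9 | 9 | 10 | 64 |
| Intel SSD DC P3700 800GB | 350.6 | 91.9k | 9.2 | 9 | 18 | 81 |
| Micron 9100 MAX 2.4TB | 160.9 | 42.2k | 22.2 | 22 | 24 | 76 |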

The QD1 random write specification keeps the typical latency target at 10µs, while the 99.999th percentile latency target is relaxed from 60µs to 100µs. In our results, the QD1 random write throughput of the Optane SSD (360.6 MB/s) is a bit lower than its QD1 random read throughput (413.0 MB/s), but the latency is roughly the same (8.9µs mean, 10µs at the 99th percentile).

However, it is worth noting that the Optane SSD only manages a passing score when the application uses asynchronous I/O APIs. Using simple synchronous write() system calls pushes the average latency up to 11-12µs.

Thanks to their capacitor-backed DRAM caches, the flash SSDs also handle QD1 random writes very well. The Intel P3700 manages to keep latency mostly below 10µs, and all three drives have 99.999th percentile latency below Intel's 100µs standard for the Optane SSD.
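For readers unfamiliar with the distinction, the following is a minimal sketch of what timing synchronous QD1 writes looks like from an application. It is not the benchmark used in this review: the function name, file path, and parameters are our own, it assumes a POSIX system where `os.pwrite` is available, and because the file is opened without direct I/O the writes go through the page cache, so the numbers it prints say nothing about any particular drive.

```python
import os
import random
import tempfile
import time

BLOCK = 4096  # 4kB transfer size, matching the tests in this section

def time_sync_writes(path: str, n_writes: int = 1000, span_blocks: int = 1024) -> list[float]:
    """Issue blocking 4kB writes at random aligned offsets; return per-call latency in µs."""
    buf = os.urandom(BLOCK)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    latencies = []
    try:
        for _ in range(n_writes):
            offset = random.randrange(span_blocks) * BLOCK
            start = time.perf_counter_ns()
            os.pwrite(fd, buf, offset)  # the caller is blocked until this call returns
            latencies.append((time.perf_counter_ns() - start) / 1000)
    finally:
        os.close(fd)
    return latencies

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        samples = time_sync_writes(tmp.name)
        print(f"mean latency: {sum(samples) / len(samples):.1f} µs over {len(samples)} writes")
```

Each call here includes the full system call round trip, and only one request is ever outstanding, which is the pattern the synchronous write() figure above refers to.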

4kB Random Write at a Queue Depth of 16 (QD16)
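| Drive | Throughput (MB/s) | Throughput (IOPS) | Latency mean (µs) | Latency median (µs) | Latency 99th (µs) | Latency 99.999th (µs) |
| --- | --- | --- | --- | --- | --- | --- |
| Intel Optane SSD DC P4800X 375GB | 2122.5 | 556.4k | 27.0 | 23 | 65 | 147 |
| Intel SSD DC P3700 800GB | 446.3 | 117.0k | 134.8 | 43 | 1336 | 9536 |
| Micron 9100 MAX 2.4TB | 1144.4 | 300.0k | 51.6 | 34 | 620 | 3504 |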

The Optane SSD DC P4800X is specified for 500k random write IOPS using four threads to provide a total queue depth of 16. In our tests, the Optane SSD scored 556.4k IOPS, exceeding the specification by more than 11%. This equates to a random write throughput of more than 2GB/s.

The flash SSDs depend more heavily on the parallelism that comes with higher capacities, so lower-capacity drives tend to be slower. Hence the 2.4TB Micron 9100 fares much better than the 800GB Intel P3700 here. The Micron 9100 hits its own specification right on the nose with 300k IOPS, and the Intel P3700 comfortably exceeds its own 90k IOPS specification while remaining the slowest of the three by far. The Optane SSD stays well below its 200µs limit for 99.999th percentile latency at 147µs, while the flash SSDs have outliers of several milliseconds. Even at the 99th percentile the flash SSDs are 10-20x slower than the Optane SSD.
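To make the percentile columns concrete, here is a small sketch of a nearest-rank percentile over synthetic latency samples. The data is invented purely for illustration (it is not from any drive tested here), and nearest-rank is only one of several common percentile definitions; we are not claiming it is the one used to produce the tables. The point is that a handful of slow operations barely moves the mean or the 99th percentile but dominates the 99.999th.

```python
import math

def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the smallest sample such that at least pct% of samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Synthetic example: 199,990 operations at 25µs plus 10 outliers at 5000µs.
samples = [25.0] * 199_990 + [5000.0] * 10

mean = sum(samples) / len(samples)
p99 = percentile_nearest_rank(samples, 99)
p99999 = percentile_nearest_rank(samples, 99.999)
print(f"mean {mean:.2f}µs, 99th {p99:.0f}µs, 99.999th {p99999:.0f}µs")
```

Note that a 99.999th percentile is only meaningful with well over 100,000 samples, since it describes the slowest one in every 100,000 operations.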

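4kB Random Mixed 70/30 Read/Write at a Queue Depth of 16 (QD16)

| Drive | Throughput (MB/s) | Throughput (IOPS) | Latency mean (µs) | Latency median (µs) | Latency 99th (µs) | Latency 99.999th (µs) |
| --- | --- | --- | --- | --- | --- | --- |
| Intel Optane SSD DC P4800X 375GB | 1929.7 | 505.9k | 29.7 | 28 | 65 | 107 |
| Intel SSD DC P3700 800GB | 519.9 | 136.3k | 115.5 | 79 | 1672 | 5536 |
| Micron 9100 MAX 2.4TB | 518.0 | 135.8k | 116.0 | 105 | 1112 | 3152 |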

On a 70/30 read/write mix, the Optane SSD DC P4800X scores 505.9k IOPS, which beats the specification of 500k IOPS by about 1%. Both of the flash SSDs deliver roughly the same throughput, a little over a quarter of the speed of the Optane SSD. Intel doesn't provide a latency specification for this workload, but the measured 99.999th percentile latency of 107µs unsurprisingly falls between the random read (81µs) and random write (147µs) results, while the mean of 29.7µs is slightly higher than on either pure workload. While low-end consumer SSDs sometimes perform dramatically worse on mixed workloads than on pure read or write workloads, none of these drives have that problem, as their market positioning would suggest.
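The throughput margins quoted for the three QD16 workloads can be reproduced with a few lines of arithmetic. This sketch uses the specification and measured figures from this section; the `margin_pct` helper and dictionary names are our own.

```python
# Optane SSD DC P4800X at QD16, in thousands of IOPS: Intel's specification vs. our measurement.
spec_kiops = {"random read": 550.0, "random write": 500.0, "70/30 mix": 500.0}
measured_kiops = {"random read": 584.8, "random write": 556.4, "70/30 mix": 505.9}

def margin_pct(measured: float, spec: float) -> float:
    """Percentage by which the measurement exceeds the specification."""
    return (measured - spec) / spec * 100

for workload, spec in spec_kiops.items():
    print(f"{workload}: {margin_pct(measured_kiops[workload], spec):+.1f}% vs. specification")

# Flash SSD throughput on the 70/30 mix as a fraction of the Optane SSD's.
flash_mix_kiops = {"Intel SSD DC P3700": 136.3, "Micron 9100 MAX": 135.8}
flash_fraction = {d: v / measured_kiops["70/30 mix"] for d, v in flash_mix_kiops.items()}
for drive, fraction in flash_fraction.items():
    print(f"{drive}: {fraction:.0%} of the Optane SSD's mixed throughput")
```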

117 Comments

  • melgross - Tuesday, April 25, 2017 - link

    You're making the mistake those who know nothing make, which is surprising for you. This is a first generation product. It will get much faster, and much cheaper as time goes on. NAND will stagnate. You also have to remember that Intel never made the claim that this was as fast as RAM, or that it would be. The closest they came was to say that this would be in between NAND and RAM in speed. And yes, for some uses, it might be able to replace RAM. But that could be several generations down the road, in possibly 5 years, or so.
  • tuxRoller - Sunday, April 23, 2017 - link

    I'm not sure i understand you.
    You talk about "pages", but, i hope, the reviewer was only using dio, so there would be no page cache.
    It's very unclear where you are getting this "~100x" number. Nvme connected dram has a plurality of hits around 4-6 us (depending on software) but it also has a distributed latency curve. However, i don't know what the latency is at the 99.999th percentile. The point is that even with dram's sub-100ns latency, it's still not staying terribly close to the theoretical min latency of the bus.
    Btw, it's not just the controller. A very large amount of latency comes from the block layer itself (amongst other things).
  • Santoval - Tuesday, June 6, 2017 - link

    It is quite possible that Intel artificially weakened P4800X's performance and durability in order to avoid internal competition with their SSD division (they already did the same with Atoms). If your new technology is *too* good it might make your other more mainstream technology look bad in comparison and you could see a big drop in sales. Or it might have a "deflationary" effect, where their customers might delay buying in hope of lower prices later. This way they can also have a more clear storage hierarchy, business segment wise, where their mainstream products are good, and their niche ones are better but not too good.

    I am not suggesting that it could ever compete with DRAM, just that the potential of 3D XPoint technology might actually be closer to what they mentioned a year ago than the first products they shipped.
  • albert89 - Friday, April 21, 2017 - link

    Intel won't be reducing the price of the optane but rather will be giving the average consumer a watered down version which will be charged at a premium but perform only slightly better than the top SSD. The conclusion? Another overpriced ripoff from Intel.
  • TheinsanegamerN - Thursday, April 20, 2017 - link

    the fastest SSD on the consumer market is the 960 pro, which can hit 3.2GB/s read under certain circumstances.

    This is the equivalent of single channel DDR 400 from 2001, and DDR had far lower latencies to boot.

    We are a long, long way from replacing RAM with storage.
  • ddriver - Friday, April 21, 2017 - link

    What makes the most impression is it took a completely different review format to make this product look good. No doubt strictly following intel's own review guidelines. And of course, not a shred of real world application. Enter hypetane - the paper dragon.
  • ddriver - Friday, April 21, 2017 - link

    Also, bandwidth is only one side of the coin. Xpoint is 30-100+ times more latent than dram, meaning the CPU will have to wait 30-100+ times longer before it has data to compute, and dram is already too slow in this aspect, so you really don't want to go any slower.

    I see a niche for hypetane - ram-less systems, sporting very slow CPUs. Only a slow CPU will not be wasted on having to wait on working memory. Server CPUs don't really need to crunch that much data either, if any, which is paradoxical, seeing how intel will only enable avx512 on xeons, so it appears that the "amazingly fast" and overpriced hypetane is at home only in simple low end servers, possibly paired with those many-core atom chips. Even overpriced, it will be kind of a decent deal, as it offers about 3 times the capacity per dollar as dram, and paired with wimpy atoms it could make for a decent simple, low cost, frequent access server.
  • frenchy_2001 - Friday, April 21, 2017 - link

    You are missing the usefulness of it entirely.
    Yes, it is a niche product.
    And I even agree, intel is hyping it and offering it for consumer with minimal benefit (beside intel's bottom line).
    But it realistically slots between NAND and DRAM.
    This review shows that it has lower latency than NAND and it has higher density than DRAM.
    This is the play.

    You say it cannot replace DRAM and for most usage (by far) you are right. However, for a small niche that works with very big data sets (like for finance or exploration), having more memory, although slower, will still be much faster than memory + swap (to a slower NAND storage).

    Let me repeat, this is a niche product, but it has its uses.
    Intel marketing is hyping it and trying to use it where its tradeoffs (particularly price) make little sense, but the technology itself is good (if limited).
  • wumpus - Sunday, April 23, 2017 - link

    Don't be so sure that latency is keeping it from being used as [secondary] main memory. A 4GB machine can actually function (more or less) for office duty and some iffy gaming capability. I'd strongly suspect that a 4-8GB stack of HBM (preferably the low-cost 512 bit systems, as the CPU really only wants 512bit chunks of memory at a time) with the rest backed by 3dxpoint would still be effective at this high latency. Any improvement is likely to remove latency as something that would stop it (and current software can use the current stack [with PCIe connection] to work 3dxpoint as "swappable ram").

    The endurance may well keep this from happening (it is on par with SLC).

    The other catch is that this is a pretty steep change along the entire memory system. Expect Intel to have huge internal fights as to what the memory map should look like, where the HBM goes (does Intel pay to manufacture an expensive CPU module or foist it on down the line), do you even use HBM (if Ravenridge does, I'd expect that Intel would have to if they tried to use xpoint as main memory)? The big question is what would be the "cache line" of the DRAM memory: the current stack only works with 4k, the CPU "wants" 512 bits, HBM is closer to 4k. 4k looks like a no-brainer, but you still have to put a funky L5/buffer that deals with the huge cache line or waste a ton of [top level, not sure if L3 or L4] cache by giving it 4k cache lines.
  • melgross - Tuesday, April 25, 2017 - link

    What is it with you and RAM? This isn't a RAM replacement for most any use. Intel hasn't said that it is. Why are you insisting on comparing it to RAM?
