Random Read Performance

Our first test of random read performance uses very short bursts of operations issued one at a time with no queuing. The drives are given enough idle time between bursts to yield an overall duty cycle of 20%, so thermal throttling is not a factor. Each burst consists of a total of 32MB of 4kB random reads from a 16GB span of the disk, and the total data read is 1GB.
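For readers who want a feel for the workload, here is a minimal sketch of the burst random read pattern described above, written in Python. This is not our actual test harness: the device path is a placeholder, and the sleep-based pacing is only a simple way to approximate a 20% duty cycle.

```python
# Sketch of the burst random read workload: 4kB reads at queue depth 1,
# 32MB per burst, random offsets within a 16GB span, 1GB read in total,
# with idle time sized so the drive is busy roughly 20% of the time.
import os, random, time

DEV_PATH   = "/dev/nvme0n1"        # placeholder; any block device or large file
READ_SIZE  = 4 * 1024              # 4kB per operation
BURST_SIZE = 32 * 1024 * 1024      # 32MB per burst
SPAN       = 16 * 1024**3          # reads confined to a 16GB span
TOTAL      = 1 * 1024**3           # 1GB read in total
DUTY_CYCLE = 0.20

fd = os.open(DEV_PATH, os.O_RDONLY)
data_read = 0
while data_read < TOTAL:
    start = time.monotonic()
    for _ in range(BURST_SIZE // READ_SIZE):
        offset = random.randrange(0, SPAN // READ_SIZE) * READ_SIZE
        os.pread(fd, READ_SIZE, offset)              # one outstanding I/O at a time (QD1)
    busy = time.monotonic() - start
    data_read += BURST_SIZE
    time.sleep(busy * (1 - DUTY_CYCLE) / DUTY_CYCLE) # idle so the duty cycle averages ~20%
os.close(fd)
```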

Burst 4kB Random Read (Queue Depth 1)

When the Crucial P1 has plenty of unused capacity and its SLC cache is large enough to contain the entire 16GB of test data, the burst random read performance is excellent. When the drive is full and the test data can no longer fit in the SLC cache, the performance falls behind the Crucial MX500 and most low-end NVMe SSDs.

Our sustained random read performance test is similar to the random read test from our 2015 test suite: queue depths from 1 to 32 are tested, and the average performance and power efficiency across QD1, QD2 and QD4 are reported as the primary scores. Each queue depth is tested for one minute or 32GB of data transferred, whichever is shorter. After each queue depth is tested, the drive is given up to one minute to cool off so that the higher queue depths are unlikely to be affected by accumulated heat. The individual read operations are again 4kB, and cover a 64GB span of the drive.
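A rough approximation of the queue depth sweep is sketched below. The real suite issues asynchronous I/O rather than using a thread pool, and the exact steps between QD1 and QD32 are assumed here to be powers of two; treat it as an illustration of the structure, not the measurement code.

```python
# Approximate the sustained random read sweep: each queue depth runs for one
# minute or 32GB, whichever comes first, over a 64GB span, with up to a
# minute of idle time between depths so heat doesn't accumulate.
import os, random, time
from concurrent.futures import ThreadPoolExecutor

DEV_PATH  = "/dev/nvme0n1"                 # placeholder path
READ_SIZE = 4 * 1024
SPAN      = 64 * 1024**3                   # 64GB span
LIMIT     = 32 * 1024**3                   # per-queue-depth data cap
RUNTIME   = 60.0                           # per-queue-depth time cap, in seconds

def reader(fd, deadline, quota):
    done = 0
    while time.monotonic() < deadline and done < quota:
        offset = random.randrange(0, SPAN // READ_SIZE) * READ_SIZE
        os.pread(fd, READ_SIZE, offset)
        done += READ_SIZE
    return done

fd = os.open(DEV_PATH, os.O_RDONLY)
for qd in (1, 2, 4, 8, 16, 32):            # queue depth approximated by thread count
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=qd) as pool:
        futures = [pool.submit(reader, fd, start + RUNTIME, LIMIT // qd) for _ in range(qd)]
        total = sum(f.result() for f in futures)
    elapsed = time.monotonic() - start
    print(f"QD{qd}: {total / elapsed / 1e6:.1f} MB/s (approximate)")
    time.sleep(60)                         # cool-off / idle time between queue depths
os.close(fd)
```

The primary score quoted in the charts is the average of the QD1, QD2 and QD4 results.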

Sustained 4kB Random Read

The sustained random read performance of the Crucial P1 at low queue depths is mediocre at best, falling behind most TLC-based NVMe SSDs and the Crucial MX500. By contrast, the Intel 660p manages to retain its high performance even on the sustained test, indicating that the Intel drive kept more of the test data in its SLC cache than the Crucial P1 does. When the test is run on a full drive, the P1 and the 660p deliver equivalent performance, about 12% slower than the P1 manages when it holds only the 64GB test data file.

Sustained 4kB Random Read (Power Efficiency)

The power efficiency of the Crucial P1 during the sustained random read test is less than half of what the Intel 660p offers, due almost entirely to the large performance difference. At just over 2W, the power consumption of the P1 is reasonable, but it doesn't provide the performance to match when the test data isn't in the SLC cache.
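For context on how the efficiency scores relate to the raw measurements: efficiency is simply throughput divided by average power. The figures in this small example are illustrative rather than measured values.

```python
# Efficiency in MB/s per watt is throughput divided by average power draw.
def efficiency_mb_per_s_per_w(throughput_mb_s: float, avg_power_w: float) -> float:
    return throughput_mb_s / avg_power_w

# Illustrative only: a drive averaging 120 MB/s at 2.1 W works out to ~57 MB/s/W.
print(round(efficiency_mb_per_s_per_w(120.0, 2.1), 1))
```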

The random read performance of the Crucial P1 increases modestly with higher queue depths, but it pales in comparison to what the Intel 660p delivers by serving most of the reads for this test out of its SLC cache. Even the Crucial MX500 develops a large lead over the P1 at the highest queue depths, while using less power.

Plotting the sustained random read performance and power consumption of the Crucial P1 against the rest of the drives that have run through our 2018 SSD test suite, it is clear that the drive doesn't measure up well against even most SATA SSDs, let alone NVMe drives that go beyond the SATA speed limit when given a sufficiently high queue depth. Thanks to its SLC cache being more suited to these test conditions, the Intel 660p is among those NVMe drives that beat the limits of SATA.

Random Write Performance

Our test of random write burst performance is structured similarly to the random read burst test, but each burst is only 4MB and the total test length is 128MB. The 4kB random write operations are distributed over a 16GB span of the drive, and the operations are issued one at a time with no queuing.
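The burst write workload follows the same pattern as the burst read sketch earlier, with the parameters changed to match this description. The device path is again a placeholder, and the pause between bursts is an assumption since no duty cycle is specified here; note that writing to a raw device like this is destructive, so use a scratch device or file.

```python
# Sketch of the burst random write workload: 4kB writes at queue depth 1,
# 4MB per burst, 128MB written in total, over a 16GB span.
import os, random, time

DEV_PATH   = "/dev/nvme0n1"          # placeholder; point this at a scratch device/file
WRITE_SIZE = 4 * 1024
BURST_SIZE = 4 * 1024 * 1024         # 4MB per burst
SPAN       = 16 * 1024**3            # writes confined to a 16GB span
TOTAL      = 128 * 1024 * 1024       # 128MB written in total
buf        = os.urandom(WRITE_SIZE)  # incompressible 4kB payload

fd = os.open(DEV_PATH, os.O_WRONLY)
written = 0
while written < TOTAL:
    for _ in range(BURST_SIZE // WRITE_SIZE):
        offset = random.randrange(0, SPAN // WRITE_SIZE) * WRITE_SIZE
        os.pwrite(fd, buf, offset)   # one write at a time (QD1)
    written += BURST_SIZE
    time.sleep(1.0)                  # idle between bursts (interval is an assumption)
os.close(fd)
```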

Burst 4kB Random Write (Queue Depth 1)

The burst random write performance of the Crucial P1 is good, but not quite on par with the top tier of NVMe SSDs. The Intel 660p is about 10% slower. Both drives clearly have enough free SLC cache to handle this test even when the drives are completely full.

As with the sustained random read test, our sustained 4kB random write test runs for up to one minute or 32GB per queue depth, covering a 64GB span of the drive and giving the drive up to 1 minute of idle time between queue depths to allow for write caches to be flushed and for the drive to cool down.
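Structurally this sweep is the same as the read sweep sketched earlier, with writes in place of reads; the only step worth showing separately is the pause between queue depths, which also gives the write cache a chance to flush. A minimal version, assuming the same file descriptor as before:

```python
# Between queue depths: flush outstanding writes, then idle so the SLC cache
# can drain and the drive can cool before the next phase starts.
import os, time

def between_queue_depths(fd, idle_seconds: float = 60.0) -> None:
    os.fsync(fd)              # ask the OS and drive to flush buffered/cached writes
    time.sleep(idle_seconds)  # up to one minute of idle time
```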

Sustained 4kB Random Write

The longer sustained random write test involves enough data to show the effects of the variable SLC cache size on the Crucial P1: performance on a full drive is less than half of what the drive provides when it only contains the 64GB test data. As with the burst random write test, the P1 has a small but clear performance advantage over the Intel 660p.

Sustained 4kB Random Write (Power Efficiency)

When the sustained random write test is run on the Crucial P1 containing only the test data, it delivers excellent power efficiency. When the drive is full and the SLC cache is inadequate, power consumption increases slightly and efficiency is reduced by almost a factor of three.

Even when the random write test is conducted on an otherwise empty Crucial P1, the SLC cache starts to fill up by the time the queue depth reaches 32. When the drive is full and the cache is at its minimum size, random write performance decreases with each phase of the test despite the increasing queue depth. By contrast, the Intel 660p shows signs of its SLC cache filling up after QD4 even when the drive is otherwise empty, but its full-drive performance is steadier.

Plotting the Crucial P1's sustained random write performance and power consumption against the rest of the drives that have completed our 2018 SSD test suite emphasizes the excellent combination of performance and power efficiency enabled by the very effective SLC write cache. The P1 requires more power than many SATA drives, but almost all NVMe drives require more power to deliver the same performance, and the very fastest drives aren't much faster than the peak write speed of the Crucial P1.

Comments

  • DanNeely - Thursday, November 8, 2018 - link

    When DDR2 went mainstream they stopped making DDR1 DIMMs. The DIMMs you could still find for sale a few years later were old ones where you were paying not just the original cost of making them, but the cost of keeping them in a warehouse for several years before you bought them. Individual RAM chips continued to be made for a while longer on legacy processes for embedded use, but because the same old mature processes were still being used there was no scope for newer tech allowing cost cutting, and lower volumes meant a loss of scale savings, so the embedded world also had to pay more until it upgraded to new standards.
  • Oxford Guy - Thursday, November 8, 2018 - link

    The point was:

    "QLC may lead to higher TLC prices, if TLC volume goes down and/or gets positioned as a more premium product as manufacturers try to sell us QLC."

    Stopping production leads to a volume drop, eh?
  • romrunning - Thursday, November 8, 2018 - link

    "There is a low-end NVMe market segment with numerous options, but they are all struggling under the pressure from more competitively priced high-end NVMe SSDs."

    I really wish all NVMe drives kept a higher base performance level. QLC should have died on the vine. I get the technical advances, but I prefer tech advances that increase performance, not ones that are worse than their predecessors. The price savings, when it's actually there, isn't worth the trade-offs.
  • Flunk - Thursday, November 8, 2018 - link

    In a year or two there are going to be QLC drives faster than today's TLC drives. It just takes time to develop a new technology.
  • Oxford Guy - Thursday, November 8, 2018 - link

    Faster to decay, certainly.

    As I understand it, it's impossible, due to physics, to make QLC faster than TLC, just as it's impossible to make TLC faster than MLC. Just as it's impossible to make MLC faster than SLC.

    Workarounds to mask the deficiencies aren't the same thing. The only benefit to going beyond SLC is density, as I understand it.
  • Billy Tallis - Thursday, November 8, 2018 - link

    Other things being equal, MLC is faster than TLC and so on. But NAND flash memory has been evolving in ways other than changing the number of bits stored per cell. Micron's 64L TLC is faster than their 32L MLC, not just denser and cheaper. I don't think their 96L or 128L QLC will end up being faster than 64L TLC, but I do think it will be faster than their 32L or 16nm planar TLC. (There are some ways in which increased layer count can hurt performance, but in general those effects have been offset by other performance increases.)
  • Oxford Guy - Thursday, November 8, 2018 - link

    "Other things being equal, MLC is faster than TLC and so on"

    So, other than density, there is no benefit to going beyond SLC, correct?
  • Billy Tallis - Thursday, November 8, 2018 - link

    Pretty much. If you can afford to pay for SLC and a controller with enough channels and chip enable lines, then you could have a very nice SSD for a very unreasonable price. When you're constrained to a SATA interface there's no reason not to store at least three bits per cell, and even for enterprise NVMe SSDs there are only a few workloads where the higher performance of SLC is cost-effective.
  • Great_Scott - Monday, November 12, 2018 - link

    They should drop the SLC emulation and just sell the drive as an SLC drive. Sure, there may be some performance left on the table due to the limits of the NVMe interface, but the longevity would be hugely attractive to some users.

    They'd make more money too, since they could better justify higher costs that way. In fact, with modern Flash they might be able to get much the same benefit from MLC organization and have roughly half the drive space instead of 25%.
  • Lolimaster - Friday, November 9, 2018 - link

    Don't mistake better algorithms for the simulated SLC cache and DRAM for actual "performance"; start crushing their simulated cache and the TLC goes to trash.
