Peak Random Read Performance

For client/consumer SSDs we primarily focus on low queue depth performance for its relevance to interactive workloads. Server workloads are often intense enough to keep a pile of drives busy, so the maximum attainable throughput of enterprise SSDs is actually important. But it usually isn't a good idea to focus solely on throughput while ignoring latency, because somewhere down the line there's always an end user waiting for the server to respond.

In order to characterize the maximum throughput an SSD can reach, we need to test at a range of queue depths. Different drives will reach their full speed at different queue depths, and increasing the queue depth beyond that saturation point may be slightly detrimental to throughput, and will drastically and unnecessarily increase latency. SATA drives can only have 32 pending commands in their queue, and any attempt to benchmark at higher queue depths will just result in commands sitting in the operating system's queues before being issued to the drive. On the other hand, some high-end NVMe SSDs need queue depths well beyond 32 to reach full speed.

Because of the above, we are not going to compare drives at a single fixed queue depth. Instead, each drive was tested at a range of queue depths up to the excessively high QD 512. For each drive, the queue depth with the highest performance was identified. Rather than report that value, we're reporting the throughput, latency, and power efficiency for the lowest queue depth that provides at least 95% of the highest obtainable performance. This often yields much more reasonable latency numbers, and is representative of how a reasonable operating system's IO scheduler should behave. (Our tests have to be run with any such scheduler disabled, or we would not get the queue depths we ask for.)
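The selection rule described above can be sketched in a few lines. This is an illustrative reconstruction, not our actual analysis script; the function name and the sample numbers are hypothetical.

```python
# Hypothetical sketch of the reporting rule: given measured throughput at each
# tested queue depth, report the lowest queue depth that achieves at least 95%
# of the best observed result.

def select_reported_qd(results, threshold=0.95):
    """results: dict mapping queue depth -> throughput (e.g. IOPS)."""
    peak = max(results.values())
    # Walk queue depths in ascending order; stop at the first one that
    # reaches the threshold fraction of peak throughput.
    for qd in sorted(results):
        if results[qd] >= threshold * peak:
            return qd, results[qd]

# Example: a drive that saturates early. QD8 already delivers more than 95%
# of peak, so QD8 is reported even though QD64 technically scored highest.
measurements = {1: 90_000, 2: 170_000, 4: 310_000, 8: 560_000,
                16: 575_000, 32: 578_000, 64: 580_000}
qd, iops = select_reported_qd(measurements)
```

Applied to the example data, the rule reports QD8 rather than QD64, trading a few percent of throughput for much lower latency.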

One extra complication is the choice of how to generate a specified queue depth with software. A single thread can issue multiple I/O requests using asynchronous APIs, but this runs into several problems: if each system call issues one read or write command, then context switch overhead becomes the bottleneck long before a high-end NVMe SSD's abilities are fully taxed. Alternatively, if many operations are batched together for each system call, then the real queue depth will vary significantly and it is harder to get an accurate picture of drive latency. Finally, the current Linux asynchronous IO APIs only work in a narrow range of scenarios.

There is work underway to provide a new general-purpose async IO interface that will enable drastically lower overhead, but until that work lands in stable kernel versions, we're sticking with testing through the synchronous IO system calls that almost all Linux software uses. This means that we test at higher queue depths by using multiple threads, each issuing one read or write request at a time.

Using multiple threads to perform IO gets around the limits of single-core software overhead, and brings an extra advantage for NVMe SSDs: the use of multiple queues per drive. The NVMe drives in this review all support 32 separate IO queues, so we can have 32 threads on separate cores independently issuing IO without any need for synchronization or locking between threads.
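The thread-per-request approach can be illustrated with synchronous `os.pread` calls, one outstanding read per thread. This is a minimal sketch under stated assumptions, not our actual test harness: block size, file layout, and helper names are all illustrative, and a real benchmark would pin threads to cores and use a raw block device.

```python
# Minimal sketch: generate queue depth N by running N threads, each issuing
# one synchronous 4kB read at a time at a random aligned offset.
import os
import random
import threading

BLOCK = 4096
FILE_SIZE = BLOCK * 256  # small illustrative file; a real test targets a device

def worker(fd, n_ops, counter, lock):
    blocks = FILE_SIZE // BLOCK
    for _ in range(n_ops):
        off = random.randrange(blocks) * BLOCK   # 4kB-aligned random offset
        data = os.pread(fd, BLOCK, off)          # one command in flight per thread
        with lock:
            counter[0] += len(data) // BLOCK

def run(path, threads=8, ops_per_thread=100):
    fd = os.open(path, os.O_RDONLY)
    counter, lock = [0], threading.Lock()
    ts = [threading.Thread(target=worker, args=(fd, ops_per_thread, counter, lock))
          for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    os.close(fd)
    return counter[0]   # total 4kB reads completed
```

Because each thread blocks in `pread` until its request completes, the number of threads directly sets the number of commands outstanding at the drive, which is what makes this approach a clean way to control queue depth.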

4kB Random Read

When performing random reads with a high thread count, the Samsung 983 ZET delivers significantly better throughput than any other drive we've tested: about 775k IOPS, which is over 3GB/s and about 33% faster than the Intel Optane SSD DC P4800X. The Optane SSD hits its peak throughput at QD8, while the 983 ZET requires a queue depth of at least 16 to match the Optane SSD's peak, and the peak for the 983 ZET is at QD64.

4kB Random Read (Power Efficiency)
[Chart: Power Efficiency in kIOPS/W; Average Power in W]

At this point, it's no surprise to see the 983 turn in great efficiency scores. It's drawing almost 8W during this test, but that's not particularly high by the standards of enterprise NVMe drives. The TLC-based 983 DCT provides the next-best performance per Watt due to even lower power consumption than the Z-SSDs.

4kB Random Read QoS

The Optane SSD still holds on to a clear advantage in the random read latency scores, with 99.99th percentile latency that is lower than the average read latency of even the Z-SSDs. The two Z-SSDs do provide lower tail latencies than the other flash-based SSDs, several of which require very high thread counts to reach full throughput and thus end up with horrible 99.99th percentile latencies due to contention for CPU cores.

Peak Sequential Read Performance

Since this test consists of many threads each performing IO sequentially but without coordination between threads, there's more work for the SSD controller and less opportunity for pre-fetching than there would be with a single thread reading sequentially across the whole drive. The workload as tested more closely resembles a file server streaming data to several simultaneous users than the creation of a full-disk backup image.
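The multi-stream pattern described above can be sketched as each thread streaming 128kB reads through its own slice of the target, so the drive sees several independent sequential streams rather than one. This is an illustrative sketch with hypothetical names, not the actual workload generator.

```python
# Hypothetical sketch: n_streams threads, each reading its own contiguous
# slice of the file sequentially in 128kB chunks.
import os
import threading

CHUNK = 128 * 1024

def stream(fd, start, length, totals, idx):
    off, end = start, start + length
    while off < end:
        data = os.pread(fd, min(CHUNK, end - off), off)  # sequential within slice
        if not data:
            break
        totals[idx] += len(data)
        off += len(data)

def multi_stream_read(path, n_streams=4):
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    slice_len = size // n_streams
    totals = [0] * n_streams
    ts = [threading.Thread(target=stream,
                           args=(fd, i * slice_len, slice_len, totals, i))
          for i in range(n_streams)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    os.close(fd)
    return sum(totals)  # total bytes read across all streams
```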

128kB Sequential Read

The Memblaze PBlaze5 C900 has the highest peak sequential read speed thanks to its PCIe 3.0 x8 interface. Among the drives with the more common four lane connection, the Samsung 983 ZETs are tied for first place, but they reach that ~3.1GB/s with a slightly lower queue depth than the 983 DCT or PBlaze5 D900. The Optane SSD DC P4800X comes in last place, being limited to just 2.5GB/s for multi-stream sequential reads.

128kB Sequential Read (Power Efficiency)
[Chart: Power Efficiency in MB/s/W; Average Power in W]

The Samsung 983s clearly have the best power efficiency on this sequential read test, but the TLC-based 983 DCT once again uses slightly less power than the 983 ZET, so the Z-SSDs don't quite take first place. The Optane SSD doesn't have the worst efficiency rating, because despite its low performance, it only uses 2W more than the Samsung drives, far less than the Memblaze or Micron drives.

Steady-State Random Write Performance

The hardest task for most enterprise SSDs is to cope with an unending stream of writes. Once all the spare area granted by the high overprovisioning ratios has been used up, the drive has to perform garbage collection while simultaneously continuing to service new write requests, and all while maintaining consistent performance. The next two tests show how the drives hold up after hours of non-stop writes to an already full drive.

4kB Random Write

The Samsung 983 ZETs outperform the TLC-based 983 DCTs for steady-state random writes, but otherwise are outclassed by the larger flash-based SSDs and the Optane SSD, which is almost six times faster than the 960GB 983 ZET. Using Z-NAND clearly helps some with steady-state write performance, but the sheer capacity of the bigger TLC drives helps even more.

4kB Random Write (Power Efficiency)
[Chart: Power Efficiency in kIOPS/W; Average Power in W]

The 983 ZET uses more power than the 983 DCT, but not enough to overcome the performance advantage; the 983 ZET has the best power efficiency among the smaller flash-based SSDs. The 4TB and larger drives outperform the 983 ZET so much that they have significantly better efficiency scores even drawing 2.6x the power. The Intel Optane SSD consumes almost twice the power of the 983 ZET but still offers better power efficiency than any of the flash-based SSDs.

4kB Random Write QoS

The Samsung drives and the Intel Optane SSD all have excellent latency stats for the steady-state random write test. The Intel P4510, Memblaze PBlaze5 and Micron 9100 all have decent average latencies but much worse QoS, with 99.99th percentile latencies of multiple milliseconds. These drives don't require particularly high queue depths to saturate their random write speed, so these QoS issues aren't due to any host-side software overhead.

Steady-State Sequential Write Performance

128kB Sequential Write

As with random writes, the steady-state sequential write performance of the Samsung 983 ZET is not much better than its TLC-based sibling. The only way for a flash-based SSD to handle sustained writes as well as the Optane SSD is to have very high capacity and overprovisioning.

128kB Sequential Write (Power Efficiency)
[Chart: Power Efficiency in MB/s/W; Average Power in W]

The Samsung 983 ZET provides about half the performance per Watt of the Optane SSD during sequential writes. That's still decent compared to many other flash-based NVMe SSDs, but the 983 DCT and Memblaze PBlaze5 are slightly more efficient.
