Performance Consistency

Our performance consistency test explores the extent to which a drive can reliably sustain performance during a long-duration random write test. Specifications for consumer drives typically list peak performance numbers that are only attainable under ideal conditions. Worst-case performance can be drastically different: over the course of a long test, a drive can run out of spare area, have to start performing garbage collection, and sometimes even reach power or thermal limits.

In addition to an overall decline in performance, a long test can reveal patterns in how performance varies on shorter timescales. Some drives exhibit very little variance in performance from second to second, others show massive drops during each garbage collection cycle but otherwise maintain good performance, and still others show constant, wide variance. If a drive periodically slows to hard drive levels of performance, it may feel slow to use even if its overall average performance is very high.

To maximally stress the drive's controller and force it to perform garbage collection and wear leveling, this test conducts 4kB random writes at a queue depth of 32. The drive is filled before the start of the test, and the test runs for one hour. Any spare area is exhausted early in the test, and by the end of the hour even the largest drives with the most overprovisioning will have reached a steady state. We use the last 400 seconds of the test to score the drive on both its steady-state average random write speed and its consistency, defined as average performance divided by the standard deviation.
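
The scoring boils down to simple statistics over per-second throughput samples. As a rough sketch (not the actual test harness; the function name and the samples list are hypothetical), the last-400-second scores could be computed in Python like this:

    import statistics

    def steady_state_scores(iops_per_second):
        """iops_per_second: one IOPS sample per second over the hour-long run."""
        window = iops_per_second[-400:]        # last 400 seconds of the test
        avg = statistics.mean(window)          # steady-state average IOPS
        spread = statistics.pstdev(window)     # standard deviation of the steady state
        consistency = avg / spread if spread else float("inf")  # performance / std dev
        return avg, consistency

    # Hypothetical usage:
    # avg, consistency = steady_state_scores(samples)
    # print(f"{avg:.0f} IOPS average, consistency score {consistency:.1f}")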

Steady-State 4KB Random Write Performance

The enterprise SSD heritage of the Intel SSD 750 continues to shine through as it holds on to the lead for steady-state random write performance, but Samsung has mostly caught up with the 960 Pro. This is a huge change from the 950 Pro, which had steady-state performance that was no better than typical SATA SSDs. A few consumer SSDs have offered great steady-state random write performance—most notably OCZ's drives based on the Indilinx Barefoot 3 controller—but the 960 Pro is the first one to reach the level of the Intel SSD 750.

Steady-State 4KB Random Write Consistency

In addition to mostly closing the performance gap, the 960 Pro has a great consistency score that is almost as good as the Intel SSD 750's. While OCZ's Vector 180 offered remarkably high average performance in its steady state, it was far less consistent than either the Samsung 960 Pro or the Intel SSD 750: the standard deviation of its steady-state performance was more than ten times greater.

IOPS over time (Default / 25% Over-Provisioning)

After the initial period of very high performance, the 960 Pro enters a steady state with very good short-term consistency but gradual long-term variation in performance. This is more similar in character to the behavior of the Intel SSD 750 than to Samsung's earlier SSDs, though it's interesting to note that the 960 Pro is more than twice as fast during the initial phase before transitioning to steady state.

Steady-State IOPS over time (Default / 25% Over-Provisioning)

Focusing on the last 400 seconds of the test shows the 960 Pro's steady state to be essentially flawless, rounding out a full page of what can be considered perfect scores for a consumer drive. This level of performance would even make the 960 Pro a pretty good enterprise SSD, which is usually not the case for drives with consumer-oriented firmware.

Comments

  • Gigaplex - Tuesday, October 18, 2016 - link

    "Because of that, all consumer friendly file systems have resilience against small data losses."

    And for those to work, cache flush requests need to be honored for the journalling to work correctly. Disabling cache flushing will reintroduce the serious corruption issues.
  • emn13 - Wednesday, October 19, 2016 - link

    "100% data protection is not needed": at some level that's obviously true. But it's nice to have *some* guarantees so you know which risks you need to mitigate and which you can ignore.

    Also, NVMe has the potential to make this problem much worse: it's plausible that the underlying NAND+controller cannot outperform SATA alternatives to the degree they appear to, and that to achieve that (marketable) advantage, they need to rely more on buffering and write merging. If so, then you may still be losing only milliseconds of data, but that could mean quite a lot of corruption given how much data an NVMe drive can write in that time. Even though "100%" safe is possibly unnecessary, that would make the NVMe value proposition much worse: not only are such drives much more expensive, they also (in this hypothesis) would be more likely to cause data corruption - I certainly wouldn't buy one given that tradeoff; the performance gains are simply too slim (in almost any normal workload).

    Also, it's not quite true that "all consumer friendly file systems have resilience against small data losses". Journalled filesystems typically only journal metadata, not data, so you may still end up with a bunch of corrupted files. And, critically, the journalling algorithms rely on proper drive flushing! If a drive can lose data that has been flushed (pre-fsync writes), then even a journalled filesystem can (easily!) be corrupted extensively. If anything, journalled filesystems are even more vulnerable to that than plain old FAT, because they rely on clever interactions of multiple (conflicting) sources of truth in the event of a crash, and when the assumptions the FS makes turn out to be invalid, it will (by design) draw incorrect inferences about which data is "real" and which is due to the crash. You can easily lose whole directories (say, user directories) at once like this.
  • HollyDOL - Wednesday, October 19, 2016 - link

    Tbh I consider this whole argument strongly obsolete... if you have close to $1300 spare to buy a 2TB SSD monster, you definitely should have $250-350ish to buy a decent UPS.

    Or, if you run a several thousand USD machine without any, you more than deserve what you get.

    It's the same argument as saying you won't build a double Titan XP monster and power it with a Chinese noname PSU. There are things which are simply a no go.
  • bcronce - Tuesday, October 18, 2016 - link

    As an ex-IT admin who used to manage thousands of computers, I have never seen catastrophic data loss caused by a power outage, and I have seen many outages. What I have seen are hard drives or PSUs dying and recently written data being lost, but never fully committed data.

    That being said, SSDs are a special beast because writing new data often requires moving existing data, and this is dangerous.

    Most modern filesystems since the 90s, except FAT32, were designed to handle unexpected power loss. NTFS was the first FS from MS that pretty much got rid of power loss issues.
  • KAlmquist - Tuesday, October 18, 2016 - link

    The functionality that a file system like NTFS requires to avoid corruption in the case of a power failure is a write barrier. A write barrier is a directive that says that the storage device should perform all writes prior to the write barrier before performing any of the writes issued after the write barrier.

    On a device using flash memory, write barriers should have minimal performance impact. It is not possible to overwrite flash memory, so when an SSD gets a write request, it will allocate a new page (or multiple pages) of flash memory to hold the data being written. After it writes the data, it will update the mapping table to point to the newly written page(s). If an SSD gets a whole bunch of writes, it can perform the data write operations in parallel as long as the pages being written all reside on different flash chips.

    If an SSD gets a bunch of writes separated by write barriers, it can write the data to flash just like it would without the write barriers. The only change is that when a write completes, the SSD cannot update the mapping table to point to the new data until earlier writes have completed.

    This is different from a mechanical hard drive. If you issue a bunch of writes to a mechanical hard drive, the drive will attempt to perform the writes in an order that will minimize seek time and rotational latency. If you place write barriers between the write requests, then the drive will execute the writes in the same order you issued them, resulting in lower throughput.

    Now suppose you are unable to use write barriers for some reason. You can achieve the same effect by issuing commands to flush the disk after every write, but that will prevent the device from executing multiple write commands in parallel. A mechanical hard drive can only execute one write at a time, so cache flushes are a viable alternative to write barriers if you know you are using a mechanical hard drive. But on SSDs, parallel writes are not only possible, they are essential to performance. The write speeds of individual flash chips are slower than hard drive write speeds; the reason that sequential writes on most SSDs are faster than on a hard drive is that the SSD writes to multiple chips in parallel. So if you are talking to an SSD, you do not want to use cache flushes to get the effect of write barriers. (A rough user-space sketch of this tradeoff appears after the comments.)

    I take it from what shodanshok wrote that Microsoft Windows doesn't use write barriers on NVMe devices, giving you the choice of either using cache flushes or risking file system corruption on loss of power. A quick look at the NVMe specification suggests that this is the fault of Intel, not Microsoft. Unless I've missed it, Intel inexplicably omitted write barrier functionality from the specification, forcing Microsoft to use cache flushing as a work-around:

    http://www.nvmexpress.org/wp-content/uploads/NVM_E...

    On SSD devices, write barriers are essentially free. There is no need for a separate write barrier command; the write command could contain a field indicating that the write operation should be preceded by a write barrier. Users shouldn't have to choose between data protection and performance when the correct use of a sensibly designed protocol would give them both without them having to worry about it.
  • Dorkaman - Monday, November 28, 2016 - link

    So this drive has capacitors to help write out anything in the buffer if the power goes out:

    https://youtu.be/nwCzcFvmbX0 skip to 2:00

    23 power-loss capacitors used to keep the SSD's controller running just long enough, in the event of an outage, to flush all pending writes:

    http://www.tomshardware.com/reviews/samsung-845dc-...

    Will the 960 Evo have that? Would this prevent something like this (RAID 0 lost due to power outage):

    https://youtu.be/-Qddrz1o9AQ
  • Nitas - Tuesday, October 18, 2016 - link

    This may be silly of me but why did they use W8.1 instead of 10?
  • Billy Tallis - Tuesday, October 18, 2016 - link

    I'm still on Windows 8.1 because this is still our 2015 SSD testbed and benchmark suite. I am planning to switch to Windows 10 soon, but that will mean that new benchmark results are not directly comparable to our current catalog of results, so I'll have to re-test all the drives I still have on hand, and I'll probably take the opportunity to make a few other adjustments to the test protocol.

    Switching to Windows 10 hasn't been a priority because of the hassle it entails and the fact that it's something of a moving target, but particularly with the direction the NVMe market is headed the Windows version is starting to become an important factor.
  • Nitas - Tuesday, October 18, 2016 - link

    I see, thanks for clearing that up!
  • Samus - Wednesday, October 19, 2016 - link

    Windows 8.1 will have virtually no difference in performance compared to Windows 10 for the purpose of benchmarking SSDs...
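
As a rough user-space illustration of the flush-versus-barrier tradeoff discussed in the comments above (a hedged sketch only; the file path, record size, and function names are made up, and fsync() is used as the portable stand-in for a device cache flush): flushing after every write forces the drive to handle writes one at a time, while batching independent writes and flushing once lets the drive work in parallel, at the cost of not enforcing ordering between the individual writes the way a true barrier would.

    import os

    RECORD = b"x" * 4096  # hypothetical 4kB record

    def flush_per_write(path, records):
        # Worst case for an SSD: a full cache flush after every write prevents
        # the drive from spreading the writes across its flash chips in parallel.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            for rec in records:
                os.write(fd, rec)
                os.fsync(fd)  # flush the drive's cache after each write
        finally:
            os.close(fd)

    def batched_writes_single_flush(path, records):
        # Rough approximation of what a write barrier would allow: independent
        # writes are issued back to back (the drive may reorder and parallelize
        # them), and a single flush pays the durability cost at the end. Unlike
        # a real barrier, nothing here constrains the order in which the
        # individual writes reach the flash.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            for rec in records:
                os.write(fd, rec)
            os.fsync(fd)  # one flush for the whole batch
        finally:
            os.close(fd)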
