A Note About Drivers

The Intel SSD 750, the Samsung 950 Pro and the OCZ RD400 were all reviewed with the NVMe drivers supplied by the SSD vendors. In the past, vendors have sometimes cited performance as an advantage of using their NVMe driver over the one built in to Windows, but the primary reason has been that Microsoft's driver implements a limited feature set. The driver that was made available as an update to add NVMe support to Windows 7 SP1 and Windows Server 2008 R2 SP1 did not include the necessary interfaces for updating SSD firmware, and even on Windows 8.1 and later the vendor-specific management tools require their own driver for performing tasks like a secure erase.

Samsung's NVMe driver for the 960 Pro was not ready in time for this review. They are planning to release it in mid-November in conjunction with their Magician 5.0 utility. The Samsung NVMe driver will be required to support Magician 5.0's new "Magic Vault" secure archive/backup feature and the new secure file erase feature.

In the meantime, rather than try to hack Samsung's NVMe driver for the 950 Pro to work with the 960 Pro, this review relies on the Microsoft NVMe driver built in to Windows 8.1. While most SSD vendors (especially the smaller ones) now say that Microsoft's NVMe driver offers adequate performance and that there is no need for a custom driver to get full performance, there are some pitfalls.

Windows provides two settings for drive write caching policy. By default, write caching is enabled on internal drives and there is an unselected option to turn off write cache buffer flushing. Both options have warnings attached about the possibility of data loss in the event of a power failure. It is normal for SSDs to cache and combine writes rather than immediately send all written data straight to the flash, and this is necessary to overcome the fact that NAND flash write operations are inherently much slower than read operations. Without write caching on the SSD, we would never see good random write performance, let alone random write performance that exceeds random read performance.
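
To make the benefit of combining writes concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not a description of any real controller: it assumes four 4 KiB host writes fit in one NAND page, and it assumes that every flush forces the drive to program whatever partially filled page it is holding. The name `page_programs` and the constant `SECTORS_PER_PAGE` are made up for this illustration.

```python
SECTORS_PER_PAGE = 4  # assumption: a 16 KiB NAND page holds four 4 KiB host writes


def page_programs(num_writes, flush_every=None):
    """Count NAND page programs needed for `num_writes` small host writes.

    flush_every=None models a drive that is free to batch writes in its cache.
    flush_every=N models a host that forces a flush after every N writes.
    """
    programs = 0
    buffered = 0  # host writes currently waiting in the drive's cache
    for i in range(1, num_writes + 1):
        buffered += 1
        if buffered == SECTORS_PER_PAGE:
            # A full page has accumulated: program it in one operation.
            programs += 1
            buffered = 0
        if flush_every and i % flush_every == 0 and buffered:
            # A flush forces out a partially filled page (toy assumption).
            programs += 1
            buffered = 0
    if buffered:
        programs += 1  # write out whatever is left at the end
    return programs


if __name__ == "__main__":
    print(page_programs(1000))                 # drive batches freely
    print(page_programs(1000, flush_every=1))  # flush after every write
```

In this model, 1000 small writes cost 250 page programs when the drive may batch them and 1000 page programs when each one is flushed individually. Real drives are far more complicated, but the direction of the effect is the same.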

The default write caching policy settings work fine for SATA SSDs. This is not the case for NVMe SSDs when using Microsoft's driver. In its default configuration, Microsoft's NVMe driver is extremely conservative about write caching, leading to very poor performance on some tests. Checking the second box (turning off write cache buffer flushing) gives the expected performance, while leaving it unchecked on a high-end NVMe drive can lead to worse performance than a low-end SATA drive. Normally I would not review a drive with an obscure setting like this changed, especially since it can increase the risk of data loss, but Microsoft's default is clearly broken and not in line with industry-standard practice. The 960 Pro was therefore benchmarked with the second box checked, and a more thorough comparison of how NVMe drivers and operating system versions affect performance will be coming in the future.
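
For readers who want a rough feel for how much explicit flushing costs on their own system, the following Python sketch times a series of small writes with and without a flush request after each one. It is not the benchmark used in this review. The function name `timed_writes` and its parameters are hypothetical, and the file should be placed on the drive of interest. Python documents `os.fsync` on Windows as using the C runtime's `_commit()`; whether that request reaches the drive as a flush command depends on the driver and on the caching policy settings discussed above.

```python
import os
import tempfile
import time


def timed_writes(path, count=200, size=4096, flush_each=False):
    """Write `count` blocks of `size` bytes to `path` and return elapsed seconds.

    With flush_each=True, every write is followed by os.fsync(), which asks
    the operating system to push the data down to the storage device.
    """
    block = b"\0" * size
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:  # unbuffered at the Python level
        for _ in range(count):
            f.write(block)
            if flush_each:
                os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    # The temporary directory may not be on the drive you care about;
    # substitute a path on that drive for a meaningful comparison.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.bin")
        plain = timed_writes(path, flush_each=False)
        flushed = timed_writes(path, flush_each=True)
        print(f"no flush:   {plain:.4f} s")
        print(f"with flush: {flushed:.4f} s")
```

The absolute numbers from a quick test like this are not comparable to the results in this review, but a large gap between the two runs hints at how expensive flushes are with a given drive, driver and policy setting.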

72 Comments

  • emn13 - Wednesday, October 19, 2016 - link

    Especially since NAND hasn't magically gotten lots faster after the SATA->NVMe transition. If SATA is fast enough to saturate the underlying NAND+controller combo when they must actually write to disk, then NVMe simply looks unnecessarily expensive (if you look at writes only). Since the fast NVMe drives all have ram caches, it's hard to detect whether data is properly being written.

    Perhaps windows is doing something odd here, but it's definitely fishy.
  • jhoff80 - Tuesday, October 18, 2016 - link

    This is probably a stupid question because I've been changing that setting for years on SSDs without even thinking about it and you clearly know more about this than I do, but does the use of a drive in a laptop (e.g. battery-powered) or with a UPS for the system negate this risk anyway? That was always my impression, but it could very much be wrong.
  • shodanshok - Tuesday, October 18, 2016 - link

    Having a battery, laptops are inherently safer than desktops against power loss. However, a bad (or missing) battery and/or a failing cable/connector can expose the disks to the very same unprotected power-loss scenario.
  • Dr. Krunk - Sunday, October 23, 2016 - link

    What happens if you accidentally press the battery release button and it pops out just enough to lose connection?
  • woggs - Tuesday, October 18, 2016 - link

    I would love to see Anandtech do a deep dive into this very topic. It's important. I've heard that Windows and other apps do excessive cache flushing when enabled and that's also a problem. I've also heard Intel SSDs completely ignore the cache flush command and simply implement full power loss protection. Batching writes into ever larger pieces is a fact of SSD life and it needs to be done right.
  • voicequal - Tuesday, October 18, 2016 - link

    Agreed. Last year I traced slow disk i/o on a new Surface Pro 4 with 256GB Toshiba XG3 NVMe to the write-cache buffer flushing, so I checked the box to turn it off. Then in July, another driver bug caused the Surface Pro 4 to frequently lock up and require a forced power off. Within a few weeks I had a corrupted Windows profile and system file issues that took several DISM runs to clean up. Don't know for sure if my problem resulted from the disabled buffer flushing, but I'm now hesitant to reenable the setting.

    It would be good to understand what this setting does with respect to NVMe driver operation, and interesting to measure the impact / data loss when power loss does occur.
  • Kristian Vättö - Tuesday, October 18, 2016 - link

    I think you are really exaggerating the problem. DRAM cache has been used in storage well before SSDs became mainstream. Yes, HDDs have DRAM cache too and it's used for the same purpose: to cache writes. I would argue that HDDs are even more vulnerable because data sits in the cache for a longer time due to the much higher latency of platter-based storage.

    Because of that, all consumer friendly file systems have resilience against small data losses. In the end, only a few MB of user data is cached anyway, so it's not like we talk about a major data loss. It's small enough not to impact user experience, and the file system can recover itself in case there was metadata in the lost cache.

    If this was a severe issue, there would have been a fix years ago. For client-grade products there is simply no need because 100% data protection and uptime are not needed.
  • shodanshok - Tuesday, October 18, 2016 - link

    The problem is not the cache, but rather ignoring cache flush requests. I know DRAM caches have been used for decades, and when disks lied about flushing them (in the good old IDE days), catastrophic filesystem failures were much more common (see XFS or ZFS FAQs / mailing lists for some reference, or even SATA command specifications).

    I'm not exaggerating anything: it is a real problem, greatly debated in the Linux community in the past. From https://lwn.net/Articles/283161/
    "So the potential for corruption is always there; in fact, Chris Mason has a torture-test program which can make it happen fairly reliably. There can be no doubt that running without barriers is less safe than using them"

    This quote is ext3-specific, but other journaled filesystems behave in a very similar manner. And hey - the very same Windows check box warns you about the risks related to disabling flushes.

    You should really ask Microsoft what these check boxes do in its NVMe driver. Anyway, suggesting that people disable cache flushes is bad advice (unless you don't use your PC for important things).
  • Samus - Wednesday, October 19, 2016 - link

    I don't think people understand how cache flushing works at the hardware level.

    If the operating system has buffer flushing disabled, it will never tell the drive to dump the cache, for example, when an operation is complete. In this event, a drive will hold onto whatever data is in cache until the cache fills up, then the drive firmware will trigger the controller to write the cache to disk.

    Since OSes randomly write data to disk all the time, bits here and there go into cache to prevent disk thrashing/NAND wear, all determined in hardware. This has nothing to do with pooled or paged data at the OS level or RAM data buffers.

    Long story short, it's moronic to disable write buffer flushing, where the OS will command the drive after IO operations (like a file copy or write) complete, ensuring the cache is clear as the system enters idle. This happens hundreds if not thousands of times per minute and it's important to fundamentally protect the data in cache. With buffer flushing disabled the cache will ALWAYS have something in it until you shutdown - which is the only time (other than suspend) a buffer flush command will be sent.
  • Billy Tallis - Wednesday, October 19, 2016 - link

    "With buffer flushing disabled the cache will ALWAYS have something in it until you shutdown - which is the only time (other than suspend) a buffer flush command will be sent."

    I expect at least some drives flush their internal caches before entering any power saving mode. I've occasionally seen the power meter spike before a drive actually drops down to its idle power level, and I probably would have seen a lot more such spikes if the meter were sampling more than once per second.