CPU Utilization

With the move to NVMe, not only do we get lower latency IOs, but we should also see lower CPU utilization thanks to the lower overhead protocol. To quantify the effects, I used Task Manager to monitor CPU utilization across all four cores in a Core i7 4770K system (with HT disabled). Note that these values reflect not just the impact of the storage device but also the CPU time required to generate the 4KB random read (QD128) workload. I created four QD32 threads (QD128 in aggregate) so that all cores are taxed and we're not limited by a single CPU core.

Total System CPU Utilization (4 x 3.5GHz Haswell Cores)

To really put these values in perspective, though, we need to take performance into account as well. The chart below divides total IOPS during this test by total CPU usage to give us IOPS per % CPU usage:

Platform Efficiency: IOPS per % CPU Utilization

Here all of the PCIe solutions do pretty well. The SATA-based S3700 is put to shame, but even the Intel SSD 910 does well here.
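
The efficiency metric in the chart above is a simple ratio. As a minimal sketch in Python, with a function name and sample figures that are illustrative assumptions rather than results from this test:

```python
def iops_per_cpu_percent(total_iops: float, cpu_percent: float) -> float:
    """Return IOPS delivered per 1% of total system CPU utilization."""
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return total_iops / cpu_percent


# Hypothetical numbers for illustration only, not measurements from this review.
example = iops_per_cpu_percent(total_iops=400_000, cpu_percent=20.0)
print(f"{example:.0f} IOPS per % CPU")
```

A drive that delivers the same IOPS at half the CPU utilization scores twice as high on this metric, which is why a slower but lighter-weight drive can still rank well.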

For the next charts I'm removing Iometer from the CPU usage calculation and instead looking at the CPU usage from the rest of the software stack:

Storage Subsystem CPU Utilization (4 x 3.5GHz Haswell Cores)

 

Platform Efficiency: IOPS per % Storage CPU Utilization

Here the 910 looks very good. It's obviously a much older (and slower) drive, but it's remarkably CPU efficient. Micron's P420m doesn't look quite as good, and the SATA S3700 is certainly far less efficient when it comes to IOPS/CPU.
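
The storage-only variant of the metric can be sketched the same way. This assumes the storage-stack figure is obtained by subtracting Iometer's own CPU utilization from the system total; the article does not spell out the exact method, and the function names and numbers below are hypothetical:

```python
def storage_cpu_percent(total_cpu_percent: float, iometer_cpu_percent: float) -> float:
    """CPU utilization left after removing the load generator's share (assumed method)."""
    storage = total_cpu_percent - iometer_cpu_percent
    if storage <= 0:
        raise ValueError("Iometer share must be smaller than total CPU utilization")
    return storage


def iops_per_storage_cpu_percent(total_iops: float,
                                 total_cpu_percent: float,
                                 iometer_cpu_percent: float) -> float:
    """IOPS delivered per 1% of CPU spent in the rest of the software stack."""
    return total_iops / storage_cpu_percent(total_cpu_percent, iometer_cpu_percent)


# Hypothetical numbers for illustration only, not measurements from this review.
print(iops_per_storage_cpu_percent(400_000, 20.0, 12.0))
```

Because the denominator shrinks once the workload generator is excluded, small differences in driver and protocol overhead show up much more clearly here than in the total-system chart.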


85 Comments


  • andrewaggb - Tuesday, June 3, 2014

    That's really the question, isn't it? I'm skeptical until somebody proves otherwise. Seems like you'd need a BIOS update at a minimum.
  • BeethovensCat - Tuesday, June 3, 2014

    Yes, this would be key. Would be annoying to buy a card and not be able to boot Windows from it. Would it only be possible with Z97-based chipsets or also Z87? Have a relatively new Z87 card. As much as I don't want to change to Apple, one must admit they are better at getting some of these things right. Come on Intel (Asus) - make it possible to boot from one of these on a Z87 motherboard and I will buy one right away!!
  • Taurothar - Tuesday, June 3, 2014

    Honestly, it's up to the motherboard's capabilities. A BIOS update should be possible, but it depends on many things, like how the PCIe lanes are distributed. I wouldn't count on getting the full performance out of a chipset designed before PCIe SSDs. PCIe RAID cards have the controller to boot from built in, but these standalone SSDs mean the chipset or other onboard controller has to be able to recognize them, and that might not be as simple as a BIOS update.
  • morganf - Tuesday, June 3, 2014

    I was disappointed that the 4K QD1 read was no better than 40 MB/sec that can be achieved by SATA / AHCI SSDs like the Samsung 840 Pro.

    FusionIO has been getting twice that (i.e., around 80 MB/sec) for years. I was expecting NVMe to achieve something similar.

    But maybe the 40 MB/sec is an OS driver limitation? Perhaps FusionIO is able to get around that because they have their own driver.
  • boogerlad - Tuesday, June 3, 2014

    Why does the P3500 have such low 4K random write IOPS? Is it merely the worst case/steady-state performance? Is it much lower quality NAND? Is it lack of overprovisioning and not a problem if the drive is not filled to the brim? I've been waiting for a product like this for a very long time. To be honest, I was surprised Intel was the one to deliver. It looks like they checked out of making innovative products looking at their CPU lineup.
  • boogerlad - Tuesday, June 3, 2014

    Then again, I guess as long as 4K QD1 write speeds are the same as the P3700, it doesn't really matter. Many enthusiasts will buy the P3500 and put it under a consumer workload anyway, which rarely has QD > 1.
  • Dangledon - Wednesday, June 4, 2014

    Low random write performance is probably an indirect reflection of TLC. The endurance numbers make this pretty clear. TLC has relatively long P/E dwell times. These times become apparent when garbage collection is triggered by sustained random write workloads. I don't know whether these devices support overprovisioning. Having it might help deal with spiky workloads as long as the "runway" is long enough. Though, frankly, the P3500 was not designed for a high random write workload.
  • Dangledon - Wednesday, June 4, 2014

    My bad. They're using MLC, not TLC. The reserve/spare capacity is 7% on the P3500, which in part accounts for the relatively low endurance. Intel is probably also doing NAND part binning, using the poorest quality parts in the P3500.
  • rob_allshouse - Tuesday, June 3, 2014

    One comment (and I do work for Intel, to be open about it)... but the P3700 does this all in x4 while the P420m does it in x8, so half the PCIe lanes consumed. I didn't see this in the article, and feel like it's very relevant. It also explains the disparity in sequential read performance.
  • mfenn - Tuesday, June 3, 2014

    I find it interesting that this article is presented as an enterprise SSD review and even goes so far as to decry the performance of previous implementations, but does not mention Fusion-io or Virident. We've had 500K IOPS and latencies in the tens of microseconds for years now without Intel or NVMe; those are not the stories here.

    NVMe is not some wonderful advance from a performance point of view, and should not be presented as such. What it is is a path towards the commoditization of relatively high performance PCIe SSDs. That's an incredibly important achievement and should have been the focus of the discussion.

    As it stands, this article follows the Intel marketing tune a little too closely and does not respect the deep market insights that I've come to expect from AnandTech.
