Original Link: http://www.anandtech.com/show/8147/the-intel-ssd-dc-p3700-review-part-2-nvme-on-client-workloads



Last week we reviewed Intel's first NVMe drive: the DC P3700. Based on a modified version of the controller in Intel's SSD DC S3700/S3500, the P3700 moves to an 18-channel design, drops internal latencies and sheds SATA for a native PCIe interface. The result is an extremely high performance enterprise SSD that delivers a combination of high bandwidth and very low latencies, across a wide span of queue depths.

Although Intel's SSD DC P3700 is clearly targeted at the enterprise, the drive will be priced quite aggressively at $3/GB. Furthermore, Intel will be using the same controller and firmware architecture in two other, lower cost derivatives (P3500/P3600). In light of Intel's positioning of the P3xxx family, a number of you asked for us to run the drive through our standard client SSD workload. We didn't have the time to do that before Computex, but it was the first thing I did upon my return. If you aren't familiar with the P3700 I'd recommend reading the initial review, but otherwise let's look at how it performs as a client drive.

Performance Consistency

Performance consistency tells us a lot about the architecture of these SSDs and how they handle internal defragmentation. The reason we don’t have consistent IO latency with SSDs is that, inevitably, all controllers have to do some amount of defragmentation or garbage collection in order to continue operating at high speeds. When and how an SSD decides to run its defrag or cleanup routines directly impacts the user experience, as inconsistent performance results in application slowdowns.

To test IO consistency, we fill a secure erased SSD with sequential data to ensure that all user accessible LBAs have data associated with them. Next we kick off a 4KB random write workload across all LBAs at a queue depth of 32 using incompressible data. The test is run for just over half an hour and we record instantaneous IOPS every second.
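As a rough illustration of the "instantaneous IOPS every second" measurement, here is a minimal Python sketch that buckets IO completion timestamps into one-second bins. The function name and the sample timestamps are hypothetical; this is not the actual tooling used for the test, just one way the per-second series could be derived from a log of completions.

```python
from collections import Counter

def iops_per_second(completion_times_s, duration_s):
    """Count completed IOs in each one-second bucket of the run.

    completion_times_s: completion timestamps in seconds since test start.
    duration_s: length of the run in whole seconds.
    """
    buckets = Counter(int(t) for t in completion_times_s if 0 <= t < duration_s)
    # Seconds with no completions are reported as 0 IOPS rather than skipped.
    return [buckets.get(second, 0) for second in range(duration_s)]

# Hypothetical completion timestamps, not measured data.
sample = [0.10, 0.50, 0.90, 1.20, 1.80, 3.05]
print(iops_per_second(sample, 4))  # [3, 2, 0, 1]
```

Plotting that list against time gives the kind of scatter shown in the consistency graphs, where dips correspond to seconds in which the drive completed fewer IOs.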

We are also testing drives with added over-provisioning by limiting the LBA range. This gives us a look into the drive’s behavior with varying levels of empty space, which is frankly a more realistic approach for client workloads.
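Limiting the LBA range boils down to simple arithmetic. The sketch below assumes the restriction is expressed as a percentage of the user-accessible LBAs that the workload is not allowed to touch; how that maps onto a headline spare area figure depends on the drive's built-in over-provisioning, which is not modeled here. The function name and the example LBA count are hypothetical.

```python
def restricted_lba_span(user_lbas, extra_spare_percent):
    """Return how many LBAs the workload may touch after reserving spare area.

    user_lbas: number of user-accessible LBAs on the drive.
    extra_spare_percent: share of those LBAs left untouched (0-99).
    """
    if not 0 <= extra_spare_percent < 100:
        raise ValueError("extra_spare_percent must be in the range 0-99")
    # Integer math avoids floating point rounding on large LBA counts.
    return user_lbas * (100 - extra_spare_percent) // 100

# Hypothetical drive with 1,000,000 user LBAs, leaving 25% untouched.
print(restricted_lba_span(1_000_000, 25))  # 750000
```

The random write workload is then confined to LBAs 0 through the returned count minus one, leaving the rest of the drive free for the controller to use as spare area.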

Each of the three graphs has its own purpose. The first shows the whole duration of the test on a log scale. The second and third zoom in on the beginning of steady-state operation (t=1400s) but on different scales: the second uses a log scale for easy comparison, whereas the third uses a linear scale to better visualize differences between drives.

For a more detailed description of the test and why performance consistency matters, read our original Intel SSD DC S3700 article.

[Graph: IO consistency over the full test duration, log scale. Drives: Intel SSD DC P3700, Intel SSD DC S3700, Samsung SSD 840 Pro, SanDisk Extreme II, Samsung SSD XP941. Configurations: Default, 25% Spare Area]

In our enterprise P3700 review we looked at IO consistency during a multi-hour run of a 4KB random write test at a queue depth of 128. The P3700 did quite well in that test, but the results weren't exactly comparable to what we've run for the past 18+ months. Here I ran the same QD32 test on the P3700, and the results are even better than the S3700's. Keep in mind that the scales aren't comparable between the two drives (the P3700's higher performance drives the scale up to 1M IOPS), but the P3700 shows only a very small drop in performance once the drive is out of spare area.

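[Graph: IO consistency at the beginning of steady-state operation, log scale. Drives: Intel SSD DC P3700, Intel SSD DC S3700, Samsung SSD 840 Pro, SanDisk Extreme II, Samsung SSD XP941. Configurations: Default, 25% Spare Area]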

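[Graph: IO consistency at the beginning of steady-state operation, linear scale. Drives: Intel SSD DC P3700, Intel SSD DC S3700, Samsung SSD 840 Pro, SanDisk Extreme II, Samsung SSD XP941. Configurations: Default, 25% Spare Area]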

There's definitely been some tweaking to the S3700's controller/firmware, as the P3700 shows a much longer period of stable performance before there's a drop and recovery.



AnandTech Storage Bench 2013

Our Storage Bench 2013 focuses on worst-case multitasking and IO consistency. Similar to our earlier Storage Benches, the test is still application trace based: we record all IO requests made to a test system, play them back on the drive we're testing, and run statistical analysis on the drive's responses. There are 49.8 million IO operations in total with 1583.0GB of reads and 875.6GB of writes. I'm not including the full description of the test for better readability, so make sure to read our Storage Bench 2013 introduction for the full details.

AnandTech Storage Bench 2013 - The Destroyer
Workload | Description | Applications Used
Photo Sync/Editing | Import images, edit, export | Adobe Photoshop CS6, Adobe Lightroom 4, Dropbox
Gaming | Download/install games, play games | Steam, Deus Ex, Skyrim, Starcraft 2, BioShock Infinite
Virtualization | Run/manage VM, use general apps inside VM | VirtualBox
General Productivity | Browse the web, manage local email, copy files, encrypt/decrypt files, backup system, download content, virus/malware scan | Chrome, IE10, Outlook, Windows 8, AxCrypt, uTorrent, AdAware
Video Playback | Copy and watch movies | Windows 8
Application Development | Compile projects, check out code, download code samples | Visual Studio 2012

We are reporting two primary metrics with the Destroyer: average data rate in MB/s and average service time in microseconds. The former gives you an idea of the throughput of the drive during the time that it was running the test workload, which can be a very good indication of overall performance. What average data rate doesn't capture well is the response time of very bursty (read: high queue depth) IO. By reporting average service time we heavily weight latency for queued IOs. You'll note that this is a metric we've been reporting in our enterprise benchmarks for a while now. With the client tests maturing, the time was right for a little convergence.
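To make the difference between the two metrics concrete, here is a small Python sketch that derives both from a list of IO records. The `IoRecord` type, its field names and the sample values are all hypothetical, and the sketch makes two simplifying assumptions: data rate is total bytes divided by the time from first issue to last completion, and service time is measured from issue to completion, so time spent waiting in the queue counts. The actual trace playback tool may define busy time differently.

```python
from dataclasses import dataclass

@dataclass
class IoRecord:
    issue_us: float      # time the IO was issued, in microseconds
    complete_us: float   # time the IO completed, in microseconds
    size_bytes: int      # transfer size in bytes

def average_data_rate_mbps(records):
    """Total bytes moved divided by elapsed time (bytes per microsecond equals MB/s)."""
    start = min(r.issue_us for r in records)
    end = max(r.complete_us for r in records)
    total_bytes = sum(r.size_bytes for r in records)
    return total_bytes / (end - start)

def average_service_time_us(records):
    """Mean issue-to-completion time, which includes time spent queued."""
    return sum(r.complete_us - r.issue_us for r in records) / len(records)

# Hypothetical burst: two IOs issued together, a third issued later.
records = [
    IoRecord(issue_us=0, complete_us=100, size_bytes=4096),
    IoRecord(issue_us=0, complete_us=300, size_bytes=4096),
    IoRecord(issue_us=200, complete_us=400, size_bytes=8192),
]
print(average_data_rate_mbps(records))   # 40.96
print(average_service_time_us(records))  # 200.0
```

Note how the second IO, stuck behind the first, pulls the average service time up without changing the data rate at all. That is the behavior the service time metric is meant to expose.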

AnandTech Storage Bench 2013 - The Destroyer (Data Rate)

The P3700 takes the performance crown away from Samsung's XP941. Granted we are talking about a much larger and more expensive drive, but if you're after the absolute best performance for a workstation or high-end client, the P3700 is without equal.

AnandTech Storage Bench 2013 - The Destroyer (Service Time)

In our initial P3700 review we talked about the impact of NVMe and a lower overhead interface stack on IO latency - we see the benefits of that here in our look at average service times.

AnandTech Storage Bench 2011

Back in 2011 (which seems like so long ago now!), we introduced our AnandTech Storage Bench, a suite of benchmarks that took traces of real OS/application usage and played them back in a repeatable manner. The MOASB, officially called AnandTech Storage Bench 2011 - Heavy Workload, mainly focuses on peak IO performance and basic garbage collection routines. There is a lot of downloading and application installing that happens during the course of this test. Our thinking was that it's during application installs, file copies, downloading and multitasking with all of this that you can really notice performance differences between drives. The full description of the Heavy test can be found here, while the Light workload details are here.

Heavy Workload 2011 - Average Data Rate

The XP941 remains the king in our 2011 heavy test. I was pretty surprised to see the P3700 lose its first place position here, but it's still competitive.

Light Workload 2011 - Average Data Rate

The situation reverses back to normal when we look at the light workload.



Random Read/Write Speed

The four corners of SSD performance are as follows: random read, random write, sequential read and sequential write speed. Random accesses are generally small in size, while sequential accesses tend to be larger and thus we have the four Iometer tests we use in all of our reviews.

Our first test writes 4KB in a completely random pattern over an 8GB space of the drive to simulate the sort of random access that you'd see on an OS drive (even this is more stressful than a normal desktop user would see). We perform three concurrent IOs and run the test for 3 minutes. The results reported are in average MB/s over the entire time.
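For readers who want a feel for what this access pattern looks like, the Python sketch below generates 4KB-aligned random offsets within an 8GB span and converts an IOPS figure into MB/s. It is only an illustration, not Iometer's implementation. The function names are hypothetical, 8GB is assumed to mean 8 x 1024^3 bytes, and MB/s is assumed to mean 10^6 bytes per second.

```python
import random

IO_SIZE = 4 * 1024        # 4KB transfers
SPAN_BYTES = 8 * 1024**3  # assumed size of the 8GB test region

def random_aligned_offsets(count, span_bytes=SPAN_BYTES, io_size=IO_SIZE, seed=0):
    """Pick random byte offsets that are multiples of io_size within the span."""
    rng = random.Random(seed)
    slots = span_bytes // io_size  # number of aligned 4KB positions in the span
    return [rng.randrange(slots) * io_size for _ in range(count)]

def iops_to_mbps(iops, io_size=IO_SIZE):
    """Convert an IO rate into MB/s for a fixed transfer size."""
    return iops * io_size / 1e6

offsets = random_aligned_offsets(5)
print(all(off % IO_SIZE == 0 for off in offsets))  # True
print(iops_to_mbps(100_000))                       # 409.6
```

The conversion is handy when comparing these charts, which report MB/s, against spec sheets that quote 4KB random performance in IOPS.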

Desktop Iometer - 4KB Random Write (4K Aligned) - 8GB LBA Space

Our enterprise look at the P3700 focused on steady state 4KB random write performance, but surprisingly enough our short burst/8GB LBA space testing puts the P3700 at a very similar performance level. Here the P3700 is more than twice as fast as the closest SATA competitor, which is amazing despite the low queue depth of our test. I also included the old X25-M G2 to show just how far we've come - the P3700 is nearly 15x the speed of Intel's first generation MLC SSD controller.

Desktop Iometer - 4KB Random Write (8GB LBA Space QD=32)

At a higher queue depth the Z-Drive R4 is able to catch up to the P3700, but being able to deliver excellent random IO performance even at low queue depths is a staple of a good client drive.

Desktop Iometer - 4KB Random Read (4K Aligned)

Random read performance is better than anything else here, but there's a limit to how much parallelism you can extract from a low queue depth random read workload.

Sequential Read/Write Speed

To measure sequential performance we run a 1 minute long 128KB sequential test over the entire span of the drive at a queue depth of 1. The results reported are in average MB/s over the entire test length.
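At a queue depth of 1 only one IO is ever in flight, so throughput is simply transfer size divided by per-IO latency. The Python sketch below applies that relationship (a direct use of Little's law, generalized to other queue depths). The function name and the latency figure are hypothetical and chosen for illustration; they are not measurements from any drive in this review.

```python
def throughput_mbps(transfer_bytes, latency_us, queue_depth=1):
    """Estimate throughput when queue_depth IOs are kept in flight.

    Bytes per microsecond is numerically equal to MB/s (10^6 bytes per second).
    """
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    return queue_depth * transfer_bytes / latency_us

# Hypothetical 100 microsecond completion time for a 128KB transfer at QD1.
print(throughput_mbps(128 * 1024, 100))  # 1310.72
```

This is why low queue depth results are so sensitive to latency: with nothing else queued up, every microsecond shaved off a single IO shows up directly in the MB/s figure.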

Desktop Iometer - 128KB Sequential Read (4K Aligned)

Once again we see the P3700 does extremely well at low queue depths; here its sequential read performance is substantially better than anything else.

Desktop Iometer - 128KB Sequential Write (4K Aligned)

Sequential writes are even more impressive. We almost never see this sort of performance at a queue depth of 1. The P3700's 18-channel controller and firmware do a good job of splitting up write requests across as many parallel die as possible. Once again comparing the P3700 to the old X25-M G2 we see 15x the performance in 6 years.

AS-SSD Incompressible Sequential Read/Write Performance

The AS-SSD sequential benchmark uses incompressible data for all of its transfers. The result is a pretty big reduction in sequential write speed on SandForce based controllers. At a higher queue depth the P3700's performance scales even further. It used to only be possible to see these numbers on PCIe SSDs that leveraged multiple controllers.

Incompressible Sequential Read Performance - AS-SSD

Incompressible Sequential Write Performance - AS-SSD



Performance vs. Transfer Size

ATTO is a useful tool for quickly benchmarking performance across various transfer sizes. You can get the complete data set in Bench. The P3700 does a good job of scaling performance, although Samsung definitely holds an advantage at some of the smaller transfer sizes when it comes to reads.

The story changes a bit if we look at sequential writes:

Note the ultra low performance at really small transfer sizes (512B - 2KB). To better showcase what I was seeing, I cropped out the larger transfers and just focused on the first few datapoints:

The P3700's performance is really low when it comes to ultra small sequential write transfers. Once you hit 4KB the P3700's performance skyrockets, but up until that point it's substantially slower than even a high end SATA drive. As very few workloads actually care about performance down here I suspect it's something that Intel never optimized for.



Final Words

In the past, high-performance enterprise PCIe drives didn't do all that well in our client test suite. Intel's SSD DC P3700 on the other hand does remarkably well, thanks in no small part to its excellent performance at low queue depths. A continued focus on IO consistency and performance recovery also helps tremendously. In our heaviest client workload (our 2013/Destroyer benchmark), the P3700 takes the crown as the fastest SSD we've ever tested.

At $3/GB, the P3700 is a clear fit for enterprise workloads, but I can see that price being a bit too steep for even a high end client PC. The real question is how close Intel's cheaper, lower endurance NVMe drives can get to the P3700's performance. The P3500 and P3600 will be available at much lower price points, and may be able to deliver a good portion of the P3700's performance.

Intel won't be shipping the rest of its lineup for a little while longer, but as soon as we get those drives in house we'll provide an update.
