Performance Consistency

In our Intel SSD DC S3700 review we introduced a new method of characterizing performance: looking at the latency of individual operations over time. The S3700 promised a level of performance consistency that was unmatched in the industry, and as a result it needed some additional testing to show that. The reason we don't get consistent IO latency from SSDs is that all controllers inevitably have to do some amount of defragmentation or garbage collection in order to continue operating at high speeds. When and how an SSD decides to run its defrag and cleanup routines directly impacts the user experience. Frequent (borderline aggressive) cleanup generally results in more stable performance, while deferring it can result in higher peak performance at the expense of much lower worst-case performance. The graphs below tell us a lot about the architecture of these SSDs and how they handle internal defragmentation.

To generate the data below I took a freshly secure erased SSD and filled it with sequential data. This ensures that all user accessible LBAs have data associated with them. Next I kicked off a 4KB random write workload at a queue depth of 32 using incompressible data. I ran the test for just over half an hour, nowhere near as long as our steady state tests, but long enough to get a good look at drive behavior once all spare area has been filled.
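
A minimal sketch of what this workload looks like, assuming raw block device access on a Linux box. This is purely illustrative Python, not the tool used for the review; the device path, log file name, and the 32-writer-thread approximation of QD32 are all assumptions, and the secure erase and sequential fill steps are omitted:

```python
import os, mmap, random, threading, time

DEV = "/dev/sdX"        # hypothetical target device; everything on it is destroyed
BLOCK = 4096            # 4KB transfer size
QD = 32                 # approximate a queue depth of 32 with 32 concurrent writers
RUN_SECONDS = 2000      # just over half an hour, matching the test period

fd = os.open(DEV, os.O_WRONLY | os.O_DIRECT)
dev_bytes = os.lseek(fd, 0, os.SEEK_END)

completed = 0
lock = threading.Lock()
stop = False

def writer():
    global completed
    buf = mmap.mmap(-1, BLOCK)      # anonymous mmap is page-aligned, as O_DIRECT requires
    while not stop:
        buf[:] = os.urandom(BLOCK)  # incompressible (random) data
        offset = random.randrange(dev_bytes // BLOCK) * BLOCK
        os.pwrite(fd, buf, offset)
        with lock:
            completed += 1

threads = [threading.Thread(target=writer, daemon=True) for _ in range(QD)]
for t in threads:
    t.start()

# Log instantaneous IOPS once per second for the scatter plots
with open("iops_log.txt", "w") as log:
    last = 0
    for second in range(RUN_SECONDS):
        time.sleep(1)
        with lock:
            now = completed
        log.write(f"{second} {now - last}\n")
        last = now

stop = True
os.close(fd)
```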

I recorded instantaneous IOPS every second for the duration of the test. I then plotted IOPS vs. time and generated the scatter plots below. Each set of graphs uses the same scale for all drives. The first two sets use a log scale for easy comparison, while the last set of graphs uses a linear scale that tops out at 40K IOPS for better visualization of differences between drives.
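
A correspondingly minimal sketch of turning such a per-second log into scatter plots; the log file name and "time IOPS" column layout carry over from the sketch above and are assumptions, not the review's actual tooling:

```python
import matplotlib.pyplot as plt

seconds, iops = [], []
with open("iops_log.txt") as f:
    for line in f:
        t, n = line.split()
        seconds.append(int(t))
        iops.append(int(n))

fig, (ax_log, ax_lin) = plt.subplots(1, 2, figsize=(12, 4))

# Log scale: makes swings spanning several orders of magnitude easy to compare
ax_log.scatter(seconds, iops, s=2)
ax_log.set_yscale("log")
ax_log.set_xlabel("Time (s)")
ax_log.set_ylabel("IOPS")

# Linear scale capped at 40K IOPS: highlights differences between drives
ax_lin.scatter(seconds, iops, s=2)
ax_lin.set_ylim(0, 40000)
ax_lin.set_xlabel("Time (s)")
ax_lin.set_ylabel("IOPS")

plt.tight_layout()
plt.savefig("io_consistency.png")
```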

The first set of graphs shows the performance data over the entire 2000-second test period. In these charts you'll notice an early period of very high performance followed by a sharp dropoff. What you're seeing is the drive allocating new blocks from its spare area, then eventually using up all free blocks and having to perform a read-modify-write for all subsequent writes (write amplification goes up, performance goes down).
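
As a back-of-the-envelope illustration of why the read-modify-write phase is so much slower, consider the following; the bandwidth and write amplification figures are assumed purely for illustration, not measured values for any drive in this review:

```python
# All numbers below are assumptions chosen only to show the shape of the effect.
nand_write_mb_s = 400        # assumed raw NAND program bandwidth (MB/s)
write_amplification = 10     # assumed: each 4KB host write costs ~40KB of NAND writes

effective_mb_s = nand_write_mb_s / write_amplification
effective_iops = effective_mb_s * 1024 / 4   # 4KB transfers per second

print(f"~{effective_mb_s:.0f} MB/s of host writes, ~{effective_iops:.0f} IOPS")
# With these assumptions: ~40 MB/s and ~10240 IOPS. The higher the write
# amplification once spare area runs out, the lower the sustained IOPS.
```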

The second set of graphs zooms in to the beginning of steady state operation for the drive (t=1400s). The third set also looks at the beginning of steady state operation but on a linear performance scale. Click the buttons below each graph to switch source data.

[Scatter plots: 4KB random write IOPS vs. time over the full 2000-second run, log scale; buttons switch the source drive]

Wow, that's bad. While we haven't run the IO consistency test on all the SSDs we have in our labs, the M5 Pro is definitely the worst one we have tested so far. In less than a minute the M5 Pro's performance drops below 100 IOPS, which at a 4KB transfer size equals just 0.4MB/s. What makes it worse is that the drops are not sporadic: most of the IOs complete at on the order of 100 IOPS. There are occasional peaks at 30-40K IOPS, but the drive consistently performs far below that.

An even bigger issue is that over-provisioning the drive further doesn't bring any relief. As we discovered in our performance consistency article, giving the controller more spare area usually makes performance much more consistent, but unfortunately that doesn't apply to the M5 Pro. It does help a bit, in that it takes longer for the drive to enter steady state and there are more IOs in the ~40K IOPS range, but most IOs are still limited to around 100 IOPS.

The next set of charts looks at the steady state (for most drives) portion of the curve. Here we get better visibility into how each drive will perform over the long run.

[Scatter plots: IOPS vs. time from the start of steady state operation (t=1400s), log scale; buttons switch the source drive]

Concentrating on the final part of the test doesn't really reveal anything new: as we saw in the first graph, the M5 Pro reaches steady state very quickly and its performance stays about the same throughout the test. The peaks are actually high compared to other SSDs, but having the occasional IO complete at 3-5x the speed won't help when over 90% of the transfers are significantly slower.

[Scatter plots: IOPS vs. time from the start of steady state operation, linear scale capped at 40K IOPS; buttons switch the source drive]

Comments

  • jwilliams4200 - Tuesday, December 11, 2012 - link

    I think it is a worthwhile test. At the very least, it is always interesting to see how products react when you hit them very hard. Sometimes you can expose hidden defects that way. Sometimes you can get a better idea of how the product operates under stress (which may sometimes be extrapolated to understand how it operates under lighter workloads). Since SSD reviewers generally only have the product for a few days before publishing a review, putting the equivalent of weeks (or months) of wear on the SSDs in a few days requires hitting them as hard as possible. And there will always be a few users who will subject their SSDs to extremely heavy workloads, so it will be relevant to a few users.

    As long as the review mentions that the specific extreme workload being tested is unlikely to match that of the majority of consumers, I think the sustained heavy workload is a valuable component of all SSD reviews.
  • JellyRoll - Tuesday, December 11, 2012 - link

    Without a filesystem or TRIM, the testing is merely pointing out anomalies in SSD performance. Full span writes in particular reduce performance in many respects, such as GC.
    These SSDs are tailored to be used in consumer environments with consumer workloads. This is the farthest thing from a consumer workload that you can possibly get.
    The firmware is designed to operate in a certain manner, with filesystems and TRIM functions. They are also optimized for low QD usage and scheduled GC/maintenance that typically occurs during idle/semi-idle times.
    Pounding this with sustained and unreal workloads is like saying "Hey, if we test it for something it wasn't designed for, it doesn't work well!"
    Surprise. Of course it doesn't.
    Testing against the grain merely shows odd results that will never be observed in real life usage.
    This is a consumer product. This SSD is among the best in testing that is actually semi-relevant (though the trace testing isn't conducted with TRIM or a filesystem either, as disclosed by the staff), but the 'consistency testing' places it among the worst.
    They aren't allowing the SSD to function as it was designed to, then complain that it has bad performance.
    Kristian even specifically states that users will notice hiccups in performance. Unreal.
  • jwilliams4200 - Tuesday, December 11, 2012 - link

    Wrong again.

    There is no one way that an SSD should work, unless you want to say that it is to store and retrieve data written to LBAs. Since there are many ways SSDs can be used, and many different types of filesystems, it is absurd to say that doing a general test to the raw device is irrelevant.

    On the contrary, it is clearly relevant and often useful, for the reasons I already explained. In many cases, it is even more relevant than picking one specific filesystem and testing with that, since any quirks of that filesystem could be irrelevant to usage with other filesystems. Besides, anandtech already does tests with a common filesystem (NTFS), so the tests you are so upset about are merely additional information that can be used or ignored.
  • JellyRoll - Tuesday, December 11, 2012 - link

    Anand does not do testing with a filesystem. The trace program operates without a filesystem or the benefit of TRIM. It also substitutes its own data in place of the actual data used by the programs during recording. This leads to incorrect portrayals of system performance when dealing with SSDs that rely upon compression, and also with any SSD in general, since it isn't utilizing a filesystem or TRIM. Utilizing raw I/O to attempt to emulate recordings of user activity on an actual filesystem is an apples to oranges comparison.
    There is no 'wrong again' to it. SSDs do store and retrieve data from LBAs, but they are reliant upon the filesystem to issue the TRIM command. Without a filesystem there is no TRIM command. Therefore it is unrealistic testing not relevant to the device tested. The SSDs are tuned for consumer/light workloads, and their internal housekeeping and management routines are adjusted accordingly.
  • jwilliams4200 - Tuesday, December 11, 2012 - link

    Wrong that it is totally irrelevant. Wrong again that the SSDs are only designed for certain workloads. Wrong that there is no TRIM without a filesystem.

    Also wrong that anandtech does not do (any) testing with a filesystem. Some of the tests they do can only be run with a filesystem.
  • JellyRoll - Tuesday, December 11, 2012 - link

    You are right that they do some very limited testing with filesystems, such as the ATTO testing. They do use Iometer, though it is possible to test without a filesystem in Iometer, so we aren't sure if they are or not.
    Their trace testing does not use a filesystem, and this has been publicly acknowledged. Recording filesystem usage and replaying it without a filesystem, and with different data, does not make sense.
    The firmware on SSDs can be tailored to handle tasks, such as GC, during certain times. This is determined upon the intended usage model for the SSD. The firmware is also tuned for certain types of access, which explains the huge differences in performance between some firmwares (reference M4 firmwares). This is SSD 101.
    TRIM requires that the SSD be informed of which data is now deleted, which is a function of the filesystem.
    Repeating the word 'wrong' isn't doing much to further your argument or actually prove anything.
  • jwilliams4200 - Tuesday, December 11, 2012 - link

    I don't know what else to call it other than wrong. Would you prefer correctionally challenged? :-)

    TRIM can be done without a filesystem. I'm not sure why you seem to think filesystems are magical. A filesystem is just a set of routines that handle the interface between the programs using filesystem calls and the raw block device. But there is nothing stopping a program from sending TRIM commands directly to the device. I use TRIM in my tests without a filesystem (hdparm is useful for that).

    By the way, it seems to me that we have lost track of the important point here, which is that tests like we are discussing, without a filesystem and with sustained heavy write loads, are NOT meant to be given much weight by people with typical consumer workloads who are comparing SSDs. However, that does not mean that the test is irrelevant or a waste of time. When the test results are used correctly (as I explained earlier), in combination with results from other real-world tests, they provide a useful addition to our knowledge about each SSD.

    Please don't assume that I am arguing that these tests are the only tests that should be done, or even that these are the most important tests. They are not. But they are also NOT irrelevant.
  • JellyRoll - Wednesday, December 12, 2012 - link

    Yes, you can manually send a TRIM command via a number of techniques. However, this is no substitute for real time TRIM.
    For instance, when replaying a trace (such as those used in the bench here) the filesystem will be tagging data as deleted, and issuing TRIM commands in real time. Then it is up to the SSD to process these commands.
    This makes a tremendous impact upon the performance of the SSD, as it is intended to. Replaying traces without this crucial element leads to disastrously wrong results. Different SSDs handle TRIM commands better, or worse, than others. As a matter of fact some SSDs have had to get 'exceptions' from Windows because they do not handle the TRIM commands within a specified time range as dictated by spec (SandForce). So there is much more to TRIM than meets the eye, and it has a tremendous impact upon performance. Otherwise it simply would not exist.
    What you end up with is traces that were recorded with the benefit of TRIM being replayed without it.
    It has already been publicly acknowledged that Anand's trace testing does not have the benefit of TRIM; of course it doesn't, since the SSDs do not have a filesystem to issue the deletion commands.
    So yes, irrelevant, incorrect trace testing.
    Yes, irrelevant and incorrect 'consistency testing' of consumer SSDs which are designed to operate with TRIM, and state such in their specifications. Pointing out errata on consumer SSDs revealed outside of intended usage is irresponsible.
  • jwilliams4200 - Wednesday, December 12, 2012 - link

    In its default configuration, Windows does not issue TRIM commands for deleted files until you empty the trash (or the size of the trash exceeds some amount). So any trace that is "issuing TRIM commands in real time" is not especially realistic. Besides, some write-intensive workloads do not delete files at all, so there would not be any TRIM commands to be issued (e.g., some types of database files, some types of VM files).

    Your problem is that you are making assumptions about how SSDs will be used, and then saying any usage that does not follow your assumptions is irrelevant. As long as you continue to do that, you will continue to be wrong.
  • JellyRoll - Wednesday, December 12, 2012 - link

    These are not assumptions: it is the intended market. These SSDs are designed and sold in the consumer market. period.
    There is a separate class of SSDs designed for these workloads, and they are designed and sold in the enterprise market. Thus the distinction, and name : "Enterprise SSDs".
    That is not hard to figure out. There are two classes simply due to the fact that the devices are tailored for their intended market, and usage model. I do not know how to explain this in more simple terms so that you may 'get it'.
    Once you begin speaking of the lack of TRIM commands in VM and database files surely something 'clicks' that you are beginning to speak of enterprise workloads.
    Consumer SSDs=designed to work in consumer environment (thus the name) with TRIM and other functionality.
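
As an aside on the TRIM-without-a-filesystem point discussed above: besides the hdparm route jwilliams4200 mentions, a program can issue a discard directly to a raw block device on Linux. A minimal sketch, assuming root privileges; the device path and range are placeholders, and BLKDISCARD permanently discards the data in that range:

```python
import fcntl, os, struct

BLKDISCARD = 0x1277                      # _IO(0x12, 119), from <linux/fs.h>

fd = os.open("/dev/sdX", os.O_WRONLY)    # hypothetical device
try:
    # Discard (TRIM) 1 MiB starting at byte offset 0; offset and length
    # must be sector-aligned, and no filesystem is involved at all.
    fcntl.ioctl(fd, BLKDISCARD, struct.pack("QQ", 0, 1024 * 1024))
finally:
    os.close(fd)
```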
