Performance Consistency

In our Intel SSD DC S3700 review I introduced a new method of characterizing performance: looking at the latency of individual operations over time. The S3700 promised a level of performance consistency that was unmatched in the industry, and as a result needed some additional testing to show that. We don't get consistent IO latency from SSDs because every controller inevitably has to do some amount of defragmentation or garbage collection in order to continue operating at high speeds. When and how an SSD decides to run its defrag and cleanup routines directly impacts the user experience. Frequent (borderline aggressive) cleanup generally results in more stable performance, while delaying it can result in higher peak performance at the expense of much lower worst case performance. The graphs below tell us a lot about the architecture of these SSDs and how they handle internal defragmentation.

To generate the data below I took a freshly secure erased SSD and filled it with sequential data. This ensures that all user accessible LBAs have data associated with them. Next I kicked off a 4KB random write workload across all LBAs at a queue depth of 32 using incompressible data. I ran the test for just over half an hour, nowhere near as long as we run our steady state tests but enough to give me a good look at drive behavior once all spare area filled up.

I recorded instantaneous IOPS every second for the duration of the test. I then plotted IOPS vs. time and generated the scatter plots below. Each set of graphs features the same scale. The first two sets use a log scale for easy comparison, while the last set of graphs uses a linear scale that tops out at 40K IOPS for better visualization of differences between drives.
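If you log your own per-second IOPS numbers, a few summary statistics go a long way alongside the scatter plots. The following is only a minimal sketch in Python, not the tooling used for this review: the function name `summarize_iops`, the window arguments, and the choice of a nearest-rank 1st percentile as a worst-case proxy are all assumptions made for illustration.

```python
import math
from statistics import mean


def summarize_iops(samples, start=0, end=None):
    """Summarize per-second IOPS samples in the window [start, end).

    `samples` is assumed to be a list with one IOPS reading per second.
    All names here are hypothetical and chosen for illustration only.
    """
    window = samples[start:end]
    if not window:
        raise ValueError("the selected window contains no samples")

    ordered = sorted(window)
    # Nearest-rank 1st percentile: the value at rank ceil(0.01 * n).
    p1_rank = math.ceil(len(ordered) / 100)
    return {
        "min": ordered[0],
        "p1": ordered[p1_rank - 1],
        "mean": mean(window),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Made-up readings: two fast seconds, then a much slower stretch.
    readings = [30000, 32000, 1000, 1200, 900, 1100]
    print(summarize_iops(readings, start=2))
```

Passing `start=1400` on a full log would restrict the summary to the region the zoomed-in graphs cover. The minimum and the low percentile matter more than the mean here, since consistency is about the worst seconds and not the average one.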

The high level testing methodology remains unchanged from our S3700 review. Unlike in previous reviews, however, I varied the percentage of the drive that I filled/tested depending on the amount of spare area I was trying to simulate. The buttons are labeled with the advertised user capacity had the SSD vendor decided to use that specific amount of spare area. If you want to replicate this on your own, all you need to do is create a partition smaller than the total capacity of the drive and leave the remaining space unused to simulate a larger amount of spare area. The partitioning step isn't absolutely necessary in every case but it's an easy way to make sure you never exceed your allocated spare area. It's a good idea to do this from the start (e.g. secure erase, partition, then install Windows), but if you are working backwards you can always create the spare area partition, format it to TRIM it, then delete the partition. Finally, this method of creating spare area works on the drives we've tested here, but other controllers may not behave the same way.
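The partition math is simple enough to do by hand, but here is a small Python sketch of it. It assumes that "spare area" means the fraction of the drive's advertised user capacity you leave unpartitioned; it ignores whatever spare area the vendor already reserves internally, and the function name `partition_size_gb` is made up for this example.

```python
def partition_size_gb(drive_capacity_gb, spare_fraction):
    """Return the partition size that leaves `spare_fraction` of the drive unused.

    Assumption: `spare_fraction` is measured against the advertised user
    capacity, e.g. 0.25 means a quarter of the visible capacity is left
    unpartitioned. Vendor-reserved spare area is not modeled.
    """
    if not 0 <= spare_fraction < 1:
        raise ValueError("spare_fraction must be in the range [0, 1)")
    return drive_capacity_gb * (1 - spare_fraction)


if __name__ == "__main__":
    # A 250GB drive with 25% of its capacity left unpartitioned.
    print(partition_size_gb(250, 0.25))
```

Round the result down to whatever granularity your partitioning tool offers, so the partition never eats into the space you meant to leave free.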

The first set of graphs shows the performance data over the entire 2000 second test period. In these charts you'll notice an early period of very high performance followed by a sharp dropoff. What you're seeing in that case is the drive allocating new blocks from its spare area, then eventually using up all free blocks and having to perform a read-modify-write for all subsequent writes (write amplification goes up, performance goes down).
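To put a rough number on when that dropoff happens in your own data, you can scan the per-second samples for the first point where IOPS falls well below the early baseline and stays there. The Python sketch below is one way to do that under stated assumptions: the name `find_dropoff`, the 60-second baseline, the 50% threshold, and the 10-second hold are all arbitrary illustrative choices, not values taken from this review.

```python
from statistics import median


def find_dropoff(samples, baseline_seconds=60, threshold=0.5, hold=10):
    """Return the index (second) where sustained low performance begins.

    The baseline is the median of the first `baseline_seconds` samples.
    A dropoff is the first run of `hold` consecutive samples that are all
    below `threshold * baseline`. Returns None if no such run exists.
    All parameter defaults are illustrative assumptions.
    """
    if not samples:
        return None

    limit = median(samples[:baseline_seconds]) * threshold
    run = 0  # length of the current streak of below-limit samples
    for i, value in enumerate(samples):
        if value < limit:
            run += 1
            if run == hold:
                return i - hold + 1  # first second of the streak
        else:
            run = 0
    return None


if __name__ == "__main__":
    # Made-up trace: 100 fast seconds, then a sustained slow period.
    trace = [30000] * 100 + [2000] * 50
    print(find_dropoff(trace))
```

Requiring a sustained run instead of a single low sample keeps one slow second during otherwise fast operation from being mistaken for the point where free blocks run out.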

The second set of graphs zooms in to the beginning of steady state operation for the drive (t=1400s). The third set also looks at the beginning of steady state operation but on a linear performance scale. Click the buttons below each graph to switch source data.

[Graph set 1: IOPS vs. time over the entire test period, log scale. Drives: Crucial M500 960GB, Samsung SSD 840 EVO 1TB, Samsung SSD 840 EVO 250GB, SanDisk Extreme II 480GB, Samsung SSD 840 Pro 256GB. Spare area: Default.]

Thanks to the EVO's higher default overprovisioning, you actually get better consistency out of the EVO than the 840 Pro out of the box. Granted, you can get similar behavior out of the Pro if you simply don't use all of the drive. The big comparison is against Crucial's M500, where the EVO does a bit better. SanDisk's Extreme II, however, remains the better performer from an IO consistency perspective.

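[Graph set 2: IOPS vs. time at the beginning of steady state operation (t=1400s), log scale. Drives: Crucial M500 960GB, Samsung SSD 840 EVO 1TB, Samsung SSD 840 EVO 250GB, SanDisk Extreme II 480GB, Samsung SSD 840 Pro 256GB. Spare area: Default.]

[Graph set 3: IOPS vs. time at the beginning of steady state operation, linear scale. Drives: Crucial M500 960GB, Samsung SSD 840 EVO 1TB, Samsung SSD 840 EVO 250GB, SanDisk Extreme II 480GB, Samsung SSD 840 Pro 256GB. Spare area: Default.]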

Zooming in we see very controlled and frequent GC patterns on the 1TB drive, something we don't see in the 840 Pro. The 250GB drive looks a bit more like a clustered random distribution of IOs, but minimum performance is still much better than on the standard OP 840 Pro.

TRIM Validation

Our performance consistency test actually replaces our traditional TRIM test in terms of looking at worst case scenario performance, but I wanted to confirm that TRIM was functioning properly on the EVO so I dusted off our old test for another go. The test procedure remains unchanged: fill the drive with sequential data, run a 4KB random write test (QD32, 100% LBA range) for a period of time (30 minutes in this case) and use HDTach to visualize the impact on write performance:

Minimum performance drops down to around 30MB/s, eugh. Although the EVO can be reasonably consistent, you'll still want to leave some free space on the drive to ensure that performance always stays high (I recommend 15 - 25% if possible).

A single TRIM pass (quick format under Windows 7) fully restores performance as expected:

The short period of time at 400MB/s is just TurboWrite doing its thing.


137 Comments

  • Riven98 - Thursday, July 25, 2013 - link

    Anand,
    Thanks for the great article. I had just been thinking that there had been a downturn in the number of articles like these, which are the main reasons I visit on an almost daily basis.
  • chrnochime - Friday, July 26, 2013 - link

    Still recommending a technology that's known to not last as long as MLC. Yes, the *extrapolated* result indicates that its lifetime is far longer than advertised, but really, when even the M500 is not that slow in the first place and costs about the same, why risk going with TLC? Not to mention Samsung's 830 has its fair share of horror stories as well...
  • watersb - Friday, July 26, 2013 - link

    Excellent review.

    How does write amplification scale as the disk fills up? Wouldn't a full disk fail more rapidly than a half-full one?
  • BobAjob2000 - Tuesday, January 28, 2014 - link

    Hopefully wear leveling and TRIM/garbage collection algorithms should take care of your concerns. They should take existing unchanged 'cold' data and move it around to make way for regularly changed 'hot' data. This should reduce the impact on both data longevity and write amplification, as it guides new writes to hit the 'freshest' unused or rarely written blocks on the disk and also helps to ensure that data does not go 'stale' after being untouched for years. Different vendors use different algorithms that have evolved and improved over time. I think Samsung (being a RAM manufacturer) can possibly provide better RAM caches for their disks, which may give their garbage collection and wear leveling algorithms an advantage by improving the available 'thinking space' for the caching and sorting/organizing of 'hot' data.
    It's all to do with managing the 'temperature' of your data, somewhat like a data 'weather forecast', which can be very useful in the short term or for simple predictable/settled patterns but less practical for long term or unseasonal data storms.
    I would like to see these things tested with 'what if' scenarios, though, to demonstrate the differences between different vendors' algorithms.
  • xtreme2k - Friday, July 26, 2013 - link

    Can anyone tell me why I am paying 90% of the price for 33% of the endurance of a drive?
  • MrSpadge - Saturday, July 27, 2013 - link

    Because endurance doesn't matter (very likely also for you), but price does.
  • log - Friday, July 26, 2013 - link

    Can you partition this drive and still take advantage of its features? Thanks
  • Timur Born - Friday, July 26, 2013 - link

    I don't quite understand exactly why the Samsung RAPID software cache brings higher performance in *practice* than Windows' own cache? Using two software caches will lead to the same information being stored in RAM twice or even thrice, which is exactly what the Windows cache tries to avoid since XP days.

    That the usual benchmark programs get fooled is visible, as they think they are working without a software cache. So the higher values there are not surprising. But I am a bit puzzled why the Anand Storage Bench results increase, too?! Why is RAPID software caching better than Windows' own cache in this scenario? Or does the ASB bypass Windows' cache, too (like most benchmarks)?

    By the way: ATTO allows the Windows cache to be turned ON for testing. My "old" Crucial M4 256 sees very high read results once ATTO makes use of Windows' cache. Only the write rates remain significantly smaller.

    Therefore an ATTO test with combinations of either or both software caches (RAPID and Windows) would be interesting.
  • MrSpadge - Saturday, July 27, 2013 - link

    I think it's because Samsung is being much more aggressive with caching than Win dares to be, i.e. it holds files far longer before writing them, so they can be combined more efficiently but are longer at risk of being lost.
  • Timur Born - Sunday, July 28, 2013 - link

    I am not convinced about that yet, especially since you can turn off drive cache flushing via Device-Manager and thus should get an even more aggressive Windows cache behavior than what RAPID offers (which is reported to adhere to Windows' flush commands).

    The Windows cache is designed to keep data in RAM for as long as it's not needed for something else. Even more important, data is *directly* executed from inside the Windows cache instead of being copied back and forth between separate memory regions. This keeps duplication to a minimum (implemented since XP as far as I remember). So at least for reads the Windows cache is very useful, especially in combination with Superfetch, which is *not* disabled for SSDs btw (even Prefetch for the boot phase isn't disabled, but in practice it makes not much of a difference whether you boot with or without Prefetch from an SSD).

    There is something funky going on with Windows' cache and the drive's onboard cache of my Crucial M4 in combination with ATTO (Windows cache enabled). Different block sizes get very different results, with some *larger* block sizes not benefiting from Windows' cache either at read or write, the latter depending on the block size chosen. Turning the drive's own cache flushing on/off via Device-Manager can have an impact on that, too.

    In some cases I get less throughput with Windows cache than without (i.e. 512 kb block size with drive flushing on). This may be an issue of ATTO, though, because I also got some measurements where ATTO claimed a write speed of zero (0)! Turning off either drive cache flushing or the Windows cache or both helps ATTO to get meaningful measurements again.

    So the main question remains: How and why would RAPID affect "real-world" performance on top of the Windows cache and does the Anand Storage Bench deliberately circumvent the Windows cache?

    The reason I was looking at this review was that I am currently looking for a new SSD to build a desktop PC and the 840 EVO looks like the thing to buy. So once I get my hands on one myself I will just try RAPID myself. ;)
