Performance Consistency

In our Intel SSD DC S3700 review I introduced a new method of characterizing performance: looking at the latency of individual operations over time. The S3700 promised a level of performance consistency that was unmatched in the industry, and as a result required some additional testing to demonstrate it. The reason we don't have consistent IO latency with SSDs is that all controllers inevitably have to do some amount of defragmentation or garbage collection in order to continue operating at high speeds. When and how an SSD decides to run its defrag and cleanup routines directly impacts the user experience. Frequent (borderline aggressive) cleanup generally results in more stable performance, while delaying it can result in higher peak performance at the expense of much lower worst case performance. The graphs below tell us a lot about the architecture of these SSDs and how they handle internal defragmentation.

To generate the data below I took a freshly secure erased SSD and filled it with sequential data. This ensures that all user accessible LBAs have data associated with them. Next I kicked off a 4KB random write workload across all LBAs at a queue depth of 32 using incompressible data. I ran the test for just over half an hour; that's nowhere near as long as our steady state tests, but it's enough to give a good look at drive behavior once all spare area fills up.
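
If you want to approximate this workload yourself, the sketch below captures the general idea. It is not the tool we use to generate these charts; it assumes a Linux machine, a disposable drive at a hypothetical /dev/sdX, and only Python's standard library, and it approximates QD32 with 32 worker threads going through the page cache rather than true direct IO.

```python
# Rough sketch of the test described above: sequential fill, then 4KB random writes of
# incompressible data from 32 workers, logging completions (IOPS) once per second.
# /dev/sdX is a placeholder for a scratch drive; everything on it is destroyed.
import os, time, random, threading

DEV = "/dev/sdX"          # hypothetical device under test
BLOCK = 4096
RUNTIME = 2000            # seconds, matching the length of the charts

fd = os.open(DEV, os.O_WRONLY)
size = os.lseek(fd, 0, os.SEEK_END)   # block device size in bytes

# Step 1: fill (almost) the whole drive with sequential data.
chunk = os.urandom(1024 * 1024)
for off in range(0, size - len(chunk), len(chunk)):
    os.pwrite(fd, chunk, off)
os.fsync(fd)

# Step 2: 4KB random writes across the whole LBA range.
completed = 0
lock = threading.Lock()
stop = threading.Event()

def worker():
    global completed
    buf = os.urandom(BLOCK)                       # incompressible payload
    while not stop.is_set():
        off = random.randrange(size // BLOCK) * BLOCK
        os.pwrite(fd, buf, off)
        with lock:
            completed += 1

threads = [threading.Thread(target=worker) for _ in range(32)]
for t in threads:
    t.start()

with open("iops.log", "w") as log:
    for second in range(RUNTIME):
        time.sleep(1)
        with lock:
            iops, completed = completed, 0
        log.write(f"{second} {iops}\n")           # instantaneous IOPS, one sample per second

stop.set()
for t in threads:
    t.join()
os.close(fd)
```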

I recorded instantaneous IOPS every second for the duration of the test. I then plotted IOPS vs. time and generated the scatter plots below. Within each set of graphs the scale is identical across drives. The first two sets use a log scale for easy comparison, while the last set of graphs uses a linear scale that tops out at 40K IOPS to better visualize the differences between drives.
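
For anyone reproducing the charts, a minimal plotting sketch is below. It assumes a per-second "second iops" log like the one written by the previous sketch; the filename, format and matplotlib styling are my own choices.

```python
# Scatter-plot the per-second IOPS log on both a log scale and a linear scale.
import matplotlib.pyplot as plt

secs, iops = [], []
with open("iops.log") as f:
    for line in f:
        s, value = line.split()
        secs.append(int(s))
        iops.append(int(value))

fig, (log_ax, lin_ax) = plt.subplots(1, 2, figsize=(12, 4))

log_ax.scatter(secs, iops, s=2)
log_ax.set_yscale("log")            # log scale, as in the first two sets of charts
log_ax.set_title("4KB random write IOPS (log scale)")

lin_ax.scatter(secs, iops, s=2)
lin_ax.set_ylim(0, 40000)           # linear scale capped at 40K IOPS, as in the third set
lin_ax.set_title("4KB random write IOPS (linear scale)")

for ax in (log_ax, lin_ax):
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("IOPS")

fig.tight_layout()
fig.savefig("consistency.png", dpi=150)
```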

The high level testing methodology remains unchanged from our S3700 review. Unlike in previous reviews, however, I varied the percentage of the drive that I filled/tested depending on the amount of spare area I was trying to simulate. Each button is labeled with the user capacity the drive would have been advertised at had the SSD vendor decided to set aside that specific amount of spare area. If you want to replicate this on your own, all you need to do is create a partition smaller than the total capacity of the drive and leave the remaining space unused to simulate a larger amount of spare area. The partitioning step isn't absolutely necessary in every case, but it's an easy way to make sure you never exceed your allocated spare area. It's a good idea to do this from the start (e.g. secure erase, partition, then install Windows), but if you are working backwards you can always create the spare area partition, format it to TRIM it, then delete the partition. Finally, this method of creating spare area works on the drives we've tested here, but not every controller is guaranteed to behave the same way.
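
As a concrete example of the arithmetic involved (the capacities and percentage here are hypothetical, not tied to any specific drive), simulating an extra 25% of spare area just means partitioning roughly three quarters of the user capacity:

```python
# Back-of-the-envelope sketch of the partitioning trick described above, with made-up numbers.
# Leaving part of the drive unpartitioned (and TRIMed) hands that space back to the
# controller as additional spare area.
user_capacity_gb = 250            # advertised user capacity (hypothetical example)
extra_spare_fraction = 0.25       # simulate an extra 25% of spare area

partition_gb = user_capacity_gb * (1 - extra_spare_fraction)
print(f"Partition only {partition_gb:.0f} GB of the {user_capacity_gb} GB drive "
      f"and leave the rest untouched.")
# Partition only 188 GB of the 250 GB drive and leave the rest untouched.
```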

The first set of graphs shows the performance data over the entire 2000 second test period. In these charts you'll notice an early period of very high performance followed by a sharp drop-off. What you're seeing there is the drive allocating new blocks from its spare area, then eventually using up all of its free blocks and having to perform a read-modify-write for every subsequent write (write amplification goes up, performance goes down).
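
To put rough numbers on that, here's a toy write amplification calculation. The figures are illustrative only, not measurements from any of these drives:

```python
# Write amplification = NAND bytes written / host bytes written.
host_writes_gb = 100

# Early in the test the controller still has clean blocks from the spare area,
# so roughly one NAND write per host write.
early_nand_writes_gb = 100
print(early_nand_writes_gb / host_writes_gb)    # 1.0

# Once the free blocks run out, every block the controller erases still holds valid data
# that must be relocated first. If, say, 3 GB of valid data has to be moved for every
# 1 GB of new host data, the NAND does 4x the work and performance drops accordingly.
steady_nand_writes_gb = 100 * (1 + 3)
print(steady_nand_writes_gb / host_writes_gb)   # 4.0
```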

The second set of graphs zooms in to the beginning of steady state operation for the drive (t=1400s). The third set also looks at the beginning of steady state operation but on a linear performance scale. Click the buttons below each graph to switch source data.

[Interactive chart (full 2000 second test, log scale): Crucial M500 960GB / Samsung SSD 840 EVO 1TB / Samsung SSD 840 EVO 250GB / SanDisk Extreme II 480GB / Samsung SSD 840 Pro 256GB; Default over-provisioning selected]

Thanks to the EVO's higher default over-provisioning, you actually get better consistency out of the EVO than out of the 840 Pro out of the box. Granted, you can get similar behavior out of the Pro if you simply don't use all of the drive. The big comparison is against Crucial's M500, where the EVO does a bit better. SanDisk's Extreme II, however, remains the better performer from an IO consistency perspective.

[Interactive chart (steady state zoom, log scale): Crucial M500 960GB / Samsung SSD 840 EVO 1TB / Samsung SSD 840 EVO 250GB / SanDisk Extreme II 480GB / Samsung SSD 840 Pro 256GB; Default over-provisioning selected]

[Interactive chart (steady state zoom, linear scale): Crucial M500 960GB / Samsung SSD 840 EVO 1TB / Samsung SSD 840 EVO 250GB / SanDisk Extreme II 480GB / Samsung SSD 840 Pro 256GB; Default over-provisioning selected]

Zooming in, we see very controlled and frequent GC patterns on the 1TB drive, something we don't see on the 840 Pro. The 250GB drive looks a bit more like a clustered random distribution of IOs, but its minimum performance is still much better than on the standard-OP 840 Pro.

TRIM Validation

Our performance consistency test actually replaces our traditional TRIM test in terms of looking at worst case scenario performance, but I wanted to confirm that TRIM was functioning properly on the EVO so I dusted off our old test for another go. The test procedure remains unchanged: fill the drive with sequential data, run a 4KB random write test (QD32, 100% LBA range) for a period of time (30 minutes in this case) and use HDTach to visualize the impact on write performance:
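
HDTach is a Windows GUI tool, but if you want a rough script-based equivalent, the sketch below writes a large sequential chunk at evenly spaced points across a disposable drive (again a hypothetical /dev/sdX on Linux) and reports per-region throughput, which is enough to see both the dirty-state dip and the post-TRIM recovery:

```python
# Crude stand-in for an HDTach-style sweep: time sequential writes at evenly spaced
# offsets across the device. This overwrites data on the target drive.
import os, time

DEV = "/dev/sdX"                  # hypothetical device under test
CHUNK = 8 * 1024 * 1024           # 8MB sequential writes
SAMPLES = 200                     # 200 evenly spaced points across the LBA range

fd = os.open(DEV, os.O_WRONLY)
size = os.lseek(fd, 0, os.SEEK_END)
step = (size - CHUNK) // SAMPLES
buf = os.urandom(CHUNK)

for i in range(SAMPLES):
    offset = i * step
    start = time.time()
    os.pwrite(fd, buf, offset)
    os.fsync(fd)                  # force the data out so the timing means something
    mb_per_s = (CHUNK / (1024 * 1024)) / (time.time() - start)
    print(f"{offset / (1024**3):6.1f} GiB: {mb_per_s:7.1f} MB/s")

os.close(fd)
```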

Minimum performance drops down to around 30MB/s, eugh. Although the EVO can be reasonably consistent, you'll still want to leave some free space on the drive to ensure that performance always stays high (I recommend 15-25% if possible).

A single TRIM pass (quick format under Windows 7) fully restores performance as expected:

The short period of time at 400MB/s is just TurboWrite doing its thing.

Comments

  • Grim0013 - Sunday, July 28, 2013

    I wonder what, if anything, the impact of TurboWrite is on drive endurance. Does the SLC buffer have the effect of "shielding" the TLC from some amount of write amplification (WA)? More specifically, I was thinking that in the case of small random writes (high WA), many of them would go to the SLC first; when the data is later transferred to the TLC, does the buffering give the controller the opportunity to write the data in such a way as to reduce WA on the TLC?

    In fact, I wonder if that is something that is done: if the controller is able to characterize certain types of files as likely to be frequently modified, then just keep them in the SLC semi-permanently. Stuff like the page file and other OS data that is constantly modified. I'm not very well-versed in this stuff so I'm just guessing. It just seems like taking advantage of SLC's crazy P/E endurance in addition to its speed could really help make these things bulletproof.
  • shodanshok - Sunday, July 28, 2013

    Yea, I was thinking the same thing. After all, SanDisk already did it on the Ultra Plus and Ultra II SSDs: they have a small pseudo-SLC zone used both for greater performance and for reducing WA.
  • shodanshok - Sunday, July 28, 2013

    I am not so excited about RAPID: data integrity is a delicate thing, so I am not so happy to trust Samsung (or others) to replace the well-tested caching algorithms natively built into the OS.

    Anyway, Windows' write caching is not so quick because the OS, by default, flushes its in-memory cache every second. Moreover, it normally issues a barrier event to flush the disk's DRAM cache. The latter can be disabled, but the flushing of the in-memory cache cannot be changed, as far as I know.

    Linux, on the other hand, uses a much more aggressive caching policy: it flushes the in-memory cache (pagecache) every 30 seconds, and it aggressively tries to coalesce multiple writes into a single transaction. These parameters are configurable through the /proc interface. Moreover, if you have a BBU or a power-loss-tolerant disk subsystem, you can even disable the barrier instruction normally issued to flush the disk's DRAM cache.
  • Timur Born - Sunday, July 28, 2013

    My Windows 8 setup uses almost exactly 1 GB of RAM for write caching, regardless of whether it's writing to a 5400 RPM 2.5" HD, a 5400 RPM 3.5" HD or a Crucial M4 256 GB SSD. That's exactly the size of the RAPID cache. The "flushes its cache every second" part becomes a problem when the source and destination are on the same drive, because once Windows starts writing, the disk queue starts to climb.

    But even then it should mostly be a problem for spinning HDs that don't really like higher queue depths. Even more so when you copy multiple files via Windows Explorer, which reads and writes files concurrently even on spinning HDs.

    So I wonder if RAPID's only real advantage is its ability to coalesce multiple small writes into single big ones for durations longer than one second?!
  • Timur Born - Sunday, July 28, 2013

    By the way, my personal experience is that CPU power saving features, as set up in both the default "Balanced" and the "High Performance" power profiles, have far more of an impact on SSD performance than caching does. I can raise my M4's 4K random performance by 60% and more just by making CPU power savings less aggressive (or turning them off).
  • shodanshok - Monday, July 29, 2013

    If I remember correctly, Windows uses at most 1/8 of total RAM for write caching. How much RAM do you have?
  • Timur Born - Tuesday, July 30, 2013

    8 GB, so you may be correct. Or you may be mixing it up with the 1/8 of the dirty cache that Windows flushes every second. Or both may be 1/8. ;-)
  • zzz777 - Monday, July 29, 2013

    I'm interested in caching writes to a RAM disk and then to storage. This reminds me of the concept of a write-back cache: for almost everyone the possibility of data corruption is so low that there's no reason not to enable it. Can this SSD/RAM disk combination write quickly enough that home users also don't have to worry about using it? Beyond that, I'm not a normal home user; I want to see benchmarks for virtualization. I want the quickest way to create, modify and test a VM before putting it on front-line hardware.
  • Wwhat - Monday, July 29, 2013

    For me, I still say: I'd rather go for the Pro version.
  • andreaciri - Thursday, August 1, 2013

    I have to decide whether to buy an 840 now or an EVO when it becomes available, for my MacBook. Considering that RAPID is only supported under Windows, and that I'm more interested in read performance than write, is the 840 a good choice?
