TurboWrite: MLC Performance on a TLC Drive

All NAND trends toward lower performance as we move down to smaller process geometries. Clever architectural tricks are what keep overall SSD performance increasing each generation, but if you look at Crucial's M500 you'll see that it's not always possible to do. Historically, whenever a level of the memory hierarchy got too slow, the industry would more or less agree to insert another level above it to help hide latency. The problem is exacerbated once you start talking about TLC NAND. Samsung's mitigation is to dedicate a small portion of each TLC NAND die as an SLC write buffer. The feature is called TurboWrite. Initial writes hit the TurboWrite buffer at very low latency and are quickly written back to the rest of the TLC NAND array.

Since the amount of spare area available on the EVO depends on capacity, so does the size of the TurboWrite buffer. The smallest buffer is around 3GB (on the 120GB and 250GB drives) while the largest is 12GB on the 1TB EVO:

Samsung SSD 840 EVO TurboWrite Buffer Size vs. Capacity

Capacity                  120GB   250GB   500GB   750GB   1TB
TurboWrite Buffer Size    3GB     3GB     6GB     9GB     12GB

I spent some time poking at the TurboWrite buffer and it pretty much works the way you'd expect it to. Initial writes hit the buffer first, and as long as they don't exceed the size of the buffer the performance you get is quite good. If your writes stop before exceeding the buffer size, the buffer will write itself out to the TLC NAND array. You need a little bit of idle time for this copy to happen, but it tends to go pretty quickly as it's just a sequential move of data internally (we're talking about a matter of 15 to 30 seconds). Even before the TurboWrite buffer is completely emptied, you can stream new writes into the buffer. It all works surprisingly well. For most light use cases I can see TurboWrite being a great way to deliver more of an MLC experience on a TLC drive.
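The behavior described above can be sketched as a toy model. This is only an illustration of the fill-and-drain idea, not Samsung's firmware logic: the class name, its methods, and the drain rate are all assumptions. The drain rate of 0.125 GB/s is picked so that a full 3GB buffer empties in 24 seconds, inside the 15 to 30 second range observed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TurboWriteModel:
    """Toy model of an SLC write buffer in front of a TLC array (hypothetical)."""

    buffer_gb: float        # buffer capacity, e.g. 3 GB on the 120GB/250GB EVO
    drain_gb_per_s: float   # ASSUMED internal SLC-to-TLC copy rate while idle
    used_gb: float = 0.0    # data currently sitting in the buffer

    def write(self, size_gb: float) -> tuple[float, float]:
        """Accept a burst; return (GB absorbed by the buffer, GB sent straight to TLC)."""
        free_gb = self.buffer_gb - self.used_gb
        to_buffer = min(size_gb, free_gb)
        self.used_gb += to_buffer
        return to_buffer, size_gb - to_buffer

    def idle(self, seconds: float) -> None:
        """Drain buffered data to TLC during idle time (never below empty)."""
        self.used_gb = max(0.0, self.used_gb - self.drain_gb_per_s * seconds)


drive = TurboWriteModel(buffer_gb=3.0, drain_gb_per_s=0.125)
print(drive.write(2.0))  # whole burst fits in the buffer
print(drive.write(2.0))  # only 1 GB of room left, the rest spills to TLC
drive.idle(8)            # a partial drain frees 1 GB
print(drive.write(1.5))  # new writes are accepted before the buffer is fully empty
```

The last call mirrors the point that new writes can stream into the buffer before it has been completely emptied: only the space freed so far is available at buffer speed.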

TurboWrite's impact is best felt on the lower capacity drives that don't have as many NAND die to stripe requests across (thus further hiding long program latencies). The chart below shows sequential write performance vs. time for all of the EVO capacities. The sharp drop in performance on each curve is when the TurboWrite buffer is exceeded and sequential writes start streaming to the TLC NAND array instead:

On the 120GB drive the delta between TurboWrite and standard performance is huge. On the larger drives the drop isn't as big and the TurboWrite buffer is also larger; the combination of the two is why the impact isn't felt as much on those drives. It's this TurboWrite buffer that gives the EVO its improvement in max sequential write speed over last year's vanilla SSD 840.
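To make the shape of that drop concrete, here is a rough two-phase calculation: the first part of a sequential write runs at buffer speed and the remainder at native TLC speed. The function names and the 400 MB/s and 100 MB/s figures are placeholders chosen for round numbers, not measured EVO results, and the sketch ignores any draining that happens during the write.

```python
def sequential_write_time(total_gb, buffer_gb, fast_mb_s, slow_mb_s):
    """Seconds to write total_gb: buffer-sized portion at fast speed, the rest at slow speed."""
    fast_gb = min(total_gb, buffer_gb)
    slow_gb = total_gb - fast_gb
    return fast_gb * 1000 / fast_mb_s + slow_gb * 1000 / slow_mb_s


def average_speed(total_gb, buffer_gb, fast_mb_s, slow_mb_s):
    """Average MB/s over the whole transfer."""
    seconds = sequential_write_time(total_gb, buffer_gb, fast_mb_s, slow_mb_s)
    return total_gb * 1000 / seconds


# Hypothetical speeds; buffer sizes are the 3GB and 12GB entries from the table.
FAST, SLOW = 400, 100
for buffer_gb in (3, 12):
    for total_gb in (2, 10, 50):
        avg = average_speed(total_gb, buffer_gb, FAST, SLOW)
        print(f"buffer {buffer_gb:>2} GB, write {total_gb:>2} GB: {avg:6.1f} MB/s average")
```

With these placeholder speeds, a transfer that fits inside the buffer averages the full fast rate, while a transfer several times the buffer size trends toward the slow rate, which is the same pattern as the sharp drop in each curve.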


137 Comments


  • Grim0013 - Sunday, July 28, 2013 - link

    I wonder what, if anything, the impact of Turbo Write is on drive endurance, as in, does the SLC buffer have the effect of "shielding" the TLC from some amount of write amplification (WA)? More specifically, I was thinking that in the case of small random writes (high WA), many of them would be going to the SLC first, then when the data is transferred to the TLC, I wonder if the buffering affords the controller the opportunity to write the data in such a way as to reduce WA on the TLC?

    In fact, I wonder if that is something that is done...if the controller is able to characterize certain types of files as being likely to be frequently modified then just keep them in the SLC semi-permanently. Stuff like the page file and other OS stuff that is constantly modified...I'm not very well-versed on this stuff so I'm just guessing. It just seems like taking advantage of SLC's crazy P/E endurance in addition to its speed could really help make these things bulletproof.
  • shodanshok - Sunday, July 28, 2013 - link

    Yea, I was thinking the same thing. After all, Sandisk already did it on the Ultra Plus and Ultra II SSDs: they have a small pseudo-SLC zone used both for greater performance and reducing WA.
  • shodanshok - Sunday, July 28, 2013 - link

    I am not so excited about RAPID: data integrity is a delicate thing, so I am not so happy to trust Samsung (or others) replacing the key well-tested caching algorithm natively built into the OS.

    Anyway, Windows' write caching is not so quick because the OS, by default, flushes its in-memory cache each second. Moreover, it normally issues a barrier event to flush the disk's DRAM cache. This last thing can be disabled, but the flush of the in-memory cache cannot be changed, as far as I know.

    Linux, on the other hand, uses a much more aggressive caching policy: it issues an in-memory cache flush (pagecache) every 30 seconds, and it aggressively tries to coalesce multiple writes into a single transaction. This parameter is configurable using the /proc interface. Moreover, if you have a BBU or power-tolerant disk subsystem, you can even disable the barrier instruction normally issued to the disk's DRAM cache.
  • Timur Born - Sunday, July 28, 2013 - link

    My Windows 8 setup uses quite exactly 1 gb RAM for write caching, regardless of whether it's writing to a 5400 rpm 2.5" HD, 5400 rpm 3.5" HD or Crucial M4 256 gb SSD. That's exactly the size of the RAPID cache. The "flush its cache each second" part becomes a problem when the source and destination are on the same drive, because once Windows starts writing the disk queue starts to climb.

    But even then it should mostly be a problem for spinning HDs that don't really like higher queue numbers. Even more so when you copy multiple files via Windows Explorer, which reads and writes files concurrently even on spinning HDs.

    So I wonder if RAPID's only real advantage is its feature to coalesce multiple small writes into single big ones for durations longer than one second?!
  • Timur Born - Sunday, July 28, 2013 - link

    By the way, my personal experience is that CPU power saving features, as set up in both the default "Balanced" and the "High Performance" power profiles, have far more of an impact on SSD performance than caching stuff. I can up my M4's 4K random performance by 60% and more just by messing with CPU power savings to be less aggressive (or off).
  • shodanshok - Monday, July 29, 2013 - link

    If I remember correctly, Windows uses at most 1/8 of total RAM size for write caching. How much RAM did you have?
  • Timur Born - Tuesday, July 30, 2013 - link

    8 gb, so you may be correct. Or you may mix it up with the 1/8 part of dirty cache that is being flushed by the Windows cache every second. Or both may be 1/8. ;-)
  • zzz777 - Monday, July 29, 2013 - link

    I'm interested in caching writes to a RAM disk then to storage. This reminds me of the concept of a write-back cache: for almost everyone the possibility of data corruption is so low that there's no reason not to enable it. Can this SSD RAM disk write quickly enough that home users also don't have to worry about using it? Beyond that, I'm not a normal home user: I want to see benchmarks for virtualization, and I want the quickest way to create, modify and test a VM before putting it on front-line hardware.
  • Wwhat - Monday, July 29, 2013 - link

    For me, I still say: I'd rather go for the Pro version.
  • andreaciri - Thursday, August 1, 2013 - link

    I have to decide whether to buy an 840 now, or an EVO when it becomes available, for my MacBook. Considering that RAPID technology is only supported under Windows, and that I'm more interested in read performance than write, is the 840 a good choice?
