The Trim Command: Coming Soon to a Drive Near You

We run into these problems primarily because the drive doesn’t know when a file is deleted, only when one is overwritten. Thus we give up performance when we go to write a new file in exchange for lightning-quick deletion speeds. The latter doesn’t really matter though, now does it?

There’s a command you may have heard of called TRIM. The command would require proper OS and drive support, but with it you could effectively let the OS tell the SSD to wipe invalid pages before they are overwritten.

The process works like this:

First, a TRIM-supporting OS (e.g. Windows 7 will support TRIM at some point) queries the hard drive for its rotational speed. If the drive responds by saying 0, the OS knows it’s an SSD and turns off features like defrag. It also enables the use of the TRIM command.

When you delete a file, the OS sends a trim command for the LBAs covered by the file to the SSD controller. The controller will then copy the block to cache, wipe the deleted pages, and write the new block with freshly cleaned pages to the drive.

Now when you go to write a file to that block you’ve got empty pages to write to and your write performance will be closer to what it should be.
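The process above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real controller or OS is implemented: the names `FlashBlock` and `ToyOS` are made up for illustration, a block is just a list of pages, and each page holds either the LBA stored in it or `None` if it is clean.

```python
from dataclasses import dataclass


@dataclass
class FlashBlock:
    """Toy erase block: each page holds the LBA stored in it, or None if clean."""
    pages: list
    erase_count: int = 0

    def trim(self, lbas):
        """Controller side: clean the pages backing the given LBAs."""
        doomed = set(lbas)
        if not any(lba in doomed for lba in self.pages if lba is not None):
            return  # nothing in this block is covered by the TRIM
        cache = list(self.pages)                                   # copy the block to cache
        cache = [None if lba in doomed else lba for lba in cache]  # wipe the deleted pages
        self.erase_count += 1                                      # erase the block...
        self.pages = cache                                         # ...and write it back


class ToyOS:
    """OS side: decide once whether to use TRIM, then send it on every delete."""

    def __init__(self, drive, reported_rpm):
        self.drive = drive
        self.is_ssd = reported_rpm == 0        # a reported speed of 0 means SSD
        self.defrag_enabled = not self.is_ssd  # defrag is turned off for SSDs
        self.trim_enabled = self.is_ssd

    def delete_file(self, lbas):
        # Tell the drive which LBAs the deleted file covered (hypothetical hook).
        if self.trim_enabled:
            self.drive.trim(lbas)


drive = FlashBlock(pages=[0, 1, 2, None, None])
os_model = ToyOS(drive, reported_rpm=0)
os_model.delete_file([0])
print(drive.pages, drive.erase_count)  # [None, 1, 2, None, None] 1
```

The point to notice is that the erase happens inside `delete_file`, so the cost is paid at delete time and the next write finds clean pages waiting.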

In our example from earlier, here’s what would happen if our OS and drive supported TRIM:

Our user saves his 4KB text file, which gets put in a new page on a fresh drive. No differences here.

Next was an 8KB JPEG. Two pages allocated; again, no differences.

The third step was deleting the original 4KB text file. Since our drive now supports TRIM, when this deletion request comes down the drive will actually read the entire block, remove the first LBA and write the new block back to the flash.


The TRIM command forces the block to be cleaned before our final write. There's additional overhead but it happens after a delete and not during a critical write.

Our drive is now at 40% capacity, just like the OS thinks it is. When our user goes to save his 12KB JPEG, the write goes at full speed. Problem solved. Well, sorta.

While the TRIM command will alleviate the problem, it won’t eliminate it. The TRIM command can’t be invoked when you’re simply overwriting a file, for example when you save changes to a document. In those situations you’ll still have to pay the performance penalty.
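To make the four-step example concrete, here is a rough Python replay of it with and without TRIM. It is only a sketch: the file names and the `run_example` function are invented for illustration, and the single 20KB block of five 4KB pages is an assumption inferred from 8KB of data being 40% of capacity. The overwrite case mentioned above is not modeled.

```python
PAGE_KB = 4
PAGES_PER_BLOCK = 5  # assumption: one 20KB block, so 8KB of data is 40% of capacity


def run_example(trim_supported):
    """Replay the four steps and record where the block-cleaning cost is paid."""
    pages = [None] * PAGES_PER_BLOCK  # None = clean page, otherwise the file's name
    stale = set()                     # files the OS deleted but the drive still holds
    log = []

    def clean_block():
        # Read-modify-write: keep live data, drop stale data, erase, write back.
        live = [p for p in pages if p is not None and p not in stale]
        pages[:] = live + [None] * (PAGES_PER_BLOCK - len(live))
        stale.clear()

    def write(name, size_kb):
        needed = size_kb // PAGE_KB
        if pages.count(None) < needed:
            clean_block()
            log.append((f"write {name}", "slow: block cleaned during the write"))
        else:
            log.append((f"write {name}", "fast: clean pages available"))
        # The toy ignores the drive-full case; the example never reaches it.
        for _ in range(needed):
            pages[pages.index(None)] = name

    def delete(name):
        stale.add(name)
        if trim_supported:
            clean_block()
            log.append((f"delete {name}", "block cleaned after the delete"))
        else:
            log.append((f"delete {name}", "drive not told"))

    write("text.txt", 4)     # step 1: 4KB text file
    write("photo1.jpg", 8)   # step 2: 8KB JPEG
    delete("text.txt")       # step 3: delete the text file
    write("photo2.jpg", 12)  # step 4: 12KB JPEG
    return log


for trim in (False, True):
    print("TRIM" if trim else "no TRIM", run_example(trim))
```

Without TRIM the final 12KB write is the step that pays for cleaning the block; with TRIM the same work is logged against the delete and the final write finds three clean pages.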

Every controller manufacturer I’ve talked to intends to support TRIM whenever there’s an OS that takes advantage of it. The big unknown is whether or not current drives will be firmware-upgradeable to support TRIM, as no manufacturer has a clear firmware upgrade strategy at this point.

I expect that whenever Windows 7 supports TRIM we’ll see a new generation of drives with support for the command. Whether or not existing drives will be upgraded remains to be seen, but I’d highly encourage it.

To the manufacturers making these drives: your customers buying them today at exorbitant prices deserve your utmost support. If it’s possible to enable TRIM on existing hardware, you owe it to them to offer the upgrade. Their gratitude would most likely be expressed by continuing to purchase SSDs and encouraging others to do so as well. Upset them, and you’ll simply be delaying the migration to solid state storage.


347 Comments

  • Basilisk - Wednesday, March 18, 2009

    I think your concerns parallel mine, albeit with different conclusions.

    Parag.1: I think you misunderstand the ERASE concept: as I read it, after an ERASE parts of the block are re-written and parts are left erased -- those latter parts NEED NOT be re-erased before they are written, later. If the TRIM function can be accomplished at an idle moment, access time will be "saved"; if the TRIM can erase (release) multiple clusters in one block [unlikely?], that will reduce both wear & time.

    Parag.2: This argument reverses the concept that OSes should largely be ignorant about device internals. As devices with different internal structures have proliferated over the years -- and will continue so with SSDs -- such OS differentiation is costly to support.

    Parag 3 and onwards: Herein lies the problem: we want to save wear by not re-writing files to make them contiguous, but we now have a situation where wear and erase times could be considerably reduced by having those files be contiguous. A 2MB file fragmented randomly in 4KB clusters will result in around 500 erase cycles when it's deleted; if stored contiguously, that would only require 4-5 erase cycles (of 512KB SSD-blocks)... a 100:1 reduction in erases/wear.

    It would be nice to get the SSD blocks down to 4KB in size, but I have to infer there are counter arguments or it would've been done already.

    With current SSDs, I'd explore using larger cluster sizes -- and here we have a clash with MS [big surprise]. IIRC, NTFS clusters cannot exceed 4KB [for something to do with file compression!]. That makes it possible that FAT32 with 32KB clusters [IIRC clusters must be less than 64KB for all system tools to properly function] might be the best choice for systems actively rewriting large files. I'm unfamiliar with FAT32 issues that argue against this, but if the SSDs allocate clusters contiguously, wouldn't this reduce erases by a factor of 8 for large file deletions? 32KB clusters might hamstring caching efficiency and result in more disk accesses, but it might speed up linear reads and s/w loads.

    The impact of very small file/directory usage and for small incremental file changes [like appending to logs] wouldn't be reduced -- it might be increased as data-transfer sizes would increase -- so the overall gain for having fewer clusters-per-SSD-block is hard to intuit, and it would vary in different environments.
  • GourdFreeMan - Wednesday, March 18, 2009

    RE Parag. 1: As I understand it, the entire 512 KiB block must always be erased if there is even a single page of valid data written to it... hence my concerns. You may save time reading and writing data if the device could know a block were partially full, but you still suffer the 2ms erase penalty. Please correct me if I am mistaken in my assumption.

    RE Parag. 2: The problem is the SSD itself only knows the physical map of empty and used space. It doesn't have any knowledge of the logical file system. NTFS, FAT32, ext3 -- it doesn't matter to the drive, that is the OS's responsibility.

    RE Parag. 3: I would hope that reducing the physical block size would also reduce the block erase time from 2ms, but I am not a flash engineer and so cannot comment. One thing I can state for certain, however, is that moving to smaller physical block sizes would not increase wear across the surface of the drive, except possibly for the necessity to keep track of a map of used blocks. Rewriting 128 blocks on a hypothetical SSD with 4 KiB blocks versus one 512 KiB block still erases 512 KiB of disk space (excepting the overhead in tracking which blocks are filled).

    Regarding using large filesystem clusters: 4 KiB clusters offer a nice tradeoff between filesystem size, performance and slack (lost space due to cluster size). If you wanted to make an SSD look artificially good versus a hard drive, a 512 KiB cluster size would do so admirably, but no one would use such a large cluster size except for a data drive used to store extremely large files (e.g. video) exclusively. BTW, in case you are unaware, you can format a non-OS partition with NTFS to cluster sizes other than 4 KiB. You can also force the OS to use a different cluster size by first formatting the drive for the OS as a data drive with a different cluster size under Windows and then installing Windows on that partition. I have a 2 KiB cluster size on a drive that has many hundreds of thousands of small files. However, I should note that since virtual memory pages are by default 4 KiB (another compelling reason for the 4 KiB default cluster size), most people don't have a use for other cluster sizes if they intend to have a page file on the drive.
  • ssj4Gogeta - Wednesday, March 18, 2009

    Thanks for the wonderful article. And yes, I read every single word. LOL
  • rudolphna - Wednesday, March 18, 2009

    Hey Anand, on page 3, the entries in the random read latency graph are mixed up. The WD Velociraptor is listed as having a .11ms latency; I think you might want to fix that. :)
  • SkullOne - Wednesday, March 18, 2009

    Fantastic article. Definitely one of the best I've read in a long time. Incredibly informative. Everyone who reads this article is a little bit smarter afterwards.

    All the great information about SSDs aside, I think the best part though is how OCZ is willing to take blame for failure earlier and fix the problems. Companies like that are the ones who will get my money in the future especially when it is time for me to move from HDD to SSD.
  • Apache2009 - Wednesday, March 18, 2009

    I got a Vertex SSD. Why does suspend cause the system to halt? My laptop has an nVidia chipset and it works fine with an HDD. Does anybody know?
  • MarcHFR - Wednesday, March 18, 2009

    Hi,

    You wrote that there is spare area on the X25-M:

    "Intel ships its X25-M with 80GB of MLC flash on it, but only 74.5GB is available to the user"

    It's a mistake. 80 GB of flash looks like 74.5GB to the user because 80,000,000,000 bytes of flash is 74.5 GB from the user's point of view (with 1 KB = 1024 bytes).

    You didn't point out the other problem of the X25-M: LBA "optimisation". After a lot of random write I/O, sequential write speed can drop to only 10 MB/s :/
  • Kary - Thursday, March 19, 2009

    The extra space would be invisible to the end user (it is used internally)

    Also, addressing is normally done in binary. As a result, actual sizes are typically binary in memory devices (flash, RAM...):
    64GB
    128GB

    80 GB is not compatible with binary addressing

    (though 48GB of a 128GB drive being used for this seems pretty high)
  • ssj4Gogeta - Wednesday, March 18, 2009

    Did you bother reading the article? He pointed out that you can get any SSD (NOT just Intel's) stuck in a situation where only a secure erase will help you out. The problem is not specific to Intel's SSD, and it doesn't occur during normal usage.
  • MarcHFR - Wednesday, March 18, 2009

    The problem I've pointed out has nothing to do with the performance degradation related to writing to a filled page; it's a performance degradation related to an LBA optimisation that is specific to Intel's SSD.
