Management Granularity

Much of Apple’s marketing around Fusion Drive talks about moving data at the file and application level, but in reality data is moved between the SSD and HDD portions in 128KB blocks.

Ars actually confirmed this a while ago, but I wanted to see for myself. Using fs_usage I got to see the inner workings of Apple's Fusion Drive. Data is moved between drives in 128KB blocks, likely determined by how frequently those blocks are accessed. Since client workloads tend to be fairly sequential (or pseudo-random at worst) in nature, it's a safe bet that if you're accessing a single LBA within a 128KB block, you're going to be accessing more LBAs in the same space. The migration process seems to happen mostly during idle periods, although I have seen some movement between drives during light IO.
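
To make the mechanism concrete, here is a minimal sketch, in Python, of how per-LBA accesses could be rolled up into 128KB-block "heat" counters and turned into promotion candidates during idle time. The names and the counter-based policy are my own illustration of what the fs_usage traces suggest, not anything from Apple's actual CoreStorage code.

```python
from collections import Counter

BLOCK_SIZE = 128 * 1024      # Fusion Drive migrates data in 128KB chunks
SECTOR_SIZE = 512            # LBAs address 512-byte sectors

heat = Counter()             # 128KB-block index -> access count

def record_access(lba: int) -> None:
    """Fold an LBA access into the heat counter for its 128KB block."""
    block = (lba * SECTOR_SIZE) // BLOCK_SIZE
    heat[block] += 1

def promotion_candidates(n: int) -> list:
    """At idle time, pick the n hottest blocks as candidates to copy to the SSD."""
    return [block for block, _ in heat.most_common(n)]

# A run of nearby LBAs (typical of client IO) lands in the same 128KB block,
# which is why touching one LBA is a good hint that its neighbours matter too.
for lba in range(256):       # 256 x 512B = one full 128KB block
    record_access(lba)
print(promotion_candidates(1))   # -> [0]
```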

What’s very interesting is just how quickly the migration is triggered after a transfer occurs. As soon as a file copy/creation, application launch or other IO activity completes, there’s immediate back and forth between the SSD and HDD. As you fill up the Fusion Drive, the amount of data moved between the SSD and HDD shrinks considerably. I suspect this is what should happen over time: infrequently accessed data settles on the hard drive and what really matters stays on the SSD. Apple being less aggressive about evicting data from the SSD as the Fusion Drive fills up makes sense.
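
The tapering behavior reads as if the migration budget is tied to how full the pool is. The sketch below is purely a hypothetical way to express such a policy; the function, its parameters and the linear taper are my guesses, not Apple's implementation.

```python
def migration_budget(used_bytes: int, capacity_bytes: int,
                     max_blocks: int = 4096) -> int:
    """Hypothetical policy: move lots of 128KB blocks while the pool is
    mostly empty, and taper off as it fills so settled data stays put."""
    free_fraction = 1.0 - used_bytes / capacity_bytes
    return int(max_blocks * free_fraction)

TB = 10**12
# Nearly empty 1.1TB Fusion Drive: a large per-idle-period budget...
print(migration_budget(used_bytes=int(0.1 * TB), capacity_bytes=int(1.1 * TB)))
# ...nearly full: only a trickle of blocks still moves between tiers.
print(migration_budget(used_bytes=int(1.0 * TB), capacity_bytes=int(1.1 * TB)))
```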

The migration process itself is pretty simple: data is marked for promotion/demotion, physically copied to the new storage device, and only then is the original removed. In the event of a power failure during migration there shouldn't be any data loss caused by the Fusion Drive; it looks like the source block is removed only after two copies of the 128KB block are in place. Apple told me as much last year, but it's good to see it for myself.
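
The ordering is what provides the crash safety: the new copy has to be durable before the old one is released. A minimal sketch of that copy-then-free sequence follows; the helper name and file-based layout are placeholders for illustration, not the actual CoreStorage implementation.

```python
import os

BLOCK_SIZE = 128 * 1024

def free_source_block(path: str, offset: int) -> None:
    """Placeholder: a real tiering layer would update (journaled) block-mapping
    metadata here so the old location can be reused."""
    pass

def migrate_block(src_path: str, dst_path: str, offset: int) -> None:
    """Copy one 128KB block to the destination device, flush it to stable
    storage, and only then release the source copy. A power failure before
    the final step leaves two valid copies rather than zero."""
    with open(src_path, "rb") as src:
        src.seek(offset)
        data = src.read(BLOCK_SIZE)

    with open(dst_path, "r+b") as dst:
        dst.seek(offset)
        dst.write(data)
        dst.flush()
        os.fsync(dst.fileno())        # second copy is now on stable media

    free_source_block(src_path, offset)   # only now is the original removed
```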

By moving data in 128KB blocks between the HDD and SSD, Apple enjoys the side benefit of partially defragmenting the SSD with all writes to it. Even though the Fusion Drive will prefer the SSD for all incoming writes (which can include writes smaller than 128KB, potentially random/pseudo-random in nature), any migration from the HDD to the SSD happens as large-block sequential writes, which will trigger a garbage collection/block recycling routine on a heavily fragmented drive. Performance of the SSD can definitely degrade over time, but this helps keep it higher than it otherwise would be, given that the SSD is almost always running at full capacity and is the recipient of all sorts of unrelated writes. As I mentioned earlier, I would’ve preferred a controller with more consistent IO latency, or for Apple to set aside even more of the PM830’s NAND as spare area. I suspect cost was the deciding factor in sticking with the standard amount of overprovisioning.
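
For a sense of what the standard amount of overprovisioning means here: in the usual configuration the spare area is just the gap between binary and decimal capacity. A quick back-of-the-envelope calculation, assuming 128GiB of raw NAND behind a 128GB user capacity (typical for drives like the PM830); the 120GB figure below is simply a hypothetical larger set-aside.

```python
GiB = 2**30
GB = 10**9

raw_nand = 128 * GiB          # physical NAND on a typical 128GB SSD
user_standard = 128 * GB      # standard user-visible capacity
user_reduced = 120 * GB       # hypothetical: give up 8GB for extra spare area

def spare_pct(raw: int, user: int) -> float:
    return 100.0 * (raw - user) / raw

print(f"standard OP: {spare_pct(raw_nand, user_standard):.1f}%")  # ~6.9%
print(f"extra OP:    {spare_pct(raw_nand, user_reduced):.1f}%")   # ~12.7%
```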

127 Comments

  • tipoo - Friday, January 18, 2013 - link

    To your last point Name99, indeed they will.
  • name99 - Friday, January 18, 2013 - link

    As compared to all those other tablets out there with 128 and 256GB of storage? Like uuh, huh, wait, the names will come to me...

    When EVERYONE is doing things a certain way, not just Apple, it may be worth asking if there are other issues going on here (limited manufacturing capacity and exploding demand, for one) rather than immediately assuming Apple is out to screw you.
  • Death666Angel - Friday, January 18, 2013 - link

    Tons of Archos stuff, Samsung XE700, Gigabyte and Dell tablets etc. have >120GB storage.
  • name99 - Friday, January 18, 2013 - link

    So in other words the tablets that are trying to be laptop replacements, and that have to cope with the massive footprint of Windows 8.

    You may consider this to be proof against my point; I don't.
  • Hrel - Friday, January 18, 2013 - link

    "You can create Boot Camp or other additional partitions on a Fusion Drive, however these partitions will reside on the HDD portion exclusively."

    So you CAN create a Boot Camp partition on a Fusion Drive, it just won't utilize the SSD portion of that fusion drive at all. Or am I not understanding you?
  • Hrel - Friday, January 18, 2013 - link

    *facepalm, I read "you can't create..." nm me... whistle whistle whistle
  • Shadowmaster625 - Friday, January 18, 2013 - link

    May as well take that $400 to downtown Detroit...

    Seriously though, why in blazes are HDD manufacturers having such a hard time with this? How hard is it just to throw 4GB of SLC onto the little circuit board of a 1TB HDD? Yes, all you need is 4GB. The controller simply needs to perform a very simple algorithm: if the file you are writing is greater than 4MB in size, write directly to the HDD. It is a large sequential write and thus HDD performance will be adequate. If it's a small write (< 4MB), write that to the SLC cache. That one tiny little optimization will get you 90% of the performance of a Vertex 4 (depending on the bandwidth of this 4GB of SLC, of course). But really it doesn't need to be as fast as a Vertex 4. It just needs to be in that ballpark, for small random I/O. Large sequential I/O can just skip the NAND altogether.
  • Ben90 - Friday, January 18, 2013 - link

    Lol, stupid. System32 and SysWOW64 would fill your NAND on installation.
  • Hrel - Friday, January 18, 2013 - link

    Those entire folders wouldn't go on the NAND, they'd go on the HDD. Read the article on here about the MomentusXT from Seagate.
  • Hrel - Friday, January 18, 2013 - link

    found it for you http://www.anandtech.com/show/5160/seagate-2nd-gen...
