Live Long and Prosper: The Logical Page

Computers are all about abstraction. In the early days of computing, you had to write assembly code to get your hardware to do anything. Programming languages like C and C++ created a layer of abstraction between the programmer and the hardware, simplifying the development process. The key word there is simplification. You can be more efficient writing directly for the hardware, but it's far simpler (and much more manageable) to write high-level code and let a compiler optimize it.

The same principles apply within SSDs.

The smallest writable unit in NAND flash is a page, but that doesn't mean it's the only size a controller can choose to write in. Today I'd like to introduce the concept of a logical page: an abstraction of a physical page in NAND flash.

Confused? Let's start with a (hopefully) helpful diagram; I'm no artist:

On one side of the fence we have how software views storage: as a long list of logical block addresses (LBAs). It's a bit more complicated than that, since a traditional hard drive is faster at certain LBAs than others, but to keep things simple we'll ignore that.

On the other side we have how NAND flash stores data, in groups of cells called pages. These days a 4KB page size is common.

In reality there's no fence separating the two, but rather a lot of logic, several buses, and eventually the SSD controller. The latter determines how the LBAs map to the NAND flash pages.

The most straightforward way for the controller to write to flash is by writing in pages. In that case the logical page size would equal the physical page size.
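
To make the idea concrete, here's a minimal sketch in C of a page-level mapping table. The flat array, the 4-byte entries, the 512-byte-sector assumption, and the names (l2p, lookup_physical_page) are my own illustration of the concept, not any particular controller's firmware:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   4096ULL           /* 4KB physical NAND page */
#define DRIVE_BYTES (80ULL << 30)     /* an 80GB drive's worth of NAND */
#define NUM_PAGES   (DRIVE_BYTES / PAGE_SIZE)   /* 20,971,520 pages */

/* One entry per logical page, recording which physical NAND page
 * currently holds its data. At 4 bytes per entry this table alone
 * is ~80MB -- the tracking overhead discussed below. */
static uint32_t l2p[NUM_PAGES];

/* Translate a 512-byte-sector LBA into the physical page holding it. */
static uint32_t lookup_physical_page(uint64_t lba)
{
    uint64_t logical_page = (lba * 512) / PAGE_SIZE;
    return l2p[logical_page];
}

int main(void)
{
    l2p[0] = 42;   /* pretend the data for LBA 0 lives in physical page 42 */
    printf("LBA 0 -> physical page %u\n", lookup_physical_page(0));
    return 0;
}
```

A real controller also has to keep this table consistent across power loss and remap entries on every write, which is part of why page-level mapping demands a beefier controller.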

Unfortunately, there's a huge downside to this approach: tracking overhead. If your logical page size is 4KB, then an 80GB drive has no fewer than twenty million logical pages to keep track of (20,971,520 to be exact). You need a fast controller to sort through and deal with that many pages, a lot of storage to hold the mapping tables, and larger caches/buffers.

The benefit of this approach, however, is very high 4KB write performance. If the majority of your writes are 4KB in size, this approach will yield the best performance.

If you don't have the expertise, time, or support structure to make a big honkin' controller that can handle page-level mapping, you go to a larger logical page size. One such example would be making your logical page equal to an erase block (128 x 4KB pages = 512KB). This significantly reduces the number of pages you need to track and optimize around: instead of 20,971,520 entries, you now have only 163,840. All of your controller's internal structures shrink in size, and you don't need as powerful a microprocessor inside the controller.
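
The arithmetic behind those two entry counts is easy to check yourself. A quick sketch (the 4-bytes-per-entry figure is an assumption for illustration, not a number from the article):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t drive_bytes = 80ULL << 30;  /* 80GB drive */
    const uint64_t page_size   = 4096;         /* 4KB physical page */
    const uint64_t block_pages = 128;          /* pages per erase block */

    uint64_t page_entries  = drive_bytes / page_size;                 /* 20,971,520 */
    uint64_t block_entries = drive_bytes / (page_size * block_pages); /* 163,840 */

    printf("page-level mapping:  %llu entries (~%llu MB of table)\n",
           (unsigned long long)page_entries,
           (unsigned long long)(page_entries * 4 >> 20));
    printf("block-level mapping: %llu entries (~%llu KB of table)\n",
           (unsigned long long)block_entries,
           (unsigned long long)(block_entries * 4 >> 10));
    return 0;
}
```

At 4 bytes per entry that works out to roughly 80MB of table versus 640KB, which is roughly the difference between needing external DRAM and fitting the whole table in on-chip memory.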

The benefit of this approach is very high sequential write performance with large files. If you're streaming large chunks of data, big logical pages are optimal. You'll find that most flash controllers that come from the digital camera space are optimized for this sort of access pattern, where you're writing 2MB to 12MB images all the time.

Unfortunately, the sequential write performance comes at the expense of poor small-file write speed. Remember that writing to MLC NAND flash already takes 3x as long as reading; writing small files when your controller only knows how to write large ones makes the penalty even worse. If you want to write an 8KB file, the controller will need to write 512KB (in this case), since that's the smallest unit it knows how to write. Write amplification goes up considerably.
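
For that example the penalty works out to a factor of 64. A trivial check, using the sizes from the text:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t logical_page_bytes = 128 * 4096; /* 512KB logical page */
    const uint64_t host_write_bytes   = 8 * 1024;   /* the 8KB file */

    /* The controller can only program whole logical pages, so an 8KB
     * host write turns into a full 512KB write to NAND. */
    printf("write amplification: %llux\n",
           (unsigned long long)(logical_page_bytes / host_write_bytes));
    return 0;
}
```

That 64x is before the drive has done any garbage collection at all; real-world write amplification layers on top of it.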

Remember the first OCZ Vertex drive based on the Indilinx Barefoot controller? Its logical page size was equal to a 512KB block. OCZ asked for a firmware that enabled page-level mapping, and Indilinx responded. The result was much improved 4KB write performance:

Iometer 4KB Random Writes (IOQueue=1, 8GB sector space) | Logical Page Size = 128 Pages | Logical Page Size = 1 Page
Pre-Release OCZ Vertex | 0.08 MB/s | 8.2 MB/s


295 Comments


  • zodiacfml - Wednesday, September 2, 2009 - link

    Very informative, answered more than anything in my mind. Hope to see this again in the future with these drive capacities around $100.
  • mgrmgr - Wednesday, September 2, 2009 - link

    Any idea whether the (mid-Sept release?) OCZ Colossus's internal RAID setup will handle the problem of RAID controllers not being able to pass Windows 7's TRIM command to the SSD array? I'm intent on getting a new Photoshop machine with two SSDs in RAID-0 as soon as Win7 releases, but the word here and elsewhere so far is that RAID will block the TRIM function.
  • kunedog - Wednesday, September 2, 2009 - link

    All the Gen2 X-25M 80GB drives are apparently gone from Newegg... so they've marked up the Gen1 drives to $360 (from $230):
    http://www.newegg.com/Product/Product.aspx?Item=N8...

    Unbelievable.
  • gfody - Wednesday, September 2, 2009 - link

    What happened to the Gen2 160GB on Newegg? For a month the ETA was 9/2 (today), and now it's as if they never had it in the first place. The product page has been removed.

    It's like Newegg is holding the Gen2 drives hostage until we buy out their remaining stock of Gen1 drives.
  • iwodo - Tuesday, September 1, 2009 - link

    I think it acts as a good summary. However, someone wrote last time about the Intel drive handling random reads/writes extremely poorly during sequential reads/writes.

    Has Anand investigated this yet?

    I am hoping the next-gen Intel SSD coming in Q2 '10 will bring some substantial improvement.
  • statik213 - Tuesday, September 1, 2009 - link

    Does the RAID controller propagate TRIM commands to the SSD? Or will having RAID negate TRIM?
  • justaviking - Tuesday, September 1, 2009 - link

    Another great article, Anand! Thanks, and keep them coming.

    If this has already been discussed, I apologize. I'm still exhausted from reading the wonderful article, and have not read all 17 pages of comments.

    On PAGE 3, it talks about the trade-off of larger vs. smaller pages.

    I wonder if it would be feasible to make a hybrid drive, with a portion of the drive using small pages for faster performance when writing small files, and the majority of it being larger pages to keep the management of the drive reasonable.

    Any file could be written anywhere, but the controller would bias small writes toward the small pages and large writes toward the large pages.

    Externally it would appear as a single drive, of course, but deep down in the internals, it would essentially be two drives. Each of the two portions would be tuned for maximum performance in different areas, but able to serve as backup or overflow if the other portion became full or ever got written to too many times.

    Interesting concept? Or a hare-brained idea by an ignorant amateur?
  • CList - Tuesday, September 1, 2009 - link

    Great article, wonderful to see insightful, in depth analysis.

    I'd be curious to hear anyone's thoughts on the implications of running virtual hard disk files on SSDs. I do a lot of work these days on virtual machines, and I'd love to get them feeling snappier - especially on my laptop, which is limited to 4GB of RAM.

    For example;
    What would the constant updates of those vmdk (or "vhd") files do to the disk's lifespan?

    If the OS hosting the VM is Windows 7, but the virtual machine is WinServer2003, will the TRIM command be used properly?

    Cheers,
    CList
  • pcfxer - Tuesday, September 1, 2009 - link

    Great article!

    "It seems that building Pidgin is more CPU than IO bound.."

    Obviously, Mr. Anand doesn't understand how compilers work ;). Compilers will always be CPU- and memory-bound; reduce the memory in your computer to, say, 256MB (or lower) and you'll see what I mean. The levels of recursion necessary to follow the productions (the grammar rules that define the language) use up memory, but would rarely touch the drive unless the OS had terrible resource management. :0
  • CMGuy - Wednesday, September 2, 2009 - link

    While I can't comment on the specifics of software compilers, I know that faster disk IO makes a big difference when you're performing a full build (compilation and packaging) of software.
    IDEs these days spend a lot of their time reading/writing small files (that's a lot of small, random disk IO), and a good SSD can make a huge difference here.
