Live Long and Prosper: The Logical Page

Computers are all about abstraction. In the early days of computing you had to write assembly code to get your hardware to do anything. Programming languages like C and C++ created a layer of abstraction between the programmer and the hardware, simplifying the development process. The key word there is simplification. You can be more efficient writing directly for the hardware, but it’s far simpler (and much more manageable) to write high level code and let a compiler optimize it.

The same principles apply within SSDs.

The smallest writable location in NAND flash is a page; that doesn’t mean that it’s the largest size a controller can choose to write. Today I’d like to introduce the concept of a logical page, an abstraction of a physical page in NAND flash.

Confused? Let’s start with a (hopefully, I'm no artist) helpful diagram:

On one side of the fence we have how the software views storage: as a long list of logical block addresses. It’s a bit more complicated than that since a traditional hard drive is faster at certain LBAs than others but to keep things simple we’ll ignore that.

On the other side we have how NAND flash stores data, in groups of cells called pages. These days a 4KB page size is common.

In reality there’s no fence that separates the two, rather a lot of logic, several busses and eventually the SSD controller. The latter determines how the LBAs map to the NAND flash pages.

The most straightforward way for the controller to write to flash is by writing in pages. In that case the logical page size would equal the physical page size.

Unfortunately, there’s a huge downside to this approach: tracking overhead. If your logical page size is 4KB then an 80GB drive will have no less than twenty million logical pages to keep track of (20,971,520 to be exact). You need a fast controller to sort through and deal with that many pages, a lot of storage to keep tables in and larger caches/buffers.

The benefit of this approach however is very high 4KB write performance. If the majority of your writes are 4KB in size, this approach will yield the best performance.

If you don’t have the expertise, time or support structure to make a big honkin controller that can handle page level mapping, you go to a larger logical page size. One such example would involve making your logical page equal to an erase block (128 x 4KB pages). This significantly reduces the number of pages you need to track and optimize around; instead of 20.9 million entries, you now have approximately 163 thousand. All of your controller’s internal structures shrink in size and you don’t need as powerful of a microprocessor inside the controller.

The benefit of this approach is very high large file sequential write performance. If you’re streaming large chunks of data, having big logical pages will be optimal. You’ll find that most flash controllers that come from the digital camera space are optimized for this sort of access pattern where you’re writing 2MB - 12MB images all the time.

Unfortunately, the sequential write performance comes at the expense of poor small file write speed. Remember that writing to MLC NAND flash already takes 3x as long as reading, but writing small files when your controller needs large ones worsens the penalty. If you want to write an 8KB file, the controller will need to write 512KB (in this case) of data since that’s the smallest size it knows to write. Write amplification goes up considerably.

Remember the first OCZ Vertex drive based on the Indilinx Barefoot controller? Its logical page size was equal to a 512KB block. OCZ asked for a firmware that enabled page level mapping and Indilinx responded. The result was much improved 4KB write performance:

Iometer 4KB Random Writes, IOqueue=1, 8GB sector space Logical Block Size = 128 pages Logical Block Size = 1 Page
Pre-Release OCZ Vertex 0.08 MB/s 8.2 MB/s

A Quick Flash Refresher The Cleaning Lady and Write Amplification
Comments Locked

295 Comments

View All Comments

  • GourdFreeMan - Tuesday, September 1, 2009 - link

    Yes, rewriting a cell will refill the floating gate with trapped electrons to the proper voltage level unless the gate has begun to wear out, so backing up your data, secure erasing your drive and copying the data back will preserve the life (within reason) of even drives that use minimalistic wear leveling to safeguard data. Charge retention is only a problem for users if they intend to use the drive for archival storage, or operate the drive at highly elevated temperatures.

    It is a bigger problem for flash engineers, however, and one of the reasons why MLC cannot be moved easily to more bits per cell without design changes. To store n-bits in a single cell you need 2^n separate energy levels to represent them, and thus each bit is only has approximately 1/(2^(n-1)) the amount of energy difference between states when compared to SLC using similar designs and materials.
  • Zheos - Tuesday, September 1, 2009 - link

    Man you seem to know a lot about what you're talking about :)

    Yeah now i understand why SSD for database and file storage server would be quite a bad idea.

    But for personal windows & everyday application storage, seems like a pure win to me if you can afford one :)

    I was only worried about its life-span but thankx to you and you're quick replys (and for the maths and technical stuff about how it realy work ;) im sold on the fact that i will buy one soon.

    The G2 from Intel seems like the best choice for now but I'll just wait and see how it's going when TRIM will become almost enable on every SSD and i'll make my decision there in a couple of months =)


  • GourdFreeMan - Wednesday, September 2, 2009 - link

    It isn't so much that SSDs make a bad storage server, but rather that you can't neglect to make periodic backups, as with any type of storage, if your data has great monetary or sentimental value. In addition to backups, RAID (1-6) is also an option if cost is no object and you want to use SSDs for long term storage in a running server. Database servers are a little more complicated, but SSDs can be an intelligent choice there as well if your usage patterns aren't continuous heavy small (i.e. <= 4K) writes.

    I plan on getting a G2 myself for my laptop after Intel updates the firmware to support TRIM and Anand reviews the effects in Windows 7, and I have already been using an Indilinx-based SLC drive in my home server.

    If you do anything that stresses your hard drive(s), or just like snappy boot times and application load times you will probably be impressed by the speeds of a new SSD. The cost per GB and lack of long term reliability studies are really the only things holding them back from taking the storage market by storm now.
  • ninevoltz - Thursday, September 17, 2009 - link

    GourdFreeMan could you please continue your explanation? I would like to learn more. You have really dived deeply into the physical properties of these drives.
  • GourdFreeMan - Tuesday, September 1, 2009 - link

    Minor correction to the second paragraph in my post above -- "each bit is only has" should read "each representation only has" in the last sentence.
  • philosofool - Monday, August 31, 2009 - link

    Nice job. This has been a great series.

    I'm getting a SSD once I can get one at $1/GB. I want a system/program files drive of at least 80GB and then a conventional HDD (a tenth of the cost/GB) for user data.

    Would keeping user data on a conventional HDD affect these results? It would seem like it wouldn't, but I would like to see the evidence.

    I would really like to see more benchmarks for these drives that aren't synthetic. Have you tried things like Crysis or The Witcher load times? (Both seemed to me to have pretty slow loads for maps.) I don't know if these would be affected, but as real world applications, I think it makes sense to try them out.
  • Anand Lal Shimpi - Monday, August 31, 2009 - link

    Personally I keep docs on my SSD but I keep pictures/music on a hard drive. Neither gets touched all that often in the grand scheme of things, but one is a lot smaller :)

    In The SSD Anthology I looked at Crysis load times. Performance didn't really improve when going to an SSD.

    Take care,
    Anand
  • Eeqmcsq - Monday, August 31, 2009 - link

    I would have thought that the read speed of an SSD would have helped cut down some of the compile time. Is there any tool that lets you analyze disk usage vs cpu usage during the compile time, to see what percentage of the compile was spent reading/writing to disk vs CPU processing?

    Is there any way you can add a temperature test between an HDD and an SSD? I read a couple of Newegg reviews that say their SSDs got HOT after use, though I think that may have just been 1 particular brand that I don't remember. Also, there was at least one article online that tested an SSD vs an HDD and the SSD ran a little warmer than the HDD.

    Also, garbage collection does have one advantage: It's OS independent. I'm still using Ubuntu 8.04 at work, and I'm stuck on 8.04 because my development environment WORKS, and I won't risk upgrading and destabilizing it. A garbage collecting SSD would certainly be helpful for my system... though your compiling tests are now swaying me against an SSD upgrade. Doh!

    And just for fun, have you thought about running some of your benchmarks on a RAM drive? I'd like to see how far SSDs and SATA have to go before matching the speed of RAM.

    Finally, any word from JMicron and their supposed update to the much "loved" JMF602 controller? I'd like to see some non-stuttering cheapo SSDs enter the market and really bring the $$$/GB down, like the Kingston V-series. Also, I'd like to see a refresh in the PATA SSD market.

    "Am I relieved to be done with this article? You betcha." And I give you a great THANK YOU!!! for spending the time working on it. As usual, it was a great read.
  • Per Hansson - Monday, August 31, 2009 - link

    Photofast have released Indilinx based PATA drives;
    http://www.photofastuk.com/engine/shop/category/G-...">http://www.photofastuk.com/engine/shop/category/G-...
  • aggressor - Monday, August 31, 2009 - link

    What ever happened to the price drops that OCZ announced when the Intel G2 drives came out? I want 128GB for $280!

Log in

Don't have an account? Sign up now