Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser, e.g. it doesn't care if the <file_data> items are
swapped around (though it does care that directory entries (inodes) in
a given directory are contiguous, as this is used by readdir).

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever does swabbing.  (See section `Block Size' below.)

<filesystem>:
        <superblock>
        <directory_structure>
        <data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
        For each file:
                struct cramfs_inode (see cramfs_fs.h).
                Filename.  Not generally null-terminated, but it is
                 null-padded to a multiple of 4 bytes.
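
The padding rule means a name whose length is already a multiple of 4
carries no NUL at all, which is why it is "not generally null-terminated".
A minimal sketch of the arithmetic, assuming the 12-byte inode size given
under `Inode Size' below; the macro and function names are hypothetical
and do not come from cramfs_fs.h:

```c
#include <stddef.h>

/* On-disk size of struct cramfs_inode (12 bytes per these notes). */
#define CRAMFS_INODE_BYTES 12u

/* Round a filename length up to the next multiple of 4 (NUL padding). */
static size_t cramfs_padded_name_len(size_t name_len)
{
    return (name_len + 3u) & ~(size_t)3u;
}

/* Bytes one directory entry occupies: the inode, then the padded name. */
static size_t cramfs_dirent_bytes(size_t name_len)
{
    return CRAMFS_INODE_BYTES + cramfs_padded_name_len(name_len);
}
```

For example, a 5-byte name occupies 8 bytes, so its entry takes 20 bytes.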

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.
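
The width-first order can be sketched as a recursive walk that first emits
every entry of a directory and only then descends into its subdirectories.
The in-memory `struct node' tree and the string output are hypothetical,
chosen only to make the order visible; this is not mkcramfs's code, and it
assumes each directory's children are already in the desired order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory tree node used only for this illustration. */
struct node {
    const char *name;
    int is_dir;
    struct node *const *children;  /* array of nchildren pointers */
    size_t nchildren;
};

/* Append "name " to out; out must be NUL-terminated and large enough. */
static void emit(char *out, const char *name)
{
    strcat(out, name);
    strcat(out, " ");
}

/* Width-first: list all entries of dir, then recurse into subdirectories. */
static void width_first(const struct node *dir, char *out)
{
    for (size_t i = 0; i < dir->nchildren; i++)
        emit(out, dir->children[i]->name);
    for (size_t i = 0; i < dir->nchildren; i++)
        if (dir->children[i]->is_dir)
            width_first(dir->children[i], out);
}
```

For a root holding directory `a' (containing `x' and `y'), file `b' and
directory `c' (containing `z'), this yields "a b c x y z", whereas a plain
depth-first walk would give "a x y b c z".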

Beginning in 2.4.7, directory entries are sorted.  This optimization
allows cramfs_lookup to return more quickly when a filename does not
exist, speeds up user-space directory sorts, etc.
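
Sorting lets a lookup stop scanning as soon as it passes the position
where the wanted name would sort.  A minimal sketch over an array of C
strings, assuming byte-wise (strcmp) ordering; the function is hypothetical
and is not the kernel's cramfs_lookup:

```c
#include <stddef.h>
#include <string.h>

/* Scan directory entry names sorted in ascending order.  Returns the index
 * of `want', or -1 if absent; *compared receives how many entries were
 * examined before the scan ended. */
static long sorted_dir_lookup(const char *const *names, size_t n,
                              const char *want, size_t *compared)
{
    *compared = 0;
    for (size_t i = 0; i < n; i++) {
        int cmp = strcmp(names[i], want);
        (*compared)++;
        if (cmp == 0)
            return (long)i;
        if (cmp > 0)        /* passed the slot where `want' would sort */
            break;
    }
    return -1;
}
```

Looking up a missing "dev" in {"bin", "etc", "lib", "usr"} stops after two
comparisons instead of scanning all four entries.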

<data>:
        One <file_data> for each file that's either a symlink or a
         regular file of non-zero st_size.

<file_data>:
        nblocks * <block_pointer>
         (where nblocks = (st_size - 1) / blksize + 1)
        nblocks * <block>
        padding to multiple of 4 bytes
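
The nblocks formula is integer ceiling division of st_size by blksize,
valid because empty files have no <file_data>.  A sketch of the resulting
<file_data> size, assuming each <file_data> starts on a 4-byte boundary;
the function names are hypothetical:

```c
#include <stdint.h>

/* Number of blocks for a file of st_size bytes (st_size must be > 0). */
static uint32_t cramfs_nblocks(uint32_t st_size, uint32_t blksize)
{
    return (st_size - 1) / blksize + 1;
}

/* Total bytes of one <file_data>, given the stored size of each block. */
static uint32_t cramfs_file_data_bytes(const uint32_t *block_sizes,
                                       uint32_t nblocks)
{
    uint32_t total = nblocks * 4u;       /* 32-bit <block_pointer>s */
    for (uint32_t i = 0; i < nblocks; i++)
        total += block_sizes[i];         /* blocks are only byte-aligned */
    return (total + 3u) & ~3u;           /* pad to a multiple of 4 bytes */
}
```

A 4097-byte file with blksize 4096 needs two blocks; if they compress to
100 and 37 bytes, the <file_data> is 8 + 137 = 145 bytes, padded to 148.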

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.
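
Because each pointer marks an end, the start of block i is the previous
pointer, except for block 0, which starts right after the pointer array.
A sketch assuming flag-free pointers and offsets measured from the start of
the image; `file_data_off' (where the file's pointer array begins) and the
other names are hypothetical:

```c
#include <stdint.h>

/* Byte range [start, end) of one <block> within the image. */
struct blk_extent {
    uint32_t start;
    uint32_t end;
};

/* ptrs[i] is the offset one past the end of block i. */
static struct blk_extent cramfs_block_extent(const uint32_t *ptrs,
                                             uint32_t nblocks,
                                             uint32_t file_data_off,
                                             uint32_t i)
{
    struct blk_extent e;
    e.start = (i == 0) ? file_data_off + 4u * nblocks : ptrs[i - 1];
    e.end = ptrs[i];
    return e;
}
```

With the pointer array at offset 1000 holding {1108, 1145}, block 0 spans
[1008, 1108) and block 1 spans [1108, 1145).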

When the CRAMFS_FLAG_EXT_BLOCK_POINTERS capability bit is set, each
<block_pointer>'s top bits may contain special flags as follows:

CRAMFS_BLK_FLAG_UNCOMPRESSED (bit 31):
        The block data is not compressed and should be copied verbatim.

CRAMFS_BLK_FLAG_DIRECT_PTR (bit 30):
        The <block_pointer> stores the block's start offset, shifted
        right by 2 bits, rather than its end offset. The block must
        therefore be aligned to a 4-byte boundary. If
        CRAMFS_BLK_FLAG_UNCOMPRESSED is also set, the block size is
        blksize; otherwise the compressed data length is stored in the
        first 2 bytes of the block data. This allows a discontiguous
        data layout and specific data block alignments, e.g. for XIP
        applications.
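
Decoding a pointer means masking off the two flag bits and, for a direct
pointer, shifting the remaining value left by 2 to recover the start
offset.  A sketch assuming the bit numbers above; the struct, the function
and its `ext_ptrs' parameter are hypothetical:

```c
#include <stdint.h>

#define CRAMFS_BLK_FLAG_UNCOMPRESSED (UINT32_C(1) << 31)
#define CRAMFS_BLK_FLAG_DIRECT_PTR   (UINT32_C(1) << 30)
#define CRAMFS_BLK_FLAGS \
    (CRAMFS_BLK_FLAG_UNCOMPRESSED | CRAMFS_BLK_FLAG_DIRECT_PTR)

/* Hypothetical decoded form of one <block_pointer>. */
struct blkptr_info {
    int uncompressed;   /* block is stored verbatim */
    int direct;         /* offset is the block start, not its end */
    uint32_t offset;    /* byte offset in the image */
};

/* ext_ptrs: nonzero if CRAMFS_FLAG_EXT_BLOCK_POINTERS is set. */
static struct blkptr_info cramfs_decode_blkptr(uint32_t raw, int ext_ptrs)
{
    struct blkptr_info info = { 0, 0, raw };
    if (!ext_ptrs)
        return info;                  /* plain end-of-block offset */
    info.uncompressed = (raw & CRAMFS_BLK_FLAG_UNCOMPRESSED) != 0;
    info.direct = (raw & CRAMFS_BLK_FLAG_DIRECT_PTR) != 0;
    info.offset = raw & ~CRAMFS_BLK_FLAGS;
    if (info.direct)
        info.offset <<= 2;            /* stored shifted right by 2 bits */
    return info;
}
```

A direct pointer with stored value 0x100 therefore refers to a block
starting at byte 0x400, which is 4-byte aligned by construction.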


The order of <file_data> items is a depth-first descent of the directory
tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
-print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data if the
corresponding CRAMFS_BLK_FLAG_UNCOMPRESSED <block_pointer> bit is not
set; otherwise it is the input data directly.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)

<block>s are merely byte-aligned, not generally u32-aligned.

When CRAMFS_BLK_FLAG_DIRECT_PTR is specified, the corresponding
<block> may be located anywhere and is not necessarily contiguous with
the previous/next blocks. In that case it is minimally u32-aligned.
If CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified, the size is always
blksize except for the last block, which is limited by the file length.
If CRAMFS_BLK_FLAG_DIRECT_PTR is set and CRAMFS_BLK_FLAG_UNCOMPRESSED
is not set, then the first 2 bytes of the block contain the size of the
remaining block data, as this cannot be determined from the placement of
logically adjacent blocks.
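
The two size rules for direct blocks can be sketched as follows, assuming
the 2-byte length is host-endian like the rest of the metadata; both
function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Uncompressed direct block i: blksize bytes, except the last block,
 * which stops at the end of the file. */
static uint32_t direct_uncompressed_len(uint32_t file_size, uint32_t blksize,
                                        uint32_t i)
{
    uint32_t remaining = file_size - i * blksize;
    return remaining < blksize ? remaining : blksize;
}

/* Compressed direct block: the first 2 bytes hold the length of the
 * compressed data that follows; *data_off is set to where it begins. */
static uint16_t direct_compressed_len(const uint8_t *image,
                                      uint32_t block_start,
                                      uint32_t *data_off)
{
    uint16_t len;
    memcpy(&len, image + block_start, sizeof len);  /* no unaligned load */
    *data_off = block_start + 2u;
    return len;
}
```

For a 10000-byte file with blksize 4096, the third (last) uncompressed
block holds 10000 - 8192 = 1808 bytes.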


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for and create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.  Run mkcramfs
with -z if you want it to create files that can have holes in them.
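
The test that hole creation needs is simply whether a blksize chunk of the
input is all NUL bytes.  A minimal sketch of that check only; the function
is hypothetical, and how mkcramfs then encodes the hole is not described
here:

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the chunk consists entirely of NUL bytes (a hole candidate). */
static int is_hole_chunk(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```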


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_SIZE for cramfs_read_folio's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_SIZE may grow in future (if I interpret the comment
correctly).

Currently, mkcramfs #define's PAGE_SIZE as 4096 and uses that
for blksize, whereas Linux 2.3.39 uses the kernel's own PAGE_SIZE,
which can be as large as 32KB on arm.
This discrepancy is a bug, though it's not clear which should be
changed.

One option is to change mkcramfs to take its PAGE_SIZE from
<asm/page.h>.  Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_SIZE (4096)' to `#include <asm/page.h>'.  The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness.  The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'.  Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs.  We don't need to swab the compressed
data, only the superblock, inodes and block pointers.
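
Under the `always use little-endian' proposal, the user-space side would
need the equivalent of le32_to_cpu and its inverse.  A portable sketch that
assembles bytes explicitly, so it behaves the same on any host byte order;
the helper names are hypothetical and this is not current cramfs behaviour,
which is host-endian:

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from an on-disk buffer. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value to an on-disk buffer in little-endian order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

Only fixed-size metadata fields (superblock, inodes, block pointers) would
pass through helpers like these; compressed data is a byte stream and needs
no conversion.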


The other part of making cramfs more sharable is choosing a block
size.  The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_SIZE: just make cramfs_read_folio read multiple blocks.
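
When blksize is smaller than PAGE_SIZE, each page is filled by a run of
consecutive blocks.  A sketch of which blocks cover a given page, assuming
both sizes are powers of two with blksize <= page_size; the function and
its parameters are hypothetical:

```c
#include <stdint.h>

/* Blocks [*first, *first + *count) that fill page pgidx of a file. */
static void blocks_for_page(uint32_t pgidx, uint32_t page_size,
                            uint32_t blksize, uint32_t nblocks,
                            uint32_t *first, uint32_t *count)
{
    uint32_t per_page = page_size / blksize;  /* blocks per full page */
    uint32_t start = pgidx * per_page;
    uint32_t end = start + per_page;
    if (end > nblocks)
        end = nblocks;                        /* last page may be partial */
    *first = start;
    *count = start < nblocks ? end - start : 0;
}
```

With 16KB pages, 4KB blocks and a 10-block file, page 0 needs blocks 0-3
and page 2 only blocks 8 and 9.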

The cost of option 1 is that kernels with a larger PAGE_SIZE
value don't get as good compression as they can.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having larger PAGE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get read_folio to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting read_folio to read into all the covered pages is harder.
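
In the CPU-inefficient variant of option 3, reading one page means finding
the block that contains it, decompressing the whole block into the
MAX_BLKSIZE buffer, and copying out one page-sized slice.  A sketch of the
index arithmetic only, assuming blksize is a multiple of page_size; the
function is hypothetical:

```c
#include <stdint.h>

/* Block holding page pgidx, and the page's byte offset inside the
 * decompressed block (the rest of the block is discarded). */
static uint32_t block_for_page(uint32_t pgidx, uint32_t page_size,
                               uint32_t blksize, uint32_t *offset_in_block)
{
    uint64_t byte = (uint64_t)pgidx * page_size;  /* file offset of page */
    *offset_in_block = (uint32_t)(byte % blksize);
    return (uint32_t)(byte / blksize);
}
```

With 4KB pages and 32KB blocks, every page read decompresses 32KB to keep
4KB, which is the inefficiency the option accepts.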

The main advantage of option 3 over 1, 2, is better compression.  The
cost is greater complexity.  Probably not worth it, but I hope someone
will disagree.  (If it is implemented, then I'll re-use that code in
e2compr.)


Another cost of 2 and 3 over 1 is making mkcramfs use a different
block size, but that just means adding and parsing a -b option.
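
Parsing such an option mostly means rejecting values the kernel could not
handle.  A sketch using strtoul; the -b option, the lower bound of 512 and
the power-of-two requirement are assumptions for illustration, while the
32KB upper bound follows the MAX_BLKSIZE limit mentioned above:

```c
#include <errno.h>
#include <stdlib.h>

#define MIN_BLKSIZE 512ul     /* hypothetical lower bound */
#define MAX_BLKSIZE 32768ul   /* no larger than 32KB */

/* Parse the argument of a hypothetical -b option.  Returns 0 and sets
 * *out on success; returns -1 and leaves *out untouched on failure. */
static int parse_blksize(const char *arg, unsigned long *out)
{
    char *end;
    errno = 0;
    unsigned long v = strtoul(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;                    /* not a plain decimal number */
    if (v < MIN_BLKSIZE || v > MAX_BLKSIZE || (v & (v - 1)) != 0)
        return -1;                    /* out of range or not a power of 2 */
    *out = v;
    return 0;
}
```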


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by filename, so the expansion doesn't even have to be a multiple of 4
bytes.