Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser, e.g. the kernel doesn't care if the <file_data> items are
swapped around (though it does care that directory entries (inodes) in
a given directory are contiguous, as this is used by readdir).

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever does any swabbing.  (See section `Block Size' below.)

<filesystem>:
        <superblock>
        <directory_structure>
        <data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
        For each file:
                struct cramfs_inode (see cramfs_fs.h).
                Filename.  Not generally null-terminated, but it is
                 null-padded to a multiple of 4 bytes.
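
As a rough illustration of the filename padding described above, the
following Python sketch pads a name with NUL bytes to the next multiple
of 4.  The helper name pad_name is made up for this example and is not
part of mkcramfs.

```python
def pad_name(name: bytes) -> bytes:
    """Pad a filename with NUL bytes up to a multiple of 4 bytes."""
    # -len(name) % 4 is the number of bytes missing to the next multiple of 4.
    return name + b"\0" * (-len(name) % 4)
```

A name whose length is already a multiple of 4 (e.g. b"init") gets no
padding at all, which is why the stored name is not generally
null-terminated.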

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.

Beginning in 2.4.7, directory entries are sorted.  This optimization
allows cramfs_lookup to return more quickly when a filename does not
exist, speeds up user-space directory sorts, etc.
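
The "width-first" order can be sketched in a few lines of Python.  This
is only an illustration on an in-memory tree, not what mkcramfs does
internally: the nested-dict representation and the name width_first
are invented here, and plain sorted() stands in for whatever
comparison mkcramfs actually uses when sorting entries.

```python
def width_first(tree, prefix=""):
    """Yield paths in "width-first" order.

    `tree` maps a name to either a nested dict (a directory) or any
    other value (a non-directory).
    """
    names = sorted(tree)          # entries are sorted (assumed byte order)
    for name in names:            # first, every entry of this directory
        yield prefix + name
    for name in names:            # then descend into each subdirectory
        if isinstance(tree[name], dict):
            yield from width_first(tree[name], prefix + name + "/")


example = {
    "bin": {"sh": None},
    "etc": {"rc.d": {"S10net": None}, "fstab": None},
    "init": None,
}
order = list(width_first(example))
```

Here all three top-level entries come out first, then the contents of
bin, then the contents of etc, and only then the contents of etc/rc.d.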

<data>:
        One <file_data> for each file that's either a symlink or a
         regular file of non-zero st_size.

<file_data>:
        nblocks * <block_pointer>
         (where nblocks = (st_size - 1) / blksize + 1)
        nblocks * <block>
        padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.
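
To make the pointer arithmetic concrete, here is a small Python sketch
that finds where a given block starts and ends.  It is a sketch under
assumptions: block_extent and its parameters are hypothetical names,
blocks are counted from 0, and the pointers are read as little-endian,
whereas a real image is in the byte order of the host that wrote it.

```python
import struct


def block_extent(image: bytes, ptr_table_offset: int, nblocks: int, i: int):
    """Return (start, end) byte offsets of block i (0-based) in the image."""
    # Assumption: little-endian pointers; cramfs itself is host-endian.
    ptrs = struct.unpack_from("<%dI" % nblocks, image, ptr_table_offset)
    if i == 0:
        # The first block immediately follows the last block pointer.
        start = ptr_table_offset + 4 * nblocks
    else:
        # Otherwise a block starts where the previous one ends.
        start = ptrs[i - 1]
    return start, ptrs[i]
```

The compressed length of a block is simply end - start, which is how
blocks of differing sizes can be located without storing any lengths.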

The order of <file_data>'s is a depth-first descent of the directory
tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
-print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)
<block>s are merely byte-aligned, not generally u32-aligned.
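
Putting the <file_data> and <block> descriptions together, a writer
could be sketched roughly as follows.  This is not the mkcramfs code:
build_file_data is an invented name, Python's zlib.compress stands in
for zlib's compress function, the pointers are written little-endian
for the sake of the example, holes are ignored, and `offset' (the
position of this <file_data> in the image) is assumed to be a multiple
of 4.

```python
import struct
import zlib


def build_file_data(data: bytes, offset: int, blksize: int = 4096) -> bytes:
    """Build one <file_data> item for non-empty `data` placed at `offset`."""
    if not data:
        raise ValueError("files of zero size have no <file_data>")
    nblocks = (len(data) - 1) // blksize + 1
    # Compress each blksize-sized chunk on its own; the last may be shorter.
    blocks = [zlib.compress(data[i * blksize:(i + 1) * blksize])
              for i in range(nblocks)]
    # Each pointer holds the offset of the *end* of the matching block.
    end = offset + 4 * nblocks        # first block follows the pointer table
    ptrs = []
    for block in blocks:
        end += len(block)
        ptrs.append(end)
    out = struct.pack("<%dI" % nblocks, *ptrs) + b"".join(blocks)
    return out + b"\0" * (-len(out) % 4)   # pad to a multiple of 4 bytes
```

Only the item as a whole is padded; the individual blocks inside it
stay byte-aligned.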


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for & create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.  Run mkcramfs
with -z if you want it to create files that can have holes in them.
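
The test for a hole amounts to checking whether a whole block of input
is NUL.  The sketch below shows only that check; hole_blocks is a
made-up name, and how a hole is then encoded in the image is not
covered by these notes, so it is not modelled here.

```python
def hole_blocks(data: bytes, blksize: int = 4096):
    """Return the 0-based indices of the chunks of `data` that are all NUL."""
    holes = []
    for start in range(0, len(data), blksize):
        chunk = data[start:start + blksize]
        if not any(chunk):            # every byte in the chunk is zero
            holes.append(start // blksize)
    return holes
```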


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_SIZE for cramfs_readpage's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_SIZE may grow in future (if I interpret the comment
correctly).

Currently, mkcramfs #define's PAGE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses the kernel's own PAGE_SIZE
(which can be as large as 32KB on arm).
This discrepancy is a bug, though it's not clear which should be
changed.

One option is to change mkcramfs to take its PAGE_SIZE from
<asm/page.h>.  Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_SIZE (4096)' to `#include <asm/page.h>'.  The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness.  The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'.  Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs.  We don't need to swab the compressed
data, only the superblock, inodes and block pointers.
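
The two endianness options can be compared with a short Python sketch.
It is illustrative only: the Python le32_to_cpu merely mimics the
kernel macro of the same name, detect_byte_order is a hypothetical
helper, and the magic value is quoted from memory of cramfs_fs.h and
assumed to be the first 32-bit word of the superblock.

```python
CRAMFS_MAGIC = 0x28CD3D45   # assumed: magic number from cramfs_fs.h


def le32_to_cpu(raw: bytes) -> int:
    """`Always little-endian': decode 4 bytes the same way on any host."""
    return int.from_bytes(raw[:4], "little")


def detect_byte_order(first4: bytes) -> str:
    """`Writer chooses': guess the writer's byte order from the magic.

    Returns "<" for little-endian and ">" for big-endian.
    """
    if int.from_bytes(first4[:4], "little") == CRAMFS_MAGIC:
        return "<"
    if int.from_bytes(first4[:4], "big") == CRAMFS_MAGIC:
        return ">"
    raise ValueError("magic number not recognised")
```

With the first option every field goes through one fixed conversion;
with the second, the result of the detection has to be carried around
and consulted for every superblock, inode and block pointer access,
which is the extra complexity the text refers to.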


The other part of making cramfs more sharable is choosing a block
size.  The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_SIZE: just make cramfs_readpage read multiple blocks.
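
A sketch of "read multiple blocks", in Python rather than kernel C and
with invented names: fill_page takes the file's compressed blocks as a
list and assumes that page_size is a multiple of blksize and that
there are no holes.

```python
import zlib


def fill_page(blocks, blksize: int, page_size: int, page_index: int) -> bytes:
    """Fill one page by decompressing every block that falls inside it."""
    per_page = page_size // blksize       # blocks needed for one page
    first = page_index * per_page
    page = b"".join(zlib.decompress(block)
                    for block in blocks[first:first + per_page])
    # The last page of a file may be only partly covered; zero-fill the rest.
    return page.ljust(page_size, b"\0")
```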

The cost of option 1 is that kernels with a larger PAGE_SIZE
value don't get as good compression as they could.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having larger PAGE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get readpage to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting readpage to read into all the covered pages is harder.
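
The CPU-inefficient variant of option 3 might look like the following
Python sketch.  The name read_page and its parameters are invented for
illustration; the point is only that the whole block is decompressed
and everything outside the requested page is thrown away.

```python
import zlib

MAX_BLKSIZE = 32 * 1024   # upper bound on the block size, as in the text


def read_page(block: bytes, page_size: int, page_in_block: int) -> bytes:
    """Return one page out of a compressed block larger than a page."""
    buf = zlib.decompress(block)          # decompress the entire block
    if len(buf) > MAX_BLKSIZE:
        raise ValueError("block larger than MAX_BLKSIZE")
    start = page_in_block * page_size
    page = buf[start:start + page_size]   # keep one page, discard the rest
    return page.ljust(page_size, b"\0")   # zero-fill a short final page
```

Reading every page of a block this way decompresses the same block
once per page, which is the inefficiency mentioned above.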

The main advantage of option 3 over options 1 and 2 is better
compression.  The cost is greater complexity.  Probably not worth it,
but I hope someone will disagree.  (If it is implemented, then I'll
re-use that code in e2compr.)


Another cost of 2 and 3 over 1 is making mkcramfs use a different
block size, but that just means adding and parsing a -b option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by filename, so the expansion doesn't even have to be a multiple of 4
bytes.
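
For reference, the current 12 bytes can be unpacked as in the Python
sketch below.  Treat it as an assumption-laden illustration rather
than a specification: the field widths (mode 16, uid 16, size 24,
gid 8, namelen 6, offset 26 bits) are quoted from memory of
cramfs_fs.h, the bitfields are assumed to be packed the way a
little-endian host lays them out, and namelen and offset are assumed
to be stored in units of 4 bytes.  parse_inode is a made-up name.

```python
import struct


def parse_inode(raw: bytes) -> dict:
    """Decode a 12-byte inode (assumed little-endian bitfield layout)."""
    w0, w1, w2 = struct.unpack("<3I", raw[:12])
    return {
        "mode": w0 & 0xFFFF,             # low 16 bits of word 0
        "uid": w0 >> 16,                 # high 16 bits of word 0
        "size": w1 & 0xFFFFFF,           # low 24 bits of word 1
        "gid": w1 >> 24,                 # high 8 bits of word 1
        "namelen": (w2 & 0x3F) * 4,      # stored divided by 4 (assumed)
        "offset": (w2 >> 6) * 4,         # stored divided by 4 (assumed)
    }
```

The narrow uid, gid and size fields are what an expanded inode would
presumably relax.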