Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates. Kernel requirements are
a bit looser, e.g. the kernel doesn't care if the <file_data> items are
swapped around (though it does care that directory entries (inodes) in
a given directory are contiguous, as this is used by readdir).

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever does swabbing. (See section `Block Size' below.)

<filesystem>:
    <superblock>
    <directory_structure>
    <data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
    For each file:
        struct cramfs_inode (see cramfs_fs.h).
        Filename. Not generally null-terminated, but it is
         null-padded to a multiple of 4 bytes.
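
As a rough illustration of one <directory_structure> entry, the sketch
below packs and unpacks a 12-byte inode followed by its padded name.
It assumes a little-endian host and the bitfield widths believed to be
in cramfs_fs.h (mode 16, uid 16, size 24, gid 8, namelen 6, offset 26,
with namelen and offset counted in 4-byte units); none of that is
stated in these notes, and pack_entry/unpack_entry are hypothetical
helper names, not part of mkcramfs.

```python
import struct

def pack_entry(mode, uid, size, gid, offset_words, name: bytes) -> bytes:
    """Build one entry: 12-byte inode + name null-padded to 4 bytes."""
    padded_len = (len(name) + 3) & ~3            # round up to a multiple of 4
    w0 = (mode & 0xFFFF) | ((uid & 0xFFFF) << 16)
    w1 = (size & 0xFFFFFF) | ((gid & 0xFF) << 24)
    w2 = ((padded_len // 4) & 0x3F) | ((offset_words & 0x3FFFFFF) << 6)
    # Assumption: little-endian host, bitfields allocated from the low bits.
    return struct.pack("<3I", w0, w1, w2) + name.ljust(padded_len, b"\0")

def unpack_entry(buf: bytes, pos: int = 0):
    """Return (entry dict, position of the next entry)."""
    w0, w1, w2 = struct.unpack_from("<3I", buf, pos)
    namelen = (w2 & 0x3F) * 4                    # padded length in bytes
    raw = buf[pos + 12 : pos + 12 + namelen]
    entry = {
        "mode": w0 & 0xFFFF,
        "uid": w0 >> 16,
        "size": w1 & 0xFFFFFF,
        "gid": w1 >> 24,
        "offset": (w2 >> 6) * 4,                 # byte offset of the data
        "name": raw.rstrip(b"\0"),               # padding is not part of the name
    }
    return entry, pos + 12 + namelen
```

A 4-byte name such as b"abcd" gets no padding at all, which is why the
name cannot be assumed to be null-terminated.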

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.
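
The traversal order can be sketched as a small generator. This is only
an illustration over an in-memory tree (a dict maps a name to a nested
dict for a subdirectory, or to None for anything else); width_first is
a hypothetical name, and sorting the entries is an assumption that
mirrors the sorted layout mentioned for 2.4.7 and later.

```python
def width_first(tree, path=""):
    """Yield paths: all entries of a directory first, then each subdirectory."""
    names = sorted(tree)                 # assumption: sorted directory entries
    for name in names:                   # 1. list every entry of this directory
        yield f"{path}/{name}"
    for name in names:                   # 2. only then descend, in the same order
        if isinstance(tree[name], dict):
            yield from width_first(tree[name], f"{path}/{name}")
```

Unlike breadth-first order, a subdirectory is fully descended into
before its later siblings' contents are listed.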

Beginning in 2.4.7, directory entries are sorted. This optimization
allows cramfs_lookup to return more quickly when a filename does not
exist, speeds up user-space directory sorts, etc.

<data>:
    One <file_data> for each file that's either a symlink or a
     regular file of non-zero st_size.

<file_data>:
    nblocks * <block_pointer>
     (where nblocks = (st_size - 1) / blksize + 1)
    nblocks * <block>
    padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one). The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.
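
The end-offset scheme above can be turned into (start, end) extents
with a few lines of code. The following is a sketch only: the function
names and the default blksize of 4096 are assumptions for illustration,
and it covers plain pointers without any flag bits.

```python
import struct

def nblocks(st_size: int, blksize: int = 4096) -> int:
    """Number of blocks for a file of non-zero st_size."""
    return (st_size - 1) // blksize + 1

def block_extents(image: bytes, file_data_offset: int, st_size: int,
                  blksize: int = 4096):
    """Return [(start, end), ...] byte extents of each <block> in image."""
    n = nblocks(st_size, blksize)
    # "=" means host byte order, matching the host-endian layout.
    ends = struct.unpack_from(f"={n}I", image, file_data_offset)
    start = file_data_offset + 4 * n     # first block follows the last pointer
    extents = []
    for end in ends:
        extents.append((start, end))
        start = end                      # next block starts where this one ends
    return extents
```

The compressed length of block i is simply end - start, so no separate
length field is needed in this layout.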

When the CRAMFS_FLAG_EXT_BLOCK_POINTERS capability bit is set, each
<block_pointer>'s top bits may contain special flags as follows:

CRAMFS_BLK_FLAG_UNCOMPRESSED (bit 31):
    The block data is not compressed and should be copied verbatim.

CRAMFS_BLK_FLAG_DIRECT_PTR (bit 30):
    The <block_pointer> stores the actual block start offset (not
    its end), shifted right by 2 bits. The block must therefore be
    aligned to a 4-byte boundary. The block size is blksize if
    CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified; otherwise
    the compressed data length is stored in the first 2 bytes of
    the block data. This is used to allow discontiguous data layout
    and specific data block alignments e.g. for XIP applications.
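
A minimal sketch of decoding these flag bits follows. The two flag
constants come from the text above; CRAMFS_BLK_FLAGS,
CRAMFS_BLK_DIRECT_PTR_SHIFT and decode_block_pointer are helper names
assumed here for illustration.

```python
CRAMFS_BLK_FLAG_UNCOMPRESSED = 1 << 31
CRAMFS_BLK_FLAG_DIRECT_PTR = 1 << 30
# Assumed helper names: mask of both flags, and the 2-bit shift.
CRAMFS_BLK_FLAGS = CRAMFS_BLK_FLAG_UNCOMPRESSED | CRAMFS_BLK_FLAG_DIRECT_PTR
CRAMFS_BLK_DIRECT_PTR_SHIFT = 2

def decode_block_pointer(ptr: int) -> dict:
    """Split a 32-bit <block_pointer> into its flags and offset."""
    value = ptr & ~CRAMFS_BLK_FLAGS & 0xFFFFFFFF
    direct = bool(ptr & CRAMFS_BLK_FLAG_DIRECT_PTR)
    return {
        "uncompressed": bool(ptr & CRAMFS_BLK_FLAG_UNCOMPRESSED),
        "direct": direct,
        # Direct pointers hold the block *start*, stored shifted right by 2.
        "start": value << CRAMFS_BLK_DIRECT_PTR_SHIFT if direct else None,
        # Ordinary pointers hold the block *end*.
        "end": None if direct else value,
    }
```

With 30 offset bits and a 2-bit shift, a direct pointer can address
any 4-byte-aligned position in a 4GB image.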


The order of <file_data>'s is a depth-first descent of the directory
tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
-print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data if the
corresponding CRAMFS_BLK_FLAG_UNCOMPRESSED <block_pointer> bit is not
set, otherwise it is the input data directly.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size. (See <block_pointer> above.)

<block>s are merely byte-aligned, not generally u32-aligned.

When CRAMFS_BLK_FLAG_DIRECT_PTR is specified then the corresponding
<block> may be located anywhere and not necessarily contiguous with
the previous/next blocks. In that case it is minimally u32-aligned.
If CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified then the size is always
blksize except for the last block which is limited by the file length.
If CRAMFS_BLK_FLAG_DIRECT_PTR is set and CRAMFS_BLK_FLAG_UNCOMPRESSED
is not set then the first 2 bytes of the block contain the size of the
remaining block data as this cannot be determined from the placement of
logically adjacent blocks.
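
To illustrate the length-prefixed case (DIRECT_PTR set, UNCOMPRESSED
not set), here is a sketch that writes and reads such a block using
Python's zlib module. The helper names are hypothetical, and the
2-byte length is assumed to be host-endian like the rest of the
image.

```python
import struct
import zlib

def make_direct_block(chunk: bytes) -> bytes:
    """Compress chunk and prefix it with its 2-byte compressed length."""
    comp = zlib.compress(chunk)
    return struct.pack("=H", len(comp)) + comp   # assumption: host-endian u16

def read_direct_block(image: bytes, start: int) -> bytes:
    """Read a compressed block located at a 4-byte-aligned start offset."""
    (length,) = struct.unpack_from("=H", image, start)
    data_start = start + 2                       # compressed data follows the length
    return zlib.decompress(image[data_start : data_start + length])
```

The reader never looks at neighbouring block pointers, which is what
allows the block to sit anywhere in the image.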


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for & create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes. Run mkcramfs
with -z if you want it to create files that can have holes in them.
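
The hole test can be sketched as follows. This is an illustration
rather than mkcramfs's actual code: compress_blocks and its make_holes
parameter are hypothetical stand-ins for the -z behaviour, and
representing a hole as a zero-length <block> is an assumption that
these notes do not spell out.

```python
import zlib

def compress_blocks(data: bytes, blksize: int = 4096, make_holes: bool = False):
    """Return the list of <block> payloads for data, one per blksize chunk."""
    blocks = []
    for i in range(0, len(data), blksize):
        chunk = data[i : i + blksize]
        if make_holes and chunk.count(0) == len(chunk):
            blocks.append(b"")                   # assumption: hole = empty block
        else:
            blocks.append(zlib.compress(chunk))
    return blocks
```

Under that assumption a hole costs only its 4-byte <block_pointer>,
since its end offset equals the end offset of the previous block.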


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time. It's intended to be somewhere around
PAGE_SIZE for cramfs_read_folio's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_SIZE may grow in future (if I interpret the comment
correctly).

Currently, mkcramfs #define's PAGE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses the kernel's own PAGE_SIZE
(which can be as large as 32KB on arm).
This discrepancy is a bug, though it's not clear which should be
changed.

One option is to change mkcramfs to take its PAGE_SIZE from
<asm/page.h>. Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_SIZE (4096)' to `#include <asm/page.h>'. The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness. The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'. Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs. We don't need to swab the compressed
data, only the superblock, inodes and block pointers.
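
The `always use little-endian' option amounts to decoding every
metadata word with a fixed byte order, as le32_to_cpu does in C. A
minimal sketch, with a hypothetical helper name, for reading block
pointers that way:

```python
import struct

def read_block_pointers_le(buf: bytes, offset: int, n: int):
    """Decode n 32-bit block pointers stored little-endian."""
    # "<" fixes the byte order regardless of the host, like le32_to_cpu.
    return list(struct.unpack_from(f"<{n}I", buf, offset))
```

On a little-endian host this is the same as reading host-endian data;
on a big-endian host it is one byte swap per word, and the compressed
<block> contents are never touched.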


The other part of making cramfs more sharable is choosing a block
size. The options are:

 1. Always 4096 bytes.

 2. Writer chooses blocksize; kernel adapts but rejects blocksize >
    PAGE_SIZE.

 3. Writer chooses blocksize; kernel adapts even to blocksize >
    PAGE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_SIZE: just make cramfs_read_folio read multiple blocks.
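
As a sketch of what `read multiple blocks' means for options 1 and 2,
the function below lists the block indices that fill one page. The
name blocks_for_page and the requirement that blksize divide the page
size evenly are assumptions made for this example.

```python
def blocks_for_page(page_index: int, page_size: int, blksize: int,
                    st_size: int) -> range:
    """Indices of the <block>s that must be read to fill one page."""
    if blksize > page_size or page_size % blksize:
        # Option 2 behaviour: reject a blocksize larger than the page size.
        raise ValueError("unsupported blocksize for this page size")
    per_page = page_size // blksize
    total = (st_size - 1) // blksize + 1         # nblocks for the whole file
    first = page_index * per_page
    return range(first, min(first + per_page, total))
```

For example, with 16KB pages and 4096-byte blocks each full page needs
four blocks, and the final page of a file may need fewer.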

The cost of option 1 is that kernels with a larger PAGE_SIZE
value don't get as good compression as they can.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants. The gain is that people
with kernels having larger PAGE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get read_folio to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting read_folio to read into all the covered pages is harder.
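
The CPU-inefficient form of option 3 might look like the sketch below:
decompress the whole covering block, then keep only the requested
page. It models the idea in user space with Python's zlib module;
read_page is a hypothetical name and blocks is assumed to be a list of
compressed <block>s for one file.

```python
import zlib

MAX_BLKSIZE = 32 * 1024                          # upper bound named in the text

def read_page(blocks, page_index: int, page_size: int, blksize: int) -> bytes:
    """Return one page of file data when blksize is a multiple of page_size."""
    if blksize > MAX_BLKSIZE or blksize % page_size:
        raise ValueError("unsupported blocksize")
    offset = page_index * page_size              # byte offset within the file
    block = zlib.decompress(blocks[offset // blksize])
    within = offset % blksize
    return block[within : within + page_size]    # discard the rest of the block
```

Each page read repeats the full decompression of its block, which is
the inefficiency the text refers to.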

The main advantage of option 3 over options 1 and 2 is better
compression. The cost is greater complexity. Probably not worth it,
but I hope someone will disagree. (If it is implemented, then I'll
re-use that code in e2compr.)


Another cost of 2 and 3 over 1 is making mkcramfs use a different
block size, but that just means adding and parsing a -b option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes. Inodes other than the root inode are followed
by filename, so the expansion doesn't even have to be a multiple of 4
bytes.