0001 .. SPDX-License-Identifier: GPL-2.0
0002
0003 Directory Entries
0004 -----------------
0005
0006 In an ext4 filesystem, a directory is more or less a flat file that maps
0007 an arbitrary byte string (usually ASCII) to an inode number on the
0008 filesystem. There can be many directory entries across the filesystem
0009 that reference the same inode number--these are known as hard links, and
0010 that is why hard links cannot reference files on other filesystems. As
0011 such, directory entries are found by reading the data block(s)
0012 associated with a directory file for the particular directory entry that
0013 is desired.
0014
0015 Linear (Classic) Directories
0016 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0017
By default, each directory lists its entries in an “almost-linear”
array. I write “almost” because it's not a linear array in the memory
sense: directory entries are not split across filesystem blocks.
0021 Therefore, it is more accurate to say that a directory is a series of
0022 data blocks and that each block contains a linear array of directory
0023 entries. The end of each per-block array is signified by reaching the
0024 end of the block; the last entry in the block has a record length that
0025 takes it all the way to the end of the block. The end of the entire
0026 directory is of course signified by reaching the end of the file. Unused
0027 directory entries are signified by inode = 0. By default the filesystem
0028 uses ``struct ext4_dir_entry_2`` for directory entries unless the
0029 “filetype” feature flag is not set, in which case it uses
0030 ``struct ext4_dir_entry``.
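
The per-block walk described above can be sketched as follows. This is only an illustration, not the kernel's code: the names ``get_le16``, ``get_le32`` and ``count_live_entries`` are made up for this example, and it assumes a block size below 64 KiB so that ``rec_len`` can be read literally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers from raw on-disk bytes. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Count the in-use entries (inode != 0) in one directory block by
 * hopping from entry to entry via rec_len until the end of the block.
 * Returns -1 if a record length is malformed.
 */
static int count_live_entries(const uint8_t *block, size_t block_size)
{
    size_t off = 0;
    int live = 0;

    assert(block != NULL);
    while (off < block_size) {
        if (block_size - off < 8)
            return -1;          /* no room for an entry header */

        uint32_t inode = get_le32(block + off);
        uint16_t rec_len = get_le16(block + off + 4);

        /* An entry never crosses the block boundary. */
        if (rec_len < 8 || rec_len % 4 != 0 || rec_len > block_size - off)
            return -1;
        if (inode != 0)
            live++;
        off += rec_len;
    }
    return live;
}
```

Calling this once per data block of the directory file gives the classic linear scan; the first 8 bytes of an entry are laid out the same way in both entry formats, so the sketch does not need to know which one is in use.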
0031
0032 The original directory entry format is ``struct ext4_dir_entry``, which
0033 is at most 263 bytes long, though on disk you'll need to reference
0034 ``dirent.rec_len`` to know for sure.
0035
0036 .. list-table::
0037 :widths: 8 8 24 40
0038 :header-rows: 1
0039
0040 * - Offset
0041 - Size
0042 - Name
0043 - Description
0044 * - 0x0
0045 - __le32
0046 - inode
0047 - Number of the inode that this directory entry points to.
0048 * - 0x4
0049 - __le16
0050 - rec_len
0051 - Length of this directory entry. Must be a multiple of 4.
0052 * - 0x6
0053 - __le16
0054 - name_len
0055 - Length of the file name.
0056 * - 0x8
0057 - char
0058 - name[EXT4_NAME_LEN]
0059 - File name.
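
The 263-byte figure is the 8-byte header plus a 255-byte name. Because ``rec_len`` must be a multiple of 4, the smallest record that can hold a given name is rounded up. A minimal sketch of that arithmetic, with hypothetical names (``min_rec_len`` and the two macros are not taken from the kernel source):

```c
#include <assert.h>
#include <stddef.h>

#define DIRENT_HEADER_LEN 8u   /* inode + rec_len + name_len */
#define DIRENT_MAX_NAME   255u /* longest file name */

/* Smallest rec_len that holds a name of name_len bytes, rounded up to 4. */
static unsigned min_rec_len(unsigned name_len)
{
    assert(name_len <= DIRENT_MAX_NAME);
    return (DIRENT_HEADER_LEN + name_len + 3u) & ~3u;
}
```

For example, a 1-byte name needs a 12-byte record and a 255-byte name needs 264 bytes. The ``rec_len`` found on disk may be larger than this minimum, for instance in the last entry of a block.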
0060
0061 Since file names cannot be longer than 255 bytes, the new directory
0062 entry format shortens the name_len field and uses the space for a file
0063 type flag, probably to avoid having to load every inode during directory
0064 tree traversal. This format is ``ext4_dir_entry_2``, which is at most
0065 263 bytes long, though on disk you'll need to reference
0066 ``dirent.rec_len`` to know for sure.
0067
0068 .. list-table::
0069 :widths: 8 8 24 40
0070 :header-rows: 1
0071
0072 * - Offset
0073 - Size
0074 - Name
0075 - Description
0076 * - 0x0
0077 - __le32
0078 - inode
0079 - Number of the inode that this directory entry points to.
0080 * - 0x4
0081 - __le16
0082 - rec_len
0083 - Length of this directory entry.
0084 * - 0x6
0085 - __u8
0086 - name_len
0087 - Length of the file name.
0088 * - 0x7
0089 - __u8
0090 - file_type
0091 - File type code, see ftype_ table below.
0092 * - 0x8
0093 - char
0094 - name[EXT4_NAME_LEN]
0095 - File name.
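
As an illustration of the table above, the following sketch decodes one ``ext4_dir_entry_2`` from raw bytes. ``struct parsed_dirent`` and ``parse_dirent2`` are hypothetical names for this example only; the field offsets are the ones listed in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one entry; not a kernel type. */
struct parsed_dirent {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
    char     name[256];     /* NUL-terminated copy of the name */
};

/*
 * Decode the entry starting at buf, where avail bytes remain in the block.
 * Returns 0 on success, -1 if the entry does not fit.
 */
static int parse_dirent2(const uint8_t *buf, size_t avail,
                         struct parsed_dirent *out)
{
    assert(buf != NULL && out != NULL);
    if (avail < 8)
        return -1;

    out->inode     = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    out->rec_len   = (uint16_t)(buf[4] | (buf[5] << 8));
    out->name_len  = buf[6];
    out->file_type = buf[7];

    /* The name must lie inside the record, and the record inside the block. */
    if (out->rec_len > avail || (size_t)out->name_len + 8 > out->rec_len)
        return -1;

    memcpy(out->name, buf + 8, out->name_len);
    out->name[out->name_len] = '\0';
    return 0;
}
```

The name on disk is not NUL-terminated, which is why the sketch copies exactly ``name_len`` bytes and adds the terminator itself.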
0096
0097 .. _ftype:
0098
0099 The directory file type is one of the following values:
0100
0101 .. list-table::
0102 :widths: 16 64
0103 :header-rows: 1
0104
0105 * - Value
0106 - Description
0107 * - 0x0
0108 - Unknown.
0109 * - 0x1
0110 - Regular file.
0111 * - 0x2
0112 - Directory.
0113 * - 0x3
0114 - Character device file.
0115 * - 0x4
0116 - Block device file.
0117 * - 0x5
0118 - FIFO.
0119 * - 0x6
0120 - Socket.
0121 * - 0x7
0122 - Symbolic link.
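
A tool that dumps directory blocks might translate these codes with a small lookup table. The helper below is a sketch; ``dirent_file_type_name`` is a made-up name, and the strings simply mirror the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Map a file_type code to a description; anything else is "invalid". */
static const char *dirent_file_type_name(uint8_t file_type)
{
    static const char *const names[] = {
        "unknown", "regular file", "directory", "character device",
        "block device", "FIFO", "socket", "symbolic link",
    };
    static_assert(sizeof(names) / sizeof(names[0]) == 8,
                  "one name per code 0x0 through 0x7");

    if (file_type < sizeof(names) / sizeof(names[0]))
        return names[file_type];
    return "invalid";
}
```

Note that the checksum tail described later in this section stores 0xDE in the same byte, so a caller would normally recognize the tail before asking for a type name.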
0123
To support directories that are both encrypted and casefolded, we
must also include hash information in the directory entry. We append
``ext4_extended_dir_entry_2`` to ``ext4_dir_entry_2`` except for the entries
for dot and dotdot, which are kept the same. The structure follows immediately
after ``name`` and is included in the size listed by ``rec_len``. If a directory
entry uses this extension, it may be up to 271 bytes.
0130
0131 .. list-table::
0132 :widths: 8 8 24 40
0133 :header-rows: 1
0134
0135 * - Offset
0136 - Size
0137 - Name
0138 - Description
0139 * - 0x0
0140 - __le32
0141 - hash
0142 - The hash of the directory name
0143 * - 0x4
0144 - __le32
0145 - minor_hash
0146 - The minor hash of the directory name
0147
0148
0149 In order to add checksums to these classic directory blocks, a phony
0150 ``struct ext4_dir_entry`` is placed at the end of each leaf block to
0151 hold the checksum. The directory entry is 12 bytes long. The inode
0152 number and name_len fields are set to zero to fool old software into
0153 ignoring an apparently empty directory entry, and the checksum is stored
0154 in the place where the name normally goes. The structure is
0155 ``struct ext4_dir_entry_tail``:
0156
0157 .. list-table::
0158 :widths: 8 8 24 40
0159 :header-rows: 1
0160
0161 * - Offset
0162 - Size
0163 - Name
0164 - Description
0165 * - 0x0
0166 - __le32
0167 - det_reserved_zero1
0168 - Inode number, which must be zero.
0169 * - 0x4
0170 - __le16
0171 - det_rec_len
0172 - Length of this directory entry, which must be 12.
0173 * - 0x6
0174 - __u8
0175 - det_reserved_zero2
0176 - Length of the file name, which must be zero.
0177 * - 0x7
0178 - __u8
0179 - det_reserved_ft
0180 - File type, which must be 0xDE.
0181 * - 0x8
0182 - __le32
0183 - det_checksum
0184 - Directory leaf block checksum.
0185
0186 The leaf directory block checksum is calculated against the FS UUID, the
0187 directory's inode number, the directory's inode generation number, and
0188 the entire directory entry block up to (but not including) the fake
0189 directory entry.
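
Recognizing the tail is a matter of checking the fixed values from the table. The sketch below does only that and returns the stored checksum; it does not compute or verify the checksum itself. The names ``read_dir_tail``, ``DIR_TAIL_LEN`` and ``DIR_TAIL_FTYPE`` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIR_TAIL_LEN   12u    /* size of struct ext4_dir_entry_tail */
#define DIR_TAIL_FTYPE 0xDEu  /* marker stored in det_reserved_ft */

/*
 * If the last 12 bytes of the block look like a checksum tail, store
 * det_checksum in *csum and return 1; otherwise return 0.
 */
static int read_dir_tail(const uint8_t *block, size_t block_size,
                         uint32_t *csum)
{
    assert(block != NULL && csum != NULL);
    if (block_size < DIR_TAIL_LEN)
        return 0;

    const uint8_t *t = block + block_size - DIR_TAIL_LEN;
    uint32_t inode = (uint32_t)t[0] | ((uint32_t)t[1] << 8) |
                     ((uint32_t)t[2] << 16) | ((uint32_t)t[3] << 24);
    uint16_t rec_len = (uint16_t)(t[4] | (t[5] << 8));

    if (inode != 0 || rec_len != DIR_TAIL_LEN ||
        t[6] != 0 || t[7] != DIR_TAIL_FTYPE)
        return 0;

    *csum = (uint32_t)t[8] | ((uint32_t)t[9] << 8) |
            ((uint32_t)t[10] << 16) | ((uint32_t)t[11] << 24);
    return 1;
}
```

When a tail is present, the bytes covered by the checksum are everything in the block before the tail, that is the first ``block_size - 12`` bytes.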
0190
0191 Hash Tree Directories
0192 ~~~~~~~~~~~~~~~~~~~~~
0193
0194 A linear array of directory entries isn't great for performance, so a
0195 new feature was added to ext3 to provide a faster (but peculiar)
0196 balanced tree keyed off a hash of the directory entry name. If the
0197 EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a
0198 hashed btree (htree) to organize and find directory entries. For
0199 backwards read-only compatibility with ext2, this tree is actually
0200 hidden inside the directory file, masquerading as “empty” directory data
blocks! As noted previously, an entry whose inode number is 0 is treated
as unused; this is (ab)used to fool the old linear-scan algorithm into
thinking that the rest of the directory block is empty so that it moves
on.
0205
0206 The root of the tree always lives in the first data block of the
0207 directory. By ext2 custom, the '.' and '..' entries must appear at the
0208 beginning of this first block, so they are put here as two
0209 ``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of
0210 the root node contains metadata about the tree and finally a hash->block
0211 map to find nodes that are lower in the htree. If
0212 ``dx_root.info.indirect_levels`` is non-zero then the htree has two
0213 levels; the data block pointed to by the root node's map is an interior
node, which is indexed by a minor hash. Interior nodes in this tree
contain a zeroed out ``struct ext4_dir_entry_2`` followed by a
minor_hash->block map to find leaf nodes. Leaf nodes contain a linear
0217 array of all ``struct ext4_dir_entry_2``; all of these entries
0218 (presumably) hash to the same value. If there is an overflow, the
0219 entries simply overflow into the next leaf node, and the
0220 least-significant bit of the hash (in the interior node map) that gets
0221 us to this next leaf node is set.
0222
To traverse the directory as an htree, the code calculates the hash of
0224 the desired file name and uses it to find the corresponding block
0225 number. If the tree is flat, the block is a linear array of directory
0226 entries that can be searched; otherwise, the minor hash of the file name
0227 is computed and used against this second block to find the corresponding
0228 third block number. That third block number will be a linear array of
0229 directory entries.
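
One step of that lookup, choosing a block from a hash->block map, can be sketched as a binary search. This assumes the map has already been decoded into host byte order and is sorted by ascending hash, with slot 0 covering the lowest hashes; ``struct hash_block`` and ``htree_pick_block`` are hypothetical names, and the collision bit mentioned earlier is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one map slot. */
struct hash_block {
    uint32_t hash;
    uint32_t block;
};

/*
 * Return the block of the last slot whose hash is <= the target.
 * The hash stored in slot 0 is ignored: that slot is the fallback
 * for every hash below map[1].hash.
 */
static uint32_t htree_pick_block(const struct hash_block *map, size_t count,
                                 uint32_t hash)
{
    size_t lo = 1, hi = count;  /* search map[1] .. map[count - 1] */

    assert(map != NULL && count > 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (map[mid].hash > hash)
            hi = mid;
        else
            lo = mid + 1;
    }
    /* lo is now the first slot with a larger hash, or count if none. */
    return map[lo - 1].block;
}
```

Applying the same step to the block it returns descends one level; when no interior levels remain, the block is a leaf and is scanned linearly.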
0230
0231 To traverse the directory as a linear array (such as the old code does),
0232 the code simply reads every data block in the directory. The blocks used
0233 for the htree will appear to have no entries (aside from '.' and '..')
0234 and so only the leaf nodes will appear to have any interesting content.
0235
0236 The root of the htree is in ``struct dx_root``, which is the full length
0237 of a data block:
0238
0239 .. list-table::
0240 :widths: 8 8 24 40
0241 :header-rows: 1
0242
0243 * - Offset
0244 - Type
0245 - Name
0246 - Description
0247 * - 0x0
0248 - __le32
0249 - dot.inode
0250 - inode number of this directory.
0251 * - 0x4
0252 - __le16
0253 - dot.rec_len
0254 - Length of this record, 12.
0255 * - 0x6
0256 - u8
0257 - dot.name_len
0258 - Length of the name, 1.
0259 * - 0x7
0260 - u8
0261 - dot.file_type
0262 - File type of this entry, 0x2 (directory) (if the feature flag is set).
0263 * - 0x8
0264 - char
0265 - dot.name[4]
0266 - “.\0\0\0”
0267 * - 0xC
0268 - __le32
0269 - dotdot.inode
0270 - inode number of parent directory.
0271 * - 0x10
0272 - __le16
0273 - dotdot.rec_len
0274 - block_size - 12. The record length is long enough to cover all htree
0275 data.
0276 * - 0x12
0277 - u8
0278 - dotdot.name_len
0279 - Length of the name, 2.
0280 * - 0x13
0281 - u8
0282 - dotdot.file_type
0283 - File type of this entry, 0x2 (directory) (if the feature flag is set).
0284 * - 0x14
0285 - char
0286 - dotdot_name[4]
0287 - “..\0\0”
0288 * - 0x18
0289 - __le32
0290 - struct dx_root_info.reserved_zero
0291 - Zero.
0292 * - 0x1C
0293 - u8
0294 - struct dx_root_info.hash_version
0295 - Hash type, see dirhash_ table below.
0296 * - 0x1D
0297 - u8
0298 - struct dx_root_info.info_length
0299 - Length of the tree information, 0x8.
0300 * - 0x1E
0301 - u8
0302 - struct dx_root_info.indirect_levels
0303 - Depth of the htree. Cannot be larger than 3 if the INCOMPAT_LARGEDIR
0304 feature is set; cannot be larger than 2 otherwise.
0305 * - 0x1F
0306 - u8
0307 - struct dx_root_info.unused_flags
0308 -
0309 * - 0x20
0310 - __le16
0311 - limit
0312 - Maximum number of dx_entries that can follow this header, plus 1 for
0313 the header itself.
0314 * - 0x22
0315 - __le16
0316 - count
0317 - Actual number of dx_entries that follow this header, plus 1 for the
0318 header itself.
0319 * - 0x24
0320 - __le32
0321 - block
0322 - The block number (within the directory file) that goes with hash=0.
0323 * - 0x28
0324 - struct dx_entry
0325 - entries[0]
- As many 8-byte ``struct dx_entry`` as fit in the rest of the data block.
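
To make the offsets concrete, here is a sketch that pulls the tree metadata out of a raw root block and applies a few sanity checks implied by the table. ``struct dx_root_summary`` and ``read_dx_root`` are names invented for this example; the checks are illustrative and not a statement of what the kernel validates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields at 0x1C through 0x27. */
struct dx_root_summary {
    uint8_t  hash_version;
    uint8_t  indirect_levels;
    uint16_t limit;
    uint16_t count;
    uint32_t block0;    /* block that goes with hash=0 */
};

/* Returns 0 on success, -1 if the block does not look like a dx_root. */
static int read_dx_root(const uint8_t *b, size_t block_size,
                        struct dx_root_summary *out)
{
    assert(b != NULL && out != NULL);
    if (block_size < 0x28)
        return -1;
    if (b[0x1D] != 8)           /* info_length must be 0x8 */
        return -1;

    out->hash_version    = b[0x1C];
    out->indirect_levels = b[0x1E];
    out->limit  = (uint16_t)(b[0x20] | (b[0x21] << 8));
    out->count  = (uint16_t)(b[0x22] | (b[0x23] << 8));
    out->block0 = (uint32_t)b[0x24] | ((uint32_t)b[0x25] << 8) |
                  ((uint32_t)b[0x26] << 16) | ((uint32_t)b[0x27] << 24);

    /* count includes the header slot, and all slots must fit in the block. */
    if (out->count < 1 || out->count > out->limit)
        return -1;
    if (0x20 + (size_t)out->limit * 8 > block_size)
        return -1;
    return 0;
}
```

The remaining ``count - 1`` map slots start at offset 0x28 and are 8 bytes each.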
0327
0328 .. _dirhash:
0329
0330 The directory hash is one of the following values:
0331
0332 .. list-table::
0333 :widths: 16 64
0334 :header-rows: 1
0335
0336 * - Value
0337 - Description
0338 * - 0x0
0339 - Legacy.
0340 * - 0x1
0341 - Half MD4.
0342 * - 0x2
0343 - Tea.
0344 * - 0x3
0345 - Legacy, unsigned.
0346 * - 0x4
0347 - Half MD4, unsigned.
0348 * - 0x5
0349 - Tea, unsigned.
0350 * - 0x6
0351 - Siphash.
0352
0353 Interior nodes of an htree are recorded as ``struct dx_node``, which is
0354 also the full length of a data block:
0355
0356 .. list-table::
0357 :widths: 8 8 24 40
0358 :header-rows: 1
0359
0360 * - Offset
0361 - Type
0362 - Name
0363 - Description
0364 * - 0x0
0365 - __le32
0366 - fake.inode
0367 - Zero, to make it look like this entry is not in use.
0368 * - 0x4
0369 - __le16
0370 - fake.rec_len
0371 - The size of the block, in order to hide all of the dx_node data.
0372 * - 0x6
0373 - u8
0374 - name_len
0375 - Zero. There is no name for this “unused” directory entry.
0376 * - 0x7
0377 - u8
0378 - file_type
0379 - Zero. There is no file type for this “unused” directory entry.
0380 * - 0x8
0381 - __le16
0382 - limit
0383 - Maximum number of dx_entries that can follow this header, plus 1 for
0384 the header itself.
0385 * - 0xA
0386 - __le16
0387 - count
0388 - Actual number of dx_entries that follow this header, plus 1 for the
0389 header itself.
* - 0xC
- __le32
- block
- The block number (within the directory file) that goes with the lowest
hash value of this block. This value is stored in the parent block.
* - 0x10
0396 - struct dx_entry
0397 - entries[0]
- As many 8-byte ``struct dx_entry`` as fit in the rest of the data block.
0399
0400 The hash maps that exist in both ``struct dx_root`` and
0401 ``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes
0402 long:
0403
0404 .. list-table::
0405 :widths: 8 8 24 40
0406 :header-rows: 1
0407
0408 * - Offset
0409 - Type
0410 - Name
0411 - Description
0412 * - 0x0
0413 - __le32
0414 - hash
0415 - Hash code.
0416 * - 0x4
0417 - __le32
0418 - block
0419 - Block number (within the directory file, not filesystem blocks) of the
0420 next node in the htree.
0421
0422 (If you think this is all quite clever and peculiar, so does the
0423 author.)
0424
0425 If metadata checksums are enabled, the last 8 bytes of the directory
0426 block (precisely the length of one dx_entry) are used to store a
0427 ``struct dx_tail``, which contains the checksum. The ``limit`` and
0428 ``count`` entries in the dx_root/dx_node structures are adjusted as
necessary to fit the dx_tail into the block. If there is no space for
the dx_tail, the user is notified to run ``e2fsck -D`` to rebuild the
directory index (which will ensure that there's space for the checksum).
0432 The dx_tail structure is 8 bytes long and looks like this:
0433
0434 .. list-table::
0435 :widths: 8 8 24 40
0436 :header-rows: 1
0437
0438 * - Offset
0439 - Type
0440 - Name
0441 - Description
0442 * - 0x0
0443 - u32
0444 - dt_reserved
0445 - Zero.
0446 * - 0x4
0447 - __le32
0448 - dt_checksum
0449 - Checksum of the htree directory block.
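
The effect of the tail on ``limit`` follows from the layouts above: the limit/count header sits at offset 0x20 in a ``dx_root`` and at 0x8 in a ``dx_node``, every slot is 8 bytes, and the tail takes one slot's worth of space. A sketch of that arithmetic, using hypothetical names (``dx_limit`` and the macros are not kernel identifiers):

```c
#include <assert.h>
#include <stddef.h>

#define DX_ENTRY_LEN    8u     /* size of struct dx_entry and of dx_tail */
#define DX_ROOT_HDR_OFF 0x20u  /* offset of limit/count in dx_root */
#define DX_NODE_HDR_OFF 0x08u  /* offset of limit/count in dx_node */

/*
 * Number of 8-byte slots from the limit/count header to the end of the
 * block, header slot included, minus one slot if a dx_tail is present.
 */
static unsigned dx_limit(size_t block_size, unsigned hdr_off, int has_tail)
{
    assert(block_size >= hdr_off + DX_ENTRY_LEN * 2u);

    size_t space = block_size - hdr_off;
    if (has_tail)
        space -= DX_ENTRY_LEN;
    return (unsigned)(space / DX_ENTRY_LEN);
}
```

With 4096-byte blocks this gives 508 slots for a root and 511 for an interior node, or 507 and 510 once metadata checksums reserve the tail.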
0450
0451 The checksum is calculated against the FS UUID, the htree index header
0452 (dx_root or dx_node), all of the htree indices (dx_entry) that are in
0453 use, and the tail block (dx_tail).