Back to home page

OSCL-LXR

 
 

    


0001 .. SPDX-License-Identifier: GPL-2.0
0002 
0003 ============================
0004 LC-trie implementation notes
0005 ============================
0006 
0007 Node types
0008 ----------
0009 leaf
0010         An end node with data. This has a copy of the relevant key, along
0011         with 'hlist' with routing table entries sorted by prefix length.
0012         See struct leaf and struct leaf_info.
0013 
0014 trie node or tnode
0015         An internal node, holding an array of child (leaf or tnode) pointers,
0016         indexed through a subset of the key. See Level Compression.
0017 
0018 A few concepts explained
0019 ------------------------
0020 Bits (tnode)
0021         The number of bits in the key segment used for indexing into the
0022         child array - the "child index". See Level Compression.
0023 
0024 Pos (tnode)
0025         The position (in the key) of the key segment used for indexing into
0026         the child array. See Path Compression.
0027 
0028 Path Compression / skipped bits
0029         Any given tnode is linked to from the child array of its parent, using
0030         a segment of the key specified by the parent's "pos" and "bits"
0031         In certain cases, this tnode's own "pos" will not be immediately
0032         adjacent to the parent (pos+bits), but there will be some bits
0033         in the key skipped over because they represent a single path with no
0034         deviations. These "skipped bits" constitute Path Compression.
0035         Note that the search algorithm will simply skip over these bits when
0036         searching, making it necessary to save the keys in the leaves to
0037         verify that they actually do match the key we are searching for.
0038 
0039 Level Compression / child arrays
0040         the trie is kept level balanced moving, under certain conditions, the
0041         children of a full child (see "full_children") up one level, so that
0042         instead of a pure binary tree, each internal node ("tnode") may
0043         contain an arbitrarily large array of links to several children.
0044         Conversely, a tnode with a mostly empty child array (see empty_children)
0045         may be "halved", having some of its children moved downwards one level,
0046         in order to avoid ever-increasing child arrays.
0047 
0048 empty_children
0049         the number of positions in the child array of a given tnode that are
0050         NULL.
0051 
0052 full_children
0053         the number of children of a given tnode that aren't path compressed.
0054         (in other words, they aren't NULL or leaves and their "pos" is equal
0055         to this tnode's "pos"+"bits").
0056 
0057         (The word "full" here is used more in the sense of "complete" than
0058         as the opposite of "empty", which might be a tad confusing.)
0059 
0060 Comments
0061 ---------
0062 
0063 We have tried to keep the structure of the code as close to fib_hash as
0064 possible to allow verification and help up reviewing.
0065 
0066 fib_find_node()
0067         A good start for understanding this code. This function implements a
0068         straightforward trie lookup.
0069 
0070 fib_insert_node()
0071         Inserts a new leaf node in the trie. This is bit more complicated than
0072         fib_find_node(). Inserting a new node means we might have to run the
0073         level compression algorithm on part of the trie.
0074 
0075 trie_leaf_remove()
0076         Looks up a key, deletes it and runs the level compression algorithm.
0077 
0078 trie_rebalance()
0079         The key function for the dynamic trie after any change in the trie
0080         it is run to optimize and reorganize. It will walk the trie upwards
0081         towards the root from a given tnode, doing a resize() at each step
0082         to implement level compression.
0083 
0084 resize()
0085         Analyzes a tnode and optimizes the child array size by either inflating
0086         or shrinking it repeatedly until it fulfills the criteria for optimal
0087         level compression. This part follows the original paper pretty closely
0088         and there may be some room for experimentation here.
0089 
0090 inflate()
0091         Doubles the size of the child array within a tnode. Used by resize().
0092 
0093 halve()
0094         Halves the size of the child array within a tnode - the inverse of
0095         inflate(). Used by resize();
0096 
0097 fn_trie_insert(), fn_trie_delete(), fn_trie_select_default()
0098         The route manipulation functions. Should conform pretty closely to the
0099         corresponding functions in fib_hash.
0100 
0101 fn_trie_flush()
0102         This walks the full trie (using nextleaf()) and searches for empty
0103         leaves which have to be removed.
0104 
0105 fn_trie_dump()
0106         Dumps the routing table ordered by prefix length. This is somewhat
0107         slower than the corresponding fib_hash function, as we have to walk the
0108         entire trie for each prefix length. In comparison, fib_hash is organized
0109         as one "zone"/hash per prefix length.
0110 
0111 Locking
0112 -------
0113 
0114 fib_lock is used for an RW-lock in the same way that this is done in fib_hash.
0115 However, the functions are somewhat separated for other possible locking
0116 scenarios. It might conceivably be possible to run trie_rebalance via RCU
0117 to avoid read_lock in the fn_trie_lookup() function.
0118 
0119 Main lookup mechanism
0120 ---------------------
0121 fn_trie_lookup() is the main lookup function.
0122 
0123 The lookup is in its simplest form just like fib_find_node(). We descend the
0124 trie, key segment by key segment, until we find a leaf. check_leaf() does
0125 the fib_semantic_match in the leaf's sorted prefix hlist.
0126 
0127 If we find a match, we are done.
0128 
0129 If we don't find a match, we enter prefix matching mode. The prefix length,
0130 starting out at the same as the key length, is reduced one step at a time,
0131 and we backtrack upwards through the trie trying to find a longest matching
0132 prefix. The goal is always to reach a leaf and get a positive result from the
0133 fib_semantic_match mechanism.
0134 
0135 Inside each tnode, the search for longest matching prefix consists of searching
0136 through the child array, chopping off (zeroing) the least significant "1" of
0137 the child index until we find a match or the child index consists of nothing but
0138 zeros.
0139 
0140 At this point we backtrack (t->stats.backtrack++) up the trie, continuing to
0141 chop off part of the key in order to find the longest matching prefix.
0142 
0143 At this point we will repeatedly descend subtries to look for a match, and there
0144 are some optimizations available that can provide us with "shortcuts" to avoid
0145 descending into dead ends. Look for "HL_OPTIMIZE" sections in the code.
0146 
0147 To alleviate any doubts about the correctness of the route selection process,
0148 a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which
0149 gives userland access to fib_lookup().