Back to home page

LXR

 
 

    


0001 Red-black Trees (rbtree) in Linux
0002 January 18, 2007
0003 Rob Landley <rob@landley.net>
0004 =============================
0005 
0006 What are red-black trees, and what are they for?
0007 ------------------------------------------------
0008 
0009 Red-black trees are a type of self-balancing binary search tree, used for
0010 storing sortable key/value data pairs.  This differs from radix trees (which
0011 are used to efficiently store sparse arrays and thus use long integer indexes
0012 to insert/access/delete nodes) and hash tables (which are not kept sorted to
0013 be easily traversed in order, and must be tuned for a specific size and
0014 hash function where rbtrees scale gracefully storing arbitrary keys).
0015 
0016 Red-black trees are similar to AVL trees, but provide faster real-time bounded
0017 worst case performance for insertion and deletion (at most two rotations and
0018 three rotations, respectively, to balance the tree), with slightly slower
0019 (but still O(log n)) lookup time.
0020 
0021 To quote Linux Weekly News:
0022 
0023     There are a number of red-black trees in use in the kernel.
0024     The deadline and CFQ I/O schedulers employ rbtrees to
0025     track requests; the packet CD/DVD driver does the same.
0026     The high-resolution timer code uses an rbtree to organize outstanding
0027     timer requests.  The ext3 filesystem tracks directory entries in a
0028     red-black tree.  Virtual memory areas (VMAs) are tracked with red-black
0029     trees, as are epoll file descriptors, cryptographic keys, and network
0030     packets in the "hierarchical token bucket" scheduler.
0031 
0032 This document covers use of the Linux rbtree implementation.  For more
0033 information on the nature and implementation of Red Black Trees,  see:
0034 
0035   Linux Weekly News article on red-black trees
0036     http://lwn.net/Articles/184495/
0037 
0038   Wikipedia entry on red-black trees
0039     http://en.wikipedia.org/wiki/Red-black_tree
0040 
0041 Linux implementation of red-black trees
0042 ---------------------------------------
0043 
0044 Linux's rbtree implementation lives in the file "lib/rbtree.c".  To use it,
0045 "#include <linux/rbtree.h>".
0046 
0047 The Linux rbtree implementation is optimized for speed, and thus has one
0048 less layer of indirection (and better cache locality) than more traditional
0049 tree implementations.  Instead of using pointers to separate rb_node and data
0050 structures, each instance of struct rb_node is embedded in the data structure
0051 it organizes.  And instead of using a comparison callback function pointer,
0052 users are expected to write their own tree search and insert functions
0053 which call the provided rbtree functions.  Locking is also left up to the
0054 user of the rbtree code.
0055 
0056 Creating a new rbtree
0057 ---------------------
0058 
0059 Data nodes in an rbtree tree are structures containing a struct rb_node member:
0060 
0061   struct mytype {
0062         struct rb_node node;
0063         char *keystring;
0064   };
0065 
0066 When dealing with a pointer to the embedded struct rb_node, the containing data
0067 structure may be accessed with the standard container_of() macro.  In addition,
0068 individual members may be accessed directly via rb_entry(node, type, member).
0069 
0070 At the root of each rbtree is an rb_root structure, which is initialized to be
0071 empty via:
0072 
0073   struct rb_root mytree = RB_ROOT;
0074 
0075 Searching for a value in an rbtree
0076 ----------------------------------
0077 
0078 Writing a search function for your tree is fairly straightforward: start at the
0079 root, compare each value, and follow the left or right branch as necessary.
0080 
0081 Example:
0082 
0083   struct mytype *my_search(struct rb_root *root, char *string)
0084   {
0085         struct rb_node *node = root->rb_node;
0086 
0087         while (node) {
0088                 struct mytype *data = container_of(node, struct mytype, node);
0089                 int result;
0090 
0091                 result = strcmp(string, data->keystring);
0092 
0093                 if (result < 0)
0094                         node = node->rb_left;
0095                 else if (result > 0)
0096                         node = node->rb_right;
0097                 else
0098                         return data;
0099         }
0100         return NULL;
0101   }
0102 
0103 Inserting data into an rbtree
0104 -----------------------------
0105 
0106 Inserting data in the tree involves first searching for the place to insert the
0107 new node, then inserting the node and rebalancing ("recoloring") the tree.
0108 
0109 The search for insertion differs from the previous search by finding the
0110 location of the pointer on which to graft the new node.  The new node also
0111 needs a link to its parent node for rebalancing purposes.
0112 
0113 Example:
0114 
0115   int my_insert(struct rb_root *root, struct mytype *data)
0116   {
0117         struct rb_node **new = &(root->rb_node), *parent = NULL;
0118 
0119         /* Figure out where to put new node */
0120         while (*new) {
0121                 struct mytype *this = container_of(*new, struct mytype, node);
0122                 int result = strcmp(data->keystring, this->keystring);
0123 
0124                 parent = *new;
0125                 if (result < 0)
0126                         new = &((*new)->rb_left);
0127                 else if (result > 0)
0128                         new = &((*new)->rb_right);
0129                 else
0130                         return FALSE;
0131         }
0132 
0133         /* Add new node and rebalance tree. */
0134         rb_link_node(&data->node, parent, new);
0135         rb_insert_color(&data->node, root);
0136 
0137         return TRUE;
0138   }
0139 
0140 Removing or replacing existing data in an rbtree
0141 ------------------------------------------------
0142 
0143 To remove an existing node from a tree, call:
0144 
0145   void rb_erase(struct rb_node *victim, struct rb_root *tree);
0146 
0147 Example:
0148 
0149   struct mytype *data = mysearch(&mytree, "walrus");
0150 
0151   if (data) {
0152         rb_erase(&data->node, &mytree);
0153         myfree(data);
0154   }
0155 
0156 To replace an existing node in a tree with a new one with the same key, call:
0157 
0158   void rb_replace_node(struct rb_node *old, struct rb_node *new,
0159                         struct rb_root *tree);
0160 
0161 Replacing a node this way does not re-sort the tree: If the new node doesn't
0162 have the same key as the old node, the rbtree will probably become corrupted.
0163 
0164 Iterating through the elements stored in an rbtree (in sort order)
0165 ------------------------------------------------------------------
0166 
0167 Four functions are provided for iterating through an rbtree's contents in
0168 sorted order.  These work on arbitrary trees, and should not need to be
0169 modified or wrapped (except for locking purposes):
0170 
0171   struct rb_node *rb_first(struct rb_root *tree);
0172   struct rb_node *rb_last(struct rb_root *tree);
0173   struct rb_node *rb_next(struct rb_node *node);
0174   struct rb_node *rb_prev(struct rb_node *node);
0175 
0176 To start iterating, call rb_first() or rb_last() with a pointer to the root
0177 of the tree, which will return a pointer to the node structure contained in
0178 the first or last element in the tree.  To continue, fetch the next or previous
0179 node by calling rb_next() or rb_prev() on the current node.  This will return
0180 NULL when there are no more nodes left.
0181 
0182 The iterator functions return a pointer to the embedded struct rb_node, from
0183 which the containing data structure may be accessed with the container_of()
0184 macro, and individual members may be accessed directly via
0185 rb_entry(node, type, member).
0186 
0187 Example:
0188 
0189   struct rb_node *node;
0190   for (node = rb_first(&mytree); node; node = rb_next(node))
0191         printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
0192 
0193 Support for Augmented rbtrees
0194 -----------------------------
0195 
0196 Augmented rbtree is an rbtree with "some" additional data stored in
0197 each node, where the additional data for node N must be a function of
0198 the contents of all nodes in the subtree rooted at N. This data can
0199 be used to augment some new functionality to rbtree. Augmented rbtree
0200 is an optional feature built on top of basic rbtree infrastructure.
0201 An rbtree user who wants this feature will have to call the augmentation
0202 functions with the user provided augmentation callback when inserting
0203 and erasing nodes.
0204 
0205 C files implementing augmented rbtree manipulation must include
0206 <linux/rbtree_augmented.h> instead of <linux/rbtree.h>. Note that
0207 linux/rbtree_augmented.h exposes some rbtree implementations details
0208 you are not expected to rely on; please stick to the documented APIs
0209 there and do not include <linux/rbtree_augmented.h> from header files
0210 either so as to minimize chances of your users accidentally relying on
0211 such implementation details.
0212 
0213 On insertion, the user must update the augmented information on the path
0214 leading to the inserted node, then call rb_link_node() as usual and
0215 rb_augment_inserted() instead of the usual rb_insert_color() call.
0216 If rb_augment_inserted() rebalances the rbtree, it will callback into
0217 a user provided function to update the augmented information on the
0218 affected subtrees.
0219 
0220 When erasing a node, the user must call rb_erase_augmented() instead of
0221 rb_erase(). rb_erase_augmented() calls back into user provided functions
0222 to updated the augmented information on affected subtrees.
0223 
0224 In both cases, the callbacks are provided through struct rb_augment_callbacks.
0225 3 callbacks must be defined:
0226 
0227 - A propagation callback, which updates the augmented value for a given
0228   node and its ancestors, up to a given stop point (or NULL to update
0229   all the way to the root).
0230 
0231 - A copy callback, which copies the augmented value for a given subtree
0232   to a newly assigned subtree root.
0233 
0234 - A tree rotation callback, which copies the augmented value for a given
0235   subtree to a newly assigned subtree root AND recomputes the augmented
0236   information for the former subtree root.
0237 
0238 The compiled code for rb_erase_augmented() may inline the propagation and
0239 copy callbacks, which results in a large function, so each augmented rbtree
0240 user should have a single rb_erase_augmented() call site in order to limit
0241 compiled code size.
0242 
0243 
0244 Sample usage:
0245 
0246 Interval tree is an example of augmented rb tree. Reference -
0247 "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
0248 More details about interval trees:
0249 
0250 Classical rbtree has a single key and it cannot be directly used to store
0251 interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
0252 lo:hi or to find whether there is an exact match for a new lo:hi.
0253 
0254 However, rbtree can be augmented to store such interval ranges in a structured
0255 way making it possible to do efficient lookup and exact match.
0256 
0257 This "extra information" stored in each node is the maximum hi
0258 (max_hi) value among all the nodes that are its descendants. This
0259 information can be maintained at each node just be looking at the node
0260 and its immediate children. And this will be used in O(log n) lookup
0261 for lowest match (lowest start address among all possible matches)
0262 with something like:
0263 
0264 struct interval_tree_node *
0265 interval_tree_first_match(struct rb_root *root,
0266                           unsigned long start, unsigned long last)
0267 {
0268         struct interval_tree_node *node;
0269 
0270         if (!root->rb_node)
0271                 return NULL;
0272         node = rb_entry(root->rb_node, struct interval_tree_node, rb);
0273 
0274         while (true) {
0275                 if (node->rb.rb_left) {
0276                         struct interval_tree_node *left =
0277                                 rb_entry(node->rb.rb_left,
0278                                          struct interval_tree_node, rb);
0279                         if (left->__subtree_last >= start) {
0280                                 /*
0281                                  * Some nodes in left subtree satisfy Cond2.
0282                                  * Iterate to find the leftmost such node N.
0283                                  * If it also satisfies Cond1, that's the match
0284                                  * we are looking for. Otherwise, there is no
0285                                  * matching interval as nodes to the right of N
0286                                  * can't satisfy Cond1 either.
0287                                  */
0288                                 node = left;
0289                                 continue;
0290                         }
0291                 }
0292                 if (node->start <= last) {              /* Cond1 */
0293                         if (node->last >= start)        /* Cond2 */
0294                                 return node;    /* node is leftmost match */
0295                         if (node->rb.rb_right) {
0296                                 node = rb_entry(node->rb.rb_right,
0297                                         struct interval_tree_node, rb);
0298                                 if (node->__subtree_last >= start)
0299                                         continue;
0300                         }
0301                 }
0302                 return NULL;    /* No match */
0303         }
0304 }
0305 
0306 Insertion/removal are defined using the following augmented callbacks:
0307 
0308 static inline unsigned long
0309 compute_subtree_last(struct interval_tree_node *node)
0310 {
0311         unsigned long max = node->last, subtree_last;
0312         if (node->rb.rb_left) {
0313                 subtree_last = rb_entry(node->rb.rb_left,
0314                         struct interval_tree_node, rb)->__subtree_last;
0315                 if (max < subtree_last)
0316                         max = subtree_last;
0317         }
0318         if (node->rb.rb_right) {
0319                 subtree_last = rb_entry(node->rb.rb_right,
0320                         struct interval_tree_node, rb)->__subtree_last;
0321                 if (max < subtree_last)
0322                         max = subtree_last;
0323         }
0324         return max;
0325 }
0326 
0327 static void augment_propagate(struct rb_node *rb, struct rb_node *stop)
0328 {
0329         while (rb != stop) {
0330                 struct interval_tree_node *node =
0331                         rb_entry(rb, struct interval_tree_node, rb);
0332                 unsigned long subtree_last = compute_subtree_last(node);
0333                 if (node->__subtree_last == subtree_last)
0334                         break;
0335                 node->__subtree_last = subtree_last;
0336                 rb = rb_parent(&node->rb);
0337         }
0338 }
0339 
0340 static void augment_copy(struct rb_node *rb_old, struct rb_node *rb_new)
0341 {
0342         struct interval_tree_node *old =
0343                 rb_entry(rb_old, struct interval_tree_node, rb);
0344         struct interval_tree_node *new =
0345                 rb_entry(rb_new, struct interval_tree_node, rb);
0346 
0347         new->__subtree_last = old->__subtree_last;
0348 }
0349 
0350 static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new)
0351 {
0352         struct interval_tree_node *old =
0353                 rb_entry(rb_old, struct interval_tree_node, rb);
0354         struct interval_tree_node *new =
0355                 rb_entry(rb_new, struct interval_tree_node, rb);
0356 
0357         new->__subtree_last = old->__subtree_last;
0358         old->__subtree_last = compute_subtree_last(old);
0359 }
0360 
0361 static const struct rb_augment_callbacks augment_callbacks = {
0362         augment_propagate, augment_copy, augment_rotate
0363 };
0364 
0365 void interval_tree_insert(struct interval_tree_node *node,
0366                           struct rb_root *root)
0367 {
0368         struct rb_node **link = &root->rb_node, *rb_parent = NULL;
0369         unsigned long start = node->start, last = node->last;
0370         struct interval_tree_node *parent;
0371 
0372         while (*link) {
0373                 rb_parent = *link;
0374                 parent = rb_entry(rb_parent, struct interval_tree_node, rb);
0375                 if (parent->__subtree_last < last)
0376                         parent->__subtree_last = last;
0377                 if (start < parent->start)
0378                         link = &parent->rb.rb_left;
0379                 else
0380                         link = &parent->rb.rb_right;
0381         }
0382 
0383         node->__subtree_last = last;
0384         rb_link_node(&node->rb, rb_parent, link);
0385         rb_insert_augmented(&node->rb, root, &augment_callbacks);
0386 }
0387 
0388 void interval_tree_remove(struct interval_tree_node *node,
0389                           struct rb_root *root)
0390 {
0391         rb_erase_augmented(&node->rb, root, &augment_callbacks);
0392 }