.. SPDX-License-Identifier: GPL-2.0

=================================================
Using RCU hlist_nulls to protect list and objects
=================================================

This section describes how to use hlist_nulls to
protect read-mostly linked lists and
objects using SLAB_TYPESAFE_BY_RCU allocations.

Please read the basics in listRCU.rst.

Using 'nulls'
=============

Using special markers (called 'nulls') is a convenient way
to solve the following problem:

A typical RCU linked list managing objects which are
allocated from a SLAB_TYPESAFE_BY_RCU kmem_cache can
use the following algorithms.

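For concreteness, the snippets in this document assume a hypothetical
object type and cache along these lines (struct obj, obj_cache_init()
and the field names are illustrative only, not part of any kernel API)::

  /* hypothetical object used by the examples in this document */
  struct obj {
          struct hlist_node  obj_node;  /* struct hlist_nulls_node in the
                                         * hlist_nulls variant below */
          atomic_t           refcnt;
          unsigned int       key;
  };

  static struct kmem_cache *cachep;

  static int __init obj_cache_init(void)
  {
          /* objects may be reused (not freed) before an RCU grace period */
          cachep = kmem_cache_create("obj", sizeof(struct obj), 0,
                                     SLAB_TYPESAFE_BY_RCU, NULL);
          return cachep ? 0 : -ENOMEM;
  }
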
1) Lookup algorithm
-------------------

::

  rcu_read_lock();
  begin:
  obj = lockless_lookup(key);
  if (obj) {
    if (!try_get_ref(obj)) // might fail for free objects
      goto begin;
    /*
     * Because a writer could delete the object, and a writer could
     * reuse this object before the RCU grace period, we
     * must check the key after getting the reference on the object.
     */
    if (obj->key != key) { // not the object we expected
      put_ref(obj);
      goto begin;
    }
  }
  rcu_read_unlock();

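The try_get_ref() and put_ref() helpers above are not kernel APIs; they
stand for whatever reference counting the objects use.  A minimal sketch,
assuming the hypothetical struct obj shown earlier and a release_obj()
helper that performs the removal shown in "3) Remove algorithm" below,
might be::

  static bool try_get_ref(struct obj *obj)
  {
          /*
           * Must fail once the last reference is gone, so that a reader
           * cannot resurrect an object that is being freed or reused.
           */
          return atomic_inc_not_zero(&obj->refcnt);
  }

  static void put_ref(struct obj *obj)
  {
          if (atomic_dec_and_test(&obj->refcnt))
                  release_obj(obj);  /* see "3) Remove algorithm" below */
  }
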
Beware that lockless_lookup(key) cannot use the traditional
hlist_for_each_entry_rcu(), but rather a version with an additional
memory barrier (smp_rmb())::

  lockless_lookup(key)
  {
    struct hlist_node *pos, *next;
    for (pos = rcu_dereference((head)->first);
         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
         ({ obj = hlist_entry(pos, typeof(*obj), member); 1; });
         pos = rcu_dereference(next))
      if (obj->key == key)
        return obj;
    return NULL;
  }

And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::

  struct hlist_node *pos;
  for (pos = rcu_dereference((head)->first);
       pos && ({ prefetch(pos->next); 1; }) &&
       ({ obj = hlist_entry(pos, typeof(*obj), member); 1; });
       pos = rcu_dereference(pos->next))
    if (obj->key == key)
      return obj;
  return NULL;

Quoting Corey Minyard::

  "If the object is moved from one list to another list in-between the
  time the hash is calculated and the next field is accessed, and the
  object has moved to the end of a new list, the traversal will not
  complete properly on the list it should have, since the object will
  be on the end of the new list and there's not a way to tell it's on a
  new list and restart the list traversal. I think that this can be
  solved by pre-fetching the "next" field (with proper barriers) before
  checking the key."

2) Insert algorithm
-------------------

We need to make sure a reader cannot read the new 'obj->obj_node.next'
value and the previous value of 'obj->key'.  Otherwise, an item could be
deleted from a chain and inserted into another chain.  If the new chain
was empty before the move, the 'next' pointer is NULL, and the lockless
reader cannot detect that it missed the following items of the original
chain.

::

  /*
   * Please note that new inserts are done at the head of the list,
   * not in the middle or end.
   */
  obj = kmem_cache_alloc(...);
  lock_chain(); // typically a spin_lock()
  obj->key = key;
  /*
   * we need to make sure obj->key is updated before obj->obj_node.next
   * or obj->refcnt
   */
  smp_wmb();
  atomic_set(&obj->refcnt, 1);
  hlist_add_head_rcu(&obj->obj_node, list);
  unlock_chain(); // typically a spin_unlock()

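lock_chain() and unlock_chain() in these snippets stand for whatever
serializes writers on a given hash chain; they are not kernel APIs.  A
minimal sketch, assuming a single lock for brevity (a real hash table
would typically use one lock per chain or per group of chains)::

  static DEFINE_SPINLOCK(chain_lock);

  static void lock_chain(void)
  {
          spin_lock(&chain_lock);
  }

  static void unlock_chain(void)
  {
          spin_unlock(&chain_lock);
  }
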
3) Remove algorithm
-------------------

Nothing special here, we can use a standard RCU hlist deletion.
But thanks to SLAB_TYPESAFE_BY_RCU, beware that a deleted object can be
reused very quickly (before the end of the RCU grace period)::

  if (put_last_reference_on(obj)) {
    lock_chain(); // typically a spin_lock()
    hlist_del_init_rcu(&obj->obj_node);
    unlock_chain(); // typically a spin_unlock()
    kmem_cache_free(cachep, obj);
  }


--------------------------------------------------------------------------

Avoiding extra smp_rmb()
========================

With hlist_nulls we can avoid the extra smp_rmb() in lockless_lookup().

For example, if we choose to store the slot number as the 'nulls'
end-of-list marker for each slot of the hash table, we can detect
a race (some writer did a delete and/or a move of an object
to another chain) by checking the final 'nulls' value if
the lookup met the end of the chain.  If the final 'nulls' value
is not the slot number, then we must restart the lookup at
the beginning.  If the object was moved to the same chain,
then the reader doesn't care: it might occasionally
scan the list again without harm.

Note that using hlist_nulls means the type of the 'obj_node' field
of 'struct obj' becomes 'struct hlist_nulls_node'.

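A table laid out this way might be initialized as follows (the table
size and names are illustrative; INIT_HLIST_NULLS_HEAD() stores the
given value as the chain's 'nulls' end marker)::

  #include <linux/list_nulls.h>

  #define OBJ_TABLE_SIZE 256

  static struct hlist_nulls_head table[OBJ_TABLE_SIZE];

  static void table_init(void)
  {
          unsigned int slot;

          /* each chain ends with a 'nulls' marker equal to its slot */
          for (slot = 0; slot < OBJ_TABLE_SIZE; slot++)
                  INIT_HLIST_NULLS_HEAD(&table[slot], slot);
  }
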
1) Lookup algorithm
-------------------

::

  head = &table[slot];
  rcu_read_lock();
  begin:
  hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
    if (obj->key == key) {
      if (!try_get_ref(obj)) // might fail for free objects
        goto begin;
      if (obj->key != key) { // not the object we expected
        put_ref(obj);
        goto begin;
      }
      goto out;
    }
  }
  /*
   * if the nulls value we got at the end of this lookup is
   * not the expected one, we must restart the lookup.
   * We probably met an item that was moved to another chain.
   */
  if (get_nulls_value(node) != slot)
    goto begin;
  obj = NULL;

  out:
  rcu_read_unlock();

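Putting the pieces together, a hypothetical lookup helper built from the
snippet above could look like this (obj_lookup(), hash_key() and the
table are illustrative names; on success the object is returned with a
reference held)::

  static struct obj *obj_lookup(unsigned int key)
  {
          struct obj *obj;
          struct hlist_nulls_node *node;
          unsigned int slot = hash_key(key);  /* maps key to a slot index */
          struct hlist_nulls_head *head = &table[slot];

          rcu_read_lock();
  begin:
          hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
                  if (obj->key == key) {
                          if (!try_get_ref(obj))  /* might fail for free objects */
                                  goto begin;
                          if (obj->key != key) {  /* not the object we expected */
                                  put_ref(obj);
                                  goto begin;
                          }
                          goto out;
                  }
          }
          /* end-of-list marker from another chain: restart */
          if (get_nulls_value(node) != slot)
                  goto begin;
          obj = NULL;
  out:
          rcu_read_unlock();
          return obj;
  }
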
2) Insert function
------------------

::

  /*
   * Please note that new inserts are done at the head of the list,
   * not in the middle or end.
   */
  obj = kmem_cache_alloc(cachep);
  lock_chain(); // typically a spin_lock()
  obj->key = key;
  /*
   * changes to obj->key must be visible before the refcnt change
   */
  smp_wmb();
  atomic_set(&obj->refcnt, 1);
  /*
   * insert obj in RCU way (readers might be traversing chain)
   */
  hlist_nulls_add_head_rcu(&obj->obj_node, list);
  unlock_chain(); // typically a spin_unlock()