.. SPDX-License-Identifier: GPL-2.0

=================================================
Using RCU hlist_nulls to protect list and objects
=================================================

This section describes how to use hlist_nulls to
protect read-mostly linked lists and
objects using SLAB_TYPESAFE_BY_RCU allocations.

Please read the basics in listRCU.rst.
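
To make the examples in this document concrete, here is a minimal
sketch of the kind of object and cache they assume. The 'struct my_obj'
and 'my_cachep' names and the field layout are hypothetical, not part
of any kernel API; kmem_cache_create() and SLAB_TYPESAFE_BY_RCU are the
real interfaces. The second part of this document would use a
struct hlist_nulls_node for the linkage instead.

::

  struct my_obj {
    u32               key;      /* lookup key checked by readers */
    atomic_t          refcnt;   /* taken with an "inc unless zero" primitive */
    struct hlist_node obj_node; /* linkage into one hash chain */
  };

  /*
   * With SLAB_TYPESAFE_BY_RCU, a freed object may be reused for a new
   * allocation before an RCU grace period has elapsed; only the
   * underlying slab pages are RCU-freed.
   */
  my_cachep = kmem_cache_create("my_obj", sizeof(struct my_obj), 0,
                                SLAB_TYPESAFE_BY_RCU, NULL);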

Using 'nulls'
=============

Using special markers (called 'nulls') is a convenient way
to solve the following problem:

A typical RCU linked list managing objects which are
allocated from a SLAB_TYPESAFE_BY_RCU kmem_cache can
use the following algorithms:

1) Lookup algorithm
-------------------

::

  rcu_read_lock();
  begin:
  obj = lockless_lookup(key);
  if (obj) {
    if (!try_get_ref(obj)) // might fail for free objects
      goto begin;
    /*
     * Because a writer could delete the object, and another writer
     * could reuse it before the RCU grace period expires, we must
     * check the key after getting the reference on the object.
     */
    if (obj->key != key) { // not the object we expected
      put_ref(obj);
      goto begin;
    }
  }
  rcu_read_unlock();

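The try_get_ref() and put_ref() helpers above are not a kernel API. A
minimal sketch of try_get_ref(), assuming the hypothetical 'struct my_obj'
above: it must take a reference only if the refcount is not already zero,
so that it fails for objects whose last reference has been dropped (and
which may already have been freed and reused)::

  static bool try_get_ref(struct my_obj *obj)
  {
    /* Fails once the refcount has dropped to zero. */
    return atomic_inc_not_zero(&obj->refcnt);
  }
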
Beware that lockless_lookup(key) cannot use the traditional
hlist_for_each_entry_rcu(), but needs a version with an additional
memory barrier (smp_rmb())

::

  lockless_lookup(key)
  {
    struct hlist_node *pos, *next;

    for (pos = rcu_dereference((head)->first);
         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
         ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
         pos = rcu_dereference(next))
      if (obj->key == key)
        return obj;
    return NULL;
  }

And note that the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::

  struct hlist_node *pos;

  for (pos = rcu_dereference((head)->first);
       pos && ({ prefetch(pos->next); 1; }) &&
       ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
       pos = rcu_dereference(pos->next))
    if (obj->key == key)
      return obj;
  return NULL;

Quoting Corey Minyard::

  "If the object is moved from one list to another list in-between the
  time the hash is calculated and the next field is accessed, and the
  object has moved to the end of a new list, the traversal will not
  complete properly on the list it should have, since the object will
  be on the end of the new list and there's not a way to tell it's on a
  new list and restart the list traversal. I think that this can be
  solved by pre-fetching the "next" field (with proper barriers) before
  checking the key."

2) Insert algorithm
-------------------

We need to make sure a reader cannot read the new 'obj->obj_node.next'
value and the previous value of 'obj->key'. Otherwise, an item could be
deleted from a chain and inserted into another chain. If the new chain
was empty before the move, the 'next' pointer is NULL, and a lockless
reader cannot detect that it missed the following items of the original
chain.

::

  /*
   * Please note that new inserts are done at the head of list,
   * not in the middle or end.
   */
  obj = kmem_cache_alloc(...);
  lock_chain(); // typically a spin_lock()
  obj->key = key;
  /*
   * we need to make sure obj->key is updated before obj->obj_node.next
   * or obj->refcnt
   */
  smp_wmb();
  atomic_set(&obj->refcnt, 1);
  hlist_add_head_rcu(&obj->obj_node, list);
  unlock_chain(); // typically a spin_unlock()


3) Remove algorithm
-------------------

Nothing special here, we can use a standard RCU hlist deletion.
But, thanks to SLAB_TYPESAFE_BY_RCU, beware that a deleted object
can be reused very quickly (before the end of the RCU grace period).

::

  if (put_last_reference_on(obj)) {
    lock_chain(); // typically a spin_lock()
    hlist_del_init_rcu(&obj->obj_node);
    unlock_chain(); // typically a spin_unlock()
    kmem_cache_free(cachep, obj);
  }


Avoiding extra smp_rmb()
========================

With hlist_nulls we can avoid the extra smp_rmb() in lockless_lookup().
(The insert function below still needs its smp_wmb(), but only to order
'obj->key' against 'obj->refcnt'.)

For example, if we choose to store the slot number as the 'nulls'
end-of-list marker for each slot of the hash table, we can detect
a race (some writer did a delete and/or a move of an object
to another chain) by checking the final 'nulls' value when
the lookup meets the end of a chain. If the final 'nulls' value
is not the slot number, then we must restart the lookup at
the beginning. If the object was moved to the same chain,
then the reader doesn't care: it might occasionally
scan the list again without harm.

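A minimal sketch of such an initialization, assuming the hash table used
in the examples below ('table' and 'TABLE_SIZE' are hypothetical names;
INIT_HLIST_NULLS_HEAD() is the real initializer and takes the 'nulls'
value as its second argument)::

  struct hlist_nulls_head table[TABLE_SIZE];

  for (i = 0; i < TABLE_SIZE; i++)
    INIT_HLIST_NULLS_HEAD(&table[i], i); /* chain i ends with nulls value i */
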
1) Lookup algorithm
-------------------

::

  head = &table[slot];
  rcu_read_lock();
  begin:
  hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
    if (obj->key == key) {
      if (!try_get_ref(obj)) // might fail for free objects
        goto begin;
      if (obj->key != key) { // not the object we expected
        put_ref(obj);
        goto begin;
      }
      goto out;
    }
  }
  /*
   * If the nulls value we got at the end of this lookup is
   * not the expected one, we must restart the lookup.
   * We probably met an item that was moved to another chain.
   */
  if (get_nulls_value(node) != slot)
    goto begin;
  obj = NULL;

  out:
  rcu_read_unlock();

2) Insert function
------------------

::

  /*
   * Please note that new inserts are done at the head of list,
   * not in the middle or end.
   */
  obj = kmem_cache_alloc(cachep);
  lock_chain(); // typically a spin_lock()
  obj->key = key;
  /*
   * changes to obj->key must be visible before refcnt one
   */
  smp_wmb();
  atomic_set(&obj->refcnt, 1);
  /*
   * insert obj in RCU way (readers might be traversing chain)
   */
  hlist_nulls_add_head_rcu(&obj->obj_node, list);
  unlock_chain(); // typically a spin_unlock()