.. SPDX-License-Identifier: GPL-2.0

====================================================================
Reference-count design for elements of lists/arrays protected by RCU
====================================================================


Please note that the percpu-ref feature is likely your first
stop if you need to combine reference counts and RCU. Please see
include/linux/percpu-refcount.h for more information. However, in
those unusual cases where percpu-ref would consume too much memory,
please read on.
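
As a quick orientation, here is a minimal sketch of the percpu-ref
pattern. The struct foo, its foo_release() function, and the error
handling are hypothetical; see include/linux/percpu-refcount.h for the
authoritative API::

 #include <linux/percpu-refcount.h>
 #include <linux/slab.h>

 struct foo {
     struct percpu_ref ref;
     /* ... payload ... */
 };

 static void foo_release(struct percpu_ref *ref)
 {
     struct foo *foo = container_of(ref, struct foo, ref);

     kfree(foo);  /* Runs once the last reference has been dropped. */
 }

 /* At creation time, the initial reference belongs to the creator. */
 if (percpu_ref_init(&foo->ref, foo_release, 0, GFP_KERNEL))
     return -ENOMEM;

 /* Acquire and release references via the per-CPU fast path. */
 percpu_ref_get(&foo->ref);
 percpu_ref_put(&foo->ref);

 /* At teardown time, drop the initial reference. */
 percpu_ref_kill(&foo->ref);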

------------------------------------------------------------------------

Reference counting on elements of lists that are protected by traditional
reader/writer spinlocks or semaphores is straightforward:

CODE LISTING A::

 1.                                      2.
 add()                                   search_and_reference()
 {                                       {
     alloc_object                            read_lock(&list_lock);
     ...                                     search_for_element
     atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
     write_lock(&list_lock);                 ...
     add_element                             read_unlock(&list_lock);
     ...                                     ...
     write_unlock(&list_lock);           }
 }

 3.                                      4.
 release_referenced()                    delete()
 {                                       {
     ...                                     write_lock(&list_lock);
     if (atomic_dec_and_test(&el->rc))       ...
         kfree(el);                          remove_element
     ...                                     write_unlock(&list_lock);
 }                                           ...
                                             if (atomic_dec_and_test(&el->rc))
                                                 kfree(el);
                                             ...
                                         }

If this list/array is made lock-free using RCU, by changing the
write_lock() in add() and delete() to spin_lock() and changing the
read_lock() in search_and_reference() to rcu_read_lock(), the
atomic_inc() in search_and_reference() could potentially take a
reference to an element that has already been deleted from the
list/array. In this scenario, use atomic_inc_not_zero() as follows:

CODE LISTING B::

 1.                                      2.
 add()                                   search_and_reference()
 {                                       {
     alloc_object                            rcu_read_lock();
     ...                                     search_for_element
     atomic_set(&el->rc, 1);                 if (!atomic_inc_not_zero(&el->rc)) {
     spin_lock(&list_lock);                      rcu_read_unlock();
                                                 return FAIL;
     add_element                             }
     ...                                     ...
     spin_unlock(&list_lock);                rcu_read_unlock();
 }                                       }
 3.                                      4.
 release_referenced()                    delete()
 {                                       {
     ...                                     spin_lock(&list_lock);
     if (atomic_dec_and_test(&el->rc))       ...
         call_rcu(&el->head, el_free);       remove_element
     ...                                     spin_unlock(&list_lock);
 }                                           ...
                                             if (atomic_dec_and_test(&el->rc))
                                                 call_rcu(&el->head, el_free);
                                             ...
                                         }
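
For concreteness, the reader side of listing B might be fleshed out as
shown below. The struct element and its key field are hypothetical, and
the list is assumed to be a list_head-based RCU-protected list traversed
with list_for_each_entry_rcu()::

 struct element {
     struct list_head list;
     struct rcu_head head;
     atomic_t rc;
     int key;
 };

 static struct element *search_and_reference(struct list_head *head, int key)
 {
     struct element *el;

     rcu_read_lock();
     list_for_each_entry_rcu(el, head, list) {
         if (el->key != key)
             continue;
         /*
          * A failed increment means that delete() has already
          * dropped the initial reference, so act as if the
          * element had not been found.
          */
         if (!atomic_inc_not_zero(&el->rc))
             break;
         rcu_read_unlock();
         return el;
     }
     rcu_read_unlock();
     return NULL;
 }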

Sometimes, a reference to the element needs to be obtained in the
update (write) stream. In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock. One might instead
use atomic_inc() in such cases.

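For example, an updater already holding list_lock might take a reference
as follows, where __search_for_element() is a hypothetical lookup helper
that runs with the lock held::

 spin_lock(&list_lock);
 el = __search_for_element(key);
 if (el) {
     /*
      * The element is still on the list and list_lock prevents a
      * concurrent delete(), so add()'s initial reference guarantees
      * that el->rc is nonzero and a plain atomic_inc() suffices.
      */
     atomic_inc(&el->rc);
 }
 spin_unlock(&list_lock);
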
It is not always convenient to deal with "FAIL" in the
search_and_reference() code path. In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:

CODE LISTING C::

 1.                                      2.
 add()                                   search_and_reference()
 {                                       {
     alloc_object                            rcu_read_lock();
     ...                                     search_for_element
     atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
     spin_lock(&list_lock);                  ...

     add_element                             rcu_read_unlock();
     ...                                 }
     spin_unlock(&list_lock);            4.
 }                                       delete()
 3.                                      {
 release_referenced()                        spin_lock(&list_lock);
 {                                           ...
     ...                                     remove_element
     if (atomic_dec_and_test(&el->rc))       spin_unlock(&list_lock);
         kfree(el);                          ...
     ...                                     call_rcu(&el->head, el_free);
 }                                           ...
 5.                                      }
 void el_free(struct rcu_head *rhp)
 {
     release_referenced();
 }

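Because el_free() receives a pointer to the rcu_head embedded in the
element, a concrete version would recover the enclosing element with
container_of(). Again assuming the hypothetical struct element shown
earlier::

 static void el_free(struct rcu_head *rhp)
 {
     struct element *el = container_of(rhp, struct element, head);

     /*
      * Drop add()'s initial reference. A grace period has elapsed
      * since the element was removed, so el->rc can no longer be
      * incremented by new readers.
      */
     if (atomic_dec_and_test(&el->rc))
         kfree(el);
 }
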
The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal. This means that
search_and_reference() cannot find this element, which means that the value
of el->rc cannot increase. Thus, once it reaches zero, there are no
readers that can or ever will be able to reference the element. The
element can therefore safely be freed. This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.

A clear advantage of the RCU-based pattern in listing C over the one
in listing B is that any call to search_and_reference() that locates
a given object will succeed in obtaining a reference to that object,
even given a concurrent invocation of delete() for that same object.
Similarly, a clear advantage of both listings B and C over listing A is
that a call to delete() is not delayed even if there are an arbitrarily
large number of calls to search_and_reference() searching for the same
object that delete() was invoked on. Instead, all that is delayed is
the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.

In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete() as follows::

 4.
 delete()
 {
     spin_lock(&list_lock);
     ...
     remove_element
     spin_unlock(&list_lock);
     ...
     synchronize_rcu();
     if (atomic_dec_and_test(&el->rc))
         kfree(el);
     ...
 }

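Assuming the hypothetical struct element shown earlier, with
remove_element realized as list_del_rcu(), this variant of delete()
might look like::

 static void delete(struct element *el)
 {
     spin_lock(&list_lock);
     list_del_rcu(&el->list);  /* Unlink; readers may still see el. */
     spin_unlock(&list_lock);
     synchronize_rcu();        /* Wait for pre-existing readers. */
     if (atomic_dec_and_test(&el->rc))
         kfree(el);
 }
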
As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by
struct posix_acl.