.. _rcu_dereference_doc:

PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
===============================================================

Most of the time, you can use values from rcu_dereference() or one of
the similar primitives without worries. Dereferencing (prefix "*"),
field selection ("->"), assignment ("="), address-of ("&"), addition and
subtraction of constants, and casts all work quite naturally and safely.

It is nevertheless possible to get into trouble with other operations.
Follow these rules to keep your RCU code working properly:

- You must use one of the rcu_dereference() family of primitives
  to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
  will complain. Worse yet, your code can see random memory-corruption
  bugs due to games that compilers and DEC Alpha can play.
  Without one of the rcu_dereference() primitives, compilers
  can reload the value, and won't your code have fun with two
  different values for a single pointer! Without rcu_dereference(),
  DEC Alpha can load a pointer, dereference that pointer, and
  return pre-initialization data, even though the initialization
  preceded the store of the pointer.

  In addition, the volatile cast in rcu_dereference() prevents the
  compiler from deducing the resulting pointer value. Please see
  the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH"
  for an example where the compiler can in fact deduce the exact
  value of the pointer, and thus cause misordering.

- In the special case where data is added but is never removed
  while readers are accessing the structure, READ_ONCE() may be used
  instead of rcu_dereference(). In this case, use of READ_ONCE()
  takes on the role of the lockless_dereference() primitive that
  was removed in v4.15.

- You are only permitted to use rcu_dereference() on pointer values.
  The compiler simply knows too much about integral values to
  trust it to carry dependencies through integer operations.
  There are a very few exceptions, namely that you can temporarily
  cast the pointer to uintptr_t in order to:

  - Set bits and clear bits down in the must-be-zero low-order
    bits of that pointer. This clearly means that the pointer
    must have alignment constraints, for example, this does
    *not* work in general for char* pointers.

  - XOR bits to translate pointers, as is done in some
    classic buddy-allocator algorithms.

  It is important to cast the value back to pointer before
  doing much of anything else with it.

- Avoid cancellation when using the "+" and "-" infix arithmetic
  operators. For example, for a given variable "x", avoid
  "(x-(uintptr_t)x)" for char* pointers. The compiler is within its
  rights to substitute zero for this sort of expression, so that
  subsequent accesses no longer depend on the rcu_dereference(),
  again possibly resulting in bugs due to misordering.

  Of course, if "p" is a pointer from rcu_dereference(), and "a"
  and "b" are integers that happen to be equal, the expression
  "p+a-b" is safe because its value still necessarily depends on
  the rcu_dereference(), thus maintaining proper ordering.

- If you are using RCU to protect JITed functions, so that the
  "()" function-invocation operator is applied to a value obtained
  (directly or indirectly) from rcu_dereference(), you may need to
  interact directly with the hardware to flush instruction caches.
  This issue arises on some systems when a newly JITed function is
  using the same memory that was used by an earlier JITed function.

- Do not use the results from relational operators ("==", "!=",
  ">", ">=", "<", or "<=") when dereferencing. For example,
  the following (quite strange) code is buggy::

      int *p;
      int *q;

      ...

      p = rcu_dereference(gp);
      q = &global_q;
      q += p > &oom_p;
      r1 = *q; /* BUGGY!!! */

  The reason this is buggy is that relational operators are often
  compiled using branches. Although weak-memory machines such as
  ARM or PowerPC do order stores after such branches, they can
  speculate loads, which can result in misordering bugs.

- Be very careful about comparing pointers obtained from
  rcu_dereference() against non-NULL values. As Linus Torvalds
  explained, if the two pointers are equal, the compiler could
  substitute the pointer you are comparing against for the pointer
  obtained from rcu_dereference(). For example::

      p = rcu_dereference(gp);
      if (p == &default_struct)
          do_default(p->a);

  Because the compiler now knows that the value of "p" is exactly
  the address of the variable "default_struct", it is free to
  transform this code into the following::

      p = rcu_dereference(gp);
      if (p == &default_struct)
          do_default(default_struct.a);

  On ARM and Power hardware, the load from "default_struct.a"
  can now be speculated, such that it might happen before the
  rcu_dereference(). This could result in bugs due to misordering.

  However, comparisons are OK in the following cases:

  - The comparison was against the NULL pointer. If the
    compiler knows that the pointer is NULL, you had better
    not be dereferencing it anyway. If the comparison is
    non-equal, the compiler is none the wiser. Therefore,
    it is safe to compare pointers from rcu_dereference()
    against NULL pointers.

  - The pointer is never dereferenced after being compared.
    Since there are no subsequent dereferences, the compiler
    cannot use anything it learned from the comparison
    to reorder the non-existent subsequent dereferences.
    This sort of comparison occurs frequently when scanning
    RCU-protected circular linked lists.

    Note that if checks for being within an RCU read-side
    critical section are not required and the pointer is never
    dereferenced, rcu_access_pointer() should be used in place
    of rcu_dereference().

  - The comparison is against a pointer that references memory
    that was initialized "a long time ago." The reason
    this is safe is that even if misordering occurs, the
    misordering will not affect the accesses that follow
    the comparison. So exactly how long ago is "a long
    time ago"? Here are some possibilities:

    - Compile time.

    - Boot time.

    - Module-init time for module code.

    - Prior to kthread creation for kthread code.

    - During some prior acquisition of the lock that
      we now hold.

    - Before mod_timer() time for a timer handler.

    There are many other possibilities involving the Linux
    kernel's wide array of primitives that cause code to
    be invoked at a later time.

  - The pointer being compared against also came from
    rcu_dereference(). In this case, both pointers depend
    on one rcu_dereference() or another, so you get proper
    ordering either way.

    That said, this situation can make certain RCU usage
    bugs more likely to happen. Which can be a good thing,
    at least if they happen during testing. An example
    of such an RCU usage bug is shown in the section titled
    "EXAMPLE OF AMPLIFIED RCU-USAGE BUG".

  - All of the accesses following the comparison are stores,
    so that a control dependency preserves the needed ordering.
    That said, it is easy to get control dependencies wrong.
    Please see the "CONTROL DEPENDENCIES" section of
    Documentation/memory-barriers.txt for more details.

  - The pointers are not equal *and* the compiler does
    not have enough information to deduce the value of the
    pointer. Note that the volatile cast in rcu_dereference()
    will normally prevent the compiler from knowing too much.

    However, please note that if the compiler knows that the
    pointer takes on only one of two values, a not-equal
    comparison will provide exactly the information that the
    compiler needs to deduce the value of the pointer.

- Disable any value-speculation optimizations that your compiler
  might provide, especially if you are making use of feedback-based
  optimizations that take data collected from prior runs. Such
  value-speculation optimizations reorder operations by design.

  There is one exception to this rule: Value-speculation
  optimizations that leverage the branch-prediction hardware are
  safe on strongly ordered systems (such as x86), but not on weakly
  ordered systems (such as ARM or Power). Choose your compiler
  command-line options wisely!
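The rule above about temporarily casting to uintptr_t can be sketched in
ordinary userspace C. This is only an illustration under stated assumptions:
"struct node", TAG_MASK, tag_ptr(), untag_ptr(), ptr_tag() and read_value()
are hypothetical names, and the argument of read_value() merely stands in
for a value that kernel code would obtain from rcu_dereference(). The point
it tries to show is that the tag bits live only in the must-be-zero
low-order bits of an aligned pointer, and that the value is cast back to a
pointer before anything else is done with it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical element type: 8-byte alignment leaves 3 low-order bits free. */
struct node {
    _Alignas(8) int value;
};

#define TAG_MASK ((uintptr_t)0x7)

/* Set tag bits in the must-be-zero low-order bits of an aligned pointer. */
static inline void *tag_ptr(struct node *p, unsigned int tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0); /* alignment constraint */
    assert(tag <= TAG_MASK);
    return (void *)((uintptr_t)p | tag);
}

/* Extract the tag without touching the pointed-to memory. */
static inline unsigned int ptr_tag(void *tagged)
{
    return (unsigned int)((uintptr_t)tagged & TAG_MASK);
}

/* Clear the tag bits and cast straight back to a pointer. */
static inline struct node *untag_ptr(void *tagged)
{
    return (struct node *)((uintptr_t)tagged & ~TAG_MASK);
}

/* "tagged" stands in for a value returned by rcu_dereference(). */
static int read_value(void *tagged)
{
    struct node *p = untag_ptr(tagged); /* back to pointer before use */

    return p->value;
}
```

The tagged value is carried as "void *" here so that the sketch never holds
a misaligned "struct node *". Note that this trick does not apply to char*
pointers, which have no spare low-order bits.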


EXAMPLE OF AMPLIFIED RCU-USAGE BUG
----------------------------------

Because updaters can run concurrently with RCU readers, RCU readers can
see stale and/or inconsistent values. If RCU readers need fresh or
consistent values, which they sometimes do, they need to take proper
precautions. To see this, consider the following code fragment::

    struct foo {
        int a;
        int b;
        int c;
    };
    struct foo *gp1;
    struct foo *gp2;

    void updater(void)
    {
        struct foo *p;

        p = kmalloc(...);
        if (p == NULL)
            deal_with_it();
        p->a = 42; /* Each field in its own cache line. */
        p->b = 43;
        p->c = 44;
        rcu_assign_pointer(gp1, p);
        p->b = 143;
        p->c = 144;
        rcu_assign_pointer(gp2, p);
    }

    void reader(void)
    {
        struct foo *p;
        struct foo *q;
        int r1, r2;

        p = rcu_dereference(gp2);
        if (p == NULL)
            return;
        r1 = p->b; /* Guaranteed to get 143. */
        q = rcu_dereference(gp1); /* Guaranteed non-NULL. */
        if (p == q) {
            /* The compiler decides that q->c is same as p->c. */
            r2 = p->c; /* Could get 44 on weakly ordered system. */
        }
        do_something_with(r1, r2);
    }

You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible,
but you should not be. After all, the updater might have been invoked
a second time between the time reader() loaded into "r1" and the time
that it loaded into "r2". The fact that this same result can occur due
to some reordering from the compiler and CPUs is beside the point.

But suppose that the reader needs a consistent view?

Then one approach is to use locking, for example, as follows::

    struct foo {
        int a;
        int b;
        int c;
        spinlock_t lock;
    };
    struct foo *gp1;
    struct foo *gp2;

    void updater(void)
    {
        struct foo *p;

        p = kmalloc(...);
        if (p == NULL)
            deal_with_it();
        spin_lock(&p->lock);
        p->a = 42; /* Each field in its own cache line. */
        p->b = 43;
        p->c = 44;
        spin_unlock(&p->lock);
        rcu_assign_pointer(gp1, p);
        spin_lock(&p->lock);
        p->b = 143;
        p->c = 144;
        spin_unlock(&p->lock);
        rcu_assign_pointer(gp2, p);
    }

    void reader(void)
    {
        struct foo *p;
        struct foo *q;
        int r1, r2;

        p = rcu_dereference(gp2);
        if (p == NULL)
            return;
        spin_lock(&p->lock);
        r1 = p->b; /* Guaranteed to get 143. */
        q = rcu_dereference(gp1); /* Guaranteed non-NULL. */
        if (p == q) {
            /* The compiler decides that q->c is same as p->c. */
            r2 = p->c; /* Locking guarantees r2 == 144. */
        }
        spin_unlock(&p->lock);
        do_something_with(r1, r2);
    }

As always, use the right tool for the job!


EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
-----------------------------------------

If a pointer obtained from rcu_dereference() compares not-equal to some
other pointer, the compiler normally has no clue what the value of the
first pointer might be. This lack of knowledge prevents the compiler
from carrying out optimizations that otherwise might destroy the ordering
guarantees that RCU depends on. And the volatile cast in rcu_dereference()
should prevent the compiler from guessing the value.

But without rcu_dereference(), the compiler knows more than you might
expect. Consider the following code fragment::

    struct foo {
        int a;
        int b;
    };
    static struct foo variable1;
    static struct foo variable2;
    static struct foo *gp = &variable1;

    void updater(void)
    {
        initialize_foo(&variable2);
        rcu_assign_pointer(gp, &variable2);
        /*
         * The above is the only store to gp in this translation unit,
         * and the address of gp is not exported in any way.
         */
    }

    int reader(void)
    {
        struct foo *p;

        p = gp;
        barrier();
        if (p == &variable1)
            return p->a; /* Must be variable1.a. */
        else
            return p->b; /* Must be variable2.b. */
    }

Because the compiler can see all stores to "gp", it knows that the only
possible values of "gp" are "variable1" on the one hand and "variable2"
on the other. The comparison in reader() therefore tells the compiler
the exact value of "p" even in the not-equals case. This allows the
compiler to make the return values independent of the load from "gp",
in turn destroying the ordering between this load and the loads of the
return values. This can result in "p->b" returning pre-initialization
garbage values.

In short, rcu_dereference() is *not* optional when you are going to
dereference the resulting pointer.
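As a rough userspace illustration of the repaired reader, the sketch below
replaces the plain "p = gp" load with a volatile load. The names load_once()
and publish() are assumptions made for this sketch: they are simplified
stand-ins for the volatile cast inside rcu_dereference() and for
rcu_assign_pointer(), not the kernel macros themselves, and they rely on
GCC/Clang extensions (__typeof__ and the __atomic builtins). The field
values are likewise made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int a;
    int b;
};

static struct foo variable1 = { .a = 1, .b = 2 };
static struct foo variable2;
static struct foo *gp = &variable1;

/*
 * Stand-in for the volatile cast in rcu_dereference(): forces exactly one
 * load whose result the compiler may not assume. Not the kernel macro.
 */
#define load_once(x) (*(volatile __typeof__(x) *)&(x))

/* Stand-in for rcu_assign_pointer(): a release store. Not the kernel macro. */
#define publish(p, v) __atomic_store_n(&(p), (v), __ATOMIC_RELEASE)

static void updater(void)
{
    variable2.a = 10; /* initialization happens before publication */
    variable2.b = 20;
    publish(gp, &variable2);
}

static int reader(void)
{
    struct foo *p = load_once(gp); /* instead of the plain "p = gp" */

    if (p == &variable1)
        return p->a; /* variable1 was initialized at compile time */
    else
        return p->b; /* compiler cannot deduce p here, so it must use p */
}
```

In the equal case the compiler may still substitute "variable1", but that is
harmless because "variable1" was initialized "a long time ago". In the
not-equal case the volatile load leaves the compiler unable to conclude that
"p" must be "&variable2", so the access continues to depend on the loaded
pointer.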


WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
------------------------------------------------------------

First, please avoid using rcu_dereference_raw() and also please avoid
using rcu_dereference_check() and rcu_dereference_protected() with a
second argument with a constant value of 1 (or true, for that matter).
With that caution out of the way, here is some guidance for which
member of the rcu_dereference() family to use in various situations:

1. If the access needs to be within an RCU read-side critical
   section, use rcu_dereference(). With the new consolidated
   RCU flavors, an RCU read-side critical section is entered
   using rcu_read_lock(), anything that disables bottom halves,
   anything that disables interrupts, or anything that disables
   preemption.

2. If the access might be within an RCU read-side critical section
   on the one hand, or protected by (say) my_lock on the other,
   use rcu_dereference_check(), for example::

       p1 = rcu_dereference_check(p->rcu_protected_pointer,
                                  lockdep_is_held(&my_lock));

3. If the access might be within an RCU read-side critical section
   on the one hand, or protected by either my_lock or your_lock on
   the other, again use rcu_dereference_check(), for example::

       p1 = rcu_dereference_check(p->rcu_protected_pointer,
                                  lockdep_is_held(&my_lock) ||
                                  lockdep_is_held(&your_lock));

4. If the access is on the update side, so that it is always protected
   by my_lock, use rcu_dereference_protected()::

       p1 = rcu_dereference_protected(p->rcu_protected_pointer,
                                      lockdep_is_held(&my_lock));

   This can be extended to handle multiple locks as in #3 above,
   and both can be extended to check other conditions as well.

5. If the protection is supplied by the caller, and is thus unknown
   to this code, that is the rare case when rcu_dereference_raw()
   is appropriate. In addition, rcu_dereference_raw() might be
   appropriate when the lockdep expression would be excessively
   complex, except that a better approach in that case might be to
   take a long hard look at your synchronization design. Still,
   there are data-locking cases where any one of a very large number
   of locks or reference counters suffices to protect the pointer,
   so rcu_dereference_raw() does have its place.

   However, its place is probably quite a bit smaller than one
   might expect given the number of uses in the current kernel.
   Ditto for its synonym, rcu_dereference_check( ... , 1), and
   its close relative, rcu_dereference_protected(... , 1).
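The difference between cases 2 and 4 above can be modeled in userspace.
Everything in this sketch is hypothetical: the flags in_rcu_reader and
my_lock_held stand in for what lockdep would know, "violations" stands in
for lockdep complaints, and the sketch_*() macros only imitate the checking
behavior as described above (the _check() form accepts either an RCU reader
or the supplied condition, the _protected() form accepts only the supplied
condition). They are not the kernel implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct foo {
    int a;
};

/* Hypothetical stand-ins for state that lockdep would track. */
static bool in_rcu_reader; /* models being in an RCU read-side section */
static bool my_lock_held;  /* models lockdep_is_held(&my_lock) */
static int violations;     /* models the number of lockdep complaints */

static void note(bool ok)
{
    if (!ok)
        violations++;
}

/* Volatile load, standing in for the load done by rcu_dereference(). */
#define load_once(x) (*(volatile __typeof__(x) *)&(x))

/* Reader or lock holder: the condition is OR-ed with "in a reader". */
#define sketch_dereference_check(p, c) \
    (note(in_rcu_reader || (c)), load_once(p))

/* Update side only: being in a reader is not enough; plain load. */
#define sketch_dereference_protected(p, c) \
    (note(c), (p))

static struct foo item = { .a = 42 };
static struct foo *gp = &item;

static int read_a_either_way(void)
{
    return sketch_dereference_check(gp, my_lock_held)->a;
}

static int read_a_update_side(void)
{
    return sketch_dereference_protected(gp, my_lock_held)->a;
}
```

In this model, calling read_a_update_side() from a reader that does not hold
the lock counts as a violation, whereas read_a_either_way() is satisfied by
either form of protection.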


SPARSE CHECKING OF RCU-PROTECTED POINTERS
-----------------------------------------

The sparse static-analysis tool checks for direct access to RCU-protected
pointers, which can result in "interesting" bugs due to compiler
optimizations involving invented loads and perhaps also load tearing.
For example, suppose someone mistakenly does something like this::

    p = q->rcu_protected_pointer;
    do_something_with(p->a);
    do_something_else_with(p->b);

If register pressure is high, the compiler might optimize "p" out
of existence, transforming the code to something like this::

    do_something_with(q->rcu_protected_pointer->a);
    do_something_else_with(q->rcu_protected_pointer->b);

This could fatally disappoint your code if q->rcu_protected_pointer
changed in the meantime. Nor is this a theoretical problem: Exactly
this sort of bug cost Paul E. McKenney (and several of his innocent
colleagues) a three-day weekend back in the early 1990s.

Load tearing could of course result in dereferencing a mashup of a pair
of pointers, which also might fatally disappoint your code.

These problems could have been avoided simply by making the code instead
read as follows::

    p = rcu_dereference(q->rcu_protected_pointer);
    do_something_with(p->a);
    do_something_else_with(p->b);

Unfortunately, these sorts of bugs can be extremely hard to spot during
review. This is where the sparse tool comes into play, along with the
"__rcu" marker. If you mark a pointer declaration, whether in a structure
or as a formal parameter, with "__rcu", sparse will complain if
this pointer is accessed directly. It will also cause sparse to complain
if a pointer not marked with "__rcu" is accessed using rcu_dereference()
and friends. For example, ->rcu_protected_pointer might be declared as
follows::

    struct foo __rcu *rcu_protected_pointer;

Use of "__rcu" is opt-in. If you choose not to use it, then you should
ignore the sparse warnings.
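To make the annotation concrete, here is a small userspace sketch of how a
"__rcu"-marked member and its accessor fit together. It rests on several
assumptions: the macro definitions below only approximate the general shape
of the kernel's annotations (attributes when sparse defines __CHECKER__,
nothing otherwise), "struct bar", sketch_rcu_dereference() and sum_fields()
are hypothetical names, and the accessor omits the lockdep checks and other
details of the real rcu_dereference().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-ins: under sparse the markers become attributes,
 * for an ordinary compiler they expand to nothing.
 */
#ifdef __CHECKER__
# define __rcu   __attribute__((noderef, address_space(__rcu)))
# define __force __attribute__((force))
#else
# define __rcu
# define __force
#endif

struct foo {
    int a;
    int b;
};

/* Hypothetical container holding the RCU-protected pointer. */
struct bar {
    struct foo __rcu *rcu_protected_pointer;
};

/*
 * Sketch of an accessor: one volatile load, then a forced cast that
 * drops the "__rcu" marking so the result may be dereferenced.
 */
#define sketch_rcu_dereference(p) \
    ((__typeof__(*(p)) __force *)(*(volatile __typeof__(p) *)&(p)))

static int sum_fields(struct bar *q)
{
    /* Load the pointer once, then use only the local copy. */
    struct foo *p = sketch_rcu_dereference(q->rcu_protected_pointer);

    return p->a + p->b;
}
```

With a marking of this kind, a direct "q->rcu_protected_pointer->a" is the
sort of access that sparse is expected to flag, while an ordinary compiler
sees no difference.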