.. _rcu_dereference_doc:

PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
===============================================================

Most of the time, you can use values from rcu_dereference() or one of
the similar primitives without worries.  Dereferencing (prefix "*"),
field selection ("->"), assignment ("="), address-of ("&"), addition and
subtraction of constants, and casts all work quite naturally and safely.

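The safe operations listed above can be seen together in a minimal
userspace sketch.  The rcu_dereference() macro below is only a stand-in
built from a volatile load, not the kernel's definition, and the names
"struct item", "gp_item", and sum_fields() are hypothetical, chosen for
illustration.

```c
#include <stddef.h>

/* Stand-in for the kernel primitive: one volatile load (assumption). */
#define rcu_dereference(p) (*(__typeof__(p) volatile *)&(p))

/* Hypothetical RCU-protected structure and global pointer. */
struct item {
	int a;
	int vals[4];
};

static struct item item0 = { .a = 1, .vals = { 10, 20, 30, 40 } };
static struct item *gp_item = &item0;

/* Every step keeps the result dependent on the loaded pointer. */
static int sum_fields(void)
{
	struct item *p = rcu_dereference(gp_item);
	struct item *copy = p;			/* assignment ("=") */
	int *v = &copy->vals[0];		/* "->" and address-of ("&") */
	int *third = v + 2;			/* addition of a constant */
	const void *raw = (const void *)copy;	/* cast */

	/* Dereference (prefix "*"); comparing against NULL is also safe. */
	return (*copy).a + *third + (raw != NULL);
}
```

In kernel code the same sequence would sit inside an RCU read-side
critical section; the sketch leaves that out because it runs
single-threaded.
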
It is nevertheless possible to get into trouble with other operations.
Follow these rules to keep your RCU code working properly:

-       You must use one of the rcu_dereference() family of primitives
        to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
        will complain.  Worse yet, your code can see random memory-corruption
        bugs due to games that compilers and DEC Alpha can play.
        Without one of the rcu_dereference() primitives, compilers
        can reload the value, and won't your code have fun with two
        different values for a single pointer!  Without rcu_dereference(),
        DEC Alpha can load a pointer, dereference that pointer, and
        return pre-initialization data, even though the initialization
        preceded the store of the pointer.

        In addition, the volatile cast in rcu_dereference() prevents the
        compiler from deducing the resulting pointer value.  Please see
        the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH"
        for an example where the compiler can in fact deduce the exact
        value of the pointer, and thus cause misordering.

-       In the special case where data is added but is never removed
        while readers are accessing the structure, READ_ONCE() may be used
        instead of rcu_dereference().  In this case, use of READ_ONCE()
        takes on the role of the lockless_dereference() primitive that
        was removed in v4.15.

-       You are only permitted to use rcu_dereference() on pointer values.
        The compiler simply knows too much about integral values to
        trust it to carry dependencies through integer operations.
        There are a very few exceptions, namely that you can temporarily
        cast the pointer to uintptr_t in order to:

        -       Set bits and clear bits down in the must-be-zero low-order
                bits of that pointer.  This clearly means that the pointer
                must have alignment constraints; for example, this does
                *not* work in general for char* pointers.

        -       XOR bits to translate pointers, as is done in some
                classic buddy-allocator algorithms.

        It is important to cast the value back to pointer before
        doing much of anything else with it.

-       Avoid cancellation when using the "+" and "-" infix arithmetic
        operators.  For example, for a given variable "x", avoid
        "(x-(uintptr_t)x)" for char* pointers.  The compiler is within its
        rights to substitute zero for this sort of expression, so that
        subsequent accesses no longer depend on the rcu_dereference(),
        again possibly resulting in bugs due to misordering.

        Of course, if "p" is a pointer from rcu_dereference(), and "a"
        and "b" are integers that happen to be equal, the expression
        "p+a-b" is safe because its value still necessarily depends on
        the rcu_dereference(), thus maintaining proper ordering.

-       If you are using RCU to protect JITed functions, so that the
        "()" function-invocation operator is applied to a value obtained
        (directly or indirectly) from rcu_dereference(), you may need to
        interact directly with the hardware to flush instruction caches.
        This issue arises on some systems when a newly JITed function is
        using the same memory that was used by an earlier JITed function.

-       Do not use the results from relational operators ("==", "!=",
        ">", ">=", "<", or "<=") when dereferencing.  For example,
        the following (quite strange) code is buggy::

                int *p;
                int *q;

                ...

                p = rcu_dereference(gp);
                q = &global_q;
                q += p > &oom_p;
                r1 = *q;  /* BUGGY!!! */

        The reason this is buggy is that relational operators are often
        compiled using branches.  Although weak-memory machines such as
        ARM or PowerPC do order stores after such branches, they can
        speculate loads, which can result in misordering bugs.

-       Be very careful about comparing pointers obtained from
        rcu_dereference() against non-NULL values.  As Linus Torvalds
        explained, if the two pointers are equal, the compiler could
        substitute the pointer you are comparing against for the pointer
        obtained from rcu_dereference().  For example::

                p = rcu_dereference(gp);
                if (p == &default_struct)
                        do_default(p->a);

        Because the compiler now knows that the value of "p" is exactly
        the address of the variable "default_struct", it is free to
        transform this code into the following::

                p = rcu_dereference(gp);
                if (p == &default_struct)
                        do_default(default_struct.a);

        On ARM and Power hardware, the load from "default_struct.a"
        can now be speculated, such that it might happen before the
        rcu_dereference().  This could result in bugs due to misordering.

        However, comparisons are OK in the following cases:

        -       The comparison was against the NULL pointer.  If the
                compiler knows that the pointer is NULL, you had better
                not be dereferencing it anyway.  If the comparison is
                non-equal, the compiler is none the wiser.  Therefore,
                it is safe to compare pointers from rcu_dereference()
                against NULL pointers.

        -       The pointer is never dereferenced after being compared.
                Since there are no subsequent dereferences, the compiler
                cannot use anything it learned from the comparison
                to reorder the non-existent subsequent dereferences.
                This sort of comparison occurs frequently when scanning
                RCU-protected circular linked lists.

                Note that if checks for being within an RCU read-side
                critical section are not required and the pointer is never
                dereferenced, rcu_access_pointer() should be used in place
                of rcu_dereference().

        -       The comparison is against a pointer that references memory
                that was initialized "a long time ago."  The reason
                this is safe is that even if misordering occurs, the
                misordering will not affect the accesses that follow
                the comparison.  So exactly how long ago is "a long
                time ago"?  Here are some possibilities:

                -       Compile time.

                -       Boot time.

                -       Module-init time for module code.

                -       Prior to kthread creation for kthread code.

                -       During some prior acquisition of the lock that
                        we now hold.

                -       Before mod_timer() time for a timer handler.

                There are many other possibilities involving the Linux
                kernel's wide array of primitives that cause code to
                be invoked at a later time.

        -       The pointer being compared against also came from
                rcu_dereference().  In this case, both pointers depend
                on one rcu_dereference() or another, so you get proper
                ordering either way.

                That said, this situation can make certain RCU usage
                bugs more likely to happen.  Which can be a good thing,
                at least if they happen during testing.  An example
                of such an RCU usage bug is shown in the section titled
                "EXAMPLE OF AMPLIFIED RCU-USAGE BUG".

        -       All of the accesses following the comparison are stores,
                so that a control dependency preserves the needed ordering.
                That said, it is easy to get control dependencies wrong.
                Please see the "CONTROL DEPENDENCIES" section of
                Documentation/memory-barriers.txt for more details.

        -       The pointers are not equal *and* the compiler does
                not have enough information to deduce the value of the
                pointer.  Note that the volatile cast in rcu_dereference()
                will normally prevent the compiler from knowing too much.

                However, please note that if the compiler knows that the
                pointer takes on only one of two values, a not-equal
                comparison will provide exactly the information that the
                compiler needs to deduce the value of the pointer.

-       Disable any value-speculation optimizations that your compiler
        might provide, especially if you are making use of feedback-based
        optimizations that take data collected from prior runs.  Such
        value-speculation optimizations reorder operations by design.

        There is one exception to this rule:  Value-speculation
        optimizations that leverage the branch-prediction hardware are
        safe on strongly ordered systems (such as x86), but not on weakly
        ordered systems (such as ARM or Power).  Choose your compiler
        command-line options wisely!

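The uintptr_t exception for low-order bits, described in the rules
above, might look like the following userspace sketch.  It is an
illustration under stated assumptions rather than kernel code: the
rcu_dereference() macro is a stand-in built from a volatile load, and
"struct node", TAG_MASK, "gp_tagged", publish(), and read_node() are
hypothetical names.

```c
#include <stdint.h>

/* Stand-in for the kernel primitive: one volatile load (assumption). */
#define rcu_dereference(p) (*(__typeof__(p) volatile *)&(p))

/* Hypothetical node type; 8-byte alignment keeps the low three bits zero. */
struct node {
	_Alignas(8) int value;
};

#define TAG_MASK ((uintptr_t)0x7)

static struct node node0 = { .value = 42 };
static void *gp_tagged;		/* hypothetical RCU-protected tagged pointer */

/* Updater side; kernel code would publish with rcu_assign_pointer(). */
static void publish(struct node *p, unsigned int tag)
{
	gp_tagged = (void *)((uintptr_t)p | ((uintptr_t)tag & TAG_MASK));
}

/* Reader side: integer view only long enough to split tag from address. */
static int read_node(unsigned int *tag_out)
{
	uintptr_t bits = (uintptr_t)rcu_dereference(gp_tagged);
	struct node *p = (struct node *)(bits & ~TAG_MASK);	/* back to a pointer */

	*tag_out = (unsigned int)(bits & TAG_MASK);
	return p->value;
}
```

The reader casts back to a pointer immediately after masking, so the
dereference still depends on the value that was loaded.  After
publish(&node0, 5), read_node() returns the node's value and reports
the tag separately.
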

EXAMPLE OF AMPLIFIED RCU-USAGE BUG
----------------------------------

Because updaters can run concurrently with RCU readers, RCU readers can
see stale and/or inconsistent values.  If RCU readers need fresh or
consistent values, which they sometimes do, they need to take proper
precautions.  To see this, consider the following code fragment::

        struct foo {
                int a;
                int b;
                int c;
        };
        struct foo *gp1;
        struct foo *gp2;

        void updater(void)
        {
                struct foo *p;

                p = kmalloc(...);
                if (p == NULL)
                        deal_with_it();
                p->a = 42;  /* Each field in its own cache line. */
                p->b = 43;
                p->c = 44;
                rcu_assign_pointer(gp1, p);
                p->b = 143;
                p->c = 144;
                rcu_assign_pointer(gp2, p);
        }

        void reader(void)
        {
                struct foo *p;
                struct foo *q;
                int r1, r2;

                p = rcu_dereference(gp2);
                if (p == NULL)
                        return;
                r1 = p->b;  /* Guaranteed to get 143. */
                q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */
                if (p == q) {
                        /* The compiler decides that q->c is same as p->c. */
                        r2 = p->c; /* Could get 44 on weakly ordered system. */
                }
                do_something_with(r1, r2);
        }

You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible,
but you should not be.  After all, the updater might have been invoked
a second time between the time reader() loaded into "r1" and the time
that it loaded into "r2".  The fact that this same result can occur due
to some reordering from the compiler and CPUs is beside the point.

But suppose that the reader needs a consistent view?

Then one approach is to use locking, for example, as follows::

        struct foo {
                int a;
                int b;
                int c;
                spinlock_t lock;
        };
        struct foo *gp1;
        struct foo *gp2;

        void updater(void)
        {
                struct foo *p;

                p = kmalloc(...);
                if (p == NULL)
                        deal_with_it();
                spin_lock(&p->lock);
                p->a = 42;  /* Each field in its own cache line. */
                p->b = 43;
                p->c = 44;
                spin_unlock(&p->lock);
                rcu_assign_pointer(gp1, p);
                spin_lock(&p->lock);
                p->b = 143;
                p->c = 144;
                spin_unlock(&p->lock);
                rcu_assign_pointer(gp2, p);
        }

        void reader(void)
        {
                struct foo *p;
                struct foo *q;
                int r1, r2;

                p = rcu_dereference(gp2);
                if (p == NULL)
                        return;
                spin_lock(&p->lock);
                r1 = p->b;  /* Guaranteed to get 143. */
                q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */
                if (p == q) {
                        /* The compiler decides that q->c is same as p->c. */
                        r2 = p->c; /* Locking guarantees r2 == 144. */
                }
                spin_unlock(&p->lock);
                do_something_with(r1, r2);
        }

As always, use the right tool for the job!


EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
-----------------------------------------

If a pointer obtained from rcu_dereference() compares not-equal to some
other pointer, the compiler normally has no clue what the value of the
first pointer might be.  This lack of knowledge prevents the compiler
from carrying out optimizations that otherwise might destroy the ordering
guarantees that RCU depends on.  And the volatile cast in rcu_dereference()
should prevent the compiler from guessing the value.

But without rcu_dereference(), the compiler knows more than you might
expect.  Consider the following code fragment::

        struct foo {
                int a;
                int b;
        };
        static struct foo variable1;
        static struct foo variable2;
        static struct foo *gp = &variable1;

        void updater(void)
        {
                initialize_foo(&variable2);
                rcu_assign_pointer(gp, &variable2);
                /*
                 * The above is the only store to gp in this translation unit,
                 * and the address of gp is not exported in any way.
                 */
        }

        int reader(void)
        {
                struct foo *p;

                p = gp;
                barrier();
                if (p == &variable1)
                        return p->a; /* Must be variable1.a. */
                else
                        return p->b; /* Must be variable2.b. */
        }

Because the compiler can see all stores to "gp", it knows that the only
possible values of "gp" are "variable1" on the one hand and "variable2"
on the other.  The comparison in reader() therefore tells the compiler
the exact value of "p" even in the not-equals case.  This allows the
compiler to make the return values independent of the load from "gp",
in turn destroying the ordering between this load and the loads of the
return values.  This can result in "p->b" returning pre-initialization
garbage values.

In short, rcu_dereference() is *not* optional when you are going to
dereference the resulting pointer.

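For comparison, a reader that does go through rcu_dereference() might
be sketched as follows.  This is a single-threaded userspace
illustration, not kernel code: both macros are stand-ins (a volatile
load and a compiler-builtin release store), and the field values
written by initialize_foo() are made up for the demonstration.

```c
/* Stand-ins for the kernel primitives (assumptions, userspace only). */
#define rcu_dereference(p) (*(__typeof__(p) volatile *)&(p))
#define rcu_assign_pointer(p, v) \
	__atomic_store_n(&(p), (v), __ATOMIC_RELEASE)

struct foo {
	int a;
	int b;
};
static struct foo variable1 = { .a = 1, .b = 2 };
static struct foo variable2;
static struct foo *gp = &variable1;

/* Hypothetical initializer; the values are arbitrary. */
static void initialize_foo(struct foo *fp)
{
	fp->a = 11;
	fp->b = 12;
}

void updater(void)
{
	initialize_foo(&variable2);
	rcu_assign_pointer(gp, &variable2);
}

int reader(void)
{
	struct foo *p;

	/* The volatile load is meant to hide the value from the compiler. */
	p = rcu_dereference(gp);
	if (p == &variable1)
		return p->a;	/* variable1 was initialized at compile time. */
	else
		return p->b;	/* Still depends on the load from gp. */
}
```

The equal case is covered by the "a long time ago" rule, because
"variable1" is initialized at compile time, and in the not-equal case
the volatile load should keep the compiler from deducing the value of
"p".
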

WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
------------------------------------------------------------

First, please avoid using rcu_dereference_raw() and also please avoid
using rcu_dereference_check() and rcu_dereference_protected() with a
second argument with a constant value of 1 (or true, for that matter).
With that caution out of the way, here is some guidance for which
member of the rcu_dereference() family to use in various situations:

1.      If the access needs to be within an RCU read-side critical
        section, use rcu_dereference().  With the new consolidated
        RCU flavors, an RCU read-side critical section is entered
        using rcu_read_lock(), anything that disables bottom halves,
        anything that disables interrupts, or anything that disables
        preemption.

2.      If the access might be within an RCU read-side critical section
        on the one hand, or protected by (say) my_lock on the other,
        use rcu_dereference_check(), for example::

                p1 = rcu_dereference_check(p->rcu_protected_pointer,
                                           lockdep_is_held(&my_lock));


3.      If the access might be within an RCU read-side critical section
        on the one hand, or protected by either my_lock or your_lock on
        the other, again use rcu_dereference_check(), for example::

                p1 = rcu_dereference_check(p->rcu_protected_pointer,
                                           lockdep_is_held(&my_lock) ||
                                           lockdep_is_held(&your_lock));

4.      If the access is on the update side, so that it is always protected
        by my_lock, use rcu_dereference_protected()::

                p1 = rcu_dereference_protected(p->rcu_protected_pointer,
                                               lockdep_is_held(&my_lock));

        This can be extended to handle multiple locks as in #3 above,
        and both can be extended to check other conditions as well.

5.      If the protection is supplied by the caller, and is thus unknown
        to this code, that is the rare case when rcu_dereference_raw()
        is appropriate.  In addition, rcu_dereference_raw() might be
        appropriate when the lockdep expression would be excessively
        complex, except that a better approach in that case might be to
        take a long hard look at your synchronization design.  Still,
        there are data-locking cases where any one of a very large number
        of locks or reference counters suffices to protect the pointer,
        so rcu_dereference_raw() does have its place.

        However, its place is probably quite a bit smaller than one
        might expect given the number of uses in the current kernel.
        Ditto for its synonym, rcu_dereference_check( ... , 1), and
        its close relative, rcu_dereference_protected(... , 1).

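The difference between the checked variants in items 1 through 4 can be
modeled with a toy userspace sketch.  Everything here is an assumption
made for illustration: the macros only imitate the conditions that the
kernel primitives hand to lockdep, and "rcu_depth", "my_lock_held",
"splats", and lockdep_check() are hypothetical stand-ins for real
read-side and lock state.

```c
#include <stdbool.h>

static int rcu_depth;		/* toy: read-side critical-section nesting */
static bool my_lock_held;	/* toy: stands in for lockdep_is_held(&my_lock) */
static int splats;		/* toy: count of simulated lockdep complaints */

static void lockdep_check(bool ok)
{
	if (!ok)
		splats++;
}

#define VLOAD(p) (*(__typeof__(p) volatile *)&(p))

/* Reader only: must be inside a read-side critical section. */
#define rcu_dereference(p) \
	(lockdep_check(rcu_depth > 0), VLOAD(p))
/* Reader or updater: either the condition or a read-side section suffices. */
#define rcu_dereference_check(p, c) \
	(lockdep_check((c) || rcu_depth > 0), VLOAD(p))
/* Updater only: the condition must hold, and a plain load is enough. */
#define rcu_dereference_protected(p, c) \
	(lockdep_check((c)), (p))

struct foo {
	int a;
};
static struct foo foo0 = { .a = 7 };
static struct foo *gp = &foo0;

static int read_side(void)
{
	int a;

	rcu_depth++;			/* models rcu_read_lock() */
	a = rcu_dereference(gp)->a;
	rcu_depth--;			/* models rcu_read_unlock() */
	return a;
}

static int either_side(void)
{
	return rcu_dereference_check(gp, my_lock_held)->a;
}

static int update_side(void)
{
	return rcu_dereference_protected(gp, my_lock_held)->a;
}
```

In this model, calling either_side() or update_side() without the toy
lock held, and outside a read-side section, increments "splats", which
is the moral equivalent of a lockdep warning.
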

SPARSE CHECKING OF RCU-PROTECTED POINTERS
-----------------------------------------

The sparse static-analysis tool checks for direct access to RCU-protected
pointers, which can result in "interesting" bugs due to compiler
optimizations involving invented loads and perhaps also load tearing.
For example, suppose someone mistakenly does something like this::

        p = q->rcu_protected_pointer;
        do_something_with(p->a);
        do_something_else_with(p->b);

If register pressure is high, the compiler might optimize "p" out
of existence, transforming the code to something like this::

        do_something_with(q->rcu_protected_pointer->a);
        do_something_else_with(q->rcu_protected_pointer->b);

This could fatally disappoint your code if q->rcu_protected_pointer
changed in the meantime.  Nor is this a theoretical problem:  Exactly
this sort of bug cost Paul E. McKenney (and several of his innocent
colleagues) a three-day weekend back in the early 1990s.

Load tearing could of course result in dereferencing a mashup of a pair
of pointers, which also might fatally disappoint your code.

These problems could have been avoided simply by making the code instead
read as follows::

        p = rcu_dereference(q->rcu_protected_pointer);
        do_something_with(p->a);
        do_something_else_with(p->b);

Unfortunately, these sorts of bugs can be extremely hard to spot during
review.  This is where the sparse tool comes into play, along with the
"__rcu" marker.  You can mark a pointer declaration, whether in a structure
or as a formal parameter, with "__rcu", which tells sparse to complain if
this pointer is accessed directly.  It will also cause sparse to complain
if a pointer not marked with "__rcu" is accessed using rcu_dereference()
and friends.  For example, ->rcu_protected_pointer might be declared as
follows::

        struct foo __rcu *rcu_protected_pointer;

Use of "__rcu" is opt-in.  If you choose not to use it, then you should
ignore the sparse warnings.
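
Putting the declaration and the access together, a minimal userspace
sketch might read as follows.  The empty "__rcu" definition and the
volatile-load rcu_dereference() are stand-ins so that an ordinary
compiler accepts the code, which means this sketch does not exercise
sparse itself; "struct holder", "holder0", and sum_fields() are
hypothetical names.

```c
/* Userspace stand-ins (assumptions): no sparse checking happens here. */
#ifndef __rcu
#define __rcu
#endif
#define rcu_dereference(p) (*(__typeof__(p) volatile *)&(p))

struct foo {
	int a;
	int b;
};

/* Hypothetical container whose pointer field carries the __rcu marker. */
struct holder {
	struct foo __rcu *rcu_protected_pointer;
};

static struct foo foo0 = { .a = 1, .b = 2 };
static struct holder holder0 = { .rcu_protected_pointer = &foo0 };

static int sum_fields(struct holder *q)
{
	/* One load through rcu_dereference(); both fields use the same p. */
	struct foo *p = rcu_dereference(q->rcu_protected_pointer);

	return p->a + p->b;
}
```

Because "p" is loaded exactly once, both field accesses refer to the
same structure even if the pointer is updated concurrently.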