.. _whatisrcu_doc:

What is RCU?  --  "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.    What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
| 2.    What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
| 3.    RCU part 3: the RCU API      http://lwn.net/Articles/264090/
| 4.    The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
|       2010 Big API Table           http://lwn.net/Articles/419086/
| 5.    The RCU API, 2014 Edition    http://lwn.net/Articles/609904/
|       2014 Big API Table           http://lwn.net/Articles/609973/


What is RCU?
RCU is a synchronization mechanism, added to the Linux kernel during
the 2.5 development effort, that is optimized for read-mostly
situations.  Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge.  Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU.  Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU.  This document provides several
different paths, as follows:

:ref:`1.        RCU OVERVIEW <1_whatisRCU>`

:ref:`2.        WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3.        WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4.        WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5.        WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6.        ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7.        ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`

:ref:`8.        FULL LIST OF RCU APIs <8_whatisRCU>`

:ref:`9.        ANSWERS TO QUICK QUIZZES <9_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Sections 6 and 7.  Section 8 enumerates the full list of RCU
APIs, and Section 9 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)

.. _1_whatisRCU:

1.  RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will
see either the old or the new version of the data structure rather than a
partially updated reference.  The reclamation phase does the work of reclaiming
(e.g., freeing) the data items removed from the data structure during the
removal phase.  Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following
(a minimal code sketch appears after the list):

a.      Remove pointers to a data structure, so that subsequent
        readers cannot gain a reference to it.

b.      Wait for all previous readers to complete their RCU read-side
        critical sections.

c.      At this point, there cannot be any readers who hold references
        to the data structure, so it now may safely be reclaimed
        (e.g., kfree()d).

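As a concrete illustration, and using API members described in the next
section, these three steps might look as follows.  This is a minimal
sketch, not production code, and ``gptr`` and ``gptr_lock`` are
hypothetical names::

        struct foo *old;

        spin_lock(&gptr_lock);          /* Exclude other updaters. */
        old = rcu_dereference_protected(gptr, lockdep_is_held(&gptr_lock));
        rcu_assign_pointer(gptr, NULL); /* Step (a): remove the pointer. */
        spin_unlock(&gptr_lock);
        synchronize_rcu();              /* Step (b): wait for readers. */
        kfree(old);                     /* Step (c): reclaim. */
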
Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

.. _2_whatisRCU:

2.  WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a.      rcu_read_lock()
b.      rcu_read_unlock()
c.      synchronize_rcu() / call_rcu()
d.      rcu_assign_pointer()
e.      rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

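For reference, here is a simplified sketch of one way to express
synchronize_rcu() in terms of call_rcu(), patterned after the kernel's
wakeme_after_rcu() approach (struct rcu_synchronize is shown inline to
keep the sketch self-contained)::

        struct rcu_synchronize {
                struct rcu_head head;
                struct completion completion;
        };

        static void wakeme_after_rcu(struct rcu_head *head)
        {
                struct rcu_synchronize *rs;

                rs = container_of(head, struct rcu_synchronize, head);
                complete(&rs->completion);
        }

        void synchronize_rcu(void)
        {
                struct rcu_synchronize rs;

                init_completion(&rs.completion);
                call_rcu(&rs.head, wakeme_after_rcu); /* Fires after a grace period. */
                wait_for_completion(&rs.completion);  /* Block until it does. */
        }
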
The five core RCU APIs are described below; the remaining APIs will be
enumerated later.  See the kernel docbook documentation for more info,
or look directly at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
        void rcu_read_lock(void);

        Used by a reader to inform the reclaimer that the reader is
        entering an RCU read-side critical section.  It is illegal
        to block while in an RCU read-side critical section, though
        kernels built with CONFIG_PREEMPT_RCU can preempt RCU
        read-side critical sections.  Any RCU-protected data structure
        accessed during an RCU read-side critical section is guaranteed to
        remain unreclaimed for the full duration of that critical section.
        Reference counts may be used in conjunction with RCU to maintain
        longer-term references to data structures.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
        void rcu_read_unlock(void);

        Used by a reader to inform the reclaimer that the reader is
        exiting an RCU read-side critical section.  Note that RCU
        read-side critical sections may be nested and/or overlapping.
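
        Taken together, a minimal RCU reader might look as follows
        (a sketch; ``gptr`` and ``do_something_with()`` are hypothetical)::

                rcu_read_lock();
                p = rcu_dereference(gptr);      /* Described below. */
                if (p)
                        do_something_with(p);   /* Must not block. */
                rcu_read_unlock();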

synchronize_rcu()
^^^^^^^^^^^^^^^^^
        void synchronize_rcu(void);

        Marks the end of updater code and the beginning of reclaimer
        code.  It does this by blocking until all pre-existing RCU
        read-side critical sections on all CPUs have completed.
        Note that synchronize_rcu() will **not** necessarily wait for
        any subsequent RCU read-side critical sections to complete.
        For example, consider the following sequence of events::

                 CPU 0                  CPU 1                 CPU 2
             ----------------- ------------------------- ---------------
         1.  rcu_read_lock()
         2.                    enters synchronize_rcu()
         3.                                               rcu_read_lock()
         4.  rcu_read_unlock()
         5.                     exits synchronize_rcu()
         6.                                              rcu_read_unlock()

        To reiterate, synchronize_rcu() waits only for ongoing RCU
        read-side critical sections to complete, not necessarily for
        any that begin after synchronize_rcu() is invoked.

        Of course, synchronize_rcu() does not necessarily return
        **immediately** after the last pre-existing RCU read-side critical
        section completes.  For one thing, there might well be scheduling
        delays.  For another thing, many RCU implementations process
        requests in batches in order to improve efficiency, which can
        further delay synchronize_rcu().

        Since synchronize_rcu() is the API that must figure out when
        readers are done, its implementation is key to RCU.  For RCU
        to be useful in all but the most read-intensive situations,
        synchronize_rcu()'s overhead must also be quite small.

        The call_rcu() API is a callback form of synchronize_rcu(),
        and is described in more detail in a later section.  Instead of
        blocking, it registers a function and argument which are invoked
        after all ongoing RCU read-side critical sections have completed.
        This callback variant is particularly useful in situations where
        it is illegal to block or where update-side performance is
        critically important.

        However, the call_rcu() API should not be used lightly, as use
        of the synchronize_rcu() API generally results in simpler code.
        In addition, the synchronize_rcu() API has the nice property
        of automatically limiting update rate should grace periods
        be delayed.  This property results in system resilience in the face
        of denial-of-service attacks.  Code using call_rcu() should limit
        update rate in order to gain this same sort of resilience.  See
        checklist.rst for some approaches to limiting the update rate.

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
        void rcu_assign_pointer(p, typeof(p) v);

        Yes, rcu_assign_pointer() **is** implemented as a macro, though it
        would be cool to be able to declare a function in this manner.
        (Compiler experts will no doubt disagree.)

        The updater uses this function to assign a new value to an
        RCU-protected pointer, in order to safely communicate the change
        in value from the updater to the reader.  This macro does not
        evaluate to an rvalue, but it does execute any memory-barrier
        instructions required for a given CPU architecture.

        Perhaps just as important, it serves to document (1) which
        pointers are protected by RCU and (2) the point at which a
        given structure becomes accessible to other CPUs.  That said,
        rcu_assign_pointer() is most frequently used indirectly, via
        the _rcu list-manipulation primitives such as list_add_rcu().

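        For example, an updater might initialize and then publish a new
        structure as follows (a sketch; ``gptr`` is a hypothetical
        RCU-protected pointer)::

                p = kmalloc(sizeof(*p), GFP_KERNEL);
                if (p) {
                        p->a = 1;       /* Initialize *before* publication... */
                        rcu_assign_pointer(gptr, p);    /* ...then publish. */
                }
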
rcu_dereference()
^^^^^^^^^^^^^^^^^
        typeof(p) rcu_dereference(p);

        Like rcu_assign_pointer(), rcu_dereference() must be implemented
        as a macro.

        The reader uses rcu_dereference() to fetch an RCU-protected
        pointer, which returns a value that may then be safely
        dereferenced.  Note that rcu_dereference() does not actually
        dereference the pointer; instead, it protects the pointer for
        later dereferencing.  It also executes any needed memory-barrier
        instructions for a given CPU architecture.  Currently, only Alpha
        needs memory barriers within rcu_dereference() -- on other CPUs,
        it compiles to nothing, not even a compiler directive.

        Common coding practice uses rcu_dereference() to copy an
        RCU-protected pointer to a local variable, then dereferences
        this local variable, for example as follows::

                p = rcu_dereference(head.next);
                return p->data;

        However, in this case, one could just as easily combine these
        into one statement::

                return rcu_dereference(head.next)->data;

        If you are going to be fetching multiple fields from the
        RCU-protected structure, using the local variable is of
        course preferred.  Repeated rcu_dereference() calls look
        ugly, do not guarantee that the same pointer will be returned
        if an update happened while in the critical section, and incur
        unnecessary overhead on Alpha CPUs.

        Note that the value returned by rcu_dereference() is valid
        only within the enclosing RCU read-side critical section [1]_.
        For example, the following is **not** legal::

                rcu_read_lock();
                p = rcu_dereference(head.next);
                rcu_read_unlock();
                x = p->address; /* BUG!!! */
                rcu_read_lock();
                y = p->data;    /* BUG!!! */
                rcu_read_unlock();

        Holding a reference from one RCU read-side critical section
        to another is just as illegal as holding a reference from
        one lock-based critical section to another!  Similarly,
        using a reference outside of the critical section in which
        it was acquired is just as illegal as doing so with normal
        locking.

        As with rcu_assign_pointer(), an important function of
        rcu_dereference() is to document which pointers are protected by
        RCU, in particular, flagging a pointer that is subject to changing
        at any time, including immediately after the rcu_dereference().
        And, again like rcu_assign_pointer(), rcu_dereference() is
        typically used indirectly, via the _rcu list-manipulation
        primitives, such as list_for_each_entry_rcu() [2]_.

..      [1] The variant rcu_dereference_protected() can be used outside
        of an RCU read-side critical section as long as the usage is
        protected by locks acquired by the update-side code.  This variant
        avoids the lockdep warning that would happen when using (for
        example) rcu_dereference() without rcu_read_lock() protection.
        Using rcu_dereference_protected() also has the advantage
        of permitting compiler optimizations that rcu_dereference()
        must prohibit.  The rcu_dereference_protected() variant takes
        a lockdep expression to indicate which locks must be acquired
        by the caller. If the indicated protection is not provided,
        a lockdep splat is emitted.  See Design/Requirements/Requirements.rst
        and the API's code comments for more details and example usage.

..      [2] If the list_for_each_entry_rcu() instance might be used by
        update-side code as well as by RCU readers, then an additional
        lockdep expression can be added to its list of arguments.
        For example, given an additional "lockdep_is_held(&mylock)" argument,
        the RCU lockdep code would complain only if this instance was
        invoked outside of an RCU read-side critical section and without
        the protection of mylock.
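
        For example (a sketch; ``mylist``, its ``list`` member, and
        ``do_something()`` are hypothetical)::

                list_for_each_entry_rcu(p, &mylist, list,
                                        lockdep_is_held(&mylock)) {
                        do_something(p); /* Legal under rcu_read_lock() or mylock. */
                }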

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


            rcu_assign_pointer()
                                    +--------+
            +---------------------->| reader |---------+
            |                       +--------+         |
            |                           |              |
            |                           |              | Protect:
            |                           |              | rcu_read_lock()
            |                           |              | rcu_read_unlock()
            |        rcu_dereference()  |              |
            +---------+                 |              |
            | updater |<----------------+              |
            +---------+                                V
            |                                    +-----------+
            +----------------------------------->| reclaimer |
                                                 +-----------+
              Defer:
              synchronize_rcu() & call_rcu()


The RCU infrastructure observes the time sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.

There are at least three flavors of RCU usage in the Linux kernel. The diagram
above shows the most common one. On the updater side, the rcu_assign_pointer(),
synchronize_rcu() and call_rcu() primitives used are the same for all three
flavors.  However, for protection (on the reader side), the primitives used vary
depending on the flavor:

a.      rcu_read_lock() / rcu_read_unlock()
        rcu_dereference()

b.      rcu_read_lock_bh() / rcu_read_unlock_bh()
        local_bh_disable() / local_bh_enable()
        rcu_dereference_bh()

c.      rcu_read_lock_sched() / rcu_read_unlock_sched()
        preempt_disable() / preempt_enable()
        local_irq_save() / local_irq_restore()
        hardirq enter / hardirq exit
        NMI enter / NMI exit
        rcu_dereference_sched()

These three flavors are used as follows:

a.      RCU applied to normal data structures.

b.      RCU applied to networking data structures that may be subjected
        to remote denial-of-service attacks.

c.      RCU applied to scheduler and interrupt/NMI-handler tasks.

Again, most uses will be of (a).  The (b) and (c) cases are important
for specialized uses, but are relatively uncommon.
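
To make the reader-side differences concrete, a flavor (b) reader might
look as follows (a sketch; ``gbl_net_data`` and ``process()`` are
hypothetical)::

        rcu_read_lock_bh();
        p = rcu_dereference_bh(gbl_net_data);
        if (p)
                process(p);     /* Softirqs are disabled here. */
        rcu_read_unlock_bh();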

.. _3_whatisRCU:

3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------

This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure.  More-typical
uses of RCU may be found in listRCU.rst, arrayRCU.rst, and NMI-RCU.rst.
::

        struct foo {
                int a;
                char b;
                long c;
        };
        DEFINE_SPINLOCK(foo_mutex);

        struct foo __rcu *gbl_foo;

        /*
         * Create a new struct foo that is the same as the one currently
         * pointed to by gbl_foo, except that field "a" is replaced
         * with "new_a".  Points gbl_foo to the new structure, and
         * frees up the old structure after a grace period.
         *
         * Uses rcu_assign_pointer() to ensure that concurrent readers
         * see the initialized version of the new structure.
         *
         * Uses synchronize_rcu() to ensure that any readers that might
         * have references to the old structure complete before freeing
         * the old structure.
         */
        void foo_update_a(int new_a)
        {
                struct foo *new_fp;
                struct foo *old_fp;

                new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
                spin_lock(&foo_mutex);
                old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
                *new_fp = *old_fp;
                new_fp->a = new_a;
                rcu_assign_pointer(gbl_foo, new_fp);
                spin_unlock(&foo_mutex);
                synchronize_rcu();
                kfree(old_fp);
        }

        /*
         * Return the value of field "a" of the current gbl_foo
         * structure.  Use rcu_read_lock() and rcu_read_unlock()
         * to ensure that the structure does not get deleted out
         * from under us, and use rcu_dereference() to ensure that
         * we see the initialized version of the structure (important
         * for DEC Alpha and for people reading the code).
         */
        int foo_get_a(void)
        {
                int retval;

                rcu_read_lock();
                retval = rcu_dereference(gbl_foo)->a;
                rcu_read_unlock();
                return retval;
        }

So, to sum up:

-       Use rcu_read_lock() and rcu_read_unlock() to guard RCU
        read-side critical sections.

-       Within an RCU read-side critical section, use rcu_dereference()
        to dereference RCU-protected pointers.

-       Use some solid scheme (such as locks or semaphores) to
        keep concurrent updates from interfering with each other.

-       Use rcu_assign_pointer() to update an RCU-protected pointer.
        This primitive protects concurrent readers from the updater,
        **not** concurrent updates from each other!  You therefore still
        need to use locking (or something similar) to keep concurrent
        rcu_assign_pointer() primitives from interfering with each other.

-       Use synchronize_rcu() **after** removing a data element from an
        RCU-protected data structure, but **before** reclaiming/freeing
        the data element, in order to wait for the completion of all
        RCU read-side critical sections that might be referencing that
        data item.

See checklist.rst for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.rst,
arrayRCU.rst, and NMI-RCU.rst.

.. _4_whatisRCU:

4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
--------------------------------------------

In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done.

In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows::

        void call_rcu(struct rcu_head *head, rcu_callback_t func);

This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context,
so the function is not permitted to block.  The foo struct needs to
have an rcu_head structure added, perhaps as follows::

        struct foo {
                int a;
                char b;
                long c;
                struct rcu_head rcu;
        };

The foo_update_a() function might then be written as follows::

        /*
         * Create a new struct foo that is the same as the one currently
         * pointed to by gbl_foo, except that field "a" is replaced
         * with "new_a".  Points gbl_foo to the new structure, and
         * frees up the old structure after a grace period.
         *
         * Uses rcu_assign_pointer() to ensure that concurrent readers
         * see the initialized version of the new structure.
         *
         * Uses call_rcu() to ensure that any readers that might have
         * references to the old structure complete before freeing the
         * old structure.
         */
        void foo_update_a(int new_a)
        {
                struct foo *new_fp;
                struct foo *old_fp;

                new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
                spin_lock(&foo_mutex);
                old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
                *new_fp = *old_fp;
                new_fp->a = new_a;
                rcu_assign_pointer(gbl_foo, new_fp);
                spin_unlock(&foo_mutex);
                call_rcu(&old_fp->rcu, foo_reclaim);
        }

The foo_reclaim() function might appear as follows::

        void foo_reclaim(struct rcu_head *rp)
        {
                struct foo *fp = container_of(rp, struct foo, rcu);

                foo_cleanup(fp->a);

                kfree(fp);
        }

The container_of() primitive is a macro that, given a pointer into a
struct, the type of the struct, and the pointed-to field within the
struct, returns a pointer to the beginning of the struct.

The use of call_rcu() permits the caller of foo_update_a() to
immediately regain control, without needing to worry further about the
old version of the newly updated element.  It also clearly shows the
RCU distinction between updater, namely foo_update_a(), and reclaimer,
namely foo_reclaim().

The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu():

-       Use call_rcu() **after** removing a data element from an
        RCU-protected data structure in order to register a callback
        function that will be invoked after the completion of all RCU
        read-side critical sections that might be referencing that
        data item.

If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback::

        kfree_rcu(old_fp, rcu);

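In the example above, that would mean replacing the call_rcu() line in
foo_update_a() (and dropping foo_reclaim() entirely, assuming that no
cleanup such as foo_cleanup() is required)::

        rcu_assign_pointer(gbl_foo, new_fp);
        spin_unlock(&foo_mutex);
        kfree_rcu(old_fp, rcu); /* Frees old_fp after a grace period. */
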
Again, see checklist.rst for additional rules governing the use of RCU.

.. _5_whatisRCU:

5.  WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
------------------------------------------------

One of the nice things about RCU is that it has extremely simple "toy"
implementations that are a good first step towards understanding the
production-quality implementations in the Linux kernel.  This section
presents two such "toy" implementations of RCU, one that is implemented
in terms of familiar locking primitives, and another that more closely
resembles "classic" RCU.  Both are way too simple for real-world use,
lacking both functionality and performance.  However, they are useful
in getting a feel for how RCU works.  See kernel/rcu/update.c for a
production-quality implementation, and see:

        http://www.rdrop.com/users/paulmck/RCU

for papers describing the Linux kernel RCU implementation.  The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
more details on the current implementation as of early 2004.


5A.  "TOY" IMPLEMENTATION #1: LOCKING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
familiar locking primitives.  Its overhead makes it a non-starter for
real-life use, as does its lack of scalability.  It is also unsuitable
for realtime use, since it allows scheduling latency to "bleed" from
one read-side critical section to another.  It also assumes recursive
reader-writer locks:  If you try this with non-recursive locks, and
you allow nested rcu_read_lock() calls, you can deadlock.

However, it is probably the easiest implementation to relate to, so is
a good starting point.

It is extremely simple::

        static DEFINE_RWLOCK(rcu_gp_mutex);

        void rcu_read_lock(void)
        {
                read_lock(&rcu_gp_mutex);
        }

        void rcu_read_unlock(void)
        {
                read_unlock(&rcu_gp_mutex);
        }

        void synchronize_rcu(void)
        {
                write_lock(&rcu_gp_mutex);
                smp_mb__after_spinlock();
                write_unlock(&rcu_gp_mutex);
        }

[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
much.  But here are simplified versions anyway.  And whatever you do,
don't forget about them when submitting patches making use of RCU!]::

        #define rcu_assign_pointer(p, v) \
        ({ \
                smp_store_release(&(p), (v)); \
        })

        #define rcu_dereference(p) \
        ({ \
                typeof(p) _________p1 = READ_ONCE(p); \
                (_________p1); \
        })


The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
and release a global reader-writer lock.  The synchronize_rcu()
primitive write-acquires this same lock, then releases it.  This means
that once synchronize_rcu() exits, all RCU read-side critical sections
that were in progress before synchronize_rcu() was called are guaranteed
to have completed -- there is no way that synchronize_rcu() would have
been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
promotes synchronize_rcu() to a full memory barrier in compliance with
the "Memory-Barrier Guarantees" listed in:

        Design/Requirements/Requirements.rst

It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired.  Note also that rcu_read_lock() is immune
from deadlock (an important property of RCU).  The reason for this is
that the only thing that can block rcu_read_lock() is a synchronize_rcu().
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
so there can be no deadlock cycle.

.. _quiz_1:

Quick Quiz #1:
                Why is this argument naive?  How could a deadlock
                occur when using this algorithm in a real-world Linux
                kernel?  How could this deadlock be avoided?

:ref:`Answers to Quick Quiz <9_whatisRCU>`

5B.  "TOY" EXAMPLE #2: CLASSIC RCU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
"classic RCU".  It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
kernels.  The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section, so they are omitted.
::

        void rcu_read_lock(void) { }

        void rcu_read_unlock(void) { }

        void synchronize_rcu(void)
        {
                int cpu;

                for_each_possible_cpu(cpu)
                        run_on(cpu);
        }

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive.  Of course, a somewhat less
"toy" implementation would restore the affinity upon completion rather
than just leaving all tasks running on the last CPU, but when I said
"toy", I meant **toy**!

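For concreteness, one plausible in-kernel sketch of run_on(), which
simply clobbers the current task's CPU affinity (remember, **toy**!),
might look like this::

        static void run_on(int cpu)
        {
                cpumask_var_t mask;

                if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
                        return; /* Toy code: real code would handle failure. */
                cpumask_set_cpu(cpu, mask);
                sched_setaffinity(0, mask); /* pid 0 means the current task. */
                free_cpumask_var(mask);
        }
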
So how the heck is this supposed to work???

Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections.
Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then invoke
synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it.

.. _quiz_2:

Quick Quiz #2:
                Give an example where Classic RCU's read-side
                overhead is **negative**.

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _quiz_3:

Quick Quiz #3:
                If it is illegal to block in an RCU read-side
                critical section, what the heck do you do in
                CONFIG_PREEMPT_RT, where normal spinlocks can block???

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _6_whatisRCU:

6.  ANALOGY WITH READER-WRITER LOCKING
--------------------------------------

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.  The following unified
diff shows how closely related RCU and reader-writer locking can be.
::

        @@ -5,5 +5,5 @@ struct el {
                int data;
                /* Other data fields */
         };
        -rwlock_t listmutex;
        +spinlock_t listmutex;
         struct el head;

        @@ -13,15 +14,15 @@
                struct list_head *lp;
                struct el *p;

        -       read_lock(&listmutex);
        -       list_for_each_entry(p, head, lp) {
        +       rcu_read_lock();
        +       list_for_each_entry_rcu(p, head, lp) {
                        if (p->key == key) {
                                *result = p->data;
        -                       read_unlock(&listmutex);
        +                       rcu_read_unlock();
                                return 1;
                        }
                }
        -       read_unlock(&listmutex);
        +       rcu_read_unlock();
                return 0;
         }

        @@ -29,15 +30,16 @@
         {
                struct el *p;

        -       write_lock(&listmutex);
        +       spin_lock(&listmutex);
                list_for_each_entry(p, head, lp) {
                        if (p->key == key) {
        -                       list_del(&p->list);
        -                       write_unlock(&listmutex);
        +                       list_del_rcu(&p->list);
        +                       spin_unlock(&listmutex);
        +                       synchronize_rcu();
                                kfree(p);
                                return 1;
                        }
                }
        -       write_unlock(&listmutex);
        +       spin_unlock(&listmutex);
                return 0;
         }

Or, for those who prefer a side-by-side listing::

 1 struct el {                          1 struct el {
 2   struct list_head list;             2   struct list_head list;
 3   long key;                          3   long key;
 4   spinlock_t mutex;                  4   spinlock_t mutex;
 5   int data;                          5   int data;
 6   /* Other data fields */            6   /* Other data fields */
 7 };                                   7 };
 8 rwlock_t listmutex;                  8 spinlock_t listmutex;
 9 struct el head;                      9 struct el head;

::

  1 int search(long key, int *result)    1 int search(long key, int *result)
  2 {                                    2 {
  3   struct list_head *lp;              3   struct list_head *lp;
  4   struct el *p;                      4   struct el *p;
  5                                      5
  6   read_lock(&listmutex);             6   rcu_read_lock();
  7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
  8     if (p->key == key) {             8     if (p->key == key) {
  9       *result = p->data;             9       *result = p->data;
 10       read_unlock(&listmutex);      10       rcu_read_unlock();
 11       return 1;                     11       return 1;
 12     }                               12     }
 13   }                                 13   }
 14   read_unlock(&listmutex);          14   rcu_read_unlock();
 15   return 0;                         15   return 0;
 16 }                                   16 }

::

  1 int delete(long key)                 1 int delete(long key)
  2 {                                    2 {
  3   struct el *p;                      3   struct el *p;
  4                                      4
  5   write_lock(&listmutex);            5   spin_lock(&listmutex);
  6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
  7     if (p->key == key) {             7     if (p->key == key) {
  8       list_del(&p->list);            8       list_del_rcu(&p->list);
  9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
                                        10       synchronize_rcu();
 10       kfree(p);                     11       kfree(p);
 11       return 1;                     12       return 1;
 12     }                               13     }
 13   }                                 14   }
 14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
 15   return 0;                         16   return 0;
 16 }                                   17 }

Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently.  In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as
a single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
be used in place of synchronize_rcu().

.. _7_whatisRCU:

7.  ANALOGY WITH REFERENCE COUNTING
-----------------------------------

The reader-writer analogy (illustrated by the previous section) is not
always the best way to think about using RCU.  Another helpful analogy
considers RCU an effective reference count on everything which is
protected by RCU.

A reference count typically does not prevent the referenced object's
values from changing, but does prevent changes to type -- particularly the
gross change of type that happens when that object's memory is freed and
re-allocated for some other purpose.  Once a type-safe reference to the
object is obtained, some other mechanism is needed to ensure consistent
access to the data in the object.  This could involve taking a spinlock,
but with RCU the typical approach is to perform reads with SMP-aware
operations such as smp_load_acquire(), to perform updates with atomic
read-modify-write operations, and to provide the necessary ordering.
RCU provides a number of support functions that embed the required
operations and ordering, such as the list_for_each_entry_rcu() macro
used in the previous section.

A more focused view of the reference counting behavior is that,
between rcu_read_lock() and rcu_read_unlock(), any reference taken with
rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
though a reference-count on that object has been temporarily increased.
This prevents the object from changing type.  Exactly what this means
will depend on normal expectations of objects of that type, but it
typically includes that spinlocks can still be safely locked, normal
reference counters can be safely manipulated, and ``__rcu`` pointers
can be safely dereferenced.

Some operations that one might expect to see on an object for
which an RCU reference is held include:

 - Copying out data that is guaranteed to be stable by the object's type.
 - Using kref_get_unless_zero() or similar to get a longer-term
   reference (see the sketch after this list).  This may fail of course.
 - Acquiring a spinlock in the object, and checking if the object still
   is the expected object and if so, manipulating it freely.

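For example, a common lookup-then-pin pattern combining the first two
items might look as follows (a sketch; ``lookup_foo()`` and the ``ref``
kref field are hypothetical)::

        rcu_read_lock();
        p = lookup_foo(key);    /* Returns RCU-protected pointer or NULL. */
        if (p && !kref_get_unless_zero(&p->ref))
                p = NULL;       /* Already being freed: treat as not found. */
        rcu_read_unlock();
        /* A non-NULL p now holds a counted reference that remains
         * valid after the RCU read-side critical section ends. */
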
The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
reference to an object from such a cache that has been concurrently
freed and the memory reallocated to a completely different object,
though of the same type.  In this case RCU doesn't even protect the
identity of the object from changing, only its type.  So the object
found may not be the one expected, but it will be one where it is safe
to take a reference or spinlock and then confirm that the identity
matches the expectations.

With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
reference to an object is dropped.  With kref, this is the function
passed to kref_put().  When RCU is being used, such finalization code
must not be run until all ``__rcu`` pointers referencing the object have
been updated, and then a grace period has passed.  Every remaining
globally visible pointer to the object must be considered to be a
potential counted reference, and the finalization code is typically run
using call_rcu() only after all those pointers have been changed.

To see how to choose between these two analogies -- of RCU as a
reader-writer lock and RCU as a reference counting system -- it is useful
to reflect on the scale of the thing being protected.  The reader-writer
lock analogy looks at larger multi-part objects such as a linked list
and shows how RCU can facilitate concurrency while elements are added
to, and removed from, the list.  The reference-count analogy looks at
the individual objects and looks at how they can be accessed safely
within whatever whole they are a part of.

.. _8_whatisRCU:

8.  FULL LIST OF RCU APIs
-------------------------

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook.  Here is the list, by category.

RCU list traversal::

        list_entry_rcu
        list_entry_lockless
        list_first_entry_rcu
        list_next_rcu
        list_for_each_entry_rcu
        list_for_each_entry_continue_rcu
        list_for_each_entry_from_rcu
        list_first_or_null_rcu
        list_next_or_null_rcu
        hlist_first_rcu
        hlist_next_rcu
        hlist_pprev_rcu
        hlist_for_each_entry_rcu
        hlist_for_each_entry_rcu_bh
        hlist_for_each_entry_from_rcu
        hlist_for_each_entry_continue_rcu
        hlist_for_each_entry_continue_rcu_bh
        hlist_nulls_first_rcu
        hlist_nulls_for_each_entry_rcu
        hlist_bl_first_rcu
        hlist_bl_for_each_entry_rcu

RCU pointer/list update::

        rcu_assign_pointer
        list_add_rcu
        list_add_tail_rcu
        list_del_rcu
        list_replace_rcu
        hlist_add_behind_rcu
        hlist_add_before_rcu
        hlist_add_head_rcu
        hlist_add_tail_rcu
        hlist_del_rcu
        hlist_del_init_rcu
        hlist_replace_rcu
        list_splice_init_rcu
        list_splice_tail_init_rcu
        hlist_nulls_del_init_rcu
        hlist_nulls_del_rcu
        hlist_nulls_add_head_rcu
        hlist_bl_add_head_rcu
        hlist_bl_del_init_rcu
        hlist_bl_del_rcu
        hlist_bl_set_first_rcu

RCU::

        Critical sections       Grace period            Barrier

        rcu_read_lock           synchronize_net         rcu_barrier
        rcu_read_unlock         synchronize_rcu
        rcu_dereference         synchronize_rcu_expedited
        rcu_read_lock_held      call_rcu
        rcu_dereference_check   kfree_rcu
        rcu_dereference_protected

bh::

        Critical sections       Grace period            Barrier

        rcu_read_lock_bh        call_rcu                rcu_barrier
        rcu_read_unlock_bh      synchronize_rcu
        [local_bh_disable]      synchronize_rcu_expedited
        [and friends]
        rcu_dereference_bh
        rcu_dereference_bh_check
        rcu_dereference_bh_protected
        rcu_read_lock_bh_held

sched::

        Critical sections       Grace period            Barrier

        rcu_read_lock_sched     call_rcu                rcu_barrier
        rcu_read_unlock_sched   synchronize_rcu
        [preempt_disable]       synchronize_rcu_expedited
        [and friends]
        rcu_read_lock_sched_notrace
        rcu_read_unlock_sched_notrace
        rcu_dereference_sched
        rcu_dereference_sched_check
        rcu_dereference_sched_protected
        rcu_read_lock_sched_held


SRCU::

        Critical sections       Grace period            Barrier

        srcu_read_lock          call_srcu               srcu_barrier
        srcu_read_unlock        synchronize_srcu
        srcu_dereference        synchronize_srcu_expedited
        srcu_dereference_check
        srcu_read_lock_held

SRCU: Initialization/cleanup::

        DEFINE_SRCU
        DEFINE_STATIC_SRCU
        init_srcu_struct
        cleanup_srcu_struct

All: lockdep-checked RCU-protected pointer access::

        rcu_access_pointer
        rcu_dereference_raw
        RCU_LOCKDEP_WARN
        rcu_sleep_check
        RCU_NONIDLE

See the comment headers in the source code (or the docbook generated
from them) for more information.

However, given that there are no fewer than four families of RCU APIs
in the Linux kernel, how do you choose which one to use?  The following
list can be helpful:

a.      Will readers need to block?  If so, you need SRCU.

b.      What about the -rt patchset?  If readers would need to block
        in a non-rt kernel, you need SRCU.  If readers would block
        in a -rt kernel, but not in a non-rt kernel, SRCU is not
        necessary.  (The -rt patchset turns spinlocks into sleeplocks,
        hence this distinction.)

c.      Do you need to treat NMI handlers, hardirq handlers,
        and code segments with preemption disabled (whether
        via preempt_disable(), local_irq_save(), local_bh_disable(),
        or some other mechanism) as if they were explicit RCU readers?
        If so, RCU-sched is the only choice that will work for you.

d.      Do you need RCU grace periods to complete even in the face
        of softirq monopolization of one or more of the CPUs?  For
        example, is your code subject to network-based denial-of-service
        attacks?  If so, you should disable softirq across your readers,
        for example, by using rcu_read_lock_bh().

e.      Is your workload too update-intensive for normal use of
        RCU, but inappropriate for other synchronization mechanisms?
        If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
        named SLAB_DESTROY_BY_RCU).  But please be careful!

f.      Do you need read-side critical sections that are respected
        even though they are in the middle of the idle loop, during
        user-mode execution, or on an offlined CPU?  If so, SRCU is the
        only choice that will work for you.

g.      Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.

.. _9_whatisRCU:

9.  ANSWERS TO QUICK QUIZZES
----------------------------

Quick Quiz #1:
                Why is this argument naive?  How could a deadlock
                occur when using this algorithm in a real-world Linux
                kernel?  [Referring to the lock-based "toy" RCU
                algorithm.]

Answer:
                Consider the following sequence of events:

                1.      CPU 0 acquires some unrelated lock, call it
                        "problematic_lock", disabling irq via
                        spin_lock_irqsave().

                2.      CPU 1 enters synchronize_rcu(), write-acquiring
                        rcu_gp_mutex.

                3.      CPU 0 enters rcu_read_lock(), but must wait
                        because CPU 1 holds rcu_gp_mutex.

                4.      CPU 1 is interrupted, and the irq handler
                        attempts to acquire problematic_lock.

                The system is now deadlocked.

                One way to avoid this deadlock is to use an approach like
                that of CONFIG_PREEMPT_RT, where all normal spinlocks
                become blocking locks, and all irq handlers execute in
                the context of special tasks.  In this case, in step 4
                above, the irq handler would block, allowing CPU 1 to
                release rcu_gp_mutex, avoiding the deadlock.

                Even in the absence of deadlock, this RCU implementation
                allows latency to "bleed" from readers to other
                readers through synchronize_rcu().  To see this,
                consider task A in an RCU read-side critical section
                (thus read-holding rcu_gp_mutex), task B blocked
                attempting to write-acquire rcu_gp_mutex, and
                task C blocked in rcu_read_lock() attempting to
                read-acquire rcu_gp_mutex.  Task A's RCU read-side
                latency is holding up task C, albeit indirectly via
                task B.

                Realtime RCU implementations therefore use a counter-based
                approach where tasks in RCU read-side critical sections
                cannot be blocked by tasks executing synchronize_rcu().

:ref:`Back to Quick Quiz #1 <quiz_1>`

Quick Quiz #2:
                Give an example where Classic RCU's read-side
                overhead is **negative**.

Answer:
                Imagine a single-CPU system with a non-CONFIG_PREEMPTION
                kernel where a routing table is used by process-context
                code, but can be updated by irq-context code (for example,
                by an "ICMP REDIRECT" packet).  The usual way of handling
                this would be to have the process-context code disable
                interrupts while searching the routing table.  Use of
                RCU allows such interrupt-disabling to be dispensed with.
                Thus, without RCU, you pay the cost of disabling interrupts,
                and with RCU you don't.

                One can argue that the overhead of RCU in this
                case is negative with respect to the single-CPU
                interrupt-disabling approach.  Others might argue that
                the overhead of RCU is merely zero, and that replacing
                the positive overhead of the interrupt-disabling scheme
                with the zero-overhead RCU scheme does not constitute
                negative overhead.

                In real life, of course, things are more complex.  But
                even the theoretical possibility of negative overhead for
                a synchronization primitive is a bit unexpected.  ;-)

:ref:`Back to Quick Quiz #2 <quiz_2>`

Quick Quiz #3:
                If it is illegal to block in an RCU read-side
                critical section, what the heck do you do in
                CONFIG_PREEMPT_RT, where normal spinlocks can block???

Answer:
                Just as CONFIG_PREEMPT_RT permits preemption of spinlock
                critical sections, it permits preemption of RCU
                read-side critical sections.  It also permits
                spinlocks blocking while in RCU read-side critical
                sections.

                Why the apparent inconsistency?  Because it is
                possible to use priority boosting to keep the RCU
                grace periods short if need be (for example, if running
                short of memory).  In contrast, if blocking waiting
                for (say) network reception, there is no way to know
                what should be boosted.  Especially given that the
                process we need to boost might well be a human being
                who just went out for a pizza or something.  And although
                a computer-operated cattle prod might arouse serious
                interest, it might also provoke serious objections.
                Besides, how does the computer know what pizza parlor
                the human being went to???

:ref:`Back to Quick Quiz #3 <quiz_3>`

ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.


For more information, see http://www.rdrop.com/users/paulmck/RCU.