0001 .. _whatisrcu_doc:
0002
0003 What is RCU? -- "Read, Copy, Update"
0004 ======================================
0005
0006 Please note that the "What is RCU?" LWN series is an excellent place
0007 to start learning about RCU:
0008
0009 | 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
0010 | 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
0011 | 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
0012 | 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
0013 | 2010 Big API Table http://lwn.net/Articles/419086/
0014 | 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
0015 | 2014 Big API Table http://lwn.net/Articles/609973/
0016
0017
0018 What is RCU?
0019
RCU is a synchronization mechanism, added to the Linux kernel during
the 2.5 development effort, that is optimized for read-mostly
situations. Although RCU is actually quite simple once you understand it,
0023 getting there can sometimes be a challenge. Part of the problem is that
0024 most of the past descriptions of RCU have been written with the mistaken
0025 assumption that there is "one true way" to describe RCU. Instead,
0026 the experience has been that different people must take different paths
0027 to arrive at an understanding of RCU. This document provides several
0028 different paths, as follows:
0029
0030 :ref:`1. RCU OVERVIEW <1_whatisRCU>`
0031
0032 :ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`
0033
0034 :ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`
0035
0036 :ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`
0037
0038 :ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`
0039
0040 :ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`
0041
0042 :ref:`7. ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`
0043
0044 :ref:`8. FULL LIST OF RCU APIs <8_whatisRCU>`
0045
0046 :ref:`9. ANSWERS TO QUICK QUIZZES <9_whatisRCU>`
0047
0048 People who prefer starting with a conceptual overview should focus on
0049 Section 1, though most readers will profit by reading this section at
0050 some point. People who prefer to start with an API that they can then
0051 experiment with should focus on Section 2. People who prefer to start
0052 with example uses should focus on Sections 3 and 4. People who need to
0053 understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
focus on Sections 6 and 7. Section 8 serves as an index to the docbook
API documentation, and Section 9 is the traditional answer key.
0057
0058 So, start with the section that makes the most sense to you and your
0059 preferred method of learning. If you need to know everything about
0060 everything, feel free to read the whole thing -- but if you are really
0061 that type of person, you have perused the source code and will therefore
0062 never need this document anyway. ;-)
0063
0064 .. _1_whatisRCU:
0065
0066 1. RCU OVERVIEW
0067 ----------------
0068
0069 The basic idea behind RCU is to split updates into "removal" and
0070 "reclamation" phases. The removal phase removes references to data items
0071 within a data structure (possibly by replacing them with references to
0072 new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers
will see either the old or the new version of the data structure
rather than a partially updated reference. The reclamation phase does
the work of reclaiming
0077 (e.g., freeing) the data items removed from the data structure during the
0078 removal phase. Because reclaiming data items can disrupt any readers
0079 concurrently referencing those data items, the reclamation phase must
0080 not start until readers no longer hold references to those data items.
0081
0082 Splitting the update into removal and reclamation phases permits the
0083 updater to perform the removal phase immediately, and to defer the
0084 reclamation phase until all readers active during the removal phase have
0085 completed, either by blocking until they finish or by registering a
0086 callback that is invoked after they finish. Only readers that are active
0087 during the removal phase need be considered, because any reader starting
0088 after the removal phase will be unable to gain a reference to the removed
0089 data items, and therefore cannot be disrupted by the reclamation phase.
0090
0091 So the typical RCU update sequence goes something like the following:
0092
0093 a. Remove pointers to a data structure, so that subsequent
0094 readers cannot gain a reference to it.
0095
0096 b. Wait for all previous readers to complete their RCU read-side
0097 critical sections.
0098
0099 c. At this point, there cannot be any readers who hold references
0100 to the data structure, so it now may safely be reclaimed
0101 (e.g., kfree()d).
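
A minimal sketch of this sequence, assuming the update-side spinlock
and the hypothetical RCU-protected ``struct el`` list element that
reappear in Section 6, might look as follows::

	void el_delete(struct el *p)
	{
		spin_lock(&listmutex);
		list_del_rcu(&p->list);   /* Step (a): unpublish. */
		spin_unlock(&listmutex);
		synchronize_rcu();        /* Step (b): wait for readers. */
		kfree(p);                 /* Step (c): reclaim. */
	}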
0102
0103 Step (b) above is the key idea underlying RCU's deferred destruction.
0104 The ability to wait until all readers are done allows RCU readers to
0105 use much lighter-weight synchronization, in some cases, absolutely no
0106 synchronization at all. In contrast, in more conventional lock-based
0107 schemes, readers must use heavy-weight synchronization in order to
0108 prevent an updater from deleting the data structure out from under them.
0109 This is because lock-based updaters typically update data items in place,
0110 and must therefore exclude readers. In contrast, RCU-based updaters
0111 typically take advantage of the fact that writes to single aligned
0112 pointers are atomic on modern CPUs, allowing atomic insertion, removal,
0113 and replacement of data items in a linked structure without disrupting
0114 readers. Concurrent RCU readers can then continue accessing the old
0115 versions, and can dispense with the atomic operations, memory barriers,
0116 and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.
0118
0119 In the three-step procedure shown above, the updater is performing both
0120 the removal and the reclamation step, but it is often helpful for an
0121 entirely different thread to do the reclamation, as is in fact the case
0122 in the Linux kernel's directory-entry cache (dcache). Even if the same
0123 thread performs both the update step (step (a) above) and the reclamation
0124 step (step (c) above), it is often helpful to think of them separately.
0125 For example, RCU readers and updaters need not communicate at all,
0126 but RCU provides implicit low-overhead communication between readers
0127 and reclaimers, namely, in step (b) above.
0128
0129 So how the heck can a reclaimer tell when a reader is done, given
0130 that readers are not doing any sort of synchronization operations???
0131 Read on to learn about how RCU's API makes this easy.
0132
0133 .. _2_whatisRCU:
0134
0135 2. WHAT IS RCU'S CORE API?
0136 ---------------------------
0137
0138 The core RCU API is quite small:
0139
0140 a. rcu_read_lock()
0141 b. rcu_read_unlock()
0142 c. synchronize_rcu() / call_rcu()
0143 d. rcu_assign_pointer()
0144 e. rcu_dereference()
0145
0146 There are many other members of the RCU API, but the rest can be
0147 expressed in terms of these five, though most implementations instead
0148 express synchronize_rcu() in terms of the call_rcu() callback API.
0149
The five core RCU APIs are described below; the others will be enumerated
0151 later. See the kernel docbook documentation for more info, or look directly
0152 at the function header comments.
0153
0154 rcu_read_lock()
0155 ^^^^^^^^^^^^^^^
0156 void rcu_read_lock(void);
0157
0158 Used by a reader to inform the reclaimer that the reader is
0159 entering an RCU read-side critical section. It is illegal
0160 to block while in an RCU read-side critical section, though
0161 kernels built with CONFIG_PREEMPT_RCU can preempt RCU
0162 read-side critical sections. Any RCU-protected data structure
0163 accessed during an RCU read-side critical section is guaranteed to
0164 remain unreclaimed for the full duration of that critical section.
0165 Reference counts may be used in conjunction with RCU to maintain
0166 longer-term references to data structures.
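
	For example, a minimal reader might look as follows (a hedged
	sketch, assuming a hypothetical RCU-protected global pointer
	"gp" and a do_something() helper that never blocks)::

		rcu_read_lock();
		p = rcu_dereference(gp);
		if (p)
			do_something(p->a);  /* Must not block. */
		rcu_read_unlock();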
0167
0168 rcu_read_unlock()
0169 ^^^^^^^^^^^^^^^^^
0170 void rcu_read_unlock(void);
0171
0172 Used by a reader to inform the reclaimer that the reader is
0173 exiting an RCU read-side critical section. Note that RCU
0174 read-side critical sections may be nested and/or overlapping.
0175
0176 synchronize_rcu()
0177 ^^^^^^^^^^^^^^^^^
0178 void synchronize_rcu(void);
0179
0180 Marks the end of updater code and the beginning of reclaimer
0181 code. It does this by blocking until all pre-existing RCU
0182 read-side critical sections on all CPUs have completed.
0183 Note that synchronize_rcu() will **not** necessarily wait for
0184 any subsequent RCU read-side critical sections to complete.
0185 For example, consider the following sequence of events::
0186
0187 CPU 0 CPU 1 CPU 2
0188 ----------------- ------------------------- ---------------
0189 1. rcu_read_lock()
0190 2. enters synchronize_rcu()
0191 3. rcu_read_lock()
0192 4. rcu_read_unlock()
0193 5. exits synchronize_rcu()
0194 6. rcu_read_unlock()
0195
0196 To reiterate, synchronize_rcu() waits only for ongoing RCU
0197 read-side critical sections to complete, not necessarily for
0198 any that begin after synchronize_rcu() is invoked.
0199
0200 Of course, synchronize_rcu() does not necessarily return
0201 **immediately** after the last pre-existing RCU read-side critical
0202 section completes. For one thing, there might well be scheduling
0203 delays. For another thing, many RCU implementations process
0204 requests in batches in order to improve efficiencies, which can
0205 further delay synchronize_rcu().
0206
0207 Since synchronize_rcu() is the API that must figure out when
0208 readers are done, its implementation is key to RCU. For RCU
0209 to be useful in all but the most read-intensive situations,
0210 synchronize_rcu()'s overhead must also be quite small.
0211
0212 The call_rcu() API is a callback form of synchronize_rcu(),
0213 and is described in more detail in a later section. Instead of
0214 blocking, it registers a function and argument which are invoked
0215 after all ongoing RCU read-side critical sections have completed.
0216 This callback variant is particularly useful in situations where
0217 it is illegal to block or where update-side performance is
0218 critically important.
0219
0220 However, the call_rcu() API should not be used lightly, as use
0221 of the synchronize_rcu() API generally results in simpler code.
0222 In addition, the synchronize_rcu() API has the nice property
0223 of automatically limiting update rate should grace periods
0224 be delayed. This property results in system resilience in face
0225 of denial-of-service attacks. Code using call_rcu() should limit
0226 update rate in order to gain this same sort of resilience. See
0227 checklist.rst for some approaches to limiting the update rate.
0228
0229 rcu_assign_pointer()
0230 ^^^^^^^^^^^^^^^^^^^^
0231 void rcu_assign_pointer(p, typeof(p) v);
0232
0233 Yes, rcu_assign_pointer() **is** implemented as a macro, though it
0234 would be cool to be able to declare a function in this manner.
0235 (Compiler experts will no doubt disagree.)
0236
0237 The updater uses this function to assign a new value to an
0238 RCU-protected pointer, in order to safely communicate the change
0239 in value from the updater to the reader. This macro does not
0240 evaluate to an rvalue, but it does execute any memory-barrier
0241 instructions required for a given CPU architecture.
0242
0243 Perhaps just as important, it serves to document (1) which
0244 pointers are protected by RCU and (2) the point at which a
0245 given structure becomes accessible to other CPUs. That said,
0246 rcu_assign_pointer() is most frequently used indirectly, via
0247 the _rcu list-manipulation primitives such as list_add_rcu().
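
	For example, a hedged sketch of direct use, assuming the same
	hypothetical pointer "gp": initialize the new structure fully,
	and only then publish the pointer to it::

		p = kmalloc(sizeof(*p), GFP_KERNEL);
		if (p) {
			p->a = 1;  /* Initialize *before* publishing. */
			rcu_assign_pointer(gp, p);
		}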
0248
0249 rcu_dereference()
0250 ^^^^^^^^^^^^^^^^^
0251 typeof(p) rcu_dereference(p);
0252
0253 Like rcu_assign_pointer(), rcu_dereference() must be implemented
0254 as a macro.
0255
0256 The reader uses rcu_dereference() to fetch an RCU-protected
0257 pointer, which returns a value that may then be safely
0258 dereferenced. Note that rcu_dereference() does not actually
0259 dereference the pointer, instead, it protects the pointer for
0260 later dereferencing. It also executes any needed memory-barrier
0261 instructions for a given CPU architecture. Currently, only Alpha
0262 needs memory barriers within rcu_dereference() -- on other CPUs,
0263 it compiles to nothing, not even a compiler directive.
0264
0265 Common coding practice uses rcu_dereference() to copy an
0266 RCU-protected pointer to a local variable, then dereferences
0267 this local variable, for example as follows::
0268
0269 p = rcu_dereference(head.next);
0270 return p->data;
0271
0272 However, in this case, one could just as easily combine these
0273 into one statement::
0274
0275 return rcu_dereference(head.next)->data;
0276
0277 If you are going to be fetching multiple fields from the
0278 RCU-protected structure, using the local variable is of
0279 course preferred. Repeated rcu_dereference() calls look
0280 ugly, do not guarantee that the same pointer will be returned
0281 if an update happened while in the critical section, and incur
0282 unnecessary overhead on Alpha CPUs.
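
	For example, a hedged sketch fetching both of the fields used
	in the examples below through a single rcu_dereference()::

		p = rcu_dereference(head.next);
		x = p->address;
		y = p->data;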
0283
0284 Note that the value returned by rcu_dereference() is valid
0285 only within the enclosing RCU read-side critical section [1]_.
0286 For example, the following is **not** legal::
0287
0288 rcu_read_lock();
0289 p = rcu_dereference(head.next);
0290 rcu_read_unlock();
0291 x = p->address; /* BUG!!! */
0292 rcu_read_lock();
0293 y = p->data; /* BUG!!! */
0294 rcu_read_unlock();
0295
0296 Holding a reference from one RCU read-side critical section
0297 to another is just as illegal as holding a reference from
0298 one lock-based critical section to another! Similarly,
0299 using a reference outside of the critical section in which
0300 it was acquired is just as illegal as doing so with normal
0301 locking.
0302
0303 As with rcu_assign_pointer(), an important function of
0304 rcu_dereference() is to document which pointers are protected by
0305 RCU, in particular, flagging a pointer that is subject to changing
0306 at any time, including immediately after the rcu_dereference().
0307 And, again like rcu_assign_pointer(), rcu_dereference() is
0308 typically used indirectly, via the _rcu list-manipulation
0309 primitives, such as list_for_each_entry_rcu() [2]_.
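
	For example, a hedged sketch of a reader traversing a
	hypothetical RCU-protected list "mylist", whose elements are
	linked through a "list" field::

		rcu_read_lock();
		list_for_each_entry_rcu(p, &mylist, list)
			do_something(p->a);  /* Must not block. */
		rcu_read_unlock();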
0310
0311 .. [1] The variant rcu_dereference_protected() can be used outside
0312 of an RCU read-side critical section as long as the usage is
0313 protected by locks acquired by the update-side code. This variant
0314 avoids the lockdep warning that would happen when using (for
0315 example) rcu_dereference() without rcu_read_lock() protection.
0316 Using rcu_dereference_protected() also has the advantage
0317 of permitting compiler optimizations that rcu_dereference()
0318 must prohibit. The rcu_dereference_protected() variant takes
0319 a lockdep expression to indicate which locks must be acquired
0320 by the caller. If the indicated protection is not provided,
0321 a lockdep splat is emitted. See Design/Requirements/Requirements.rst
0322 and the API's code comments for more details and example usage.
0323
0324 .. [2] If the list_for_each_entry_rcu() instance might be used by
0325 update-side code as well as by RCU readers, then an additional
0326 lockdep expression can be added to its list of arguments.
0327 For example, given an additional "lock_is_held(&mylock)" argument,
0328 the RCU lockdep code would complain only if this instance was
0329 invoked outside of an RCU read-side critical section and without
0330 the protection of mylock.
0331
0332 The following diagram shows how each API communicates among the
0333 reader, updater, and reclaimer.
0334 ::
0335
0336
0337 rcu_assign_pointer()
0338 +--------+
0339 +---------------------->| reader |---------+
0340 | +--------+ |
0341 | | |
0342 | | | Protect:
0343 | | | rcu_read_lock()
0344 | | | rcu_read_unlock()
0345 | rcu_dereference() | |
0346 +---------+ | |
0347 | updater |<----------------+ |
0348 +---------+ V
0349 | +-----------+
0350 +----------------------------------->| reclaimer |
0351 +-----------+
0352 Defer:
0353 synchronize_rcu() & call_rcu()
0354
0355
0356 The RCU infrastructure observes the time sequence of rcu_read_lock(),
0357 rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
0358 order to determine when (1) synchronize_rcu() invocations may return
0359 to their callers and (2) call_rcu() callbacks may be invoked. Efficient
0360 implementations of the RCU infrastructure make heavy use of batching in
0361 order to amortize their overhead over many uses of the corresponding APIs.
0362
0363 There are at least three flavors of RCU usage in the Linux kernel. The diagram
0364 above shows the most common one. On the updater side, the rcu_assign_pointer(),
0365 synchronize_rcu() and call_rcu() primitives used are the same for all three
flavors. However, for protection (on the reader side), the primitives used vary
0367 depending on the flavor:
0368
0369 a. rcu_read_lock() / rcu_read_unlock()
0370 rcu_dereference()
0371
0372 b. rcu_read_lock_bh() / rcu_read_unlock_bh()
0373 local_bh_disable() / local_bh_enable()
0374 rcu_dereference_bh()
0375
0376 c. rcu_read_lock_sched() / rcu_read_unlock_sched()
0377 preempt_disable() / preempt_enable()
0378 local_irq_save() / local_irq_restore()
0379 hardirq enter / hardirq exit
0380 NMI enter / NMI exit
0381 rcu_dereference_sched()
0382
0383 These three flavors are used as follows:
0384
0385 a. RCU applied to normal data structures.
0386
0387 b. RCU applied to networking data structures that may be subjected
0388 to remote denial-of-service attacks.
0389
0390 c. RCU applied to scheduler and interrupt/NMI-handler tasks.
0391
0392 Again, most uses will be of (a). The (b) and (c) cases are important
0393 for specialized uses, but are relatively uncommon.
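
For example, a hedged sketch of a flavor (b) reader, assuming a
hypothetical RCU-protected pointer "gbl_stats" that is updated from
softirq context by networking code::

	rcu_read_lock_bh();
	p = rcu_dereference_bh(gbl_stats);
	if (p)
		pkts = p->rx_packets;  /* Hypothetical field. */
	rcu_read_unlock_bh();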
0394
0395 .. _3_whatisRCU:
0396
0397 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
0398 -----------------------------------------------
0399
0400 This section shows a simple use of the core RCU API to protect a
0401 global pointer to a dynamically allocated structure. More-typical
0402 uses of RCU may be found in listRCU.rst, arrayRCU.rst, and NMI-RCU.rst.
0403 ::
0404
0405 struct foo {
0406 int a;
0407 char b;
0408 long c;
0409 };
0410 DEFINE_SPINLOCK(foo_mutex);
0411
0412 struct foo __rcu *gbl_foo;
0413
0414 /*
0415 * Create a new struct foo that is the same as the one currently
0416 * pointed to by gbl_foo, except that field "a" is replaced
0417 * with "new_a". Points gbl_foo to the new structure, and
0418 * frees up the old structure after a grace period.
0419 *
0420 * Uses rcu_assign_pointer() to ensure that concurrent readers
0421 * see the initialized version of the new structure.
0422 *
0423 * Uses synchronize_rcu() to ensure that any readers that might
0424 * have references to the old structure complete before freeing
0425 * the old structure.
0426 */
0427 void foo_update_a(int new_a)
0428 {
0429 struct foo *new_fp;
0430 struct foo *old_fp;
0431
0432 new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
0433 spin_lock(&foo_mutex);
0434 old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
0435 *new_fp = *old_fp;
0436 new_fp->a = new_a;
0437 rcu_assign_pointer(gbl_foo, new_fp);
0438 spin_unlock(&foo_mutex);
0439 synchronize_rcu();
0440 kfree(old_fp);
0441 }
0442
0443 /*
0444 * Return the value of field "a" of the current gbl_foo
0445 * structure. Use rcu_read_lock() and rcu_read_unlock()
0446 * to ensure that the structure does not get deleted out
0447 * from under us, and use rcu_dereference() to ensure that
0448 * we see the initialized version of the structure (important
0449 * for DEC Alpha and for people reading the code).
0450 */
0451 int foo_get_a(void)
0452 {
0453 int retval;
0454
0455 rcu_read_lock();
0456 retval = rcu_dereference(gbl_foo)->a;
0457 rcu_read_unlock();
0458 return retval;
0459 }
0460
0461 So, to sum up:
0462
0463 - Use rcu_read_lock() and rcu_read_unlock() to guard RCU
0464 read-side critical sections.
0465
0466 - Within an RCU read-side critical section, use rcu_dereference()
0467 to dereference RCU-protected pointers.
0468
0469 - Use some solid scheme (such as locks or semaphores) to
0470 keep concurrent updates from interfering with each other.
0471
0472 - Use rcu_assign_pointer() to update an RCU-protected pointer.
0473 This primitive protects concurrent readers from the updater,
0474 **not** concurrent updates from each other! You therefore still
0475 need to use locking (or something similar) to keep concurrent
0476 rcu_assign_pointer() primitives from interfering with each other.
0477
0478 - Use synchronize_rcu() **after** removing a data element from an
0479 RCU-protected data structure, but **before** reclaiming/freeing
0480 the data element, in order to wait for the completion of all
0481 RCU read-side critical sections that might be referencing that
0482 data item.
0483
0484 See checklist.rst for additional rules to follow when using RCU.
0485 And again, more-typical uses of RCU may be found in listRCU.rst,
0486 arrayRCU.rst, and NMI-RCU.rst.
0487
0488 .. _4_whatisRCU:
0489
0490 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
0491 --------------------------------------------
0492
0493 In the example above, foo_update_a() blocks until a grace period elapses.
0494 This is quite simple, but in some cases one cannot afford to wait so
0495 long -- there might be other high-priority work to be done.
0496
0497 In such cases, one uses call_rcu() rather than synchronize_rcu().
0498 The call_rcu() API is as follows::
0499
0500 void call_rcu(struct rcu_head *head, rcu_callback_t func);
0501
0502 This function invokes func(head) after a grace period has elapsed.
0503 This invocation might happen from either softirq or process context,
0504 so the function is not permitted to block. The foo struct needs to
0505 have an rcu_head structure added, perhaps as follows::
0506
0507 struct foo {
0508 int a;
0509 char b;
0510 long c;
0511 struct rcu_head rcu;
0512 };
0513
0514 The foo_update_a() function might then be written as follows::
0515
0516 /*
0517 * Create a new struct foo that is the same as the one currently
0518 * pointed to by gbl_foo, except that field "a" is replaced
0519 * with "new_a". Points gbl_foo to the new structure, and
0520 * frees up the old structure after a grace period.
0521 *
0522 * Uses rcu_assign_pointer() to ensure that concurrent readers
0523 * see the initialized version of the new structure.
0524 *
0525 * Uses call_rcu() to ensure that any readers that might have
0526 * references to the old structure complete before freeing the
0527 * old structure.
0528 */
0529 void foo_update_a(int new_a)
0530 {
0531 struct foo *new_fp;
0532 struct foo *old_fp;
0533
0534 new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
0535 spin_lock(&foo_mutex);
0536 old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
0537 *new_fp = *old_fp;
0538 new_fp->a = new_a;
0539 rcu_assign_pointer(gbl_foo, new_fp);
0540 spin_unlock(&foo_mutex);
0541 call_rcu(&old_fp->rcu, foo_reclaim);
0542 }
0543
0544 The foo_reclaim() function might appear as follows::
0545
0546 void foo_reclaim(struct rcu_head *rp)
0547 {
0548 struct foo *fp = container_of(rp, struct foo, rcu);
0549
0550 foo_cleanup(fp->a);
0551
0552 kfree(fp);
0553 }
0554
0555 The container_of() primitive is a macro that, given a pointer into a
0556 struct, the type of the struct, and the pointed-to field within the
0557 struct, returns a pointer to the beginning of the struct.
0558
0559 The use of call_rcu() permits the caller of foo_update_a() to
0560 immediately regain control, without needing to worry further about the
0561 old version of the newly updated element. It also clearly shows the
0562 RCU distinction between updater, namely foo_update_a(), and reclaimer,
0563 namely foo_reclaim().
0564
0565 The summary of advice is the same as for the previous section, except
0566 that we are now using call_rcu() rather than synchronize_rcu():
0567
0568 - Use call_rcu() **after** removing a data element from an
0569 RCU-protected data structure in order to register a callback
0570 function that will be invoked after the completion of all RCU
0571 read-side critical sections that might be referencing that
0572 data item.
0573
0574 If the callback for call_rcu() is not doing anything more than calling
0575 kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
0576 to avoid having to write your own callback::
0577
0578 kfree_rcu(old_fp, rcu);
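
With this change, the tail of foo_update_a() might look as follows (a
hedged sketch), and foo_reclaim() can be dropped entirely, assuming
that the foo_cleanup() call shown above is not actually needed::

	rcu_assign_pointer(gbl_foo, new_fp);
	spin_unlock(&foo_mutex);
	kfree_rcu(old_fp, rcu);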
0579
0580 Again, see checklist.rst for additional rules governing the use of RCU.
0581
0582 .. _5_whatisRCU:
0583
0584 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
0585 ------------------------------------------------
0586
0587 One of the nice things about RCU is that it has extremely simple "toy"
0588 implementations that are a good first step towards understanding the
0589 production-quality implementations in the Linux kernel. This section
0590 presents two such "toy" implementations of RCU, one that is implemented
0591 in terms of familiar locking primitives, and another that more closely
0592 resembles "classic" RCU. Both are way too simple for real-world use,
0593 lacking both functionality and performance. However, they are useful
0594 in getting a feel for how RCU works. See kernel/rcu/update.c for a
0595 production-quality implementation, and see:
0596
0597 http://www.rdrop.com/users/paulmck/RCU
0598
0599 for papers describing the Linux kernel RCU implementation. The OLS'01
0600 and OLS'02 papers are a good introduction, and the dissertation provides
0601 more details on the current implementation as of early 2004.
0602
0603
0604 5A. "TOY" IMPLEMENTATION #1: LOCKING
0605 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0606 This section presents a "toy" RCU implementation that is based on
0607 familiar locking primitives. Its overhead makes it a non-starter for
0608 real-life use, as does its lack of scalability. It is also unsuitable
0609 for realtime use, since it allows scheduling latency to "bleed" from
0610 one read-side critical section to another. It also assumes recursive
0611 reader-writer locks: If you try this with non-recursive locks, and
0612 you allow nested rcu_read_lock() calls, you can deadlock.
0613
0614 However, it is probably the easiest implementation to relate to, so is
0615 a good starting point.
0616
0617 It is extremely simple::
0618
0619 static DEFINE_RWLOCK(rcu_gp_mutex);
0620
0621 void rcu_read_lock(void)
0622 {
0623 read_lock(&rcu_gp_mutex);
0624 }
0625
0626 void rcu_read_unlock(void)
0627 {
0628 read_unlock(&rcu_gp_mutex);
0629 }
0630
0631 void synchronize_rcu(void)
0632 {
0633 write_lock(&rcu_gp_mutex);
0634 smp_mb__after_spinlock();
0635 write_unlock(&rcu_gp_mutex);
0636 }
0637
0638 [You can ignore rcu_assign_pointer() and rcu_dereference() without missing
0639 much. But here are simplified versions anyway. And whatever you do,
0640 don't forget about them when submitting patches making use of RCU!]::
0641
0642 #define rcu_assign_pointer(p, v) \
0643 ({ \
0644 smp_store_release(&(p), (v)); \
0645 })
0646
0647 #define rcu_dereference(p) \
0648 ({ \
0649 typeof(p) _________p1 = READ_ONCE(p); \
0650 (_________p1); \
0651 })
0652
0653
The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
0655 and release a global reader-writer lock. The synchronize_rcu()
0656 primitive write-acquires this same lock, then releases it. This means
0657 that once synchronize_rcu() exits, all RCU read-side critical sections
0658 that were in progress before synchronize_rcu() was called are guaranteed
0659 to have completed -- there is no way that synchronize_rcu() would have
0660 been able to write-acquire the lock otherwise. The smp_mb__after_spinlock()
0661 promotes synchronize_rcu() to a full memory barrier in compliance with
0662 the "Memory-Barrier Guarantees" listed in:
0663
0664 Design/Requirements/Requirements.rst
0665
0666 It is possible to nest rcu_read_lock(), since reader-writer locks may
0667 be recursively acquired. Note also that rcu_read_lock() is immune
0668 from deadlock (an important property of RCU). The reason for this is
0669 that the only thing that can block rcu_read_lock() is a synchronize_rcu().
0670 But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
0671 so there can be no deadlock cycle.
0672
0673 .. _quiz_1:
0674
0675 Quick Quiz #1:
0676 Why is this argument naive? How could a deadlock
0677 occur when using this algorithm in a real-world Linux
0678 kernel? How could this deadlock be avoided?
0679
0680 :ref:`Answers to Quick Quiz <9_whatisRCU>`
0681
0682 5B. "TOY" EXAMPLE #2: CLASSIC RCU
0683 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0684 This section presents a "toy" RCU implementation that is based on
0685 "classic RCU". It is also short on performance (but only for updates) and
0686 on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
0687 kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
0688 are the same as those shown in the preceding section, so they are omitted.
0689 ::
0690
0691 void rcu_read_lock(void) { }
0692
0693 void rcu_read_unlock(void) { }
0694
0695 void synchronize_rcu(void)
0696 {
0697 int cpu;
0698
0699 for_each_possible_cpu(cpu)
0700 run_on(cpu);
0701 }
0702
0703 Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
0704 This is the great strength of classic RCU in a non-preemptive kernel:
0705 read-side overhead is precisely zero, at least on non-Alpha CPUs.
0706 And there is absolutely no way that rcu_read_lock() can possibly
0707 participate in a deadlock cycle!
0708
0709 The implementation of synchronize_rcu() simply schedules itself on each
0710 CPU in turn. The run_on() primitive can be implemented straightforwardly
0711 in terms of the sched_setaffinity() primitive. Of course, a somewhat less
0712 "toy" implementation would restore the affinity upon completion rather
0713 than just leaving all tasks running on the last CPU, but when I said
0714 "toy", I meant **toy**!
0715
0716 So how the heck is this supposed to work???
0717
0718 Remember that it is illegal to block while in an RCU read-side critical
0719 section. Therefore, if a given CPU executes a context switch, we know
0720 that it must have completed all preceding RCU read-side critical sections.
0721 Once **all** CPUs have executed a context switch, then **all** preceding
0722 RCU read-side critical sections will have completed.
0723
0724 So, suppose that we remove a data item from its structure and then invoke
0725 synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed
0726 that there are no RCU read-side critical sections holding a reference
0727 to that data item, so we can safely reclaim it.
0728
0729 .. _quiz_2:
0730
0731 Quick Quiz #2:
0732 Give an example where Classic RCU's read-side
0733 overhead is **negative**.
0734
0735 :ref:`Answers to Quick Quiz <9_whatisRCU>`
0736
0737 .. _quiz_3:
0738
0739 Quick Quiz #3:
0740 If it is illegal to block in an RCU read-side
0741 critical section, what the heck do you do in
0742 CONFIG_PREEMPT_RT, where normal spinlocks can block???
0743
0744 :ref:`Answers to Quick Quiz <9_whatisRCU>`
0745
0746 .. _6_whatisRCU:
0747
0748 6. ANALOGY WITH READER-WRITER LOCKING
0749 --------------------------------------
0750
0751 Although RCU can be used in many different ways, a very common use of
0752 RCU is analogous to reader-writer locking. The following unified
0753 diff shows how closely related RCU and reader-writer locking can be.
0754 ::
0755
0756 @@ -5,5 +5,5 @@ struct el {
0757 int data;
0758 /* Other data fields */
0759 };
0760 -rwlock_t listmutex;
0761 +spinlock_t listmutex;
0762 struct el head;
0763
0764 @@ -13,15 +14,15 @@
0765 struct list_head *lp;
0766 struct el *p;
0767
0768 - read_lock(&listmutex);
0769 - list_for_each_entry(p, head, lp) {
0770 + rcu_read_lock();
0771 + list_for_each_entry_rcu(p, head, lp) {
0772 if (p->key == key) {
0773 *result = p->data;
0774 - read_unlock(&listmutex);
0775 + rcu_read_unlock();
0776 return 1;
0777 }
0778 }
0779 - read_unlock(&listmutex);
0780 + rcu_read_unlock();
0781 return 0;
0782 }
0783
0784 @@ -29,15 +30,16 @@
0785 {
0786 struct el *p;
0787
0788 - write_lock(&listmutex);
0789 + spin_lock(&listmutex);
0790 list_for_each_entry(p, head, lp) {
0791 if (p->key == key) {
0792 - list_del(&p->list);
0793 - write_unlock(&listmutex);
0794 + list_del_rcu(&p->list);
0795 + spin_unlock(&listmutex);
0796 + synchronize_rcu();
0797 kfree(p);
0798 return 1;
0799 }
0800 }
0801 - write_unlock(&listmutex);
0802 + spin_unlock(&listmutex);
0803 return 0;
0804 }
0805
0806 Or, for those who prefer a side-by-side listing::
0807
0808 1 struct el { 1 struct el {
0809 2 struct list_head list; 2 struct list_head list;
0810 3 long key; 3 long key;
0811 4 spinlock_t mutex; 4 spinlock_t mutex;
0812 5 int data; 5 int data;
0813 6 /* Other data fields */ 6 /* Other data fields */
0814 7 }; 7 };
0815 8 rwlock_t listmutex; 8 spinlock_t listmutex;
0816 9 struct el head; 9 struct el head;
0817
0818 ::
0819
0820 1 int search(long key, int *result) 1 int search(long key, int *result)
0821 2 { 2 {
0822 3 struct list_head *lp; 3 struct list_head *lp;
0823 4 struct el *p; 4 struct el *p;
0824 5 5
0825 6 read_lock(&listmutex); 6 rcu_read_lock();
0826 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
0827 8 if (p->key == key) { 8 if (p->key == key) {
0828 9 *result = p->data; 9 *result = p->data;
0829 10 read_unlock(&listmutex); 10 rcu_read_unlock();
0830 11 return 1; 11 return 1;
0831 12 } 12 }
0832 13 } 13 }
0833 14 read_unlock(&listmutex); 14 rcu_read_unlock();
0834 15 return 0; 15 return 0;
0835 16 } 16 }
0836
0837 ::
0838
0839 1 int delete(long key) 1 int delete(long key)
0840 2 { 2 {
0841 3 struct el *p; 3 struct el *p;
0842 4 4
0843 5 write_lock(&listmutex); 5 spin_lock(&listmutex);
0844 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
0845 7 if (p->key == key) { 7 if (p->key == key) {
0846 8 list_del(&p->list); 8 list_del_rcu(&p->list);
0847 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
0848 10 synchronize_rcu();
0849 10 kfree(p); 11 kfree(p);
0850 11 return 1; 12 return 1;
0851 12 } 13 }
0852 13 } 14 }
0853 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
0854 15 return 0; 16 return 0;
0855 16 } 17 }
0856
0857 Either way, the differences are quite small. Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
0859 a reader-writer lock to a simple spinlock, and a synchronize_rcu()
0860 precedes the kfree().
0861
0862 However, there is one potential catch: the read-side and update-side
0863 critical sections can now run concurrently. In many cases, this will
0864 not be a problem, but it is necessary to check carefully regardless.
0865 For example, if multiple independent list updates must be seen as
0866 a single atomic update, converting to RCU will require special care.
0867
0868 Also, the presence of synchronize_rcu() means that the RCU version of
0869 delete() can now block. If this is a problem, there is a callback-based
0870 mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
0871 be used in place of synchronize_rcu().
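
For example, a hedged sketch of the non-blocking variant of delete(),
assuming that struct el gains a struct rcu_head field named "rcu"::

	int delete(long key)
	{
		struct el *p;

		spin_lock(&listmutex);
		list_for_each_entry(p, head, lp) {
			if (p->key == key) {
				list_del_rcu(&p->list);
				spin_unlock(&listmutex);
				kfree_rcu(p, rcu);
				return 1;
			}
		}
		spin_unlock(&listmutex);
		return 0;
	}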
0872
0873 .. _7_whatisRCU:
0874
0875 7. ANALOGY WITH REFERENCE COUNTING
0876 -----------------------------------
0877
0878 The reader-writer analogy (illustrated by the previous section) is not
0879 always the best way to think about using RCU. Another helpful analogy
0880 considers RCU an effective reference count on everything which is
0881 protected by RCU.
0882
0883 A reference count typically does not prevent the referenced object's
0884 values from changing, but does prevent changes to type -- particularly the
0885 gross change of type that happens when that object's memory is freed and
0886 re-allocated for some other purpose. Once a type-safe reference to the
0887 object is obtained, some other mechanism is needed to ensure consistent
0888 access to the data in the object. This could involve taking a spinlock,
0889 but with RCU the typical approach is to perform reads with SMP-aware
0890 operations such as smp_load_acquire(), to perform updates with atomic
0891 read-modify-write operations, and to provide the necessary ordering.
0892 RCU provides a number of support functions that embed the required
0893 operations and ordering, such as the list_for_each_entry_rcu() macro
0894 used in the previous section.
0895
0896 A more focused view of the reference counting behavior is that,
0897 between rcu_read_lock() and rcu_read_unlock(), any reference taken with
0898 rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
0899 though a reference-count on that object has been temporarily increased.
0900 This prevents the object from changing type. Exactly what this means
0901 will depend on normal expectations of objects of that type, but it
0902 typically includes that spinlocks can still be safely locked, normal
0903 reference counters can be safely manipulated, and ``__rcu`` pointers
0904 can be safely dereferenced.
0905
0906 Some operations that one might expect to see on an object for
0907 which an RCU reference is held include:
0908
0909 - Copying out data that is guaranteed to be stable by the object's type.
0910 - Using kref_get_unless_zero() or similar to get a longer-term
0911 reference. This may fail of course.
0912 - Acquiring a spinlock in the object, and checking if the object still
0913 is the expected object and if so, manipulating it freely.
0914
0915 The understanding that RCU provides a reference that only prevents a
0916 change of type is particularly visible with objects allocated from a
0917 slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
0918 reference to an object from such a cache that has been concurrently
0919 freed and the memory reallocated to a completely different object,
0920 though of the same type. In this case RCU doesn't even protect the
0921 identity of the object from changing, only its type. So the object
0922 found may not be the one expected, but it will be one where it is safe
0923 to take a reference or spinlock and then confirm that the identity
0924 matches the expectations.
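
For example, a hedged sketch of such a lookup, assuming a hypothetical
"struct obj" containing a kref named "ref" and a "key" field, plus a
hypothetical RCU-safe cache_lookup() helper::

	rcu_read_lock();
	p = cache_lookup(key);
	if (p && !kref_get_unless_zero(&p->ref))
		p = NULL;  /* Object is already being freed. */
	rcu_read_unlock();
	if (p && p->key != key) {
		/* The memory was recycled for another object. */
		kref_put(&p->ref, obj_release);
		p = NULL;
	}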
0925
0926 With traditional reference counting -- such as that implemented by the
0927 kref library in Linux -- there is typically code that runs when the last
0928 reference to an object is dropped. With kref, this is the function
0929 passed to kref_put(). When RCU is being used, such finalization code
0930 must not be run until all ``__rcu`` pointers referencing the object have
0931 been updated, and then a grace period has passed. Every remaining
0932 globally visible pointer to the object must be considered to be a
0933 potential counted reference, and the finalization code is typically run
0934 using call_rcu() only after all those pointers have been changed.
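
For example, a hedged sketch of such finalization, reusing the
hypothetical "struct obj" from the previous sketch and giving it a
struct rcu_head field named "rcu"::

	static void obj_free_rcu(struct rcu_head *rp)
	{
		kfree(container_of(rp, struct obj, rcu));
	}

	static void obj_release(struct kref *kr)
	{
		struct obj *p = container_of(kr, struct obj, ref);

		/* All __rcu pointers to p must already be unpublished. */
		call_rcu(&p->rcu, obj_free_rcu);
	}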
0935
0936 To see how to choose between these two analogies -- of RCU as a
0937 reader-writer lock and RCU as a reference counting system -- it is useful
0938 to reflect on the scale of the thing being protected. The reader-writer
0939 lock analogy looks at larger multi-part objects such as a linked list
0940 and shows how RCU can facilitate concurrency while elements are added
0941 to, and removed from, the list. The reference-count analogy looks at
0942 the individual objects and looks at how they can be accessed safely
0943 within whatever whole they are a part of.
0944
0945 .. _8_whatisRCU:
0946
0947 8. FULL LIST OF RCU APIs
0948 -------------------------
0949
0950 The RCU APIs are documented in docbook-format header comments in the
0951 Linux-kernel source code, but it helps to have a full list of the
0952 APIs, since there does not appear to be a way to categorize them
0953 in docbook. Here is the list, by category.
0954
0955 RCU list traversal::
0956
0957 list_entry_rcu
0958 list_entry_lockless
0959 list_first_entry_rcu
0960 list_next_rcu
0961 list_for_each_entry_rcu
0962 list_for_each_entry_continue_rcu
0963 list_for_each_entry_from_rcu
0964 list_first_or_null_rcu
0965 list_next_or_null_rcu
0966 hlist_first_rcu
0967 hlist_next_rcu
0968 hlist_pprev_rcu
0969 hlist_for_each_entry_rcu
0970 hlist_for_each_entry_rcu_bh
0971 hlist_for_each_entry_from_rcu
0972 hlist_for_each_entry_continue_rcu
0973 hlist_for_each_entry_continue_rcu_bh
0974 hlist_nulls_first_rcu
0975 hlist_nulls_for_each_entry_rcu
0976 hlist_bl_first_rcu
0977 hlist_bl_for_each_entry_rcu
0978
0979 RCU pointer/list update::
0980
0981 rcu_assign_pointer
0982 list_add_rcu
0983 list_add_tail_rcu
0984 list_del_rcu
0985 list_replace_rcu
0986 hlist_add_behind_rcu
0987 hlist_add_before_rcu
0988 hlist_add_head_rcu
0989 hlist_add_tail_rcu
0990 hlist_del_rcu
0991 hlist_del_init_rcu
0992 hlist_replace_rcu
0993 list_splice_init_rcu
0994 list_splice_tail_init_rcu
0995 hlist_nulls_del_init_rcu
0996 hlist_nulls_del_rcu
0997 hlist_nulls_add_head_rcu
0998 hlist_bl_add_head_rcu
0999 hlist_bl_del_init_rcu
1000 hlist_bl_del_rcu
1001 hlist_bl_set_first_rcu
1002
1003 RCU::
1004
1005 Critical sections Grace period Barrier
1006
1007 rcu_read_lock synchronize_net rcu_barrier
1008 rcu_read_unlock synchronize_rcu
1009 rcu_dereference synchronize_rcu_expedited
1010 rcu_read_lock_held call_rcu
1011 rcu_dereference_check kfree_rcu
1012 rcu_dereference_protected
1013
1014 bh::
1015
1016 Critical sections Grace period Barrier
1017
1018 rcu_read_lock_bh call_rcu rcu_barrier
1019 rcu_read_unlock_bh synchronize_rcu
1020 [local_bh_disable] synchronize_rcu_expedited
1021 [and friends]
1022 rcu_dereference_bh
1023 rcu_dereference_bh_check
1024 rcu_dereference_bh_protected
1025 rcu_read_lock_bh_held
1026
1027 sched::
1028
1029 Critical sections Grace period Barrier
1030
1031 rcu_read_lock_sched call_rcu rcu_barrier
1032 rcu_read_unlock_sched synchronize_rcu
1033 [preempt_disable] synchronize_rcu_expedited
1034 [and friends]
1035 rcu_read_lock_sched_notrace
1036 rcu_read_unlock_sched_notrace
1037 rcu_dereference_sched
1038 rcu_dereference_sched_check
1039 rcu_dereference_sched_protected
1040 rcu_read_lock_sched_held
1041
1042
1043 SRCU::
1044
1045 Critical sections Grace period Barrier
1046
1047 srcu_read_lock call_srcu srcu_barrier
1048 srcu_read_unlock synchronize_srcu
1049 srcu_dereference synchronize_srcu_expedited
1050 srcu_dereference_check
1051 srcu_read_lock_held
1052
1053 SRCU: Initialization/cleanup::
1054
1055 DEFINE_SRCU
1056 DEFINE_STATIC_SRCU
1057 init_srcu_struct
1058 cleanup_srcu_struct
1059
1060 All: lockdep-checked RCU-protected pointer access::
1061
1062 rcu_access_pointer
1063 rcu_dereference_raw
1064 RCU_LOCKDEP_WARN
1065 rcu_sleep_check
1066 RCU_NONIDLE
1067
1068 See the comment headers in the source code (or the docbook generated
1069 from them) for more information.
1070
1071 However, given that there are no fewer than four families of RCU APIs
1072 in the Linux kernel, how do you choose which one to use? The following
1073 list can be helpful:
1074
1075 a. Will readers need to block? If so, you need SRCU.
1076
1077 b. What about the -rt patchset? If readers would need to block
	in a non-rt kernel, you need SRCU. If readers would block
1079 in a -rt kernel, but not in a non-rt kernel, SRCU is not
1080 necessary. (The -rt patchset turns spinlocks into sleeplocks,
1081 hence this distinction.)
1082
1083 c. Do you need to treat NMI handlers, hardirq handlers,
1084 and code segments with preemption disabled (whether
1085 via preempt_disable(), local_irq_save(), local_bh_disable(),
1086 or some other mechanism) as if they were explicit RCU readers?
1087 If so, RCU-sched is the only choice that will work for you.
1088
1089 d. Do you need RCU grace periods to complete even in the face
1090 of softirq monopolization of one or more of the CPUs? For
1091 example, is your code subject to network-based denial-of-service
1092 attacks? If so, you should disable softirq across your readers,
1093 for example, by using rcu_read_lock_bh().
1094
1095 e. Is your workload too update-intensive for normal use of
1096 RCU, but inappropriate for other synchronization mechanisms?
1097 If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
1098 named SLAB_DESTROY_BY_RCU). But please be careful!
1099
1100 f. Do you need read-side critical sections that are respected
1101 even though they are in the middle of the idle loop, during
1102 user-mode execution, or on an offlined CPU? If so, SRCU is the
1103 only choice that will work for you.
1104
1105 g. Otherwise, use RCU.
1106
1107 Of course, this all assumes that you have determined that RCU is in fact
1108 the right tool for your job.
1109
1110 .. _9_whatisRCU:
1111
1112 9. ANSWERS TO QUICK QUIZZES
1113 ----------------------------
1114
1115 Quick Quiz #1:
1116 Why is this argument naive? How could a deadlock
1117 occur when using this algorithm in a real-world Linux
1118 kernel? [Referring to the lock-based "toy" RCU
1119 algorithm.]
1120
1121 Answer:
1122 Consider the following sequence of events:
1123
1124 1. CPU 0 acquires some unrelated lock, call it
1125 "problematic_lock", disabling irq via
1126 spin_lock_irqsave().
1127
1128 2. CPU 1 enters synchronize_rcu(), write-acquiring
1129 rcu_gp_mutex.
1130
1131 3. CPU 0 enters rcu_read_lock(), but must wait
1132 because CPU 1 holds rcu_gp_mutex.
1133
1134 4. CPU 1 is interrupted, and the irq handler
1135 attempts to acquire problematic_lock.
1136
1137 The system is now deadlocked.
1138
1139 One way to avoid this deadlock is to use an approach like
1140 that of CONFIG_PREEMPT_RT, where all normal spinlocks
1141 become blocking locks, and all irq handlers execute in
1142 the context of special tasks. In this case, in step 4
1143 above, the irq handler would block, allowing CPU 1 to
1144 release rcu_gp_mutex, avoiding the deadlock.
1145
1146 Even in the absence of deadlock, this RCU implementation
1147 allows latency to "bleed" from readers to other
1148 readers through synchronize_rcu(). To see this,
1149 consider task A in an RCU read-side critical section
1150 (thus read-holding rcu_gp_mutex), task B blocked
1151 attempting to write-acquire rcu_gp_mutex, and
1152 task C blocked in rcu_read_lock() attempting to
	read-acquire rcu_gp_mutex. Task A's RCU read-side
1154 latency is holding up task C, albeit indirectly via
1155 task B.
1156
1157 Realtime RCU implementations therefore use a counter-based
1158 approach where tasks in RCU read-side critical sections
1159 cannot be blocked by tasks executing synchronize_rcu().
1160
1161 :ref:`Back to Quick Quiz #1 <quiz_1>`
1162
1163 Quick Quiz #2:
1164 Give an example where Classic RCU's read-side
1165 overhead is **negative**.
1166
1167 Answer:
1168 Imagine a single-CPU system with a non-CONFIG_PREEMPTION
1169 kernel where a routing table is used by process-context
1170 code, but can be updated by irq-context code (for example,
1171 by an "ICMP REDIRECT" packet). The usual way of handling
1172 this would be to have the process-context code disable
1173 interrupts while searching the routing table. Use of
1174 RCU allows such interrupt-disabling to be dispensed with.
1175 Thus, without RCU, you pay the cost of disabling interrupts,
1176 and with RCU you don't.
1177
1178 One can argue that the overhead of RCU in this
1179 case is negative with respect to the single-CPU
1180 interrupt-disabling approach. Others might argue that
1181 the overhead of RCU is merely zero, and that replacing
1182 the positive overhead of the interrupt-disabling scheme
1183 with the zero-overhead RCU scheme does not constitute
1184 negative overhead.
1185
1186 In real life, of course, things are more complex. But
1187 even the theoretical possibility of negative overhead for
1188 a synchronization primitive is a bit unexpected. ;-)
1189
1190 :ref:`Back to Quick Quiz #2 <quiz_2>`
1191
1192 Quick Quiz #3:
1193 If it is illegal to block in an RCU read-side
1194 critical section, what the heck do you do in
1195 CONFIG_PREEMPT_RT, where normal spinlocks can block???
1196
1197 Answer:
1198 Just as CONFIG_PREEMPT_RT permits preemption of spinlock
1199 critical sections, it permits preemption of RCU
1200 read-side critical sections. It also permits
1201 spinlocks blocking while in RCU read-side critical
1202 sections.
1203
1204 Why the apparent inconsistency? Because it is
1205 possible to use priority boosting to keep the RCU
1206 grace periods short if need be (for example, if running
1207 short of memory). In contrast, if blocking waiting
1208 for (say) network reception, there is no way to know
1209 what should be boosted. Especially given that the
1210 process we need to boost might well be a human being
1211 who just went out for a pizza or something. And although
1212 a computer-operated cattle prod might arouse serious
1213 interest, it might also provoke serious objections.
1214 Besides, how does the computer know what pizza parlor
1215 the human being went to???
1216
1217 :ref:`Back to Quick Quiz #3 <quiz_3>`
1218
1219 ACKNOWLEDGEMENTS
1220
1221 My thanks to the people who helped make this human-readable, including
1222 Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
1223
1224
1225 For more information, see http://www.rdrop.com/users/paulmck/RCU.