                  Proper Locking Under a Preemptible Kernel:
                       Keeping Kernel Code Preempt-Safe
                         Robert Love <rml@tech9.net>
                          Last Updated: 28 Aug 2002


INTRODUCTION


A preemptible kernel creates new locking issues.  The issues are the same as
those under SMP: concurrency and reentrancy.  Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms.  Thus, the kernel
requires explicit additional locking for very few additional situations.

This document is for all kernel hackers.  Developing code in the kernel
requires protecting against these situations.


RULE #1: Per-CPU data structures need explicit protection


Two similar problems arise.  An example code snippet:

        struct this_needs_locking tux[NR_CPUS];
        tux[smp_processor_id()] = some_value;
        /* task is preempted here... */
        something = tux[smp_processor_id()];

First, since the data is per-CPU, it may not have explicit SMP locking, but
would otherwise require it.  Second, when a preempted task is finally
rescheduled, the previous value of smp_processor_id may not equal the current.
You must protect these situations by disabling preemption around them.

You can also use get_cpu() and put_cpu(), which disable and re-enable
preemption, respectively.
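
As a minimal sketch (reusing the hypothetical tux array from above), the
earlier snippet becomes preempt-safe with get_cpu()/put_cpu():

        struct this_needs_locking tux[NR_CPUS];
        int cpu;

        cpu = get_cpu();        /* disables preemption, returns this CPU */
        tux[cpu] = some_value;
        /* cannot be preempted here... */
        something = tux[cpu];
        put_cpu();              /* re-enables preemption */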


RULE #2: CPU state must be protected


Under preemption, the state of the CPU must be protected.  This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch.  For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled.  Think what
would happen if the kernel were executing a floating-point instruction and
were then preempted.  Remember, the kernel does not save FPU state except
for user tasks.  Therefore, upon preemption, the FPU registers will be sold
to the lowest bidder.  Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt safe.  For example,
kernel_fpu_begin() and kernel_fpu_end() will disable and re-enable preemption.
However, fpu__restore() must be called with preemption disabled.
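
As a sketch, in-kernel FPU use on x86 is bracketed like this (the body is
illustrative; any FPU/SIMD instructions go inside the critical section):

        kernel_fpu_begin();     /* disables preemption, prepares the FPU */
        /* ... issue FPU/SSE instructions; do not sleep in here ... */
        kernel_fpu_end();       /* re-enables preemption */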


RULE #3: Lock acquire and release must be performed by same task


A lock acquired in one task must be released by the same task.  This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it.  If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event signaled by the other task.
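
As a hypothetical sketch (my_lock and work_done are illustrative names,
assumed declared and initialized elsewhere), using a completion as the event:

        /* task A: the lock never leaves this task */
        spin_lock(&my_lock);
        /* ... critical section ... */
        spin_unlock(&my_lock);
        wait_for_completion(&work_done);  /* wait for task B's signal */

        /* task B: signals the event instead of releasing A's lock */
        complete(&work_done);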


SOLUTION


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

preempt_enable()                decrement the preempt counter
preempt_disable()               increment the preempt counter
preempt_enable_no_resched()     decrement, but do not immediately preempt
preempt_check_resched()         if needed, reschedule
preempt_count()                 return the preempt counter

The functions are nestable.  In other words, you can call preempt_disable
n times in a code path, and preemption will not be reenabled until the n-th
call to preempt_enable.  The preempt statements define to nothing if
preemption is not enabled in the kernel configuration.
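
For example, the counter nests like this:

        preempt_disable();      /* preempt_count(): 0 -> 1 */
        preempt_disable();      /* preempt_count(): 1 -> 2 */
        /* ... still cannot be preempted ... */
        preempt_enable();       /* preempt_count(): 2 -> 1 */
        preempt_enable();       /* preempt_count(): 1 -> 0, may reschedule */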

Note that you do not need to explicitly prevent preemption if you are holding
any locks or interrupts are disabled, since preemption is implicitly disabled
in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption - any spin_unlock() decreasing the preemption count
to 0 might trigger a reschedule.  A simple printk() might trigger a
reschedule.  So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this.  Best policy is to
use this only for small, atomic code that you wrote and which calls no
complex functions.

Example:

        cpucache_t *cc; /* this is per-CPU */
        preempt_disable();
        cc = cc_data(searchp);
        if (cc && cc->avail) {
                __free_block(searchp, cc_entry(cc), cc->avail);
                cc->avail = 0;
        }
        preempt_enable();
        return 0;

Notice how the preemption statements must encompass every reference to the
critical variables.  Another example:

        int buf[NR_CPUS];
        set_cpu_val(buf);
        if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
        spin_lock(&buf_lock);
        /* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.
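
With the lock moved up, its implicit preemption disabling covers both per-CPU
accesses:

        int buf[NR_CPUS];
        spin_lock(&buf_lock);   /* implicitly disables preemption */
        set_cpu_val(buf);
        if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
        /* ... */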


PREVENTING PREEMPTION USING INTERRUPT DISABLING


It is possible to prevent a preemption event using local_irq_disable and
local_irq_save.  Note, when doing so, you must be very careful to not cause
an event that would set need_resched and result in a preemption check.  When
in doubt, rely on locking or explicit preemption disabling.

Note that in 2.5 interrupt disabling is now only per-CPU (i.e. local).

An additional concern is proper usage of local_irq_disable and local_irq_save.
These may be used to protect from preemption; however, on exit, if preemption
may be enabled, a test to see if preemption is required should be done.  If
these are called from the spin_lock and read/write lock macros, the right
thing is done.  They may also be called within a spin-lock protected region;
however, if they are ever called outside of this context, a test for
preemption should be made.  Do note that calls from interrupt context or
bottom halves/tasklets are also protected by preemption locks and so may use
the versions which do not check preemption.
0126 
0127 An additional concern is proper usage of local_irq_disable and local_irq_save.
0128 These may be used to protect from preemption, however, on exit, if preemption
0129 may be enabled, a test to see if preemption is required should be done.  If
0130 these are called from the spin_lock and read/write lock macros, the right thing
0131 is done.  They may also be called within a spin-lock protected region, however,
0132 if they are ever called outside of this context, a test for preemption should
0133 be made. Do note that calls from interrupt context or bottom half/ tasklets
0134 are also protected by preemption locks and so may use the versions which do
0135 not check preemption.