===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============


A preemptible kernel creates new locking issues.  The issues are the same as
those under SMP: concurrency and reentrancy.  Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms.  Thus, the kernel
requires explicit additional locking in only a few additional situations.

This document is for all kernel hackers.  Code developed for the kernel must
protect these situations.


RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Two similar problems arise. An example code snippet::

        struct this_needs_locking tux[NR_CPUS];
        tux[smp_processor_id()] = some_value;
        /* task is preempted here... */
        something = tux[smp_processor_id()];

First, since the data is per-CPU, it may not have explicit SMP locking, yet it
still needs protection once the kernel is preemptible.  Second, when a preempted
task is finally rescheduled, the previous value of smp_processor_id() may not
equal the current one.  You must protect these situations by disabling
preemption around them.

You can also use get_cpu() and put_cpu(): get_cpu() disables preemption and
returns the current CPU number, and put_cpu() re-enables preemption.
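
The get_cpu()/put_cpu() pattern can be sketched as follows.  This is a minimal
userspace model, not kernel code: model_preempt_count, model_current_cpu,
update_tux() and the value chosen for NR_CPUS are assumptions made for
illustration, and the two helpers only imitate what the kernel's get_cpu() and
put_cpu() do.

```c
#include <assert.h>

#define NR_CPUS 4                 /* illustrative value, not a real config */

/* Userspace stand-ins (assumptions) for kernel state. */
static int model_preempt_count;   /* models the preempt counter */
static int model_current_cpu;     /* models the CPU this task runs on */

/* Models get_cpu(): disable preemption, then return the CPU number. */
static int get_cpu(void)
{
	model_preempt_count++;
	return model_current_cpu;
}

/* Models put_cpu(): re-enable preemption. */
static void put_cpu(void)
{
	assert(model_preempt_count > 0);   /* must pair with get_cpu() */
	model_preempt_count--;
}

static int tux[NR_CPUS];

/* Write this CPU's slot and read it back with preemption disabled. */
static int update_tux(int some_value)
{
	int cpu = get_cpu();          /* the CPU number is stable from here */
	int something;

	tux[cpu] = some_value;
	something = tux[cpu];         /* same slot: no migration in between */
	put_cpu();
	return something;
}
```

Reading the CPU number once, inside the get_cpu()/put_cpu() pair, means both
accesses are guaranteed to hit the same slot.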


RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Under preemption, the state of the CPU must be protected.  This is
architecture-dependent, but includes CPU structures and state not preserved
over a context switch.  For example, on x86, entering and exiting FPU mode is
now a critical section that must occur while preemption is disabled.  Think
what would happen if the kernel is executing a floating-point instruction and
is then preempted.  Remember, the kernel does not save FPU state except for
user tasks.  Therefore, upon preemption, the FPU registers will be sold to the
lowest bidder.  Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt safe.  For example,
kernel_fpu_begin() disables preemption and kernel_fpu_end() re-enables it.


RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A lock acquired in one task must be released by the same task.  This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it.  If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event signaled by the other task.
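
One way to picture this advice is the userspace sketch below.  It uses POSIX
threads rather than kernel primitives (in the kernel, a completion would
typically play the role of the event), and every name in it (data_lock,
work_done, worker(), run_and_wait()) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state, protected by data_lock. */
static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_data;

/* A simple one-shot event built from a mutex and a condition variable. */
struct event {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	bool done;
};

static struct event work_done = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};

static void event_signal(struct event *e)
{
	pthread_mutex_lock(&e->mutex);
	e->done = true;
	pthread_cond_signal(&e->cond);
	pthread_mutex_unlock(&e->mutex);
}

static void event_wait(struct event *e)
{
	pthread_mutex_lock(&e->mutex);
	while (!e->done)                /* loop guards against spurious wakeups */
		pthread_cond_wait(&e->cond, &e->mutex);
	pthread_mutex_unlock(&e->mutex);
}

/* The worker takes and drops data_lock itself, then signals the event. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&data_lock);
	shared_data = 42;
	pthread_mutex_unlock(&data_lock);   /* released by the task that took it */
	event_signal(&work_done);
	return NULL;
}

/* The caller does not release anything on the worker's behalf; it waits. */
static int run_and_wait(void)
{
	pthread_t thread;
	int value, rc;

	if (pthread_create(&thread, NULL, worker, NULL) != 0)
		return -1;
	event_wait(&work_done);
	rc = pthread_join(thread, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_mutex_lock(&data_lock);
	value = shared_data;
	pthread_mutex_unlock(&data_lock);
	return value;
}
```

Each lock/unlock pair stays inside a single function and a single thread; only
the event crosses between the two.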


Solution
========


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

  preempt_enable()              decrement the preempt counter
  preempt_disable()             increment the preempt counter
  preempt_enable_no_resched()   decrement, but do not immediately preempt
  preempt_check_resched()       if needed, reschedule
  preempt_count()               return the preempt counter

The functions are nestable.  In other words, you can call preempt_disable()
n times in a code path, and preemption will not be reenabled until the n-th
call to preempt_enable().  The preempt statements are defined as no-ops if the
kernel is built without preemption support.
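
The nesting behaviour can be illustrated with a small userspace model.  This is
a sketch under simplifying assumptions, not the kernel's implementation: the
counter is a plain int, model_need_resched and model_reschedules are
hypothetical stand-ins for the scheduler, and nested_demo() exists only to
exercise the functions listed above.

```c
#include <assert.h>
#include <stdbool.h>

static int  model_count;          /* models the preempt counter */
static bool model_need_resched;   /* models a pending reschedule request */
static int  model_reschedules;    /* how many times we "rescheduled" */

/* If needed and allowed, reschedule. */
static void preempt_check_resched(void)
{
	if (model_count == 0 && model_need_resched) {
		model_need_resched = false;
		model_reschedules++;
	}
}

/* Increment the preempt counter. */
static void preempt_disable(void)
{
	model_count++;
}

/* Decrement, but do not immediately preempt. */
static void preempt_enable_no_resched(void)
{
	assert(model_count > 0);      /* every enable needs a prior disable */
	model_count--;
}

/* Decrement the preempt counter and reschedule if it dropped to zero. */
static void preempt_enable(void)
{
	preempt_enable_no_resched();
	preempt_check_resched();
}

/* Return the preempt counter. */
static int preempt_count(void)
{
	return model_count;
}

/* Returns 1 if the reschedule was deferred until the outermost enable. */
static int nested_demo(void)
{
	int deferred;

	preempt_disable();
	preempt_disable();
	model_need_resched = true;    /* e.g. a wakeup arrives here */
	preempt_enable();             /* count 2 -> 1: still not preemptible */
	deferred = (model_reschedules == 0);
	preempt_enable();             /* count 1 -> 0: reschedule happens now */
	return deferred && model_reschedules == 1;
}
```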

Note that you do not need to explicitly prevent preemption if you are holding
any locks or interrupts are disabled, since preemption is implicitly disabled
in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption: any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0.  A simple printk() might trigger a
reschedule.  So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this.  The best policy is to
use this only for small, atomic code that you wrote and which calls no complex
functions.

Example::

        cpucache_t *cc; /* this is per-CPU */
        preempt_disable();
        cc = cc_data(searchp);
        if (cc && cc->avail) {
                __free_block(searchp, cc_entry(cc), cc->avail);
                cc->avail = 0;
        }
        preempt_enable();
        return 0;

Notice how the preemption statements must encompass every reference to the
critical variables.  Another example::

        int buf[NR_CPUS];
        set_cpu_val(buf);
        if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
        spin_lock(&buf_lock);
        /* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock() call up two lines, above set_cpu_val().
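
Applied to the snippet above, the fix might look like the sketch below.  It is
a userspace model so that it can be compiled and run: the spin_lock() and
spin_unlock() stubs only bump a counter standing in for the preempt count,
smp_processor_id() asserts that it is called with preemption disabled,
printf() stands in for printk(), and set_cpu_val() is given a made-up body.
All of these are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 4                 /* illustrative value, not a real config */

static int model_preempt_count;   /* models the preempt counter */
static int model_current_cpu;     /* models the CPU this task runs on */

typedef struct { int locked; } spinlock_t;   /* stand-in lock type */

/* Models spin_lock(): taking a spinlock implicitly disables preemption. */
static void spin_lock(spinlock_t *lock)
{
	model_preempt_count++;
	assert(!lock->locked);
	lock->locked = 1;
}

/* Models spin_unlock(): dropping the lock re-enables preemption. */
static void spin_unlock(spinlock_t *lock)
{
	assert(lock->locked);
	lock->locked = 0;
	model_preempt_count--;
}

/* The CPU number is only meaningful while preemption is disabled. */
static int smp_processor_id(void)
{
	assert(model_preempt_count > 0);
	return model_current_cpu;
}

/* Hypothetical body: mark the current CPU's slot. */
static void set_cpu_val(int *buf)
{
	buf[smp_processor_id()] = -1;
}

static spinlock_t buf_lock;

/* The lock now covers every per-CPU access.  Returns 1 on a match. */
static int check_buf(void)
{
	int buf[NR_CPUS] = { 0 };
	int hit;

	spin_lock(&buf_lock);         /* moved up two lines */
	set_cpu_val(buf);
	hit = (buf[smp_processor_id()] == -1);
	if (hit)
		printf("wee!\n");
	/* ... */
	spin_unlock(&buf_lock);
	return hit;
}
```

With the original ordering, the assertion in smp_processor_id() would fire in
this model, because both per-CPU accesses would run before the lock is taken.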


Preventing preemption using interrupt disabling
===============================================


It is possible to prevent a preemption event using local_irq_disable() and
local_irq_save().  Note, when doing so, you must be very careful not to cause
an event that would set need_resched and result in a preemption check.  When
in doubt, rely on locking or explicit preemption disabling.

Note that in 2.5, interrupt disabling is now only per-CPU (i.e. local).

An additional concern is proper usage of local_irq_disable() and
local_irq_save().  These may be used to protect from preemption.  However, on
exit, if preemption may be enabled, a test to see if preemption is required
should be done.  If these are called from the spin_lock and read/write lock
macros, the right thing is done.  They may also be called within a spin-lock
protected region.  However, if they are ever called outside of this context, a
test for preemption should be made.  Do note that calls from interrupt context
or bottom halves/tasklets are also protected by preemption locks and so may use
the versions which do not check preemption.