===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============


A preemptible kernel creates new locking issues. The issues are the same as
those under SMP: concurrency and reentrancy. Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms. Thus, only a few
additional situations require explicit protection.

This document is for all kernel hackers. Anyone developing code in the kernel
must protect the situations described below.


RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Two similar problems arise. An example code snippet::

    struct this_needs_locking tux[NR_CPUS];
    tux[smp_processor_id()] = some_value;
    /* task is preempted here... */
    something = tux[smp_processor_id()];

First, since the data is per-CPU, it may have no explicit SMP locking, yet it
still needs protection once the kernel is preemptible. Second, when a preempted
task is finally rescheduled, it may be running on a different CPU, so the value
returned by smp_processor_id() may no longer equal the earlier one. You must
protect these situations by disabling preemption around them.

You can also use get_cpu() and put_cpu(): get_cpu() disables preemption and
returns the current CPU number, and put_cpu() re-enables preemption.

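As a rough illustration of the get_cpu()/put_cpu() pattern, here is a minimal
sketch written as a user-space model so that it compiles outside the kernel.
The ``model_`` variables, the stub bodies of get_cpu() and put_cpu(), and the
helper update_and_read_tux() are assumptions made for this example; only the
calling pattern mirrors what kernel code would do.

```c
#include <assert.h>

#define NR_CPUS 4

/*
 * User-space stand-ins (assumptions for illustration only). In the kernel,
 * get_cpu() disables preemption and returns the current CPU number, and
 * put_cpu() re-enables preemption.
 */
static int model_preempt_count;   /* models the preempt counter */
static int model_current_cpu;     /* pretend CPU this task runs on */

static int get_cpu(void)
{
    model_preempt_count++;        /* "preemption disabled" */
    return model_current_cpu;
}

static void put_cpu(void)
{
    model_preempt_count--;        /* "preemption enabled" again */
}

static int tux[NR_CPUS];          /* per-CPU data, one slot per CPU */

/* Hypothetical helper: write and read back this CPU's slot safely. */
static int update_and_read_tux(int some_value)
{
    int cpu = get_cpu();          /* cpu stays valid until put_cpu() */
    int something;

    tux[cpu] = some_value;
    /* no migration can happen here: preemption is off */
    something = tux[cpu];

    put_cpu();
    return something;
}
```

The point of the pattern is that the CPU number is fetched once, inside the
non-preemptible region, and every per-CPU access happens before put_cpu().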

RULE #2: CPU state must be protected
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Under preemption, the state of the CPU must be protected. This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch. For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled. Think what would
happen if the kernel is executing a floating-point instruction and is then
preempted. Remember, the kernel does not save FPU state except for user tasks.
Therefore, upon preemption, the FPU registers will be sold to the lowest
bidder. Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt-safe. For example,
kernel_fpu_begin() disables preemption and kernel_fpu_end() re-enables it.


RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A lock acquired in one task must be released by the same task. This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it. If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event by the other task.

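Rule #3 can be sketched in user space with POSIX threads. This is an analogy,
not kernel code: the struct and function names are hypothetical, and a pthread
mutex plus condition variable stand in for a kernel lock and an event. In
kernel code, the completion API (wait_for_completion() and complete()) serves
the same purpose.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event object: a flag guarded by a mutex and condvar. */
struct model_event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static struct model_event ev = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};
static int shared_result;

/* The other task: takes and drops the lock itself, then is finished. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ev.lock);
    shared_result = 42;
    ev.done = true;               /* the "event" the caller waits for */
    pthread_cond_signal(&ev.cond);
    pthread_mutex_unlock(&ev.lock);
    return NULL;
}

/* The caller: never hands its lock to the worker, it waits on the event. */
static int wait_for_worker(void)
{
    pthread_t t;
    int result;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&ev.lock);
    while (!ev.done)              /* loop guards against spurious wakeups */
        pthread_cond_wait(&ev.cond, &ev.lock);
    result = shared_result;
    pthread_mutex_unlock(&ev.lock);   /* same thread that locked it */

    pthread_join(t, NULL);
    return result;
}
```

Each thread releases only the lock it acquired itself; the hand-off between
the two happens through the event, not through the lock.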

Solution
========


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

    preempt_enable()              decrement the preempt counter
    preempt_disable()             increment the preempt counter
    preempt_enable_no_resched()   decrement, but do not immediately preempt
    preempt_check_resched()       if needed, reschedule
    preempt_count()               return the preempt counter

The functions are nestable. In other words, you can call preempt_disable()
n times in a code path, and preemption will not be re-enabled until the n-th
call to preempt_enable(). The preempt statements are defined to nothing if
the kernel is built without preemption support.

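The nesting behaviour can be modelled in a few lines of plain C. This is a toy
user-space model under stated assumptions, not the kernel implementation: all
``model_`` names are hypothetical, the counter is a plain int, and a
"reschedule" is just a tally. It shows that only the enable call that brings
the counter back to zero acts on a pending reschedule request.

```c
#include <assert.h>
#include <stdbool.h>

static int model_count;            /* models the preempt counter */
static bool model_need_resched;    /* models a pending reschedule request */
static int model_reschedules;      /* how many times we "rescheduled" */

static void model_preempt_disable(void)
{
    model_count++;
}

static void model_preempt_enable(void)
{
    /* Only the outermost enable (counter back to 0) may reschedule. */
    if (--model_count == 0 && model_need_resched) {
        model_need_resched = false;
        model_reschedules++;
    }
}

static bool model_preemptible(void)
{
    return model_count == 0;
}
```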
Note that you do not need to explicitly prevent preemption if you are holding
any locks or if interrupts are disabled, since preemption is implicitly
disabled in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption - any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0. A simple printk() might trigger a
reschedule. So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this. The best policy is to
use this only for small, atomic code that you wrote and which calls no complex
functions.

Example::

    cpucache_t *cc; /* this is per-CPU */
    preempt_disable();
    cc = cc_data(searchp);
    if (cc && cc->avail) {
        __free_block(searchp, cc_entry(cc), cc->avail);
        cc->avail = 0;
    }
    preempt_enable();
    return 0;

Notice how the preemption statements must encompass every reference to the
critical variables. Another example::

    int buf[NR_CPUS];
    set_cpu_val(buf);
    if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
    spin_lock(&buf_lock);
    /* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.

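The fix described above can be sketched as a user-space model. Everything
here apart from the calling order is an assumption for illustration: the
spinlock_t type, spin_lock(), spin_unlock(), smp_processor_id() and
set_cpu_val() are simplified stand-ins, printf() stands in for printk(), and
fixed_path() is a hypothetical wrapper. The stub smp_processor_id() asserts
that preemption is "disabled", loosely imitating the kernel's debug check for
smp_processor_id() in preemptible code.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 4

/* Simplified stand-ins; the real kernel types and functions differ. */
typedef struct { int locked; } spinlock_t;

static int model_preempt_count;    /* models the preempt counter */
static int model_cpu = 1;          /* pretend CPU this task runs on */
static spinlock_t buf_lock;

static void spin_lock(spinlock_t *lock)
{
    model_preempt_count++;         /* taking a spinlock disables preemption */
    lock->locked = 1;
}

static void spin_unlock(spinlock_t *lock)
{
    lock->locked = 0;
    model_preempt_count--;
}

static int smp_processor_id(void)
{
    assert(model_preempt_count > 0);   /* unsafe if preemptible */
    return model_cpu;
}

/* Stand-in for the example's set_cpu_val(): mark this CPU's slot. */
static void set_cpu_val(int *buf)
{
    buf[smp_processor_id()] = -1;
}

/* The fixed ordering: spin_lock() moved up two lines. */
static int fixed_path(void)
{
    int buf[NR_CPUS] = { 0 };
    int hit;

    spin_lock(&buf_lock);
    set_cpu_val(buf);
    hit = (buf[smp_processor_id()] == -1);
    if (hit)
        printf("wee!\n");          /* stand-in for printk(KERN_INFO ...) */
    /* ... */
    spin_unlock(&buf_lock);
    return hit;
}
```

With the original ordering, the first smp_processor_id() call would run with
the counter at zero and the model's assertion would fire.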

Preventing preemption using interrupt disabling
===============================================


It is possible to prevent a preemption event using local_irq_disable and
local_irq_save. Note, when doing so, you must be very careful to not cause
an event that would set need_resched and result in a preemption check. When
in doubt, rely on locking or explicit preemption disabling.

Note in 2.5 interrupt disabling is now only per-CPU (i.e. local).

An additional concern is proper usage of local_irq_disable and local_irq_save.
These may be used to protect from preemption; however, on exit, if preemption
may be enabled, a test to see if preemption is required should be done. If
these are called from the spin_lock and read/write lock macros, the right thing
is done. They may also be called within a spin-lock protected region; however,
if they are ever called outside of this context, a test for preemption should
be made. Do note that calls from interrupt context or bottom halves/tasklets
are also protected by preemption locks and so may use the versions which do
not check preemption.