======================================
Sequence counters and sequential locks
======================================

Introduction
============

Sequence counters are a reader-writer consistency mechanism with
lockless readers (read-only retry loops), and no writer starvation. They
are used for data that's rarely written to (e.g. system time), where the
reader wants a consistent set of information and is willing to retry if
that information changes.

A data set is consistent when the sequence count at the beginning of the
read side critical section is even and the same sequence count value is
read again at the end of the critical section. The data in the set must
be copied out inside the read side critical section. If the sequence
count has changed between the start and the end of the critical section,
the reader must retry.

Writers increment the sequence count at the start and the end of their
critical section. After starting the critical section the sequence count
is odd and indicates to the readers that an update is in progress. At
the end of the write side critical section the sequence count becomes
even again, which lets readers make progress.

A sequence counter write side critical section must never be preempted
or interrupted by read side sections. Otherwise the reader will spin for
the entire scheduler tick due to the odd sequence count value and the
interrupted writer. If that reader belongs to a real-time scheduling
class, it can spin forever and the kernel will livelock.

This mechanism cannot be used if the protected data contains pointers,
as the writer can invalidate a pointer that the reader is following.
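
As an illustration of the pointer hazard, below is a hypothetical
sketch; ``struct foo``, ``foo_ptr`` and ``foo_seqcount`` are made-up
names, not kernel APIs. Even though the reader would eventually retry,
it may dereference freed memory before the retry check is reached::

        struct foo { int a, b; };

        static struct foo *foo_ptr;     /* pointer is part of the protected data */
        static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);

        /* Reader: BROKEN -- do not do this */
        unsigned int seq;
        int val;

        do {
                seq = read_seqcount_begin(&foo_seqcount);
                /*
                 * A concurrent writer may kfree() the object foo_ptr
                 * points at and publish a new one; the dereference
                 * below can then hit freed memory before the retry
                 * check below ever runs.
                 */
                val = READ_ONCE(foo_ptr)->a;
        } while (read_seqcount_retry(&foo_seqcount, seq));

The sequence counter only guarantees that the reader notices the change
and retries; it cannot make the dereference itself safe. The protected
data should instead consist of plain values which the reader copies out
inside the critical section.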


.. _seqcount_t:

Sequence counters (``seqcount_t``)
==================================

This is the raw counting mechanism, which does not protect against
multiple writers. Write side critical sections must thus be serialized
by an external lock.

If the write serialization primitive does not implicitly disable
preemption, preemption must be explicitly disabled before entering the
write side section. If the read section can be invoked from hardirq or
softirq contexts, interrupts or bottom halves must also be disabled,
respectively, before entering the write side section.
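
For example, if the writer is serialized by a mutex (which does not
disable preemption) and the readers run only in preemptible context, the
write section could be wrapped as in the following sketch; ``foo_mutex``
and ``foo_seqcount`` are illustrative names::

        static DEFINE_MUTEX(foo_mutex);
        static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);

        /* Writer */
        mutex_lock(&foo_mutex);         /* writer serialization */
        preempt_disable();              /* the mutex does not disable preemption */

        write_seqcount_begin(&foo_seqcount);

        /* ... [[write-side critical section]] ... */

        write_seqcount_end(&foo_seqcount);

        preempt_enable();
        mutex_unlock(&foo_mutex);

If readers can also be invoked from hardirq or softirq context,
interrupts or bottom halves would have to be disabled here as well, as
described above.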

If it's desired to automatically handle the sequence counter
requirements of writer serialization and non-preemptibility, use
:ref:`seqlock_t` instead.

Initialization::

        /* dynamic */
        seqcount_t foo_seqcount;
        seqcount_init(&foo_seqcount);

        /* static */
        static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);

        /* C99 struct init */
        struct {
                .seq = SEQCNT_ZERO(foo.seq),
        } foo;

Write path::

        /* Serialized context with disabled preemption */

        write_seqcount_begin(&foo_seqcount);

        /* ... [[write-side critical section]] ... */

        write_seqcount_end(&foo_seqcount);

Read path::

        do {
                seq = read_seqcount_begin(&foo_seqcount);

                /* ... [[read-side critical section]] ... */

        } while (read_seqcount_retry(&foo_seqcount, seq));
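
Putting the above together, here is a hedged sketch of a complete usage
pattern: a hypothetical two-field data set whose writers are serialized
by a spinlock (which implicitly disables preemption), and whose readers
copy the values out inside the retry loop. All ``foo_*`` names are
illustrative::

        struct foo_time {
                u64 sec;
                u32 nsec;
        };

        static struct foo_time foo_time;
        static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
        static DEFINE_SPINLOCK(foo_lock);

        static void foo_set_time(u64 sec, u32 nsec)
        {
                spin_lock(&foo_lock);   /* serializes writers, disables preemption */
                write_seqcount_begin(&foo_seqcount);
                foo_time.sec  = sec;
                foo_time.nsec = nsec;
                write_seqcount_end(&foo_seqcount);
                spin_unlock(&foo_lock);
        }

        static struct foo_time foo_get_time(void)
        {
                struct foo_time snapshot;
                unsigned int seq;

                do {
                        seq = read_seqcount_begin(&foo_seqcount);
                        snapshot = foo_time;    /* copy the data out */
                } while (read_seqcount_retry(&foo_seqcount, seq));

                return snapshot;
        }

With a spinlock as the serializing lock, the ``seqcount_spinlock_t``
variant described next would normally be preferable, since it lets
lockdep verify that the write side serialization is actually in place.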


.. _seqcount_locktype_t:

Sequence counters with associated locks (``seqcount_LOCKNAME_t``)
------------------------------------------------------------------

As discussed at :ref:`seqcount_t`, sequence count write side critical
sections must be serialized and non-preemptible. This variant of
sequence counters associates the lock used for writer serialization at
initialization time, which enables lockdep to validate that the write
side critical sections are properly serialized.

This lock association is a NOOP if lockdep is disabled and has neither
storage nor runtime overhead. If lockdep is enabled, the lock pointer is
stored in struct seqcount and lockdep's "lock is held" assertions are
injected at the beginning of the write side critical section to validate
that it is properly protected.

For lock types which do not implicitly disable preemption, preemption
protection is enforced in the write side function.

The following sequence counters with associated locks are defined:

- ``seqcount_spinlock_t``
- ``seqcount_raw_spinlock_t``
- ``seqcount_rwlock_t``
- ``seqcount_mutex_t``
- ``seqcount_ww_mutex_t``

The sequence counter read and write APIs can take either a plain
seqcount_t or any of the seqcount_LOCKNAME_t variants above.

Initialization (replace "LOCKNAME" with one of the supported locks)::

        /* dynamic */
        seqcount_LOCKNAME_t foo_seqcount;
        seqcount_LOCKNAME_init(&foo_seqcount, &lock);

        /* static */
        static seqcount_LOCKNAME_t foo_seqcount =
                SEQCNT_LOCKNAME_ZERO(foo_seqcount, &lock);

        /* C99 struct init */
        struct {
                .seq = SEQCNT_LOCKNAME_ZERO(foo.seq, &lock),
        } foo;

Write path: same as in :ref:`seqcount_t`, while running from a context
with the associated write serialization lock acquired.

Read path: same as in :ref:`seqcount_t`.
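
For illustration, a sketch using the ``seqcount_spinlock_t`` variant
follows; ``foo_lock`` and ``foo_seqcount`` are illustrative names. The
associated spinlock provides both writer serialization and preemption
protection, and lockdep can verify that it is held in the write path::

        static DEFINE_SPINLOCK(foo_lock);
        static seqcount_spinlock_t foo_seqcount =
                SEQCNT_SPINLOCK_ZERO(foo_seqcount, &foo_lock);

        /* Write path */
        spin_lock(&foo_lock);
        write_seqcount_begin(&foo_seqcount);

        /* ... [[write-side critical section]] ... */

        write_seqcount_end(&foo_seqcount);
        spin_unlock(&foo_lock);

        /* Read path: identical to the plain seqcount_t retry loop */
        do {
                seq = read_seqcount_begin(&foo_seqcount);

                /* ... [[read-side critical section]] ... */

        } while (read_seqcount_retry(&foo_seqcount, seq));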


.. _seqcount_latch_t:

Latch sequence counters (``seqcount_latch_t``)
----------------------------------------------

Latch sequence counters are a multiversion concurrency control mechanism
where the embedded seqcount_t counter even/odd value is used to switch
between two copies of protected data. This allows the sequence counter
read path to safely interrupt its own write side critical section.

Use seqcount_latch_t when the write side sections cannot be protected
from interruption by readers. This is typically the case when the read
side can be invoked from NMI handlers.

Check `raw_write_seqcount_latch()` for more information.
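
To sketch the idea, the two-copy pattern looks roughly like the
following. The ``foo_*`` names are made up, and the exact name of the
read side retry helper varies across kernel versions (e.g.
``read_seqcount_latch_retry()`` or ``raw_read_seqcount_latch_retry()``),
so treat this as an assumption-laden outline rather than a reference::

        struct foo_data { u64 a, b; };

        static struct {
                seqcount_latch_t latch;
                struct foo_data  data[2];       /* two copies of the data */
        } foo;

        /* Writer: each latch flips readers to the copy not being updated */
        static void foo_update(const struct foo_data *val)
        {
                raw_write_seqcount_latch(&foo.latch);   /* readers now use data[1] */
                foo.data[0] = *val;
                raw_write_seqcount_latch(&foo.latch);   /* readers now use data[0] */
                foo.data[1] = *val;
        }

        /* Reader: safe even if it interrupts foo_update(), e.g. from NMI */
        static struct foo_data foo_read(void)
        {
                struct foo_data copy;
                unsigned int seq;

                do {
                        seq = raw_read_seqcount_latch(&foo.latch);
                        copy = foo.data[seq & 1];       /* even/odd selects the copy */
                } while (raw_read_seqcount_latch_retry(&foo.latch, seq));

                return copy;
        }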


.. _seqlock_t:

Sequential locks (``seqlock_t``)
================================

This contains the :ref:`seqcount_t` mechanism discussed earlier, plus an
embedded spinlock for writer serialization and non-preemptibility.

If the read side section can be invoked from hardirq or softirq context,
use the write side function variants which disable interrupts or bottom
halves, respectively.

Initialization::

        /* dynamic */
        seqlock_t foo_seqlock;
        seqlock_init(&foo_seqlock);

        /* static */
        static DEFINE_SEQLOCK(foo_seqlock);

        /* C99 struct init */
        struct {
                .seql = __SEQLOCK_UNLOCKED(foo.seql)
        } foo;

Write path::

        write_seqlock(&foo_seqlock);

        /* ... [[write-side critical section]] ... */

        write_sequnlock(&foo_seqlock);
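
As noted above, if the read side can also run in hardirq or softirq
context, the interrupt or bottom half disabling write variants must be
used instead. A brief sketch, with ``foo_seqlock`` as before::

        /* readers may run in hardirq context */
        unsigned long flags;

        write_seqlock_irqsave(&foo_seqlock, flags);

        /* ... [[write-side critical section]] ... */

        write_sequnlock_irqrestore(&foo_seqlock, flags);

        /* readers may run in softirq (bottom half) context */
        write_seqlock_bh(&foo_seqlock);

        /* ... [[write-side critical section]] ... */

        write_sequnlock_bh(&foo_seqlock);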

Read path, three categories:

1. Normal sequence readers which never block a writer, but must retry
   if a writer is in progress. This is detected through a change in the
   sequence number. Writers do not wait for a sequence reader::

        do {
                seq = read_seqbegin(&foo_seqlock);

                /* ... [[read-side critical section]] ... */

        } while (read_seqretry(&foo_seqlock, seq));

2. Locking readers which will wait if a writer or another locking reader
   is in progress. A locking reader in progress will also block a writer
   from entering its critical section. This read lock is
   exclusive. Unlike rwlock_t, only one locking reader can acquire it::

        read_seqlock_excl(&foo_seqlock);

        /* ... [[read-side critical section]] ... */

        read_sequnlock_excl(&foo_seqlock);

3. Conditional lockless reader (as in 1), or locking reader (as in 2),
   according to a passed marker. This is used to avoid lockless reader
   starvation (too many retry loops) in case of a sharp spike in write
   activity. First, a lockless read is tried (even marker passed). If
   that trial fails (an odd sequence counter is returned, which is used
   as the next iteration marker), the lockless read is transformed into
   a full locking read and no retry loop is necessary::

        /* marker; even initialization */
        int seq = 0;
        do {
                read_seqbegin_or_lock(&foo_seqlock, &seq);

                /* ... [[read-side critical section]] ... */

        } while (need_seqretry(&foo_seqlock, seq));
        done_seqretry(&foo_seqlock, seq);


API documentation
=================

.. kernel-doc:: include/linux/seqlock.h