On atomic types (atomic_t, atomic64_t and atomic_long_t).

The atomic type provides an interface to the architecture's means of atomic
RMW operations between CPUs (atomic operations on MMIO are not supported and
can lead to fatal traps on some platforms).

API
---

The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
brevity):

Non-RMW ops:

  atomic_read(), atomic_set()
  atomic_read_acquire(), atomic_set_release()


RMW atomic operations:

 Arithmetic:

  atomic_{add,sub,inc,dec}()
  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()


 Bitwise:

  atomic_{and,or,xor,andnot}()
  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()


 Swap:

  atomic_xchg{,_relaxed,_acquire,_release}()
  atomic_cmpxchg{,_relaxed,_acquire,_release}()
  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()


 Reference count (but please see refcount_t; a usage sketch follows this list):

  atomic_add_unless(), atomic_inc_not_zero()
  atomic_sub_and_test(), atomic_dec_and_test()


 Misc:

  atomic_inc_and_test(), atomic_add_negative()
  atomic_dec_unless_positive(), atomic_inc_unless_negative()


 Barriers:

  smp_mb__{before,after}_atomic()

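As a rough illustration of the reference count helpers listed above, a
get/put pair might look like the sketch below. struct obj, obj_tryget() and
obj_put() are made up for this example; new code should normally use
refcount_t instead, which also detects over- and underflow:

  struct obj {
    atomic_t refs;
    /* ... payload ... */
  };

  /* Acquire a reference, but only if the object is still live (refs != 0). */
  static bool obj_tryget(struct obj *o)
  {
    return atomic_inc_not_zero(&o->refs);
  }

  /* Release a reference; whoever drops the count to zero frees the object. */
  static void obj_put(struct obj *o)
  {
    if (atomic_dec_and_test(&o->refs))
      kfree(o);
  }
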

TYPES (signed vs unsigned)
-----

While atomic_t, atomic_long_t and atomic64_t use int, long and s64
respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
(which implies -fwrapv) and defines signed overflow to behave like
2s-complement.

Therefore, an explicitly unsigned variant of the atomic ops is strictly
unnecessary and we can simply cast; there is no UB.
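
Building an unsigned wrapping counter on top of atomic_t therefore only takes
a cast. The wrappers below (counter_read() and counter_add_return()) are
hypothetical, purely to illustrate the point:

  static inline u32 counter_read(atomic_t *v)
  {
    return (u32)atomic_read(v);
  }

  static inline u32 counter_add_return(u32 i, atomic_t *v)
  {
    /* wraps at 2^32, since the signed op wraps in 2s-complement */
    return (u32)atomic_add_return((int)i, v);
  }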

There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
signed types.

With this we also conform to the C/C++ _Atomic behaviour and things like
P1236R1.


SEMANTICS
---------

Non-RMW ops:

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
smp_store_release() respectively. Therefore, if you find yourself only using
the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.
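
Roughly speaking (the real definitions are generated and instrumented, see
include/linux/atomic/), the generic non-RMW ops boil down to:

  static inline int atomic_read(const atomic_t *v)
  {
    return READ_ONCE(v->counter);
  }

  static inline void atomic_set(atomic_t *v, int i)
  {
    WRITE_ONCE(v->counter, i);
  }

  static inline int atomic_read_acquire(const atomic_t *v)
  {
    return smp_load_acquire(&v->counter);
  }

  static inline void atomic_set_release(atomic_t *v, int i)
  {
    smp_store_release(&v->counter, i);
  }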

A note for the implementation of atomic_set{}() is that it must not break the
atomicity of the RMW ops. That is:

  C Atomic-RMW-ops-are-atomic-WRT-atomic_set

  {
    atomic_t v = ATOMIC_INIT(1);
  }

  P0(atomic_t *v)
  {
    (void)atomic_add_unless(v, 1, 0);
  }

  P1(atomic_t *v)
  {
    atomic_set(v, 0);
  }

  exists
  (v=2)

In this case we would expect the atomic_set() from CPU1 to either happen
before the atomic_add_unless(), in which case the latter would no-op, or
_after_, in which case we'd overwrite its result. In no case is "2" a valid
outcome.

This is typically true on 'normal' platforms, where a regular competing STORE
will invalidate a LL/SC or fail a CMPXCHG.

The obvious case where this is not so is when we need to implement atomic ops
with a lock:

  CPU0                                          CPU1

  atomic_add_unless(v, 1, 0);
    lock();
    ret = READ_ONCE(v->counter); // == 1
                                                atomic_set(v, 0);
    if (ret != u)                               WRITE_ONCE(v->counter, 0);
      WRITE_ONCE(v->counter, ret + 1);
    unlock();

the typical solution is to then implement atomic_set{}() with atomic_xchg().
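
A minimal sketch of that solution, assuming the architecture's atomic_xchg()
takes the same lock (or otherwise participates in the same serialisation) as
the other RMW ops:

  static inline void atomic_set(atomic_t *v, int i)
  {
    /*
     * The store now goes through the same serialised RMW path, so a
     * concurrent atomic_add_unless() either observes the new value or
     * completely overwrites this store; the "v=2" outcome above cannot
     * happen.
     */
    (void)atomic_xchg(v, i);
  }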


RMW ops:

These come in various forms:

 - plain operations without return value: atomic_{}()

 - operations which return the modified value: atomic_{}_return()

   these are limited to the arithmetic operations because those are
   reversible. Bitops are irreversible and therefore the modified value
   is of dubious utility.

 - operations which return the original value: atomic_fetch_{}()

 - swap operations: xchg(), cmpxchg() and try_cmpxchg()

 - misc; the special purpose operations that are commonly used and would,
   given the interface, normally be implemented using (try_)cmpxchg loops but
   are time critical and can, (typically) on LL/SC architectures, be more
   efficiently implemented.

All these operations are SMP atomic; that is, the operations (for a single
atomic variable) can be fully ordered and no intermediate state is lost or
visible.
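
To make the difference between the forms concrete (the values are only an
example):

  atomic_t v = ATOMIC_INIT(4);

  atomic_add(2, &v);                     /* no return value; v == 6       */
  old = atomic_fetch_add(2, &v);         /* old == 6 (original), v == 8   */
  new = atomic_add_return(2, &v);        /* new == 10 (modified), v == 10 */

  /* in general: atomic_add_return(i, v) == atomic_fetch_add(i, v) + i */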


ORDERING (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.

Except of course when an operation has an explicit ordering like:

 {}_relaxed: unordered
 {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
 {}_release: the W of the RMW (or atomic_set) is a RELEASE

Where 'unordered' is against other memory locations. Address dependencies are
not defeated.

Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.
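
Applied to a few concrete calls (a non-exhaustive sketch):

  atomic_inc(&v);                       /* RMW, no return value: unordered */
  r = atomic_inc_return(&v);            /* RMW with return value: fully ordered */
  r = atomic_inc_return_relaxed(&v);    /* explicitly unordered */
  r = atomic_fetch_add_acquire(1, &v);  /* the load half is an ACQUIRE */
  r = atomic_cmpxchg(&v, 1, 2);         /* fully ordered if it succeeds,
                                           unordered (but still atomic) if it fails */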


The barriers:

  smp_mb__{before,after}_atomic()

only apply to the RMW atomic ops and can be used to augment/upgrade the
ordering inherent to the op. These barriers act almost like a full smp_mb():
smp_mb__before_atomic() orders all earlier accesses against the RMW op
itself and all accesses following it, and smp_mb__after_atomic() orders all
later accesses against the RMW op and all accesses preceding it. However,
accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
ordered, so it is advisable to place the barrier right next to the RMW atomic
op whenever possible.
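
For example (obj is a made-up object; the second form shows the anti-pattern
to avoid):

  /* Good: the barrier sits right next to the RMW op it upgrades, so the
   * store to obj->done is ordered against the atomic_dec() and everything
   * after it. */
  WRITE_ONCE(obj->done, 1);
  smp_mb__before_atomic();
  atomic_dec(&obj->refs);

  /* Bad: the store to obj->done sits between the barrier and the RMW op
   * and is therefore not ordered against the atomic_dec(). */
  smp_mb__before_atomic();
  WRITE_ONCE(obj->done, 1);
  atomic_dec(&obj->refs);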

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example our TSO architectures
provide fully ordered atomics and these barriers are no-ops.

NOTE: when the atomic RMW ops are fully ordered, they should also imply a
compiler barrier.

Thus:

  atomic_fetch_add();

is equivalent to:

  smp_mb__before_atomic();
  atomic_fetch_add_relaxed();
  smp_mb__after_atomic();

However the atomic_fetch_add() might be implemented more efficiently.

Further, while something like:

  smp_mb__before_atomic();
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
a RELEASE because it orders preceding instructions against both the read
and write parts of the atomic_dec(), and against all following instructions
as well. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();

is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

  {
  }

  P0(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

  P1(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
    WRITE_ONCE(*x, 1);
  }

  exists
  (0:r0=1 /\ 0:r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE. Thus:

  P0                    P1

                        t = LL.acq *y (0)
                        t++;
                        *x = 1;
  r0 = *x (1)
  RMB
  r1 = *y (0)
                        SC *y, t;

is allowed.


CMPXCHG vs TRY_CMPXCHG
----------------------

  int atomic_cmpxchg(atomic_t *ptr, int old, int new);
  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new);

Both provide the same functionality, but try_cmpxchg() can lead to more
compact code. The functions relate like:

  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new)
  {
    int ret, old = *oldp;
    ret = atomic_cmpxchg(ptr, old, new);
    if (ret != old)
      *oldp = ret;
    return ret == old;
  }

and:

  int atomic_cmpxchg(atomic_t *ptr, int old, int new)
  {
    (void)atomic_try_cmpxchg(ptr, &old, new);
    return old;
  }

Usage:

  old = atomic_read(&v);                        old = atomic_read(&v);
  for (;;) {                                    do {
    new = func(old);                              new = func(old);
    tmp = atomic_cmpxchg(&v, old, new);         } while (!atomic_try_cmpxchg(&v, &old, new));
    if (tmp == old)
      break;
    old = tmp;
  }

NB. try_cmpxchg() also generates better code on some platforms (notably x86)
where the function more closely matches the hardware instruction.
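
As a concrete instance of the try_cmpxchg() style, something like
atomic_inc_not_zero() could be open-coded as below (a sketch only; use the
real helper in actual code):

  static bool my_inc_not_zero(atomic_t *v)
  {
    int old = atomic_read(v);

    do {
      if (!old)                 /* already zero: refuse to increment */
        return false;
    } while (!atomic_try_cmpxchg(v, &old, old + 1));

    return true;
  }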


FORWARD PROGRESS
----------------

In general strong forward progress is expected of all unconditional atomic
operations -- those in the Arithmetic and Bitwise classes and xchg(). However
a fair amount of code also requires forward progress from the conditional
atomic operations.

Specifically 'simple' cmpxchg() loops are expected to not starve one another
indefinitely. However, this is not evident on LL/SC architectures, because
while an LL/SC architecture 'can/should/must' provide forward progress
guarantees between competing LL/SC sections, such a guarantee does not
transfer to cmpxchg() implemented using LL/SC. Consider:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!atomic_try_cmpxchg(&v, &old, new));

which on LL/SC becomes something like:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!({
    asm volatile ("1: LL  %[oldval], %[v]\n"
                  "   CMP %[oldval], %[old]\n"
                  "   BNE 2f\n"
                  "   SC  %[new], %[v]\n"
                  "   BNE 1b\n"
                  "2:\n"
                  : [oldval] "=&r" (oldval), [v] "m" (v)
                  : [old] "r" (old), [new] "r" (new)
                  : "memory");
    success = (oldval == old);
    if (!success)
      old = oldval;
    success; }));

However, even the forward branch from the failed compare can cause the LL/SC
to fail on some architectures, let alone whatever the compiler makes of the C
loop body. As a result there is no guarantee whatsoever that the cacheline
containing @v will stay on the local CPU and that progress is made.

Even native CAS architectures can fail to provide forward progress for their
primitive (see Sparc64 for an example).

Such implementations are strongly encouraged to add exponential backoff loops
to a failed CAS in order to ensure some progress. Affected architectures are
also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
their locking primitives.
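
The idea of exponential backoff, sketched in C for illustration only (the
real thing belongs inside the architecture's cmpxchg/CAS primitive, and the
delay loop and cap below are arbitrary):

  static void add_with_backoff(atomic_t *v, int i)
  {
    unsigned int spin, backoff = 1;
    int old = atomic_read(v);

    while (!atomic_try_cmpxchg(v, &old, old + i)) {
      /* back off before retrying, doubling the delay up to a cap */
      for (spin = 0; spin < backoff; spin++)
        cpu_relax();
      if (backoff < 1024)
        backoff <<= 1;
    }
  }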