On atomic types (atomic_t, atomic64_t and atomic_long_t).

The atomic type provides an interface to the architecture's means of atomic
RMW operations between CPUs (atomic operations on MMIO are not supported and
can lead to fatal traps on some platforms).

API
---

The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
brevity):

Non-RMW ops:

  atomic_read(), atomic_set()
  atomic_read_acquire(), atomic_set_release()


RMW atomic operations:

Arithmetic:

  atomic_{add,sub,inc,dec}()
  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()


Bitwise:

  atomic_{and,or,xor,andnot}()
  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()


Swap:

  atomic_xchg{,_relaxed,_acquire,_release}()
  atomic_cmpxchg{,_relaxed,_acquire,_release}()
  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()


Reference count (but please see refcount_t):

  atomic_add_unless(), atomic_inc_not_zero()
  atomic_sub_and_test(), atomic_dec_and_test()


Misc:

  atomic_inc_and_test(), atomic_add_negative()
  atomic_dec_unless_positive(), atomic_inc_unless_negative()


Barriers:

  smp_mb__{before,after}_atomic()
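
As a quick, hedged illustration of how these calls fit together, a minimal
reference-count-style user might look like the sketch below (struct foo and
its helpers are made up for this example; real reference counts should
normally use refcount_t):

  struct foo {
    atomic_t refs;
  };

  static void foo_init(struct foo *f)
  {
    atomic_set(&f->refs, 1);              /* non-RMW: plain initialisation */
  }

  static void foo_get(struct foo *f)
  {
    atomic_inc(&f->refs);                 /* unconditional RMW, no return value */
  }

  static bool foo_put(struct foo *f)
  {
    return atomic_dec_and_test(&f->refs); /* conditional RMW, true when 0 is hit */
  }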


TYPES (signed vs unsigned)
-----

While atomic_t, atomic_long_t and atomic64_t use int, long and s64
respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
(which implies -fwrapv) and defines signed overflow to behave like
2s-complement.

Therefore, an explicitly unsigned variant of the atomic ops is strictly
unnecessary and we can simply cast; there is no UB.

There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
signed types.

With this we also conform to the C/C++ _Atomic behaviour and things like
P1236R1.
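
For callers that want unsigned semantics, a cast at the boundary suffices; a
hedged sketch (the helper name is made up):

  static inline unsigned int unsigned_atomic_read(const atomic_t *v)
  {
    /* wrap-around is well defined because of -fno-strict-overflow/-fwrapv */
    return (unsigned int)atomic_read(v);
  }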


SEMANTICS
---------

Non-RMW ops:

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
smp_store_release() respectively. Therefore, if you find yourself only using
the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.
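
Roughly, the generic fallbacks implement them like the sketch below (the
in-tree definitions add instrumentation on top of this):

  static __always_inline int atomic_read(const atomic_t *v)
  {
    return READ_ONCE(v->counter);          /* plain load                  */
  }

  static __always_inline int atomic_read_acquire(const atomic_t *v)
  {
    return smp_load_acquire(&v->counter);  /* load with ACQUIRE ordering  */
  }

  static __always_inline void atomic_set_release(atomic_t *v, int i)
  {
    smp_store_release(&v->counter, i);     /* store with RELEASE ordering */
  }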

A note for the implementation of atomic_set{}() is that it must not break the
atomicity of the RMW ops. That is:

  C Atomic-RMW-ops-are-atomic-WRT-atomic_set

  {
    atomic_t v = ATOMIC_INIT(1);
  }

  P0(atomic_t *v)
  {
    (void)atomic_add_unless(v, 1, 0);
  }

  P1(atomic_t *v)
  {
    atomic_set(v, 0);
  }

  exists
  (v=2)

In this case we would expect the atomic_set() from CPU1 to either happen
before the atomic_add_unless(), in which case that latter one would no-op, or
_after_, in which case we'd overwrite its result. In no case is "2" a valid
outcome.

This is typically true on 'normal' platforms, where a regular competing STORE
will invalidate an LL/SC or fail a CMPXCHG.

The obvious case where this is not so is when we need to implement atomic ops
with a lock:

  CPU0                                          CPU1

  atomic_add_unless(v, 1, 0);
    lock();
    ret = READ_ONCE(v->counter); // == 1
                                                atomic_set(v, 0);
    if (ret != u)                                 WRITE_ONCE(v->counter, 0);
      WRITE_ONCE(v->counter, ret + 1);
    unlock();

The typical solution is then to implement atomic_set{}() with atomic_xchg().
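
A hedged sketch of that solution, for an architecture whose atomic RMW ops
are built on a lock:

  static inline void atomic_set(atomic_t *v, int i)
  {
    /* go through the same (locked) RMW path; the old value is discarded */
    (void)atomic_xchg(v, i);
  }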


RMW ops:

These come in various forms:

 - plain operations without a return value: atomic_{}()

 - operations which return the modified value: atomic_{}_return()

   these are limited to the arithmetic operations because those are
   reversible. Bitops are irreversible and therefore the modified value
   is of dubious utility.

 - operations which return the original value: atomic_fetch_{}()

 - swap operations: xchg(), cmpxchg() and try_cmpxchg()

 - misc; the special purpose operations that are commonly used and would,
   given the interface, normally be implemented using (try_)cmpxchg loops,
   but are time critical and can, typically on LL/SC architectures, be
   implemented more efficiently.

All these operations are SMP atomic; that is, the operations (for a single
atomic variable) can be fully ordered and no intermediate state is lost or
visible.
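
To make the return/fetch naming concrete, a small hedged example:

  atomic_t v = ATOMIC_INIT(4);

  atomic_add(2, &v);                  /* v == 6, nothing returned           */
  int new = atomic_add_return(2, &v); /* v == 8, new == 8 (modified value)  */
  int old = atomic_fetch_add(2, &v);  /* v == 10, old == 8 (original value) */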


ORDERING  (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.

Except of course when an operation has an explicit ordering like:

 {}_relaxed: unordered
 {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
 {}_release: the W of the RMW (or atomic_set)  is a  RELEASE

Where 'unordered' is against other memory locations. Address dependencies are
not defeated.

Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.
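
As a hedged illustration of the _acquire/_release suffixes, a simple flag
hand-off (x, flag and r are made-up variables):

  /* publisher */
  WRITE_ONCE(x, 1);
  atomic_set_release(&flag, 1);    /* RELEASE: the store to x cannot pass it */

  /* observer */
  if (atomic_read_acquire(&flag))  /* ACQUIRE: the load of x stays after it */
    r = READ_ONCE(x);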


The barriers:

  smp_mb__{before,after}_atomic()

only apply to the RMW atomic ops and can be used to augment/upgrade the
ordering inherent to the op. These barriers act almost like a full smp_mb():
smp_mb__before_atomic() orders all earlier accesses against the RMW op
itself and all accesses following it, and smp_mb__after_atomic() orders all
later accesses against the RMW op and all accesses preceding it. However,
accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
ordered, so it is advisable to place the barrier right next to the RMW atomic
op whenever possible.
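
For example (a hedged sketch of the usual pattern), the barrier sits
immediately next to the op it upgrades:

  obj->dead = 1;
  smp_mb__before_atomic();      /* upgrades the otherwise unordered atomic_dec() */
  atomic_dec(&obj->ref_count);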

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example, our TSO architectures
provide fully ordered atomics and these barriers are no-ops.

NOTE: when the atomic RMW ops are fully ordered, they should also imply a
compiler barrier.

Thus:

  atomic_fetch_add();

is equivalent to:

  smp_mb__before_atomic();
  atomic_fetch_add_relaxed();
  smp_mb__after_atomic();

However, the atomic_fetch_add() might be implemented more efficiently.

Further, while something like:

  smp_mb__before_atomic();
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
a RELEASE because it orders preceding instructions against both the read
and write parts of the atomic_dec(), and against all following instructions
as well. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();

is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

  {
  }

  P0(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

  P1(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
    WRITE_ONCE(*x, 1);
  }

  exists
  (0:r0=1 /\ 0:r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE.  Thus:

  P0                    P1

                        t = LL.acq *y (0)
                        t++;
                        *x = 1;
  r0 = *x (1)
  RMB
  r1 = *y (0)
                        SC *y, t;

is allowed.


CMPXCHG vs TRY_CMPXCHG
----------------------

  int atomic_cmpxchg(atomic_t *ptr, int old, int new);
  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new);

Both provide the same functionality, but try_cmpxchg() can lead to more
compact code. The functions relate like:

  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new)
  {
    int ret, old = *oldp;
    ret = atomic_cmpxchg(ptr, old, new);
    if (ret != old)
      *oldp = ret;
    return ret == old;
  }

and:

  int atomic_cmpxchg(atomic_t *ptr, int old, int new)
  {
    (void)atomic_try_cmpxchg(ptr, &old, new);
    return old;
  }

Usage:

  old = atomic_read(&v);                        old = atomic_read(&v);
  for (;;) {                                    do {
    new = func(old);                              new = func(old);
    tmp = atomic_cmpxchg(&v, old, new);         } while (!atomic_try_cmpxchg(&v, &old, new));
    if (tmp == old)
      break;
    old = tmp;
  }

NB. try_cmpxchg() also generates better code on some platforms (notably x86)
where the function more closely matches the hardware instruction.


FORWARD PROGRESS
----------------

In general, strong forward progress is expected of all unconditional atomic
operations -- those in the Arithmetic and Bitwise classes and xchg(). However,
a fair amount of code also requires forward progress from the conditional
atomic operations.

Specifically 'simple' cmpxchg() loops are expected to not starve one another
indefinitely. However, this is not evident on LL/SC architectures, because
while an LL/SC architecture 'can/should/must' provide forward progress
guarantees between competing LL/SC sections, such a guarantee does not
transfer to cmpxchg() implemented using LL/SC. Consider:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!atomic_try_cmpxchg(&v, &old, new));

which on LL/SC becomes something like:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!({
    asm volatile ("1: LL  %[oldval], %[v]\n"
                  "   CMP %[oldval], %[old]\n"
                  "   BNE 2f\n"
                  "   SC  %[new], %[v]\n"
                  "   BNE 1b\n"
                  "2:\n"
                  : [oldval] "=&r" (oldval), [v] "+m" (v)
                  : [old] "r" (old), [new] "r" (new)
                  : "memory");
    success = (oldval == old);
    if (!success)
      old = oldval;
    success; }));

However, even the forward branch from the failed compare can cause the LL/SC
to fail on some architectures, let alone whatever the compiler makes of the C
loop body. As a result there is no guarantee whatsoever that the cacheline
containing @v will stay on the local CPU and that progress is made.

Even native CAS architectures can fail to provide forward progress for their
primitive (see Sparc64 for an example).

Such implementations are strongly encouraged to add exponential backoff loops
to a failed CAS in order to ensure some progress. Affected architectures are
also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
their locking primitives.
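
As a purely conceptual, hedged sketch of such a backoff (shown at the caller
level for brevity; a real implementation would live inside the architecture's
cmpxchg()/CAS primitive, and the cap of 1024 is made up):

  unsigned int delay = 1, i;
  int old, new;

  old = atomic_read(&v);
  do {
    new = func(old);
    if (atomic_try_cmpxchg(&v, &old, new))
      break;
    for (i = 0; i < delay; i++)   /* back off before retrying the CAS */
      cpu_relax();
    if (delay < 1024)
      delay <<= 1;                /* exponential backoff, capped */
  } while (1);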