0001 CONTROL DEPENDENCIES
0002 ====================
0003
0004 A major difficulty with control dependencies is that current compilers
0005 do not support them. One purpose of this document is therefore to
0006 help you prevent your compiler from breaking your code. However,
0007 control dependencies also pose other challenges, which leads to the
0008 second purpose of this document, namely to help you to avoid breaking
0009 your own code, even in the absence of help from your compiler.
0010
0011 One such challenge is that control dependencies order only later stores.
0012 Therefore, a load-load control dependency will not preserve ordering
0013 unless a read memory barrier is provided. Consider the following code:
0014
0015 q = READ_ONCE(a);
0016 if (q)
0017 p = READ_ONCE(b);
0018
0019 This is not guaranteed to provide any ordering because some types of CPUs
0020 are permitted to predict the result of the load from "b". This prediction
0021 can cause other CPUs to see this load as having happened before the load
0022 from "a". This means that an explicit read barrier is required, for example
0023 as follows:
0024
0025 q = READ_ONCE(a);
0026 if (q) {
0027 smp_rmb();
0028 p = READ_ONCE(b);
0029 }
0030
0031 However, stores are not speculated. This means that ordering is
0032 (usually) guaranteed for load-store control dependencies, as in the
0033 following example:
0034
0035 q = READ_ONCE(a);
0036 if (q)
0037 WRITE_ONCE(b, 1);
0038
0039 Control dependencies can pair with each other and with other types
0040 of ordering. But please note that neither the READ_ONCE() nor the
0041 WRITE_ONCE() are optional. Without the READ_ONCE(), the compiler might
0042 fuse the load from "a" with other loads. Without the WRITE_ONCE(),
0043 the compiler might fuse the store to "b" with other stores. Worse yet,
0044 the compiler might convert the store into a load and a check followed
0045 by a store, and this compiler-generated load would not be ordered by
0046 the control dependency.
0047
0048 Furthermore, if the compiler is able to prove that the value of variable
0049 "a" is always non-zero, it would be well within its rights to optimize
0050 the original example by eliminating the "if" statement as follows:
0051
0052 q = a;
0053 b = 1; /* BUG: Compiler and CPU can both reorder!!! */
0054
0055 So don't leave out either the READ_ONCE() or the WRITE_ONCE().
0056 In particular, although READ_ONCE() does force the compiler to emit a
0057 load, it does *not* force the compiler to actually use the loaded value.
0058
0059 It is tempting to try use control dependencies to enforce ordering on
0060 identical stores on both branches of the "if" statement as follows:
0061
0062 q = READ_ONCE(a);
0063 if (q) {
0064 barrier();
0065 WRITE_ONCE(b, 1);
0066 do_something();
0067 } else {
0068 barrier();
0069 WRITE_ONCE(b, 1);
0070 do_something_else();
0071 }
0072
0073 Unfortunately, current compilers will transform this as follows at high
0074 optimization levels:
0075
0076 q = READ_ONCE(a);
0077 barrier();
0078 WRITE_ONCE(b, 1); /* BUG: No ordering vs. load from a!!! */
0079 if (q) {
0080 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
0081 do_something();
0082 } else {
0083 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
0084 do_something_else();
0085 }
0086
0087 Now there is no conditional between the load from "a" and the store to
0088 "b", which means that the CPU is within its rights to reorder them: The
0089 conditional is absolutely required, and must be present in the final
0090 assembly code, after all of the compiler and link-time optimizations
0091 have been applied. Therefore, if you need ordering in this example,
0092 you must use explicit memory ordering, for example, smp_store_release():
0093
0094 q = READ_ONCE(a);
0095 if (q) {
0096 smp_store_release(&b, 1);
0097 do_something();
0098 } else {
0099 smp_store_release(&b, 1);
0100 do_something_else();
0101 }
0102
0103 Without explicit memory ordering, control-dependency-based ordering is
0104 guaranteed only when the stores differ, for example:
0105
0106 q = READ_ONCE(a);
0107 if (q) {
0108 WRITE_ONCE(b, 1);
0109 do_something();
0110 } else {
0111 WRITE_ONCE(b, 2);
0112 do_something_else();
0113 }
0114
0115 The initial READ_ONCE() is still required to prevent the compiler from
0116 knowing too much about the value of "a".
0117
0118 But please note that you need to be careful what you do with the local
0119 variable "q", otherwise the compiler might be able to guess the value
0120 and again remove the conditional branch that is absolutely required to
0121 preserve ordering. For example:
0122
0123 q = READ_ONCE(a);
0124 if (q % MAX) {
0125 WRITE_ONCE(b, 1);
0126 do_something();
0127 } else {
0128 WRITE_ONCE(b, 2);
0129 do_something_else();
0130 }
0131
0132 If MAX is compile-time defined to be 1, then the compiler knows that
0133 (q % MAX) must be equal to zero, regardless of the value of "q".
0134 The compiler is therefore within its rights to transform the above code
0135 into the following:
0136
0137 q = READ_ONCE(a);
0138 WRITE_ONCE(b, 2);
0139 do_something_else();
0140
0141 Given this transformation, the CPU is not required to respect the ordering
0142 between the load from variable "a" and the store to variable "b". It is
0143 tempting to add a barrier(), but this does not help. The conditional
0144 is gone, and the barrier won't bring it back. Therefore, if you need
0145 to relying on control dependencies to produce this ordering, you should
0146 make sure that MAX is greater than one, perhaps as follows:
0147
0148 q = READ_ONCE(a);
0149 BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
0150 if (q % MAX) {
0151 WRITE_ONCE(b, 1);
0152 do_something();
0153 } else {
0154 WRITE_ONCE(b, 2);
0155 do_something_else();
0156 }
0157
0158 Please note once again that each leg of the "if" statement absolutely
0159 must store different values to "b". As in previous examples, if the two
0160 values were identical, the compiler could pull this store outside of the
0161 "if" statement, destroying the control dependency's ordering properties.
0162
0163 You must also be careful avoid relying too much on boolean short-circuit
0164 evaluation. Consider this example:
0165
0166 q = READ_ONCE(a);
0167 if (q || 1 > 0)
0168 WRITE_ONCE(b, 1);
0169
0170 Because the first condition cannot fault and the second condition is
0171 always true, the compiler can transform this example as follows, again
0172 destroying the control dependency's ordering:
0173
0174 q = READ_ONCE(a);
0175 WRITE_ONCE(b, 1);
0176
0177 This is yet another example showing the importance of preventing the
0178 compiler from out-guessing your code. Again, although READ_ONCE() really
0179 does force the compiler to emit code for a given load, the compiler is
0180 within its rights to discard the loaded value.
0181
0182 In addition, control dependencies apply only to the then-clause and
0183 else-clause of the "if" statement in question. In particular, they do
0184 not necessarily order the code following the entire "if" statement:
0185
0186 q = READ_ONCE(a);
0187 if (q) {
0188 WRITE_ONCE(b, 1);
0189 } else {
0190 WRITE_ONCE(b, 2);
0191 }
0192 WRITE_ONCE(c, 1); /* BUG: No ordering against the read from "a". */
0193
0194 It is tempting to argue that there in fact is ordering because the
0195 compiler cannot reorder volatile accesses and also cannot reorder
0196 the writes to "b" with the condition. Unfortunately for this line
0197 of reasoning, the compiler might compile the two writes to "b" as
0198 conditional-move instructions, as in this fanciful pseudo-assembly
0199 language:
0200
0201 ld r1,a
0202 cmp r1,$0
0203 cmov,ne r4,$1
0204 cmov,eq r4,$2
0205 st r4,b
0206 st $1,c
0207
0208 The control dependencies would then extend only to the pair of cmov
0209 instructions and the store depending on them. This means that a weakly
0210 ordered CPU would have no dependency of any sort between the load from
0211 "a" and the store to "c". In short, control dependencies provide ordering
0212 only to the stores in the then-clause and else-clause of the "if" statement
0213 in question (including functions invoked by those two clauses), and not
0214 to code following that "if" statement.
0215
0216
0217 In summary:
0218
0219 (*) Control dependencies can order prior loads against later stores.
0220 However, they do *not* guarantee any other sort of ordering:
0221 Not prior loads against later loads, nor prior stores against
0222 later anything. If you need these other forms of ordering, use
0223 smp_load_acquire(), smp_store_release(), or, in the case of prior
0224 stores and later loads, smp_mb().
0225
0226 (*) If both legs of the "if" statement contain identical stores to
0227 the same variable, then you must explicitly order those stores,
0228 either by preceding both of them with smp_mb() or by using
0229 smp_store_release(). Please note that it is *not* sufficient to use
0230 barrier() at beginning and end of each leg of the "if" statement
0231 because, as shown by the example above, optimizing compilers can
0232 destroy the control dependency while respecting the letter of the
0233 barrier() law.
0234
0235 (*) Control dependencies require at least one run-time conditional
0236 between the prior load and the subsequent store, and this
0237 conditional must involve the prior load. If the compiler is able
0238 to optimize the conditional away, it will have also optimized
0239 away the ordering. Careful use of READ_ONCE() and WRITE_ONCE()
0240 can help to preserve the needed conditional.
0241
0242 (*) Control dependencies require that the compiler avoid reordering the
0243 dependency into nonexistence. Careful use of READ_ONCE() or
0244 atomic{,64}_read() can help to preserve your control dependency.
0245
0246 (*) Control dependencies apply only to the then-clause and else-clause
0247 of the "if" statement containing the control dependency, including
0248 any functions that these two clauses call. Control dependencies
0249 do *not* apply to code beyond the end of that "if" statement.
0250
0251 (*) Control dependencies pair normally with other types of barriers.
0252
0253 (*) Control dependencies do *not* provide multicopy atomicity. If you
0254 need all the CPUs to agree on the ordering of a given store against
0255 all other accesses, use smp_mb().
0256
0257 (*) Compilers do not understand control dependencies. It is therefore
0258 your job to ensure that they do not break your code.