Back to home page

OSCL-LXR

 
 

    


0001 CONTROL DEPENDENCIES
0002 ====================
0003 
0004 A major difficulty with control dependencies is that current compilers
0005 do not support them.  One purpose of this document is therefore to
0006 help you prevent your compiler from breaking your code.  However,
0007 control dependencies also pose other challenges, which leads to the
0008 second purpose of this document, namely to help you to avoid breaking
0009 your own code, even in the absence of help from your compiler.
0010 
0011 One such challenge is that control dependencies order only later stores.
0012 Therefore, a load-load control dependency will not preserve ordering
0013 unless a read memory barrier is provided.  Consider the following code:
0014 
0015         q = READ_ONCE(a);
0016         if (q)
0017                 p = READ_ONCE(b);
0018 
0019 This is not guaranteed to provide any ordering because some types of CPUs
0020 are permitted to predict the result of the load from "b".  This prediction
0021 can cause other CPUs to see this load as having happened before the load
0022 from "a".  This means that an explicit read barrier is required, for example
0023 as follows:
0024 
0025         q = READ_ONCE(a);
0026         if (q) {
0027                 smp_rmb();
0028                 p = READ_ONCE(b);
0029         }
0030 
0031 However, stores are not speculated.  This means that ordering is
0032 (usually) guaranteed for load-store control dependencies, as in the
0033 following example:
0034 
0035         q = READ_ONCE(a);
0036         if (q)
0037                 WRITE_ONCE(b, 1);
0038 
0039 Control dependencies can pair with each other and with other types
0040 of ordering.  But please note that neither the READ_ONCE() nor the
0041 WRITE_ONCE() are optional.  Without the READ_ONCE(), the compiler might
0042 fuse the load from "a" with other loads.  Without the WRITE_ONCE(),
0043 the compiler might fuse the store to "b" with other stores.  Worse yet,
0044 the compiler might convert the store into a load and a check followed
0045 by a store, and this compiler-generated load would not be ordered by
0046 the control dependency.
0047 
0048 Furthermore, if the compiler is able to prove that the value of variable
0049 "a" is always non-zero, it would be well within its rights to optimize
0050 the original example by eliminating the "if" statement as follows:
0051 
0052         q = a;
0053         b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
0054 
0055 So don't leave out either the READ_ONCE() or the WRITE_ONCE().
0056 In particular, although READ_ONCE() does force the compiler to emit a
0057 load, it does *not* force the compiler to actually use the loaded value.
0058 
0059 It is tempting to try use control dependencies to enforce ordering on
0060 identical stores on both branches of the "if" statement as follows:
0061 
0062         q = READ_ONCE(a);
0063         if (q) {
0064                 barrier();
0065                 WRITE_ONCE(b, 1);
0066                 do_something();
0067         } else {
0068                 barrier();
0069                 WRITE_ONCE(b, 1);
0070                 do_something_else();
0071         }
0072 
0073 Unfortunately, current compilers will transform this as follows at high
0074 optimization levels:
0075 
0076         q = READ_ONCE(a);
0077         barrier();
0078         WRITE_ONCE(b, 1);  /* BUG: No ordering vs. load from a!!! */
0079         if (q) {
0080                 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
0081                 do_something();
0082         } else {
0083                 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
0084                 do_something_else();
0085         }
0086 
0087 Now there is no conditional between the load from "a" and the store to
0088 "b", which means that the CPU is within its rights to reorder them:  The
0089 conditional is absolutely required, and must be present in the final
0090 assembly code, after all of the compiler and link-time optimizations
0091 have been applied.  Therefore, if you need ordering in this example,
0092 you must use explicit memory ordering, for example, smp_store_release():
0093 
0094         q = READ_ONCE(a);
0095         if (q) {
0096                 smp_store_release(&b, 1);
0097                 do_something();
0098         } else {
0099                 smp_store_release(&b, 1);
0100                 do_something_else();
0101         }
0102 
0103 Without explicit memory ordering, control-dependency-based ordering is
0104 guaranteed only when the stores differ, for example:
0105 
0106         q = READ_ONCE(a);
0107         if (q) {
0108                 WRITE_ONCE(b, 1);
0109                 do_something();
0110         } else {
0111                 WRITE_ONCE(b, 2);
0112                 do_something_else();
0113         }
0114 
0115 The initial READ_ONCE() is still required to prevent the compiler from
0116 knowing too much about the value of "a".
0117 
0118 But please note that you need to be careful what you do with the local
0119 variable "q", otherwise the compiler might be able to guess the value
0120 and again remove the conditional branch that is absolutely required to
0121 preserve ordering.  For example:
0122 
0123         q = READ_ONCE(a);
0124         if (q % MAX) {
0125                 WRITE_ONCE(b, 1);
0126                 do_something();
0127         } else {
0128                 WRITE_ONCE(b, 2);
0129                 do_something_else();
0130         }
0131 
0132 If MAX is compile-time defined to be 1, then the compiler knows that
0133 (q % MAX) must be equal to zero, regardless of the value of "q".
0134 The compiler is therefore within its rights to transform the above code
0135 into the following:
0136 
0137         q = READ_ONCE(a);
0138         WRITE_ONCE(b, 2);
0139         do_something_else();
0140 
0141 Given this transformation, the CPU is not required to respect the ordering
0142 between the load from variable "a" and the store to variable "b".  It is
0143 tempting to add a barrier(), but this does not help.  The conditional
0144 is gone, and the barrier won't bring it back.  Therefore, if you need
0145 to relying on control dependencies to produce this ordering, you should
0146 make sure that MAX is greater than one, perhaps as follows:
0147 
0148         q = READ_ONCE(a);
0149         BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
0150         if (q % MAX) {
0151                 WRITE_ONCE(b, 1);
0152                 do_something();
0153         } else {
0154                 WRITE_ONCE(b, 2);
0155                 do_something_else();
0156         }
0157 
0158 Please note once again that each leg of the "if" statement absolutely
0159 must store different values to "b".  As in previous examples, if the two
0160 values were identical, the compiler could pull this store outside of the
0161 "if" statement, destroying the control dependency's ordering properties.
0162 
0163 You must also be careful avoid relying too much on boolean short-circuit
0164 evaluation.  Consider this example:
0165 
0166         q = READ_ONCE(a);
0167         if (q || 1 > 0)
0168                 WRITE_ONCE(b, 1);
0169 
0170 Because the first condition cannot fault and the second condition is
0171 always true, the compiler can transform this example as follows, again
0172 destroying the control dependency's ordering:
0173 
0174         q = READ_ONCE(a);
0175         WRITE_ONCE(b, 1);
0176 
0177 This is yet another example showing the importance of preventing the
0178 compiler from out-guessing your code.  Again, although READ_ONCE() really
0179 does force the compiler to emit code for a given load, the compiler is
0180 within its rights to discard the loaded value.
0181 
0182 In addition, control dependencies apply only to the then-clause and
0183 else-clause of the "if" statement in question.  In particular, they do
0184 not necessarily order the code following the entire "if" statement:
0185 
0186         q = READ_ONCE(a);
0187         if (q) {
0188                 WRITE_ONCE(b, 1);
0189         } else {
0190                 WRITE_ONCE(b, 2);
0191         }
0192         WRITE_ONCE(c, 1);  /* BUG: No ordering against the read from "a". */
0193 
0194 It is tempting to argue that there in fact is ordering because the
0195 compiler cannot reorder volatile accesses and also cannot reorder
0196 the writes to "b" with the condition.  Unfortunately for this line
0197 of reasoning, the compiler might compile the two writes to "b" as
0198 conditional-move instructions, as in this fanciful pseudo-assembly
0199 language:
0200 
0201         ld r1,a
0202         cmp r1,$0
0203         cmov,ne r4,$1
0204         cmov,eq r4,$2
0205         st r4,b
0206         st $1,c
0207 
0208 The control dependencies would then extend only to the pair of cmov
0209 instructions and the store depending on them.  This means that a weakly
0210 ordered CPU would have no dependency of any sort between the load from
0211 "a" and the store to "c".  In short, control dependencies provide ordering
0212 only to the stores in the then-clause and else-clause of the "if" statement
0213 in question (including functions invoked by those two clauses), and not
0214 to code following that "if" statement.
0215 
0216 
0217 In summary:
0218 
0219   (*) Control dependencies can order prior loads against later stores.
0220       However, they do *not* guarantee any other sort of ordering:
0221       Not prior loads against later loads, nor prior stores against
0222       later anything.  If you need these other forms of ordering, use
0223       smp_load_acquire(), smp_store_release(), or, in the case of prior
0224       stores and later loads, smp_mb().
0225 
0226   (*) If both legs of the "if" statement contain identical stores to
0227       the same variable, then you must explicitly order those stores,
0228       either by preceding both of them with smp_mb() or by using
0229       smp_store_release().  Please note that it is *not* sufficient to use
0230       barrier() at beginning and end of each leg of the "if" statement
0231       because, as shown by the example above, optimizing compilers can
0232       destroy the control dependency while respecting the letter of the
0233       barrier() law.
0234 
0235   (*) Control dependencies require at least one run-time conditional
0236       between the prior load and the subsequent store, and this
0237       conditional must involve the prior load.  If the compiler is able
0238       to optimize the conditional away, it will have also optimized
0239       away the ordering.  Careful use of READ_ONCE() and WRITE_ONCE()
0240       can help to preserve the needed conditional.
0241 
0242   (*) Control dependencies require that the compiler avoid reordering the
0243       dependency into nonexistence.  Careful use of READ_ONCE() or
0244       atomic{,64}_read() can help to preserve your control dependency.
0245 
0246   (*) Control dependencies apply only to the then-clause and else-clause
0247       of the "if" statement containing the control dependency, including
0248       any functions that these two clauses call.  Control dependencies
0249       do *not* apply to code beyond the end of that "if" statement.
0250 
0251   (*) Control dependencies pair normally with other types of barriers.
0252 
0253   (*) Control dependencies do *not* provide multicopy atomicity.  If you
0254       need all the CPUs to agree on the ordering of a given store against
0255       all other accesses, use smp_mb().
0256 
0257   (*) Compilers do not understand control dependencies.  It is therefore
0258       your job to ensure that they do not break your code.