UNALIGNED MEMORY ACCESSES
=========================

Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!


The definition of an unaligned access
=====================================

Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x10004 is fine, but
reading 4 bytes of data from address 0x10005 would be an unaligned memory
access.

The above may seem a little vague, as memory access can happen in different
ways. The context here is at the machine code level: certain instructions read
or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
assembly). As will become clear, it is relatively easy to spot C statements
which will compile to multiple-byte memory access instructions, namely when
dealing with types such as u16, u32 and u64.
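
For instance (an illustrative sketch using the kernel's u32 type, not taken
from the kernel itself), the following dereference typically compiles to a
single 4-byte load instruction, so it is only safe when p is 4-byte aligned:

        u32 read_u32(const void *p)
        {
                /* one 4-byte load, e.g. movl on x86 or ldr on ARM */
                return *(const u32 *)p;
        }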


Natural alignment
=================

The rule mentioned above forms what we refer to as natural alignment:
When accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0.
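
For access sizes that are powers of two, this check can be written with a
mask instead of a division. As an illustrative sketch (the macro name here
is made up, but the kernel's IS_ALIGNED() macro is built the same way):

        /* true if addr is naturally aligned for an N-byte access,
         * i.e. addr % N == 0, for power-of-two N */
        #define NATURALLY_ALIGNED(addr, N) \
                ((((unsigned long)(addr)) & ((N) - 1)) == 0)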

When writing code, assume the target architecture has natural alignment
requirements.

In reality, only a few architectures require natural alignment on all sizes
of memory access. However, we must consider ALL supported architectures;
writing code that satisfies natural alignment requirements is the easiest way
to achieve full portability.


Why unaligned access is bad
===========================

The effects of performing an unaligned memory access vary from architecture
to architecture. It would be easy to write a whole document on the differences
here; a summary of the common scenarios is presented below:

 - Some architectures are able to perform unaligned memory accesses
   transparently, but there is usually a significant performance cost.
 - Some architectures raise processor exceptions when unaligned accesses
   happen. The exception handler is able to correct the unaligned access,
   at significant cost to performance.
 - Some architectures raise processor exceptions when unaligned accesses
   happen, but the exceptions do not contain enough information for the
   unaligned access to be corrected.
 - Some architectures are not capable of unaligned memory access, but will
   silently perform a different memory access to the one that was requested,
   resulting in a subtle code bug that is hard to detect!

It should be obvious from the above that if your code causes unaligned
memory accesses to happen, your code will not work correctly on certain
platforms and will cause performance problems on others.


Code that does not cause unaligned access
=========================================

At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
memory addresses of certain variables, etc.

Fortunately things are not too complex, as in most cases, the compiler
ensures that things will work for you. For example, take the following
structure:

        struct foo {
                u16 field1;
                u32 field2;
                u8 field3;
        };

Let us assume that an instance of the above structure resides in memory
starting at address 0x10000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10002, but that address is not evenly divisible
by 4 (remember, we're reading a 4 byte value here).

Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding in between field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).
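
If you want to see the padding for yourself, offsetof() makes it visible.
The following is a self-contained sketch (the typedefs stand in for the
kernel's u8/u16/u32 types, and the offsets assume a typical ABI with
natural alignment requirements):

        #include <stddef.h>
        #include <stdint.h>

        typedef uint8_t  u8;
        typedef uint16_t u16;
        typedef uint32_t u32;

        struct foo {
                u16 field1;     /* offset 0 */
                                /* 2 bytes of compiler-inserted padding */
                u32 field2;     /* offset 4: naturally aligned */
                u8 field3;      /* offset 8 */
        };

        _Static_assert(offsetof(struct foo, field2) == 4,
                       "padding keeps field2 naturally aligned");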

Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.

At this point, it should be clear that accessing a single byte (u8 or char)
will never cause an unaligned access, because all memory addresses are evenly
divisible by one.

On a related topic, with the above considerations in mind you may observe
that you could reorder the fields in the structure in order to place fields
where padding would otherwise be inserted, and hence reduce the overall
resident memory size of structure instances. The optimal layout of the
above example is:

        struct foo {
                u32 field2;
                u16 field1;
                u8 field3;
        };

For a natural alignment scheme, the compiler would only have to add a single
byte of padding at the end of the structure. This padding is added in order
to satisfy alignment constraints for arrays of these structures.
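
Continuing the sketch above (same stand-in typedefs, same typical-ABI
assumption), the size saving can be checked directly:

        struct foo_reordered {
                u32 field2;     /* offset 0 */
                u16 field1;     /* offset 4 */
                u8 field3;      /* offset 6 */
                                /* 1 byte of tail padding for arrays */
        };

        /* 12 bytes for the original ordering, 8 after reordering */
        _Static_assert(sizeof(struct foo_reordered) == 8,
                       "reordering removes 4 bytes of padding");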

Another point worth mentioning is the use of __attribute__((packed)) on a
structure type. This GCC-specific attribute tells the compiler never to
insert any padding within structures, useful when you want to use a C struct
to represent some data that comes in a fixed arrangement 'off the wire'.

You might be inclined to believe that usage of this attribute can easily
lead to unaligned accesses when accessing fields that do not satisfy
architectural alignment requirements. However, again, the compiler is aware
of the alignment constraints and will generate extra instructions to perform
the memory access in a way that does not cause unaligned access. The extra
instructions cause a loss in performance compared to the non-packed case,
so the packed attribute should only be used when avoiding structure padding
is of importance.
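
As an illustrative sketch (again using the stand-in typedefs from above),
the packed variant of the earlier structure has no padding at all, which is
exactly why field2 is no longer naturally aligned:

        struct foo_packed {
                u16 field1;     /* offset 0 */
                u32 field2;     /* offset 2: NOT naturally aligned */
                u8 field3;      /* offset 6 */
        } __attribute__((packed));

        _Static_assert(sizeof(struct foo_packed) == 7, "no padding");

On a strict-alignment architecture, the compiler will access field2 with
multiple narrower loads or stores rather than a single 4-byte access.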


Code that causes unaligned access
=================================

With the above in mind, let's move onto a real life example of a function
that can cause an unaligned memory access. The following function taken
from include/linux/etherdevice.h is an optimized routine to compare two
ethernet MAC addresses for equality.

bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
{
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
                   ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));

        return fold == 0;
#else
        const u16 *a = (const u16 *)addr1;
        const u16 *b = (const u16 *)addr2;
        return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
#endif
}

In the above function, when the hardware has efficient unaligned access
capability, there is no issue with this code. But when the hardware isn't
able to access memory on arbitrary boundaries, the reference to a[0] causes
2 bytes (16 bits) to be read from memory starting at address addr1.

Think about what would happen if addr1 was an odd address such as 0x10003.
(Hint: it'd be an unaligned access.)

Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway but is understood to only work normally on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful
as it is a decent optimization for the cases when you can ensure alignment,
which is true almost all of the time in the context of ethernet networking.
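
A common way for callers to guarantee the required 16-bit alignment is to
annotate MAC address arrays explicitly. The structure below is hypothetical,
but the __aligned(2) annotation is the pattern used throughout kernel
networking code:

        #include <linux/etherdevice.h>

        struct my_sta {
                u8 addr[ETH_ALEN] __aligned(2);  /* safe for the u16
                                                  * accesses above */
        };

        static bool same_station(const struct my_sta *a,
                                 const struct my_sta *b)
        {
                return ether_addr_equal(a->addr, b->addr);
        }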


Here is another example of some code that could cause unaligned accesses:

        void myfunc(u8 *data, u32 value)
        {
                [...]
                *((u32 *) data) = cpu_to_le32(value);
                [...]
        }

This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.

In summary, the 2 main scenarios where you may run into unaligned access
problems involve:
 1. Casting variables to types of different lengths
 2. Pointer arithmetic followed by access to at least 2 bytes of data
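
Both scenarios can be seen in one short sketch (illustrative only, using the
kernel's u8/u32 types; buf is assumed to point at arbitrary, possibly
unaligned data):

        u32 bad_reads(const u8 *buf)
        {
                /* scenario 1: cast to a type of different length */
                u32 a = *(const u32 *)buf;
                /* scenario 2: pointer arithmetic, then a multi-byte
                 * access */
                u32 b = *(const u32 *)(buf + 1);

                return a ^ b;
        }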


Avoiding unaligned accesses
===========================

The easiest way to avoid unaligned access is to use the get_unaligned() and
put_unaligned() macros provided by the <asm/unaligned.h> header file.

Going back to an earlier example of code that potentially causes unaligned
access:

        void myfunc(u8 *data, u32 value)
        {
                [...]
                *((u32 *) data) = cpu_to_le32(value);
                [...]
        }

To avoid the unaligned memory access, you would rewrite it as follows:

        void myfunc(u8 *data, u32 value)
        {
                [...]
                value = cpu_to_le32(value);
                put_unaligned(value, (u32 *) data);
                [...]
        }

The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
memory and you wish to avoid unaligned access, its usage is as follows:

        u32 value = get_unaligned((u32 *) data);

These macros work for memory accesses of any length (not just 32 bits as
in the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly in
terms of performance.

If use of such macros is not convenient, another option is to use memcpy(),
where the source or destination (or both) are of type u8* or unsigned char*.
Due to the byte-wise nature of this operation, unaligned accesses are avoided.
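
For example, the earlier myfunc() can be rewritten with memcpy() instead of
the unaligned macros (a sketch, using the kernel's types and cpu_to_le32()
as before):

        void myfunc(u8 *data, u32 value)
        {
                u32 le = cpu_to_le32(value);

                /* byte-wise copy: no multi-byte access to 'data', so
                 * no alignment requirement on it */
                memcpy(data, &le, sizeof(le));
        }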


Alignment vs. Networking
========================

On architectures that require aligned loads, networking requires that the IP
header is aligned on a four-byte boundary to optimise the IP stack. For
regular ethernet hardware, the constant NET_IP_ALIGN is used. On most
architectures this constant has the value 2 because the normal ethernet
header is 14 bytes long, so in order to get proper alignment one needs to
DMA to an address which can be expressed as 4*n + 2. One notable exception
here is powerpc which defines NET_IP_ALIGN to 0 because DMA to unaligned
addresses can be very expensive and dwarf the cost of unaligned loads.
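
In a driver receive path, this typically looks like the following sketch
(RX_BUF_SIZE and the function name are made up for illustration; the
netdev_alloc_skb() and skb_reserve() calls are the real kernel APIs):

        static struct sk_buff *my_alloc_rx_skb(struct net_device *dev)
        {
                struct sk_buff *skb;

                skb = netdev_alloc_skb(dev, RX_BUF_SIZE + NET_IP_ALIGN);
                if (skb)
                        /* DMA target becomes 4*n + 2, so the IP header
                         * after the 14-byte ethernet header lands on a
                         * 4-byte boundary */
                        skb_reserve(skb, NET_IP_ALIGN);
                return skb;
        }

The netdev_alloc_skb_ip_align() helper wraps up exactly this pattern.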

Some ethernet hardware cannot DMA to unaligned addresses like 4*n + 2, and
some non-ethernet hardware has similar restrictions. For such hardware this
is a problem, and the incoming frame must then be copied into an aligned
buffer. Because this copy is unnecessary on architectures that can do
unaligned accesses, the code can be made dependent on
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so:

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        skb = original skb
#else
        skb = copy skb
#endif
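
Fleshing that out slightly, the copy path might look like this sketch (the
function is hypothetical, but it uses the standard skb helpers):

        static struct sk_buff *realign_rx_frame(struct sk_buff *skb,
                                                struct net_device *dev)
        {
                struct sk_buff *aligned;

                aligned = netdev_alloc_skb(dev, skb->len + NET_IP_ALIGN);
                if (!aligned)
                        return NULL;
                skb_reserve(aligned, NET_IP_ALIGN);
                /* byte-wise copy into the realigned buffer */
                memcpy(skb_put(aligned, skb->len), skb->data, skb->len);
                return aligned;
        }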

--
Authors: Daniel Drake <dsd@gentoo.org>,
         Johannes Berg <johannes@sipsolutions.net>
With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
Vadim Lobanov