====================
eBPF Instruction Set
====================

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate value
  (imm64) after the basic instruction for a total of 128 bits.

The basic instruction encoding looks as follows:

=============  =======  ===============  ====================  ============
32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
=============  =======  ===============  ====================  ============
immediate      offset   source register  destination register  opcode
=============  =======  ===============  ====================  ============

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
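
As a rough illustration of the layout above, the following C sketch packs the
five fields into a single 64-bit value, with the opcode in the least
significant byte and the immediate in the 32 most significant bits. The helper
names (``bpf_encode``, ``bpf_opcode`` and so on) are hypothetical, and treating
an instruction as one ``uint64_t`` is an assumption made for the example, not
a description of any particular implementation.

```c
#include <stdint.h>

/* Hypothetical helper: pack the fields of a basic instruction into 64 bits,
 * following the table above (opcode in the 8 LSBs, immediate in the 32 MSBs).
 * Unused fields are simply passed as zero by the caller. */
uint64_t bpf_encode(uint8_t opcode, uint8_t dst_reg, uint8_t src_reg,
                    int16_t off, int32_t imm)
{
    return (uint64_t)opcode |
           ((uint64_t)(dst_reg & 0x0f) << 8) |
           ((uint64_t)(src_reg & 0x0f) << 12) |
           ((uint64_t)(uint16_t)off << 16) |
           ((uint64_t)(uint32_t)imm << 32);
}

/* Hypothetical accessors that undo the packing above. */
uint8_t bpf_opcode(uint64_t insn)  { return (uint8_t)(insn & 0xff); }
uint8_t bpf_dst_reg(uint64_t insn) { return (uint8_t)((insn >> 8) & 0x0f); }
uint8_t bpf_src_reg(uint64_t insn) { return (uint8_t)((insn >> 12) & 0x0f); }
int16_t bpf_off(uint64_t insn)     { return (int16_t)(uint16_t)(insn >> 16); }
int32_t bpf_imm(uint64_t insn)     { return (int32_t)(uint32_t)(insn >> 32); }
```

For instance, ``bpf_encode(0xb7, 1, 0, 0, 42)`` yields ``0x0000002a000001b7``,
where ``0xb7`` is ``BPF_MOV | BPF_K | BPF_ALU64`` using the values defined in
this document.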

Instruction classes
-------------------

The three LSB bits of the 'opcode' field store the instruction class:

=========  =====  ===============================
class      value  description
=========  =====  ===============================
BPF_LD     0x00   non-standard load operations
BPF_LDX    0x01   load into register operations
BPF_ST     0x02   store from immediate operations
BPF_STX    0x03   store from register operations
BPF_ALU    0x04   32-bit arithmetic operations
BPF_JMP    0x05   64-bit jump operations
BPF_JMP32  0x06   32-bit jump operations
BPF_ALU64  0x07   64-bit arithmetic operations
=========  =====  ===============================
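
A minimal sketch of how a decoder might use the class bits: mask off the three
least significant bits of the opcode, then branch on whether the class is one
of the four load/store classes. The function names are hypothetical and only
serve this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Instruction class: the three LSBs of the opcode (values from the table). */
uint8_t bpf_insn_class(uint8_t opcode)
{
    return opcode & 0x07;
}

/* BPF_LD (0x00) through BPF_STX (0x03) are the load and store classes;
 * the remaining values are the arithmetic and jump classes. */
bool bpf_is_load_store(uint8_t opcode)
{
    return bpf_insn_class(opcode) <= 0x03;
}
```

With these helpers, an opcode of ``0xb7`` reports class ``0x07``
(``BPF_ALU64``), and an opcode of ``0x7b`` reports class ``0x03``
(``BPF_STX``).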

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
operation code  source  instruction class
==============  ======  =================

The 4th bit encodes the source operand:

======  =====  ========================================
source  value  description
======  =====  ========================================
BPF_K   0x00   use 32-bit immediate as source operand
BPF_X   0x08   use 'src_reg' register as source operand
======  =====  ========================================

The four MSB bits store the operation code.

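The three-way split described above can be sketched with plain bit masks. The
struct and function names below are hypothetical; the masks follow directly
from the field widths in the table.

```c
#include <stdint.h>

/* Hypothetical container for the three parts of an ALU/JMP opcode. */
struct bpf_alu_jmp_opcode {
    uint8_t op;         /* 4 MSBs, kept in place (0x00, 0x10, ..., 0xf0) */
    uint8_t source;     /* 4th bit: BPF_K (0x00) or BPF_X (0x08) */
    uint8_t insn_class; /* 3 LSBs */
};

struct bpf_alu_jmp_opcode bpf_split_alu_jmp(uint8_t opcode)
{
    struct bpf_alu_jmp_opcode parts;

    parts.op = opcode & 0xf0;
    parts.source = opcode & 0x08;
    parts.insn_class = opcode & 0x07;
    return parts;
}
```

For example, the opcode ``0xaf`` splits into operation code ``0xa0``
(``BPF_XOR``), source ``0x08`` (``BPF_X``) and class ``0x07`` (``BPF_ALU64``).
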

Arithmetic instructions
-----------------------

BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

========  =====  =================================================
code      value  description
========  =====  =================================================
BPF_ADD   0x00   dst += src
BPF_SUB   0x10   dst -= src
BPF_MUL   0x20   dst \*= src
BPF_DIV   0x30   dst /= src
BPF_OR    0x40   dst \|= src
BPF_AND   0x50   dst &= src
BPF_LSH   0x60   dst <<= src
BPF_RSH   0x70   dst >>= src
BPF_NEG   0x80   dst = -dst
BPF_MOD   0x90   dst %= src
BPF_XOR   0xa0   dst ^= src
BPF_MOV   0xb0   dst = src
BPF_ARSH  0xc0   sign extending shift right
BPF_END   0xd0   byte swap operations (see separate section below)
========  =====  =================================================

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst_reg = (u32) dst_reg + (u32) src_reg

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst_reg = dst_reg + src_reg

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst_reg = (u32) dst_reg ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst_reg = dst_reg ^ imm32

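To make the difference between the 32-bit and 64-bit classes concrete, the
sketch below models the destination register as a ``uint64_t`` and returns its
new value. The function names are hypothetical. The sign extension of
``imm32`` in the 64-bit XOR is an assumption based on how the Linux
interpreter is understood to behave; the text above does not state it.

```c
#include <stdint.h>

/* BPF_ADD | BPF_X | BPF_ALU: only the low 32 bits take part, and the
 * 32-bit result is stored zero-extended. */
uint64_t alu32_add_x(uint64_t dst_reg, uint64_t src_reg)
{
    return (uint32_t)((uint32_t)dst_reg + (uint32_t)src_reg);
}

/* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit addition. */
uint64_t alu64_add_x(uint64_t dst_reg, uint64_t src_reg)
{
    return dst_reg + src_reg;
}

/* BPF_XOR | BPF_K | BPF_ALU: 32-bit XOR with the immediate. */
uint64_t alu32_xor_k(uint64_t dst_reg, int32_t imm32)
{
    return (uint32_t)dst_reg ^ (uint32_t)imm32;
}

/* BPF_XOR | BPF_K | BPF_ALU64: assumes imm32 is sign-extended to 64 bits
 * before the operation (an assumption, see the lead-in). */
uint64_t alu64_xor_k(uint64_t dst_reg, int32_t imm32)
{
    return dst_reg ^ (uint64_t)(int64_t)imm32;
}
```

With ``dst_reg = 0xffffffff`` and ``src_reg = 1``, the 32-bit add wraps to
``0`` while the 64-bit add produces ``0x100000000``.
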

Byte swap instructions
----------------------

The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
code field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

The 1-bit source operand field in the opcode is used to select what byte
order the operation converts from or to:

=========  =====  =================================================
source     value  description
=========  =====  =================================================
BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_TO_BE  0x08   convert between host byte order and big endian
=========  =====  =================================================

The imm field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::

  dst_reg = htole16(dst_reg)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::

  dst_reg = htobe64(dst_reg)

``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and
``BPF_TO_BE`` respectively.

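The ``htole16()`` and ``htobe64()`` helpers used above are not available on
every platform, so the following sketch expresses the same conversion in
portable C. It lays out the low ``width`` bits of the register in the
requested byte order and reinterprets those bytes as a host-order integer. The
function name ``bpf_end`` and the truncation of the result to ``width`` bits
are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

enum { BPF_TO_LE = 0x00, BPF_TO_BE = 0x08 };

/* Sketch of BPF_ALU | <source> | BPF_END with imm = width.
 * Unsupported widths leave the value unchanged in this sketch. */
uint64_t bpf_end(uint64_t dst_reg, uint8_t source, unsigned width)
{
    unsigned char bytes[8];
    unsigned nbytes = width / 8;
    uint16_t r16;
    uint32_t r32;
    uint64_t r64;

    if (width != 16 && width != 32 && width != 64)
        return dst_reg;

    /* bytes[0] is the lowest address: most significant byte first for
     * big endian, least significant byte first for little endian. */
    for (unsigned i = 0; i < nbytes; i++) {
        unsigned shift = (source == BPF_TO_BE) ? 8 * (nbytes - 1 - i) : 8 * i;
        bytes[i] = (unsigned char)(dst_reg >> shift);
    }

    /* Reinterpret the byte sequence as a host-order integer. */
    switch (width) {
    case 16:
        memcpy(&r16, bytes, sizeof(r16));
        return r16;
    case 32:
        memcpy(&r32, bytes, sizeof(r32));
        return r32;
    default:
        memcpy(&r64, bytes, sizeof(r64));
        return r64;
    }
}
```

On a little-endian host, ``BPF_TO_LE`` leaves the low ``width`` bits unchanged
and ``BPF_TO_BE`` reverses their byte order; on a big-endian host the roles
are swapped.
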

Jump instructions
-----------------

BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

========  =====  =========================  ============
code      value  description                notes
========  =====  =========================  ============
BPF_JA    0x00   PC += off                  BPF_JMP only
BPF_JEQ   0x10   PC += off if dst == src
BPF_JGT   0x20   PC += off if dst > src     unsigned
BPF_JGE   0x30   PC += off if dst >= src    unsigned
BPF_JSET  0x40   PC += off if dst & src
BPF_JNE   0x50   PC += off if dst != src
BPF_JSGT  0x60   PC += off if dst > src     signed
BPF_JSGE  0x70   PC += off if dst >= src    signed
BPF_CALL  0x80   function call
BPF_EXIT  0x90   function / program return  BPF_JMP only
BPF_JLT   0xa0   PC += off if dst < src     unsigned
BPF_JLE   0xb0   PC += off if dst <= src    unsigned
BPF_JSLT  0xc0   PC += off if dst < src     signed
BPF_JSLE  0xd0   PC += off if dst <= src    signed
========  =====  =========================  ============

The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.

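The signed and unsigned variants, and the 32-bit versus 64-bit classes, differ
only in how the operands are interpreted before the comparison. The sketch
below evaluates the conditional jump conditions from the table; the function
name ``bpf_jump_taken`` and its ``jmp32`` parameter are hypothetical. It
relies on the usual two's complement behaviour when converting unsigned values
to signed types.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    BPF_JEQ = 0x10, BPF_JGT = 0x20, BPF_JGE = 0x30, BPF_JSET = 0x40,
    BPF_JNE = 0x50, BPF_JSGT = 0x60, BPF_JSGE = 0x70,
    BPF_JLT = 0xa0, BPF_JLE = 0xb0, BPF_JSLT = 0xc0, BPF_JSLE = 0xd0,
};

/* Returns true if the conditional jump would add 'off' to the PC.
 * jmp32 selects BPF_JMP32 (compare the low 32 bits) over BPF_JMP. */
bool bpf_jump_taken(uint8_t code, bool jmp32, uint64_t dst, uint64_t src)
{
    int64_t sdst, ssrc;

    if (jmp32) {
        dst = (uint32_t)dst;
        src = (uint32_t)src;
        sdst = (int32_t)dst;   /* reinterpret the low 32 bits as signed */
        ssrc = (int32_t)src;
    } else {
        sdst = (int64_t)dst;
        ssrc = (int64_t)src;
    }

    switch (code) {
    case BPF_JEQ:  return dst == src;
    case BPF_JNE:  return dst != src;
    case BPF_JSET: return (dst & src) != 0;
    case BPF_JGT:  return dst > src;
    case BPF_JGE:  return dst >= src;
    case BPF_JLT:  return dst < src;
    case BPF_JLE:  return dst <= src;
    case BPF_JSGT: return sdst > ssrc;
    case BPF_JSGE: return sdst >= ssrc;
    case BPF_JSLT: return sdst < ssrc;
    case BPF_JSLE: return sdst <= ssrc;
    default:       return false; /* BPF_JA, BPF_CALL, BPF_EXIT not modelled */
    }
}
```

For example, with ``dst`` holding all ones and ``src = 1``, ``BPF_JGT`` is
taken (unsigned comparison) while ``BPF_JSGT`` is not, because the same bit
pattern reads as -1 when signed.
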

Load and store instructions
===========================

For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The size modifier is one of:

=============  =====  =====================
size modifier  value  description
=============  =====  =====================
BPF_W          0x00   word (4 bytes)
BPF_H          0x08   half word (2 bytes)
BPF_B          0x10   byte
BPF_DW         0x18   double word (8 bytes)
=============  =====  =====================

The mode modifier is one of:

=============  =====  ====================================
mode modifier  value  description
=============  =====  ====================================
BPF_IMM        0x00   64-bit immediate instructions
BPF_ABS        0x20   legacy BPF packet access (absolute)
BPF_IND        0x40   legacy BPF packet access (indirect)
BPF_MEM        0x60   regular load and store operations
BPF_ATOMIC     0xc0   atomic operations
=============  =====  ====================================

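As with the arithmetic opcodes, the mode and size parts can be recovered with
bit masks. The following sketch uses hypothetical helper names and maps the
size modifier to an access width in bytes.

```c
#include <stdint.h>

enum { BPF_W = 0x00, BPF_H = 0x08, BPF_B = 0x10, BPF_DW = 0x18 };

/* Mode modifier: the three MSBs of the opcode, kept in place. */
uint8_t bpf_ldst_mode(uint8_t opcode)
{
    return opcode & 0xe0;
}

/* Access width in bytes, derived from the two size bits. */
unsigned bpf_size_bytes(uint8_t opcode)
{
    switch (opcode & 0x18) {
    case BPF_W:
        return 4;
    case BPF_H:
        return 2;
    case BPF_B:
        return 1;
    default: /* BPF_DW */
        return 8;
    }
}
```

For example, ``0x7b`` (``BPF_MEM | BPF_DW | BPF_STX``) has mode ``0x60`` and a
width of 8 bytes, while ``0xc3`` (``BPF_ATOMIC | BPF_W | BPF_STX``) has mode
``0xc0`` and a width of 4 bytes.
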

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst_reg + off) = src_reg

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst_reg + off) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst_reg = *(size *) (src_reg + off)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
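
A minimal sketch of the store and load semantics above, operating on a plain
byte buffer that stands in for memory. The names ``bpf_stx`` and ``bpf_ldx``
and the ``size`` parameter (the access width in bytes) are hypothetical. The
load zero-extends the value into the 64-bit register, which is an assumption
the text above does not state explicitly.

```c
#include <stdint.h>
#include <string.h>

/* *(size *)(dst_reg + off) = src_reg, with 'size' given in bytes. */
void bpf_stx(unsigned char *dst_reg, int16_t off, uint64_t src_reg,
             unsigned size)
{
    uint8_t b = (uint8_t)src_reg;
    uint16_t h = (uint16_t)src_reg;
    uint32_t w = (uint32_t)src_reg;

    switch (size) {
    case 1: memcpy(dst_reg + off, &b, sizeof(b)); break;
    case 2: memcpy(dst_reg + off, &h, sizeof(h)); break;
    case 4: memcpy(dst_reg + off, &w, sizeof(w)); break;
    case 8: memcpy(dst_reg + off, &src_reg, sizeof(src_reg)); break;
    }
}

/* dst_reg = *(size *)(src_reg + off); returns the new dst_reg value. */
uint64_t bpf_ldx(const unsigned char *src_reg, int16_t off, unsigned size)
{
    uint8_t b;
    uint16_t h;
    uint32_t w;
    uint64_t dw = 0;

    switch (size) {
    case 1: memcpy(&b, src_reg + off, sizeof(b)); return b;
    case 2: memcpy(&h, src_reg + off, sizeof(h)); return h;
    case 4: memcpy(&w, src_reg + off, sizeof(w)); return w;
    case 8: memcpy(&dw, src_reg + off, sizeof(dw)); return dw;
    }
    return dw; /* unsupported size: not modelled */
}
```

Storing ``0x1122334455667788`` with a 2-byte width and loading it back with
the same width returns ``0x7788``: only the low bytes of the register are
written.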

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other eBPF programs or means outside of this specification.

All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.

The imm field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the imm field to encode the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========


``BPF_ATOMIC | BPF_W | BPF_STX`` with imm = BPF_ADD means::

  *(u32 *)(dst_reg + off16) += src_reg

``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF_ADD means::

  *(u64 *)(dst_reg + off16) += src_reg

``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``.
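
The atomic add can be approximated in user-space C with C11 atomics. In this
sketch ``addr`` stands for the already computed address ``dst_reg + off``; the
function names are hypothetical, and the C11 memory-ordering defaults are an
assumption of the example rather than something this document specifies.

```c
#include <stdatomic.h>
#include <stdint.h>

/* BPF_ATOMIC | BPF_DW | BPF_STX with imm = BPF_ADD */
void bpf_atomic_add64(_Atomic uint64_t *addr, uint64_t src_reg)
{
    atomic_fetch_add(addr, src_reg);
}

/* BPF_ATOMIC | BPF_W | BPF_STX with imm = BPF_ADD:
 * only the low 32 bits of src_reg are added. */
void bpf_atomic_add32(_Atomic uint32_t *addr, uint64_t src_reg)
{
    atomic_fetch_add(addr, (uint32_t)src_reg);
}
```

Neither helper returns the previous value, matching the simple form of the
operation.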

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src_reg`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
addressed by ``dst_reg + off``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst_reg + off`` with ``R0``. If they match, the value addressed by
``dst_reg + off`` is replaced with ``src_reg``. In either case, the
value that was at ``dst_reg + off`` before the operation is zero-extended
and loaded back to ``R0``.
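
The fetch, exchange and compare-and-exchange behaviour can be sketched with
C11 atomics for the 64-bit case. As before, ``addr`` stands for
``dst_reg + off`` and the function names are hypothetical. Each function
returns the value that the instruction would write back to a register
(``src_reg`` for the first two, ``R0`` for the last).

```c
#include <stdatomic.h>
#include <stdint.h>

/* imm = BPF_ADD | BPF_FETCH: add to memory, return the old memory value
 * (the value src_reg is overwritten with). */
uint64_t bpf_atomic_fetch_add64(_Atomic uint64_t *addr, uint64_t src_reg)
{
    return atomic_fetch_add(addr, src_reg);
}

/* imm = BPF_XCHG: store src_reg, return the old memory value
 * (the new src_reg). */
uint64_t bpf_atomic_xchg64(_Atomic uint64_t *addr, uint64_t src_reg)
{
    return atomic_exchange(addr, src_reg);
}

/* imm = BPF_CMPXCHG: if *addr == r0, store src_reg. Returns the value that
 * was in memory before the operation (the new R0) in either case. */
uint64_t bpf_atomic_cmpxchg64(_Atomic uint64_t *addr, uint64_t r0,
                              uint64_t src_reg)
{
    uint64_t expected = r0;

    /* On failure, 'expected' is updated to the current memory value;
     * on success it already equals the old memory value. */
    atomic_compare_exchange_strong(addr, &expected, src_reg);
    return expected;
}
```

A caller can detect whether the compare-and-exchange succeeded by comparing
the returned value with the ``r0`` it passed in.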

Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.

64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` mode modifier use the wide instruction
encoding for an extra imm64 value.

There is currently only one such instruction.

``BPF_LD | BPF_DW | BPF_IMM`` means::

  dst_reg = imm64

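The text above does not spell out how the 64-bit constant is distributed over
the 128 bits. The sketch below assumes the convention understood to be used by
the Linux implementation: the low 32 bits sit in the immediate field of the
first 64-bit slot, and the high 32 bits sit in the immediate field of the
second slot, whose other fields are zero. Treat this layout, and the
hypothetical names ``bpf_wide_insn``, ``bpf_ld_imm64`` and ``bpf_wide_imm64``,
as assumptions to check against the implementation you target.

```c
#include <stdint.h>

/* Hypothetical container for the two 64-bit slots of a wide instruction.
 * Each slot is viewed as a 64-bit value with the opcode in the 8 LSBs and
 * the 32-bit immediate in the 32 MSBs. */
struct bpf_wide_insn {
    uint64_t first;
    uint64_t second;
};

/* Encode BPF_LD | BPF_DW | BPF_IMM (opcode 0x18): dst_reg = imm64. */
struct bpf_wide_insn bpf_ld_imm64(uint8_t dst_reg, uint64_t imm64)
{
    struct bpf_wide_insn insn;

    insn.first = 0x18 |
                 ((uint64_t)(dst_reg & 0x0f) << 8) |
                 ((imm64 & 0xffffffffULL) << 32);   /* low half (assumed) */
    insn.second = (imm64 >> 32) << 32;              /* high half (assumed) */
    return insn;
}

/* Recover imm64 from the two immediate fields. */
uint64_t bpf_wide_imm64(struct bpf_wide_insn insn)
{
    return (insn.first >> 32) | ((insn.second >> 32) << 32);
}
```

Under these assumptions, loading ``0xdeadbeefcafebabe`` into R1 gives a first
slot of ``0xcafebabe00000118`` and a second slot of ``0xdeadbeef00000000``.
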

Legacy BPF Packet access instructions
-------------------------------------

eBPF has special instructions for access to packet data that have been
carried over from classic BPF to retain the performance of legacy socket
filters running in the eBPF interpreter.

The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
``BPF_IND | <size> | BPF_LD``.

These instructions are used to access packet data and can only be used when
the program context is a pointer to a networking packet. ``BPF_ABS``
accesses packet data at an absolute offset specified by the immediate data
and ``BPF_IND`` accesses packet data at an offset that includes the value of
a register in addition to the immediate data.

These instructions have seven implicit operands:

* Register R6 is an implicit input that must contain a pointer to a
  struct sk_buff.
* Register R0 is an implicit output which contains the data fetched from
  the packet.
* Registers R1-R5 are scratch registers that are clobbered by the
  ``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.

These instructions have an implicit program exit condition as well. If an
eBPF program tries to access data beyond the packet boundary, program
execution is aborted.

``BPF_ABS | BPF_W | BPF_LD`` means::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32))

``BPF_IND | BPF_W | BPF_LD`` means::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
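
The absolute word load, including its implicit bounds check, can be sketched
over a plain byte buffer that stands in for the packet data. The function name
``bpf_ld_abs_w``, its parameters and the ``-1`` return value used to signal
the abort are hypothetical, and only non-negative offsets are modelled. The
bytes are combined most significant first, which gives the same result as
applying ``ntohl()`` to the raw 32-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of BPF_ABS | BPF_W | BPF_LD on a buffer of 'len' bytes.
 * Returns 0 and stores the loaded word in *r0, or returns -1 when the
 * access would cross the packet boundary (where the program would abort). */
int bpf_ld_abs_w(const uint8_t *data, size_t len, uint32_t imm32,
                 uint32_t *r0)
{
    if (imm32 > len || len - imm32 < 4)
        return -1;

    /* Network byte order is big endian: first byte is most significant. */
    *r0 = ((uint32_t)data[imm32] << 24) |
          ((uint32_t)data[imm32 + 1] << 16) |
          ((uint32_t)data[imm32 + 2] << 8) |
          (uint32_t)data[imm32 + 3];
    return 0;
}
```

The ``BPF_IND`` form would follow the same pattern with the offset computed
as ``src_reg + imm32`` before the bounds check.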