====================
eBPF Instruction Set
====================

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

 * R0: return value from function calls, and exit value for eBPF programs
 * R1 - R5: arguments for function calls
 * R6 - R9: callee saved registers that function calls will preserve
 * R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF has two instruction encodings:

 * the basic instruction encoding, which uses 64 bits to encode an instruction
 * the wide instruction encoding, which appends a second 64-bit immediate value
   (imm64) after the basic instruction for a total of 128 bits.

The basic instruction encoding looks as follows:

 =============  =======  ===============  ====================  ============
 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
 =============  =======  ===============  ====================  ============
 immediate      offset   source register  destination register  opcode
 =============  =======  ===============  ====================  ============

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
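
As a rough illustration of the table above, the fields of one basic
instruction can be packed into a single 64-bit value as sketched below.  The
names ``struct insn``, ``insn_pack``, ``insn_dst`` and ``insn_src`` are
hypothetical and exist only for this example, and treating the offset and
immediate as signed is an assumption that the table itself does not state::

  #include <stdint.h>

  /* Illustrative only: one basic instruction held as separate fields. */
  struct insn {
      uint8_t opcode;  /* 8 bits (LSB) */
      uint8_t regs;    /* low nibble: dst register, high nibble: src register */
      int16_t off;     /* 16-bit offset (assumed signed) */
      int32_t imm;     /* 32-bit immediate (MSB, assumed signed) */
  };

  /* Pack the fields into the 64-bit layout shown in the table. */
  static uint64_t insn_pack(struct insn in)
  {
      return (uint64_t)in.opcode |
             (uint64_t)in.regs << 8 |
             (uint64_t)(uint16_t)in.off << 16 |
             (uint64_t)(uint32_t)in.imm << 32;
  }

  /* Extract the 4-bit register numbers from a packed instruction. */
  static uint8_t insn_dst(uint64_t raw) { return (raw >> 8) & 0x0f; }
  static uint8_t insn_src(uint64_t raw) { return (raw >> 12) & 0x0f; }

Working on a packed integer instead of C bit-fields keeps the sketch
independent of compiler-specific bit-field ordering.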

Instruction classes
-------------------

The three LSB bits of the 'opcode' field store the instruction class:

  =========  =====  ===============================
  class      value  description
  =========  =====  ===============================
  BPF_LD     0x00   non-standard load operations
  BPF_LDX    0x01   load into register operations
  BPF_ST     0x02   store from immediate operations
  BPF_STX    0x03   store from register operations
  BPF_ALU    0x04   32-bit arithmetic operations
  BPF_JMP    0x05   64-bit jump operations
  BPF_JMP32  0x06   32-bit jump operations
  BPF_ALU64  0x07   64-bit arithmetic operations
  =========  =====  ===============================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:

  ==============  ======  =================
  4 bits (MSB)    1 bit   3 bits (LSB)
  ==============  ======  =================
  operation code  source  instruction class
  ==============  ======  =================

The 4th bit encodes the source operand:

  ======  =====  ========================================
  source  value  description
  ======  =====  ========================================
  BPF_K   0x00   use 32-bit immediate as source operand
  BPF_X   0x08   use 'src_reg' register as source operand
  ======  =====  ========================================

The four MSB bits store the operation code.
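
The three parts of the opcode can be separated with simple bit masks.  The
sketch below defines its own masking macros so that it stands alone; the
names and masks are modelled on the macros in the Linux UAPI header
``linux/bpf_common.h``, and ``add64_reg`` is a hypothetical name used only
here::

  #include <stdint.h>

  #define BPF_CLASS(code) ((code) & 0x07)  /* three LSBs: instruction class */
  #define BPF_SRC(code)   ((code) & 0x08)  /* 4th bit: BPF_K or BPF_X */
  #define BPF_OP(code)    ((code) & 0xf0)  /* four MSBs: operation code */

  #define BPF_ALU64 0x07
  #define BPF_X     0x08
  #define BPF_ADD   0x00

  /* BPF_ADD | BPF_X | BPF_ALU64 combines to the opcode byte 0x0f. */
  static const uint8_t add64_reg = BPF_ADD | BPF_X | BPF_ALU64;

Applying the three macros to ``add64_reg`` recovers ``BPF_ALU64``, ``BPF_X``
and ``BPF_ADD`` respectively.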


Arithmetic instructions
-----------------------

BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  =================================================
  code      value  description
  ========  =====  =================================================
  BPF_ADD   0x00   dst += src
  BPF_SUB   0x10   dst -= src
  BPF_MUL   0x20   dst \*= src
  BPF_DIV   0x30   dst /= src
  BPF_OR    0x40   dst \|= src
  BPF_AND   0x50   dst &= src
  BPF_LSH   0x60   dst <<= src
  BPF_RSH   0x70   dst >>= src
  BPF_NEG   0x80   dst = -dst
  BPF_MOD   0x90   dst %= src
  BPF_XOR   0xa0   dst ^= src
  BPF_MOV   0xb0   dst = src
  BPF_ARSH  0xc0   sign extending shift right
  BPF_END   0xd0   byte swap operations (see separate section below)
  ========  =====  =================================================

BPF_ADD | BPF_X | BPF_ALU means::

  dst_reg = (u32) dst_reg + (u32) src_reg

BPF_ADD | BPF_X | BPF_ALU64 means::

  dst_reg = dst_reg + src_reg

BPF_XOR | BPF_K | BPF_ALU means::

  dst_reg = (u32) dst_reg ^ (u32) imm32

BPF_XOR | BPF_K | BPF_ALU64 means::

  dst_reg = dst_reg ^ imm32
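
The difference between the two operand widths can be modelled in plain C as
sketched below.  The function names are hypothetical.  Two details are
assumptions based on the behaviour of the Linux kernel interpreter rather
than on the text above: a BPF_ALU result is zero-extended into the upper 32
bits of the destination register, and the 32-bit immediate is sign-extended
before a BPF_ALU64 operation::

  #include <stdint.h>

  /* BPF_ADD | BPF_X | BPF_ALU: 32-bit add, result zero-extended. */
  static uint64_t alu32_add(uint64_t dst, uint64_t src)
  {
      return (uint32_t)((uint32_t)dst + (uint32_t)src);
  }

  /* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit add. */
  static uint64_t alu64_add(uint64_t dst, uint64_t src)
  {
      return dst + src;
  }

  /* BPF_XOR | BPF_K | BPF_ALU64: imm32 is widened to 64 bits first. */
  static uint64_t alu64_xor_imm(uint64_t dst, int32_t imm32)
  {
      return dst ^ (uint64_t)(int64_t)imm32;
  }

For instance, adding 1 to 0xffffffff wraps to 0 in the 32-bit form but
carries into bit 32 in the 64-bit form.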


Byte swap instructions
----------------------

The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
code field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

The 1-bit source operand field in the opcode is used to select what byte
order the operation converts from or to:

  =========  =====  =================================================
  source     value  description
  =========  =====  =================================================
  BPF_TO_LE  0x00   convert between host byte order and little endian
  BPF_TO_BE  0x08   convert between host byte order and big endian
  =========  =====  =================================================

The imm field encodes the width of the swap operations.  The following widths
are supported: 16, 32 and 64.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::

  dst_reg = htole16(dst_reg)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::

  dst_reg = htobe64(dst_reg)

``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and
``BPF_TO_BE`` respectively.
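
A portable model of these instructions, written without the glibc-specific
``htole16()`` family, might look like the sketch below.  All function names
are hypothetical, and discarding the bits above the selected width is an
assumption taken from the Linux kernel interpreter::

  #include <stdint.h>

  /* Reverse the low width / 8 bytes of v; width is 16, 32 or 64. */
  static uint64_t swap_bytes(uint64_t v, int width)
  {
      uint64_t out = 0;

      for (int i = 0; i < width / 8; i++) {
          out = (out << 8) | (v & 0xff);
          v >>= 8;
      }
      return out;
  }

  /* Detect the host byte order at run time. */
  static int host_is_little_endian(void)
  {
      const uint16_t probe = 1;

      return *(const uint8_t *)&probe == 1;
  }

  /* Model BPF_END: to_be selects BPF_TO_BE (non-zero) or BPF_TO_LE (zero). */
  static uint64_t bpf_end(uint64_t dst, int width, int to_be)
  {
      uint64_t mask = width == 64 ? UINT64_MAX : (1ULL << width) - 1;

      if (host_is_little_endian() == (to_be != 0))
          return swap_bytes(dst, width);  /* host and target order differ */
      return dst & mask;                  /* same order: only truncate */
  }

Applying the same conversion twice returns the original value truncated to
the selected width, which is why the ``BPF_FROM_*`` aliases need no separate
encoding.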


Jump instructions
-----------------

BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  =========================  ============
  code      value  description                notes
  ========  =====  =========================  ============
  BPF_JA    0x00   PC += off                  BPF_JMP only
  BPF_JEQ   0x10   PC += off if dst == src
  BPF_JGT   0x20   PC += off if dst > src     unsigned
  BPF_JGE   0x30   PC += off if dst >= src    unsigned
  BPF_JSET  0x40   PC += off if dst & src
  BPF_JNE   0x50   PC += off if dst != src
  BPF_JSGT  0x60   PC += off if dst > src     signed
  BPF_JSGE  0x70   PC += off if dst >= src    signed
  BPF_CALL  0x80   function call
  BPF_EXIT  0x90   function / program return  BPF_JMP only
  BPF_JLT   0xa0   PC += off if dst < src     unsigned
  BPF_JLE   0xb0   PC += off if dst <= src    unsigned
  BPF_JSLT  0xc0   PC += off if dst < src     signed
  BPF_JSLE  0xd0   PC += off if dst <= src    signed
  ========  =====  =========================  ============

The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.
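
The "signed" and "unsigned" notes in the table, and the operand width chosen
by the instruction class, can be illustrated with the sketch below.  The
function ``jump_taken`` is hypothetical and covers only three of the
conditional codes; the remaining ones would follow the same pattern::

  #include <stdint.h>

  #define BPF_JGT  0x20
  #define BPF_JSET 0x40
  #define BPF_JSGT 0x60

  /* Return non-zero if the jump is taken.  is64 selects BPF_JMP (64-bit
   * operands); otherwise only the low 32 bits are compared (BPF_JMP32). */
  static int jump_taken(uint8_t op, int is64, uint64_t dst, uint64_t src)
  {
      uint64_t ud = is64 ? dst : (uint32_t)dst;
      uint64_t us = is64 ? src : (uint32_t)src;
      int64_t sd = is64 ? (int64_t)dst : (int32_t)(uint32_t)dst;
      int64_t ss = is64 ? (int64_t)src : (int32_t)(uint32_t)src;

      switch (op) {
      case BPF_JGT:  return ud > us;          /* unsigned compare */
      case BPF_JSGT: return sd > ss;          /* signed compare */
      case BPF_JSET: return (ud & us) != 0;   /* any common bit set */
      default:       return 0;                /* omitted from this sketch */
      }
  }

With ``dst`` holding all ones and ``src`` holding 1, ``BPF_JGT`` is taken
because ``dst`` is the largest unsigned value, whereas ``BPF_JSGT`` is not
because the same bits represent -1.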


Load and store instructions
===========================

For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
8-bit 'opcode' field is divided as:

  ============  ======  =================
  3 bits (MSB)  2 bits  3 bits (LSB)
  ============  ======  =================
  mode          size    instruction class
  ============  ======  =================

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

The mode modifier is one of:

  =============  =====  ====================================
  mode modifier  value  description
  =============  =====  ====================================
  BPF_IMM        0x00   64-bit immediate instructions
  BPF_ABS        0x20   legacy BPF packet access (absolute)
  BPF_IND        0x40   legacy BPF packet access (indirect)
  BPF_MEM        0x60   regular load and store operations
  BPF_ATOMIC     0xc0   atomic operations
  =============  =====  ====================================


Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst_reg + off) = src_reg

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst_reg + off) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst_reg = *(size *) (src_reg + off)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
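
The effect of the size modifier can be modelled over an ordinary byte buffer
as sketched below.  ``mem_store`` and ``mem_load`` are hypothetical helpers
that take the already computed address (``dst_reg + off`` or
``src_reg + off``), and the zero-extension of loaded values to 64 bits is an
assumption not spelled out in the text above::

  #include <stdint.h>
  #include <string.h>

  #define BPF_W  0x00
  #define BPF_H  0x08
  #define BPF_B  0x10
  #define BPF_DW 0x18

  /* Model BPF_MEM | <size> | BPF_STX: store the low bytes of val. */
  static void mem_store(uint8_t *addr, uint8_t size, uint64_t val)
  {
      uint8_t  b = (uint8_t)val;
      uint16_t h = (uint16_t)val;
      uint32_t w = (uint32_t)val;

      switch (size) {
      case BPF_B:  memcpy(addr, &b, sizeof(b)); break;
      case BPF_H:  memcpy(addr, &h, sizeof(h)); break;
      case BPF_W:  memcpy(addr, &w, sizeof(w)); break;
      default:     memcpy(addr, &val, sizeof(val)); break;  /* BPF_DW */
      }
  }

  /* Model BPF_MEM | <size> | BPF_LDX: load and zero-extend to 64 bits. */
  static uint64_t mem_load(const uint8_t *addr, uint8_t size)
  {
      uint8_t  b;
      uint16_t h;
      uint32_t w;
      uint64_t dw;

      switch (size) {
      case BPF_B:  memcpy(&b, addr, sizeof(b)); return b;
      case BPF_H:  memcpy(&h, addr, sizeof(h)); return h;
      case BPF_W:  memcpy(&w, addr, sizeof(w)); return w;
      default:     memcpy(&dw, addr, sizeof(dw)); return dw;  /* BPF_DW */
      }
  }

Using ``memcpy()`` on fixed-width temporaries avoids unaligned pointer casts
and keeps the sketch independent of the host byte order.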

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other accesses to the same memory region
by other eBPF programs or means outside of this specification.

All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

  * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
  * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
  * 8-bit and 16-bit wide atomic operations are not supported.

The imm field is used to encode the actual atomic operation.
Simple atomic operations reuse a subset of the values defined for
arithmetic operations as the value of the imm field:

  ========  =====  ===========
  imm       value  description
  ========  =====  ===========
  BPF_ADD   0x00   atomic add
  BPF_OR    0x40   atomic or
  BPF_AND   0x50   atomic and
  BPF_XOR   0xa0   atomic xor
  ========  =====  ===========


``BPF_ATOMIC | BPF_W  | BPF_STX`` with imm = BPF_ADD means::

  *(u32 *)(dst_reg + off) += src_reg

``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF_ADD means::

  *(u64 *)(dst_reg + off) += src_reg

``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``.

In addition to the simple atomic operations, there are also a modifier and
two complex atomic operations:

  ===========  ================  ===========================
  imm          value             description
  ===========  ================  ===========================
  BPF_FETCH    0x01              modifier: return old value
  BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
  BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
  ===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src_reg`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
addressed by ``dst_reg + off``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst_reg + off`` with ``R0``. If they match, the value addressed by
``dst_reg + off`` is replaced with ``src_reg``. In either case, the
value that was at ``dst_reg + off`` before the operation is zero-extended
and loaded back to ``R0``.
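
For readers more familiar with C11 atomics, the 64-bit forms of these
operations correspond roughly to the ``<stdatomic.h>`` calls sketched below.
This is an analogy only: the wrapper names are hypothetical, and nothing here
describes how a particular eBPF runtime implements the instructions::

  #include <stdatomic.h>
  #include <stdint.h>

  /* imm = BPF_ADD | BPF_FETCH: the return value is the new src_reg. */
  static uint64_t atomic_fetch_add64(_Atomic uint64_t *mem, uint64_t src)
  {
      return atomic_fetch_add(mem, src);
  }

  /* imm = BPF_XCHG: store src and return the old memory value. */
  static uint64_t atomic_xchg64(_Atomic uint64_t *mem, uint64_t src)
  {
      return atomic_exchange(mem, src);
  }

  /* imm = BPF_CMPXCHG: r0 is the expected value; the return value is the
   * new R0, which is the old memory value whether or not they matched. */
  static uint64_t atomic_cmpxchg64(_Atomic uint64_t *mem, uint64_t r0,
                                   uint64_t src)
  {
      /* On failure the call writes the current memory value into r0; on
       * success r0 already equals the old memory value. */
      atomic_compare_exchange_strong(mem, &r0, src);
      return r0;
  }

A simple ``BPF_ADD`` without ``BPF_FETCH`` would map to the same
``atomic_fetch_add()`` call with its return value ignored.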

Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.

64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` mode modifier use the wide instruction
encoding for an extra imm64 value.

There is currently only one such instruction.

``BPF_LD | BPF_DW | BPF_IMM`` means::

  dst_reg = imm64
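
One way to picture the 128-bit encoding is as two consecutive 64-bit slots.
The sketch below assumes the layout produced by the Linux kernel's
``BPF_LD_IMM64`` helper macro, which is not described in the text above: the
low 32 bits of imm64 go into the immediate field of the first slot, the high
32 bits into the immediate field of the second slot, and every other field of
the second slot is zero.  ``struct insn``, ``emit_ld_imm64`` and
``read_ld_imm64`` are hypothetical names::

  #include <stdint.h>

  struct insn {
      uint8_t opcode;
      uint8_t regs;   /* low nibble: dst register, high nibble: src register */
      int16_t off;
      int32_t imm;
  };

  #define BPF_LD  0x00
  #define BPF_DW  0x18
  #define BPF_IMM 0x00

  /* Emit "dst = imm64" into two consecutive instruction slots. */
  static void emit_ld_imm64(struct insn out[2], uint8_t dst, uint64_t imm64)
  {
      out[0] = (struct insn){
          .opcode = BPF_LD | BPF_DW | BPF_IMM,      /* 0x18 */
          .regs = dst & 0x0f,
          .imm = (int32_t)(uint32_t)imm64,          /* low 32 bits */
      };
      out[1] = (struct insn){
          .imm = (int32_t)(uint32_t)(imm64 >> 32),  /* high 32 bits */
      };
  }

  /* Reassemble imm64 from the two slots. */
  static uint64_t read_ld_imm64(const struct insn in[2])
  {
      return (uint64_t)(uint32_t)in[0].imm |
             (uint64_t)(uint32_t)in[1].imm << 32;
  }

Any code that walks a program one instruction at a time has to skip the
second slot after decoding this opcode.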


Legacy BPF Packet access instructions
-------------------------------------

eBPF has special instructions for access to packet data that have been
carried over from classic BPF to retain the performance of legacy socket
filters running in the eBPF interpreter.

The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
``BPF_IND | <size> | BPF_LD``.

These instructions are used to access packet data and can only be used when
the program context is a pointer to a networking packet.  ``BPF_ABS``
accesses packet data at an absolute offset specified by the immediate data
and ``BPF_IND`` accesses packet data at an offset that includes the value of
a register in addition to the immediate data.

These instructions have seven implicit operands:

 * Register R6 is an implicit input that must contain a pointer to a
   struct sk_buff.
 * Register R0 is an implicit output which contains the data fetched from
   the packet.
 * Registers R1-R5 are scratch registers that are clobbered after a call to
   ``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.

These instructions have an implicit program exit condition as well. When an
eBPF program tries to access data beyond the packet boundary, program
execution is aborted.

``BPF_ABS | BPF_W | BPF_LD`` means::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32))

``BPF_IND | BPF_W | BPF_LD`` means::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
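
The combination of a network-order load and the implicit exit condition can
be modelled as sketched below.  This is a simplified illustration under
stated assumptions: the packet is a flat byte buffer rather than a
``struct sk_buff``, only non-negative offsets are handled, and a return value
of -1 stands in for the program being aborted.  ``ld_abs_w`` and ``ld_ind_w``
are hypothetical names::

  #include <stddef.h>
  #include <stdint.h>

  /* Model BPF_ABS | BPF_W | BPF_LD.  Returns 0 and writes *r0 on success,
   * or -1 if the 4-byte access would cross the packet boundary. */
  static int ld_abs_w(const uint8_t *data, size_t len, uint32_t imm32,
                      uint32_t *r0)
  {
      if (len < 4 || imm32 > len - 4)
          return -1;
      /* Assemble the bytes in network order, as ntohl() would. */
      *r0 = (uint32_t)data[imm32] << 24 |
            (uint32_t)data[imm32 + 1] << 16 |
            (uint32_t)data[imm32 + 2] << 8 |
            (uint32_t)data[imm32 + 3];
      return 0;
  }

  /* Model BPF_IND | BPF_W | BPF_LD: the offset is src_reg + imm32. */
  static int ld_ind_w(const uint8_t *data, size_t len, uint32_t src_reg,
                      uint32_t imm32, uint32_t *r0)
  {
      uint64_t off = (uint64_t)src_reg + imm32;

      if (off > UINT32_MAX)
          return -1;
      return ld_abs_w(data, len, (uint32_t)off, r0);
  }

Building the result byte by byte gives the same value on little-endian and
big-endian hosts.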