.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

The flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc.).

The BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is accessible: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` points to a ``struct bpf_flow_keys`` that contains the flow
dissector input and output arguments.

The inputs are:

* ``nhoff`` - initial offset of the networking header
* ``thoff`` - initial offset of the transport header, initialized to ``nhoff``
* ``n_proto`` - L3 protocol type, parsed out of the L2 header
* ``flags`` - optional flags

The flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff``, ``thoff`` and
``n_proto`` should also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.

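The input/output contract above can be illustrated with a minimal userspace
sketch. It is not a loadable BPF program: ``struct sketch_flow_keys``,
``sketch_dissect`` and the ``SKETCH_*`` constants are hypothetical stand-ins
that carry only the fields discussed here, and ``n_proto`` is kept in host
byte order for simplicity. The sketch handles IPv4 only and moves ``thoff``
past the IPv4 header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the sketch; the real definitions live in the
 * kernel UAPI headers and carry more fields. */
enum { SKETCH_OK = 0, SKETCH_DROP = 2 };  /* mirror BPF_OK / BPF_DROP */

#define SKETCH_ETH_P_IP 0x0800

struct sketch_flow_keys {
    uint16_t nhoff;    /* input: offset of the network header */
    uint16_t thoff;    /* input/output: offset of the transport header */
    uint16_t n_proto;  /* input: L3 protocol (host byte order here) */
    uint8_t  ip_proto; /* output: L4 protocol number */
};

/* Dissect the IPv4 header found at data + keys->nhoff. */
static int sketch_dissect(const uint8_t *data, const uint8_t *data_end,
                          struct sketch_flow_keys *keys)
{
    size_t len = (size_t)(data_end - data);
    size_t off = keys->nhoff;
    size_t ihl;

    if (keys->n_proto != SKETCH_ETH_P_IP)
        return SKETCH_DROP;            /* only IPv4 in this sketch */
    if (off + 20 > len)
        return SKETCH_DROP;            /* bounds check before any read */

    ihl = (size_t)(data[off] & 0x0f) * 4;  /* header length in bytes */
    if (ihl < 20 || off + ihl > len)
        return SKETCH_DROP;

    keys->ip_proto = data[off + 9];    /* IPv4 protocol field */
    keys->thoff = (uint16_t)(off + ihl);
    return SKETCH_OK;
}
```

Every read is preceded by a check against ``data_end``, which is the same
discipline the verifier enforces on a real program.
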
__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

With VLAN, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double-tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case VLAN information has been processed before the flow dissector
and the BPF flow dissector is not required to handle it.


The takeaway here is as follows: the BPF flow dissector program can be called
with the optional VLAN header present and should gracefully handle both cases:
when a single or double VLAN tag is present and when none is. The same program
is called in both cases and has to be written carefully to handle them.


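One way to cover both entry states is to consume VLAN tags only while
``n_proto`` still names a VLAN TPID. The userspace sketch below assumes the
pre-VLAN layout shown above, where ``nhoff`` points at the TCI and the inner
EtherType follows it. ``struct sketch_vlan_state`` and ``sketch_skip_vlan``
are hypothetical names, values are kept in host byte order, and the sketch
accepts any combination of up to two tags, which may be more permissive than
a real program wants.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_8021Q  0x8100
#define SKETCH_ETH_P_8021AD 0x88A8

/* Hypothetical subset of the dissector state used by this sketch. */
struct sketch_vlan_state {
    uint16_t nhoff;   /* offset of the next header to parse */
    uint16_t n_proto; /* protocol found at nhoff (host byte order here) */
};

static int sketch_is_vlan(uint16_t proto)
{
    return proto == SKETCH_ETH_P_8021Q || proto == SKETCH_ETH_P_8021AD;
}

/* Consume up to two VLAN tags. Returns 0 on success and -1 when the packet
 * is truncated or carries more than two tags. */
static int sketch_skip_vlan(const uint8_t *data, size_t len,
                            struct sketch_vlan_state *st)
{
    int depth;

    for (depth = 0; depth < 2; depth++) {
        if (!sketch_is_vlan(st->n_proto))
            return 0;                 /* no (more) tags: nhoff is at L3 */
        if ((size_t)st->nhoff + 4 > len)
            return -1;                /* TCI + EtherType do not fit */
        /* 2 bytes of TCI, then the 2-byte encapsulated EtherType */
        st->n_proto = (uint16_t)((data[st->nhoff + 2] << 8) |
                                 data[st->nhoff + 3]);
        st->nhoff += 4;
    }
    return sketch_is_vlan(st->n_proto) ? -1 : 0;  /* third tag: give up */
}
```

When called in the post-VLAN or VLAN-less state, ``n_proto`` is already an L3
protocol, so the function returns immediately and leaves ``nhoff`` untouched.
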
Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector
  to stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.


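As an illustration of the first flag, the userspace sketch below decides
whether parsing should continue past an IPv4 header. The ``SKETCH_*``
constants and ``sketch_continue_after_ipv4`` are hypothetical names; the flag
value is believed to match ``include/uapi/linux/bpf.h`` but should be taken
from that header in real code. ``frag_off`` is assumed to be already
converted to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy for the sketch; use the UAPI definition in real code. */
#define SKETCH_F_PARSE_1ST_FRAG (1U << 0)

#define SKETCH_IP_MF     0x2000  /* "more fragments" bit */
#define SKETCH_IP_OFFSET 0x1fff  /* fragment offset mask */

/* Return true when the dissector should go on to the L4 header. */
static bool sketch_continue_after_ipv4(uint16_t frag_off, uint32_t flags)
{
    if (!(frag_off & (SKETCH_IP_MF | SKETCH_IP_OFFSET)))
        return true;   /* not fragmented */
    if (frag_off & SKETCH_IP_OFFSET)
        return false;  /* later fragment: no L4 header in this packet */
    /* first fragment: continue only when the caller asked for it */
    return (flags & SKETCH_F_PARSE_1ST_FRAG) != 0;
}
```

The other two flags follow the same pattern: test the bit at the point where
the flow label or the encapsulated header is reached and return early when it
is set.
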
Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.

The reference implementation is organized as follows:

* ``jmp_table`` map that contains sub-programs for each supported L3 protocol
* ``_dissect`` routine - entry point; it parses the input ``n_proto`` and
  does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).


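The dispatch idea can be modelled in plain userspace C. This is only an
analogy and not how the reference implementation is written: a function
pointer array stands in for the ``jmp_table`` map, each handler returns the
slot to run next instead of calling ``bpf_tail_call``, and the driver loop
stands in for the kernel chaining those tail calls. All ``sketch_*`` names
and the ``SKETCH_MAX_HOPS`` bound are hypothetical.

```c
#define SKETCH_MAX_HOPS 8  /* hypothetical bound on chained handlers */

enum { SKETCH_SLOT_IPV4, SKETCH_SLOT_VLAN, SKETCH_SLOT_COUNT };
enum { SKETCH_DONE = -1 };

/* Toy packet model: counts of headers still to be unwrapped. */
struct sketch_pkt {
    int vlan_tags;    /* VLAN tags still in front of L3 */
    int ipip_levels;  /* IPv4-in-IPv4 levels still to unwrap */
};

typedef int (*sketch_handler)(struct sketch_pkt *);

/* Each handler consumes one header and names the next slot. */
static int sketch_vlan(struct sketch_pkt *p)
{
    p->vlan_tags--;
    return p->vlan_tags > 0 ? SKETCH_SLOT_VLAN : SKETCH_SLOT_IPV4;
}

static int sketch_ipv4(struct sketch_pkt *p)
{
    if (p->ipip_levels == 0)
        return SKETCH_DONE;
    p->ipip_levels--;
    return SKETCH_SLOT_IPV4;  /* re-enter for the inner header */
}

static const sketch_handler sketch_jmp_table[SKETCH_SLOT_COUNT] = {
    [SKETCH_SLOT_IPV4] = sketch_ipv4,
    [SKETCH_SLOT_VLAN] = sketch_vlan,
};

/* Run handlers until one reports completion or the hop bound is hit.
 * Returns 0 on success and -1 on a bad slot or too many hops. */
static int sketch_run(struct sketch_pkt *p, int slot)
{
    int hop;

    for (hop = 0; hop < SKETCH_MAX_HOPS; hop++) {
        if (slot == SKETCH_DONE)
            return 0;
        if (slot < 0 || slot >= SKETCH_SLOT_COUNT)
            return -1;
        slot = sketch_jmp_table[slot](p);
    }
    return slot == SKETCH_DONE ? 0 : -1;
}
```

Re-entering the same slot for an inner header is what replaces a backward
jump, and the hop bound keeps deeply nested packets from running unbounded.
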
Current Limitations
===================

The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. A notable example is single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to ``struct
bpf_flow_keys`` for the set of information that can currently be exported
from the BPF context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.