.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

The flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc.).

The BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is a ``struct bpf_flow_keys`` and contains the flow dissector
input and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to nhoff
  * ``n_proto`` - L3 protocol type, parsed out of L2 header
  * ``flags`` - optional flags

The flow dissector BPF program should fill out the rest of the
``struct bpf_flow_keys`` fields. The input arguments ``nhoff/thoff/n_proto``
should also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.
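
To make this input/output contract concrete, here is a minimal sketch in
plain userspace C (not a BPF program). It assumes a hypothetical
``struct flow_keys_model`` that mirrors a few ``struct bpf_flow_keys``
fields, and hypothetical ``MODEL_OK``/``MODEL_DROP`` stand-ins for
``BPF_OK``/``BPF_DROP``. The IPv4 handler reads the header at
``data + nhoff``, fills in output fields and moves ``thoff`` past the
header:

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical stand-ins for BPF_OK / BPF_DROP. */
  enum model_ret { MODEL_OK, MODEL_DROP };

  /* Hypothetical, simplified mirror of some struct bpf_flow_keys fields.
   * Values are kept in host byte order for readability; in the real
   * structure n_proto and the IPv4 addresses are big-endian. */
  struct flow_keys_model {
      uint16_t nhoff;      /* in/out: offset of the network header */
      uint16_t thoff;      /* in/out: offset of the transport header */
      uint16_t n_proto;    /* in/out: L3 protocol (ETH_P_*) */
      uint16_t addr_proto; /* out: protocol of the addresses below */
      uint8_t  ip_proto;   /* out: L4 protocol (IPPROTO_*) */
      uint32_t ipv4_src;   /* out */
      uint32_t ipv4_dst;   /* out */
  };

  static uint32_t load_be32(const uint8_t *p)
  {
      return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
             ((uint32_t)p[2] << 8) | (uint32_t)p[3];
  }

  /* Dissect an IPv4 header at data + keys->nhoff. Every read is bounds
   * checked against data_end first, as the verifier would demand of a
   * BPF program. */
  static enum model_ret dissect_ipv4(const uint8_t *data,
                                     const uint8_t *data_end,
                                     struct flow_keys_model *keys)
  {
      size_t avail = (size_t)(data_end - data);
      const uint8_t *iph;
      size_t ihl;

      if ((size_t)keys->nhoff > avail || avail - keys->nhoff < 20)
          return MODEL_DROP;            /* truncated header */
      iph = data + keys->nhoff;
      if ((iph[0] >> 4) != 4)
          return MODEL_DROP;            /* not IPv4 */
      ihl = (size_t)(iph[0] & 0x0f) * 4;
      if (ihl < 20 || avail - keys->nhoff < ihl)
          return MODEL_DROP;            /* bad or truncated options */

      keys->addr_proto = 0x0800;        /* ETH_P_IP */
      keys->ip_proto = iph[9];
      keys->ipv4_src = load_be32(iph + 12);
      keys->ipv4_dst = load_be32(iph + 16);
      keys->thoff = (uint16_t)(keys->nhoff + ihl); /* adjust the input */
      return MODEL_OK;
  }

A complete dissector would go on to parse the transport header at the
updated ``thoff``; this sketch stops after the network header.
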

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  /* skb->data + flow_keys->nhoff points to the first byte of L3_HEADER */
  flow_keys->thoff = flow_keys->nhoff;
  flow_keys->n_proto = ETHER_TYPE;  /* big-endian (__be16) */

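
This initial state is set up by the kernel before the program runs. As a
purely illustrative model, the sketch below assumes an untagged frame whose
first byte is DMAC, exactly as drawn above, and derives the three input
fields from it. ``struct flow_keys_in`` and ``model_initial_state()`` are
hypothetical. Only ``data + nhoff`` is specified to point at L3_HEADER, so a
real program should use ``nhoff`` rather than assume it equals 14:

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define MODEL_ETH_HLEN 14  /* DMAC (6) + SMAC (6) + ETHER_TYPE (2) */

  /* Hypothetical mirror of the input fields; n_proto is kept in host byte
   * order here, while the real bpf_flow_keys field is big-endian. */
  struct flow_keys_in {
      uint16_t nhoff;
      uint16_t thoff;
      uint16_t n_proto;
  };

  /* Model the VLAN-less initial state for an untagged frame starting at
   * data. Returns 0 on success, -1 if the frame is shorter than an
   * Ethernet header. */
  static int model_initial_state(const uint8_t *data, size_t len,
                                 struct flow_keys_in *keys)
  {
      if (len < MODEL_ETH_HLEN)
          return -1;
      keys->nhoff = MODEL_ETH_HLEN;    /* first byte of L3_HEADER */
      keys->thoff = keys->nhoff;       /* thoff starts equal to nhoff */
      /* ETHER_TYPE is the big-endian 16-bit field at offsets 12..13. */
      keys->n_proto = (uint16_t)((data[12] << 8) | data[13]);
      return 0;
  }
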
In the VLAN case, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  /* skb->data + flow_keys->nhoff points to the first byte of TCI */
  flow_keys->thoff = flow_keys->nhoff;
  flow_keys->n_proto = TPID;  /* big-endian (__be16) */

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double tagged packets.
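
In the pre-VLAN state, each tag seen from ``nhoff`` is a 2-byte TCI followed
by the 2-byte protocol of whatever comes next, so skipping a tag means
reading the next protocol at ``nhoff + 2`` and advancing the offsets by 4.
The userspace sketch below assumes that layout; ``skip_vlan_tags()``,
``struct vlan_state`` and the ``MODEL_*`` constants are hypothetical names
(the protocol values are those of ``ETH_P_8021Q`` and ``ETH_P_8021AD``). It
uses a bounded loop for brevity, where a BPF program might instead unroll
the two steps or use a tail call:

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define MODEL_ETH_P_8021Q   0x8100  /* value of ETH_P_8021Q */
  #define MODEL_ETH_P_8021AD  0x88A8  /* value of ETH_P_8021AD */
  #define MODEL_VLAN_HLEN     4       /* TCI (2) + next protocol (2) */
  #define MODEL_MAX_VLAN_TAGS 2       /* single or double tagging */

  /* Hypothetical subset of the flow keys, host byte order. */
  struct vlan_state {
      uint16_t nhoff;
      uint16_t thoff;
      uint16_t n_proto;
  };

  static uint16_t load_be16(const uint8_t *p)
  {
      return (uint16_t)((p[0] << 8) | p[1]);
  }

  /* Skip up to two VLAN tags. Returns 0 on success and -1 if a tag is
   * truncated or more than two tags are present. */
  static int skip_vlan_tags(const uint8_t *data, const uint8_t *data_end,
                            struct vlan_state *st)
  {
      size_t avail = (size_t)(data_end - data);
      int tags = 0;

      while (st->n_proto == MODEL_ETH_P_8021Q ||
             st->n_proto == MODEL_ETH_P_8021AD) {
          if (++tags > MODEL_MAX_VLAN_TAGS)
              return -1;
          if ((size_t)st->nhoff > avail ||
              avail - st->nhoff < MODEL_VLAN_HLEN)
              return -1;
          /* data + nhoff points at the TCI; the next protocol follows it. */
          st->n_proto = load_be16(data + st->nhoff + 2);
          st->nhoff += MODEL_VLAN_HLEN;
          st->thoff += MODEL_VLAN_HLEN;
      }
      return 0;
  }

When ``n_proto`` is not a VLAN TPID the loop body never runs, so the same
call is harmless if the VLAN tags were already consumed. This sketch does not
enforce that an 802.1AD outer tag is followed by an 802.1Q inner tag; a
stricter program may want to.
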

Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  /* skb->data + flow_keys->nhoff points to the first byte of L3_HEADER */
  flow_keys->thoff = flow_keys->nhoff;
  flow_keys->n_proto = ETHER_TYPE;  /* big-endian (__be16) */

In this case, the VLAN information has already been processed before the
flow dissector runs, and the BPF flow dissector is not required to handle it.


The takeaway here is as follows: a BPF flow dissector program can be called
with or without VLAN tags still to be parsed, and it should gracefully handle
both cases: when a single or double VLAN tag is present and when none is.
The same program is called in both states, so it has to tell them apart (in
the pre-VLAN state, ``n_proto`` holds the TPID) rather than assume either
layout.


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector
  to stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.
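
As a sketch of how a program might honor these flags, the helpers below
model the decisions in plain userspace C. The ``MODEL_F_*`` values are
assumed to match the ``BPF_FLOW_DISSECTOR_F_*`` bits in ``<linux/bpf.h>``
(check your headers), and ``MODEL_IP_MF``/``MODEL_IP_OFFSET`` mirror the
IPv4 ``frag_off`` masks. The helper names are hypothetical, and the
flow-label helper assumes, for illustration, that only a non-zero label
stops parsing:

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Assumed to match the BPF_FLOW_DISSECTOR_F_* bits in <linux/bpf.h>. */
  #define MODEL_F_PARSE_1ST_FRAG     (1U << 0)
  #define MODEL_F_STOP_AT_FLOW_LABEL (1U << 1)
  #define MODEL_F_STOP_AT_ENCAP      (1U << 2)

  #define MODEL_IP_MF     0x2000  /* IPv4 "more fragments" bit */
  #define MODEL_IP_OFFSET 0x1FFF  /* IPv4 fragment offset mask */

  /* IPv4: decide whether to stop after the network header. frag_off is
   * in host byte order; *is_frag and *is_first_frag are outputs. */
  static bool ipv4_should_stop(uint16_t frag_off, uint32_t flags,
                               bool *is_frag, bool *is_first_frag)
  {
      *is_frag = false;
      *is_first_frag = false;

      if (!(frag_off & (MODEL_IP_MF | MODEL_IP_OFFSET)))
          return false;   /* not fragmented: continue to L4 */

      *is_frag = true;
      if (frag_off & MODEL_IP_OFFSET)
          return true;    /* later fragment: it carries no L4 header */

      *is_first_frag = true;
      /* First fragment: parse on only if the caller asked for it. */
      return !(flags & MODEL_F_PARSE_1ST_FRAG);
  }

  /* IPv6: stop at a (non-zero) flow label when requested. */
  static bool ipv6_should_stop_at_label(uint32_t flow_label, uint32_t flags)
  {
      return flow_label != 0 && (flags & MODEL_F_STOP_AT_FLOW_LABEL) != 0;
  }

  /* Encapsulation (e.g. IP-in-IP): stop instead of descending, if asked. */
  static bool should_stop_at_encap(bool is_encap, uint32_t flags)
  {
      return is_encap && (flags & MODEL_F_STOP_AT_ENCAP) != 0;
  }

For example, the ``eth_get_headlen`` use case corresponds to calling
``ipv4_should_stop()`` with ``MODEL_F_PARSE_1ST_FRAG`` set, so a first
fragment is still parsed down to its transport header.
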


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it parses the input ``n_proto`` and
    does a ``bpf_tail_call`` to the appropriate L3 handler

Since BPF did not support looping (or any backward jumps) at the time of
writing, ``jmp_table`` is used instead to handle multiple levels of
encapsulation (and IPv6 options).
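
The dispatch structure can be modeled in userspace C with an array of
function pointers standing in for ``jmp_table`` and a small driver loop
standing in for the kernel's tail call mechanism; the handlers themselves
never loop or jump backwards. Everything here is hypothetical: the slot
names, the ``struct model_ctx`` layout (which labels every nested header
with an ``ETH_P_*`` value for simplicity) and the call bound, which is an
assumed value rather than the kernel's actual limit:

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical jmp_table slots, one per supported header type. */
  enum { PROG_IP, PROG_IPV6, PROG_VLAN, PROG_MAX };

  /* Handler results: >= 0 is the slot to "tail call" into; negative
   * values are final verdicts (stand-ins for BPF_OK / BPF_DROP). */
  enum { RET_OK = -1, RET_DROP = -2 };

  /* Assumed bound on chained calls; the kernel enforces its own limit. */
  #define MODEL_TAIL_CALL_LIMIT 32

  struct model_ctx {
      const uint16_t *layers; /* ETH_P_* of each nested header, outer first */
      size_t nlayers;
      size_t depth;           /* index of the header being parsed */
      int calls;              /* number of handlers run so far */
  };

  /* Map a protocol to its slot, as the entry routine does for n_proto. */
  static int proto_to_slot(uint16_t proto)
  {
      switch (proto) {
      case 0x0800: return PROG_IP;      /* ETH_P_IP */
      case 0x86DD: return PROG_IPV6;    /* ETH_P_IPV6 */
      case 0x8100:                      /* ETH_P_8021Q */
      case 0x88A8: return PROG_VLAN;    /* ETH_P_8021AD */
      default:     return RET_DROP;     /* unsupported protocol */
      }
  }

  /* Consume the current header and pick the next handler, if any. */
  static int next_step(struct model_ctx *ctx)
  {
      ctx->depth++;
      if (ctx->depth == ctx->nlayers)
          return RET_OK;                /* innermost header reached */
      return proto_to_slot(ctx->layers[ctx->depth]);
  }

  /* Real handlers would parse and validate their header here. */
  static int handle_ip(struct model_ctx *ctx)   { return next_step(ctx); }
  static int handle_ipv6(struct model_ctx *ctx) { return next_step(ctx); }
  static int handle_vlan(struct model_ctx *ctx) { return next_step(ctx); }

  static int (*const jmp_table[PROG_MAX])(struct model_ctx *) = {
      [PROG_IP]   = handle_ip,
      [PROG_IPV6] = handle_ipv6,
      [PROG_VLAN] = handle_vlan,
  };

  /* Entry point: dispatch on the outermost protocol, then follow the
   * chain of "tail calls". In this model, exceeding the bound drops. */
  static int dissect(struct model_ctx *ctx)
  {
      int ret;

      if (ctx->nlayers == 0)
          return RET_DROP;
      ctx->depth = 0;
      ctx->calls = 0;
      ret = proto_to_slot(ctx->layers[0]);
      while (ret >= 0) {
          if (++ctx->calls > MODEL_TAIL_CALL_LIMIT)
              return RET_DROP;
          ret = jmp_table[ret](ctx);
      }
      return ret;
  }

Each extra level of encapsulation costs one more table dispatch instead of a
backward jump, which is why the nesting depth a BPF dissector can handle this
way is bounded by the kernel's tail call limit.
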


Current Limitations
===================

The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. A notable example is single
VLAN (802.1Q) and double VLAN (802.1AD) tags. Please refer to
``struct bpf_flow_keys`` for the set of information that can currently be
exported from the BPF context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.