.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
programmability into the socket lookup performed by the transport layer when a
packet is to be delivered locally.

When invoked, a BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

The BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with the ``bind()`` socket call is impractical,
such as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to the wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
   case.

Such setups would require creating and ``bind()``'ing one socket for each
IP address/port in the range, leading to increased resource consumption and
potential latency spikes during socket lookup.

Attachment
==========

A BPF sk_lookup program can be attached to a network namespace with the
``bpf(BPF_LINK_CREATE, ...)`` syscall, using the ``BPF_SK_LOOKUP`` attach type
and a netns FD as the attachment ``target_fd``.
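As a rough illustration of the attachment step, the sketch below fills in
``union bpf_attr`` for ``BPF_LINK_CREATE`` and issues the raw ``bpf()`` syscall.
The helper names ``sk_lookup_link_attr()`` and ``sk_lookup_attach()`` are
hypothetical. The sketch assumes that ``prog_fd`` refers to an already loaded
``BPF_PROG_TYPE_SK_LOOKUP`` program and that ``netns_fd`` was obtained by
opening a network namespace file such as ``/proc/self/ns/net``.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: prepare BPF_LINK_CREATE attributes for attaching
 * an sk_lookup program to a network namespace.
 */
void sk_lookup_link_attr(union bpf_attr *attr, int prog_fd, int netns_fd)
{
	memset(attr, 0, sizeof(*attr));
	attr->link_create.prog_fd = prog_fd;       /* loaded sk_lookup program */
	attr->link_create.target_fd = netns_fd;    /* netns FD as the target */
	attr->link_create.attach_type = BPF_SK_LOOKUP;
}

/*
 * Hypothetical helper: returns a BPF link FD on success, or -1 with errno
 * set on failure.
 */
int sk_lookup_attach(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	sk_lookup_link_attr(&attr, prog_fd, netns_fd);
	return (int)syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

In practice, a library wrapper such as libbpf's ``bpf_program__attach_netns()``
is likely a more convenient way to create the same link.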

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return either the ``SK_PASS`` or the ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a
socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes
a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program terminates
with the ``SK_PASS`` code.

When multiple programs are attached, the end result is determined from the
return codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.
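The four rules above can be modelled in plain user-space C. The following is
only a simulation of the rules as stated here, not the kernel implementation.
All names in it (``combine_results()``, ``struct prog_result``, ``NO_SOCKET``
and the enums) are hypothetical, and sockets are represented by plain integer
identifiers for the sake of the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_SOCKET (-1) /* hypothetical marker: program made no selection */

/* Hypothetical stand-ins; the values mirror SK_DROP (0) and SK_PASS (1). */
enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

enum lookup_outcome {
	LOOKUP_USE_SELECTED, /* rules 1 and 2: deliver to the selected socket */
	LOOKUP_FAIL,         /* rule 3: socket lookup fails */
	LOOKUP_CONTINUE,     /* rule 4: fall through to the regular lookup */
};

/* Result of one attached program, listed in attachment order. */
struct prog_result {
	enum verdict verdict;
	int selected; /* socket id, or NO_SOCKET */
};

enum lookup_outcome combine_results(const struct prog_result *res, size_t n,
				    int *selected)
{
	bool any_drop = false;
	int sk = NO_SOCKET;
	size_t i;

	for (i = 0; i < n; i++) {
		if (res[i].verdict == VERDICT_DROP) {
			/* A selection only counts together with SK_PASS. */
			any_drop = true;
			continue;
		}
		if (res[i].selected != NO_SOCKET)
			sk = res[i].selected; /* the last selection wins */
	}

	if (sk != NO_SOCKET) {
		*selected = sk;
		return LOOKUP_USE_SELECTED;
	}
	return any_drop ? LOOKUP_FAIL : LOOKUP_CONTINUE;
}
```

For instance, with three programs returning (pass, socket 7), (drop, none) and
(pass, socket 9), this model reports that socket 9 is used, because a valid
selection takes precedence over a drop verdict from another program.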

API
===

In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
receives information about the packet that triggered the socket lookup, namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user API
header, and to the `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()``, for details.
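To give a feel for how the context fields might be inspected, the sketch below
checks whether a lookup targets TCP port 80 on any address in 192.0.2.0/24. It
is written as ordinary user-space C against the ``struct bpf_sk_lookup``
declaration from ``linux/bpf.h`` so that it can be compiled and exercised
outside the kernel. The function name ``wants_packet()`` and the port/prefix
policy are assumptions made for illustration. It relies on the byte-order notes
in the header: ``local_ip4`` is in network byte order, while ``local_port`` is
in host byte order.

```c
#include <arpa/inet.h>
#include <linux/bpf.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical policy check: TCP over IPv4, destination address within
 * 192.0.2.0/24, destination port 80.
 */
bool wants_packet(const struct bpf_sk_lookup *ctx)
{
	const __u32 net = htonl(0xc0000200);  /* 192.0.2.0, network byte order */
	const __u32 mask = htonl(0xffffff00); /* /24 prefix mask */

	return ctx->family == AF_INET &&
	       ctx->protocol == IPPROTO_TCP &&
	       (ctx->local_ip4 & mask) == net &&
	       ctx->local_port == 80; /* local_port is in host byte order */
}
```

Inside an actual BPF program the same comparison would presumably use the BPF
byte-order macros instead of ``htonl()``, and a match would typically be
followed by a socket map lookup and a call to ``bpf_sk_assign()``.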

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.