.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces programmability
into the socket lookup performed by the transport layer when a packet is to be
delivered locally.

When invoked, a BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

The BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with the ``bind()`` socket call is impractical, such
as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to a wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
   case.

Such setups would require creating and ``bind()``'ing one socket to each
IP address/port pair in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

A BPF sk_lookup program can be attached to a network namespace with the
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a
netns FD as the attachment ``target_fd``.

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.
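
As a rough illustration of the attachment step described above, the following
user-space sketch fills ``union bpf_attr`` for ``BPF_LINK_CREATE`` and issues
the raw syscall. The function name ``sk_lookup_link_create`` is made up for
this example, and the sketch assumes UAPI headers recent enough to define
``BPF_SK_LOOKUP``. ``prog_fd`` is expected to refer to an already loaded
sk_lookup program and ``netns_fd`` to an open network namespace file such as
``/proc/self/ns/net``.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: attach a loaded sk_lookup program to a network
 * namespace. Returns a link FD on success, or -1 with errno set.
 */
static int sk_lookup_link_create(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	/* Unused fields of bpf_attr must be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;       /* loaded sk_lookup program */
	attr.link_create.target_fd = netns_fd;    /* netns FD as target_fd */
	attr.link_create.attach_type = BPF_SK_LOOKUP;

	return (int)syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

On success the returned link FD keeps the program attached; closing the last
reference to it should detach the program unless the link has been pinned.
Users of libbpf may prefer its ``bpf_program__attach_netns()`` wrapper over the
raw syscall.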

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return either the ``SK_PASS`` or the ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to the regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a socket
in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes a
``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with the ``SK_PASS`` code.

When multiple programs are attached, the end result is determined from the return
codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.
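
The four rules above can be modelled in plain C. This is a user-space
simulation for illustration only, not the kernel's code: the types
``prog_result`` and ``lookup_outcome`` and the function ``combine_verdicts``
are hypothetical, and sockets are represented by non-zero integer IDs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values mirror SK_DROP (0) and SK_PASS (1) from linux/bpf.h. */
enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

/* Outcome of one attached program (hypothetical model). */
struct prog_result {
	enum verdict verdict;
	int selected_sk;	/* made-up socket ID, 0 means "no selection" */
};

enum lookup_outcome {
	LOOKUP_USE_SELECTED,	/* rules 1 and 2 */
	LOOKUP_FAIL,		/* rule 3 */
	LOOKUP_CONTINUE,	/* rule 4 */
};

/* Combine per-program results, given in attachment order. */
static enum lookup_outcome combine_verdicts(const struct prog_result *res,
					    size_t n, int *sk_out)
{
	bool any_drop = false;
	int selected = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (res[i].verdict == VERDICT_DROP) {
			/* A selection only counts together with SK_PASS. */
			any_drop = true;
			continue;
		}
		if (res[i].selected_sk)
			selected = res[i].selected_sk;	/* last selection wins */
	}

	if (selected) {
		*sk_out = selected;
		return LOOKUP_USE_SELECTED;
	}
	return any_drop ? LOOKUP_FAIL : LOOKUP_CONTINUE;
}
```

Note that in this model a selection made by a program that passes takes
precedence over a drop verdict from another program, as rule 3 implies.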

API
===

In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
receives information about the packet that triggered the socket lookup. Namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user API
header, and the `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()`` for details.
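
To make the context fields more concrete, here is a small sketch of the kind of
match a program might perform before calling ``bpf_sk_assign()``. It is written
as ordinary user-space C so it can be compiled and exercised outside the
kernel; the function name ``wants_steering``, the 192.0.2.0/24 prefix and port
7 are illustrative assumptions. It relies on ``local_ip4`` being in network
byte order and ``local_port`` in host byte order, as the comments in
``linux/bpf.h`` state, and assumes headers recent enough to declare
``struct bpf_sk_lookup``.

```c
#include <arpa/inet.h>
#include <linux/bpf.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical predicate: is this a TCP/IPv4 lookup for port 7 on any
 * address in 192.0.2.0/24?
 */
static bool wants_steering(const struct bpf_sk_lookup *ctx)
{
	const __u32 net = htonl(0xC0000200);	/* 192.0.2.0 */
	const __u32 mask = htonl(0xFFFFFF00);	/* /24 prefix */

	if (ctx->family != AF_INET || ctx->protocol != IPPROTO_TCP)
		return false;
	/* local_ip4 is in network byte order, so compare in that order. */
	if ((ctx->local_ip4 & mask) != net)
		return false;
	/* local_port is in host byte order, no conversion needed. */
	return ctx->local_port == 7;
}
```

Inside an actual BPF program the same comparison would typically use
``bpf_htonl()`` from libbpf's ``bpf_endian.h`` instead of ``htonl()``.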

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.