.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.
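
As a hedged illustration of how a loaded program ends up on one of these
hooks, the sketch below issues the raw ``bpf(2)`` ``BPF_PROG_ATTACH``
command using only the UAPI header. ``attach_sockopt_prog`` is a
hypothetical helper name, not a kernel or libbpf API. The sketch assumes
that ``prog_fd`` refers to an already loaded
``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program and that ``cgroup_fd`` is an open
cgroup v2 directory.

.. code-block:: c

    #define _GNU_SOURCE
    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical helper (an assumption for illustration): attach a loaded
     * sockopt program to a cgroup. hook is BPF_CGROUP_GETSOCKOPT or
     * BPF_CGROUP_SETSOCKOPT. Returns 0 on success, -1 with errno set
     * on failure.
     */
    static int attach_sockopt_prog(int cgroup_fd, int prog_fd,
                                   enum bpf_attach_type hook, __u32 flags)
    {
        union bpf_attr attr;

        /* Unused fields of the attribute union must be zero. */
        memset(&attr, 0, sizeof(attr));
        attr.target_fd = cgroup_fd;     /* cgroup to attach to */
        attr.attach_bpf_fd = prog_fd;   /* program to attach */
        attr.attach_type = hook;
        attr.attach_flags = flags;

        return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
    }

A call such as
``attach_sockopt_prog(cgroup_fd, prog_fd, BPF_CGROUP_SETSOCKOPT, 0)``
would request the ``setsockopt`` hook. Attach flags such as
``BPF_F_ALLOW_MULTI`` go in the last argument.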

The context (``struct bpf_sockopt``) carries the associated socket (``sk``)
and all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handles the
socket option and has a writable context: the program can modify the
supplied arguments before they are passed down to the kernel. This hook
has access to the cgroup and socket local storage.

If a BPF program sets ``optlen`` to -1, control is returned to userspace
after all other BPF programs in the cgroup chain finish (i.e. the kernel's
``setsockopt`` handling is *not* executed).

Note that ``optlen`` cannot be increased beyond the user-supplied value.
It can only be decreased or set to -1. Any other value triggers ``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success; continue with the next BPF program in the cgroup chain.
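
The behaviour described above can be sketched as a small program that
consumes one option entirely in BPF. This is a minimal sketch, not a
definitive implementation: ``SOL_CUSTOM`` is a hypothetical option level
made up for this example, the handler name is arbitrary, and ``SEC()`` is
defined locally as a stand-in for the macro that libbpf provides in
``<bpf/bpf_helpers.h>``.

.. code-block:: c

    #include <linux/bpf.h>

    /* Local stand-in for libbpf's SEC() macro (an assumption, so that the
     * sketch depends on the UAPI header only).
     */
    #define SEC(name) __attribute__((section(name), used))

    /* Hypothetical option level handled by BPF only (an assumption). */
    #define SOL_CUSTOM 0x1234

    char _license[] SEC("license") = "GPL";

    SEC("cgroup/setsockopt")
    int handle_setsockopt(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        /* Not our option: pass it to the kernel unmodified. */
        if (ctx->level != SOL_CUSTOM)
            return 1;

        /* Bounds check before touching optval; reject an empty buffer,
         * so userspace sees EPERM.
         */
        if (optval + 1 > optval_end)
            return 0;

        /* A real program would record optval[0] here, for example in
         * socket local storage.
         */

        /* The option was consumed by BPF: skip the kernel's setsockopt
         * handling.
         */
        ctx->optlen = -1;
        return 1;
    }

Returning ``1`` without touching ``optlen`` leaves the call to the kernel,
while returning ``1`` after setting ``optlen`` to -1 reports success to
userspace without invoking the kernel handler.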

BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handles the
socket option. The BPF hook can observe ``optval``, ``optlen`` and
``retval`` if it is interested in what the kernel has returned. It can
also override these values: rewrite ``optval``, adjust ``optlen`` and
reset ``retval`` to 0. If ``optlen`` is increased above the initial
``getsockopt`` value (i.e. the userspace buffer is too small), ``EFAULT``
is returned.

This hook has access to the cgroup and socket local storage.

Note that the only acceptable values for ``retval`` are 0 and the
original value that the kernel returned. Any other value triggers
``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace and return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).
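
A minimal sketch of such an override follows. It assumes a hypothetical
option level ``SOL_CUSTOM`` that the kernel does not know, so the kernel's
own ``getsockopt`` handling has already failed by the time the hook runs.
The returned byte ``0x55`` and the handler name are likewise made up for
illustration, and ``SEC()`` is a local stand-in for the libbpf macro.

.. code-block:: c

    #include <linux/bpf.h>

    /* Local stand-in for libbpf's SEC() macro (an assumption). */
    #define SEC(name) __attribute__((section(name), used))

    /* Hypothetical option level answered by BPF only (an assumption). */
    #define SOL_CUSTOM 0x1234

    char _license[] SEC("license") = "GPL";

    SEC("cgroup/getsockopt")
    int handle_getsockopt(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        /* Not our option: leave the kernel's result untouched. */
        if (ctx->level != SOL_CUSTOM)
            return 1;

        /* Bounds check before writing; reject if there is no room. */
        if (optval + 1 > optval_end)
            return 0;

        optval[0] = 0x55;   /* value reported to userspace */
        ctx->optlen = 1;    /* shrinking optlen is allowed */
        ctx->retval = 0;    /* replace the kernel's error with success */
        return 1;
    }

Only 0 is written to ``retval`` and ``optlen`` is only decreased, which
stays within the limits stated above.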

Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy, where each cgroup has a
``BPF_CGROUP_GETSOCKOPT`` program attached with the ``BPF_F_ALLOW_MULTI``
flag::

  A (root, parent)
   \
    B (child)

When an application in cgroup B calls the ``getsockopt`` syscall, the
programs are executed from the bottom up: B, then A. The first program
(B) sees the result of the kernel's ``getsockopt``. It can optionally
adjust ``optval`` and ``optlen`` and reset ``retval`` to 0. Control then
passes to the second program (A), which sees the same context as B,
including any modifications B made.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached to
both A and B, the trigger order is B, then A. If B changes any of the
input arguments (``level``, ``optname``, ``optval``, ``optlen``), the
next program in the chain (A) sees those changes, *not* the original
``setsockopt`` arguments. The potentially modified values are then
passed down to the kernel.
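
The bottom-up hand-off can be illustrated with two ``getsockopt`` programs,
one meant for each cgroup. This is a sketch under stated assumptions: the
names ``prog_b`` and ``prog_a`` are hypothetical, both programs act on
every option for brevity, and ``SEC()`` is a local stand-in for the libbpf
macro.

.. code-block:: c

    #include <linux/bpf.h>

    /* Local stand-in for libbpf's SEC() macro (an assumption). */
    #define SEC(name) __attribute__((section(name), used))

    char _license[] SEC("license") = "GPL";

    /* Meant for child cgroup B: runs first and sees the kernel's result. */
    SEC("cgroup/getsockopt")
    int prog_b(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (optval + 1 > optval_end)
            return 0;

        optval[0] = 1;      /* replace the kernel's answer */
        ctx->optlen = 1;
        ctx->retval = 0;
        return 1;
    }

    /* Meant for parent cgroup A: runs second, on the context B left behind. */
    SEC("cgroup/getsockopt")
    int prog_a(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (optval + 1 > optval_end)
            return 0;

        /* optval[0] holds B's value here, not the kernel's original one. */
        optval[0] += 1;
        return 1;
    }

With both programs attached as described, userspace would receive a single
byte with the value 2: B writes 1 and A increments what B wrote.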

Large optval
============

When ``optval`` is larger than ``PAGE_SIZE``, the BPF program can access
only the first ``PAGE_SIZE`` bytes of that data. It therefore has two
options:

* Set ``optlen`` to zero, which indicates that the kernel should
  use the original buffer from userspace. Any modifications
  done by the BPF program to ``optval`` are ignored.
* Set ``optlen`` to a value less than ``PAGE_SIZE``, which
  indicates that the kernel should use BPF's trimmed ``optval``.

When the BPF program returns with ``optlen`` greater than
``PAGE_SIZE``, userspace receives the ``EFAULT`` errno.
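
The first option can be sketched as a pass-through ``setsockopt`` program.
``ASSUMED_PAGE_SIZE`` is a hypothetical constant for this example only:
the real page size depends on the architecture and kernel configuration,
so an actual program would need to obtain it some other way. The handler
name is arbitrary and ``SEC()`` is a local stand-in for the libbpf macro.

.. code-block:: c

    #include <linux/bpf.h>

    /* Local stand-in for libbpf's SEC() macro (an assumption). */
    #define SEC(name) __attribute__((section(name), used))

    /* Assumed page size, for illustration only. */
    #define ASSUMED_PAGE_SIZE 4096

    char _license[] SEC("license") = "GPL";

    SEC("cgroup/setsockopt")
    int passthrough_large(struct bpf_sockopt *ctx)
    {
        /* Only part of an oversized buffer is visible to the program, so
         * do not try to rewrite it: optlen = 0 asks the kernel to use the
         * original userspace buffer.
         */
        if (ctx->optlen > ASSUMED_PAGE_SIZE)
            ctx->optlen = 0;

        return 1;
    }

Without the reset, a program that returns with the oversized ``optlen``
still in place would cause the call to fail with ``EFAULT``, as stated
above.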

Example
=======

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.