.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.

The context (``struct bpf_sockopt``) exposes the associated socket (``sk``)
and all of the input arguments: ``level``, ``optname``, ``optval`` and
``optlen``.
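
The bounds rule that a program has to follow when touching ``optval`` can
be sketched as a userspace C model. This is a hedged illustration, not
kernel code: ``struct sockopt_ctx_model`` and ``sockopt_read_u8`` are
hypothetical names, and the ``optval_end`` field assumes that, as in the
kernel's ``struct bpf_sockopt``, the buffer is exposed as the range
``[optval, optval_end)``, which must be checked before every access.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical userspace mirror of the fields a sockopt program sees.
     * The socket pointer (sk) is omitted from this model. */
    struct sockopt_ctx_model {
        uint8_t *optval;     /* start of the accessible option buffer */
        uint8_t *optval_end; /* one past the last accessible byte */
        int32_t level;
        int32_t optname;
        int32_t optlen;
        int32_t retval;      /* only meaningful for getsockopt */
    };

    /* Copy the byte at offset off into *out if it lies inside
     * [optval, optval_end). Returns 0 on success, -1 if out of bounds. */
    static int sockopt_read_u8(const struct sockopt_ctx_model *ctx,
                               size_t off, uint8_t *out)
    {
        size_t avail = (size_t)(ctx->optval_end - ctx->optval);

        if (off >= avail)
            return -1;
        *out = ctx->optval[off];
        return 0;
    }

In a real BPF program the verifier rejects ``optval`` accesses that are
not guarded by such a check, which is why the comparison comes before the
read.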

BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handles the
socket option, and its context is writable: the program can modify the
supplied arguments before they are passed down to the kernel. This hook
has access to cgroup and socket local storage.

If a BPF program sets ``optlen`` to -1, control returns to userspace
after all other BPF programs in the cgroup chain finish (i.e. the
kernel's ``setsockopt`` handling is *not* executed).

Note that ``optlen`` cannot be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value
triggers ``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` will be returned to userspace.
* ``1`` - success; continue with the next BPF program in the cgroup chain.
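
The setsockopt rules above can be summarized as a small C model of the
decision taken once the cgroup chain has finished. It is a sketch under
stated assumptions, not the kernel's implementation:
``setsockopt_verdict`` and ``enum setsockopt_outcome`` are hypothetical
names, errors are reported as positive ``errno`` values, and checking the
program's return code before ``optlen`` is an assumed ordering.

.. code-block:: c

    #include <errno.h>

    /* Hypothetical outcomes of the setsockopt hook described above. */
    enum setsockopt_outcome {
        SETSOCKOPT_RUN_KERNEL,  /* pass the (possibly modified) args down */
        SETSOCKOPT_SKIP_KERNEL, /* optlen == -1: return to userspace */
        SETSOCKOPT_ERROR,       /* syscall fails with *err */
    };

    /* prog_ret:     0 or 1, as returned by the BPF program(s)
     * user_optlen:  optlen originally passed to setsockopt()
     * final_optlen: ctx->optlen after the cgroup chain finished */
    static enum setsockopt_outcome
    setsockopt_verdict(int prog_ret, int user_optlen, int final_optlen,
                       int *err)
    {
        if (prog_ret == 0) {
            *err = EPERM;               /* a program rejected the syscall */
            return SETSOCKOPT_ERROR;
        }
        if (final_optlen == -1)
            return SETSOCKOPT_SKIP_KERNEL; /* kernel handling is bypassed */
        if (final_optlen < -1 || final_optlen > user_optlen) {
            *err = EFAULT;              /* optlen grew or is out of range */
            return SETSOCKOPT_ERROR;
        }
        return SETSOCKOPT_RUN_KERNEL;   /* optlen unchanged or decreased */
    }

For example, ``setsockopt_verdict(1, 4, 8, &err)`` yields
``SETSOCKOPT_ERROR`` with ``err == EFAULT``, because the program tried to
grow ``optlen`` from 4 to 8 bytes.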

BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handles the
socket option. The BPF program can observe ``optval``, ``optlen`` and
``retval`` if it is interested in what the kernel returned. It can also
override ``optval``, adjust ``optlen`` and reset ``retval`` to 0. If
``optlen`` has been increased above the initial ``getsockopt`` value
(i.e. the userspace buffer is too small), ``EFAULT`` is returned.

This hook has access to cgroup and socket local storage.

Note that the only acceptable values for ``retval`` are 0 and the
original value returned by the kernel. Any other value triggers
``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` will be returned to userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace and return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).
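
Similarly, the getsockopt rules can be modeled as a C function that
computes what the syscall reports after the chain has run. This is only a
sketch: ``getsockopt_result`` is a hypothetical name, results follow the
kernel convention of 0 or a negative errno value, and the order of the
checks is an assumption.

.. code-block:: c

    #include <errno.h>

    /* prog_ret:      0 or 1, as returned by the BPF program(s)
     * user_optlen:   optlen originally passed to getsockopt()
     * final_optlen:  ctx->optlen after the cgroup chain finished
     * kernel_retval: what the kernel's own getsockopt handling returned
     * final_retval:  ctx->retval after the cgroup chain finished
     * Returns the value the syscall reports (0 or a negative errno). */
    static int getsockopt_result(int prog_ret, int user_optlen,
                                 int final_optlen, int kernel_retval,
                                 int final_retval)
    {
        if (prog_ret == 0)
            return -EPERM;      /* a program rejected the syscall */
        if (final_optlen > user_optlen)
            return -EFAULT;     /* userspace buffer is too small */
        if (final_retval != 0 && final_retval != kernel_retval)
            return -EFAULT;     /* only 0 or the original value is allowed */
        return final_retval;    /* optval/optlen are copied to userspace */
    }

For instance, assuming the kernel reported ``-ENOPROTOOPT`` for an option
it does not know, a program that implements that option itself could
reset ``retval`` to 0, and ``getsockopt_result(1, 4, 4, -ENOPROTOOPT, 0)``
returns 0.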

Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy, where each cgroup
has a ``BPF_CGROUP_GETSOCKOPT`` program attached with the
``BPF_F_ALLOW_MULTI`` flag::

  A (root, parent)
   \
    B (child)

When the application calls the ``getsockopt`` syscall from cgroup B,
the programs are executed from the bottom up: B, then A. The first
program (B) sees the result of the kernel's ``getsockopt``. It can
optionally adjust ``optval`` and ``optlen`` and reset ``retval`` to 0.
After that, control passes to the second program (A), which sees the
same context as B, including any modifications B made.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached
to A and B, the trigger order is B, then A. If B changes any of the
input arguments (``level``, ``optname``, ``optval``, ``optlen``), the
next program in the chain (A) sees those changes, *not* the original
``setsockopt`` arguments. The potentially modified values are then
passed down to the kernel.
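
The bottom-up ordering can be illustrated with a C model in which each
"program" is an ordinary function that edits a shared argument struct.
This is a simplified sketch: ``struct sockopt_args``,
``run_chain_bottom_up``, ``prog_b`` and ``prog_a`` are hypothetical,
``optval`` and return-code handling are left out, and only the order of
execution and the visibility of earlier modifications are modeled.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical subset of the setsockopt arguments. */
    struct sockopt_args {
        int level;
        int optname;
        int optlen;
    };

    typedef void (*sockopt_prog_fn)(struct sockopt_args *args);

    /* progs[0] belongs to the socket's own cgroup (B) and progs[n - 1] to
     * the root (A). Each program sees the arguments as left by the
     * previous one. */
    static void run_chain_bottom_up(sockopt_prog_fn *progs, size_t n,
                                    struct sockopt_args *args)
    {
        for (size_t i = 0; i < n; i++)
            progs[i](args);
    }

    /* Records the optlen that A observed; -2 means "A has not run yet". */
    static int optlen_seen_by_a = -2;

    static void prog_b(struct sockopt_args *args)
    {
        args->optlen = 1;                /* B shrinks optlen */
    }

    static void prog_a(struct sockopt_args *args)
    {
        optlen_seen_by_a = args->optlen; /* A sees B's change */
    }

Running the chain ``{prog_b, prog_a}`` on arguments whose ``optlen`` is 4
leaves ``optlen_seen_by_a`` equal to 1, mirroring how A observes B's
modification rather than the value userspace passed in.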

Large optval
============

When ``optval`` is larger than ``PAGE_SIZE``, the BPF program can
access only the first ``PAGE_SIZE`` bytes of that data. It therefore
has two options:

* Set ``optlen`` to zero, which indicates that the kernel should
  use the original buffer from userspace. Any modifications
  the BPF program made to ``optval`` are ignored.
* Set ``optlen`` to a value less than ``PAGE_SIZE``, which
  indicates that the kernel should use the BPF program's trimmed
  ``optval``.

If the BPF program returns with ``optlen`` greater than
``PAGE_SIZE``, userspace receives an ``EFAULT`` error.
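
The cases above can be expressed as a small C classifier. It is a hedged
sketch, not kernel code: ``classify_large_optval`` and
``enum large_optval_action`` are hypothetical, the page size is passed in
as a parameter rather than read from ``PAGE_SIZE``, only non-negative
``optlen`` values are covered, and treating ``optlen == PAGE_SIZE`` as the
trimmed case is an assumption, since the text above only describes values
below and above ``PAGE_SIZE``.

.. code-block:: c

    /* Hypothetical actions for an optval larger than one page. */
    enum large_optval_action {
        USE_USER_BUFFER,    /* optlen == 0: BPF changes to optval ignored */
        USE_TRIMMED_BUFFER, /* 0 < optlen <= page_size: use BPF's optval */
        FAIL_EFAULT,        /* optlen > page_size: userspace gets EFAULT */
    };

    /* optlen:    ctx->optlen when the BPF program returns (assumed >= 0)
     * page_size: stand-in for the kernel's PAGE_SIZE */
    static enum large_optval_action classify_large_optval(long optlen,
                                                          long page_size)
    {
        if (optlen == 0)
            return USE_USER_BUFFER;
        if (optlen > page_size)
            return FAIL_EFAULT;
        return USE_TRIMMED_BUFFER;
    }

With a 4096-byte page, for example, ``classify_large_optval(8192, 4096)``
returns ``FAIL_EFAULT``, while ``classify_large_optval(0, 4096)`` selects
the original userspace buffer.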

Example
=======

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.