.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================

This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type,
which provides a cgroup-bpf hook for sysctl.

The hook has to be attached to a cgroup and will be called every time a
process inside that cgroup tries to read from or write to a sysctl knob in proc.

1. Attach type
**************

The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.
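
The code below is a minimal user-space sketch of the attach step. It uses the raw ``bpf(2)`` ``BPF_PROG_ATTACH`` command and makes two assumptions: the program has already been loaded and is referenced by ``prog_fd``, and ``cgroup_path`` names a cgroup v2 directory (for example one under ``/sys/fs/cgroup``). The ``attach_sysctl_prog()`` name is hypothetical. Programs using libbpf would more typically call its ``bpf_prog_attach()`` wrapper with the same program fd, cgroup fd, attach type and flags.

.. code-block:: c

	#define _GNU_SOURCE
	#include <errno.h>
	#include <fcntl.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/bpf.h>

	/*
	 * Hypothetical helper: attach a loaded BPF_PROG_TYPE_CGROUP_SYSCTL
	 * program to the cgroup v2 directory at cgroup_path.
	 * Returns 0 on success or a negative errno.
	 */
	int attach_sysctl_prog(int prog_fd, const char *cgroup_path)
	{
		union bpf_attr attr;
		int cg_fd, err;

		cg_fd = open(cgroup_path, O_RDONLY | O_DIRECTORY);
		if (cg_fd < 0)
			return -errno;

		memset(&attr, 0, sizeof(attr));
		attr.target_fd = cg_fd;		/* cgroup to attach to */
		attr.attach_bpf_fd = prog_fd;	/* program to attach */
		attr.attach_type = BPF_CGROUP_SYSCTL;
		attr.attach_flags = 0;		/* no BPF_F_ALLOW_* flags */

		err = syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
		if (err < 0)
			err = -errno;	/* save errno before close() can change it */
		close(cg_fd);
		return err;
	}

For example, ``attach_sysctl_prog(prog_fd, "/sys/fs/cgroup/app")`` would attach the program to a hypothetical ``app`` cgroup and return ``0`` on success.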

2. Context
**********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
a BPF program::

	struct bpf_sysctl {
		__u32	write;
		__u32	file_pos;
	};

* ``write`` indicates whether the sysctl value is being read (``0``) or
  written (``1``). This field is read-only.

* ``file_pos`` indicates the file position at which the sysctl is being read
  or written. This field is read-write. Writing to the field sets the
  starting position in the sysctl proc file that ``read(2)`` will read from
  or ``write(2)`` will write to. Writing zero to the field can be used, e.g.,
  to override the whole sysctl value with ``bpf_sysctl_set_new_value()``
  even when user space calls ``write(2)`` at ``file_pos > 0``. Writing a
  non-zero value to the field can be used to access part of the sysctl value
  starting at the specified ``file_pos``. Not all sysctls support access with
  ``file_pos != 0``; e.g., writes to numeric sysctl entries must always be at
  file position ``0``. See also the ``kernel.sysctl_writes_strict`` sysctl.

See `linux/bpf.h`_ for more details on how the context fields can be accessed.
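
The context fields are illustrated below by a program that rejects writes that do not start at file position ``0`` and allows everything else. The policy and the ``reject_partial_writes`` name are hypothetical. The ``cgroup/sysctl`` section name is the one libbpf uses to infer this program type and attach type. To keep the sketch self-contained, ``SEC()`` is defined essentially the way libbpf's ``bpf/bpf_helpers.h`` defines it. A real program would normally include that header and be built with ``clang -target bpf``.

.. code-block:: c

	#include <linux/bpf.h>

	#ifndef SEC
	#define SEC(name) __attribute__((section(name), used))
	#endif

	SEC("cgroup/sysctl")
	int reject_partial_writes(struct bpf_sysctl *ctx)
	{
		/* ctx->write is read-only: 1 for write(2), 0 for read(2). */
		if (ctx->write && ctx->file_pos != 0)
			return 0;	/* reject: write(2) fails with EPERM */

		return 1;		/* proceed with access */
	}

	char _license[] SEC("license") = "GPL";

Because only the context structure from ``linux/bpf.h`` is touched, the decision logic can also be exercised by calling the function directly with a hand-filled ``struct bpf_sysctl``.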

3. Return code
**************

A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:

* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".

If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or
``write(2)``, and ``errno`` will be set to ``EPERM``.
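
The following minimal user-space sketch shows how a rejection is observed. The hypothetical ``write_sysctl()`` writes a string to a file under ``/proc/sys`` and returns ``-errno`` on failure. A rejection by an attached program therefore shows up as ``-EPERM`` from the ``write(2)`` call.

.. code-block:: c

	#include <errno.h>
	#include <fcntl.h>
	#include <string.h>
	#include <unistd.h>

	/*
	 * Hypothetical helper: write value to the sysctl file at path (e.g.
	 * "/proc/sys/net/ipv4/tcp_mem"). Returns 0 on success or a negative
	 * errno; -EPERM is what a cgroup sysctl program returning 0 produces.
	 */
	int write_sysctl(const char *path, const char *value)
	{
		int fd, err = 0;

		fd = open(path, O_WRONLY);
		if (fd < 0)
			return -errno;

		/* A rejecting BPF program makes write(2) fail with EPERM. */
		if (write(fd, value, strlen(value)) < 0)
			err = -errno;

		close(fd);
		return err;
	}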

4. Helpers
**********

Since a sysctl knob is represented by a name and a value, sysctl-specific BPF
helpers focus on providing access to these properties:

* ``bpf_sysctl_get_name()`` to get the sysctl name, as it is visible in
  ``/proc/sys``, into a buffer provided by the BPF program;

* ``bpf_sysctl_get_current_value()`` to get the string value currently held
  by the sysctl into a buffer provided by the BPF program. This helper is
  available on both ``read(2)`` from and ``write(2)`` to the sysctl;

* ``bpf_sysctl_get_new_value()`` to get the new string value being written
  to the sysctl before the actual write happens. This helper can be used
  only when ``ctx->write == 1``;

* ``bpf_sysctl_set_new_value()`` to override the new string value being
  written to the sysctl before the actual write happens. The sysctl value
  will be overridden starting from the current ``ctx->file_pos``. If the
  whole value has to be overridden, the BPF program can set ``file_pos`` to
  zero before calling the helper. This helper can be used only when
  ``ctx->write == 1``. The new string value set by the helper is treated and
  verified by the kernel in the same way as an equivalent string passed by
  user space.
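
The sketch below combines ``ctx->write`` with ``bpf_sysctl_get_name()``. It denies writes to any sysctl whose base name is ``tcp_mem`` (for example ``net.ipv4.tcp_mem``) and allows all reads. The program passes the ``BPF_F_SYSCTL_BASE_NAME`` flag from ``linux/bpf.h``, so the helper copies only the last path component instead of the full ``net/ipv4/tcp_mem`` name. The policy and function name are hypothetical. To keep the file self-contained, the helper is declared the way libbpf's generated ``bpf_helper_defs.h`` declares it; a real program would include ``bpf/bpf_helpers.h`` instead.

.. code-block:: c

	#include <linux/bpf.h>

	#ifndef SEC
	#define SEC(name) __attribute__((section(name), used))
	#endif

	/* Declared like libbpf's bpf_helper_defs.h: the "pointer" is a helper ID. */
	static long (*bpf_sysctl_get_name)(struct bpf_sysctl *ctx, char *buf,
					   unsigned long buf_len, __u64 flags) =
		(void *)BPF_FUNC_sysctl_get_name;

	SEC("cgroup/sysctl")
	int deny_tcp_mem_write(struct bpf_sysctl *ctx)
	{
		char want[] = "tcp_mem";
		char name[16] = {0};
		unsigned int i;
		long ret;

		if (!ctx->write)
			return 1;	/* reads are always allowed */

		/* Copies e.g. "tcp_mem" rather than "net/ipv4/tcp_mem". */
		ret = bpf_sysctl_get_name(ctx, name, sizeof(name),
					  BPF_F_SYSCTL_BASE_NAME);
		if (ret != (long)(sizeof(want) - 1))
			return 1;	/* other length or -E2BIG: not "tcp_mem" */

		for (i = 0; i < sizeof(want) - 1; i++)
			if (name[i] != want[i])
				return 1;

		return 0;		/* write to a tcp_mem sysctl: reject */
	}

	char _license[] SEC("license") = "GPL";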

A BPF program sees a sysctl value the same way user space does in the proc
filesystem, i.e. as a string. Since many sysctl values represent an integer
or a vector of integers, the following helpers can be used to get a numeric
value from the string:

* ``bpf_strtol()`` to convert the initial part of the string to a long
  integer, similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
  long integer, similar to user space `strtoul(3)`_.

See `linux/bpf.h`_ for more details on the helpers described here.
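
The following sketch parses a written value. On write, the program copies the new value with ``bpf_sysctl_get_new_value()`` and converts its leading number with ``bpf_strtoul()``. A ``flags`` value of ``0`` asks the helper to detect the base automatically, as base ``0`` does for ``strtoul(3)``. The program then rejects values above a hypothetical ``MAX_ALLOWED`` limit. As written it applies to every sysctl in the cgroup, so a real program would first check the name with ``bpf_sysctl_get_name()``. The helper declarations mirror libbpf's ``bpf_helper_defs.h`` to keep the sketch self-contained.

.. code-block:: c

	#include <linux/bpf.h>

	#ifndef SEC
	#define SEC(name) __attribute__((section(name), used))
	#endif

	#define MAX_ALLOWED 1024UL	/* hypothetical upper limit */

	/* Declared like libbpf's bpf_helper_defs.h: the "pointer" is a helper ID. */
	static long (*bpf_sysctl_get_new_value)(struct bpf_sysctl *ctx, char *buf,
						unsigned long buf_len) =
		(void *)BPF_FUNC_sysctl_get_new_value;
	static long (*bpf_strtoul)(const char *buf, unsigned long buf_len,
				   __u64 flags, unsigned long *res) =
		(void *)BPF_FUNC_strtoul;

	SEC("cgroup/sysctl")
	int cap_numeric_write(struct bpf_sysctl *ctx)
	{
		char buf[32] = {0};
		unsigned long val = 0;
		long ret;

		if (!ctx->write)
			return 1;	/* reads are always allowed */

		/* NUL-terminated copy of the value being written, e.g. "512\n". */
		ret = bpf_sysctl_get_new_value(ctx, buf, sizeof(buf));
		if (ret < 0)
			return 0;	/* -E2BIG (value truncated) or -EINVAL */

		/* On success, returns the number of characters consumed. */
		ret = bpf_strtoul(buf, sizeof(buf), 0, &val);
		if (ret < 0)
			return 0;	/* no leading unsigned integer */

		return val <= MAX_ALLOWED;
	}

	char _license[] SEC("license") = "GPL";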

5. Examples
***********

See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of
integers, and uses the result to decide whether to allow or deny access to
the sysctl.

6. Notes
********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
environment, for example to monitor sysctl usage or to catch unreasonable
values that an application running as root in a separate cgroup is trying to
set.

Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write``
time, it may return results different from those at ``sys_open`` time: the
process that opened the sysctl file in the proc filesystem may differ from
the process that is trying to read from or write to it, and the two
processes may run in different cgroups. This means
``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not be used as a security mechanism
to limit sysctl usage.

As with any cgroup-bpf program, additional care should be taken if an
application running as root in a cgroup should not be allowed to detach or
replace the BPF program attached by the administrator.

.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c