.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================

This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type, which
provides a cgroup-bpf hook for sysctl.

The hook has to be attached to a cgroup and is called every time a process
inside that cgroup tries to read from or write to a sysctl knob in proc.

1. Attach type
**************

The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.
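
As a rough illustration of the attach step, the sketch below issues the
``BPF_PROG_ATTACH`` command of ``bpf(2)`` directly. The function name
``attach_sysctl_prog`` is hypothetical. The sketch assumes that ``prog_fd``
refers to an already loaded ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program and that
``cgroup_fd`` is an open file descriptor of a cgroup v2 directory::

    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/bpf.h>

    /*
     * Attach a loaded sysctl program to a cgroup (hypothetical helper).
     * Returns 0 on success, or -1 with errno set by bpf(2).
     */
    int attach_sysctl_prog(int cgroup_fd, int prog_fd)
    {
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.target_fd = cgroup_fd;            /* cgroup directory fd */
        attr.attach_bpf_fd = prog_fd;          /* loaded BPF program fd */
        attr.attach_type = BPF_CGROUP_SYSCTL;  /* attach type for this hook */
        attr.attach_flags = 0;                 /* no multi/override flags */

        return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
    }

libbpf's ``bpf_prog_attach()`` wraps the same command and is usually the more
convenient choice in real loaders.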

2. Context
**********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
a BPF program::

    struct bpf_sysctl {
        __u32 write;
        __u32 file_pos;
    };

* ``write`` indicates whether the sysctl value is being read (``0``) or written
  (``1``). This field is read-only.

* ``file_pos`` indicates the file position at which the sysctl is being
  accessed, whether read or written. This field is read-write. Writing to the
  field sets the starting position in the sysctl proc file that ``read(2)``
  will read from or ``write(2)`` will write to. Writing zero to the field can
  be used e.g. to override the whole sysctl value with
  ``bpf_sysctl_set_new_value()`` on ``write(2)`` even when user space calls it
  with ``file_pos > 0``. Writing a non-zero value to the field can be used to
  access part of the sysctl value starting from the specified ``file_pos``.
  Not all sysctls support access with ``file_pos != 0``, e.g. writes to
  numeric sysctl entries must always be at file position ``0``. See also the
  ``kernel.sysctl_writes_strict`` sysctl.

See `linux/bpf.h`_ for more details on how the context fields can be accessed.
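
The two context fields can be exercised with a program along the following
lines. This is only a sketch: the program name ``sysctl_rewind_writes`` is
hypothetical, and ``SEC()`` is defined locally (libbpf's ``bpf_helpers.h``
normally provides it) to keep the example self-contained::

    #include <linux/bpf.h>

    /* Normally provided by libbpf's <bpf/bpf_helpers.h>. */
    #define SEC(name) __attribute__((section(name), used))

    SEC("cgroup/sysctl")
    int sysctl_rewind_writes(struct bpf_sysctl *ctx)
    {
        /* write == 0: the sysctl is being read, leave it alone. */
        if (!ctx->write)
            return 1;

        /*
         * write == 1: file_pos is read-write, so reset it to make the
         * write start at the beginning of the sysctl value.
         */
        if (ctx->file_pos != 0)
            ctx->file_pos = 0;

        return 1; /* proceed with access */
    }

    char _license[] SEC("license") = "GPL";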

3. Return code
**************

A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:

* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".

If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or
``write(2)`` and ``errno`` will be set to ``EPERM``.
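
A minimal sketch of both return codes, assuming a hypothetical policy that
makes every sysctl read-only for the cgroup. The program name
``sysctl_read_only`` is made up for illustration, and ``SEC()`` is again
defined locally rather than taken from libbpf::

    #include <linux/bpf.h>

    /* Normally provided by libbpf's <bpf/bpf_helpers.h>. */
    #define SEC(name) __attribute__((section(name), used))

    SEC("cgroup/sysctl")
    int sysctl_read_only(struct bpf_sysctl *ctx)
    {
        /* 0: reject access, user space sees EPERM from write(2). */
        if (ctx->write)
            return 0;

        /* 1: proceed with access, read(2) behaves as usual. */
        return 1;
    }

    char _license[] SEC("license") = "GPL";

With such a program attached, a ``write(2)`` to any file under ``/proc/sys``
from a process in the cgroup is expected to fail with ``EPERM``, while reads
keep working.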

4. Helpers
**********

Since a sysctl knob is represented by a name and a value, sysctl-specific BPF
helpers focus on providing access to these properties:

* ``bpf_sysctl_get_name()`` to get the sysctl name, as it is visible in
  ``/proc/sys``, into a buffer provided by the BPF program;

* ``bpf_sysctl_get_current_value()`` to get the string value currently held by
  the sysctl into a buffer provided by the BPF program. This helper is
  available on both ``read(2)`` from and ``write(2)`` to sysctl;

* ``bpf_sysctl_get_new_value()`` to get the new string value currently being
  written to the sysctl before the actual write happens. This helper can be
  used only on ``ctx->write == 1``;

* ``bpf_sysctl_set_new_value()`` to override the new string value currently
  being written to the sysctl before the actual write happens. The sysctl
  value will be overridden starting from the current ``ctx->file_pos``. If the
  whole value has to be overridden, the BPF program can set ``file_pos`` to
  zero before calling the helper. This helper can be used only on
  ``ctx->write == 1``. The new string value set by the helper is treated and
  verified by the kernel the same way as an equivalent string passed by user
  space.

A BPF program sees the sysctl value the same way as user space does in the
proc filesystem, i.e. as a string. Since many sysctl values represent an
integer or a vector of integers, the following helpers can be used to get a
numeric value from the string:

* ``bpf_strtol()`` to convert the initial part of the string to a long integer,
  similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
  long integer, similar to user space `strtoul(3)`_.

See `linux/bpf.h`_ for more details on the helpers described here.
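
The helpers can be combined roughly as follows. This is a sketch under stated
assumptions, not a reference implementation: the policy (rejecting writes to
``vm/swappiness`` above ``MAX_SWAPPINESS``), the limit itself and the program
name are hypothetical. ``SEC()`` and the helper declarations are written out by
hand here, whereas real programs normally get them from libbpf's
``bpf_helpers.h``. The constant-bound loop assumes a compiler or kernel that
handles small bounded loops::

    #include <linux/bpf.h>

    /* Normally provided by libbpf's <bpf/bpf_helpers.h>. */
    #define SEC(name) __attribute__((section(name), used))

    /* Hand-written helper declarations, for self-containment only. */
    static long (* const bpf_sysctl_get_name)(struct bpf_sysctl *ctx,
            char *buf, unsigned long buf_len, __u64 flags) =
        (void *)(unsigned long)BPF_FUNC_sysctl_get_name;
    static long (* const bpf_sysctl_get_new_value)(struct bpf_sysctl *ctx,
            char *buf, unsigned long buf_len) =
        (void *)(unsigned long)BPF_FUNC_sysctl_get_new_value;
    static long (* const bpf_strtoul)(const char *buf, unsigned long buf_len,
            __u64 flags, unsigned long *res) =
        (void *)(unsigned long)BPF_FUNC_strtoul;

    #define MAX_SWAPPINESS 100UL    /* hypothetical policy limit */

    static const char target_name[] = "vm/swappiness";

    SEC("cgroup/sysctl")
    int sysctl_cap_swappiness(struct bpf_sysctl *ctx)
    {
        char name[sizeof(target_name)];
        char value[16];
        unsigned long new_val = 0;
        unsigned int i;
        long ret;

        /* Reads are always allowed; no helper is needed for them. */
        if (!ctx->write)
            return 1;

        /* Name as visible under /proc/sys, e.g. "vm/swappiness". */
        __builtin_memset(name, 0, sizeof(name));
        ret = bpf_sysctl_get_name(ctx, name, sizeof(name), 0);
        if (ret != (long)(sizeof(target_name) - 1))
            return 1;   /* other length or error: not the target sysctl */
        for (i = 0; i < sizeof(target_name); i++)
            if (name[i] != target_name[i])
                return 1;

        /* String that user space is trying to write. */
        __builtin_memset(value, 0, sizeof(value));
        ret = bpf_sysctl_get_new_value(ctx, value, sizeof(value));
        if (ret <= 0)
            return 0;   /* empty or too long for the buffer: reject */

        /* Parse the leading base-10 integer, like strtoul(3). */
        ret = bpf_strtoul(value, sizeof(value), 10, &new_val);
        if (ret <= 0)
            return 0;   /* not a number: reject */

        return new_val <= MAX_SWAPPINESS;
    }

    char _license[] SEC("license") = "GPL";

The name check comes first so that writes to unrelated sysctls are passed
through untouched.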

5. Examples
***********

See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of integers
and uses the result to decide whether to allow or deny access to the sysctl.

6. Notes
********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
environment, for example to monitor sysctl usage or catch unreasonable values
that an application, running as root in a separate cgroup, is trying to set.

Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write``
time, it may return results different from those at ``sys_open`` time, i.e. the
process that opened the sysctl file in the proc filesystem may differ from the
process that is trying to read from / write to it, and two such processes may
run in different cgroups. This means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not
be used as a security mechanism to limit sysctl usage.

As with any cgroup-bpf program, additional care should be taken if an
application running as root in a cgroup should not be allowed to
detach/replace a BPF program attached by the administrator.

.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c