Documentation/bpf/map_cgroup_storage.rst

.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
storage. It is only available with ``CONFIG_CGROUP_BPF``, and only to programs
that attach to cgroups; those programs are made available by the same Kconfig
option. The storage is identified by the cgroup the program is attached to.

The map provides local storage at the cgroup that the BPF program is attached
to. It provides faster and simpler access than the general-purpose hash table,
which performs hash table lookups and requires the user to track live cgroups
on their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
Linux 5.9, and this document describes the differences.

Usage
=====

The map uses a key of either type ``__u64 cgroup_inode_id`` or type
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
            __u64 cgroup_inode_id;
            __u32 attach_type;
    };

``cgroup_inode_id`` is the inode ID of the cgroup directory.
``attach_type`` is the program's attach type.
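
Because ``cgroup_inode_id`` is the inode ID of the cgroup directory, userspace
can obtain it with ``stat(2)``. The helper below is a minimal, hypothetical
sketch; it assumes a 64-bit kernel (roughly Linux 5.5 and later), where the
inode number reported for a cgroup v2 directory equals the cgroup ID used in
the key::

    #include <stdint.h>
    #include <sys/stat.h>

    /*
     * Hypothetical helper: return the inode number of the directory at
     * @path, for use as cgroup_inode_id, or 0 if stat() fails.
     * Assumes a 64-bit kernel, where this equals the cgroup ID.
     */
    static uint64_t cgroup_inode_id_from_path(const char *path)
    {
            struct stat st;

            if (stat(path, &st) != 0)
                    return 0;
            return (uint64_t)st.st_ino;
    }

For example, ``cgroup_inode_id_from_path("/sys/fs/cgroup/example")`` would
return the ID of a cgroup named ``example`` (a made-up path) on a system with
cgroup v2 mounted at ``/sys/fs/cgroup``.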

Linux 5.9 added support for ``__u64 cgroup_inode_id`` as the key type. When
this key type is used, all attach types of a particular cgroup and map share
the same storage. Otherwise, if the key type is
``struct bpf_cgroup_storage_key``, programs of different attach types are
isolated and see different storages.
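
The difference between the two key types can be pictured with a toy model.
The sketch below is not kernel code: it only restates the rule above, with a
locally defined ``struct storage_key`` standing in for
``struct bpf_cgroup_storage_key`` and a hypothetical ``same_storage()``
helper::

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-in for struct bpf_cgroup_storage_key, so no kernel headers
     * are needed. */
    struct storage_key {
            uint64_t cgroup_inode_id;
            uint32_t attach_type;
    };

    /*
     * Toy model (not kernel code): do two attachments resolve to the same
     * storage? With a __u64 key (u64_key == true) only the cgroup matters;
     * with the struct key the attach type must match as well.
     */
    static bool same_storage(const struct storage_key *a,
                             const struct storage_key *b, bool u64_key)
    {
            if (a->cgroup_inode_id != b->cgroup_inode_id)
                    return false;
            return u64_key || a->attach_type == b->attach_type;
    }

For example, ingress and egress attachments to the same cgroup share one
storage when the key is ``__u64`` but get separate storages when the key is
``struct bpf_cgroup_storage_key``.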

To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of
``BPF_MAP_TYPE_CGROUP_STORAGE`` can be accessed by multiple programs across
different CPUs, and users must take care of synchronization themselves. The
BPF infrastructure provides ``struct bpf_spin_lock`` to synchronize access to
the storage. See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.

Examples
========

Usage with key type as ``struct bpf_cgroup_storage_key``::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            /* Never NULL, so no check is needed before dereferencing. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Atomic, as the storage may be updated from several CPUs. */
            __sync_fetch_and_add(ptr, 1);

            return 1; /* cgroup_skb: 1 allows the packet, 0 drops it */
    }

    char _license[] SEC("license") = "GPL";

Userspace accessing map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp,
                    .attach_type = type,
            };
            __u32 value = 0;

            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
            // error checking omitted; value stays 0 if the lookup fails
            return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, __u64);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            __sync_fetch_and_add(ptr, 1);

            return 1; /* cgroup_skb: 1 allows the packet, 0 drops it */
    }

    char _license[] SEC("license") = "GPL";

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    /* The attach type is not part of the key: all attach types share storage. */
    __u32 map_lookup(struct bpf_map *map, __u64 cgrp)
    {
            __u32 value = 0;

            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
            // error checking omitted; value stays 0 if the lookup fails
            return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
per-CPU variant has a separate memory region for each CPU for each storage.
The non-per-CPU variant has a single memory region for each storage, shared by
all CPUs.
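
On the userspace side, a lookup in the per-CPU variant fills a buffer with one
value per possible CPU. Each value occupies the map's value size rounded up to
8 bytes, so a ``__u32`` counter takes an 8-byte slot. A minimal sketch of
summing such a buffer; the helper names are hypothetical, and the CPU count
would typically come from libbpf's ``libbpf_num_possible_cpus()``::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Bytes used by one CPU's value: value size rounded up to 8. */
    static size_t percpu_value_stride(size_t value_size)
    {
            return (value_size + 7) & ~(size_t)7;
    }

    /* Hypothetical helper: add up the __u32 counters of @ncpus CPUs. */
    static uint64_t sum_percpu_u32(const void *buf, int ncpus)
    {
            const unsigned char *p = buf;
            size_t stride = percpu_value_stride(sizeof(uint32_t));
            uint64_t total = 0;

            for (int cpu = 0; cpu < ncpus; cpu++) {
                    uint32_t v;

                    memcpy(&v, p + (size_t)cpu * stride, sizeof(v));
                    total += v;
            }
            return total;
    }

The buffer passed to ``bpf_map_lookup_elem()`` must therefore be at least
``ncpus * percpu_value_stride(value_size)`` bytes long.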

Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
that uses the map. A program may be attached to multiple cgroups or have
multiple attach types, and each attachment creates a fresh zeroed storage. The
storage is freed upon detach.

There is a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time. As a result,
each map can only be used by one BPF program, and each BPF program can only use
one storage map of each type. Because a map can only be used by one BPF
program, sharing this cgroup's storage with other BPF programs was impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map does not
already contain an entry for the cgroup and attach type pair; otherwise, the
old storage is reused for the new attachment. If the map is attach type shared,
the attach type is simply ignored during comparison. Storage is freed only when
either the map or the cgroup it is attached to is freed. Detaching does not
directly free the storage, but it may cause the reference count of the map to
reach zero and thereby indirectly free all storage in the map.
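
The Linux 5.9 attach-time rule can likewise be illustrated with a toy model.
The sketch below is not kernel code; ``struct toy_map``, ``toy_attach()`` and
the fixed table size are made up for illustration. It reuses an existing entry
for the cgroup (and, unless the map is attach type shared, the attach type)
and otherwise creates a zeroed one::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define TOY_MAX_STORAGES 8

    struct toy_storage {
            bool used;
            uint64_t cgroup_id;
            uint32_t attach_type;
            uint32_t value;         /* zeroed when the entry is created */
    };

    struct toy_map {
            bool shared;            /* true: __u64 key, attach type ignored */
            struct toy_storage slots[TOY_MAX_STORAGES];
    };

    /* Toy model (not kernel code): return the storage for an attachment,
     * reusing a matching entry or creating a zeroed one; NULL if full. */
    static struct toy_storage *toy_attach(struct toy_map *m,
                                          uint64_t cgroup_id,
                                          uint32_t attach_type)
    {
            struct toy_storage *free_slot = NULL;

            for (size_t i = 0; i < TOY_MAX_STORAGES; i++) {
                    struct toy_storage *s = &m->slots[i];

                    if (!s->used) {
                            if (!free_slot)
                                    free_slot = s;
                            continue;
                    }
                    if (s->cgroup_id == cgroup_id &&
                        (m->shared || s->attach_type == attach_type))
                            return s;       /* reuse: old value survives */
            }
            if (free_slot) {
                    memset(free_slot, 0, sizeof(*free_slot));
                    free_slot->used = true;
                    free_slot->cgroup_id = cgroup_id;
                    free_slot->attach_type = attach_type;
            }
            return free_slot;
    }

Detaching is deliberately absent from the model: as described above, it does
not free an entry by itself.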

The map is not associated with any BPF program, thus making sharing possible.
However, a BPF program can still only be associated with one map of each type
(per-CPU and non-per-CPU). A BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the attach parameters of cgroup and attach
type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map APIs
to read or update the storage for a given attachment. For Linux 5.9 attach
type shared storages, only the first value in the struct, the cgroup inode ID,
is used during comparison, so userspace may just specify a ``__u64`` directly.

The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child, the storage still belongs to the parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.