.. SPDX-License-Identifier: GPL-2.0
.. Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
.. Copyright © 2019-2020 ANSSI
.. Copyright © 2021-2022 Microsoft Corporation

=====================================
Landlock: unprivileged access control
=====================================

:Author: Mickaël Salaün
:Date: May 2022

The goal of Landlock is to restrict ambient rights (e.g. global filesystem
access) for a set of processes. Because Landlock is a stackable LSM, it makes
it possible to create safe security sandboxes as new security layers in
addition to the existing system-wide access controls. This kind of sandbox is
expected to help mitigate the security impact of bugs or unexpected/malicious
behaviors in user space applications. Landlock empowers any process, including
unprivileged ones, to securely restrict itself.

We can quickly make sure that Landlock is enabled in the running system by
looking for "landlock: Up and running" in kernel logs (as root): ``dmesg | grep
landlock || journalctl -kg landlock``. Developers can also easily check for
Landlock support with a :ref:`related system call <landlock_abi_versions>`. If
Landlock is not currently supported, we need to :ref:`configure the kernel
appropriately <kernel_support>`.

Landlock rules
==============

A Landlock rule describes an action on an object. An object is currently a
file hierarchy, and the related filesystem actions are defined with `access
rights`_. A set of rules is aggregated in a ruleset, which can then restrict
the thread enforcing it, and its future children.

Defining and enforcing a security policy
----------------------------------------

We first need to define the ruleset that will contain our rules. For this
example, the ruleset will contain rules that only allow read actions, but write
actions will be denied. The ruleset then needs to handle both of these kinds of
actions. This is required for backward and forward compatibility (i.e. the
kernel and user space may not know each other's supported restrictions), hence
the need to be explicit about the denied-by-default access rights.

.. code-block:: c

    struct landlock_ruleset_attr ruleset_attr = {
        .handled_access_fs =
            LANDLOCK_ACCESS_FS_EXECUTE |
            LANDLOCK_ACCESS_FS_WRITE_FILE |
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_FILE |
            LANDLOCK_ACCESS_FS_MAKE_CHAR |
            LANDLOCK_ACCESS_FS_MAKE_DIR |
            LANDLOCK_ACCESS_FS_MAKE_REG |
            LANDLOCK_ACCESS_FS_MAKE_SOCK |
            LANDLOCK_ACCESS_FS_MAKE_FIFO |
            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
            LANDLOCK_ACCESS_FS_MAKE_SYM |
            LANDLOCK_ACCESS_FS_REFER,
    };

Because we may not know on which kernel version an application will be
executed, it is safer to follow a best-effort security approach. Indeed, we
should try to protect users as much as possible whatever the kernel they are
using. To avoid binary enforcement (i.e. either all security features or
none), we can leverage a dedicated Landlock command to get the current version
of the Landlock ABI and adapt the handled accesses. Let's check if we should
remove the `LANDLOCK_ACCESS_FS_REFER` access right, which is only supported
starting with the second version of the ABI.

.. code-block:: c

    int abi;

    abi = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
    if (abi < 2) {
        ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_REFER;
    }

This makes it possible to create an inclusive ruleset that will contain our
rules.

.. code-block:: c

    int ruleset_fd;

    ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
    if (ruleset_fd < 0) {
        perror("Failed to create a ruleset");
        return 1;
    }
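
The examples in this section call ``landlock_create_ruleset()``,
``landlock_add_rule()`` and ``landlock_restrict_self()`` as plain functions.
The C library may not provide them, so a program typically needs thin wrappers
around :manpage:`syscall(2)`. The following is a minimal sketch, assuming that
the installed kernel headers define the ``__NR_landlock_*`` system call numbers
and that the C library does not already declare functions with these names:

.. code-block:: c

    #include <linux/landlock.h>
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Assumption: these wrappers are defined locally, not by the C library. */
    static inline int landlock_create_ruleset(
            const struct landlock_ruleset_attr *attr, size_t size, __u32 flags)
    {
        return syscall(__NR_landlock_create_ruleset, attr, size, flags);
    }

    static inline int landlock_add_rule(int ruleset_fd,
            enum landlock_rule_type rule_type, const void *rule_attr,
            __u32 flags)
    {
        return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type,
                       rule_attr, flags);
    }

    static inline int landlock_restrict_self(int ruleset_fd, __u32 flags)
    {
        return syscall(__NR_landlock_restrict_self, ruleset_fd, flags);
    }

With such definitions in scope, the snippets in this section can be compiled as
written. If the C library in use already declares these functions, the local
definitions should be dropped.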

We can now add a new rule to this ruleset thanks to the returned file
descriptor referring to this ruleset. The rule will only allow reading the
file hierarchy ``/usr``. Without another rule, write actions would then be
denied by the ruleset. To add ``/usr`` to the ruleset, we open it with the
``O_PATH`` flag and fill the &struct landlock_path_beneath_attr with this file
descriptor.

.. code-block:: c

    int err;
    struct landlock_path_beneath_attr path_beneath = {
        .allowed_access =
            LANDLOCK_ACCESS_FS_EXECUTE |
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR,
    };

    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
    if (path_beneath.parent_fd < 0) {
        perror("Failed to open file");
        close(ruleset_fd);
        return 1;
    }
    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
                            &path_beneath, 0);
    close(path_beneath.parent_fd);
    if (err) {
        perror("Failed to update ruleset");
        close(ruleset_fd);
        return 1;
    }

It may also be required to create rules following the same logic as explained
for the ruleset creation, by filtering access rights according to the Landlock
ABI version. In this example, this is not required because
`LANDLOCK_ACCESS_FS_REFER` is not allowed by any rule.
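
If a rule did allow a right that was removed from ``handled_access_fs``, the
same mask could be applied to the rule, since a rule is expected to be rejected
when it allows rights that its ruleset does not handle. The sketch below uses
hypothetical helpers, ``filter_access()`` and ``add_filtered_rule()``, which
are not part of the Landlock API. It assumes that ``parent_fd`` was opened with
``O_PATH`` as shown above, and it invokes the system call directly through
:manpage:`syscall(2)`.

.. code-block:: c

    #include <linux/landlock.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical helper: keep only the rights that the ruleset handles. */
    static __u64 filter_access(__u64 wanted, __u64 handled)
    {
        return wanted & handled;
    }

    /*
     * Hypothetical helper: add a rule for parent_fd, dropping any right that
     * was removed from handled_access_fs for the running kernel.
     * Returns 0 on success or when nothing is left to allow, -1 on error.
     */
    static int add_filtered_rule(int ruleset_fd, int parent_fd,
                                 __u64 wanted, __u64 handled)
    {
        struct landlock_path_beneath_attr path_beneath = {
            .allowed_access = filter_access(wanted, handled),
            .parent_fd = parent_fd,
        };

        /* A rule with an empty access mask would be rejected: skip it. */
        if (!path_beneath.allowed_access)
            return 0;
        return syscall(__NR_landlock_add_rule, ruleset_fd,
                       LANDLOCK_RULE_PATH_BENEATH, &path_beneath, 0);
    }

For instance, passing ``LANDLOCK_ACCESS_FS_READ_FILE |
LANDLOCK_ACCESS_FS_REFER`` as ``wanted`` and ``ruleset_attr.handled_access_fs``
as ``handled`` on a kernel limited to the first ABI version would add a rule
that only allows ``LANDLOCK_ACCESS_FS_READ_FILE``.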

We now have a ruleset with one rule allowing read access to ``/usr`` while
denying all other handled accesses for the filesystem. The next step is to
restrict the current thread from gaining more privileges (e.g. through a SUID
binary).

.. code-block:: c

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
        perror("Failed to restrict privileges");
        close(ruleset_fd);
        return 1;
    }

The current thread is now ready to sandbox itself with the ruleset.

.. code-block:: c

    if (landlock_restrict_self(ruleset_fd, 0)) {
        perror("Failed to enforce ruleset");
        close(ruleset_fd);
        return 1;
    }
    close(ruleset_fd);

If the `landlock_restrict_self` system call succeeds, the current thread is now
restricted and this policy will be enforced on all its subsequently created
children as well. Once a thread is landlocked, there is no way to remove its
security policy; only adding more restrictions is allowed. These threads are
now in a new Landlock domain, which is the merge of their parent's domain (if
any) with the new ruleset.

Full working code can be found in `samples/landlock/sandboxer.c`_.

Good practices
--------------

It is recommended to set access rights on file hierarchy leaves as much as
possible. For instance, it is better to have ``~/doc/`` as a read-only
hierarchy and ``~/tmp/`` as a read-write hierarchy than ``~/`` as a read-only
hierarchy and ``~/tmp/`` as a read-write hierarchy. Following this good
practice leads to self-sufficient hierarchies that don't depend on their
location (i.e. parent directories). This is particularly relevant when we want
to allow linking or renaming. Indeed, having consistent access rights per
directory makes it possible to change the location of such a directory without
relying on the destination directory access rights (except those that are
required for this operation, see `LANDLOCK_ACCESS_FS_REFER` documentation).
Having self-sufficient hierarchies also helps to tighten the required access
rights to the minimal set of data. This also helps avoid sinkhole directories,
i.e. directories where data can be linked to but not linked from. However,
this depends on data organization, which might not be controlled by developers.
In this case, granting read-write access to ``~/tmp/``, instead of write-only
access, would potentially allow moving ``~/tmp/`` to a non-readable directory
while still keeping the ability to list the content of ``~/tmp/``.
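
As an illustration of the first recommendation, the two leaf hierarchies could
be described by a small rule table. This is only a sketch: ``struct
path_rule``, ``ACCESS_RO``, ``ACCESS_RW`` and ``rules`` are hypothetical names,
the choice of rights in each mask is an assumption, and each entry would still
have to be opened and added with ``landlock_add_rule()``.

.. code-block:: c

    #include <linux/landlock.h>

    /* Hypothetical masks: read-only and read-write sets of access rights. */
    #define ACCESS_RO ( \
        LANDLOCK_ACCESS_FS_READ_FILE | \
        LANDLOCK_ACCESS_FS_READ_DIR)
    #define ACCESS_RW ( \
        ACCESS_RO | \
        LANDLOCK_ACCESS_FS_WRITE_FILE | \
        LANDLOCK_ACCESS_FS_REMOVE_FILE | \
        LANDLOCK_ACCESS_FS_MAKE_REG)

    /* Hypothetical rule description: one entry per self-sufficient hierarchy. */
    struct path_rule {
        const char *path; /* relative to the home directory (assumption) */
        __u64 allowed_access;
    };

    /* Rights are tied to the leaves, not to the home directory itself. */
    static const struct path_rule rules[] = {
        { "doc", ACCESS_RO },
        { "tmp", ACCESS_RW },
    };

By contrast, a table granting ``ACCESS_RO`` on the whole home directory would
make the rights of ``doc`` depend on where it is located.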

Layers of file path access rights
---------------------------------

Each time a thread enforces a ruleset on itself, it updates its Landlock domain
with a new layer of policy. Indeed, this complementary policy is stacked with
any other rulesets already restricting this thread. A sandboxed thread can
then safely add more constraints to itself with a new enforced ruleset.

One policy layer grants access to a file path if at least one of its rules
encountered on the path grants the access. A sandboxed thread can only access
a file path if all its enforced policy layers grant the access, as well as all
the other system access controls (e.g. filesystem DAC, other LSM policies,
etc.).
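
This composition can be pictured with a simplified model, which is not the
kernel implementation: each layer is reduced to the access rights granted by
its rules encountered on the path, and a request is allowed only if every layer
covers it. The names ``layer_grants()``, ``domain_grants()`` and ``struct
layer`` are hypothetical, each layer is assumed to handle every requested
right, and the other system access controls are ignored.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Simplified model (assumption): rule_masks[] holds the rights granted by
     * each rule of one layer that was encountered while walking the path.
     */
    static bool layer_grants(const uint64_t *rule_masks, size_t nb_rules,
                             uint64_t request)
    {
        uint64_t granted = 0;
        size_t i;

        /* Within one layer, the rules encountered on the path add up. */
        for (i = 0; i < nb_rules; i++)
            granted |= rule_masks[i];
        return (granted & request) == request;
    }

    struct layer {
        const uint64_t *rule_masks;
        size_t nb_rules;
    };

    /* Across layers, every layer must grant the request. */
    static bool domain_grants(const struct layer *layers, size_t nb_layers,
                              uint64_t request)
    {
        size_t i;

        for (i = 0; i < nb_layers; i++) {
            if (!layer_grants(layers[i].rule_masks, layers[i].nb_rules,
                              request))
                return false;
        }
        return true;
    }

For example, if a first layer allows reading ``/usr`` and a second layer only
allows reading ``/usr/lib``, this model denies reading ``/usr/bin/env`` because
the second layer has no rule on that path.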

Bind mounts and OverlayFS
-------------------------

Landlock makes it possible to restrict access to file hierarchies, which means
that these access rights can be propagated with bind mounts (cf.
Documentation/filesystems/sharedsubtree.rst) but not with
Documentation/filesystems/overlayfs.rst.

A bind mount mirrors a source file hierarchy to a destination. The destination
hierarchy is then composed of the exact same files, on which Landlock rules can
be tied, either via the source or the destination path. These rules restrict
access when they are encountered on a path, which means that they can restrict
access to multiple file hierarchies at the same time, whether these hierarchies
are the result of bind mounts or not.

An OverlayFS mount point consists of upper and lower layers. These layers are
combined in a merge directory, which is the result of the mount point. This
merge hierarchy may include files from the upper and lower layers, but
modifications performed on the merge hierarchy are only reflected in the upper
layer. From a Landlock policy point of view, each OverlayFS layer and merge
hierarchy is standalone and contains its own set of files and directories,
which is different from bind mounts. A policy restricting an OverlayFS layer
will not restrict the resulting merged hierarchy, and vice versa. Landlock
users should then only think about file hierarchies they want to allow access
to, regardless of the underlying filesystem.

Inheritance
-----------

Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain
restrictions from its parent. This is similar to the seccomp inheritance (cf.
Documentation/userspace-api/seccomp_filter.rst) or any other LSM dealing with
a task's :manpage:`credentials(7)`. For instance, one process's thread may
apply Landlock rules to itself, but they will not be automatically applied to
other sibling threads (unlike POSIX thread credential changes, cf.
:manpage:`nptl(7)`).

When a thread sandboxes itself, we have the guarantee that the related security
policy will stay enforced on all this thread's descendants. This allows
creating standalone and modular security policies per application, which will
automatically be composed between themselves according to their runtime parent
policies.

Ptrace restrictions
-------------------

A sandboxed process has fewer privileges than a non-sandboxed process and must
then be subject to additional restrictions when manipulating another process.
To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
process, a sandboxed process should have a subset of the target process rules,
which means the tracee must be in a sub-domain of the tracer.
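
The domain relationship can be sketched with a toy model, which is an
assumption made for illustration and not the kernel code: each domain points to
the domain it was created from, and a tracer is allowed when its own domain is
the tracee's domain or one of its ancestors. Treating a ``NULL`` domain as "not
sandboxed" is also an assumption of this sketch, and ``struct domain`` and
``may_ptrace()`` are hypothetical names. Other access controls that apply to
:manpage:`ptrace(2)` are ignored.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Toy model (assumption): a domain only records the domain it extends. */
    struct domain {
        const struct domain *parent;
    };

    /*
     * Returns true if the tracee's domain is the tracer's domain or was
     * derived from it. A NULL domain stands for a task that is not sandboxed.
     */
    static bool may_ptrace(const struct domain *tracer,
                           const struct domain *tracee)
    {
        const struct domain *walker;

        /* A tracer without a domain is not restricted by this check. */
        if (!tracer)
            return true;
        /* Walk from the tracee's domain up to the root of its hierarchy. */
        for (walker = tracee; walker; walker = walker->parent) {
            if (walker == tracer)
                return true;
        }
        return false;
    }

In this model, a process that sandboxes itself and then forks can trace its
child because both share the same domain, whereas a sandboxed process cannot
trace a task that stayed outside of its domain.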

Compatibility
=============

Backward and forward compatibility
----------------------------------

Landlock is designed to be compatible with past and future versions of the
kernel. This is achieved thanks to the system call attributes and the
associated bitflags, particularly the ruleset's `handled_access_fs`. Making
handled access rights explicit enables the kernel and user space to have a
clear contract with each other. This is required to make sure sandboxing will
not get stricter with a system update, which could break applications.

Developers can subscribe to the `Landlock mailing list
<https://subspace.kernel.org/lists.linux.dev.html>`_ to knowingly update and
test their applications with the latest available features. In the interest of
users, and because they may use different kernel versions, it is strongly
encouraged to follow a best-effort security approach by checking the Landlock
ABI version at runtime and only enforcing the supported features.

.. _landlock_abi_versions:

Landlock ABI versions
---------------------

The Landlock ABI version can be read with the sys_landlock_create_ruleset()
system call:

.. code-block:: c

    int abi;

    abi = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
    if (abi < 0) {
        switch (errno) {
        case ENOSYS:
            printf("Landlock is not supported by the current kernel.\n");
            break;
        case EOPNOTSUPP:
            printf("Landlock is currently disabled.\n");
            break;
        }
        return 0;
    }
    if (abi >= 2) {
        printf("Landlock supports LANDLOCK_ACCESS_FS_REFER.\n");
    }

The following kernel interfaces are implicitly supported by the first ABI
version. Features only supported from a specific version are explicitly marked
as such.

Kernel interface
================

Access rights
-------------

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: fs_access

Creating a new ruleset
----------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_create_ruleset

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: landlock_ruleset_attr

Extending a ruleset
-------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_add_rule

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: landlock_rule_type landlock_path_beneath_attr

Enforcing a ruleset
-------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_restrict_self

Current limitations
===================

Filesystem topology modification
--------------------------------

As for file renaming and linking, a sandboxed thread cannot modify its
filesystem topology, whether via :manpage:`mount(2)` or
:manpage:`pivot_root(2)`. However, :manpage:`chroot(2)` calls are not denied.

Special filesystems
-------------------

Access to regular files and directories can be restricted by Landlock,
according to the handled accesses of a ruleset. However, files that do not
come from a user-visible filesystem (e.g. pipe, socket), but can still be
accessed through ``/proc/<pid>/fd/*``, cannot currently be explicitly
restricted. Likewise, some special kernel filesystems such as nsfs, which can
be accessed through ``/proc/<pid>/ns/*``, cannot currently be explicitly
restricted. However, thanks to the `ptrace restrictions`_, access to such
sensitive ``/proc`` files is automatically restricted according to domain
hierarchies. Future Landlock evolutions could still make it possible to
explicitly restrict such paths with dedicated ruleset flags.

Ruleset layers
--------------

There is a limit of 16 layers of stacked rulesets. This can be an issue for a
task willing to enforce a new ruleset in complement to its 16 inherited
rulesets. Once this limit is reached, sys_landlock_restrict_self() returns
E2BIG. It is then strongly suggested to carefully build rulesets once in the
life of a thread, especially for applications able to launch other applications
that may also want to sandbox themselves (e.g. shells, container managers,
etc.).
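
A caller may want to tell this case apart from other failures. The following
sketch assumes a hypothetical helper, ``enforce_ruleset()``, and invokes the
system call directly through :manpage:`syscall(2)`. How to react to ``E2BIG``
(here, reporting it and letting the caller continue with the inherited
restrictions) is an application choice, not something Landlock prescribes.

.. code-block:: c

    #include <errno.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: returns 0 if the ruleset is enforced, 1 if the
     * layer limit is already reached, and -1 for any other error.
     */
    static int enforce_ruleset(int ruleset_fd)
    {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
            return -1;
        if (!syscall(__NR_landlock_restrict_self, ruleset_fd, 0))
            return 0;
        if (errno == E2BIG) {
            /* The inherited layers still apply; only the new one is missing. */
            fprintf(stderr, "Landlock layer limit reached\n");
            return 1;
        }
        return -1;
    }

Returning a distinct value lets a launcher decide whether running with only the
inherited restrictions is acceptable.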

Memory usage
------------

Kernel memory allocated to create rulesets is accounted and can be restricted
by the Documentation/admin-guide/cgroup-v1/memory.rst.

Previous limitations
====================

File renaming and linking (ABI 1)
---------------------------------

Because Landlock targets unprivileged access controls, it needs to properly
handle composition of rules. Such a property also implies rule nesting.
Properly handling multiple layers of rulesets, each one of them able to
restrict access to files, also implies inheritance of the ruleset restrictions
from a parent to its hierarchy. Because files are identified and restricted by
their hierarchy, moving or linking a file from one directory to another implies
propagation of the hierarchy constraints, or restriction of these actions
according to the potentially lost constraints. To protect against privilege
escalations through renaming or linking, and for the sake of simplicity,
Landlock previously limited linking and renaming to the same directory.
Starting with the Landlock ABI version 2, it is now possible to securely
control renaming and linking thanks to the new `LANDLOCK_ACCESS_FS_REFER`
access right.

.. _kernel_support:

Kernel support
==============

Landlock was first introduced in Linux 5.13 but it must be configured at build
time with `CONFIG_SECURITY_LANDLOCK=y`. Landlock must also be enabled at boot
time, like the other security modules. The list of security modules enabled by
default is set with `CONFIG_LSM`. The kernel configuration should then
contain `CONFIG_LSM=landlock,[...]` with `[...]` as the list of other
potentially useful security modules for the running system (see the
`CONFIG_LSM` help).

If the running kernel doesn't have `landlock` in `CONFIG_LSM`, then we can
still enable it by adding ``lsm=landlock,[...]`` to
Documentation/admin-guide/kernel-parameters.rst through the bootloader
configuration.
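
Whether ``landlock`` ended up in the list of active security modules can also
be checked at run time. The sketch below assumes that securityfs is mounted on
``/sys/kernel/security`` and that its ``lsm`` file contains a comma-separated
list of names. ``lsm_list_has()`` and ``landlock_is_active()`` are hypothetical
helpers, not kernel interfaces.

.. code-block:: c

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: look for an exact name in a comma-separated list. */
    static bool lsm_list_has(const char *list, const char *name)
    {
        const size_t name_len = strlen(name);

        while (*list) {
            const size_t len = strcspn(list, ",\n");

            if (len == name_len && !strncmp(list, name, len))
                return true;
            list += len;
            if (*list)
                list++; /* skip the separator */
        }
        return false;
    }

    /* Assumption: securityfs is mounted on /sys/kernel/security. */
    static bool landlock_is_active(void)
    {
        char buf[256] = "";
        FILE *file = fopen("/sys/kernel/security/lsm", "r");

        if (!file)
            return false;
        if (!fgets(buf, sizeof(buf), file))
            buf[0] = '\0';
        fclose(file);
        return lsm_list_has(buf, "landlock");
    }

A ``false`` result only means that the name was not found this way. The ABI
version query described in this document remains the reference check for
applications.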

Questions and answers
=====================

What about user space sandbox managers?
---------------------------------------

Using a user space process to enforce restrictions on kernel resources can lead
to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of
the OS code and state
<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).

What about namespaces and containers?
-------------------------------------

Namespaces can help create sandboxes but they are not designed for
access control and thus miss useful features for such a use case (e.g. no
fine-grained restrictions). Moreover, their complexity can lead to security
issues, especially when untrusted processes can manipulate them (cf.
`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_).

Additional documentation
========================

* Documentation/security/landlock.rst
* https://landlock.io

.. Links
.. _samples/landlock/sandboxer.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/samples/landlock/sandboxer.c