======================
No New Privileges Flag
======================

The execve system call can grant a newly-started program privileges that
its parent did not have.  The most obvious examples are setuid/setgid
programs and file capabilities.  To prevent the parent program from
gaining these privileges as well, the kernel and user code must be
careful to prevent the parent from doing anything that could subvert the
child.  For example:

 - The dynamic loader handles ``LD_*`` environment variables differently if
   a program is setuid.

 - chroot is disallowed to unprivileged processes, since it would allow
   ``/etc/passwd`` to be replaced from the point of view of a process that
   inherited chroot.

 - The exec code has special handling for ptrace.

These are all ad-hoc fixes.  The ``no_new_privs`` bit (since Linux 3.5) is a
new, generic mechanism to make it safe for a process to modify its
execution environment in a manner that persists across execve.  Any task
can set ``no_new_privs``.  Once the bit is set, it is inherited across fork,
clone, and execve and cannot be unset.  With ``no_new_privs`` set, ``execve()``
promises not to grant the privilege to do anything that could not have
been done without the execve call.  For example, the setuid and setgid
bits will no longer change the uid or gid; file capabilities will not
add to the permitted set, and LSMs will not relax constraints after
execve.

To set ``no_new_privs``, use::

    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

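As an illustration, the following sketch sets the bit, reads it back with
``PR_GET_NO_NEW_PRIVS``, and checks the two properties described above: the
bit cannot be cleared, and it is inherited across fork.  The helper names
(``nnp_enable`` and so on) are made up for this example and are not part of
any kernel or libc interface; the fallback ``#define`` values are the ones
found in ``<linux/prctl.h>``.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Older libc headers may lack these; values are from <linux/prctl.h>. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/* Hypothetical helper: set the bit for the calling thread.
 * Returns 0 on success, -1 with errno set on failure. */
static int nnp_enable(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Hypothetical helper: returns 1 if the bit is set, 0 if it is clear,
 * -1 with errno set on failure. */
static int nnp_query(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/* Hypothetical helper: try to clear the bit.  The kernel only accepts
 * the value 1, so this is expected to fail with EINVAL.
 * Returns 1 if the request was refused that way, 0 otherwise. */
static int nnp_clear_is_refused(void)
{
    errno = 0;
    return prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) == -1 && errno == EINVAL;
}

/* Hypothetical helper: fork and report the child's view of the bit.
 * Returns 1 or 0 as seen by the child, -1 on any failure. */
static int nnp_query_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: pass the flag value back as the exit status (2 = error). */
        int v = nnp_query();
        _exit(v < 0 ? 2 : v);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) <= 1 ? WEXITSTATUS(status) : -1;
}
```

A caller would typically invoke ``nnp_enable()`` once, early, before
installing filters or execing untrusted programs.  After that,
``nnp_query()`` should return 1 in the task itself and in any child it
forks.
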
Be careful, though: LSMs might also not tighten constraints on exec
in ``no_new_privs`` mode.  (This means that setting up a general-purpose
service launcher to set ``no_new_privs`` before execing daemons may
interfere with LSM-based sandboxing.)

Note that ``no_new_privs`` does not prevent privilege changes that do not
involve ``execve()``.  An appropriately privileged task can still call
``setuid(2)`` and receive SCM_RIGHTS datagrams.

There are two main use cases for ``no_new_privs`` so far:

 - Filters installed for the seccomp mode 2 sandbox persist across
   execve and can change the behavior of newly-executed programs.
   Unprivileged users are therefore only allowed to install such filters
   if ``no_new_privs`` is set.

 - By itself, ``no_new_privs`` can be used to reduce the attack surface
   available to an unprivileged user.  If everything running with a
   given uid has ``no_new_privs`` set, then that uid will be unable to
   escalate its privileges by directly attacking setuid, setgid, and
   fcap-using binaries; it will need to compromise something without the
   ``no_new_privs`` bit set first.

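The seccomp case can be sketched as follows.  This is a minimal example,
assuming a kernel built with seccomp filter support.  The function names are
hypothetical, and the one-instruction filter simply allows every system
call, which is useful only to show the ordering requirement.  Without
``CAP_SYS_ADMIN``, installing the filter before setting ``no_new_privs`` is
expected to fail with ``EACCES``.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Older libc headers may lack this; value is from <linux/prctl.h>. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif

/* Hypothetical helper: install a trivial seccomp filter that allows
 * every system call.  Returns 0 on success, -1 with errno set on failure. */
static int install_allow_all_filter(void)
{
    struct sock_filter insns[] = {
        /* Single instruction: return SECCOMP_RET_ALLOW unconditionally. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };

    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical helper: set no_new_privs first, then install the filter.
 * This ordering is what lets an unprivileged task install a filter.
 * Returns 0 on success, -1 with errno set on failure. */
static int sandbox_unprivileged(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return install_allow_all_filter();
}
```

A real filter would normally validate the architecture and inspect the
system call number before returning a verdict; the allow-all program here
only keeps the example short.
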
In the future, other potentially dangerous kernel features could become
available to unprivileged tasks if ``no_new_privs`` is set.  In principle,
several options to ``unshare(2)`` and ``clone(2)`` would be safe when
``no_new_privs`` is set, and ``no_new_privs`` + ``chroot`` is considerably less
dangerous than chroot by itself.