0001 ====================
0002 Credentials in Linux
0003 ====================
0004 
0005 By: David Howells <dhowells@redhat.com>
0006 
0007 .. contents:: :local:
0008 
0009 Overview
0010 ========
0011 
0012 There are several parts to the security check performed by Linux when one
0013 object acts upon another:
0014 
0015  1. Objects.
0016 
0017      Objects are things in the system that may be acted upon directly by
0018      userspace programs.  Linux has a variety of actionable objects, including:
0019 
0020         - Tasks
0021         - Files/inodes
0022         - Sockets
0023         - Message queues
0024         - Shared memory segments
0025         - Semaphores
0026         - Keys
0027 
0028      As a part of the description of all these objects there is a set of
0029      credentials.  What's in the set depends on the type of object.
0030 
0031  2. Object ownership.
0032 
0033      Amongst the credentials of most objects, there will be a subset that
0034      indicates the ownership of that object.  This is used for resource
0035      accounting and limitation (disk quotas and task rlimits for example).
0036 
0037      In a standard UNIX filesystem, for instance, this will be defined by the
0038      UID marked on the inode.
0039 
0040  3. The objective context.
0041 
     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object.  This may or may not be
     the same set as in (2) - in standard UNIX files, for instance, this is
     defined by the UID and the GID marked on the inode.
0046 
0047      The objective context is used as part of the security calculation that is
0048      carried out when an object is acted upon.
0049 
0050  4. Subjects.
0051 
0052      A subject is an object that is acting upon another object.
0053 
0054      Most of the objects in the system are inactive: they don't act on other
0055      objects within the system.  Processes/tasks are the obvious exception:
0056      they do stuff; they access and manipulate things.
0057 
0058      Objects other than tasks may under some circumstances also be subjects.
0059      For instance an open file may send SIGIO to a task using the UID and EUID
0060      given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
0061      the file struct will have a subjective context too.
0062 
0063  5. The subjective context.
0064 
0065      A subject has an additional interpretation of its credentials.  A subset
0066      of its credentials forms the 'subjective context'.  The subjective context
0067      is used as part of the security calculation that is carried out when a
0068      subject acts.
0069 
0070      A Linux task, for example, has the FSUID, FSGID and the supplementary
0071      group list for when it is acting upon a file - which are quite separate
0072      from the real UID and GID that normally form the objective context of the
0073      task.
0074 
0075  6. Actions.
0076 
0077      Linux has a number of actions available that a subject may perform upon an
0078      object.  The set of actions available depends on the nature of the subject
0079      and the object.
0080 
     Actions include reading, writing, creating and deleting files; and
     forking, signalling and tracing tasks.
0083 
0084  7. Rules, access control lists and security calculations.
0085 
0086      When a subject acts upon an object, a security calculation is made.  This
0087      involves taking the subjective context, the objective context and the
0088      action, and searching one or more sets of rules to see whether the subject
0089      is granted or denied permission to act in the desired manner on the
0090      object, given those contexts.
0091 
0092      There are two main sources of rules:
0093 
0094      a. Discretionary access control (DAC):
0095 
0096          Sometimes the object will include sets of rules as part of its
0097          description.  This is an 'Access Control List' or 'ACL'.  A Linux
0098          file may supply more than one ACL.
0099 
0100          A traditional UNIX file, for example, includes a permissions mask that
0101          is an abbreviated ACL with three fixed classes of subject ('user',
0102          'group' and 'other'), each of which may be granted certain privileges
0103          ('read', 'write' and 'execute' - whatever those map to for the object
0104          in question).  UNIX file permissions do not allow the arbitrary
0105          specification of subjects, however, and so are of limited use.
0106 
0107          A Linux file might also sport a POSIX ACL.  This is a list of rules
0108          that grants various permissions to arbitrary subjects.
0109 
0110      b. Mandatory access control (MAC):
0111 
0112          The system as a whole may have one or more sets of rules that get
0113          applied to all subjects and objects, regardless of their source.
0114          SELinux and Smack are examples of this.
0115 
0116          In the case of SELinux and Smack, each object is given a label as part
0117          of its credentials.  When an action is requested, they take the
0118          subject label, the object label and the action and look for a rule
0119          that says that this action is either granted or denied.
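
As a rough illustration of the discretionary case described above, the following userspace sketch models how a traditional permissions mask might be evaluated from a subjective context, an objective context and a requested action.  The names ``subj_ctx``, ``obj_ctx``, ``dac_mode_permits()`` and the ``DEMO_MAY_*`` constants are invented for this example and are not kernel interfaces; capabilities, POSIX ACLs and LSM hooks are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model types; these are not kernel structures. */
struct subj_ctx {               /* subjective context of the acting task */
	uid_t fsuid;
	gid_t fsgid;
	const gid_t *groups;    /* supplementary groups */
	size_t ngroups;
};

struct obj_ctx {                /* objective context of a file */
	uid_t uid;
	gid_t gid;
	mode_t mode;            /* low nine bits: rwxrwxrwx */
};

/* Requested actions, encoded like one rwx triplet (assumed for this sketch). */
#define DEMO_MAY_EXEC  1u
#define DEMO_MAY_WRITE 2u
#define DEMO_MAY_READ  4u

static bool in_group(const struct subj_ctx *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/* Select exactly one class (user, group or other) and test its bits. */
static bool dac_mode_permits(const struct subj_ctx *s,
			     const struct obj_ctx *o, unsigned int want)
{
	unsigned int bits;

	if (s->fsuid == o->uid)
		bits = (o->mode >> 6) & 7u;
	else if (in_group(s, o->gid))
		bits = (o->mode >> 3) & 7u;
	else
		bits = o->mode & 7u;

	return (bits & want) == want;
}
```

Only one class is ever consulted in this model: an owner whose 'user' bits deny the action is not rescued by more permissive 'group' or 'other' bits.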
0120 
0121 
0122 Types of Credentials
0123 ====================
0124 
0125 The Linux kernel supports the following types of credentials:
0126 
0127  1. Traditional UNIX credentials.
0128 
0129         - Real User ID
0130         - Real Group ID
0131 
0132      The UID and GID are carried by most, if not all, Linux objects, even if in
0133      some cases it has to be invented (FAT or CIFS files for example, which are
0134      derived from Windows).  These (mostly) define the objective context of
0135      that object, with tasks being slightly different in some cases.
0136 
0137         - Effective, Saved and FS User ID
0138         - Effective, Saved and FS Group ID
0139         - Supplementary groups
0140 
0141      These are additional credentials used by tasks only.  Usually, an
0142      EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
0143      will be used as the objective.  For tasks, it should be noted that this is
0144      not always true.
0145 
0146  2. Capabilities.
0147 
0148         - Set of permitted capabilities
0149         - Set of inheritable capabilities
0150         - Set of effective capabilities
0151         - Capability bounding set
0152 
0153      These are only carried by tasks.  They indicate superior capabilities
0154      granted piecemeal to a task that an ordinary task wouldn't otherwise have.
0155      These are manipulated implicitly by changes to the traditional UNIX
0156      credentials, but can also be manipulated directly by the ``capset()``
0157      system call.
0158 
     The permitted capabilities are those caps that the process might grant
     itself to its effective or permitted sets through ``capset()``.  The
     inheritable set might also be so constrained.
0162 
0163      The effective capabilities are the ones that a task is actually allowed to
0164      make use of itself.
0165 
0166      The inheritable capabilities are the ones that may get passed across
0167      ``execve()``.
0168 
0169      The bounding set limits the capabilities that may be inherited across
0170      ``execve()``, especially when a binary is executed that will execute as
0171      UID 0.
0172 
0173  3. Secure management flags (securebits).
0174 
0175      These are only carried by tasks.  These govern the way the above
0176      credentials are manipulated and inherited over certain operations such as
0177      execve().  They aren't used directly as objective or subjective
0178      credentials.
0179 
0180  4. Keys and keyrings.
0181 
0182      These are only carried by tasks.  They carry and cache security tokens
0183      that don't fit into the other standard UNIX credentials.  They are for
0184      making such things as network filesystem keys available to the file
0185      accesses performed by processes, without the necessity of ordinary
0186      programs having to know about security details involved.
0187 
0188      Keyrings are a special type of key.  They carry sets of other keys and can
0189      be searched for the desired key.  Each process may subscribe to a number
0190      of keyrings:
0191 
        Per-thread keyring
0193         Per-process keyring
0194         Per-session keyring
0195 
0196      When a process accesses a key, if not already present, it will normally be
0197      cached on one of these keyrings for future accesses to find.
0198 
0199      For more information on using keys, see ``Documentation/security/keys/*``.
0200 
0201  5. LSM
0202 
0203      The Linux Security Module allows extra controls to be placed over the
0204      operations that a task may do.  Currently Linux supports several LSM
0205      options.
0206 
0207      Some work by labelling the objects in a system and then applying sets of
0208      rules (policies) that say what operations a task with one label may do to
0209      an object with another label.
0210 
0211  6. AF_KEY
0212 
0213      This is a socket-based approach to credential management for networking
0214      stacks [RFC 2367].  It isn't discussed by this document as it doesn't
0215      interact directly with task and file credentials; rather it keeps system
0216      level credentials.
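
The way the capability sets in (2) constrain one another can be sketched as a small userspace model.  It assumes the subset rules given in the capset(2) manual page for a caller that does not hold CAP_SETPCAP; ``struct cap_sets`` and ``capset_would_be_allowed()`` are hypothetical names, and the bit values used with them are arbitrary rather than real capability numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a task's capability sets as plain bitmasks. */
struct cap_sets {
	uint64_t permitted;
	uint64_t inheritable;
	uint64_t effective;
	uint64_t bounding;
};

static bool is_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Would a capset()-style request be accepted?  Assumed rules, for a
 * caller without CAP_SETPCAP:
 *   - new permitted   must be a subset of old permitted
 *   - new effective   must be a subset of new permitted
 *   - new inheritable must be a subset of old inheritable | old permitted
 *   - new inheritable must be a subset of old inheritable | bounding set
 */
static bool capset_would_be_allowed(const struct cap_sets *old,
				    uint64_t new_effective,
				    uint64_t new_permitted,
				    uint64_t new_inheritable)
{
	if (!is_subset(new_permitted, old->permitted))
		return false;
	if (!is_subset(new_effective, new_permitted))
		return false;
	if (!is_subset(new_inheritable, old->inheritable | old->permitted))
		return false;
	if (!is_subset(new_inheritable, old->inheritable | old->bounding))
		return false;
	return true;
}
```

In this model a task can always drop bits from its permitted set but can never add to it, which matches the description of capabilities as privileges granted piecemeal.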
0217 
0218 
0219 When a file is opened, part of the opening task's subjective context is
0220 recorded in the file struct created.  This allows operations using that file
0221 struct to use those credentials instead of the subjective context of the task
0222 that issued the operation.  An example of this would be a file opened on a
0223 network filesystem where the credentials of the opened file should be presented
0224 to the server, regardless of who is actually doing a read or a write upon it.
0225 
0226 
0227 File Markings
0228 =============
0229 
0230 Files on disk or obtained over the network may have annotations that form the
0231 objective security context of that file.  Depending on the type of filesystem,
0232 this may include one or more of the following:
0233 
0234  * UNIX UID, GID, mode;
0235  * Windows user ID;
0236  * Access control list;
0237  * LSM security label;
0238  * UNIX exec privilege escalation bits (SUID/SGID);
0239  * File capabilities exec privilege escalation bits.
0240 
0241 These are compared to the task's subjective security context, and certain
0242 operations allowed or disallowed as a result.  In the case of execve(), the
0243 privilege escalation bits come into play, and may allow the resulting process
0244 extra privileges, based on the annotations on the executable file.
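
The effect of the SUID/SGID escalation bits on ``execve()`` can be approximated with the userspace model below.  It is a simplified sketch: ``struct task_ids``, ``struct exe_marks`` and ``ids_after_exec()`` are invented names, and mount options, file capabilities, securebits and LSM decisions are all ignored.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical model of the IDs a task carries. */
struct task_ids {
	uid_t uid, euid, suid, fsuid;
	gid_t gid, egid, sgid, fsgid;
};

/* Hypothetical model of the markings on an executable file. */
struct exe_marks {
	uid_t uid;
	gid_t gid;
	mode_t mode;
};

/* Compute the IDs the task would have after executing the file. */
static struct task_ids ids_after_exec(struct task_ids t,
				      const struct exe_marks *f)
{
	/* SUID: the effective UID becomes the file's owner. */
	if (f->mode & S_ISUID)
		t.euid = f->uid;

	/* SGID is only honoured when the group-execute bit is also set. */
	if ((f->mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
		t.egid = f->gid;

	/* Saved and filesystem IDs follow the effective IDs across exec. */
	t.suid = t.fsuid = t.euid;
	t.sgid = t.fsgid = t.egid;

	/* The real UID and GID are left untouched. */
	return t;
}
```

The real IDs survive the exec, which is what lets a set-UID program discover who invoked it.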
0245 
0246 
0247 Task Credentials
0248 ================
0249 
In Linux, all of a task's credentials are held either directly (uid, gid) in,
or by reference (groups, keys, LSM security) through, a refcounted structure
of type 'struct cred'.  Each task points to its credentials by a pointer called
'cred' in its task_struct.
0254 
0255 Once a set of credentials has been prepared and committed, it may not be
0256 changed, barring the following exceptions:
0257 
0258  1. its reference count may be changed;
0259 
0260  2. the reference count on the group_info struct it points to may be changed;
0261 
0262  3. the reference count on the security data it points to may be changed;
0263 
0264  4. the reference count on any keyrings it points to may be changed;
0265 
0266  5. any keyrings it points to may be revoked, expired or have their security
0267     attributes changed; and
0268 
0269  6. the contents of any keyrings to which it points may be changed (the whole
0270     point of keyrings being a shared set of credentials, modifiable by anyone
0271     with appropriate access).
0272 
0273 To alter anything in the cred struct, the copy-and-replace principle must be
0274 adhered to.  First take a copy, then alter the copy and then use RCU to change
0275 the task pointer to make it point to the new copy.  There are wrappers to aid
0276 with this (see below).
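
The copy-and-replace principle can be illustrated with a single-threaded userspace model.  This is only a sketch: ``model_cred``, ``model_task``, ``model_prepare()``, ``model_commit()`` and ``model_put()`` are hypothetical stand-ins, a plain pointer assignment takes the place of the RCU update, and the old copy is freed immediately rather than after a grace period.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, much reduced stand-in for a credentials structure. */
struct model_cred {
	int usage;              /* reference count */
	uid_t uid, euid, suid;
};

struct model_task {
	const struct model_cred *cred;  /* committed creds are read-only */
};

/* Drop one reference; free the structure when the last one goes. */
static void model_put(const struct model_cred *c)
{
	struct model_cred *m = (struct model_cred *)c;

	if (--m->usage == 0)
		free(m);
}

/* Step 1: take a private, mutable copy of the task's credentials. */
static struct model_cred *model_prepare(const struct model_task *t)
{
	struct model_cred *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	memcpy(copy, t->cred, sizeof(*copy));
	copy->usage = 1;
	return copy;
}

/* Step 3: publish the altered copy and release the task's old reference. */
static int model_commit(struct model_task *t, struct model_cred *copy)
{
	const struct model_cred *old = t->cred;

	t->cred = copy;         /* the kernel uses an RCU pointer update here */
	model_put(old);
	return 0;
}
```

Step 2, altering the copy, happens between the two calls.  Anyone still holding a reference to the old structure keeps seeing the old, unmodified values.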
0277 
A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials.  This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process.  Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
no longer permit attachment to process-specific keyrings in the requesting
process, as the instantiating process may need to create them.
0284 
0285 
0286 Immutable Credentials
0287 ---------------------
0288 
0289 Once a set of credentials has been made public (by calling ``commit_creds()``
0290 for example), it must be considered immutable, barring two exceptions:
0291 
0292  1. The reference count may be altered.
0293 
0294  2. While the keyring subscriptions of a set of credentials may not be
0295     changed, the keyrings subscribed to may have their contents altered.
0296 
To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file.  Furthermore,
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, which makes casts unnecessary for their callers, but they have to
temporarily cast away the const qualification internally in order to alter the
reference count.
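
The const-pointer pattern described above might look roughly like this in a userspace model.  ``ro_cred``, ``ro_get()`` and ``ro_put()`` are hypothetical names chosen for the sketch, and the counter is a plain ``int`` rather than an atomic type.

```c
#include <assert.h>

/* Hypothetical stand-in for an immutable, refcounted credentials set. */
struct ro_cred {
	int usage;              /* the one field that may still change */
	unsigned int uid;
};

/*
 * Take a reference through a const pointer.  Callers need no cast; the
 * helper drops the qualifier internally, and only to touch the counter.
 */
static const struct ro_cred *ro_get(const struct ro_cred *cred)
{
	struct ro_cred *nonconst = (struct ro_cred *)cred;

	nonconst->usage++;
	return cred;
}

/* Drop a reference and report how many remain. */
static int ro_put(const struct ro_cred *cred)
{
	struct ro_cred *nonconst = (struct ro_cred *)cred;

	return --nonconst->usage;
}
```

With a ``const struct ro_cred *view`` in hand, an assignment such as ``view->uid = 0`` is rejected by the compiler, whereas ``ro_get(view)`` and ``ro_put(view)`` compile without any cast at the call site.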
0302 
0303 
0304 Accessing Task Credentials
0305 --------------------------
0306 
0307 A task being able to alter only its own credentials permits the current process
0308 to read or replace its own credentials without the need for any form of locking
0309 -- which simplifies things greatly.  It can just call::
0310 
        const struct cred *current_cred()
0312 
0313 to get a pointer to its credentials structure, and it doesn't have to release
0314 it afterwards.
0315 
0316 There are convenience wrappers for retrieving specific aspects of a task's
0317 credentials (the value is simply returned in each case)::
0318 
        uid_t current_uid(void)         Current's real UID
        gid_t current_gid(void)         Current's real GID
        uid_t current_euid(void)        Current's effective UID
        gid_t current_egid(void)        Current's effective GID
        uid_t current_fsuid(void)       Current's file access UID
        gid_t current_fsgid(void)       Current's file access GID
        kernel_cap_t current_cap(void)  Current's effective capabilities
        struct user_struct *current_user(void)  Current's user account
0327 
0328 There are also convenience wrappers for retrieving specific associated pairs of
0329 a task's credentials::
0330 
        void current_uid_gid(uid_t *, gid_t *);
        void current_euid_egid(uid_t *, gid_t *);
        void current_fsuid_fsgid(uid_t *, gid_t *);
0334 
0335 which return these pairs of values through their arguments after retrieving
0336 them from the current task's credentials.
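
For comparison, a process can read the same values about itself from userspace.  The sketch below is not the in-kernel API described above; it uses the ordinary ``getuid()`` family and relies on the documented behaviour that ``setfsuid()`` and ``setfsgid()`` return the previous filesystem ID, so passing an invalid ID of -1 reads the value without changing it.  ``struct id_snapshot`` and ``snapshot_ids()`` are names made up for the example.

```c
#include <assert.h>
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the IDs of the calling process. */
struct id_snapshot {
	uid_t ruid, euid, fsuid;
	gid_t rgid, egid, fsgid;
};

/* Read the caller's real, effective and filesystem IDs. */
static void snapshot_ids(struct id_snapshot *s)
{
	s->ruid = getuid();
	s->euid = geteuid();
	s->rgid = getgid();
	s->egid = getegid();

	/* -1 is never a valid ID, so these calls only report the old value. */
	s->fsuid = (uid_t)setfsuid((uid_t)-1);
	s->fsgid = (gid_t)setfsgid((gid_t)-1);
}
```

As with the in-kernel helpers, no locking is needed here because a process is only ever inspecting its own credentials.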
0337 
0338 
0339 In addition, there is a function for obtaining a reference on the current
0340 process's current set of credentials::
0341 
        const struct cred *get_current_cred(void);
0343 
0344 and functions for getting references to one of the credentials that don't
0345 actually live in struct cred::
0346 
        struct user_struct *get_current_user(void);
        struct group_info *get_current_groups(void);
0349 
0350 which get references to the current process's user accounting structure and
0351 supplementary groups list respectively.
0352 
0353 Once a reference has been obtained, it must be released with ``put_cred()``,
0354 ``free_uid()`` or ``put_group_info()`` as appropriate.
0355 
0356 
0357 Accessing Another Task's Credentials
0358 ------------------------------------
0359 
0360 While a task may access its own credentials without the need for locking, the
0361 same is not true of a task wanting to access another task's credentials.  It
0362 must use the RCU read lock and ``rcu_dereference()``.
0363 
0364 The ``rcu_dereference()`` is wrapped by::
0365 
        const struct cred *__task_cred(struct task_struct *task);
0367 
0368 This should be used inside the RCU read lock, as in the following example::
0369 
        void foo(struct task_struct *t, struct foo_data *f)
        {
                const struct cred *tcred;
                ...
                rcu_read_lock();
                /* tcred is only guaranteed valid until rcu_read_unlock() */
                tcred = __task_cred(t);
                f->uid = tcred->uid;
                f->gid = tcred->gid;
                /* take a reference so the group list outlives the lock */
                f->groups = get_group_info(tcred->groups);
                rcu_read_unlock();
                ...
        }
0382 
0383 Should it be necessary to hold another task's credentials for a long period of
0384 time, and possibly to sleep while doing so, then the caller should get a
0385 reference on them using::
0386 
        const struct cred *get_task_cred(struct task_struct *task);
0388 
0389 This does all the RCU magic inside of it.  The caller must call put_cred() on
0390 the credentials so obtained when they're finished with.
0391 
.. note::
   The result of ``__task_cred()`` should not be passed directly to
   ``get_cred()`` as this may race with ``commit_creds()``.
0395 
0396 There are a couple of convenience functions to access bits of another task's
0397 credentials, hiding the RCU magic from the caller::
0398 
        uid_t task_uid(task)            Task's real UID
        uid_t task_euid(task)           Task's effective UID
0401 
0402 If the caller is holding the RCU read lock at the time anyway, then::
0403 
        __task_cred(task)->uid
        __task_cred(task)->euid
0406 
should be used instead.  Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, ``__task_cred()``
called, the result stored in a temporary pointer and the credential aspects
read through that pointer before dropping the lock.  This prevents the
potentially expensive RCU magic from being invoked multiple times.
0412 
0413 Should some other single aspect of another task's credentials need to be
0414 accessed, then this can be used::
0415 
        task_cred_xxx(task, member)
0417 
0418 where 'member' is a non-pointer member of the cred struct.  For instance::
0419 
        uid_t task_cred_xxx(task, suid);
0421 
0422 will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
0423 magic.  This may not be used for pointer members as what they point to may
0424 disappear the moment the RCU read lock is dropped.
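
Outside the kernel, the nearest equivalent of reading another task's credentials is the ``Uid:`` line of ``/proc/<pid>/status``, which lists the real, effective, saved and filesystem UIDs.  The sketch below assumes procfs is mounted at ``/proc``; ``struct proc_uids`` and ``read_proc_uids()`` are hypothetical names, and none of this uses the RCU-based interfaces described above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the four UIDs reported by procfs. */
struct proc_uids {
	unsigned int real, effective, saved, fs;
};

/* Parse the "Uid:" line of /proc/<pid>/status; 0 on success, -1 on error. */
static int read_proc_uids(pid_t pid, struct proc_uids *out)
{
	char path[64];
	char line[256];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* Only the "Uid:" line yields all four conversions. */
		if (sscanf(line, "Uid: %u %u %u %u", &out->real,
			   &out->effective, &out->saved, &out->fs) == 4) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

The values are a snapshot: the target may change its credentials immediately after the file is read, much as a pointer obtained under the RCU read lock is no longer guaranteed valid once the lock is dropped.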
0425 
0426 
0427 Altering Credentials
0428 --------------------
0429 
0430 As previously mentioned, a task may only alter its own credentials, and may not
0431 alter those of another task.  This means that it doesn't need to use any
0432 locking to alter its own credentials.
0433 
0434 To alter the current process's credentials, a function should first prepare a
0435 new set of credentials by calling::
0436 
        struct cred *prepare_creds(void);
0438 
0439 this locks current->cred_replace_mutex and then allocates and constructs a
0440 duplicate of the current process's credentials, returning with the mutex still
0441 held if successful.  It returns NULL if not successful (out of memory).
0442 
0443 The mutex prevents ``ptrace()`` from altering the ptrace state of a process
0444 while security checks on credentials construction and changing is taking place
0445 as the ptrace state may alter the outcome, particularly in the case of
0446 ``execve()``.
0447 
0448 The new credentials set should be altered appropriately, and any security
0449 checks and hooks done.  Both the current and the proposed sets of credentials
0450 are available for this purpose as current_cred() will return the current set
0451 still at this point.
0452 
When replacing the group list, the new list must be sorted before it
is added to the credential, as a binary search is used to test for
membership.  In practice, this means ``groups_sort()`` should be
called before ``set_groups()`` or ``set_current_groups()``.
``groups_sort()`` must not be called on a ``struct group_info`` which
is shared, as it may permute elements as part of the sorting process
even if the array is already sorted.
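
Why the sort matters can be seen in a small userspace model of a sorted group list with a binary-search membership test.  ``demo_groups_sort()`` and ``demo_groups_search()`` are hypothetical stand-ins written for this sketch, not the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Three-way comparison of two GIDs for qsort(). */
static int gid_cmp(const void *a, const void *b)
{
	gid_t x = *(const gid_t *)a;
	gid_t y = *(const gid_t *)b;

	return (x > y) - (x < y);
}

/* Sort a private (unshared) GID array in place. */
static void demo_groups_sort(gid_t *gids, size_t n)
{
	qsort(gids, n, sizeof(*gids), gid_cmp);
}

/* Binary search; only gives correct answers on a sorted array. */
static bool demo_groups_search(const gid_t *gids, size_t n, gid_t gid)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (gids[mid] < gid)
			lo = mid + 1;
		else if (gids[mid] > gid)
			hi = mid;
		else
			return true;
	}
	return false;
}
```

On the unsorted array ``{50, 10, 40, 20, 30}`` a search for 50 fails even though the value is present; after ``demo_groups_sort()`` the same search succeeds.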
0460 
0461 When the credential set is ready, it should be committed to the current process
0462 by calling::
0463 
        int commit_creds(struct cred *new);
0465 
0466 This will alter various aspects of the credentials and the process, giving the
0467 LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
0468 actually commit the new credentials to ``current->cred``, it will release
0469 ``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
0470 will notify the scheduler and others of the changes.
0471 
0472 This function is guaranteed to return 0, so that it can be tail-called at the
0473 end of such functions as ``sys_setresuid()``.
0474 
0475 Note that this function consumes the caller's reference to the new credentials.
0476 The caller should _not_ call ``put_cred()`` on the new credentials afterwards.
0477 
0478 Furthermore, once this function has been called on a new set of credentials,
0479 those credentials may _not_ be changed further.
0480 
0481 
0482 Should the security checks fail or some other error occur after
0483 ``prepare_creds()`` has been called, then the following function should be
0484 invoked::
0485 
        void abort_creds(struct cred *new);
0487 
0488 This releases the lock on ``current->cred_replace_mutex`` that
0489 ``prepare_creds()`` got and then releases the new credentials.
0490 
0491 
0492 A typical credentials alteration function would look something like this::
0493 
        int alter_suid(uid_t suid)
        {
                struct cred *new;
                int ret;

                /* Duplicate current's credentials into a mutable copy */
                new = prepare_creds();
                if (!new)
                        return -ENOMEM;

                new->suid = suid;
                ret = security_alter_suid(new);
                if (ret < 0) {
                        /* Discard the copy; current's credentials are untouched */
                        abort_creds(new);
                        return ret;
                }

                /* Consumes the reference on new; always returns 0 */
                return commit_creds(new);
        }
0512 
0513 
0514 Managing Credentials
0515 --------------------
0516 
0517 There are some functions to help manage credentials:
0518 
0519  - ``void put_cred(const struct cred *cred);``
0520 
0521      This releases a reference to the given set of credentials.  If the
0522      reference count reaches zero, the credentials will be scheduled for
0523      destruction by the RCU system.
0524 
0525  - ``const struct cred *get_cred(const struct cred *cred);``
0526 
0527      This gets a reference on a live set of credentials, returning a pointer to
0528      that set of credentials.
0529 
0530  - ``struct cred *get_new_cred(struct cred *cred);``
0531 
0532      This gets a reference on a set of credentials that is under construction
0533      and is thus still mutable, returning a pointer to that set of credentials.
0534 
0535 
0536 Open File Credentials
0537 =====================
0538 
0539 When a new file is opened, a reference is obtained on the opening task's
0540 credentials and this is attached to the file struct as ``f_cred`` in place of
0541 ``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
0542 ``file->f_gid`` should now access ``file->f_cred->fsuid`` and
0543 ``file->f_cred->fsgid``.
0544 
0545 It is safe to access ``f_cred`` without the use of RCU or locking because the
0546 pointer will not change over the lifetime of the file struct, and nor will the
0547 contents of the cred struct pointed to, barring the exceptions listed above
0548 (see the Task Credentials section).
0549 
0550 To avoid "confused deputy" privilege escalation attacks, access control checks
0551 during subsequent operations on an opened file should use these credentials
0552 instead of "current"'s credentials, as the file may have been passed to a more
0553 privileged process.
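
The confused deputy problem can be made concrete with a toy model in which an unprivileged task opens a file and hands it to a privileged one.  Everything here is hypothetical (``opener_cred``, ``open_file``, ``check_with_caller()``, ``check_with_f_cred()``), and treating fsuid 0 as all-powerful is a simplification of what capabilities really do.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, minimal credentials: just a filesystem UID. */
struct opener_cred {
	uid_t fsuid;
};

/* Hypothetical open file that remembers who opened it. */
struct open_file {
	const struct opener_cred *f_cred;   /* captured once, at open time */
};

static struct open_file model_open(const struct opener_cred *opener)
{
	struct open_file f = { opener };

	return f;
}

/* Toy policy: fsuid 0 or the resource's owner may modify it. */
static bool cred_may_modify(const struct opener_cred *c, uid_t owner)
{
	return c->fsuid == 0 || c->fsuid == owner;
}

/* Unsafe variant: trusts whoever happens to be using the file now. */
static bool check_with_caller(const struct open_file *f,
			      const struct opener_cred *caller, uid_t owner)
{
	(void)f;
	return cred_may_modify(caller, owner);
}

/* Safer variant: uses the credentials recorded when the file was opened. */
static bool check_with_f_cred(const struct open_file *f,
			      const struct opener_cred *caller, uid_t owner)
{
	(void)caller;
	return cred_may_modify(f->f_cred, owner);
}
```

When a file opened by UID 1000 is later operated on by a task with fsuid 0, the caller-based check would allow modifying a root-owned resource, whereas the check against the recorded opener credentials refuses it.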
0554 
0555 Overriding the VFS's Use of Credentials
0556 =======================================
0557 
Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as ``vfs_mkdir()``
with a different set of credentials.  This is done in the following places:
0561 
0562  * ``sys_faccessat()``.
0563  * ``do_coredump()``.
0564  * nfs4recover.c.
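
The ``sys_faccessat()`` case is visible from userspace: by default the access check is made against the caller's real UID and GID rather than the IDs normally used for file access, and the ``AT_EACCESS`` flag asks for the effective IDs instead.  The following sketch probes a path both ways; ``struct access_views`` and ``probe_access()`` are names invented for the example, and the link to the kernel's credential override is an interpretation of the list above rather than something the sketch demonstrates directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical result pair: the same check under two sets of IDs. */
struct access_views {
	bool as_real;           /* checked with real UID/GID (the default) */
	bool as_effective;      /* checked with effective UID/GID */
};

/* Probe a path for the given access mode (F_OK, R_OK, W_OK, X_OK). */
static struct access_views probe_access(const char *path, int mode)
{
	struct access_views v;

	v.as_real = faccessat(AT_FDCWD, path, mode, 0) == 0;
	v.as_effective = faccessat(AT_FDCWD, path, mode, AT_EACCESS) == 0;
	return v;
}
```

For an ordinary process the two answers agree; they can differ inside a set-UID or set-GID program, where the real and effective IDs are not the same.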