.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag; typically
this is the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are kept in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially, the
fdtable is embedded in files_struct itself. When the table
is later expanded, a new fdtable structure is allocated
and files->fdt is updated to point to it. The old fdtable
is freed via RCU, so lock-free readers see either the old
fdtable or the new one, which makes the update appear
atomic. The locking rules for the fdtable structure are
as follows:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

        struct fdtable *fdt;

        rcu_read_lock();

        fdt = files_fdtable(files);
        ....
        if (n < fdt->max_fds)   /* valid fds are 0 .. max_fds - 1 */
                ....
        ...
        rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fd_rcu() or the files_lookup_fd_rcu() API.
   These take care of the barrier requirements of lock-free lookup.

   An example::

        struct file *file;

        rcu_read_lock();
        file = lookup_fd_rcu(fd);
        if (file) {
                ...
        }
        ....
        rcu_read_unlock();

5. Handling of the file structures is special. Since fd look-ups
   (fget()/fget_light()) are lock-free, a look-up may race with
   the last put() operation on the file structure. This is avoided
   by using atomic_long_inc_not_zero() on ->f_count::

        rcu_read_lock();
        file = files_lookup_fd_rcu(files, fd);
        if (file) {
                if (atomic_long_inc_not_zero(&file->f_count))
                        *fput_needed = 1;
                else
                        /* Didn't get the reference, someone's freed */
                        file = NULL;
        }
        rcu_read_unlock();
        ....
        return file;

   atomic_long_inc_not_zero() detects whether the refcount is already
   zero or drops to zero during the increment. If it does, the look-up
   fails and fget()/fget_light() returns NULL.

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fd_rcu()/files_lookup_fd_rcu(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the fd table, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

        spin_lock(&files->file_lock);
        fd = locate_fd(files, file, start);
        if (fd >= 0) {
                /* locate_fd() may have expanded fdtable, reload the ptr */
                fdt = files_fdtable(files);
                __set_open_fd(fd, fdt);
                __clear_close_on_exec(fd, fdt);
        }
        spin_unlock(&files->file_lock);
        .....

   Since locate_fd() can drop ->file_lock (and reacquire it), the
   fdtable pointer (fdt) must be loaded after locate_fd() returns.