======================
ioctl based interfaces
======================

ioctl() is the most common way for applications to interface
with device drivers. It is flexible and easily extended by adding new
commands and can be passed through character devices, block devices as
well as sockets and other special file descriptors.

However, it is also very easy to get ioctl command definitions wrong,
and hard to fix them later without breaking existing applications,
so this documentation tries to help developers get it right.

Command number definitions
==========================

The command number, or request number, is the second argument passed to
the ioctl system call. While this can be any 32-bit number that uniquely
identifies an action for a particular driver, there are a number of
conventions around defining them.

``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
``_IOW``, and ``_IOWR``. These should be used for all new commands,
with the correct parameters:

_IO/_IOR/_IOW/_IOWR
   The macro name specifies how the argument will be used.  It may be a
   pointer to data to be passed into the kernel (_IOW), out of the kernel
   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
   argument or those passing an integer value instead of a pointer.
   It is recommended to only use _IO for commands without arguments,
   and use pointers for passing data.

type
   An 8-bit number, often a character literal, specific to a subsystem
   or driver, and listed in Documentation/userspace-api/ioctl/ioctl-number.rst.

nr
  An 8-bit number identifying the specific command, unique for a given
  value of 'type'.

data_type
  The name of the data type pointed to by the argument. The command number
  encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
  depending on the architecture, leading to a limit of 8191 bytes for the
  maximum size of the argument.
  Note: do not pass ``sizeof(data_type)`` into _IOR/_IOW/_IOWR, as that
  will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
  _IO does not have a data_type parameter.
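The parameters described above can be sketched from user space as follows.
This is a minimal illustration, not taken from any real driver: the ``'F'``
type byte, the ``FOO_*`` command names and ``struct foo_config`` are all
hypothetical. The last definition shows the ``sizeof`` mistake mentioned in
the note.

```c
#include <stddef.h>        /* size_t */
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */
#include <linux/types.h>   /* __u32, __u64 */

/* Hypothetical argument structure for an imaginary "foo" driver. */
struct foo_config {
    __u64 buffer_size;
    __u32 flags;
    __u32 reserved;
};

/* 'F' is an arbitrary type byte chosen for this example only. */
#define FOO_IOC_MAGIC    'F'

/* No argument at all. */
#define FOO_RESET        _IO(FOO_IOC_MAGIC, 0)
/* Data flows out of the kernel. */
#define FOO_GET_CONFIG   _IOR(FOO_IOC_MAGIC, 1, struct foo_config)
/* Data flows into the kernel. */
#define FOO_SET_CONFIG   _IOW(FOO_IOC_MAGIC, 2, struct foo_config)
/* Data flows in both directions. */
#define FOO_XCHG_CONFIG  _IOWR(FOO_IOC_MAGIC, 3, struct foo_config)

/* Wrong: encodes sizeof(size_t) instead of sizeof(struct foo_config). */
#define FOO_SET_CONFIG_BAD \
    _IOW(FOO_IOC_MAGIC, 2, sizeof(struct foo_config))
```

The ``_IOC_DIR()``, ``_IOC_TYPE()``, ``_IOC_NR()`` and ``_IOC_SIZE()`` macros
from the same header decode a command number again, which is a convenient way
to check that a definition encodes what was intended.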


Interface versions
==================

Some subsystems use version numbers in data structures to overload
commands with different interpretations of the argument.

This is generally a bad idea, since changes to existing commands tend
to break existing applications.

A better approach is to add a new ioctl command with a new number. The
old command still needs to be implemented in the kernel for compatibility,
but this can be a wrapper around the new implementation.
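A minimal sketch of that approach is shown here, written as ordinary C so it
can be compiled outside the kernel. The command names, both structures and
the ``foo_current`` state variable are hypothetical; a real handler would
first copy the argument from user space.

```c
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical original argument: a rate only. */
struct foo_rate_v1 {
    __u32 rate;
};

/* Hypothetical extended argument, added later under a new command number. */
struct foo_rate_v2 {
    __u32 rate;
    __u32 flags;
};

#define FOO_SET_RATE   _IOW('F', 4, struct foo_rate_v1)  /* old, kept working */
#define FOO_SET_RATE2  _IOW('F', 5, struct foo_rate_v2)  /* new command */

/* Stands in for the state of the imaginary device. */
static struct foo_rate_v2 foo_current;

/* The single real implementation. */
static long foo_set_rate2(const struct foo_rate_v2 *arg)
{
    foo_current = *arg;
    return 0;
}

/* The old command is a thin wrapper that supplies defaults for new fields. */
static long foo_set_rate(const struct foo_rate_v1 *arg)
{
    struct foo_rate_v2 v2 = { .rate = arg->rate, .flags = 0 };

    return foo_set_rate2(&v2);
}
```

Existing applications keep issuing ``FOO_SET_RATE`` with the old structure,
while new ones can use ``FOO_SET_RATE2``; neither has to inspect a version
field.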

Return code
===========

ioctl commands can return negative error codes as documented in errno(3);
these get turned into errno values in user space. On success, the return
code should be zero. It is also possible but not recommended to return
a positive 'long' value.

When the ioctl callback is called with an unknown command number, the
handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
-ENOTTY being returned from the system call. Some subsystems return
-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
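The convention for return values might look like this in a handler. The
function below only has the shape of a driver's ioctl callback and is
compiled as plain C here; ``FOO_RESET`` and ``foo_resets`` are hypothetical.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define FOO_RESET  _IO('F', 0)   /* hypothetical command without argument */

/* Counts resets so the effect of the command is observable. */
static int foo_resets;

static long foo_ioctl(unsigned int cmd, unsigned long arg)
{
    (void)arg;   /* FOO_RESET takes no argument */

    switch (cmd) {
    case FOO_RESET:
        foo_resets++;
        return 0;         /* success is reported as zero */
    default:
        return -ENOTTY;   /* unknown command: not -EINVAL or -ENOSYS */
    }
}
```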

Prior to Linux 5.5, compat_ioctl handlers were required to return
-ENOIOCTLCMD in order to use the fallback conversion into native
commands. As all subsystems are now responsible for handling compat
mode themselves, this is no longer needed, but it may be important to
consider when backporting bug fixes to older kernels.

Timestamps
==========

Traditionally, timestamps and timeout values are passed as ``struct
timespec`` or ``struct timeval``, but these are problematic because of
incompatible definitions of these structures in user space after the
move to 64-bit time_t.

The ``struct __kernel_timespec`` type can be used instead to be embedded
in other data structures when separate second/nanosecond values are
desired, or passed to user space directly. This is still not ideal though,
as the structure matches neither the kernel's timespec64 nor the user
space timespec exactly. The get_timespec64() and put_timespec64() helper
functions can be used to ensure that the layout remains compatible with
user space and the padding is treated correctly.

As it is cheap to convert seconds to nanoseconds, but the opposite
requires an expensive 64-bit division, a plain __u64 nanosecond value
can be simpler and more efficient.
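The asymmetry between the two conversions can be seen in a small sketch.
``struct example_timespec`` is a local stand-in with two 64-bit fields like
``struct __kernel_timespec``, and both helper names are made up for this
example; non-negative timestamps are assumed.

```c
#include <stdint.h>

/* Local stand-in: two 64-bit fields, like struct __kernel_timespec. */
struct example_timespec {
    int64_t tv_sec;
    int64_t tv_nsec;
};

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/* Seconds plus nanoseconds to one counter: a multiply and an add. */
static uint64_t example_timespec_to_ns(const struct example_timespec *ts)
{
    return (uint64_t)ts->tv_sec * EXAMPLE_NSEC_PER_SEC +
           (uint64_t)ts->tv_nsec;
}

/* The reverse direction needs a 64-bit division and remainder. */
static struct example_timespec example_ns_to_timespec(uint64_t ns)
{
    struct example_timespec ts = {
        .tv_sec  = (int64_t)(ns / EXAMPLE_NSEC_PER_SEC),
        .tv_nsec = (int64_t)(ns % EXAMPLE_NSEC_PER_SEC),
    };

    return ts;
}
```

An interface that carries a single ``__u64`` nanosecond value lets the kernel
skip the second function entirely.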

Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
as returned by ktime_get_ns() or ktime_get_ts64().  Unlike
CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
or forwards due to leap second adjustments and clock_settime() calls.

ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
need to be persistent across a reboot or between multiple machines.

32-bit compat mode
==================

In order to support 32-bit user space running on a 64-bit machine, each
subsystem or driver that implements an ioctl callback handler must also
implement the corresponding compat_ioctl handler.

As long as all the rules for data structures are followed, this is as
easy as setting the .compat_ioctl pointer to a helper function such as
compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().

compat_ptr()
------------

On the s390 architecture, 31-bit user space has ambiguous representations
for data pointers, with the upper bit being ignored. When running such
a process in compat mode, the compat_ptr() helper must be used to
clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
pointer.  On other architectures, this macro only performs a cast to a
``void __user *`` pointer.

In a compat_ioctl() callback, the last argument is an unsigned long,
which can be interpreted as either a pointer or a scalar depending on
the command. If it is a scalar, then compat_ptr() must not be used, to
ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
for arguments with the upper bit set.

The compat_ptr_ioctl() helper can be used in place of a custom
compat_ioctl file operation for drivers that only take arguments that
are pointers to compatible data structures.
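The difference between pointer and scalar arguments can be modelled in plain
C. This is only a simulation of the behaviour described above, not kernel
code: every ``example_*`` name is hypothetical, and addresses are represented
as 64-bit integers rather than real ``__user`` pointers.

```c
#include <stdint.h>

typedef uint32_t example_compat_uptr_t;   /* stand-in for compat_uptr_t */

/* Models compat_ptr() on s390: the upper bit of the address is cleared. */
static uint64_t example_compat_ptr_s390(example_compat_uptr_t uptr)
{
    return (uint64_t)(uptr & 0x7fffffffU);
}

/* Models compat_ptr() on other architectures: a plain widening cast. */
static uint64_t example_compat_ptr_generic(example_compat_uptr_t uptr)
{
    return (uint64_t)uptr;
}

enum example_arg_kind { EXAMPLE_ARG_POINTER, EXAMPLE_ARG_SCALAR };

/* Decodes the last argument of a compat handler on the modelled s390. */
static uint64_t example_decode_arg(enum example_arg_kind kind,
                                   unsigned long arg)
{
    if (kind == EXAMPLE_ARG_POINTER)
        return example_compat_ptr_s390((example_compat_uptr_t)arg);

    return arg;   /* scalars pass through untouched, upper bit included */
}
```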

Structure layout
----------------

Compatible data structures have the same layout on all architectures,
avoiding all problematic members:

* ``long`` and ``unsigned long`` are the size of a register, so
  they can be either 32-bit or 64-bit wide and cannot be used in portable
  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
  ``__s64`` and ``__u64``.

* Pointers have the same problem, in addition to requiring the
  use of compat_ptr(). The best workaround is to use ``__u64``
  in place of pointers, which requires a cast to ``uintptr_t`` in user
  space, and the use of u64_to_user_ptr() in the kernel to convert
  it back into a user pointer.

* On the x86-32 (i386) architecture, the alignment of 64-bit variables
  is only 32-bit, but they are naturally aligned on most other
  architectures including x86-64. This means a structure like::

    struct foo {
        __u32 a;
        __u64 b;
        __u32 c;
    };

  has four bytes of padding between a and b on x86-64, plus another four
  bytes of padding at the end, but no padding on i386, and it needs a
  compat_ioctl conversion handler to translate between the two formats.

  To avoid this problem, all structures should have their members
  naturally aligned, or explicit reserved fields added in place of the
  implicit padding. The ``pahole`` tool can be used for checking the
  alignment.

* On ARM OABI user space, structures are padded to multiples of 32-bit,
  making some structs incompatible with modern EABI kernels if they
  do not end on a 32-bit boundary.

* On the m68k architecture, struct members are not guaranteed to have an
  alignment greater than 16-bit, which is a problem when relying on
  implicit padding.

* Bitfields and enums generally work as one would expect them to,
  but some properties of them are implementation-defined, so it is better
  to avoid them completely in ioctl interfaces.

* ``char`` members can be either signed or unsigned, depending on
  the architecture, so the __u8 and __s8 types should be used for 8-bit
  integer values, though char arrays are clearer for fixed-length strings.
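Two of the rules above, explicit padding and ``__u64`` in place of pointers,
can be sketched from user space as follows. ``struct foo_fixed``,
``struct foo_buf`` and ``foo_buf_init()`` are hypothetical names used only
for illustration.

```c
#include <stddef.h>        /* offsetof */
#include <stdint.h>        /* uintptr_t */
#include <linux/types.h>   /* __u32, __u64 */

/* The members of struct foo, with the implicit padding made explicit. */
struct foo_fixed {
    __u32 a;
    __u32 reserved0;   /* takes the place of the padding before b */
    __u64 b;
    __u32 c;
    __u32 reserved1;   /* takes the place of the tail padding */
};

/* Fail the build if the layout ever drifts from the intended one. */
_Static_assert(sizeof(struct foo_fixed) == 24, "unexpected size");
_Static_assert(offsetof(struct foo_fixed, b) == 8, "b must be 64-bit aligned");

/* Hypothetical request carrying a user buffer address as a __u64. */
struct foo_buf {
    __u64 addr;       /* user pointer, converted through uintptr_t */
    __u32 len;
    __u32 reserved;   /* keeps the size a multiple of 8 without padding */
};

/* User space side: store the pointer as a 64-bit integer. */
static struct foo_buf foo_buf_init(void *buf, __u32 len)
{
    struct foo_buf req = {
        .addr = (__u64)(uintptr_t)buf,
        .len = len,
    };

    return req;
}
```

With every member naturally aligned, the structure has the same size and
offsets for 32-bit and 64-bit user space, so no conversion handler is needed
for it.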

Information leaks
=================

Uninitialized data must not be copied back to user space, as this can
cause an information leak, which can be used to defeat kernel address
space layout randomization (KASLR), helping in an attack.

For this reason (and for compat support) it is best to avoid any
implicit padding in data structures.  Where there is implicit padding
in an existing structure, kernel drivers must be careful to fully
initialize an instance of the structure before copying it to user
space.  This is usually done by calling memset() before assigning to
individual members.
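The memset() pattern is sketched below in plain C. ``struct foo_reply`` is a
hypothetical structure that has implicit padding after its first member on
common architectures; in a driver, the filled structure would then be handed
to copy_to_user().

```c
#include <string.h>        /* memset */
#include <linux/types.h>   /* __u8, __u32 */

/* Hypothetical reply with implicit padding between status and value. */
struct foo_reply {
    __u8  status;
    __u32 value;
};

/* Zero the whole object first, padding included, then assign members. */
static void foo_fill_reply(struct foo_reply *reply, __u32 value)
{
    memset(reply, 0, sizeof(*reply));
    reply->status = 1;
    reply->value = value;
}
```

Assigning the members alone would leave the padding bytes holding whatever
was previously in that memory.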

Subsystem abstractions
======================

While some device drivers implement their own ioctl function, most
subsystems implement the same command for multiple drivers.  Ideally the
subsystem has an .ioctl() handler that copies the arguments from and
to user space, passing them into subsystem specific callback functions
through normal kernel pointers.

This helps in various ways:

* Applications written for one driver are more likely to work for
  another one in the same subsystem if there are no subtle differences
  in the user space ABI.

* The complexity of user space access and data structure layout is handled
  in one place, reducing the potential for implementation bugs.

* It is more likely to be reviewed by experienced developers
  who can spot problems in the interface when the ioctl is shared
  between multiple drivers than when it is only used in a single driver.
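The split between a subsystem handler and per-driver callbacks could be
modelled like this. It is a user space simulation under stated assumptions:
all ``foo_*`` and ``bar_*`` names are hypothetical, and memcpy() stands in
for copy_to_user().

```c
#include <errno.h>
#include <string.h>
#include <linux/ioctl.h>
#include <linux/types.h>

struct foo_info {
    __u32 id;
    __u32 flags;
};

#define FOO_GET_INFO  _IOR('F', 6, struct foo_info)

/* Per-driver callbacks only ever see normal pointers. */
struct foo_driver_ops {
    int (*get_info)(struct foo_info *info);
};

/* Subsystem handler: the only place that touches the caller's buffer. */
static long foo_subsys_ioctl(const struct foo_driver_ops *ops,
                             unsigned int cmd, void *user_arg)
{
    struct foo_info info;
    int ret;

    switch (cmd) {
    case FOO_GET_INFO:
        memset(&info, 0, sizeof(info));
        ret = ops->get_info(&info);
        if (ret)
            return ret;
        memcpy(user_arg, &info, sizeof(info));   /* copy_to_user() stand-in */
        return 0;
    default:
        return -ENOTTY;
    }
}

/* One example driver plugging into the subsystem. */
static int bar_get_info(struct foo_info *info)
{
    info->id = 7;
    return 0;
}

static const struct foo_driver_ops bar_ops = { .get_info = bar_get_info };
```

A second driver would supply its own ``foo_driver_ops`` and inherit the same
command number, structure layout and error handling.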
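
Alternatives to ioctl
=====================

There are many cases in which ioctl is not the best solution for a
problem. Alternatives include:

* System calls are a better choice for a system-wide feature that
  is not tied to a physical device or constrained by the file system
  permissions of a character device node.

* netlink is the preferred way of configuring any network related
  objects through sockets.

* debugfs is used for ad-hoc interfaces for debugging functionality
  that does not need to be exposed as a stable interface to applications.

* sysfs is a good way to expose the state of an in-kernel object
  that is not tied to a file descriptor.

* configfs can be used for more complex configuration than sysfs.

* A custom file system can provide extra flexibility with a simple
  user interface but adds a lot of complexity to the implementation.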
0244 * debugfs is used for ad-hoc interfaces for debugging functionality
0245   that does not need to be exposed as a stable interface to applications.
0246 
0247 * sysfs is a good way to expose the state of an in-kernel object
0248   that is not tied to a file descriptor.
0249 
0250 * configfs can be used for more complex configuration than sysfs
0251 
0252 * A custom file system can provide extra flexibility with a simple
0253   user interface but adds a lot of complexity to the implementation.