.. SPDX-License-Identifier: GPL-2.0

.. _inline_encryption:

=================
Inline Encryption
=================

Background
==========

Inline encryption hardware sits logically between memory and disk, and can
en/decrypt data as it goes in/out of the disk. For each I/O request, software
can control exactly how the inline encryption hardware will en/decrypt the data
in terms of key, algorithm, data unit size (the granularity of en/decryption),
and data unit number (a value that determines the initialization vector(s)).

Some inline encryption hardware accepts all encryption parameters including raw
keys directly in low-level I/O requests. However, most inline encryption
hardware instead has a fixed number of "keyslots" and requires that the key,
algorithm, and data unit size first be programmed into a keyslot. Each
low-level I/O request then just contains a keyslot index and data unit number.

Note that inline encryption hardware is very different from traditional crypto
accelerators, which are supported through the kernel crypto API. Traditional
crypto accelerators operate on memory regions, whereas inline encryption
hardware operates on I/O requests. Thus, inline encryption hardware needs to be
managed by the block layer, not the kernel crypto API.

Inline encryption hardware is also very different from "self-encrypting drives",
such as those based on the TCG Opal or ATA Security standards. Self-encrypting
drives don't provide fine-grained control of encryption and provide no way to
verify the correctness of the resulting ciphertext. Inline encryption hardware
provides fine-grained control of encryption, including the choice of key and
initialization vector for each sector, and can be tested for correctness.

Objective
=========

We want to support inline encryption in the kernel. To make testing easier, we
also want support for falling back to the kernel crypto API when actual inline
encryption hardware is absent. We also want inline encryption to work with
layered devices like device-mapper and loopback (i.e. we want to be able to use
the inline encryption hardware of the underlying devices if present, or else
fall back to crypto API en/decryption).

Constraints and notes
=====================

- We need a way for upper layers (e.g. filesystems) to specify an encryption
  context to use for en/decrypting a bio, and device drivers (e.g. UFSHCD) need
  to be able to use that encryption context when they process the request.
  Encryption contexts also introduce constraints on bio merging; the block layer
  needs to be aware of these constraints.

- Different inline encryption hardware has different supported algorithms,
  supported data unit sizes, maximum data unit numbers, etc. We call these
  properties the "crypto capabilities". We need a way for device drivers to
  advertise crypto capabilities to upper layers in a generic way.

- Inline encryption hardware usually (but not always) requires that keys be
  programmed into keyslots before being used. Since programming keyslots may be
  slow and there may not be very many keyslots, we shouldn't just program the
  key for every I/O request, but rather keep track of which keys are in the
  keyslots and reuse an already-programmed keyslot when possible.

- Upper layers typically define a specific end-of-life for crypto keys, e.g.
  when an encrypted directory is locked or when a crypto mapping is torn down.
  At these times, keys are wiped from memory. We must provide a way for upper
  layers to also evict keys from any keyslots they are present in.

- When possible, device-mapper devices must be able to pass through the inline
  encryption support of their underlying devices. However, it doesn't make
  sense for device-mapper devices to have keyslots themselves.

Basic design
============

We introduce ``struct blk_crypto_key`` to represent an inline encryption key and
how it will be used. This includes the actual bytes of the key; the size of the
key; the algorithm and data unit size the key will be used with; and the number
of bytes needed to represent the maximum data unit number the key will be used
with.

We introduce ``struct bio_crypt_ctx`` to represent an encryption context. It
contains a data unit number and a pointer to a blk_crypto_key. We add pointers
to a bio_crypt_ctx to ``struct bio`` and ``struct request``; this allows users
of the block layer (e.g. filesystems) to provide an encryption context when
creating a bio and have it be passed down the stack for processing by the block
layer and device drivers. Note that the encryption context doesn't explicitly
say whether to encrypt or decrypt, as that is implicit from the direction of the
bio; WRITE means encrypt, and READ means decrypt.
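
For orientation, these structures look roughly like the following (an abridged
sketch; the exact fields and the ``BLK_CRYPTO_*`` constants are defined in
``include/linux/blk-crypto.h`` and can differ between kernel versions):

.. code-block:: c

    /* Abridged sketch of the structures described above. */
    struct blk_crypto_config {
            enum blk_crypto_mode_num crypto_mode;  /* algorithm, e.g. AES-256-XTS */
            unsigned int data_unit_size;           /* granularity of en/decryption */
            unsigned int dun_bytes;                /* bytes needed for the max DUN */
    };

    struct blk_crypto_key {
            struct blk_crypto_config crypto_cfg;   /* how the key will be used */
            unsigned int data_unit_size_bits;      /* log2(data_unit_size) */
            unsigned int size;                     /* size of the key in bytes */
            u8 raw[BLK_CRYPTO_MAX_KEY_SIZE];       /* the actual bytes of the key */
    };

    struct bio_crypt_ctx {
            const struct blk_crypto_key *bc_key;   /* key to en/decrypt with */
            u64 bc_dun[BLK_CRYPTO_DUN_ARRAY_SIZE]; /* data unit number of the bio */
    };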

We also introduce ``struct blk_crypto_profile`` to contain all generic inline
encryption-related state for a particular inline encryption device. The
blk_crypto_profile serves as the way that drivers for inline encryption hardware
advertise their crypto capabilities and provide certain functions (e.g.,
functions to program and evict keys) to upper layers. Each device driver that
wants to support inline encryption will construct a blk_crypto_profile, then
associate it with the disk's request_queue.

The blk_crypto_profile also manages the hardware's keyslots, when applicable.
This happens in the block layer, so that users of the block layer can just
specify encryption contexts and don't need to know about keyslots at all, nor do
device drivers need to care about most details of keyslot management.

Specifically, for each keyslot, the block layer (via the blk_crypto_profile)
keeps track of which blk_crypto_key that keyslot contains (if any), and how many
in-flight I/O requests are using it. When the block layer creates a
``struct request`` for a bio that has an encryption context, it grabs a keyslot
that already contains the key if possible. Otherwise it waits for an idle
keyslot (a keyslot that isn't in use by any I/O), then programs the key into the
least-recently-used idle keyslot using the function the device driver provided.
In both cases, the resulting keyslot is stored in the ``crypt_keyslot`` field of
the request, where it is then accessible to device drivers and is released after
the request completes.
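
In rough pseudocode, this keyslot policy looks something like the sketch below
(an illustrative simplification only, not the kernel's actual code; the real
logic lives in ``block/blk-crypto-profile.c`` and also handles locking, hashing,
reference counting, and waiting for a slot to become idle):

.. code-block:: c

    #include <linux/blk-crypto.h>

    /* Illustrative simplification of the keyslot policy described above. */
    struct example_keyslot {
            const struct blk_crypto_key *key; /* key currently programmed, or NULL */
            unsigned int refcount;            /* in-flight requests using this slot */
            unsigned long last_used;          /* for LRU selection among idle slots */
    };

    static struct example_keyslot *
    example_get_keyslot(struct example_keyslot *slots, unsigned int num_slots,
                        const struct blk_crypto_key *key)
    {
            struct example_keyslot *victim = NULL;
            unsigned int i;

            /* Reuse a keyslot that already contains this key, if there is one. */
            for (i = 0; i < num_slots; i++) {
                    if (slots[i].key == key) {
                            slots[i].refcount++;
                            return &slots[i];
                    }
            }

            /* Otherwise pick the least-recently-used idle keyslot... */
            for (i = 0; i < num_slots; i++) {
                    if (slots[i].refcount == 0 &&
                        (!victim || slots[i].last_used < victim->last_used))
                            victim = &slots[i];
            }
            if (!victim)
                    return NULL; /* the real code waits for a slot to become idle */

            /* ...and program the key into it via the driver's keyslot_program(). */
            victim->key = key;
            victim->refcount = 1;
            return victim;
    }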

``struct request`` also contains a pointer to the original bio_crypt_ctx.
Requests can be built from multiple bios, and the block layer must take the
encryption context into account when trying to merge bios and requests. For two
bios/requests to be merged, they must have compatible encryption contexts: both
unencrypted, or both encrypted with the same key and contiguous data unit
numbers. Only the encryption context for the first bio in a request is
retained, since the remaining bios have been verified to be merge-compatible
with the first bio.
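
The compatibility rule amounts to a check along these lines (an illustrative
sketch, not the kernel's actual merge helpers; for simplicity it only looks at
the low 64 bits of the data unit number):

.. code-block:: c

    #include <linux/blk-crypto.h>

    /* Illustrative sketch of the merge rule described above. */
    static bool example_crypt_ctx_mergeable(const struct bio_crypt_ctx *c1,
                                            unsigned int c1_bytes,
                                            const struct bio_crypt_ctx *c2)
    {
            u64 data_units;

            /* Both must be unencrypted, or both encrypted... */
            if (!c1 || !c2)
                    return !c1 && !c2;

            /* ...with the same key... */
            if (c1->bc_key != c2->bc_key)
                    return false;

            /* ...and with contiguous data unit numbers across the merge point. */
            data_units = c1_bytes >> c1->bc_key->data_unit_size_bits;
            return c2->bc_dun[0] == c1->bc_dun[0] + data_units;
    }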

To make it possible for inline encryption to work with request_queue based
layered devices, when a request is cloned, its encryption context is cloned as
well. When the cloned request is submitted, it is then processed as usual; this
includes getting a keyslot from the clone's target device if needed.

blk-crypto-fallback
===================

It is desirable for the inline encryption support of upper layers (e.g.
filesystems) to be testable without real inline encryption hardware, and
likewise for the block layer's keyslot management logic. It is also desirable
to allow upper layers to just always use inline encryption rather than have to
implement encryption in multiple ways.

Therefore, we also introduce *blk-crypto-fallback*, which is an implementation
of inline encryption using the kernel crypto API. blk-crypto-fallback is built
into the block layer, so it works on any block device without any special setup.
Essentially, when a bio with an encryption context is submitted to a
request_queue that doesn't support that encryption context, the block layer will
handle en/decryption of the bio using blk-crypto-fallback.

For encryption, the data cannot be encrypted in-place, as callers usually rely
on it being unmodified. Instead, blk-crypto-fallback allocates bounce pages,
fills a new bio with those bounce pages, encrypts the data into those bounce
pages, and submits that "bounce" bio. When the bounce bio completes,
blk-crypto-fallback completes the original bio. If the original bio is too
large, multiple bounce bios may be required; see the code for details.

For decryption, blk-crypto-fallback "wraps" the bio's completion callback
(``bi_end_io``) and private data (``bi_private``) with its own, unsets the
bio's encryption context, then submits the bio. If the read completes
successfully, blk-crypto-fallback restores the bio's original completion
callback and private data, then decrypts the bio's data in-place using the
kernel crypto API. Decryption happens from a workqueue, as it may sleep.
Afterwards, blk-crypto-fallback completes the bio.

In both cases, the bios that blk-crypto-fallback submits no longer have an
encryption context. Therefore, lower layers only see standard unencrypted I/O.

blk-crypto-fallback also defines its own blk_crypto_profile and has its own
"keyslots"; its keyslots contain ``struct crypto_skcipher`` objects. The reason
for this is twofold. First, it allows the keyslot management logic to be tested
without actual inline encryption hardware. Second, similar to actual inline
encryption hardware, the crypto API doesn't accept keys directly in requests but
rather requires that keys be set ahead of time, and setting keys can be
expensive; moreover, allocating a crypto_skcipher can't happen on the I/O path
at all due to the locks it takes. Therefore, the concept of keyslots still
makes sense for blk-crypto-fallback.

Note that regardless of whether real inline encryption hardware or
blk-crypto-fallback is used, the ciphertext written to disk (and hence the
on-disk format of data) will be the same (assuming that both the inline
encryption hardware's implementation and the kernel crypto API's implementation
of the algorithm being used adhere to spec and function correctly).

blk-crypto-fallback is optional and is controlled by the
``CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK`` kernel configuration option.

API presented to users of the block layer
=========================================

``blk_crypto_config_supported()`` allows users to check ahead of time whether
inline encryption with particular crypto settings will work on a particular
request_queue -- either via hardware or via blk-crypto-fallback. This function
takes in a ``struct blk_crypto_config`` which is like blk_crypto_key, but omits
the actual bytes of the key and instead just contains the algorithm, data unit
size, etc. This function can be useful if blk-crypto-fallback is disabled.

``blk_crypto_init_key()`` allows users to initialize a blk_crypto_key.

Users must call ``blk_crypto_start_using_key()`` before actually starting to use
a blk_crypto_key on a request_queue (even if ``blk_crypto_config_supported()``
was called earlier). This is needed to initialize blk-crypto-fallback if it
will be needed. It must not be called from the data path, since it may need to
allocate resources, which could cause a deadlock there.

Next, to attach an encryption context to a bio, users should call
``bio_crypt_set_ctx()``. This function allocates a bio_crypt_ctx and attaches
it to a bio, given the blk_crypto_key and the data unit number that will be used
for en/decryption. Users don't need to worry about freeing the bio_crypt_ctx
later, as that happens automatically when the bio is freed or reset.

Finally, when done using inline encryption with a blk_crypto_key on a
request_queue, users must call ``blk_crypto_evict_key()``. This ensures that
the key is evicted from all keyslots it may be programmed into and unlinked from
any kernel data structures it may be linked into.

In summary, for users of the block layer, the lifecycle of a blk_crypto_key is
as follows:

1. ``blk_crypto_config_supported()`` (optional)
2. ``blk_crypto_init_key()``
3. ``blk_crypto_start_using_key()``
4. ``bio_crypt_set_ctx()`` (potentially many times)
5. ``blk_crypto_evict_key()`` (after all I/O has completed)
6. Zeroize the blk_crypto_key (this has no dedicated function)

If a blk_crypto_key is being used on multiple request_queues, then
``blk_crypto_config_supported()`` (if used), ``blk_crypto_start_using_key()``,
and ``blk_crypto_evict_key()`` must be called on each request_queue.
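
Put together, a user of the block layer might drive this lifecycle roughly as
follows. This is only a sketch: error handling is abbreviated, the AES-256-XTS
parameters are illustrative, and the exact signatures (in particular whether
these functions take a ``struct request_queue *`` or a ``struct block_device *``)
have changed across kernel versions.

.. code-block:: c

    #include <linux/bio.h>
    #include <linux/blk-crypto.h>
    #include <linux/string.h>

    /* Sketch of the blk_crypto_key lifecycle summarized above.
     * Step 1 (blk_crypto_config_supported()) is optional and omitted here. */
    static int example_submit_encrypted_bio(struct request_queue *q,
                                            struct bio *bio,
                                            const u8 *raw_key, u64 dun)
    {
            struct blk_crypto_key key;
            u64 dun_array[BLK_CRYPTO_DUN_ARRAY_SIZE] = { dun };
            int err;

            /* 2. Initialize the blk_crypto_key: AES-256-XTS, 4096-byte data
             *    units, and DUNs that fit in 8 bytes (illustrative values). */
            err = blk_crypto_init_key(&key, raw_key,
                                      BLK_ENCRYPTION_MODE_AES_256_XTS,
                                      8, 4096);
            if (err)
                    return err;

            /* 3. Declare that the key will be used on this request_queue.
             *    This also sets up blk-crypto-fallback if it will be needed. */
            err = blk_crypto_start_using_key(&key, q);
            if (err)
                    return err;

            /* 4. Attach the encryption context and submit the bio. */
            bio_crypt_set_ctx(bio, &key, dun_array, GFP_NOIO);
            submit_bio(bio);

            /* ...wait here for all I/O that uses the key to complete... */

            /* 5. Evict the key from any keyslots it was programmed into. */
            blk_crypto_evict_key(q, &key);

            /* 6. Zeroize the key material. */
            memzero_explicit(&key, sizeof(key));
            return 0;
    }

In practice the key would live in a longer-lived structure (e.g. per-file or
per-superblock state) rather than on the stack, since it must outlive all of the
I/O submitted with it.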

API presented to device drivers
===============================

A device driver that wants to support inline encryption must set up a
blk_crypto_profile in the request_queue of its device. To do this, it first
must call ``blk_crypto_profile_init()`` (or its resource-managed variant
``devm_blk_crypto_profile_init()``), providing the number of keyslots.

Next, it must advertise its crypto capabilities by setting fields in the
blk_crypto_profile, e.g. ``modes_supported`` and ``max_dun_bytes_supported``.

It then must set function pointers in the ``ll_ops`` field of the
blk_crypto_profile to tell upper layers how to control the inline encryption
hardware, e.g. how to program and evict keyslots. Most drivers will need to
implement ``keyslot_program`` and ``keyslot_evict``. For details, see the
comments for ``struct blk_crypto_ll_ops``.
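
For a hypothetical driver (all ``myhw_*`` names below are made up for
illustration), this setup might look roughly like the following sketch:

.. code-block:: c

    #include <linux/blk-crypto-profile.h>
    #include <linux/device.h>

    struct myhw_device {
            struct device *dev;
            struct blk_crypto_profile crypto_profile;
            /* ...hardware-specific state... */
    };

    static int myhw_keyslot_program(struct blk_crypto_profile *profile,
                                    const struct blk_crypto_key *key,
                                    unsigned int slot)
    {
            /* Program the raw key bytes into hardware keyslot 'slot' here. */
            return 0;
    }

    static int myhw_keyslot_evict(struct blk_crypto_profile *profile,
                                  const struct blk_crypto_key *key,
                                  unsigned int slot)
    {
            /* Clear hardware keyslot 'slot' here. */
            return 0;
    }

    static const struct blk_crypto_ll_ops myhw_crypto_ll_ops = {
            .keyslot_program        = myhw_keyslot_program,
            .keyslot_evict          = myhw_keyslot_evict,
    };

    static int myhw_init_crypto(struct myhw_device *hwdev,
                                unsigned int num_keyslots)
    {
            struct blk_crypto_profile *profile = &hwdev->crypto_profile;
            int err;

            err = devm_blk_crypto_profile_init(hwdev->dev, profile,
                                               num_keyslots);
            if (err)
                    return err;

            profile->ll_ops = myhw_crypto_ll_ops;
            profile->max_dun_bytes_supported = 8;
            /* e.g. AES-256-XTS with 512-byte or 4096-byte data units */
            profile->modes_supported[BLK_ENCRYPTION_MODE_AES_256_XTS] =
                    512 | 4096;
            profile->dev = hwdev->dev;      /* needed only for runtime PM */

            return 0;
    }

The profile then still needs to be registered with the disk's request_queue,
e.g. via ``blk_crypto_register()``, once the queue exists.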

Once the driver registers a blk_crypto_profile with a request_queue, I/O
requests the driver receives via that queue may have an encryption context. All
encryption contexts will be compatible with the crypto capabilities declared in
the blk_crypto_profile, so drivers don't need to worry about handling
unsupported requests. Also, if a nonzero number of keyslots was declared in the
blk_crypto_profile, then all I/O requests that have an encryption context will
also have a keyslot which was already programmed with the appropriate key.
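
When handling such a request, the driver can pull the keyslot index and data
unit number out of the request along these lines (a sketch; ``myhw_cmd`` and its
fields are hypothetical):

.. code-block:: c

    #include <linux/blk-mq.h>
    #include <linux/blk-crypto-profile.h>

    struct myhw_cmd {
            bool crypto_enabled;
            unsigned int keyslot;
            u64 data_unit_num;
    };

    /* Sketch of a driver consuming the crypto information on a request. */
    static void myhw_prepare_crypto(struct request *rq, struct myhw_cmd *cmd)
    {
            if (!rq->crypt_ctx)
                    return;         /* plain, unencrypted I/O */

            /* The block layer has already programmed the key into a keyslot. */
            cmd->crypto_enabled = true;
            cmd->keyslot = blk_crypto_keyslot_index(rq->crypt_keyslot);
            cmd->data_unit_num = rq->crypt_ctx->bc_dun[0];
    }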

If the driver implements runtime suspend and its blk_crypto_ll_ops don't work
while the device is runtime-suspended, then the driver must also set the ``dev``
field of the blk_crypto_profile to point to the ``struct device`` that will be
resumed before any of the low-level operations are called.

If there are situations where the inline encryption hardware loses the contents
of its keyslots, e.g. device resets, the driver must handle reprogramming the
keyslots. To do this, the driver may call ``blk_crypto_reprogram_all_keys()``.
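
For example, a reset handler might do something like the following sketch
(reusing the hypothetical ``struct myhw_device`` from above; ``myhw_reset()`` is
likewise hypothetical):

.. code-block:: c

    /* Sketch: restore the hardware keyslots after a reset wipes them. */
    static int myhw_recover_from_reset(struct myhw_device *hwdev)
    {
            int err = myhw_reset(hwdev);    /* hypothetical hardware reset */

            if (err)
                    return err;

            /* Ask the block layer to reprogram all currently-used keys. */
            blk_crypto_reprogram_all_keys(&hwdev->crypto_profile);
            return 0;
    }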

Finally, if the driver used ``blk_crypto_profile_init()`` instead of
``devm_blk_crypto_profile_init()``, then it is responsible for calling
``blk_crypto_profile_destroy()`` when the crypto profile is no longer needed.

Layered Devices
===============

Request queue based layered devices like dm-rq that wish to support inline
encryption need to create their own blk_crypto_profile for their request_queue,
and expose whatever functionality they choose. When a layered device wants to
pass a clone of that request to another request_queue, blk-crypto will
initialize and prepare the clone as necessary; see
``blk_crypto_insert_cloned_request()``.

Interaction between inline encryption and blk integrity
=======================================================

At the time of writing, there is no real hardware that supports both these
features. However, these features do interact with each other, and it's not
completely trivial to make them both work together properly. In particular,
when a WRITE bio wants to use inline encryption on a device that supports both
features, the bio will have an encryption context specified, after which
its integrity information is calculated (using the plaintext data, since
the encryption will happen while data is being written), and the data and
integrity info are sent to the device. Obviously, the integrity info must be
verified before the data is encrypted. After the data is encrypted, the device
must not store the integrity info that it received with the plaintext data
since that might reveal information about the plaintext data. As such, it must
re-generate the integrity info from the ciphertext data and store that on disk
instead. Another issue with storing the integrity info of the plaintext data is
that it changes the on-disk format depending on whether hardware inline
encryption support is present or the kernel crypto API fallback is used (since
if the fallback is used, the device will receive the integrity info of the
ciphertext, not that of the plaintext).

Because there isn't any real hardware yet, it seems prudent to assume that
hardware implementations might not implement both features together correctly,
and disallow the combination for now. Whenever a device supports integrity, the
kernel will pretend that the device does not support hardware inline encryption
(by setting the blk_crypto_profile in the request_queue of the device to NULL).
When the crypto API fallback is enabled, this means that all bios with an
encryption context will use the fallback, and I/O will complete as usual. When
the fallback is disabled, a bio with an encryption context will be failed.