.. SPDX-License-Identifier: GPL-2.0

==============================
Network Filesystem Caching API
==============================

Fscache provides an API by which a network filesystem can make use of local
caching facilities. The API is arranged around a number of principles:

 (1) A cache is logically organised into volumes and data storage objects
     within those volumes.

 (2) Volumes and data storage objects are represented by various types of
     cookie.

 (3) Cookies have keys that distinguish them from their peers.

 (4) Cookies have coherency data that allows a cache to determine if the
     cached data is still valid.

 (5) I/O is done asynchronously where possible.

This API is used by::

	#include <linux/fscache.h>

.. This document contains the following sections:

	 (1) Overview
	 (2) Volume registration
	 (3) Data file registration
	 (4) Declaring a cookie to be in use
	 (5) Resizing a data file (truncation)
	 (6) Data I/O API
	 (7) Data file coherency
	 (8) Data file invalidation
	 (9) Write back resource management
	(10) Caching of local modifications
	(11) Page release and invalidation


Overview
========

The fscache hierarchy is organised on two levels from a network filesystem's
point of view. The upper level represents "volumes" and the lower level
represents "data storage objects". These are represented by two types of
cookie, hereafter referred to as "volume cookies" and "cookies".

A network filesystem acquires a volume cookie for a volume using a volume key,
which represents all the information that defines that volume (e.g. cell name
or server address, volume ID or share name). This must be rendered as a
printable string that can be used as a directory name (i.e. it must contain no
'/' characters and shouldn't begin with a '.'). The maximum name length is one
less than the maximum size of a filename component (allowing the cache backend
one char for its own purposes).

A filesystem would typically have a volume cookie for each superblock.

The filesystem then acquires a cookie for each file within that volume using an
object key. Object keys are binary blobs and only need to be unique within
their parent volume. The cache backend is responsible for rendering the binary
blob into something it can use and may employ hash tables, trees or whatever to
improve its ability to find an object. This is transparent to the network
filesystem.

A filesystem would typically have a cookie for each inode, and would acquire it
in iget and relinquish it when evicting the inode.

Once it has a cookie, the filesystem needs to mark the cookie as being in use.
This causes fscache to send the cache backend off to look up/create resources
for the cookie in the background, to check its coherency and, if necessary, to
mark the object as being under modification.

A filesystem would typically "use" the cookie in its file open routine and
unuse it in file release. It needs to use the cookie around calls that
truncate the file locally, and it *also* needs to use the cookie when the
pagecache becomes dirty and unuse it when writeback is complete. This last
case is slightly tricky, and provision is made for it.

When performing a read, write or resize on a cookie, the filesystem must first
begin an operation. This copies the resources into a holding struct and puts
extra pins into the cache to stop cache withdrawal from tearing down the
structures being used. The actual operation can then be issued and conflicting
invalidations can be detected upon completion.

The filesystem is expected to use netfslib to access the cache, but that's not
actually required and it can use the fscache I/O API directly.


Volume Registration
===================

The first step for a network filesystem is to acquire a volume cookie for the
volume it wants to access::

	struct fscache_volume *
	fscache_acquire_volume(const char *volume_key,
	                       const char *cache_name,
	                       const void *coherency_data,
	                       size_t coherency_len);

This function creates a volume cookie with the specified volume key as its name
and notes the coherency data.

The volume key must be a printable string with no '/' characters in it. It
should begin with the name of the filesystem and should be no longer than 254
characters. It should uniquely represent the volume and will be matched with
what's stored in the cache.
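
The key rules above can be checked before a volume is acquired. The following
is a minimal sketch in plain C, using standard headers rather than kernel ones
so that it can be exercised outside the kernel. ``myfs_volume_key_ok()`` and
``MYFS_MAX_VOLUME_KEY`` are hypothetical names, and the rejection of a leading
'.' follows the directory-name requirement given in the overview. It does not
check that the key begins with the filesystem name::

	#include <ctype.h>
	#include <stdbool.h>
	#include <string.h>

	/* Hypothetical limit: the 254 characters stated above. */
	#define MYFS_MAX_VOLUME_KEY 254

	/* Return true if @key satisfies the volume key rules described above. */
	static bool myfs_volume_key_ok(const char *key)
	{
		size_t len = strlen(key);
		size_t i;

		if (len == 0 || len > MYFS_MAX_VOLUME_KEY)
			return false;
		if (key[0] == '.')	/* must be usable as a directory name */
			return false;
		for (i = 0; i < len; i++) {
			/* Printable characters only, and no path separators. */
			if (!isprint((unsigned char)key[i]) || key[i] == '/')
				return false;
		}
		return true;
	}

With this sketch, a key such as ``"myfs,example.org,20000"`` is accepted,
whereas ``"myfs,srv/share"`` is rejected because it contains a '/'.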

The caller may also specify the name of the cache to use. If specified,
fscache will look up or create a cache cookie of that name and will use a cache
of that name if it is online or comes online. If no cache name is specified,
it will use the first cache that comes to hand and set the name to that.

The specified coherency data is stored in the cookie and will be matched
against coherency data stored on disk. The data pointer may be NULL if no data
is provided. If the coherency data doesn't match, the entire cache volume will
be invalidated.

This function can return errors such as EBUSY if the volume key is already in
use by an acquired volume or ENOMEM if an allocation failure occurred. It may
also return a NULL volume cookie if fscache is not enabled. It is safe to
pass a NULL cookie to any function that takes a volume cookie. This will
cause that function to do nothing.


When the network filesystem has finished with a volume, it should relinquish it
by calling::

	void fscache_relinquish_volume(struct fscache_volume *volume,
	                               const void *coherency_data,
	                               bool invalidate);

This will cause the volume to be committed or removed and, if sealed, the
coherency data will be set to the value supplied. The amount of coherency data
must match the length specified when the volume was acquired. Note that all
data cookies obtained in this volume must be relinquished before the volume is
relinquished.
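
As an illustration, a filesystem might acquire the volume cookie when it sets
up a superblock and relinquish it on teardown. This is a sketch only, not
compilable outside a kernel tree: ``struct myfs_sb_info``,
``myfs_get_volume()`` and ``myfs_put_volume()`` are hypothetical, and the key
format and the choice to carry on without caching on error are assumptions
rather than requirements of the API::

	#include <linux/err.h>
	#include <linux/fscache.h>
	#include <linux/printk.h>
	#include <linux/slab.h>

	/* Hypothetical per-superblock state. */
	struct myfs_sb_info {
		const char		*server;	/* assumed to contain no '/' */
		u64			volume_id;
		struct fscache_volume	*cache;		/* NULL if not caching */
	};

	static void myfs_get_volume(struct myfs_sb_info *sbi)
	{
		struct fscache_volume *vcookie;
		char *key;

		/* The key begins with the filesystem name. */
		key = kasprintf(GFP_KERNEL, "myfs,%s,%llx",
				sbi->server, sbi->volume_id);
		if (!key)
			return;

		/* No cache name and no coherency data in this sketch. */
		vcookie = fscache_acquire_volume(key, NULL, NULL, 0);
		if (IS_ERR(vcookie)) {
			pr_warn("myfs: cache volume unavailable (%ld)\n",
				PTR_ERR(vcookie));
			vcookie = NULL;	/* a NULL cookie makes later calls no-ops */
		}
		sbi->cache = vcookie;
		kfree(key);
	}

	static void myfs_put_volume(struct myfs_sb_info *sbi)
	{
		/* All data cookies in the volume must already be relinquished. */
		fscache_relinquish_volume(sbi->cache, NULL, false);
		sbi->cache = NULL;
	}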


Data File Registration
======================

Once it has a volume cookie, a network filesystem can use it to acquire a
cookie for data storage::

	struct fscache_cookie *
	fscache_acquire_cookie(struct fscache_volume *volume,
	                       u8 advice,
	                       const void *index_key,
	                       size_t index_key_len,
	                       const void *aux_data,
	                       size_t aux_data_len,
	                       loff_t object_size);

This creates the cookie in the volume using the specified index key. The index
key is a binary blob of the given length and must be unique for the volume.
This is saved into the cookie. There are no restrictions on the content, but
its length shouldn't exceed about three quarters of the maximum filename length
to allow for encoding.
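
As a rough illustration of the length guideline, the sketch below derives a
limit from a 255-byte filename component. It is plain C so that it can be
exercised outside the kernel. ``MYFS_NAME_MAX``, ``MYFS_MAX_INDEX_KEY`` and
``myfs_index_key_len_ok()`` are hypothetical names, and the exact cut-off is an
assumption, since the text only says "about three quarters"::

	#include <stdbool.h>
	#include <stddef.h>

	/* Assumed maximum filename component length on the backing filesystem. */
	#define MYFS_NAME_MAX		255
	/* Three quarters of that, leaving room for the backend's encoding. */
	#define MYFS_MAX_INDEX_KEY	(MYFS_NAME_MAX * 3 / 4)

	/* Return true if an index key of @len bytes fits the guideline. */
	static bool myfs_index_key_len_ok(size_t len)
	{
		/* Rejecting empty keys is a choice made by this sketch. */
		return len > 0 && len <= MYFS_MAX_INDEX_KEY;
	}

A fixed-size key such as a 16-byte file identifier fits comfortably within
this limit.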

The caller should also pass in a piece of coherency data in aux_data. A buffer
of size aux_data_len will be allocated and the coherency data copied in. It is
assumed that the size is invariant over time. The coherency data is used to
check the validity of data in the cache. Functions are provided by which the
coherency data can be updated.

The file size of the object being cached should also be provided. This may be
used to trim the data and will be stored with the coherency data.

This function never returns an error, though it may return a NULL cookie on
allocation failure or if fscache is not enabled. It is safe to pass in a NULL
volume cookie and pass the NULL cookie returned to any function that takes it.
This will cause that function to do nothing.


When the network filesystem has finished with a cookie, it should relinquish it
by calling::

	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
	                               bool retire);

This will cause fscache to either commit the storage backing the cookie or
delete it.
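
A sketch of how the two calls might pair up over an inode's lifetime follows.
It is not compilable outside a kernel tree. ``struct myfs_inode`` and both
helper functions are hypothetical, as is the choice of a 64-bit file
identifier for the index key and a 32-bit version for the coherency data::

	#include <linux/fs.h>
	#include <linux/fscache.h>

	/* Hypothetical in-memory inode. */
	struct myfs_inode {
		struct inode		vfs_inode;
		struct fscache_cookie	*cache;
		__le64			file_id;	/* index key */
		__le32			data_version;	/* coherency data */
	};

	/* Called from the filesystem's iget path once the inode is set up. */
	static void myfs_cache_inode_get(struct myfs_inode *mi,
					 struct fscache_volume *volume)
	{
		/* An advice value of 0 requests no special treatment. */
		mi->cache = fscache_acquire_cookie(volume, 0,
						   &mi->file_id,
						   sizeof(mi->file_id),
						   &mi->data_version,
						   sizeof(mi->data_version),
						   i_size_read(&mi->vfs_inode));
	}

	/* Called when the inode is evicted. */
	static void myfs_cache_inode_put(struct myfs_inode *mi, bool file_deleted)
	{
		/* retire == true asks for the backing storage to be deleted. */
		fscache_relinquish_cookie(mi->cache, file_deleted);
		mi->cache = NULL;
	}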


Marking A Cookie In-Use
=======================

Once a cookie has been acquired by a network filesystem, the filesystem should
tell fscache when it intends to use the cookie (typically done on file open)
and should say when it has finished with it (typically on file close)::

	void fscache_use_cookie(struct fscache_cookie *cookie,
	                        bool will_modify);
	void fscache_unuse_cookie(struct fscache_cookie *cookie,
	                          const void *aux_data,
	                          const loff_t *object_size);

The *use* function tells fscache that the filesystem will use the cookie and,
additionally, indicates whether the user intends to modify the contents
locally. If not yet done, this will trigger the cache backend to go and gather
the resources it needs to access/store data in the cache. This is done in the
background, and so may not be complete by the time the function returns.

The *unuse* function indicates that a filesystem has finished using a cookie.
It optionally updates the stored coherency data and object size and then
decreases the in-use counter. When the last user unuses the cookie, it is
scheduled for garbage collection. If not reused within a short time, the
resources will be released to reduce system resource consumption.

A cookie must be marked in-use before it can be accessed for read, write or
resize - and an in-use mark must be kept whilst there is dirty data in the
pagecache in order to avoid an oops due to trying to open a file during process
exit.

Note that in-use marks are cumulative. For each time a cookie is marked
in-use, it must be unused.
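
The open/release pairing might look like the sketch below, which is not
compilable outside a kernel tree. ``myfs_inode_cookie()`` and
``myfs_inode_version()`` are hypothetical helpers, and updating the coherency
data only when the file was open for writing is an assumption about the
filesystem's policy::

	#include <linux/fs.h>
	#include <linux/fscache.h>

	/* Hypothetical helpers provided elsewhere by the filesystem. */
	struct fscache_cookie *myfs_inode_cookie(struct inode *inode);
	__le32 myfs_inode_version(struct inode *inode);

	static int myfs_file_open(struct inode *inode, struct file *file)
	{
		/* Tell fscache whether this opener may modify the file locally. */
		fscache_use_cookie(myfs_inode_cookie(inode),
				   file->f_mode & FMODE_WRITE);
		return 0;
	}

	static int myfs_file_release(struct inode *inode, struct file *file)
	{
		struct fscache_cookie *cookie = myfs_inode_cookie(inode);

		if (file->f_mode & FMODE_WRITE) {
			__le32 version = myfs_inode_version(inode);
			loff_t i_size = i_size_read(inode);

			/* Writers update the stored coherency data and size. */
			fscache_unuse_cookie(cookie, &version, &i_size);
		} else {
			/* Readers leave them untouched. */
			fscache_unuse_cookie(cookie, NULL, NULL);
		}
		return 0;
	}

Each successful open is balanced by exactly one release, which keeps the
cumulative in-use count correct.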


Resizing A Data File (Truncation)
=================================

If a network filesystem file is resized locally by truncation, the following
should be called to notify the cache::

	void fscache_resize_cookie(struct fscache_cookie *cookie,
	                           loff_t new_size);

The caller must have first marked the cookie in-use. The cookie and the new
size are passed in and the cache is synchronously resized. This is expected to
be called from the ``->setattr()`` inode operation under the inode lock.
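
A sketch of the truncation path, not compilable outside a kernel tree, is shown
below. ``myfs_truncate()`` and ``myfs_inode_cookie()`` are hypothetical, and
the ordering of the local truncation relative to the cache resize is an
assumption::

	#include <linux/fs.h>
	#include <linux/fscache.h>
	#include <linux/mm.h>

	/* Hypothetical helper provided elsewhere by the filesystem. */
	struct fscache_cookie *myfs_inode_cookie(struct inode *inode);

	/* Called from ->setattr() with the inode lock held. */
	static void myfs_truncate(struct inode *inode, loff_t new_size)
	{
		struct fscache_cookie *cookie = myfs_inode_cookie(inode);

		fscache_use_cookie(cookie, true);	/* about to modify it */
		truncate_setsize(inode, new_size);	/* i_size and pagecache */
		fscache_resize_cookie(cookie, new_size);
		fscache_unuse_cookie(cookie, NULL, NULL);
	}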


Data I/O API
============

To do data I/O operations directly through a cookie, the following functions
are available::

	int fscache_begin_read_operation(struct netfs_cache_resources *cres,
	                                 struct fscache_cookie *cookie);
	int fscache_read(struct netfs_cache_resources *cres,
	                 loff_t start_pos,
	                 struct iov_iter *iter,
	                 enum netfs_read_from_hole read_hole,
	                 netfs_io_terminated_t term_func,
	                 void *term_func_priv);
	int fscache_write(struct netfs_cache_resources *cres,
	                  loff_t start_pos,
	                  struct iov_iter *iter,
	                  netfs_io_terminated_t term_func,
	                  void *term_func_priv);

The *begin* function sets up an operation, attaching the resources required to
the cache resources block from the cookie. Assuming it doesn't return an error
(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
nothing), then one of the other two functions can be issued.

The *read* and *write* functions initiate a direct-IO operation. Both take the
previously set up cache resources block, an indication of the start file
position, and an I/O iterator that describes the buffer and indicates the
amount of data.

The read function also takes a parameter to indicate how it should handle a
partially populated region (a hole) in the disk content. This may be to ignore
it, to skip over an initial hole and place zeros in the buffer, or to give an
error.

The read and write functions can be given an optional termination function that
will be run on completion::

	typedef
	void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
	                              bool was_async);

If a termination function is given, the operation will be run asynchronously
and the termination function will be called upon completion. If not given, the
operation will be run synchronously. Note that in the asynchronous case, it is
possible for the operation to complete before the function returns.

Both the read and write functions end the operation when they complete,
detaching any pinned resources.

The read operation will fail with -ESTALE if invalidation occurred whilst the
operation was ongoing.
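
A synchronous read through this API might be sketched as follows. This is not
compilable outside a kernel tree, and ``myfs_read_from_cache()`` is a
hypothetical name. The trailing call to ``fscache_end_operation()`` is an
assumption modelled on in-tree callers and goes beyond what is described here.
The choice of ``NETFS_READ_HOLE_FAIL`` is likewise just one of the available
hole-handling options::

	#include <linux/fscache.h>
	#include <linux/netfs.h>
	#include <linux/string.h>
	#include <linux/uio.h>

	/*
	 * Synchronously read from the cache into the buffer described by @iter.
	 * The caller must already hold an in-use mark on @cookie.
	 */
	static int myfs_read_from_cache(struct fscache_cookie *cookie,
					loff_t pos, struct iov_iter *iter)
	{
		struct netfs_cache_resources cres;
		int ret;

		memset(&cres, 0, sizeof(cres));
		ret = fscache_begin_read_operation(&cres, cookie);
		if (ret < 0)
			return ret;	/* e.g. -ENOBUFS for a NULL cookie */

		/* No termination function, so this runs synchronously. */
		ret = fscache_read(&cres, pos, iter, NETFS_READ_HOLE_FAIL,
				   NULL, NULL);

		/* Assumption: mirrors in-tree callers; not described above. */
		fscache_end_operation(&cres);
		return ret;
	}

A caller would typically fall back to reading from the server if this returns
an error such as -ESTALE.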


Data File Coherency
===================

To request an update of the coherency data and file size on a cookie, the
following should be called::

	void fscache_update_cookie(struct fscache_cookie *cookie,
	                           const void *aux_data,
	                           const loff_t *object_size);

This will update the cookie's coherency data and/or file size.


Data File Invalidation
======================

Sometimes it will be necessary to invalidate an object that contains data.
Typically this will be necessary when the server informs the network filesystem
of a remote third-party change - at which point the filesystem has to throw
away the state and cached data that it had for a file and reload from the
server.

To indicate that a cache object should be invalidated, the following should be
called::

	void fscache_invalidate(struct fscache_cookie *cookie,
	                        const void *aux_data,
	                        loff_t size,
	                        unsigned int flags);

This increases the invalidation counter in the cookie to cause outstanding
reads to fail with -ESTALE, sets the coherency data and file size from the
information supplied, blocks new I/O on the cookie and dispatches the cache to
go and get rid of the old data.

Invalidation runs asynchronously in a worker thread so that it doesn't block
too much.
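
For example, a filesystem that learns of a remote change might invalidate the
cache object as sketched below. This is not compilable outside a kernel tree.
``myfs_remote_change()`` and ``myfs_inode_cookie()`` are hypothetical, the
32-bit version is an assumed form of coherency data, and discarding the local
pagecache is left out::

	#include <linux/fs.h>
	#include <linux/fscache.h>

	/* Hypothetical helper provided elsewhere by the filesystem. */
	struct fscache_cookie *myfs_inode_cookie(struct inode *inode);

	/* Called when the server reports that @inode changed remotely. */
	static void myfs_remote_change(struct inode *inode, __le32 new_version)
	{
		/*
		 * Discard the cached data and record the new coherency data
		 * and size.  No flags are passed in this sketch.
		 */
		fscache_invalidate(myfs_inode_cookie(inode), &new_version,
				   i_size_read(inode), 0);
	}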


Write-Back Resource Management
==============================

To write data to the cache from network filesystem writeback, the cache
resources required need to be pinned at the point the modification is made (for
instance when the page is marked dirty) as it's not possible to open a file in
a thread that's exiting.

The following facilities are provided to manage this:

 * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
   in-use mark is held on the cookie for this inode. It can only be changed if
   the inode lock is held.

 * A flag, ``unpinned_fscache_wb``, is placed in the ``writeback_control``
   struct that gets set if ``__writeback_single_inode()`` clears
   ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.

To support this, the following functions are provided::

	bool fscache_dirty_folio(struct address_space *mapping,
	                         struct folio *folio,
	                         struct fscache_cookie *cookie);
	void fscache_unpin_writeback(struct writeback_control *wbc,
	                             struct fscache_cookie *cookie);
	void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
	                                   struct inode *inode,
	                                   const void *aux);

The *dirty* function (``fscache_dirty_folio()``) is intended to be called from
the filesystem's ``dirty_folio`` address space operation. If
``I_PINNING_FSCACHE_WB`` is not set, it sets that flag and increments the use
count on the cookie (the caller must already have called
``fscache_use_cookie()``).

The *unpin* function is intended to be called from the filesystem's
``write_inode`` superblock operation. It cleans up after writing by unusing
the cookie if ``unpinned_fscache_wb`` is set in the ``writeback_control``
struct.

The *clear* function is intended to be called from the netfs's ``evict_inode``
superblock operation. It must be called *after*
``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans
up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to
be updated.
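
The three hooks described above might be wired up as in the following sketch,
which is not compilable outside a kernel tree. The ``myfs_*`` functions are
hypothetical, as are the ``myfs_inode_cookie()`` and ``myfs_inode_version()``
helpers and the use of a 32-bit version as coherency data::

	#include <linux/fs.h>
	#include <linux/fscache.h>
	#include <linux/mm.h>
	#include <linux/writeback.h>

	/* Hypothetical helpers provided elsewhere by the filesystem. */
	struct fscache_cookie *myfs_inode_cookie(struct inode *inode);
	__le32 myfs_inode_version(struct inode *inode);

	/* ->dirty_folio(): pin the cookie while the inode has dirty data. */
	static bool myfs_dirty_folio(struct address_space *mapping,
				     struct folio *folio)
	{
		return fscache_dirty_folio(mapping, folio,
					   myfs_inode_cookie(mapping->host));
	}

	/* ->write_inode(): drop the pin once writeback has cleaned the inode. */
	static int myfs_write_inode(struct inode *inode,
				    struct writeback_control *wbc)
	{
		fscache_unpin_writeback(wbc, myfs_inode_cookie(inode));
		return 0;
	}

	/* ->evict_inode(): the order of the three calls matters. */
	static void myfs_evict_inode(struct inode *inode)
	{
		__le32 version = myfs_inode_version(inode);

		truncate_inode_pages_final(&inode->i_data);
		fscache_clear_inode_writeback(myfs_inode_cookie(inode), inode,
					      &version);
		clear_inode(inode);
	}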


Caching of Local Modifications
==============================

If a network filesystem has locally modified data that it wants to write to the
cache, it needs to mark the pages to indicate that a write is in progress, and
if the mark is already present, it needs to wait for it to be removed first
(presumably due to an already in-progress operation). This prevents multiple
competing DIO writes to the same storage in the cache.

Firstly, the netfs should determine if caching is available by doing something
like::

	bool caching = fscache_cookie_enabled(cookie);

If caching is to be attempted, pages should be waited for and then marked using
the following functions provided by the netfs helper library::

	void set_page_fscache(struct page *page);
	void wait_on_page_fscache(struct page *page);
	int wait_on_page_fscache_killable(struct page *page);

Once all the pages in the span are marked, the netfs can ask fscache to
schedule a write of that region::

	void fscache_write_to_cache(struct fscache_cookie *cookie,
	                            struct address_space *mapping,
	                            loff_t start, size_t len, loff_t i_size,
	                            netfs_io_terminated_t term_func,
	                            void *term_func_priv,
	                            bool caching);

And if an error occurs before that point is reached, the marks can be removed
by calling::

	void fscache_clear_page_bits(struct address_space *mapping,
	                             loff_t start, size_t len,
	                             bool caching);

In these functions, a pointer to the mapping to which the source pages are
attached is passed in, and start and len indicate the region that's going to be
written (it doesn't necessarily have to align to page boundaries, but it does
have to align to DIO boundaries on the backing filesystem). The caching
parameter indicates whether caching should be attempted; if it is false, the
functions do nothing.

The write function takes some additional parameters: the cookie representing
the cache object to be written to; i_size, which indicates the size of the
netfs file; and term_func, an optional completion function to which
term_func_priv will be passed, along with the error or amount written.

Note that the write function will always run asynchronously and will unmark all
the pages upon completion before calling term_func.
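
Putting these steps together for a single page gives roughly the following
sketch, which is not compilable outside a kernel tree.
``myfs_copy_page_to_cache()`` and ``myfs_inode_cookie()`` are hypothetical.
Writing one whole page at a time, and passing no completion function, are
simplifying assumptions::

	#include <linux/fs.h>
	#include <linux/fscache.h>
	#include <linux/netfs.h>
	#include <linux/pagemap.h>

	/* Hypothetical helper provided elsewhere by the filesystem. */
	struct fscache_cookie *myfs_inode_cookie(struct inode *inode);

	/*
	 * Copy one locally modified page to the cache.  A single page is used
	 * so that start and len trivially describe the marked span.
	 */
	static void myfs_copy_page_to_cache(struct inode *inode,
					    struct page *page)
	{
		struct fscache_cookie *cookie = myfs_inode_cookie(inode);
		bool caching = fscache_cookie_enabled(cookie);

		if (caching) {
			wait_on_page_fscache(page);	/* wait out earlier write */
			set_page_fscache(page);		/* mark ours in progress */
		}

		/* Does nothing if caching is false; unmarks the page when done. */
		fscache_write_to_cache(cookie, inode->i_mapping,
				       page_offset(page), PAGE_SIZE,
				       i_size_read(inode), NULL, NULL, caching);
	}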


Page Release and Invalidation
=============================

Fscache keeps track of whether we have any data in the cache yet for a cache
object we've just created. It knows it doesn't have to do any reading until it
has done a write and then the page it wrote from has been released by the VM,
after which it *has* to look in the cache.

To inform fscache that a page might now be in the cache, the following function
should be called from the ``release_folio`` address space op if the page has
been released (i.e. ``release_folio`` returned true)::

	void fscache_note_page_release(struct fscache_cookie *cookie);

Page release and page invalidation should also wait for any mark left on the
page to say that a DIO write is underway from that page::

	void wait_on_page_fscache(struct page *page);
	int wait_on_page_fscache_killable(struct page *page);
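
A ``release_folio`` implementation combining both requirements might be
sketched as follows. This is not compilable outside a kernel tree.
``myfs_release_folio()`` and ``myfs_inode_cookie()`` are hypothetical, and a
real implementation would probably inspect ``gfp`` before deciding whether it
is allowed to sleep, which this sketch does not do::

	#include <linux/fscache.h>
	#include <linux/netfs.h>
	#include <linux/pagemap.h>

	/* Hypothetical helper provided elsewhere by the filesystem. */
	struct fscache_cookie *myfs_inode_cookie(struct inode *inode);

	static bool myfs_release_folio(struct folio *folio, gfp_t gfp)
	{
		struct inode *inode = folio->mapping->host;

		/* Wait for any in-progress write to the cache from this page. */
		wait_on_page_fscache(&folio->page);

		/* The page is being released, so its data may be in the cache. */
		fscache_note_page_release(myfs_inode_cookie(inode));
		return true;
	}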


API Function Reference
======================

.. kernel-doc:: include/linux/fscache.h