0001 .. SPDX-License-Identifier: GPL-2.0
0002 
0003 ==============================
0004 Network Filesystem Caching API
0005 ==============================
0006 
0007 Fscache provides an API by which a network filesystem can make use of local
0008 caching facilities.  The API is arranged around a number of principles:
0009 
0010  (1) A cache is logically organised into volumes and data storage objects
0011      within those volumes.
0012 
0013  (2) Volumes and data storage objects are represented by various types of
0014      cookie.
0015 
0016  (3) Cookies have keys that distinguish them from their peers.
0017 
0018  (4) Cookies have coherency data that allows a cache to determine if the
0019      cached data is still valid.
0020 
0021  (5) I/O is done asynchronously where possible.
0022 
0023 This API is used by::
0024 
0025         #include <linux/fscache.h>
0026 
0027 .. This document contains the following sections:
0028 
0029          (1) Overview
0030          (2) Volume registration
0031          (3) Data file registration
0032          (4) Declaring a cookie to be in use
0033          (5) Resizing a data file (truncation)
0034          (6) Data I/O API
0035          (7) Data file coherency
0036          (8) Data file invalidation
0037          (9) Write back resource management
0038         (10) Caching of local modifications
0039         (11) Page release and invalidation
0040 
0041 
0042 Overview
0043 ========
0044 
0045 The fscache hierarchy is organised on two levels from a network filesystem's
0046 point of view.  The upper level represents "volumes" and the lower level
0047 represents "data storage objects".  These are represented by two types of
0048 cookie, hereafter referred to as "volume cookies" and "cookies".
0049 
0050 A network filesystem acquires a volume cookie for a volume using a volume key,
0051 which represents all the information that defines that volume (e.g. cell name
0052 or server address, volume ID or share name).  This must be rendered as a
0053 printable string that can be used as a directory name (i.e. it must contain
0054 no '/' characters and shouldn't begin with a '.').  The maximum name length is one less than the
0055 maximum size of a filename component (allowing the cache backend one char for
0056 its own purposes).
0057 
0058 A filesystem would typically have a volume cookie for each superblock.
0059 
0060 The filesystem then acquires a cookie for each file within that volume using an
0061 object key.  Object keys are binary blobs and only need to be unique within
0062 their parent volume.  The cache backend is responsible for rendering the binary
0063 blob into something it can use and may employ hash tables, trees or whatever to
0064 improve its ability to find an object.  This is transparent to the network
0065 filesystem.
0066 
0067 A filesystem would typically have a cookie for each inode, and would acquire it
0068 in iget and relinquish it when evicting the inode.
0069 
0070 Once it has a cookie, the filesystem needs to mark the cookie as being in use.
0071 This causes fscache to send the cache backend off to look up/create resources
0072 for the cookie in the background, to check its coherency and, if necessary, to
0073 mark the object as being under modification.
0074 
0075 A filesystem would typically "use" the cookie in its file open routine and
0076 unuse it in file release and it needs to use the cookie around calls to
0077 truncate the cookie locally.  It *also* needs to use the cookie when the
0078 pagecache becomes dirty and unuse it when writeback is complete.  This is
0079 slightly tricky, and provision is made for it.
0080 
0081 When performing a read, write or resize on a cookie, the filesystem must first
0082 begin an operation.  This copies the resources into a holding struct and puts
0083 extra pins into the cache to stop cache withdrawal from tearing down the
0084 structures being used.  The actual operation can then be issued and conflicting
0085 invalidations can be detected upon completion.
0086 
0087 The filesystem is expected to use netfslib to access the cache, but that's not
0088 actually required and it can use the fscache I/O API directly.
0089 
0090 
0091 Volume Registration
0092 ===================
0093 
0094 The first step for a network filesystem is to acquire a volume cookie for the
0095 volume it wants to access::
0096 
0097         struct fscache_volume *
0098         fscache_acquire_volume(const char *volume_key,
0099                                const char *cache_name,
0100                                const void *coherency_data,
0101                                size_t coherency_len);
0102 
0103 This function creates a volume cookie with the specified volume key as its name
0104 and notes the coherency data.
0105 
0106 The volume key must be a printable string with no '/' characters in it.  It
0107 should begin with the name of the filesystem and should be no longer than 254
0108 characters.  It should uniquely represent the volume and will be matched with
0109 what's stored in the cache.
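
The rules for the volume key can be checked before the key is handed to
``fscache_acquire_volume()``.  The following is a minimal userspace sketch,
not part of fscache: ``volume_key_is_valid()``, ``VOLUME_KEY_MAX`` and the
``fs_name`` parameter are hypothetical names introduced for illustration, and
the leading-'.' check is taken from the overview above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Longest key accepted, taken from the 254-character limit stated above. */
#define VOLUME_KEY_MAX 254

/*
 * Hypothetical helper (not part of fscache): check a candidate volume key
 * against the documented rules.
 */
bool volume_key_is_valid(const char *key, const char *fs_name)
{
	size_t len, i;

	assert(fs_name != NULL);	/* The caller always knows its own name */
	if (!key)
		return false;

	len = strlen(key);
	if (len == 0 || len > VOLUME_KEY_MAX)
		return false;

	/* The key is used as a directory name, so no leading '.'. */
	if (key[0] == '.')
		return false;

	/* The key should begin with the name of the filesystem. */
	if (strncmp(key, fs_name, strlen(fs_name)) != 0)
		return false;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)key[i];

		/* Printable characters only, and no path separators. */
		if (!isprint(c) || c == '/')
			return false;
	}
	return true;
}
```

A filesystem would typically build the key once per superblock, for example
from its own name followed by the cell or server name and the volume ID, and
could run a check of this kind in a debugging build.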
0110 
0111 The caller may also specify the name of the cache to use.  If specified,
0112 fscache will look up or create a cache cookie of that name and will use a cache
0113 of that name if it is online or comes online.  If no cache name is specified,
0114 it will use the first cache that comes to hand and set the name to that.
0115 
0116 The specified coherency data is stored in the cookie and will be matched
0117 against coherency data stored on disk.  The data pointer may be NULL if no data
0118 is provided.  If the coherency data doesn't match, the entire cache volume will
0119 be invalidated.
0120 
0121 This function can return errors such as EBUSY if the volume key is already in
0122 use by an acquired volume or ENOMEM if an allocation failure occurred.  It may
0123 also return a NULL volume cookie if fscache is not enabled.  It is safe to
0124 pass a NULL cookie to any function that takes a volume cookie.  This will
0125 cause that function to do nothing.
0126 
0127 
0128 When the network filesystem has finished with a volume, it should relinquish it
0129 by calling::
0130 
0131         void fscache_relinquish_volume(struct fscache_volume *volume,
0132                                        const void *coherency_data,
0133                                        bool invalidate);
0134 
0135 This will cause the volume to be committed or removed, and if sealed the
0136 coherency data will be set to the value supplied.  The amount of coherency data
0137 must match the length specified when the volume was acquired.  Note that all
0138 data cookies obtained in this volume must be relinquished before the volume is
0139 relinquished.
0140 
0141 
0142 Data File Registration
0143 ======================
0144 
0145 Once it has a volume cookie, a network filesystem can use it to acquire a
0146 cookie for data storage::
0147 
0148         struct fscache_cookie *
0149         fscache_acquire_cookie(struct fscache_volume *volume,
0150                                u8 advice,
0151                                const void *index_key,
0152                                size_t index_key_len,
0153                                const void *aux_data,
0154                                size_t aux_data_len,
0155                                loff_t object_size);
0156 
0157 This creates the cookie in the volume using the specified index key.  The index
0158 key is a binary blob of the given length and must be unique for the volume.
0159 This is saved into the cookie.  There are no restrictions on the content, but
0160 its length shouldn't exceed about three quarters of the maximum filename length
0161 to allow for encoding.
0162 
0163 The caller should also pass in a piece of coherency data in aux_data.  A buffer
0164 of size aux_data_len will be allocated and the coherency data copied in.  It is
0165 assumed that the size is invariant over time.  The coherency data is used to
0166 check the validity of data in the cache.  Functions are provided by which the
0167 coherency data can be updated.
0168 
0169 The file size of the object being cached should also be provided.  This may be
0170 used to trim the data and will be stored with the coherency data.
0171 
0172 This function never returns an error, though it may return a NULL cookie on
0173 allocation failure or if fscache is not enabled.  It is safe to pass in a NULL
0174 volume cookie and pass the NULL cookie returned to any function that takes it.
0175 This will cause that function to do nothing.
0176 
0177 
0178 When the network filesystem has finished with a cookie, it should relinquish it
0179 by calling::
0180 
0181         void fscache_relinquish_cookie(struct fscache_cookie *cookie,
0182                                        bool retire);
0183 
0184 This will cause fscache to either commit the storage backing the cookie or
0185 delete it.
0186 
0187 
0188 Marking A Cookie In-Use
0189 =======================
0190 
0191 Once a cookie has been acquired by a network filesystem, the filesystem should
0192 tell fscache when it intends to use the cookie (typically done on file open)
0193 and should say when it has finished with it (typically on file close)::
0194 
0195         void fscache_use_cookie(struct fscache_cookie *cookie,
0196                                 bool will_modify);
0197         void fscache_unuse_cookie(struct fscache_cookie *cookie,
0198                                   const void *aux_data,
0199                                   const loff_t *object_size);
0200 
0201 The *use* function tells fscache that the cookie will be used and, additionally,
0202 indicates whether the user intends to modify the contents locally.  If not yet
0203 done, this will trigger the cache backend to go and gather the resources it
0204 needs to access/store data in the cache.  This is done in the background, and
0205 so may not be complete by the time the function returns.
0206 
0207 The *unuse* function indicates that a filesystem has finished using a cookie.
0208 It optionally updates the stored coherency data and object size and then
0209 decreases the in-use counter.  When the last user unuses the cookie, it is
0210 scheduled for garbage collection.  If not reused within a short time, the
0211 resources will be released to reduce system resource consumption.
0212 
0213 A cookie must be marked in-use before it can be accessed for read, write or
0214 resize - and an in-use mark must be kept whilst there is dirty data in the
0215 pagecache in order to avoid an oops due to trying to open a file during process
0216 exit.
0217 
0218 Note that in-use marks are cumulative.  For each time a cookie is marked
0219 in-use, it must be unused.
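
The cumulative nature of in-use marks can be pictured with a toy userspace
model.  This is only a sketch of the counting rule described above, not the
kernel implementation: ``struct use_model`` and the ``model_*`` functions are
hypothetical, and the real functions do considerably more (background lookup,
coherency update, delayed resource release).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a cookie; the real struct fscache_cookie holds far more. */
struct use_model {
	unsigned int n_active;	/* Outstanding in-use marks */
	bool gc_scheduled;	/* Set when the last user has gone away */
};

/* Model of fscache_use_cookie(): take one in-use mark. */
void model_use_cookie(struct use_model *cookie)
{
	cookie->n_active++;
	cookie->gc_scheduled = false;	/* Reused before collection */
}

/* Model of fscache_unuse_cookie(): drop one in-use mark. */
void model_unuse_cookie(struct use_model *cookie)
{
	assert(cookie->n_active > 0);	/* Every unuse must pair with a use */
	if (--cookie->n_active == 0)
		cookie->gc_scheduled = true;
}
```

In this model, two opens followed by one release leave the cookie in use; only
the second release schedules it for collection.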
0220 
0221 
0222 Resizing A Data File (Truncation)
0223 =================================
0224 
0225 If a network filesystem file is resized locally by truncation, the following
0226 should be called to notify the cache::
0227 
0228         void fscache_resize_cookie(struct fscache_cookie *cookie,
0229                                    loff_t new_size);
0230 
0231 The caller must have first marked the cookie in-use.  The cookie and the new
0232 size are passed in and the cache is synchronously resized.  This is expected to
0233 be called from ``->setattr()`` inode operation under the inode lock.
0234 
0235 
0236 Data I/O API
0237 ============
0238 
0239 To do data I/O operations directly through a cookie, the following functions
0240 are available::
0241 
0242         int fscache_begin_read_operation(struct netfs_cache_resources *cres,
0243                                          struct fscache_cookie *cookie);
0244         int fscache_read(struct netfs_cache_resources *cres,
0245                          loff_t start_pos,
0246                          struct iov_iter *iter,
0247                          enum netfs_read_from_hole read_hole,
0248                          netfs_io_terminated_t term_func,
0249                          void *term_func_priv);
0250         int fscache_write(struct netfs_cache_resources *cres,
0251                           loff_t start_pos,
0252                           struct iov_iter *iter,
0253                           netfs_io_terminated_t term_func,
0254                           void *term_func_priv);
0255 
0256 The *begin* function sets up an operation, attaching the resources required to
0257 the cache resources block from the cookie.  Assuming it doesn't return an error
0258 (for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
0259 nothing), then one of the other two functions can be issued.
0260 
0261 The *read* and *write* functions initiate a direct-IO operation.  Both take the
0262 previously set up cache resources block, an indication of the start file
0263 position, and an I/O iterator that describes buffer and indicates the amount of
0264 data.
0265 
0266 The read function also takes a parameter to indicate how it should handle a
0267 partially populated region (a hole) in the disk content.  This may be to ignore
0268 it, skip over an initial hole and place zeros in the buffer or give an error.
0269 
0270 The read and write functions can be given an optional termination function that
0271 will be run on completion::
0272 
0273         typedef
0274         void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
0275                                       bool was_async);
0276 
0277 If a termination function is given, the operation will be run asynchronously
0278 and the termination function will be called upon completion.  If not given, the
0279 operation will be run synchronously.  Note that in the asynchronous case, it is
0280 possible for the operation to complete before the function returns.
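
A termination function receives a single value that is either the amount
transferred or a negative error code.  The sketch below shows one way a caller
might separate the two.  It is a userspace illustration that repeats the
typedef so that it compiles on its own; ``struct io_result`` and
``record_io_result()`` are hypothetical names, not part of the API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Same shape as the typedef shown above. */
typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
				      bool was_async);

/* Hypothetical per-request state, passed as term_func_priv. */
struct io_result {
	bool done;
	bool was_async;
	size_t transferred;	/* Meaningful when error == 0 */
	int error;		/* Negative errno, or 0 on success */
};

/* Termination function: split the combined value into count or error. */
void record_io_result(void *priv, ssize_t transferred_or_error, bool was_async)
{
	struct io_result *res = priv;

	assert(res != NULL);
	if (transferred_or_error < 0) {
		res->error = (int)transferred_or_error;
		res->transferred = 0;
	} else {
		res->error = 0;
		res->transferred = (size_t)transferred_or_error;
	}
	res->was_async = was_async;
	res->done = true;
}
```

Since the operation may complete before the issuing function returns, state of
this kind has to be fully initialised before the read or write is issued.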
0281 
0282 Both the read and write functions end the operation when they complete,
0283 detaching any pinned resources.
0284 
0285 The read operation will fail with ESTALE if invalidation occurred whilst the
0286 operation was ongoing.
0287 
0288 
0289 Data File Coherency
0290 ===================
0291 
0292 To request an update of the coherency data and file size on a cookie, the
0293 following should be called::
0294 
0295         void fscache_update_cookie(struct fscache_cookie *cookie,
0296                                    const void *aux_data,
0297                                    const loff_t *object_size);
0298 
0299 This will update the cookie's coherency data and/or file size.
0300 
0301 
0302 Data File Invalidation
0303 ======================
0304 
0305 Sometimes it will be necessary to invalidate an object that contains data.
0306 Typically this will be necessary when the server informs the network filesystem
0307 of a remote third-party change - at which point the filesystem has to throw
0308 away the state and cached data that it had for an file and reload from the
0309 server.
0310 
0311 To indicate that a cache object should be invalidated, the following should be
0312 called::
0313 
0314         void fscache_invalidate(struct fscache_cookie *cookie,
0315                                 const void *aux_data,
0316                                 loff_t size,
0317                                 unsigned int flags);
0318 
0319 This increases the invalidation counter in the cookie to cause outstanding
0320 reads to fail with -ESTALE, sets the coherency data and file size from the
0321 information supplied, blocks new I/O on the cookie and dispatches the cache to
0322 go and get rid of the old data.
0323 
0324 Invalidation runs asynchronously in a worker thread so that it doesn't block
0325 too much.
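
The way an invalidation counter can make outstanding reads fail may be
sketched with a small userspace model.  This is an illustration of the idea
only, assuming that an operation records the counter when it begins and
compares it on completion; the ``inval_*`` names are hypothetical and the
kernel's actual bookkeeping may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy cookie holding only the invalidation counter. */
struct inval_cookie {
	unsigned int inval_counter;
};

/* Toy operation state: remembers the counter seen when it began. */
struct inval_op {
	const struct inval_cookie *cookie;
	unsigned int inval_at_begin;
};

/* Begin a read: snapshot the current invalidation counter. */
void inval_begin_read(struct inval_op *op, const struct inval_cookie *cookie)
{
	op->cookie = cookie;
	op->inval_at_begin = cookie->inval_counter;
}

/* Model of the counter bump performed by an invalidation. */
void inval_invalidate(struct inval_cookie *cookie)
{
	cookie->inval_counter++;
}

/* Returns 0 if the data can be trusted, -ESTALE if invalidated meanwhile. */
int inval_end_read(const struct inval_op *op)
{
	assert(op->cookie != NULL);
	if (op->cookie->inval_counter != op->inval_at_begin)
		return -ESTALE;
	return 0;
}
```

A read that began before the invalidation sees a changed counter on completion
and reports ``-ESTALE``; a read begun afterwards completes normally.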
0326 
0327 
0328 Write-Back Resource Management
0329 ==============================
0330 
0331 To write data to the cache from network filesystem writeback, the cache
0332 resources required need to be pinned at the point the modification is made (for
0333 instance when the page is marked dirty) as it's not possible to open a file in
0334 a thread that's exiting.
0335 
0336 The following facilities are provided to manage this:
0337 
0338  * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
0339    in-use mark is held on the cookie for this inode.  It can only be changed if
0340    the inode lock is held.
0341 
0342  * A flag, ``unpinned_fscache_wb``, is placed in the ``writeback_control``
0343    struct that gets set if ``__writeback_single_inode()`` clears
0344    ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.
0345 
0346 To support this, the following functions are provided::
0347 
0348         bool fscache_dirty_folio(struct address_space *mapping,
0349                                  struct folio *folio,
0350                                  struct fscache_cookie *cookie);
0351         void fscache_unpin_writeback(struct writeback_control *wbc,
0352                                      struct fscache_cookie *cookie);
0353         void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
0354                                            struct inode *inode,
0355                                            const void *aux);
0356 
0357 The *dirty_folio* function is intended to be called from the filesystem's
0358 ``dirty_folio`` address space operation.  If ``I_PINNING_FSCACHE_WB`` is not
0359 set, it sets that flag and increments the use count on the cookie (the caller
0360 must already have called ``fscache_use_cookie()``).
0361 
0362 The *unpin* function is intended to be called from the filesystem's
0363 ``write_inode`` superblock operation.  It cleans up after writing by unusing
0364 the cookie if unpinned_fscache_wb is set in the writeback_control struct.
0365 
0366 The *clear* function is intended to be called from the netfs's ``evict_inode``
0367 superblock operation.  It must be called *after*
0368 ``truncate_inode_pages_final()``, but *before* ``clear_inode()``.  This cleans
0369 up any hanging ``I_PINNING_FSCACHE_WB``.  It also allows the coherency data to
0370 be updated.
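
The interplay between the inode flag, the ``writeback_control`` flag and the
cookie's use count can be followed in a toy userspace model.  This is a sketch
of the rules stated above and nothing more: the ``pin_*`` structures and
functions are hypothetical, and ``pin_writeback_done()`` stands in for what
``__writeback_single_inode()`` is described as doing.

```c
#include <assert.h>
#include <stdbool.h>

struct pin_inode {
	bool pinning_wb;		/* Models I_PINNING_FSCACHE_WB */
	unsigned int cookie_use_count;	/* Models the cookie's in-use count */
};

struct pin_wbc {
	bool unpinned_wb;		/* Models unpinned_fscache_wb */
};

/* dirty_folio path: take at most one extra in-use mark per inode. */
void pin_dirty_folio(struct pin_inode *inode)
{
	assert(inode->cookie_use_count > 0);	/* Caller already used the cookie */
	if (!inode->pinning_wb) {
		inode->pinning_wb = true;
		inode->cookie_use_count++;
	}
}

/* Models the writeback core clearing the flag once nothing is dirty. */
void pin_writeback_done(struct pin_inode *inode, struct pin_wbc *wbc)
{
	if (inode->pinning_wb) {
		inode->pinning_wb = false;
		wbc->unpinned_wb = true;
	}
}

/* write_inode path: drop the mark only if writeback cleared the flag. */
void pin_unpin_writeback(struct pin_inode *inode, const struct pin_wbc *wbc)
{
	if (wbc->unpinned_wb)
		inode->cookie_use_count--;
}
```

However many folios are dirtied, only one extra in-use mark is taken, and it
is dropped only after writeback has reported that the pin was released.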
0371 
0372 
0373 Caching of Local Modifications
0374 ==============================
0375 
0376 If a network filesystem has locally modified data that it wants to write to the
0377 cache, it needs to mark the pages to indicate that a write is in progress, and
0378 if the mark is already present, it needs to wait for it to be removed first
0379 (presumably due to an already in-progress operation).  This prevents multiple
0380 competing DIO writes to the same storage in the cache.
0381 
0382 Firstly, the netfs should determine if caching is available by doing something
0383 like::
0384 
0385         bool caching = fscache_cookie_enabled(cookie);
0386 
0387 If caching is to be attempted, pages should be waited for and then marked using
0388 the following functions provided by the netfs helper library::
0389 
0390         void set_page_fscache(struct page *page);
0391         void wait_on_page_fscache(struct page *page);
0392         int wait_on_page_fscache_killable(struct page *page);
0393 
0394 Once all the pages in the span are marked, the netfs can ask fscache to
0395 schedule a write of that region::
0396 
0397         void fscache_write_to_cache(struct fscache_cookie *cookie,
0398                                     struct address_space *mapping,
0399                                     loff_t start, size_t len, loff_t i_size,
0400                                     netfs_io_terminated_t term_func,
0401                                     void *term_func_priv,
0402                                     bool caching);
0403 
0404 And if an error occurs before that point is reached, the marks can be removed
0405 by calling::
0406 
0407         void fscache_clear_page_bits(struct address_space *mapping,
0408                                      loff_t start, size_t len,
0409                                      bool caching);
0410 
0411 In these functions, a pointer to the mapping to which the source pages are
0412 attached is passed in and start and len indicate the size of the region that's
0413 going to be written (it doesn't have to align to page boundaries necessarily,
0414 but it does have to align to DIO boundaries on the backing filesystem).  The
0415 caching parameter indicates whether caching is to be attempted; if it is false, the
0416 functions do nothing.
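
The alignment requirement on the region can be checked with a small helper.
The sketch below is a userspace illustration under stated assumptions: the
DIO block size of the backing filesystem is supplied by the caller (fscache
does not provide it through this helper), both the start and the length are
assumed to need alignment, ``int64_t`` stands in for ``loff_t``, and
``region_is_dio_aligned()`` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether [start, start + len) begins and ends
 * on multiples of dio_block, the assumed DIO granule of the backing
 * filesystem.
 */
bool region_is_dio_aligned(int64_t start, size_t len, size_t dio_block)
{
	assert(dio_block > 0);
	if (start < 0)
		return false;
	return (start % (int64_t)dio_block) == 0 && (len % dio_block) == 0;
}
```

With an assumed block size of 512 bytes, a region starting at offset 512 with
length 1024 passes the check even though it is not aligned to a 4096-byte
page.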
0417 
0418 The write function takes some additional parameters: cookie is the cookie
0419 representing the cache object to be written to, i_size indicates the size of
0420 the netfs file and term_func indicates an optional completion function, to
0421 which term_func_priv will be passed, along with the error or amount written.
0422 
0423 Note that the write function will always run asynchronously and will unmark all
0424 the pages upon completion before calling term_func.
0425 
0426 
0427 Page Release and Invalidation
0428 =============================
0429 
0430 Fscache keeps track of whether we have any data in the cache yet for a cache
0431 object we've just created.  It knows it doesn't have to do any reading until it
0432 has done a write and then the page it wrote from has been released by the VM,
0433 after which it *has* to look in the cache.
0434 
0435 To inform fscache that a page might now be in the cache, the following function
0436 should be called from the ``release_folio`` address space op::
0437 
0438         void fscache_note_page_release(struct fscache_cookie *cookie);
0439 
0440 if the page has been released (i.e. release_folio returned true).
0441 
0442 Page release and page invalidation should also wait for any mark left on the
0443 page to say that a DIO write is underway from that page::
0444 
0445         void wait_on_page_fscache(struct page *page);
0446         int wait_on_page_fscache_killable(struct page *page);
0447 
0448 
0449 API Function Reference
0450 ======================
0451 
0452 .. kernel-doc:: include/linux/fscache.h