.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations.  For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server.  The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage.  To this end, a context
structure is defined::

        struct netfs_inode {
                struct inode inode;
                const struct netfs_request_ops *ops;
                struct fscache_cookie *cache;
        };

A network filesystem that wants to use netfslib must place one of these in its
inode wrapper struct instead of the VFS ``struct inode``.  This can be done in
a way similar to the following::

        struct my_inode {
                struct netfs_inode netfs; /* Netfslib context and vfs inode */
                ...
        };

This allows netfslib to find its state by using ``container_of()`` from the
inode pointer, thereby allowing the netfslib helper functions to be pointed to
directly by the VFS/VM operation tables.

The structure contains the following fields:

 * ``inode``

   The VFS inode structure.

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled.  This field does not
   exist if fscache is disabled.


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided.  Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

        void netfs_inode_init(struct netfs_inode *ctx,
                              const struct netfs_request_ops *ops);

then a function to cast from the VFS inode structure to the netfs context::

        struct netfs_inode *netfs_inode(struct inode *inode);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

        struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx);

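The ``container_of()`` arrangement used by the per-inode context can be
illustrated outside the kernel.  The following is a minimal user-space sketch,
not kernel code: the ``model_*`` functions, the cut-down ``struct inode`` and
``struct netfs_request_ops``, and the ``my_private_state`` field are all
assumptions made for illustration, and the simplified ``container_of()`` macro
omits the type checking performed by the kernel's version.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel types. */
struct inode { unsigned long i_ino; };
struct netfs_request_ops { int unused; };

struct netfs_inode {
        struct inode inode;                     /* Embedded VFS inode */
        const struct netfs_request_ops *ops;    /* Ops supplied by the netfs */
};

struct my_inode {
        struct netfs_inode netfs;       /* Netfslib context and vfs inode */
        int my_private_state;           /* Hypothetical filesystem field */
};

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Model of the initialisation helper: record the ops table pointer. */
static void model_netfs_inode_init(struct netfs_inode *ctx,
                                   const struct netfs_request_ops *ops)
{
        ctx->ops = ops;
}

/* Model of the cast from the VFS inode to the netfs context. */
static struct netfs_inode *model_netfs_inode(struct inode *inode)
{
        return container_of(inode, struct netfs_inode, inode);
}

/* The filesystem's own cast from the VFS inode to its wrapper struct. */
static struct my_inode *model_my_inode(struct inode *inode)
{
        return container_of(model_netfs_inode(inode), struct my_inode, netfs);
}
```

Since the VFS only hands around a pointer to the embedded ``struct inode``,
both casts recover their containers with constant pointer arithmetic and need
no lookup table.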

Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->read_folio(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of bufferage that isn't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations.  This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

        void netfs_readahead(struct readahead_control *ractl);
        int netfs_read_folio(struct file *file,
                             struct folio *folio);
        int netfs_write_begin(struct netfs_inode *ctx,
                              struct file *file,
                              struct address_space *mapping,
                              loff_t pos,
                              unsigned int len,
                              struct folio **_folio,
                              void **_fsdata);

Each corresponds to a VM address space operation.  These operations use the
state in the per-inode context.

For ->readahead() and ->read_folio(), the network filesystem just points
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs, ->free_request() will be called to clean up the
netfs_io_request struct allocated.  If some parts of the request are in
progress when an error occurs, the request will get partially completed if
sufficient data is read.

Additionally, there is::

        void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
                                     ssize_t transferred_or_error,
                                     bool was_async);

which should be called to complete a read subrequest.  This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (ie. whether the follow-on processing can be
done in the current context, given this may involve sleeping).

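The ``transferred_or_error`` convention used when completing a subrequest can
be sketched in user space.  This is only a model under stated assumptions:
``struct model_subreq``, ``enum model_result`` and
``model_subreq_terminated()`` are hypothetical names, the model ignores
``was_async`` and does none of the folio or cache handling that the real
helper is responsible for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>          /* ssize_t */

enum model_result { MODEL_COMPLETE, MODEL_SHORT, MODEL_FAILED };

/* Hypothetical cut-down subrequest: just the accounting fields. */
struct model_subreq {
        size_t len;             /* Length of the slice */
        size_t transferred;     /* Bytes obtained so far */
        int error;              /* Negative error code, or 0 */
};

/* Decode a completion: negative means error, otherwise a byte count. */
static enum model_result model_subreq_terminated(struct model_subreq *subreq,
                                                 ssize_t transferred_or_error,
                                                 bool was_async)
{
        (void)was_async;        /* The model always processes inline */

        if (transferred_or_error < 0) {
                subreq->error = (int)transferred_or_error;
                return MODEL_FAILED;
        }

        subreq->transferred += (size_t)transferred_or_error;
        if (subreq->transferred >= subreq->len)
                return MODEL_COMPLETE;
        return MODEL_SHORT;     /* Caller would reissue from ->transferred */
}
```

Folding the byte count and the error into one signed value means a completion
path has a single argument to pass, at the cost of the caller having to test
the sign before using it as a length.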

Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read.  The first is a structure that manages a read request as a whole::

        struct netfs_io_request {
                struct inode            *inode;
                struct address_space    *mapping;
                struct netfs_cache_resources cache_resources;
                void                    *netfs_priv;
                loff_t                  start;
                size_t                  len;
                loff_t                  i_size;
                const struct netfs_request_ops *netfs_ops;
                unsigned int            debug_id;
                ...
        };

The above fields are the ones the netfs can use.  They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from.  The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data.  The value for this can be passed in
   to the helper functions or set during the request.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length.  These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table.  The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

        struct netfs_io_subrequest {
                struct netfs_io_request *rreq;
                loff_t                  start;
                size_t                  len;
                size_t                  transferred;
                unsigned long           flags;
                unsigned short          debug_index;
                ...
        };

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another.  The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice.  The
   network filesystem or cache should start the operation this far into the
   slice.  If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read.  There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.

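How ``start``, ``len``, ``transferred`` and the clear-tail flag interact on a
short read can be modelled in user space.  The sketch below is an assumption
throughout: ``struct model_subreq``, ``MODEL_SREQ_CLEAR_TAIL``,
``model_source_read()`` and ``model_fill_slice()`` are hypothetical, and the
"server" is simply a file of ``'x'`` bytes that returns at most ``chunk`` bytes
per call and sets the flag when the read position reaches its end.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_SREQ_CLEAR_TAIL 0x1UL     /* Stands in for NETFS_SREQ_CLEAR_TAIL */

/* Hypothetical cut-down subrequest. */
struct model_subreq {
        size_t start;           /* File position of the slice */
        size_t len;             /* Length of the slice */
        size_t transferred;     /* Bytes read so far */
        unsigned long flags;
};

/* Hypothetical source: file_size bytes of 'x', at most chunk per call. */
static size_t model_source_read(size_t pos, char *buf, size_t want,
                                size_t file_size, size_t chunk)
{
        size_t n;

        if (pos >= file_size)
                return 0;       /* Nothing at or beyond EOF */
        n = file_size - pos;
        if (n > want)
                n = want;
        if (n > chunk)
                n = chunk;
        memset(buf, 'x', n);
        return n;
}

/* Fill buf (subreq->len bytes), resuming after each short read. */
static void model_fill_slice(struct model_subreq *subreq, char *buf,
                             size_t file_size, size_t chunk)
{
        while (subreq->transferred < subreq->len) {
                /* Each reissue starts ->transferred bytes into the slice. */
                size_t got = model_source_read(
                        subreq->start + subreq->transferred,
                        buf + subreq->transferred,
                        subreq->len - subreq->transferred,
                        file_size, chunk);

                if (got == 0) {
                        /* No progress: ask for the tail to be cleared. */
                        subreq->flags |= MODEL_SREQ_CLEAR_TAIL;
                        break;
                }
                subreq->transferred += got;
        }

        if (subreq->flags & MODEL_SREQ_CLEAR_TAIL)
                memset(buf + subreq->transferred, 0,
                       subreq->len - subreq->transferred);
}
```

Note that clearing the tail does not advance ``transferred``; it continues to
record only the bytes that actually came from the source.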

Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

        struct netfs_request_ops {
                void (*init_request)(struct netfs_io_request *rreq, struct file *file);
                void (*free_request)(struct netfs_io_request *rreq);
                int (*begin_cache_operation)(struct netfs_io_request *rreq);
                void (*expand_readahead)(struct netfs_io_request *rreq);
                bool (*clamp_length)(struct netfs_io_subrequest *subreq);
                void (*issue_read)(struct netfs_io_subrequest *subreq);
                bool (*is_still_valid)(struct netfs_io_request *rreq);
                int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
                                         struct folio **foliop, void **_fsdata);
                void (*done)(struct netfs_io_request *rreq);
        };

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure.  It is given
   the file for reference.

 * ``free_request()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up any state it has attached there.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the filesystem should
   call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise.  If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.  The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made.  If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure.  Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest.  The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred.  The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned.  It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid.  It should return true if it is still valid and false
   if not.  If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It may unlock and discard the folio it was given and set the caller's folio
   pointer to NULL.  It should return 0 if everything is now fine (``*foliop``
   left set) or if the op should be retried (``*foliop`` cleared), and a
   negative error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).

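The rule for ``expand_readahead()``, that ->len must grow by at least as much
as ->start is reduced, can be illustrated with a small user-space model.  The
names ``struct model_request`` and ``model_expand_readahead()`` are
hypothetical, as is the idea of rounding to a fixed ``granule`` (a filesystem
might choose its expansion quite differently); ``start`` is assumed to be
non-negative and ``granule`` non-zero.

```c
#include <stddef.h>

/* Hypothetical cut-down request: just the range being read. */
struct model_request {
        long long start;        /* Stands in for loff_t */
        size_t len;
};

/*
 * Round the request outwards to granule boundaries.  The start only ever
 * moves down and the end only ever moves up, so the request cannot shrink
 * and len grows by at least the amount start was reduced.
 */
static void model_expand_readahead(struct model_request *rreq, size_t granule)
{
        long long g = (long long)granule;
        long long end = rreq->start + (long long)rreq->len;
        long long new_start = rreq->start - rreq->start % g;
        long long new_end = end;

        if (new_end % g)
                new_end += g - new_end % g;

        rreq->start = new_start;
        rreq->len = (size_t)(new_end - new_start);
}
```

For example, a request for 3000 bytes at position 5000 with a 4096-byte
granule becomes a request for 4096 bytes at position 4096: the start moves
down by 904 bytes and the length grows by 1096.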


Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request.  This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source.  This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.

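The slicing step of the procedure above can be modelled in user space.  This is
a sketch under assumptions, not the library's algorithm: the cache is
represented by a hypothetical array of booleans, one per ``MODEL_GRANULE``
bytes, ``model_prepare_read()`` plays the part of the cache choosing the source
and trimming the slice, and a non-zero ``rsize`` limit plays the part of the
filesystem clamping server reads.  Clearing slices and reissuing are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_GRANULE 4u        /* Hypothetical cache granularity in bytes */

enum model_source { MODEL_FROM_CACHE, MODEL_FROM_SERVER };

struct model_slice {
        size_t start;
        size_t len;
        enum model_source source;
};

/*
 * Cache's turn: pick the source for the slice at start and trim *len so the
 * slice stops where the cached/uncached state changes.  cached[] must cover
 * every granule touched by [start, start + *len).
 */
static enum model_source model_prepare_read(const bool *cached,
                                            size_t start, size_t *len)
{
        size_t end = start + *len;
        bool state = cached[start / MODEL_GRANULE];
        size_t pos = (start / MODEL_GRANULE + 1) * MODEL_GRANULE;

        while (pos < end && cached[pos / MODEL_GRANULE] == state)
                pos += MODEL_GRANULE;
        if (pos > end)
                pos = end;
        *len = pos - start;
        return state ? MODEL_FROM_CACHE : MODEL_FROM_SERVER;
}

/* Slice [start, start + len) into at most max_out subrequests. */
static size_t model_slice_request(const bool *cached, size_t start, size_t len,
                                  size_t rsize, struct model_slice *out,
                                  size_t max_out)
{
        size_t n = 0, pos = start, end = start + len;

        while (pos < end && n < max_out) {
                size_t slice_len = end - pos;
                enum model_source src;

                src = model_prepare_read(cached, pos, &slice_len);

                /* Filesystem's turn: clamp slices it will have to source. */
                if (src == MODEL_FROM_SERVER && slice_len > rsize)
                        slice_len = rsize;

                out[n].start = pos;
                out[n].len = slice_len;
                out[n].source = src;
                n++;

                pos += slice_len;       /* Next slice begins where this ended */
        }
        return n;
}
```

With granules 0, 1 and 5 cached and an ``rsize`` of 6, a 20-byte request at
position 2 comes out as four slices: a cache read, two clamped server reads and
a final cache read, so cached and server-read data are interleaved within one
request.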

Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::

        int fscache_begin_read_operation(struct netfs_io_request *rreq,
                                         struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_io_request object contains a place for the cache to hang its
state::

        struct netfs_cache_resources {
                const struct netfs_cache_ops    *ops;
                void                            *cache_priv;
                void                            *cache_priv2;
        };

This contains an operations table pointer and two private pointers.  The
operation table looks like the following::

        struct netfs_cache_ops {
                void (*end_operation)(struct netfs_cache_resources *cres);

                void (*expand_readahead)(struct netfs_cache_resources *cres,
                                         loff_t *_start, size_t *_len, loff_t i_size);

                enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
                                                       loff_t i_size);

                int (*read)(struct netfs_cache_resources *cres,
                            loff_t start_pos,
                            struct iov_iter *iter,
                            bool seek_data,
                            netfs_io_terminated_t term_func,
                            void *term_func_priv);

                int (*prepare_write)(struct netfs_cache_resources *cres,
                                     loff_t *_start, size_t *_len, loff_t i_size,
                                     bool no_space_allocated_yet);

                int (*write)(struct netfs_cache_resources *cres,
                             loff_t start_pos,
                             struct iov_iter *iter,
                             netfs_io_terminated_t term_func,
                             void *term_func_priv);

                int (*query_occupancy)(struct netfs_cache_resources *cres,
                                       loff_t start, size_t len, size_t granularity,
                                       loff_t *_data_start, size_t *_data_len);
        };

With a termination handler function pointer::

        typedef void (*netfs_io_terminated_t)(void *priv,
                                              ssize_t transferred_or_error,
                                              bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction.  This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache.  The start file offset is given
   along with an iterator to read to, which gives the length also.  It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place.  This
   involves checking to see whether the cache has sufficient space to honour
   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O.  i_size holds the size of the object and
   is provided for reference.  no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.  The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache.  The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned.  The function passes back the start and length of the data,
   if any, available within that region.  Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.

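The return convention described for ``query_occupancy()`` can be modelled in
user space as follows.  Everything here is an assumption for illustration:
``struct model_cache`` represents a cache object as an array of present/absent
blocks, ``model_query_occupancy()`` is a hypothetical name, the block size
doubles as the granularity, and the queried region is assumed to be
block-aligned.  A NULL cache pointer stands in for "no caching on this file".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache object: present[i] says whether block i holds data. */
struct model_cache {
        const bool *present;
        size_t nr_blocks;
        size_t block_size;      /* Also used as the granularity here */
};

/*
 * Find the first run of data in [start, start + len).  Returns 0 and passes
 * back the run, -ENODATA if the region holds no data, or -ENOBUFS if there
 * is no cache.
 */
static int model_query_occupancy(const struct model_cache *cache,
                                 size_t start, size_t len,
                                 size_t *data_start, size_t *data_len)
{
        size_t first, last, b, e;

        if (!cache)
                return -ENOBUFS;

        first = start / cache->block_size;
        last = (start + len) / cache->block_size;       /* One past the end */
        if (last > cache->nr_blocks)
                last = cache->nr_blocks;

        /* Skip any hole at the front of the region. */
        for (b = first; b < last && !cache->present[b]; b++)
                ;
        if (b >= last)
                return -ENODATA;

        /* Measure the run of data that follows. */
        for (e = b; e < last && cache->present[e]; e++)
                ;

        *data_start = b * cache->block_size;
        *data_len = (e - b) * cache->block_size;
        return 0;
}
```

Only the first run is reported, so a caller interested in the whole region
would query again starting from the end of the run it was given.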

API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c