.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations.  For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server.  The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage.  To this end, a context
structure is defined::

        struct netfs_inode {
                struct inode inode;
                const struct netfs_request_ops *ops;
                struct fscache_cookie *cache;
        };

A network filesystem that wants to use netfslib must place one of these in its
inode wrapper struct instead of the VFS ``struct inode``.  This can be done in
a way similar to the following::

        struct my_inode {
                struct netfs_inode netfs; /* Netfslib context and vfs inode */
                ...
        };

This allows netfslib to find its state by using ``container_of()`` from the
inode pointer, thereby allowing the netfslib helper functions to be pointed to
directly by the VFS/VM operation tables.

The ``netfs_inode`` structure contains the following fields:

 * ``inode``

   The VFS inode structure.

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled.  This field does not
   exist if fscache is disabled.
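
As a rough userspace illustration of the embedding described above (not kernel
code), the following sketch mocks ``struct inode`` and ``struct netfs_inode``
with placeholder fields and defines its own ``container_of()``.  The
``server_id`` field and the ``to_netfs()``, ``to_my_inode()`` and
``embedding_round_trips()`` helpers are hypothetical names used only for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct inode { unsigned long i_ino; };      /* mock of the VFS inode */
struct netfs_request_ops;                   /* opaque in this sketch */

struct netfs_inode {
    struct inode inode;                     /* the VFS inode, embedded */
    const struct netfs_request_ops *ops;
};

struct my_inode {                           /* filesystem's inode wrapper */
    struct netfs_inode netfs;
    int server_id;                          /* hypothetical private state */
};

/* VFS inode pointer -> netfslib context. */
static struct netfs_inode *to_netfs(struct inode *inode)
{
    return container_of(inode, struct netfs_inode, inode);
}

/* VFS inode pointer -> the filesystem's own wrapper. */
static struct my_inode *to_my_inode(struct inode *inode)
{
    return container_of(to_netfs(inode), struct my_inode, netfs);
}

/* Returns 1 if both conversions land on the expected objects. */
static int embedding_round_trips(void)
{
    struct my_inode mi = { .server_id = 7 };
    struct inode *vfs_inode = &mi.netfs.inode;

    assert(to_netfs(vfs_inode) == &mi.netfs);
    return to_my_inode(vfs_inode) == &mi &&
           to_my_inode(vfs_inode)->server_id == 7;
}
```

Because the VFS only ever hands back the ``struct inode`` pointer, both the
library and the filesystem can recover their own state from it with pointer
arithmetic alone, without a separate lookup.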


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided.  Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

        void netfs_inode_init(struct netfs_inode *ctx,
                              const struct netfs_request_ops *ops);

then a function to cast from the VFS inode structure to the netfs context::

        struct netfs_inode *netfs_inode(struct inode *inode);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

        struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx);


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->read_folio(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of bufferage that isn't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations.  This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

        void netfs_readahead(struct readahead_control *ractl);
        int netfs_read_folio(struct file *file,
                             struct folio *folio);
        int netfs_write_begin(struct netfs_inode *ctx,
                              struct file *file,
                              struct address_space *mapping,
                              loff_t pos,
                              unsigned int len,
                              struct folio **_folio,
                              void **_fsdata);

Each corresponds to a VM address space operation.  These operations use the
state in the per-inode context.

For ->readahead() and ->read_folio(), the network filesystem just points
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs, ->free_request() will be called to clean up the
netfs_io_request struct allocated.  If some parts of the request are in
progress when an error occurs, the request will get partially completed if
sufficient data is read.

Additionally, there is::

        void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
                                     ssize_t transferred_or_error,
                                     bool was_async);

which should be called to complete a read subrequest.  This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (ie. whether the follow-on processing can be
done in the current context, given this may involve sleeping).
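
The ``transferred_or_error`` convention can be illustrated with a small
userspace mock.  This is only a sketch: ``struct mock_subreq``,
``mock_subreq_terminated()`` and ``mock_issue_read()`` are hypothetical
stand-ins, the "server" is an in-memory buffer passed as extra parameters, and
none of this is the library's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

struct mock_subreq {            /* hypothetical, much reduced subrequest */
    long long start;            /* file position of the slice */
    size_t len;                 /* length of the slice */
    size_t transferred;         /* bytes already read into the slice */
    int error;                  /* last error recorded by the mock */
    char *buf;                  /* destination buffer for the slice */
};

/* Mock completion call: a negative value is an error, otherwise a count. */
static void mock_subreq_terminated(struct mock_subreq *subreq,
                                   ssize_t transferred_or_error,
                                   bool was_async)
{
    (void)was_async;            /* no deferred processing in this sketch */
    if (transferred_or_error < 0) {
        subreq->error = (int)transferred_or_error;
        return;
    }
    subreq->transferred += (size_t)transferred_or_error;
    assert(subreq->transferred <= subreq->len);
}

/* Mock read: copy from an in-memory "server" file, then report the result. */
static void mock_issue_read(struct mock_subreq *subreq,
                            const char *server, size_t server_size)
{
    long long pos = subreq->start + (long long)subreq->transferred;
    size_t want = subreq->len - subreq->transferred;

    if (pos < 0 || (size_t)pos >= server_size) {
        mock_subreq_terminated(subreq, -EIO, false);
        return;
    }
    if (want > server_size - (size_t)pos)
        want = server_size - (size_t)pos;   /* short read at end of data */
    memcpy(subreq->buf + subreq->transferred, server + pos, want);
    mock_subreq_terminated(subreq, (ssize_t)want, false);
}
```

Note that the read function itself returns nothing; success, a short read and
failure are all reported through the single completion call.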


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read.  The first is a structure that manages a read request as a whole::

        struct netfs_io_request {
                struct inode            *inode;
                struct address_space    *mapping;
                struct netfs_cache_resources cache_resources;
                void                    *netfs_priv;
                loff_t                  start;
                size_t                  len;
                loff_t                  i_size;
                const struct netfs_request_ops *netfs_ops;
                unsigned int            debug_id;
                ...
        };

The above fields are the ones the netfs can use.  They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from.  The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data.  The value for this can be passed in
   to the helper functions or set during the request.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length.  These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table.  The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

        struct netfs_io_subrequest {
                struct netfs_io_request *rreq;
                loff_t                  start;
                size_t                  len;
                size_t                  transferred;
                unsigned long           flags;
                unsigned short          debug_index;
                ...
        };

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another.  The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice.  The
   network filesystem or cache should start the operation this far into the
   slice.  If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read.  There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
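
How ``transferred``, ``len`` and the clear-tail flag relate can be sketched in
userspace as follows.  This is an illustration under stated assumptions, not
the helpers' real code: ``struct mock_subreq``, ``MOCK_SREQ_CLEAR_TAIL``,
``enum mock_next_step`` and ``mock_assess_slice()`` are hypothetical names, and
the slice is backed by a plain byte buffer rather than folios.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_SREQ_CLEAR_TAIL 0x1UL  /* stand-in for NETFS_SREQ_CLEAR_TAIL */

struct mock_subreq {                /* hypothetical, reduced subrequest */
    size_t len;                     /* length of the slice */
    size_t transferred;             /* bytes read so far */
    unsigned long flags;
    unsigned char *buf;             /* backing store for the slice */
};

enum mock_next_step { MOCK_DONE, MOCK_REISSUE };

/* Decide what to do with a slice after a read attempt has terminated. */
static enum mock_next_step mock_assess_slice(struct mock_subreq *subreq)
{
    assert(subreq->transferred <= subreq->len);

    if (subreq->transferred == subreq->len)
        return MOCK_DONE;           /* slice fully read */

    if (subreq->flags & MOCK_SREQ_CLEAR_TAIL) {
        /* Zero the unread remainder, from transferred to len. */
        memset(subreq->buf + subreq->transferred, 0,
               subreq->len - subreq->transferred);
        subreq->transferred = subreq->len;
        return MOCK_DONE;
    }

    /* Short read: the next attempt resumes at start + transferred. */
    return MOCK_REISSUE;
}
```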


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

        struct netfs_request_ops {
                void (*init_request)(struct netfs_io_request *rreq, struct file *file);
                void (*free_request)(struct netfs_io_request *rreq);
                int (*begin_cache_operation)(struct netfs_io_request *rreq);
                void (*expand_readahead)(struct netfs_io_request *rreq);
                bool (*clamp_length)(struct netfs_io_subrequest *subreq);
                void (*issue_read)(struct netfs_io_subrequest *subreq);
                bool (*is_still_valid)(struct netfs_io_request *rreq);
                int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
                                         struct folio **foliop, void **_fsdata);
                void (*done)(struct netfs_io_request *rreq);
        };

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure.  It is given
   the file for reference.

 * ``free_request()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up any state it has attached there.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise.  If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.  The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made.  If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure.  Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest.  The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred.  The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned.  It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid.  It should return true if it is still valid and false
   if not.  If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It may unlock and discard the folio it was given and set the caller's folio
   pointer to NULL.  It should return 0 if everything is now fine (``*foliop``
   left set) or if the op should be retried (``*foliop`` cleared), and any
   other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).
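
The constraints on ``expand_readahead()`` and ``clamp_length()`` described in
the list above can be sketched in userspace.  Everything here is an assumption
made for illustration: the reduced ``struct mock_request`` and
``struct mock_subreq``, the ``MOCK_GRANULE`` and ``MOCK_RSIZE`` values and the
``mock_*`` functions are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_request { long long start; size_t len; };  /* reduced request */
struct mock_subreq  { long long start; size_t len; };  /* reduced subrequest */

#define MOCK_GRANULE 4096u  /* hypothetical alignment wanted by the netfs */
#define MOCK_RSIZE   8192u  /* hypothetical maximum size of one server read */

/* Sketch of an expansion: grow to granule boundaries, never shrink. */
static void mock_expand_readahead(struct mock_request *rreq)
{
    long long old_start = rreq->start;
    size_t old_len = rreq->len;
    long long end = rreq->start + (long long)rreq->len;

    rreq->start -= rreq->start % MOCK_GRANULE;                    /* round down */
    end = (end + MOCK_GRANULE - 1) / MOCK_GRANULE * MOCK_GRANULE; /* round up */
    rreq->len = (size_t)(end - rreq->start);

    /* len must grow by at least as much as start was reduced. */
    assert(rreq->len - old_len >= (size_t)(old_start - rreq->start));
}

/* Sketch of a clamp: cap the slice at the server's read size limit. */
static bool mock_clamp_length(struct mock_subreq *subreq)
{
    if (subreq->len > MOCK_RSIZE)
        subreq->len = MOCK_RSIZE;
    return true;                /* true means the slice can be issued */
}
```

For example, a request for 1000 bytes at position 5000 becomes a request for
4096 bytes at position 4096, and a 20000-byte slice is cut down to 8192 bytes.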


Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request.  This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source.  This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.
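
The slicing loop in the procedure above can be sketched as a small userspace
simulation.  It is only a model under stated assumptions: the mock cache is
assumed to hold bytes 0 to 5999 of the file, the server read limit is assumed
to be 4096 bytes, and all ``mock_*`` and ``MOCK_*`` names are hypothetical.
Issuing, terminating and reissuing the slices is left out.

```c
#include <assert.h>
#include <stddef.h>

enum mock_source {
    MOCK_FILL_WITH_ZEROES,
    MOCK_DOWNLOAD_FROM_SERVER,
    MOCK_READ_FROM_CACHE,
};

#define MOCK_RSIZE      4096u   /* hypothetical per-read limit of the server */
#define MOCK_CACHED_END 6000LL  /* hypothetical: cache holds bytes [0, 6000) */

struct mock_slice { long long start; size_t len; enum mock_source source; };

/* Mock cache slicing: cached data first, then the server, zeroes past EOF. */
static enum mock_source mock_prepare_slice(struct mock_slice *s, long long i_size)
{
    long long end = s->start + (long long)s->len;

    if (s->start >= i_size)
        return MOCK_FILL_WITH_ZEROES;
    if (end > i_size)
        end = i_size;
    if (s->start < MOCK_CACHED_END) {
        if (end > MOCK_CACHED_END)
            end = MOCK_CACHED_END;
        s->len = (size_t)(end - s->start);
        return MOCK_READ_FROM_CACHE;
    }
    s->len = (size_t)(end - s->start);
    return MOCK_DOWNLOAD_FROM_SERVER;
}

/* Slice [start, start + len) into out[]; returns the number of slices. */
static size_t mock_slice_request(long long start, size_t len, long long i_size,
                                 struct mock_slice *out, size_t max)
{
    size_t n = 0, done = 0;

    while (done < len && n < max) {
        struct mock_slice *s = &out[n];

        s->start = start + (long long)done; /* begins where the last one ended */
        s->len = len - done;                /* start with a maximal slice */
        s->source = mock_prepare_slice(s, i_size);
        if (s->source == MOCK_DOWNLOAD_FROM_SERVER && s->len > MOCK_RSIZE)
            s->len = MOCK_RSIZE;            /* the netfs clamps server reads */
        assert(s->len > 0);
        done += s->len;
        n++;
    }
    return n;
}
```

With a 16384-byte request at position 0 on a 12000-byte file, this yields four
slices: 6000 bytes from the cache, 4096 and 1904 bytes from the server, and a
final 4384 bytes to be cleared.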


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::

        int fscache_begin_read_operation(struct netfs_io_request *rreq,
                                         struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_io_request object contains a place for the cache to hang its
state::

        struct netfs_cache_resources {
                const struct netfs_cache_ops    *ops;
                void                            *cache_priv;
                void                            *cache_priv2;
        };

This contains an operations table pointer and two private pointers.  The
operation table looks like the following::

        struct netfs_cache_ops {
                void (*end_operation)(struct netfs_cache_resources *cres);

                void (*expand_readahead)(struct netfs_cache_resources *cres,
                                         loff_t *_start, size_t *_len, loff_t i_size);

                enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
                                                       loff_t i_size);

                int (*read)(struct netfs_cache_resources *cres,
                            loff_t start_pos,
                            struct iov_iter *iter,
                            bool seek_data,
                            netfs_io_terminated_t term_func,
                            void *term_func_priv);

                int (*prepare_write)(struct netfs_cache_resources *cres,
                                     loff_t *_start, size_t *_len, loff_t i_size,
                                     bool no_space_allocated_yet);

                int (*write)(struct netfs_cache_resources *cres,
                             loff_t start_pos,
                             struct iov_iter *iter,
                             netfs_io_terminated_t term_func,
                             void *term_func_priv);

                int (*query_occupancy)(struct netfs_cache_resources *cres,
                                       loff_t start, size_t len, size_t granularity,
                                       loff_t *_data_start, size_t *_data_len);
        };

With a termination handler function pointer::

        typedef void (*netfs_io_terminated_t)(void *priv,
                                              ssize_t transferred_or_error,
                                              bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction.  This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache.  The start file offset is given
   along with an iterator to read to, which gives the length also.  It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place.  This
   involves checking to see whether the cache has sufficient space to honour
   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O.  i_size holds the size of the object and
   is provided for reference.  no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.  The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache.  The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned.  The function passes back the start and length of the data,
   if any, available within that region.  Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.
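
The return convention of ``query_occupancy()`` can be sketched against a toy
cache.  This is a userspace model under stated assumptions: the cache is
represented as a fixed array of granule-present flags, the ``granularity``
argument is assumed to equal the size of one such granule, and
``struct mock_cache`` and ``mock_query_occupancy()`` are hypothetical names,
not the interface of any real cache backend.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NR_GRANULES 8

struct mock_cache {
    bool present[MOCK_NR_GRANULES];     /* which granules hold data */
};

/* Find the first run of data in [start, start + len), granule-aligned. */
static int mock_query_occupancy(const struct mock_cache *cache,
                                long long start, size_t len, size_t granularity,
                                long long *_data_start, size_t *_data_len)
{
    size_t first, last, i, j;

    if (!cache)
        return -ENOBUFS;                /* no caching on this file */
    assert(granularity > 0 && start >= 0);

    first = (size_t)start / granularity;
    last = ((size_t)start + len + granularity - 1) / granularity; /* exclusive */
    if (last > MOCK_NR_GRANULES)
        last = MOCK_NR_GRANULES;

    for (i = first; i < last && !cache->present[i]; i++)
        ;                               /* skip the hole at the front */
    if (i >= last)
        return -ENODATA;                /* nothing usable in the region */
    for (j = i; j < last && cache->present[j]; j++)
        ;                               /* extend over the run of data */

    *_data_start = (long long)(i * granularity);
    *_data_len = (j - i) * granularity;
    return 0;
}
```

A caller would typically treat the gap between ``start`` and ``*_data_start``
as a hole to be fetched from the server and the returned run as readable from
the cache.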


API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c