.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations.  For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server.  The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage.  To this end, a context
structure is defined::

	struct netfs_inode {
		struct inode inode;
		const struct netfs_request_ops *ops;
		struct fscache_cookie *cache;
	};

A network filesystem that wants to use netfslib must place one of these in its
inode wrapper struct instead of the VFS ``struct inode``.  This can be done in
a way similar to the following::

	struct my_inode {
		struct netfs_inode netfs; /* Netfslib context and vfs inode */
		...
	};

This allows netfslib to find its state by using ``container_of()`` from the
inode pointer, thereby allowing the netfslib helper functions to be pointed to
directly by the VFS/VM operation tables.
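
The embedding trick can be tried out in userspace.  The following is a minimal
sketch, not kernel code: the structures are cut-down stand-ins, the
``container_of()`` macro is a simplified version without the kernel's type
checking, and ``to_netfs_inode()`` and ``to_my_inode()`` are hypothetical
helper names chosen for illustration.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the layout matters here. */
struct inode {
	unsigned long i_ino;
};

struct netfs_request_ops;	/* left opaque in this sketch */

struct netfs_inode {
	struct inode inode;	/* embedded VFS inode */
	const struct netfs_request_ops *ops;
};

/* Hypothetical filesystem-private wrapper, as in the example above. */
struct my_inode {
	struct netfs_inode netfs;
	int my_private_state;
};

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the netfslib context from a bare VFS inode pointer. */
static struct netfs_inode *to_netfs_inode(struct inode *inode)
{
	return container_of(inode, struct netfs_inode, inode);
}

/* Recover the filesystem's own wrapper from the same pointer. */
static struct my_inode *to_my_inode(struct inode *inode)
{
	return container_of(to_netfs_inode(inode), struct my_inode, netfs);
}
```

Because the VFS inode sits inside the context, which in turn sits inside the
filesystem's wrapper, a single inode pointer is enough to reach all three
levels without any extra lookup.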

The structure contains the following fields:

 * ``inode``

   The VFS inode structure.

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled.  This field does not
   exist if fscache is disabled.


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided.  Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

	void netfs_inode_init(struct netfs_inode *ctx,
			      const struct netfs_request_ops *ops);

then a function to cast from the VFS inode structure to the netfs context::

	struct netfs_inode *netfs_inode(struct inode *inode);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

	struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx);


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->read_folio(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of bufferage that isn't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations.  This
includes a mandatory method to issue a read operation along with a number of
optional methods.
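
The split between mandatory and optional methods can be modelled in a few
lines of userspace C.  This is only a sketch of the pattern: ``struct
model_request``, ``struct model_request_ops`` and ``model_submit()`` are
hypothetical names, and the real table has more methods than the two shown.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified request; a stand-in, not struct netfs_io_request. */
struct model_request {
	long long start;
	size_t len;
	int reads_issued;
};

/* Model ops table: one required method and one optional method. */
struct model_request_ops {
	void (*expand_readahead)(struct model_request *req);	/* optional */
	void (*issue_read)(struct model_request *req);		/* required */
};

/* Dispatch a request, skipping optional methods that are left NULL. */
static int model_submit(const struct model_request_ops *ops,
			struct model_request *req)
{
	if (!ops || !ops->issue_read)
		return -EINVAL;		/* the mandatory method is missing */
	if (ops->expand_readahead)
		ops->expand_readahead(req);
	ops->issue_read(req);
	return 0;
}
```

A filesystem that has no special requirements would fill in only the read
method and leave every other pointer NULL.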


Read Helper Functions
---------------------

Three read helpers are provided::

	void netfs_readahead(struct readahead_control *ractl);
	int netfs_read_folio(struct file *file,
			     struct folio *folio);
	int netfs_write_begin(struct netfs_inode *ctx,
			      struct file *file,
			      struct address_space *mapping,
			      loff_t pos,
			      unsigned int len,
			      struct folio **_folio,
			      void **_fsdata);

Each corresponds to a VM address space operation.  These operations use the
state in the per-inode context.

For ->readahead() and ->read_folio(), the network filesystem just points
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs, ->free_request() will be called to clean up the
netfs_io_request struct allocated.  If some parts of the request are in
progress when an error occurs, the request will get partially completed if
sufficient data is read.

Additionally, there is::

	void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
				     ssize_t transferred_or_error,
				     bool was_async);

which should be called to complete a read subrequest.  This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (i.e. whether the follow-on processing can be
done in the current context, given this may involve sleeping).
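
The ``transferred_or_error`` convention packs a byte count and an error code
into one signed value.  The userspace sketch below shows one way a completion
routine could decode it; ``struct model_subreq`` and
``model_subreq_terminated()`` are hypothetical stand-ins, and the ``was_async``
flag is left out because it only affects where follow-on processing runs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Simplified model of a subrequest; not the kernel structure. */
struct model_subreq {
	size_t len;		/* bytes requested for this slice */
	size_t transferred;	/* bytes received so far */
	int error;		/* 0 or a negative errno */
	bool complete;		/* the whole slice has been filled */
};

/* Model completion call: one value carries a byte count or an error. */
static void model_subreq_terminated(struct model_subreq *subreq,
				    ssize_t transferred_or_error)
{
	if (transferred_or_error < 0) {
		/* Negative values are error codes, e.g. -EIO. */
		subreq->error = (int)transferred_or_error;
		return;
	}

	subreq->transferred += (size_t)transferred_or_error;
	if (subreq->transferred > subreq->len)
		subreq->transferred = subreq->len; /* never count past the slice */
	subreq->complete = (subreq->transferred == subreq->len);
}
```

In this model a short read leaves ``complete`` false, so a caller can tell that
the slice needs either another read or clearing.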


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read.  The first is a structure that manages a read request as a whole::

	struct netfs_io_request {
		struct inode		*inode;
		struct address_space	*mapping;
		struct netfs_cache_resources cache_resources;
		void			*netfs_priv;
		loff_t			start;
		size_t			len;
		loff_t			i_size;
		const struct netfs_request_ops *netfs_ops;
		unsigned int		debug_id;
		...
	};

The above fields are the ones the netfs can use.  They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from.  The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data.  The value for this can be passed in
   to the helper functions or set during the request.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length.  These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table.  The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

	struct netfs_io_subrequest {
		struct netfs_io_request *rreq;
		loff_t			start;
		size_t			len;
		size_t			transferred;
		unsigned long		flags;
		unsigned short		debug_index;
		...
	};

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another.  The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice.  The
   network filesystem or cache should start the operation this far into the
   slice.  If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read.  There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
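
The interplay of ``start``, ``len``, ``transferred`` and the clear-tail flag
can be illustrated with a small userspace model.  The names below
(``struct model_subreq``, ``MODEL_SREQ_CLEAR_TAIL`` and the three helper
functions) are hypothetical, and the slice is backed by a plain byte buffer
rather than folios.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_SREQ_CLEAR_TAIL 0x1UL	/* stand-in for NETFS_SREQ_CLEAR_TAIL */

/* Simplified model of one slice of a read request. */
struct model_subreq {
	long long start;	/* file position of the slice */
	size_t len;		/* length of the slice */
	size_t transferred;	/* bytes read so far */
	unsigned long flags;
};

/* File position at which a first or reissued read should begin. */
static long long model_resume_pos(const struct model_subreq *s)
{
	return s->start + (long long)s->transferred;
}

/* Number of bytes still outstanding in the slice. */
static size_t model_remaining(const struct model_subreq *s)
{
	return s->len - s->transferred;
}

/*
 * If the clear-tail flag is set, zero the slice buffer from transferred to
 * len and mark the slice as fully handled.  Returns the bytes cleared.
 */
static size_t model_clear_tail(struct model_subreq *s, unsigned char *slice_buf)
{
	size_t n = model_remaining(s);

	if (!(s->flags & MODEL_SREQ_CLEAR_TAIL) || n == 0)
		return 0;
	memset(slice_buf + s->transferred, 0, n);
	s->transferred = s->len;
	return n;
}
```

A source that knows the rest of a slice lies beyond the end of the stored data
could set the flag after a short read, so the remainder is zero-filled instead
of being requested again.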


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

	struct netfs_request_ops {
		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
		void (*free_request)(struct netfs_io_request *rreq);
		int (*begin_cache_operation)(struct netfs_io_request *rreq);
		void (*expand_readahead)(struct netfs_io_request *rreq);
		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
		void (*issue_read)(struct netfs_io_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_io_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct folio **foliop, void **_fsdata);
		void (*done)(struct netfs_io_request *rreq);
	};

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure.  It is given
   the file for reference.

 * ``free_request()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up any state it has attached there.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise.  If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.  The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made.  If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure.  Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest.  The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   As the method is declared to return bool, this should return true on
   success and false on error.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred.  The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned.  It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid.  It should return true if it is still valid and false
   if not.  If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It may unlock and discard the folio it was given and set the caller's folio
   pointer to NULL.  It should return 0 if everything is now fine (``*foliop``
   left set) or if the op should be retried (``*foliop`` cleared), and any
   other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).
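
The ->expand_readahead() rule that ->len must grow by at least as much as
->start is reduced can be checked with a small userspace sketch.  It assumes a
hypothetical filesystem that wants requests rounded out to a fixed granule
size; ``struct model_request`` and ``model_expand_readahead()`` are
illustrative names, and the start position is assumed to be non-negative.

```c
#include <stddef.h>

/* Simplified stand-in for the start and len fields of a read request. */
struct model_request {
	long long start;
	size_t len;
};

/*
 * Round [start, start + len) outwards to granule boundaries.  The start only
 * ever moves down and the end only ever moves up, so the original range stays
 * covered.  granule must be non-zero.
 */
static void model_expand_readahead(struct model_request *req, size_t granule)
{
	long long g = (long long)granule;
	long long end = req->start + (long long)req->len;
	long long new_start = req->start - (req->start % g);
	long long new_end = ((end + g - 1) / g) * g;

	req->start = new_start;
	req->len = (size_t)(new_end - new_start);
}
```

Because the new length is computed from the rounded-up end, lowering the start
can never shrink the covered range, which is what the rule is there to
guarantee.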



Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request.  This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source.  This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.
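
Two steps of this procedure, slicing and reassessment, can be sketched as a
userspace model.  Everything here is hypothetical naming for illustration, and
the model only covers clamping slices to a maximum size (as an rsize limit
would).  One case is an assumption rather than something stated above: a short
read that made no progress is treated like a failure of its source.

```c
#include <stddef.h>

struct model_slice {
	long long start;
	size_t len;
};

/*
 * Cut [start, start + len) into consecutive slices of at most max_slice
 * bytes.  Each slice begins where the previous one ended.  Returns the
 * number of slices written to out (at most max_out).
 */
static size_t model_slice_request(long long start, size_t len, size_t max_slice,
				  struct model_slice *out, size_t max_out)
{
	size_t n = 0;

	if (max_slice == 0)
		return 0;
	while (len > 0 && n < max_out) {
		size_t part = len < max_slice ? len : max_slice;

		out[n].start = start;
		out[n].len = part;
		start += (long long)part;
		len -= part;
		n++;
	}
	return n;
}

enum model_source { MODEL_FROM_CACHE, MODEL_FROM_SERVER };

enum model_action {
	MODEL_DONE,
	MODEL_REISSUE_SAME_SOURCE,
	MODEL_REISSUE_TO_SERVER,
	MODEL_CLEAR_TAIL,
	MODEL_FAIL,
};

/* Decide what to do with a terminated slice, following the list above. */
static enum model_action model_assess(enum model_source source, int error,
				      size_t len, size_t transferred,
				      size_t progress, int clear_tail)
{
	int failed = error < 0;

	if (!failed && transferred >= len)
		return MODEL_DONE;
	if (!failed && clear_tail)
		return MODEL_CLEAR_TAIL;
	if (!failed && progress > 0)
		return MODEL_REISSUE_SAME_SOURCE;
	/* Hard failure, or (assumption) a short read that made no progress. */
	return source == MODEL_FROM_CACHE ? MODEL_REISSUE_TO_SERVER : MODEL_FAIL;
}
```

Here ``progress`` is the number of bytes added by the most recent attempt, which
is what lets the model distinguish a slow but advancing read from a stuck one.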


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::

	int fscache_begin_read_operation(struct netfs_io_request *rreq,
					 struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_io_request object contains a place for the cache to hang its
state::

	struct netfs_cache_resources {
		const struct netfs_cache_ops	*ops;
		void				*cache_priv;
		void				*cache_priv2;
	};

This contains an operations table pointer and two private pointers.  The
operation table looks like the following::

	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
					 loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
						     loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
			    loff_t start_pos,
			    struct iov_iter *iter,
			    bool seek_data,
			    netfs_io_terminated_t term_func,
			    void *term_func_priv);

		int (*prepare_write)(struct netfs_cache_resources *cres,
				     loff_t *_start, size_t *_len, loff_t i_size,
				     bool no_space_allocated_yet);

		int (*write)(struct netfs_cache_resources *cres,
			     loff_t start_pos,
			     struct iov_iter *iter,
			     netfs_io_terminated_t term_func,
			     void *term_func_priv);

		int (*query_occupancy)(struct netfs_cache_resources *cres,
				       loff_t start, size_t len, size_t granularity,
				       loff_t *_data_start, size_t *_data_len);
	};

With a termination handler function pointer::

	typedef void (*netfs_io_terminated_t)(void *priv,
					      ssize_t transferred_or_error,
					      bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction.  This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache.  The start file offset is given
   along with an iterator to read to, which gives the length also.  It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place.  This
   involves checking to see whether the cache has sufficient space to honour
   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O.  i_size holds the size of the object and
   is provided for reference.  no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.  The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache.  The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned.  The function passes back the start and length of the data,
   if any, available within that region.  Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.
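
The return convention of ->query_occupancy() can be illustrated with a toy
userspace cache that records which fixed-size granules hold data.  This is a
sketch under simplifying assumptions, not how any real cache backend works:
``struct model_cache`` and ``model_query_occupancy()`` are hypothetical, the
granule size comes from the toy cache rather than from a separate granularity
argument, and the queried start is assumed to be non-negative and
granule-aligned.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cache: present[i] says whether granule i holds data. */
struct model_cache {
	const bool *present;
	size_t nr_granules;
	size_t granule;		/* granule size in bytes */
};

/*
 * Find the first run of cached data inside [start, start + len).
 * Returns 0 and fills in *data_start and *data_len if data was found,
 * -ENODATA if the region holds no data, or -ENOBUFS if there is no cache.
 */
static int model_query_occupancy(const struct model_cache *cache,
				 long long start, size_t len,
				 long long *data_start, size_t *data_len)
{
	size_t g, first, last, i, j;

	if (!cache || !cache->present || cache->granule == 0)
		return -ENOBUFS;

	g = cache->granule;
	first = (size_t)start / g;
	last = ((size_t)start + len + g - 1) / g;	/* exclusive */
	if (last > cache->nr_granules)
		last = cache->nr_granules;

	/* Skip any hole at the front of the region. */
	for (i = first; i < last && !cache->present[i]; i++)
		;
	if (i >= last)
		return -ENODATA;

	/* Measure the run of data that follows the hole. */
	for (j = i; j < last && cache->present[j]; j++)
		;

	*data_start = (long long)(i * g);
	*data_len = (j - i) * g;
	return 0;
}
```

The hole-skipping loop corresponds to the note above that the returned data
may not begin at the start of the queried region.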


API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c