======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.
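
The mechanics described above can be sketched in plain userspace C. This is
only an illustrative model, not the kernel's implementation: the names
model_bvec, model_iter, model_iter_bvec() and model_iter_advance() are
hypothetical stand-ins for struct bio_vec, struct bvec_iter, bio_iter_iovec()
and bvec_iter_advance(), and bi_sector and the page pointer are left out. The
point it shows is that advancing touches only the iterator, while the
"current" bvec is computed on the fly from the untouched array.

```c
#include <assert.h>

/* Hypothetical stand-in for struct bio_vec (page pointer omitted). */
struct model_bvec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical stand-in for struct bvec_iter (bi_sector omitted). */
struct model_iter {
	unsigned int bi_size;      /* bytes remaining in the range */
	unsigned int bi_idx;       /* index of the current bvec */
	unsigned int bi_bvec_done; /* bytes completed in the current bvec */
};

/*
 * Build the bvec the iterator currently "sees": the raw entry, shifted by
 * bi_bvec_done and clamped to bi_size. The array is only read.
 * Must not be called once bi_size has reached zero.
 */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter iter)
{
	struct model_bvec cur;
	unsigned int left = bv[iter.bi_idx].bv_len - iter.bi_bvec_done;

	cur.bv_offset = bv[iter.bi_idx].bv_offset + iter.bi_bvec_done;
	cur.bv_len = left < iter.bi_size ? left : iter.bi_size;
	return cur;
}

/* Advance by 'bytes'; only the iterator is modified, never the array. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);

	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;
	/* Step over every bvec that is now fully consumed. */
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len;
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;
}
```

For example, with a 4096-byte bvec followed by a 2048-byte one and bi_size
set to 6144, advancing by 1024 leaves bi_idx at 0 and bi_bvec_done at 1024,
and model_iter_bvec() then reports a 3072-byte bvec whose offset is 1024
bytes past the raw bv_offset.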

As of 5.12 bvec segments with zero bv_len are not supported.

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken, though, to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they do - submitting partially completed bios is
   perfectly fine.
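
The two-bio walk mentioned in the first point above can be sketched in
userspace C. This is a simplified model loosely following the shape of a
bio_copy_data()-style loop, not the kernel code itself: model_bvec,
model_iter, model_iter_len(), model_iter_advance() and model_copy_data() are
hypothetical names, and a plain pointer stands in for the page and offset
pair. Each side keeps its own iterator, so differently sized segments need no
special casing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct bio_vec: a pointer replaces page+offset. */
struct model_bvec {
	unsigned char *bv_base;
	unsigned int bv_len;
};

/* Hypothetical stand-in for struct bvec_iter (bi_sector omitted). */
struct model_iter {
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

/* Bytes available in the current segment, clamped to the iterator's range. */
static unsigned int model_iter_len(const struct model_bvec *bv,
				   struct model_iter it)
{
	unsigned int left = bv[it.bi_idx].bv_len - it.bi_bvec_done;

	return left < it.bi_size ? left : it.bi_size;
}

/* Advance by 'bytes'; only the iterator changes. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->bi_size);

	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/*
 * Copy until either range is exhausted and return the bytes copied.
 * The iterators are passed by value, so the caller's copies are untouched.
 */
static unsigned int model_copy_data(const struct model_bvec *dst,
				    struct model_iter dst_it,
				    const struct model_bvec *src,
				    struct model_iter src_it)
{
	unsigned int copied = 0;

	while (src_it.bi_size && dst_it.bi_size) {
		unsigned int s = model_iter_len(src, src_it);
		unsigned int d = model_iter_len(dst, dst_it);
		unsigned int n = s < d ? s : d;

		memcpy(dst[dst_it.bi_idx].bv_base + dst_it.bi_bvec_done,
		       src[src_it.bi_idx].bv_base + src_it.bi_bvec_done, n);
		model_iter_advance(src, &src_it, n);
		model_iter_advance(dst, &dst_it, n);
		copied += n;
	}
	return copied;
}
```

Because the loop only ever asks each iterator "how much is left in your
current segment?", a source laid out as 5 + 3 bytes copies cleanly into a
destination laid out as 2 + 4 + 2 bytes.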

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.
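
Why bi_vcnt stops being meaningful after a split can be illustrated with a
small userspace model. The names model_bvec, model_iter, model_iter_split()
and model_iter_segments() are hypothetical, and the sketch assumes a split is
"copy the iterator, clamp its bi_size, advance the original", which is a
simplification of what the block layer does. Both halves keep pointing at the
same array, and each finds its own end purely from bi_size.

```c
#include <assert.h>

/* Hypothetical stand-in for struct bio_vec (page pointer omitted). */
struct model_bvec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical stand-in for struct bvec_iter (bi_sector omitted). */
struct model_iter {
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

/* Advance by 'bytes'; only the iterator changes. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->bi_size);

	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/*
 * Split off the first 'bytes' of the range. The returned iterator covers
 * the front part; *it is advanced to cover the remainder. Neither copies
 * nor modifies the shared array.
 */
static struct model_iter model_iter_split(const struct model_bvec *bv,
					  struct model_iter *it,
					  unsigned int bytes)
{
	struct model_iter front = *it;

	assert(bytes > 0 && bytes < it->bi_size);
	front.bi_size = bytes;
	model_iter_advance(bv, it, bytes);
	return front;
}

/* Count the segments a range covers by walking until bi_size hits zero. */
static unsigned int model_iter_segments(const struct model_bvec *bv,
					struct model_iter it)
{
	unsigned int n = 0;

	while (it.bi_size) {
		unsigned int left = bv[it.bi_idx].bv_len - it.bi_bvec_done;
		unsigned int len = left < it.bi_size ? left : it.bi_size;

		model_iter_advance(bv, &it, len);
		n++;
	}
	return n;
}
```

Splitting a range of three 4096-byte bvecs at byte 6144 yields two iterators
that each cover two segments, with the middle bvec shared between them. The
array's entry count (3) describes neither half, which is the situation the
bi_vcnt warning above is about.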

Usage of helpers:
=================

* The following helpers, whose names have the suffix `_all`, can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

        bio_for_each_segment_all()
        bio_for_each_bvec_all()
        bio_first_bvec_all()
        bio_first_page_all()
        bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

        bio_for_each_segment()
        bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

        bio_for_each_bvec()
        bio_for_each_bvec_all()
        rq_for_each_bvec()
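
The difference between the single-page and multi-page views can be sketched
in userspace C. This is a model under stated assumptions, not kernel code:
MODEL_PAGE_SIZE is assumed to be 4096, bv_page is a plain page index standing
in for a struct page pointer, and model_single_page() and
model_for_each_segment() are hypothetical names. A multi-page helper would
hand the caller the whole bvec at once; a single-page helper hands out the
pieces computed below.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical stand-in for struct bio_vec. */
struct model_bvec {
	unsigned int bv_page;	/* page index; stands in for struct page * */
	unsigned int bv_offset;	/* offset from the start of bv_page */
	unsigned int bv_len;
};

/*
 * Return the single-page segment that starts 'done' bytes into the
 * multi-page bvec 'mp'. The segment never crosses a page boundary.
 */
static struct model_bvec model_single_page(struct model_bvec mp,
					   unsigned int done)
{
	struct model_bvec sp;
	unsigned int off = mp.bv_offset + done;
	unsigned int left = mp.bv_len - done;
	unsigned int room = MODEL_PAGE_SIZE - off % MODEL_PAGE_SIZE;

	sp.bv_page = mp.bv_page + off / MODEL_PAGE_SIZE;
	sp.bv_offset = off % MODEL_PAGE_SIZE;
	sp.bv_len = left < room ? left : room;
	return sp;
}

/*
 * Break one multi-page bvec into single-page segments, storing at most
 * 'max' of them in 'out'. Returns the number of segments stored.
 */
static unsigned int model_for_each_segment(struct model_bvec mp,
					   struct model_bvec *out,
					   unsigned int max)
{
	unsigned int done = 0, n = 0;

	while (done < mp.bv_len && n < max) {
		out[n] = model_single_page(mp, done);
		done += out[n].bv_len;
		n++;
	}
	return n;
}
```

A multi-page bvec starting 512 bytes into page 10 and spanning 8192 bytes is
one item in the multi-page view, but three in the single-page view: 3584
bytes on page 10, 4096 bytes on page 11 and 512 bytes on page 12.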