==========================================
Explicit volatile write back cache control
==========================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches. That means the devices signal I/O completion to the
operating system before the data has actually reached non-volatile storage.
This behavior speeds up many workloads, but it means the operating system
needs to force data out to non-volatile storage when it performs a data
integrity operation such as fsync, sync or an unmount.

The Linux block layer provides two simple mechanisms that let filesystems
control the caching behavior of the storage device. These mechanisms are
a forced cache flush and the Force Unit Access (FUA) flag for requests.


Explicit cache flushes
----------------------

The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started. This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
set on an otherwise empty bio structure, which causes only an explicit cache
flush without any dependent I/O. It is recommended to use
the blkdev_issue_flush() helper for a pure cache flush.


Force Unit Access
-----------------

The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.


Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
worry about whether the underlying devices need any explicit cache flushing
or how Force Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.


Implementation details for bio-based block drivers
--------------------------------------------------

These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface. For remapping drivers the REQ_FUA
bit needs to be propagated to the underlying devices, and a global flush needs
to be implemented for bios with the REQ_PREFLUSH bit set. For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work. Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.


Implementation details for request_fn-based block drivers
---------------------------------------------------------

For devices that do not support volatile write caches no driver support is
required: the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload. For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing::

    blk_queue_write_cache(sdkp->disk->queue, true, false);

and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
the block layer automatically turns REQ_PREFLUSH requests with a payload into
a sequence of an empty REQ_OP_FLUSH request followed by the actual write.
For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using::

    blk_queue_write_cache(sdkp->disk->queue, true, true);

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn. If the FUA bit is not natively supported the block
layer turns it into an empty REQ_OP_FLUSH request after the actual write.