===================
Block IO Controller
===================

Overview
========
The cgroup subsystem "blkio" implements the block IO controller. There seems
to be a need for various kinds of IO control policies (such as proportional
BW and max BW) both at leaf nodes and at intermediate nodes in a storage
hierarchy. The plan is to use the same cgroup-based management interface for
the blkio controller and to switch IO policies in the background based on
user options.

One IO control policy is the throttling policy, which can be used to
specify upper IO rate limits on devices. This policy is implemented in the
generic block layer and can be used on leaf nodes as well as on higher-level
logical devices such as device mapper.

HOWTO
=====

Throttling/Upper Limit policy
-----------------------------
Enable Block IO controller::

        CONFIG_BLK_CGROUP=y

Enable throttling in block layer::

        CONFIG_BLK_DEV_THROTTLING=y

Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::

        mount -t cgroup -o blkio none /sys/fs/cgroup/blkio

Specify a bandwidth rate on a particular device for the root group. The format
of the policy is "<major>:<minor>  <bytes_per_second>"::

        echo "8:16  1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device

This puts a limit of 1MB/second on reads issued by the root group
on the device with major/minor number 8:16.
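
The device number can be derived from the device node. The following is a
minimal sketch, assuming GNU coreutils stat, whose '%t:%T' format prints the
major and minor numbers in hexadecimal. The helper name hex_to_dec_devnum is
hypothetical and not part of any tool::

        # Convert a hexadecimal "major:minor" pair into the decimal form
        # used in the blkio rule files.
        hex_to_dec_devnum() {
                printf '%d:%d\n' "0x${1%%:*}" "0x${1##*:}"
        }

        # Hexadecimal 8:10 corresponds to decimal 8:16.
        hex_to_dec_devnum 8:10

        # On a real system (the device path is only an example):
        #   hex_to_dec_devnum "$(stat -c '%t:%T' /dev/sdb)"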

Run dd to read a file and check whether the rate is throttled to 1MB/s::

        # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
        1024+0 records in
        1024+0 records out
        4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
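
The reported time is consistent with the limit: 1024 blocks of 4096 bytes at
1048576 bytes per second should take about four seconds. As a quick check of
the arithmetic (the variable names are illustrative only)::

        bs=4096          # dd block size in bytes (bs=4K)
        count=1024       # number of blocks read
        limit=1048576    # value written to blkio.throttle.read_bps_device
        total=$((bs * count))
        echo "total bytes: $total"
        echo "expected seconds: $((total / limit))"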

Limits for writes can be set using the blkio.throttle.write_bps_device file.
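
Because the read and write files take rules in the same format, a small
wrapper can cover both. This is a sketch only: set_throttle is a hypothetical
helper, and the cgroup directory passed to it is an assumption about where
the blkio hierarchy is mounted::

        # $1 = cgroup directory, $2 = throttle file name,
        # $3 = major:minor, $4 = rate value
        set_throttle() {
                printf '%s %s\n' "$3" "$4" > "$1/$2"
        }

        # Example (needs root and a mounted blkio hierarchy):
        #   set_throttle /sys/fs/cgroup/blkio \
        #           blkio.throttle.write_bps_device 8:16 1048576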

Hierarchical Cgroups
====================

Throttling implements hierarchy support; however, it is enabled if and only
if "sane_behavior" is enabled from the cgroup side, which is currently a
development option and not publicly available.

Suppose a hierarchy is created as follows::

                        root
                        /  \
                     test1 test2
                        |
                     test3

Throttling with "sane_behavior" handles the hierarchy correctly: all limits
apply to the whole subtree, while all statistics are local to the IOs
directly generated by tasks in that cgroup.

Without "sane_behavior" enabled from the cgroup side, throttling
effectively treats all groups as being at the same level, as if the
hierarchy looked like the following::

                                pivot
                             /  /   \  \
                        root  test1 test2  test3
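
The example hierarchy with test3 nested under test1 can be created with
mkdir, since each directory in a mounted cgroup hierarchy is a cgroup. This
is only a sketch: CG is an assumed variable naming the blkio mount point, and
it defaults to a scratch directory so the commands can be tried without
root::

        CG=${CG:-$(mktemp -d)}
        # test3 is a child of test1; test2 is a sibling of test1.
        mkdir -p "$CG/test1/test3" "$CG/test2"
        find "$CG" -mindepth 1 -type d | sort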

Various user visible config options
===================================

  CONFIG_BLK_CGROUP
          Block IO controller.

  CONFIG_BFQ_CGROUP_DEBUG
          Debug help. Right now some additional stats files show up in the
          cgroup if this option is enabled.

  CONFIG_BLK_DEV_THROTTLING
          Enable block device throttling support in block layer.

Details of cgroup files
=======================

Proportional weight policy files
--------------------------------

  blkio.bfq.weight
          Specifies per cgroup weight. This is the default weight of the group
          on all the devices unless overridden by a per device rule
          (see `blkio.bfq.weight_device` below).

          Currently allowed range of weights is from 1 to 1000. For more details,
          see Documentation/block/bfq-iosched.rst.

  blkio.bfq.weight_device
          Specifies per cgroup per device weights, overriding the default group
          weight. For more details, see Documentation/block/bfq-iosched.rst.

          The format is as follows::

            # echo dev_maj:dev_minor weight > blkio.bfq.weight_device

          Configure weight=300 on /dev/sdb (8:16) in this cgroup::

            # echo 8:16 300 > blkio.bfq.weight_device
            # cat blkio.bfq.weight_device
            dev     weight
            8:16    300

          Configure weight=500 on /dev/sda (8:0) in this cgroup::

            # echo 8:0 500 > blkio.bfq.weight_device
            # cat blkio.bfq.weight_device
            dev     weight
            8:0     500
            8:16    300

          Remove specific weight for /dev/sda in this cgroup::

            # echo 8:0 0 > blkio.bfq.weight_device
            # cat blkio.bfq.weight_device
            dev     weight
            8:16    300

  blkio.time
          Disk time allocated to cgroup per device in milliseconds. First
          two fields specify the major and minor number of the device and
          third field specifies the disk time allocated to group in
          milliseconds.

  blkio.sectors
          Number of sectors transferred to/from disk by the group. First
          two fields specify the major and minor number of the device and
          third field specifies the number of sectors transferred by the
          group to/from the device.

  blkio.io_service_bytes
          Number of bytes transferred to/from the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. First two fields specify the major and minor number of the
          device, third field specifies the operation type and the fourth field
          specifies the number of bytes.

  blkio.io_serviced
          Number of IOs (bio) issued to the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. First two fields specify the major and minor number of the
          device, third field specifies the operation type and the fourth field
          specifies the number of IOs.

  blkio.io_service_time
          Total amount of time between request dispatch and request completion
          for the IOs done by this cgroup. This is in nanoseconds to make it
          meaningful for flash devices too. For devices with queue depth of 1,
          this time represents the actual service time. When queue_depth > 1,
          that is no longer true as requests may be served out of order. This
          may cause the service time for a given IO to include the service time
          of multiple IOs when served out of order which may result in total
          io_service_time > actual time elapsed. This time is further divided by
          the type of operation - read or write, sync or async. First two fields
          specify the major and minor number of the device, third field
          specifies the operation type and the fourth field specifies the
          io_service_time in ns.

  blkio.io_wait_time
          Total amount of time the IOs for this cgroup spent waiting in the
          scheduler queues for service. This can be greater than the total time
          elapsed since it is cumulative io_wait_time for all IOs. It is not a
          measure of total time the cgroup spent waiting but rather a measure of
          the wait_time for its individual IOs. For devices with queue_depth > 1
          this metric does not include the time spent waiting for service once
          the IO is dispatched to the device but till it actually gets serviced
          (there might be a time lag here due to re-ordering of requests by the
          device). This is in nanoseconds to make it meaningful for flash
          devices too. This time is further divided by the type of operation -
          read or write, sync or async. First two fields specify the major and
          minor number of the device, third field specifies the operation type
          and the fourth field specifies the io_wait_time in ns.

  blkio.io_merged
          Total number of bios/requests merged into requests belonging to this
          cgroup. This is further divided by the type of operation - read or
          write, sync or async.

  blkio.io_queued
          Total number of requests queued up at any given instant for this
          cgroup. This is further divided by the type of operation - read or
          write, sync or async.

  blkio.avg_queue_size
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          The average queue size for this cgroup over the entire time of this
          cgroup's existence. Queue size samples are taken each time one of the
          queues of this cgroup gets a timeslice.

  blkio.group_wait_time
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          This is the amount of time the cgroup had to wait since it became busy
          (i.e., went from 0 to 1 request queued) to get a timeslice for one of
          its queues. This is different from the io_wait_time which is the
          cumulative total of the amount of time spent by each IO in that cgroup
          waiting in the scheduler queue. This is in nanoseconds. If this is
          read when the cgroup is in a waiting (for timeslice) state, the stat
          will only report the group_wait_time accumulated till the last time it
          got a timeslice and will not include the current delta.

  blkio.empty_time
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          This is the amount of time a cgroup spends without any pending
          requests when not being served, i.e., it does not include any time
          spent idling for one of the queues of the cgroup. This is in
          nanoseconds. If this is read when the cgroup is in an empty state,
          the stat will only report the empty_time accumulated till the last
          time it had a pending request and will not include the current delta.

  blkio.idle_time
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          This is the amount of time spent by the IO scheduler idling for a
          given cgroup in anticipation of a better request than the existing ones
          from other queues/cgroups. This is in nanoseconds. If this is read
          when the cgroup is in an idling state, the stat will only report the
          idle_time accumulated till the last idle period and will not include
          the current delta.

  blkio.dequeue
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
          gives statistics about how many times a group was dequeued
          from the service tree of the device. First two fields specify the major
          and minor number of the device and third field specifies the number
          of times a group was dequeued from a particular device.

  blkio.*_recursive
          Recursive version of various stats. These files show the
          same information as their non-recursive counterparts but
          include stats from all the descendant cgroups.
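
The per-operation statistics files described above, such as
blkio.io_service_bytes, can be post-processed with standard text tools. The
following is a minimal sketch under stated assumptions: the sample rows are
invented and only mimic the documented layout (device, operation type,
value), awk sees "major:minor" as one whitespace-separated field, and
sum_rw_bytes is a hypothetical helper name. Rows for other operation types
are ignored by the filter::

        # Invented sample data in the documented layout.
        sample='8:16 Read 4194304
        8:16 Write 1048576
        8:0 Read 8192
        8:0 Write 0'

        # Sum the Read and Write values per device, reading rows from stdin.
        sum_rw_bytes() {
                awk '$2 == "Read" || $2 == "Write" { bytes[$1] += $3 }
                     END { for (dev in bytes) print dev, bytes[dev] }' | sort
        }

        printf '%s\n' "$sample" | sum_rw_bytes

        # Against a real cgroup (the path is an assumption):
        #   sum_rw_bytes < /sys/fs/cgroup/blkio/blkio.io_service_bytes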

Throttling/Upper limit policy files
-----------------------------------
  blkio.throttle.read_bps_device
          Specifies upper limit on READ rate from the device. IO rate is
          specified in bytes per second. Rules are per device. The format
          is as follows::

            echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device

  blkio.throttle.write_bps_device
          Specifies upper limit on WRITE rate to the device. IO rate is
          specified in bytes per second. Rules are per device. The format
          is as follows::

            echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device

  blkio.throttle.read_iops_device
          Specifies upper limit on READ rate from the device. IO rate is
          specified in IO per second. Rules are per device. The format
          is as follows::

            echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device

  blkio.throttle.write_iops_device
          Specifies upper limit on WRITE rate to the device. IO rate is
          specified in IO per second. Rules are per device. The format
          is as follows::

            echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device

          Note: If both BW and IOPS rules are specified for a device, then IO is
          subjected to both the constraints.

  blkio.throttle.io_serviced
          Number of IOs (bio) issued to the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. First two fields specify the major and minor number of the
          device, third field specifies the operation type and the fourth field
          specifies the number of IOs.

  blkio.throttle.io_service_bytes
          Number of bytes transferred to/from the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. First two fields specify the major and minor number of the
          device, third field specifies the operation type and the fourth field
          specifies the number of bytes.
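
As noted for the throttle rule files, IO on a device with both a BW rule and
an IOPS rule is subject to both constraints, so the tighter one decides the
achievable rate. A rough illustration of the arithmetic, assuming every IO
has the same size (the values and variable names are examples only)::

        io_size=4096         # assumed size of each IO in bytes
        bps_limit=1048576    # rule in blkio.throttle.read_bps_device
        iops_limit=100       # rule in blkio.throttle.read_iops_device

        # Byte rate permitted by the IOPS rule alone.
        by_iops=$((iops_limit * io_size))

        # The smaller of the two byte rates is the one that applies.
        if [ "$by_iops" -lt "$bps_limit" ]; then
                effective=$by_iops
        else
                effective=$bps_limit
        fi
        echo "effective bytes per second: $effective"

With these example values the IOPS rule is the tighter one, because 100 IOs
of 4096 bytes amount to 409600 bytes per second.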

Common files among various policies
-----------------------------------
  blkio.reset_stats
          Writing an integer to this file resets all the stats for that
          cgroup.