.. SPDX-License-Identifier: GPL-2.0

===========================
The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, by at least saving a trip through the scheduler, normally on
the order of a few microseconds, although performance benefits are workload
dependent. In the event that no wakeup source arrives during the polling
interval, or some other task on the runqueue is runnable, the scheduler is
invoked. Thus halt polling is especially useful on workloads with very short
wakeup periods, where the time spent halt polling is minimised and the time
savings of not invoking the scheduler are noticeable.

The generic halt polling code is implemented in:

        virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

        arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

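As a rough illustration of the behaviour described above, the following
self-contained C sketch models a single polling attempt against a simulated
clock. It is not the code in kvm_vcpu_block(): the sim_host struct, its fields
and the halt_poll() helper are hypothetical names introduced here for
illustration only.

```c
#include <stdint.h>

/* Outcome of one polling attempt in this simplified model. */
enum poll_result {
	POLL_WOKEN,     /* a wakeup condition arrived while polling */
	POLL_PREEMPTED, /* another task became runnable on this cpu */
	POLL_EXPIRED,   /* the polling interval elapsed without a wakeup */
};

/* Simulated host state; every field is illustrative, not kernel state. */
struct sim_host {
	uint64_t now_ns;        /* simulated clock */
	uint64_t wakeup_at_ns;  /* time at which a wakeup source arrives */
	uint64_t contend_at_ns; /* time at which another task becomes runnable */
	uint64_t step_ns;       /* clock advance per poll iteration */
};

/* Poll for at most poll_ns, stopping early on a wakeup or on contention. */
static enum poll_result halt_poll(struct sim_host *h, uint64_t poll_ns)
{
	uint64_t deadline = h->now_ns + poll_ns;

	while (h->now_ns < deadline) {
		if (h->now_ns >= h->wakeup_at_ns)
			return POLL_WOKEN;
		if (h->now_ns >= h->contend_at_ns)
			return POLL_PREEMPTED;
		/* Advance the simulated clock; a zero step would never finish. */
		h->now_ns += h->step_ns ? h->step_ns : 1;
	}
	return POLL_EXPIRED;
}
```

In this model a caller would re-enter the guest directly on POLL_WOKEN and
invoke the scheduler on POLL_PREEMPTED or POLL_EXPIRED.
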
Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

        kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

        kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

If a wakeup source is received within the halt polling interval while polling,
the interval is left unchanged. If no wakeup source is received during the
polling interval (and thus schedule is invoked), there are two possibilities:
either the polling interval and total block time[0] were both less than the
global max polling interval (see module params below), or the total block
time was greater than the global max polling interval.

In the event that both the polling interval and total block time were less than
the global max polling interval then the polling interval can be increased in
the hope that next time, during the longer polling interval, the wakeup source
will be received while the host is polling and the latency benefits will be
realised. The polling interval is grown in the function grow_halt_poll_ns(): it
is multiplied by the module parameter halt_poll_ns_grow, or set to
halt_poll_ns_grow_start when growing from zero.

In the event that the total block time was greater than the global max polling
interval then the host will never poll for long enough (limited by the global
max) to be woken during the polling interval, so the interval may as well be
shrunk in order to avoid pointless polling. The polling interval is shrunk in
the function shrink_halt_poll_ns() and is divided by the module parameter
halt_poll_ns_shrink, or set to 0 if halt_poll_ns_shrink == 0.

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval but will only really do a good job for wakeups
which come at an approximately constant rate; otherwise there will be constant
adjustment of the polling interval.

[0] total block time:
                      the time between when the halt polling function is
                      invoked and a wakeup source received (irrespective of
                      whether the scheduler is invoked within that function).

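The adjustment rules in this section can be summarised in a small,
self-contained C model. This is a sketch of the rules as described here rather
than the kernel implementation: struct halt_poll_params, grow_interval(),
shrink_interval() and adjust_interval() are hypothetical names. Treating a zero
growth factor as "do not grow" and clamping the grown value to the global max
are assumptions, the latter based on the description of halt_poll_ns as a
ceiling.

```c
#include <stdint.h>

/* Tunables mirroring the module parameters described in this document. */
struct halt_poll_params {
	uint64_t max_ns;        /* halt_poll_ns: global max polling interval */
	uint64_t grow;          /* halt_poll_ns_grow */
	uint64_t grow_start_ns; /* halt_poll_ns_grow_start */
	uint64_t shrink;        /* halt_poll_ns_shrink */
};

static uint64_t grow_interval(uint64_t cur, const struct halt_poll_params *p)
{
	uint64_t val;

	if (p->grow == 0)       /* assumption: a zero factor disables growth */
		return cur;
	val = cur * p->grow;
	if (val < p->grow_start_ns)
		val = p->grow_start_ns; /* growing from zero starts here */
	if (val > p->max_ns)
		val = p->max_ns;        /* assumption: clamp to the ceiling */
	return val;
}

static uint64_t shrink_interval(uint64_t cur, const struct halt_poll_params *p)
{
	return p->shrink ? cur / p->shrink : 0;
}

/*
 * Return the next polling interval given the current one, the total block
 * time, and whether the wakeup arrived while the host was still polling.
 */
static uint64_t adjust_interval(uint64_t poll_ns, uint64_t block_ns,
				int woken_while_polling,
				const struct halt_poll_params *p)
{
	if (woken_while_polling)
		return poll_ns; /* polling succeeded: leave unchanged */
	if (block_ns > p->max_ns)
		return shrink_interval(poll_ns, p);
	if (poll_ns < p->max_ns && block_ns < p->max_ns)
		return grow_interval(poll_ns, p);
	return poll_ns;
}
```

With a global max of 200000 ns (an example value), a growth factor of 2, a
start value of 10000 and a shrink divisor of 0, an interval of 0 grows to 10000
and then to 20000 after two short blocks without a wakeup during polling, and
drops back to 0 after a single block longer than 200000 ns.
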
Module Parameters
=================

The kvm module has four tunable module parameters to adjust the global max
polling interval as well as the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter       |   Description             |        Default Value    |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns           | The global max polling    | KVM_HALT_POLL_NS_DEFAULT|
|                       | interval which defines    | (per arch value)        |
|                       | the ceiling value of the  |                         |
|                       | polling interval for      |                         |
|                       | each vcpu.                |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow      | The value by which the    | 2                       |
|                       | halt polling interval is  |                         |
|                       | multiplied in the         |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow_start| The initial value to grow | 10000                   |
|                       | to from zero in the       |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink    | The value by which the    | 0                       |
|                       | halt polling interval is  |                         |
|                       | divided in the            |                         |
|                       | shrink_halt_poll_ns()     |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+

These module parameters can be set from the sysfs files in:

        /sys/module/kvm/parameters/

Note: these module parameters are system-wide values and cannot be tuned on a
      per vm basis.

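For illustration, a userspace program could read one of these parameters with
a helper along the following lines. This is a minimal sketch: read_kvm_param()
is a hypothetical helper, and it assumes the parameter file holds a single
unsigned decimal value.

```c
#include <stdio.h>

/*
 * Read one unsigned integer parameter from the file <dir>/<name>.
 * Returns 0 on success, -1 if the path is too long, the file cannot be
 * opened, or its contents do not parse as an unsigned integer.
 */
static int read_kvm_param(const char *dir, const char *name,
			  unsigned long long *out)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fscanf(f, "%llu", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_kvm_param("/sys/module/kvm/parameters", "halt_poll_ns", &val)
would then fetch the current global max polling interval on a host with the
kvm module loaded. Changing a value means writing to the same file, which
normally requires root privileges.
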
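Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter as a large
  value has the potential to drive the cpu usage to 100% on a machine which
  would be almost entirely idle otherwise. This is because even if a guest has
  wakeups during which very little work is done and which are quite far apart,
  if the period is shorter than the global max polling interval (halt_poll_ns)
  then the host will always poll for the entire block time and thus cpu
  utilisation will go to 100%.

- Halt polling essentially presents a trade-off between power usage and latency
  and the module parameters should be used to tune the balance between the two.
  Idle cpu time is essentially converted to host kernel time with the aim of
  decreasing latency when entering the guest.

- Halt polling will only be conducted by the host when no other tasks are
  runnable on that cpu, otherwise the polling will cease immediately and
  schedule will be invoked to allow that other task to run. Thus this doesn't
  allow a guest to cause a denial of service of the cpu.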
0131 
0132 - Halt polling essentially presents a trade off between power usage and latency and
0133   the module parameters should be used to tune the affinity for this. Idle cpu time is
0134   essentially converted to host kernel time with the aim of decreasing latency when
0135   entering the guest.
0136 
0137 - Halt polling will only be conducted by the host when no other tasks are runnable on
0138   that cpu, otherwise the polling will cease immediately and schedule will be invoked to
0139   allow that other task to run. Thus this doesn't allow a guest to denial of service the
0140   cpu.