0001 ==================================================
0002 Runtime Power Management Framework for I/O Devices
0003 ==================================================
0004
0005 (C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
0006
0007 (C) 2010 Alan Stern <stern@rowland.harvard.edu>
0008
0009 (C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
0010
0011 1. Introduction
0012 ===============
0013
0014 Support for runtime power management (runtime PM) of I/O devices is provided
0015 at the power management core (PM core) level by means of:
0016
0017 * The power management workqueue pm_wq in which bus types and device drivers can
0018 put their PM-related work items. It is strongly recommended that pm_wq be
0019 used for queuing all work items related to runtime PM, because this allows
0020 them to be synchronized with system-wide power transitions (suspend to RAM,
0021 hibernation and resume from system sleep states). pm_wq is declared in
0022 include/linux/pm_runtime.h and defined in kernel/power/main.c.
0023
0024 * A number of runtime PM fields in the 'power' member of 'struct device' (which
0025 is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
0026 be used for synchronizing runtime PM operations with one another.
0027
0028 * Three device runtime PM callbacks in 'struct dev_pm_ops' (defined in
0029 include/linux/pm.h).
0030
0031 * A set of helper functions defined in drivers/base/power/runtime.c that can be
0032 used for carrying out runtime PM operations in such a way that the
0033 synchronization between them is taken care of by the PM core. Bus types and
0034 device drivers are encouraged to use these functions.
0035
0036 The runtime PM callbacks present in 'struct dev_pm_ops', the device runtime PM
0037 fields of 'struct dev_pm_info' and the core helper functions provided for
0038 runtime PM are described below.
0039
0040 2. Device Runtime PM Callbacks
0041 ==============================
0042
0043 There are three device runtime PM callbacks defined in 'struct dev_pm_ops'::
0044
  struct dev_pm_ops {
      ...
      int (*runtime_suspend)(struct device *dev);
      int (*runtime_resume)(struct device *dev);
      int (*runtime_idle)(struct device *dev);
      ...
  };
0052
The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
are executed by the PM core for the device's subsystem, which may be any of
the following:
0056
0057 1. PM domain of the device, if the device's PM domain object, dev->pm_domain,
0058 is present.
0059
0060 2. Device type of the device, if both dev->type and dev->type->pm are present.
0061
0062 3. Device class of the device, if both dev->class and dev->class->pm are
0063 present.
0064
0065 4. Bus type of the device, if both dev->bus and dev->bus->pm are present.
0066
0067 If the subsystem chosen by applying the above rules doesn't provide the relevant
0068 callback, the PM core will invoke the corresponding driver callback stored in
0069 dev->driver->pm directly (if present).
0070
The PM core always checks which callback to use in the order given above, so the
priority order of callbacks from high to low is: PM domain, device type, class
and bus type. Only the first subsystem found in that order is consulted; a
lower-priority subsystem's callback is never used in place of a higher-priority
one's. The PM domain, bus type, device type and class callbacks are referred to
as subsystem-level callbacks in what follows.
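The selection rules above can be modelled in a few lines of userspace C. This is a minimal sketch, not kernel code: the structures below (model_device, model_pm_ops, model_pm_src) are hypothetical stand-ins that keep only the pointers the rules look at, and only ->runtime_suspend() is shown; the other callbacks would be looked up the same way.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real types. */
struct model_device;
typedef int (*model_cb)(struct model_device *dev);

struct model_pm_ops {
	model_cb runtime_suspend;
};

/* Models a device type, class or bus type with an optional ->pm pointer. */
struct model_pm_src {
	const struct model_pm_ops *pm;
};

struct model_device {
	const struct model_pm_ops *pm_domain;  /* models dev->pm_domain */
	const struct model_pm_src *type;
	const struct model_pm_src *class;
	const struct model_pm_src *bus;
	const struct model_pm_ops *driver_pm;  /* models dev->driver->pm */
};

/* Return the ops of the first subsystem found, in priority order. */
static const struct model_pm_ops *model_subsys_ops(const struct model_device *dev)
{
	if (dev->pm_domain)
		return dev->pm_domain;
	if (dev->type && dev->type->pm)
		return dev->type->pm;
	if (dev->class && dev->class->pm)
		return dev->class->pm;
	if (dev->bus && dev->bus->pm)
		return dev->bus->pm;
	return NULL;
}

/* Use the chosen subsystem's callback if present, else the driver's. */
static model_cb model_pick_suspend_cb(const struct model_device *dev)
{
	const struct model_pm_ops *ops = model_subsys_ops(dev);

	if (ops && ops->runtime_suspend)
		return ops->runtime_suspend;
	return dev->driver_pm ? dev->driver_pm->runtime_suspend : NULL;
}
```

Note that in this model a PM domain lacking ->runtime_suspend() does not hand over to the bus type; the driver callback is used instead, as the text describes.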
0076
0077 By default, the callbacks are always invoked in process context with interrupts
0078 enabled. However, the pm_runtime_irq_safe() helper function can be used to tell
0079 the PM core that it is safe to run the ->runtime_suspend(), ->runtime_resume()
0080 and ->runtime_idle() callbacks for the given device in atomic context with
0081 interrupts disabled. This implies that the callback routines in question must
0082 not block or sleep, but it also means that the synchronous helper functions
0083 listed at the end of Section 4 may be used for that device within an interrupt
0084 handler or generally in an atomic context.
0085
The subsystem-level suspend callback, if present, is **entirely responsible**
for handling the suspend of the device as appropriate, which may, but need not,
include executing the device driver's own ->runtime_suspend() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_suspend()
callback in a device driver as long as the subsystem-level suspend callback
knows what to do to handle the device).
0092
0093 * Once the subsystem-level suspend callback (or the driver suspend callback,
0094 if invoked directly) has completed successfully for the given device, the PM
0095 core regards the device as suspended, which need not mean that it has been
0096 put into a low power state. It is supposed to mean, however, that the
0097 device will not process data and will not communicate with the CPU(s) and
0098 RAM until the appropriate resume callback is executed for it. The runtime
0099 PM status of a device after successful execution of the suspend callback is
0100 'suspended'.
0101
0102 * If the suspend callback returns -EBUSY or -EAGAIN, the device's runtime PM
0103 status remains 'active', which means that the device _must_ be fully
0104 operational afterwards.
0105
* If the suspend callback returns an error code different from -EBUSY and
-EAGAIN, the PM core regards this as a fatal error and will refuse to run
the helper functions described in Section 4 for the device until its status
is directly set to either 'active' or 'suspended' (the PM core provides
special helper functions for this purpose).
0111
In particular, if the driver requires remote wakeup capability (i.e. a hardware
mechanism allowing the device to request a change of its power state, such as
PCI PME) for proper functioning and device_can_wakeup() returns 'false' for the
device, then ->runtime_suspend() should return -EBUSY. On the other hand, if
device_can_wakeup() returns 'true' for the device and the device is put into a
low-power state during the execution of the suspend callback, it is expected
that remote wakeup will be enabled for the device. Generally, remote wakeup
should be enabled for all input devices put into low-power states at run time.
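The remote wakeup rule above, as a driver's ->runtime_suspend() might apply it, can be sketched in userspace C. This is a hypothetical model, not a real driver: struct model_wakeup_dev and its fields are assumptions, and can_wakeup merely stands in for the result of device_can_wakeup().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; can_wakeup stands in for device_can_wakeup(). */
struct model_wakeup_dev {
	bool needs_remote_wakeup;  /* driver cannot work properly without it */
	bool can_wakeup;
	bool wakeup_enabled;
	bool low_power;
};

static int model_wakeup_runtime_suspend(struct model_wakeup_dev *d)
{
	/* Suspending would leave the device unable to request service. */
	if (d->needs_remote_wakeup && !d->can_wakeup)
		return -EBUSY;  /* the device stays 'active' */

	/* Arm remote wakeup before entering the low-power state. */
	if (d->can_wakeup)
		d->wakeup_enabled = true;
	d->low_power = true;
	return 0;
}
```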
0120
The subsystem-level resume callback, if present, is **entirely responsible** for
handling the resume of the device as appropriate, which may, but need not,
include executing the device driver's own ->runtime_resume() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_resume()
callback in a device driver as long as the subsystem-level resume callback knows
what to do to handle the device).
0127
0128 * Once the subsystem-level resume callback (or the driver resume callback, if
0129 invoked directly) has completed successfully, the PM core regards the device
0130 as fully operational, which means that the device _must_ be able to complete
0131 I/O operations as needed. The runtime PM status of the device is then
0132 'active'.
0133
* If the resume callback returns an error code, the PM core regards this as a
fatal error and will refuse to run the helper functions described in Section
4 for the device until its status is directly set to either 'active' or
'suspended' (by means of special helper functions provided by the PM core
for this purpose).
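How the PM core interprets the return values of the suspend and resume callbacks can be summarised in a small userspace model. The names below (struct model_rpm, model_after_suspend(), model_after_resume(), model_set_status()) are hypothetical, and leaving the status 'active' after a fatal suspend error is an assumption of this sketch, since the text only says that the helpers stop working.

```c
#include <errno.h>
#include <stdbool.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct model_rpm {
	enum model_status status;
	int runtime_error;  /* nonzero: helper functions refuse to run */
};

/* Called with the return value of a suspend callback (device was 'active'). */
static void model_after_suspend(struct model_rpm *rpm, int ret)
{
	if (ret == 0) {
		rpm->status = MODEL_SUSPENDED;
		return;
	}
	/* On any failure the device is left 'active' in this sketch. */
	if (ret != -EBUSY && ret != -EAGAIN)
		rpm->runtime_error = ret;  /* fatal error */
}

/* Called with the return value of a resume callback. */
static void model_after_resume(struct model_rpm *rpm, int ret)
{
	if (ret == 0)
		rpm->status = MODEL_ACTIVE;
	else
		rpm->runtime_error = ret;  /* any resume error is fatal */
}

static bool model_helpers_allowed(const struct model_rpm *rpm)
{
	return rpm->runtime_error == 0;
}

/* Models setting the status directly, e.g. via pm_runtime_set_active(). */
static void model_set_status(struct model_rpm *rpm, enum model_status st)
{
	rpm->runtime_error = 0;
	rpm->status = st;
}
```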
0139
0140 The idle callback (a subsystem-level one, if present, or the driver one) is
0141 executed by the PM core whenever the device appears to be idle, which is
0142 indicated to the PM core by two counters, the device's usage counter and the
0143 counter of 'active' children of the device.
0144
* If either of these counters is decreased using a helper function provided by
the PM core and becomes equal to zero, the other counter is checked. If that
counter is also equal to zero, the PM core executes the idle callback with
the device as its argument.
0149
The action performed by the idle callback is totally dependent on the subsystem
(or driver) in question, but the expected and recommended action is to check
if the device can be suspended (i.e. if all of the conditions necessary for
suspending the device are satisfied) and to queue up a suspend request for the
device in that case. If there is no idle callback, or if the callback returns
0, then the PM core will attempt to carry out a runtime suspend of the device,
also respecting devices configured for autosuspend. In essence this means a
call to pm_runtime_autosuspend() (note that in this case drivers need to update
the device's last busy mark with pm_runtime_mark_last_busy() to control the
delay). To prevent this (for example, if the callback routine has started a
delayed suspend), the routine must return a non-zero value. Negative error
return codes are ignored by the PM core.
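The interplay of the two counters and the idle callback described above can be sketched as a userspace model. Everything here is hypothetical (struct model_idle_dev, model_put(), model_child_suspended()); the autosuspend fallback is only counted rather than performed, and the 'power.ignore_children' flag is left out for brevity.

```c
#include <stddef.h>

struct model_idle_dev {
	int usage_count;
	int child_count;  /* 'active' children */
	int (*runtime_idle)(struct model_idle_dev *dev);  /* may be NULL */
	int idle_calls;         /* times the idle callback ran */
	int autosuspend_calls;  /* times the autosuspend fallback was taken */
};

/* Both counters are zero: run the idle callback, if any. */
static void model_run_idle(struct model_idle_dev *dev)
{
	int ret = 0;

	if (dev->runtime_idle) {
		dev->idle_calls++;
		ret = dev->runtime_idle(dev);
	}
	/* No callback or a 0 return: fall back to an autosuspend attempt.
	 * Any nonzero return (negative error codes included) stops here. */
	if (ret == 0)
		dev->autosuspend_calls++;
}

/* Drop a usage reference; check the other counter if this one hits 0. */
static void model_put(struct model_idle_dev *dev)
{
	if (--dev->usage_count == 0 && dev->child_count == 0)
		model_run_idle(dev);
}

/* An 'active' child has suspended: the same check the other way round. */
static void model_child_suspended(struct model_idle_dev *dev)
{
	if (--dev->child_count == 0 && dev->usage_count == 0)
		model_run_idle(dev);
}
```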
0162
0163 The helper functions provided by the PM core, described in Section 4, guarantee
0164 that the following constraints are met with respect to runtime PM callbacks for
0165 one device:
0166
0167 (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
0168 ->runtime_suspend() in parallel with ->runtime_resume() or with another
0169 instance of ->runtime_suspend() for the same device) with the exception that
0170 ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
0171 ->runtime_idle() (although ->runtime_idle() will not be started while any
0172 of the other callbacks is being executed for the same device).
0173
(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
devices (i.e. the PM core will only execute ->runtime_idle() or
->runtime_suspend() for devices whose runtime PM status is 'active').

(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
whose usage counter is equal to zero _and_ whose counter of 'active' children
is either equal to zero or ignored because its 'power.ignore_children' flag is
set.

(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
PM core will only execute ->runtime_resume() for devices whose runtime PM
status is 'suspended').
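Constraints (2) to (4) amount to simple predicates on the runtime PM state. The following sketch states them in C; the structure and function names are hypothetical, and constraint (1), mutual exclusion, is not modelled because it depends on the PM core's locking.

```c
#include <stdbool.h>

enum model_pm_status { MPM_ACTIVE, MPM_SUSPENDED };

struct model_pm_state {
	enum model_pm_status status;
	int usage_count;
	int child_count;  /* 'active' children */
	bool ignore_children;
};

/* Constraints (2) and (3): ->runtime_idle() and ->runtime_suspend(). */
static bool model_may_idle_or_suspend(const struct model_pm_state *s)
{
	return s->status == MPM_ACTIVE && s->usage_count == 0 &&
	       (s->child_count == 0 || s->ignore_children);
}

/* Constraint (4): ->runtime_resume(). */
static bool model_may_resume(const struct model_pm_state *s)
{
	return s->status == MPM_SUSPENDED;
}
```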
0187
0188 Additionally, the helper functions provided by the PM core obey the following
0189 rules:
0190
0191 * If ->runtime_suspend() is about to be executed or there's a pending request
0192 to execute it, ->runtime_idle() will not be executed for the same device.
0193
0194 * A request to execute or to schedule the execution of ->runtime_suspend()
0195 will cancel any pending requests to execute ->runtime_idle() for the same
0196 device.
0197
0198 * If ->runtime_resume() is about to be executed or there's a pending request
0199 to execute it, the other callbacks will not be executed for the same device.
0200
0201 * A request to execute ->runtime_resume() will cancel any pending or
0202 scheduled requests to execute the other callbacks for the same device,
0203 except for scheduled autosuspends.
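The request-handling rules above can be modelled as a single pending-request slot. This is a simplified, hypothetical sketch (struct model_reqs and the model_req_*() functions are not kernel interfaces): it tracks only a scheduled autosuspend among the timer-based requests and does not model a callback that is "about to be executed".

```c
#include <stdbool.h>

enum model_req { MREQ_NONE, MREQ_IDLE, MREQ_SUSPEND, MREQ_RESUME };

struct model_reqs {
	enum model_req pending;      /* at most one pending request */
	bool autosuspend_scheduled;  /* timer armed for an autosuspend */
};

/* Idle is not requested while a suspend or resume is pending. */
static bool model_req_idle(struct model_reqs *q)
{
	if (q->pending == MREQ_SUSPEND || q->pending == MREQ_RESUME)
		return false;
	q->pending = MREQ_IDLE;
	return true;
}

/* A suspend request cancels a pending idle request, but is not
 * queued while a resume is pending. */
static bool model_req_suspend(struct model_reqs *q)
{
	if (q->pending == MREQ_RESUME)
		return false;
	q->pending = MREQ_SUSPEND;
	return true;
}

/* A resume request cancels pending idle and suspend requests but
 * leaves a scheduled autosuspend alone. */
static void model_req_resume(struct model_reqs *q)
{
	q->pending = MREQ_RESUME;
}
```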
0204
0205 3. Runtime PM Device Fields
0206 ===========================
0207
0208 The following device runtime PM fields are present in 'struct dev_pm_info', as
0209 defined in include/linux/pm.h:
0210
0211 `struct timer_list suspend_timer;`
0212 - timer used for scheduling (delayed) suspend and autosuspend requests
0213
0214 `unsigned long timer_expires;`
0215 - timer expiration time, in jiffies (if this is different from zero, the
0216 timer is running and will expire at that time, otherwise the timer is not
0217 running)
0218
0219 `struct work_struct work;`
0220 - work structure used for queuing up requests (i.e. work items in pm_wq)
0221
0222 `wait_queue_head_t wait_queue;`
0223 - wait queue used if any of the helper functions needs to wait for another
0224 one to complete
0225
0226 `spinlock_t lock;`
0227 - lock used for synchronization
0228
0229 `atomic_t usage_count;`
0230 - the usage counter of the device
0231
0232 `atomic_t child_count;`
0233 - the count of 'active' children of the device
0234
0235 `unsigned int ignore_children;`
0236 - if set, the value of child_count is ignored (but still updated)
0237
0238 `unsigned int disable_depth;`
0239 - used for disabling the helper functions (they work normally if this is
0240 equal to zero); the initial value of it is 1 (i.e. runtime PM is
0241 initially disabled for all devices)
0242
`int runtime_error;`
- if set, there was a fatal error (one of the callbacks returned an error code
as described in Section 2), so the helper functions will not work until
this flag is cleared; this is the error code returned by the failing
callback
0248
0249 `unsigned int idle_notification;`
0250 - if set, ->runtime_idle() is being executed
0251
0252 `unsigned int request_pending;`
0253 - if set, there's a pending request (i.e. a work item queued up into pm_wq)
0254
0255 `enum rpm_request request;`
0256 - type of request that's pending (valid if request_pending is set)
0257
0258 `unsigned int deferred_resume;`
0259 - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
0260 being executed for that device and it is not practical to wait for the
0261 suspend to complete; means "start a resume as soon as you've suspended"
0262
0263 `enum rpm_status runtime_status;`
0264 - the runtime PM status of the device; this field's initial value is
0265 RPM_SUSPENDED, which means that each device is initially regarded by the
0266 PM core as 'suspended', regardless of its real hardware status
0267
0268 `enum rpm_status last_status;`
0269 - the last runtime PM status of the device captured before disabling runtime
0270 PM for it (invalid initially and when disable_depth is 0)
0271
`unsigned int runtime_auto;`
- if set, indicates that the user space has allowed the device driver to
power manage the device at run time via the /sys/devices/.../power/control
interface; it may only be modified with the help of the
pm_runtime_allow() and pm_runtime_forbid() helper functions
0277
0278 `unsigned int no_callbacks;`
0279 - indicates that the device does not use the runtime PM callbacks (see
0280 Section 8); it may be modified only by the pm_runtime_no_callbacks()
0281 helper function
0282
0283 `unsigned int irq_safe;`
0284 - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
0285 will be invoked with the spinlock held and interrupts disabled
0286
0287 `unsigned int use_autosuspend;`
0288 - indicates that the device's driver supports delayed autosuspend (see
0289 Section 9); it may be modified only by the
0290 pm_runtime{_dont}_use_autosuspend() helper functions
0291
0292 `unsigned int timer_autosuspends;`
0293 - indicates that the PM core should attempt to carry out an autosuspend
0294 when the timer expires rather than a normal suspend
0295
0296 `int autosuspend_delay;`
0297 - the delay time (in milliseconds) to be used for autosuspend
0298
0299 `unsigned long last_busy;`
0300 - the time (in jiffies) when the pm_runtime_mark_last_busy() helper
0301 function was last called for this device; used in calculating inactivity
0302 periods for autosuspend
0303
0304 All of the above fields are members of the 'power' member of 'struct device'.
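The last_busy and autosuspend_delay fields combine into an expiration time as described for pm_runtime_autosuspend_expiration() in Section 4. The sketch below is a hypothetical userspace model of that description: it uses milliseconds instead of jiffies, rounds up to a whole second as the text says (the kernel's own rounding may differ), and treats a negative delay as having no expiration time.

```c
#include <stdbool.h>

/* All times are plain millisecond counters standing in for jiffies.
 * Returns 0 if the delay has expired or autosuspend is not in use. */
static unsigned long model_autosuspend_expiration(bool use_autosuspend,
		int delay_ms, unsigned long last_busy_ms, unsigned long now_ms)
{
	unsigned long expires;

	/* This sketch reports no expiration time for a negative delay. */
	if (!use_autosuspend || delay_ms < 0)
		return 0;

	expires = last_busy_ms + (unsigned long)delay_ms;
	if (delay_ms >= 1000)
		expires = (expires + 999) / 1000 * 1000;  /* round up to 1 s */

	return expires > now_ms ? expires : 0;
}
```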
0305
0306 4. Runtime PM Device Helper Functions
0307 =====================================
0308
0309 The following runtime PM helper functions are defined in
0310 drivers/base/power/runtime.c and include/linux/pm_runtime.h:
0311
0312 `void pm_runtime_init(struct device *dev);`
0313 - initialize the device runtime PM fields in 'struct dev_pm_info'
0314
`void pm_runtime_remove(struct device *dev);`
- make sure that the runtime PM of the device will be disabled after
removing the device from the device hierarchy
0318
0319 `int pm_runtime_idle(struct device *dev);`
0320 - execute the subsystem-level idle callback for the device; returns an
0321 error code on failure, where -EINPROGRESS means that ->runtime_idle() is
0322 already being executed; if there is no callback or the callback returns 0
0323 then run pm_runtime_autosuspend(dev) and return its result
0324
`int pm_runtime_suspend(struct device *dev);`
- execute the subsystem-level suspend callback for the device; returns 0 on
success, 1 if the device's runtime PM status was already 'suspended', or an
error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
to suspend the device again in the future and -EACCES means that
'power.disable_depth' is different from 0
0331
`int pm_runtime_autosuspend(struct device *dev);`
- same as pm_runtime_suspend() except that the autosuspend delay is taken
into account; if pm_runtime_autosuspend_expiration() says the delay has
not yet expired then an autosuspend is scheduled for the appropriate time
and 0 is returned
0337
`int pm_runtime_resume(struct device *dev);`
- execute the subsystem-level resume callback for the device; returns 0 on
success, 1 if the device's runtime PM status is already 'active' (also if
'power.disable_depth' is nonzero, but the status was 'active' when it was
changing from 0 to 1) or an error code on failure, where -EAGAIN means it
may be safe to attempt to resume the device again in the future, but
'power.runtime_error' should be checked additionally, and -EACCES means
that the callback could not be run, because 'power.disable_depth' was
different from 0
0347
`int pm_runtime_resume_and_get(struct device *dev);`
- run pm_runtime_resume(dev) and, if successful, increment the device's
usage counter; return 0 on success (even if the device's runtime PM status
was already 'active') or the error code returned by pm_runtime_resume() on
failure
0351
`int pm_request_idle(struct device *dev);`
- submit a request to execute the subsystem-level idle callback for the
device (the request is represented by a work item in pm_wq); returns 0 on
success or an error code if the request has not been queued up
0356
0357 `int pm_request_autosuspend(struct device *dev);`
0358 - schedule the execution of the subsystem-level suspend callback for the
0359 device when the autosuspend delay has expired; if the delay has already
0360 expired then the work item is queued up immediately
0361
`int pm_schedule_suspend(struct device *dev, unsigned int delay);`
- schedule the execution of the subsystem-level suspend callback for the
device in the future, where 'delay' is the time to wait before queuing up
a suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
item is queued up immediately); returns 0 on success, 1 if the device's
runtime PM status was already 'suspended', or an error code if the request
hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
->runtime_suspend() is already scheduled and not yet expired, the new
value of 'delay' will be used as the time to wait
0371
`int pm_request_resume(struct device *dev);`
- submit a request to execute the subsystem-level resume callback for the
device (the request is represented by a work item in pm_wq); returns 0 on
success, 1 if the device's runtime PM status was already 'active', or an
error code if the request hasn't been queued up
0377
0378 `void pm_runtime_get_noresume(struct device *dev);`
0379 - increment the device's usage counter
0380
0381 `int pm_runtime_get(struct device *dev);`
0382 - increment the device's usage counter, run pm_request_resume(dev) and
0383 return its result
0384
0385 `int pm_runtime_get_sync(struct device *dev);`
0386 - increment the device's usage counter, run pm_runtime_resume(dev) and
0387 return its result;
0388 note that it does not drop the device's usage counter on errors, so
0389 consider using pm_runtime_resume_and_get() instead of it, especially
0390 if its return value is checked by the caller, as this is likely to
0391 result in cleaner code.
0392
0393 `int pm_runtime_get_if_in_use(struct device *dev);`
0394 - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
0395 runtime PM status is RPM_ACTIVE and the runtime PM usage counter is
0396 nonzero, increment the counter and return 1; otherwise return 0 without
0397 changing the counter
0398
0399 `int pm_runtime_get_if_active(struct device *dev, bool ign_usage_count);`
0400 - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
0401 runtime PM status is RPM_ACTIVE, and either ign_usage_count is true
0402 or the device's usage_count is non-zero, increment the counter and
0403 return 1; otherwise return 0 without changing the counter
0404
0405 `void pm_runtime_put_noidle(struct device *dev);`
0406 - decrement the device's usage counter
0407
0408 `int pm_runtime_put(struct device *dev);`
0409 - decrement the device's usage counter; if the result is 0 then run
0410 pm_request_idle(dev) and return its result
0411
0412 `int pm_runtime_put_autosuspend(struct device *dev);`
0413 - decrement the device's usage counter; if the result is 0 then run
0414 pm_request_autosuspend(dev) and return its result
0415
0416 `int pm_runtime_put_sync(struct device *dev);`
0417 - decrement the device's usage counter; if the result is 0 then run
0418 pm_runtime_idle(dev) and return its result
0419
0420 `int pm_runtime_put_sync_suspend(struct device *dev);`
0421 - decrement the device's usage counter; if the result is 0 then run
0422 pm_runtime_suspend(dev) and return its result
0423
0424 `int pm_runtime_put_sync_autosuspend(struct device *dev);`
0425 - decrement the device's usage counter; if the result is 0 then run
0426 pm_runtime_autosuspend(dev) and return its result
0427
0428 `void pm_runtime_enable(struct device *dev);`
0429 - decrement the device's 'power.disable_depth' field; if that field is equal
0430 to zero, the runtime PM helper functions can execute subsystem-level
0431 callbacks described in Section 2 for the device
0432
0433 `int pm_runtime_disable(struct device *dev);`
0434 - increment the device's 'power.disable_depth' field (if the value of that
0435 field was previously zero, this prevents subsystem-level runtime PM
0436 callbacks from being run for the device), make sure that all of the
0437 pending runtime PM operations on the device are either completed or
0438 canceled; returns 1 if there was a resume request pending and it was
0439 necessary to execute the subsystem-level resume callback for the device
0440 to satisfy that request, otherwise 0 is returned
0441
0442 `int pm_runtime_barrier(struct device *dev);`
0443 - check if there's a resume request pending for the device and resume it
0444 (synchronously) in that case, cancel any other pending runtime PM requests
0445 regarding it and wait for all runtime PM operations on it in progress to
0446 complete; returns 1 if there was a resume request pending and it was
0447 necessary to execute the subsystem-level resume callback for the device to
0448 satisfy that request, otherwise 0 is returned
0449
0450 `void pm_suspend_ignore_children(struct device *dev, bool enable);`
0451 - set/unset the power.ignore_children flag of the device
0452
`int pm_runtime_set_active(struct device *dev);`
- clear the device's 'power.runtime_error' flag, set the device's runtime
PM status to 'active' and update its parent's counter of 'active'
children as appropriate (it is only valid to use this function if
'power.runtime_error' is set or 'power.disable_depth' is greater than
zero); it will fail and return an error code if the device has a parent
that is not active and whose 'power.ignore_children' flag is unset
0460
0461 `void pm_runtime_set_suspended(struct device *dev);`
0462 - clear the device's 'power.runtime_error' flag, set the device's runtime
0463 PM status to 'suspended' and update its parent's counter of 'active'
0464 children as appropriate (it is only valid to use this function if
0465 'power.runtime_error' is set or 'power.disable_depth' is greater than
0466 zero)
0467
0468 `bool pm_runtime_active(struct device *dev);`
0469 - return true if the device's runtime PM status is 'active' or its
0470 'power.disable_depth' field is not equal to zero, or false otherwise
0471
0472 `bool pm_runtime_suspended(struct device *dev);`
0473 - return true if the device's runtime PM status is 'suspended' and its
0474 'power.disable_depth' field is equal to zero, or false otherwise
0475
0476 `bool pm_runtime_status_suspended(struct device *dev);`
0477 - return true if the device's runtime PM status is 'suspended'
0478
0479 `void pm_runtime_allow(struct device *dev);`
0480 - set the power.runtime_auto flag for the device and decrease its usage
0481 counter (used by the /sys/devices/.../power/control interface to
0482 effectively allow the device to be power managed at run time)
0483
0484 `void pm_runtime_forbid(struct device *dev);`
0485 - unset the power.runtime_auto flag for the device and increase its usage
0486 counter (used by the /sys/devices/.../power/control interface to
0487 effectively prevent the device from being power managed at run time)
0488
0489 `void pm_runtime_no_callbacks(struct device *dev);`
0490 - set the power.no_callbacks flag for the device and remove the runtime
0491 PM attributes from /sys/devices/.../power (or prevent them from being
0492 added when the device is registered)
0493
0494 `void pm_runtime_irq_safe(struct device *dev);`
0495 - set the power.irq_safe flag for the device, causing the runtime-PM
0496 callbacks to be invoked with interrupts off
0497
0498 `bool pm_runtime_is_irq_safe(struct device *dev);`
0499 - return true if power.irq_safe flag was set for the device, causing
0500 the runtime-PM callbacks to be invoked with interrupts off
0501
0502 `void pm_runtime_mark_last_busy(struct device *dev);`
0503 - set the power.last_busy field to the current time
0504
0505 `void pm_runtime_use_autosuspend(struct device *dev);`
0506 - set the power.use_autosuspend flag, enabling autosuspend delays; call
0507 pm_runtime_get_sync if the flag was previously cleared and
0508 power.autosuspend_delay is negative
0509
0510 `void pm_runtime_dont_use_autosuspend(struct device *dev);`
0511 - clear the power.use_autosuspend flag, disabling autosuspend delays;
0512 decrement the device's usage counter if the flag was previously set and
0513 power.autosuspend_delay is negative; call pm_runtime_idle
0514
`void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);`
- set the power.autosuspend_delay value to 'delay' (expressed in
milliseconds); if 'delay' is negative then runtime suspends are
prevented; if power.use_autosuspend is set, pm_runtime_get_sync may be
called or the device's usage counter may be decremented and
pm_runtime_idle called, depending on whether power.autosuspend_delay is
changed to or from a negative value; if power.use_autosuspend is clear,
pm_runtime_idle is called
0523
0524 `unsigned long pm_runtime_autosuspend_expiration(struct device *dev);`
0525 - calculate the time when the current autosuspend delay period will expire,
0526 based on power.last_busy and power.autosuspend_delay; if the delay time
0527 is 1000 ms or larger then the expiration time is rounded up to the
0528 nearest second; returns 0 if the delay period has already expired or
0529 power.use_autosuspend isn't set, otherwise returns the expiration time
0530 in jiffies
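The difference between pm_runtime_get_sync() and pm_runtime_resume_and_get() on the error path can be illustrated with a hypothetical userspace model. Here resume_ret stands in for whatever pm_runtime_resume() would return, and the decrement in model_caller_get_sync() plays the role of pm_runtime_put_noidle(); none of these names are kernel interfaces.

```c
/* resume_ret stands in for the result of pm_runtime_resume(). */
struct model_usage_dev {
	int usage_count;
	int resume_ret;
};

/* Like pm_runtime_get_sync(): the counter is raised even on failure. */
static int model_get_sync(struct model_usage_dev *d)
{
	d->usage_count++;
	return d->resume_ret;
}

/* Like pm_runtime_resume_and_get(): the counter is only raised on success. */
static int model_resume_and_get(struct model_usage_dev *d)
{
	if (d->resume_ret < 0)
		return d->resume_ret;
	d->usage_count++;
	return 0;
}

/* A caller using get_sync must drop the reference itself on failure. */
static int model_caller_get_sync(struct model_usage_dev *d)
{
	int ret = model_get_sync(d);

	if (ret < 0) {
		d->usage_count--;  /* plays the role of pm_runtime_put_noidle() */
		return ret;
	}
	return 0;
}
```

In both cases a successful call still has to be balanced later by one of the pm_runtime_put*() helpers.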
0531
0532 It is safe to execute the following helper functions from interrupt context:
0533
0534 - pm_request_idle()
0535 - pm_request_autosuspend()
0536 - pm_schedule_suspend()
0537 - pm_request_resume()
0538 - pm_runtime_get_noresume()
0539 - pm_runtime_get()
0540 - pm_runtime_put_noidle()
0541 - pm_runtime_put()
0542 - pm_runtime_put_autosuspend()
0543 - pm_runtime_enable()
0544 - pm_suspend_ignore_children()
0545 - pm_runtime_set_active()
0546 - pm_runtime_set_suspended()
0547 - pm_runtime_suspended()
0548 - pm_runtime_mark_last_busy()
0549 - pm_runtime_autosuspend_expiration()
0550
0551 If pm_runtime_irq_safe() has been called for a device then the following helper
0552 functions may also be used in interrupt context:
0553
0554 - pm_runtime_idle()
0555 - pm_runtime_suspend()
0556 - pm_runtime_autosuspend()
0557 - pm_runtime_resume()
0558 - pm_runtime_get_sync()
0559 - pm_runtime_put_sync()
0560 - pm_runtime_put_sync_suspend()
0561 - pm_runtime_put_sync_autosuspend()
0562
0563 5. Runtime PM Initialization, Device Probing and Removal
0564 ========================================================
0565
Initially, runtime PM is disabled for all devices, which means that the
majority of the runtime PM helper functions described in Section 4 will fail
until pm_runtime_enable() is called for the device (for example,
pm_runtime_suspend() returns -EACCES while 'power.disable_depth' is nonzero).
0569
0570 In addition to that, the initial runtime PM status of all devices is
0571 'suspended', but it need not reflect the actual physical state of the device.
0572 Thus, if the device is initially active (i.e. it is able to process I/O), its
0573 runtime PM status must be changed to 'active', with the help of
0574 pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
0575
0576 However, if the device has a parent and the parent's runtime PM is enabled,
0577 calling pm_runtime_set_active() for the device will affect the parent, unless
0578 the parent's 'power.ignore_children' flag is set. Namely, in that case the
0579 parent won't be able to suspend at run time, using the PM core's helper
0580 functions, as long as the child's status is 'active', even if the child's
0581 runtime PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
0582 the child yet or pm_runtime_disable() has been called for it). For this reason,
0583 once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
0584 should be called for it too as soon as reasonably possible or its runtime PM
0585 status should be changed back to 'suspended' with the help of
0586 pm_runtime_set_suspended().
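The initialization rules above, and the effect of an 'active' child on its parent, can be traced with a hypothetical userspace model. struct model_node and the model_*() functions are illustrative only; usage counters, the error flag and the callbacks are omitted, and returning -EAGAIN from model_set_active() on an enabled device is this sketch's own choice rather than something the text specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_node_status { MN_ACTIVE, MN_SUSPENDED };

struct model_node {
	struct model_node *parent;
	enum model_node_status status;
	int disable_depth;
	int child_count;  /* 'active' children, updated even if ignored */
	bool ignore_children;
};

/* Initial state: runtime PM disabled and status 'suspended'. */
static void model_init(struct model_node *n, struct model_node *parent)
{
	n->parent = parent;
	n->status = MN_SUSPENDED;
	n->disable_depth = 1;
	n->child_count = 0;
	n->ignore_children = false;
}

static int model_set_active(struct model_node *n)
{
	struct model_node *p = n->parent;

	if (n->disable_depth == 0)
		return -EAGAIN;  /* only used while runtime PM is disabled */
	if (n->status == MN_ACTIVE)
		return 0;
	if (p && p->status != MN_ACTIVE && !p->ignore_children)
		return -EBUSY;
	if (p)
		p->child_count++;
	n->status = MN_ACTIVE;
	return 0;
}

/* Validity checks omitted for brevity. */
static void model_set_suspended(struct model_node *n)
{
	if (n->status == MN_ACTIVE && n->parent)
		n->parent->child_count--;
	n->status = MN_SUSPENDED;
}

static void model_enable(struct model_node *n)
{
	if (n->disable_depth > 0)
		n->disable_depth--;
}

/* Do the node's children allow it to be suspended? */
static bool model_children_allow_suspend(const struct model_node *n)
{
	return n->child_count == 0 || n->ignore_children;
}
```

The point the text makes shows up directly: the child below is still disabled, yet its 'active' status keeps the parent from suspending until model_set_suspended() is called for it.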
0587
0588 If the default initial runtime PM status of the device (i.e. 'suspended')
0589 reflects the actual state of the device, its bus type's or its driver's
0590 ->probe() callback will likely need to wake it up using one of the PM core's
0591 helper functions described in Section 4. In that case, pm_runtime_resume()
0592 should be used. Of course, for this purpose the device's runtime PM has to be
0593 enabled earlier by calling pm_runtime_enable().
0594
Note that if the device may execute pm_runtime calls during the probe (for
example, if it is registered with a subsystem that may call back in), then
pairing a pm_runtime_get_sync() call with a pm_runtime_put() call is
appropriate to ensure that the device is not put back to sleep during the
probe. This can happen with subsystems such as the network device layer.
0600
0601 It may be desirable to suspend the device once ->probe() has finished.
0602 Therefore the driver core uses the asynchronous pm_request_idle() to submit a
0603 request to execute the subsystem-level idle callback for the device at that
0604 time. A driver that makes use of the runtime autosuspend feature may want to
0605 update the last busy mark before returning from ->probe().
0606
0607 Moreover, the driver core prevents runtime PM callbacks from racing with the bus
0608 notifier callback in __device_release_driver(), which is necessary because the
0609 notifier is used by some subsystems to carry out operations affecting the
0610 runtime PM functionality. It does so by calling pm_runtime_get_sync() before
0611 driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications. This
0612 resumes the device if it's in the suspended state and prevents it from
0613 being suspended again while those routines are being executed.
0614
0615 To allow bus types and drivers to put devices into the suspended state by
0616 calling pm_runtime_suspend() from their ->remove() routines, the driver core
0617 executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER
0618 notifications in __device_release_driver(). This requires bus types and
0619 drivers to make their ->remove() callbacks avoid races with runtime PM directly,
0620 but it also allows more flexibility in the handling of devices during the
0621 removal of their drivers.
0622
In their ->remove() callbacks, drivers should undo the runtime PM changes made
in ->probe(). Usually this means calling pm_runtime_disable(),
pm_runtime_dont_use_autosuspend(), etc.
0626
User space can effectively prevent the driver of a device from power managing
it at run time by changing the value of its /sys/devices/.../power/control
attribute to "on", which causes pm_runtime_forbid() to be called. In principle,
the driver may also use this mechanism to effectively turn off runtime power
management of the device until user space turns it on. Namely, during
initialization the driver can make sure that the runtime PM status of the
device is 'active' and call pm_runtime_forbid(). Note, however, that if user
space has already intentionally changed the value of
/sys/devices/.../power/control to "auto" to allow the driver to power manage
the device at run time, the driver may confuse it by using pm_runtime_forbid()
this way.
0638
0639 6. Runtime PM and System Sleep
0640 ==============================
0641
0642 Runtime PM and system sleep (i.e., system suspend and hibernation, also known
0643 as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
0644 ways. If a device is active when a system sleep starts, everything is
0645 straightforward. But what should happen if the device is already suspended?
0646
0647 The device may have different wake-up settings for runtime PM and system sleep.
0648 For example, remote wake-up may be enabled for runtime suspend but disallowed
0649 for system sleep (device_may_wakeup(dev) returns 'false'). When this happens,
0650 the subsystem-level system suspend callback is responsible for changing the
0651 device's wake-up setting (it may leave that to the device driver's system
0652 suspend routine). It may be necessary to resume the device and suspend it again
0653 in order to do so. The same is true if the driver uses different power levels
0654 or other settings for runtime suspend and system sleep.
0655
0656 During system resume, the simplest approach is to bring all devices back to full
0657 power, even if they had been suspended before the system suspend began. There
0658 are several reasons for this, including:
0659
0660 * The device might need to switch power levels, wake-up settings, etc.
0661
0662 * Remote wake-up events might have been lost by the firmware.
0663
0664 * The device's children may need the device to be at full power in order
0665 to resume themselves.
0666
0667 * The driver's idea of the device state may not agree with the device's
0668 physical state. This can happen during resume from hibernation.
0669
0670 * The device might need to be reset.
0671
0672 * Even though the device was suspended, if its usage counter was > 0 then most
0673 likely it would need a runtime resume in the near future anyway.
0674
0675 If the device had been suspended before the system suspend began and it's
0676 brought back to full power during resume, then its runtime PM status will have
to be updated to reflect the actual post-system sleep status. The way to do
this is::

	pm_runtime_disable(dev);
	pm_runtime_set_active(dev);
	pm_runtime_enable(dev);
0683
0684 The PM core always increments the runtime usage counter before calling the
0685 ->suspend() callback and decrements it after calling the ->resume() callback.
0686 Hence disabling runtime PM temporarily like this will not cause any runtime
0687 suspend attempts to be permanently lost. If the usage count goes to zero
0688 following the return of the ->resume() callback, the ->runtime_idle() callback
0689 will be invoked as usual.
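The effect of the PM core's extra reference can be illustrated with a small
userspace model. This is not kernel code: struct model_dev and the model_*()
helpers are hypothetical stand-ins for power.usage_count,
pm_runtime_get_noresume() and pm_runtime_put(), and the model ignores
everything except the usage counter and whether runtime PM is disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the runtime PM state of one device. */
struct model_dev {
	int usage_count;    /* stands in for power.usage_count */
	bool disabled;      /* true between pm_runtime_disable() and _enable() */
	int idle_requests;  /* number of ->runtime_idle() requests issued */
};

/* Models pm_runtime_get_noresume(): take a reference, nothing else. */
static void model_get_noresume(struct model_dev *d)
{
	d->usage_count++;
}

/*
 * Models pm_runtime_put(): drop a reference and, if it was the last one
 * and runtime PM is enabled, request an idle check.
 */
static void model_put(struct model_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0 && !d->disabled)
		d->idle_requests++;
}

/* In this model, runtime suspend needs no references and PM enabled. */
static bool model_may_runtime_suspend(const struct model_dev *d)
{
	return d->usage_count == 0 && !d->disabled;
}
```

Taking a reference before the system transition, disabling and re-enabling
runtime PM around the status update, and dropping the reference at the end
leaves usage_count at 0 with exactly one idle request issued, so no suspend
opportunity is permanently lost.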
0690
0691 On some systems, however, system sleep is not entered through a global firmware
0692 or hardware operation. Instead, all hardware components are put into low-power
0693 states directly by the kernel in a coordinated way. Then, the system sleep
0694 state effectively follows from the states the hardware components end up in
0695 and the system is woken up from that state by a hardware interrupt or a similar
0696 mechanism entirely under the kernel's control. As a result, the kernel never
0697 gives control away and the states of all devices during resume are precisely
0698 known to it. If that is the case and none of the situations listed above takes
0699 place (in particular, if the system is not waking up from hibernation), it may
0700 be more efficient to leave the devices that had been suspended before the system
0701 suspend began in the suspended state.
0702
0703 To this end, the PM core provides a mechanism allowing some coordination between
0704 different levels of device hierarchy. Namely, if a system suspend .prepare()
0705 callback returns a positive number for a device, that indicates to the PM core
0706 that the device appears to be runtime-suspended and its state is fine, so it
0707 may be left in runtime suspend provided that all of its descendants are also
left in runtime suspend. If that happens, the PM core will not execute any
system suspend or resume callbacks for any of those devices, except for the
0710 .complete() callback, which is then entirely responsible for handling the device
0711 as appropriate. This only applies to system suspend transitions that are not
0712 related to hibernation (see Documentation/driver-api/pm/devices.rst for more
0713 information).
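The rule for leaving a device in runtime suspend across a system suspend can
be written as a recursive check over the device hierarchy. The sketch below is
a userspace model with hypothetical types (struct model_node and
model_may_stay_suspended() are not kernel interfaces); it only encodes the rule
stated above: .prepare() must return a positive number for the device itself
and for every one of its descendants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a node in the device hierarchy. */
struct model_node {
	int prepare_ret;               /* value returned by .prepare() */
	struct model_node **children;  /* NULL-terminated array, or NULL */
};

/*
 * A device may be left in runtime suspend only if its own .prepare()
 * returned a positive number and all of its descendants qualify too.
 */
static bool model_may_stay_suspended(const struct model_node *n)
{
	assert(n != NULL);
	if (n->prepare_ret <= 0)
		return false;
	for (size_t i = 0; n->children && n->children[i]; i++)
		if (!model_may_stay_suspended(n->children[i]))
			return false;
	return true;
}
```

A single child whose .prepare() returns 0 is enough to force full system
suspend and resume handling for its parent as well.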
0714
0715 The PM core does its best to reduce the probability of race conditions between
0716 the runtime PM and system suspend/resume (and hibernation) callbacks by carrying
0717 out the following operations:
0718
0719 * During system suspend pm_runtime_get_noresume() is called for every device
0720 right before executing the subsystem-level .prepare() callback for it and
0721 pm_runtime_barrier() is called for every device right before executing the
  subsystem-level .suspend() callback for it. In addition, the PM core
0723 calls __pm_runtime_disable() with 'false' as the second argument for every
0724 device right before executing the subsystem-level .suspend_late() callback
0725 for it.
0726
0727 * During system resume pm_runtime_enable() and pm_runtime_put() are called for
0728 every device right after executing the subsystem-level .resume_early()
0729 callback and right after executing the subsystem-level .complete() callback
0730 for it, respectively.
0731
7. Generic subsystem callbacks
==============================
0733
0734 Subsystems may wish to conserve code space by using the set of generic power
0735 management callbacks provided by the PM core, defined in
drivers/base/power/generic_ops.c:
0737
0738 `int pm_generic_runtime_suspend(struct device *dev);`
0739 - invoke the ->runtime_suspend() callback provided by the driver of this
0740 device and return its result, or return 0 if not defined
0741
0742 `int pm_generic_runtime_resume(struct device *dev);`
0743 - invoke the ->runtime_resume() callback provided by the driver of this
0744 device and return its result, or return 0 if not defined
0745
0746 `int pm_generic_suspend(struct device *dev);`
0747 - if the device has not been suspended at run time, invoke the ->suspend()
0748 callback provided by its driver and return its result, or return 0 if not
0749 defined
0750
0751 `int pm_generic_suspend_noirq(struct device *dev);`
0752 - if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
0753 callback provided by the device's driver and return its result, or return
0754 0 if not defined
0755
0756 `int pm_generic_resume(struct device *dev);`
0757 - invoke the ->resume() callback provided by the driver of this device and,
0758 if successful, change the device's runtime PM status to 'active'
0759
0760 `int pm_generic_resume_noirq(struct device *dev);`
0761 - invoke the ->resume_noirq() callback provided by the driver of this device
0762
0763 `int pm_generic_freeze(struct device *dev);`
0764 - if the device has not been suspended at run time, invoke the ->freeze()
0765 callback provided by its driver and return its result, or return 0 if not
0766 defined
0767
0768 `int pm_generic_freeze_noirq(struct device *dev);`
0769 - if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
0770 callback provided by the device's driver and return its result, or return
0771 0 if not defined
0772
0773 `int pm_generic_thaw(struct device *dev);`
0774 - if the device has not been suspended at run time, invoke the ->thaw()
0775 callback provided by its driver and return its result, or return 0 if not
0776 defined
0777
0778 `int pm_generic_thaw_noirq(struct device *dev);`
0779 - if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
0780 callback provided by the device's driver and return its result, or return
0781 0 if not defined
0782
0783 `int pm_generic_poweroff(struct device *dev);`
0784 - if the device has not been suspended at run time, invoke the ->poweroff()
0785 callback provided by its driver and return its result, or return 0 if not
0786 defined
0787
0788 `int pm_generic_poweroff_noirq(struct device *dev);`
0789 - if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
0790 callback provided by the device's driver and return its result, or return
0791 0 if not defined
0792
0793 `int pm_generic_restore(struct device *dev);`
0794 - invoke the ->restore() callback provided by the driver of this device and,
0795 if successful, change the device's runtime PM status to 'active'
0796
0797 `int pm_generic_restore_noirq(struct device *dev);`
0798 - invoke the ->restore_noirq() callback provided by the device's driver
0799
0800 These functions are the defaults used by the PM core if a subsystem doesn't
0801 provide its own callbacks for ->runtime_idle(), ->runtime_suspend(),
0802 ->runtime_resume(), ->suspend(), ->suspend_noirq(), ->resume(),
0803 ->resume_noirq(), ->freeze(), ->freeze_noirq(), ->thaw(), ->thaw_noirq(),
0804 ->poweroff(), ->poweroff_noirq(), ->restore(), ->restore_noirq() in the
0805 subsystem-level dev_pm_ops structure.
0806
0807 Device drivers that wish to use the same function as a system suspend, freeze,
0808 poweroff and runtime suspend callback, and similarly for system resume, thaw,
0809 restore, and runtime resume, can achieve this with the help of the
0810 UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
0811 last argument to NULL).
0812
0813 8. "No-Callback" Devices
0814 ========================
0815
0816 Some "devices" are only logical sub-devices of their parent and cannot be
0817 power-managed on their own. (The prototype example is a USB interface. Entire
0818 USB devices can go into low-power mode or send wake-up requests, but neither is
0819 possible for individual interfaces.) The drivers for these devices have no
0820 need of runtime PM callbacks; if the callbacks did exist, ->runtime_suspend()
0821 and ->runtime_resume() would always return 0 without doing anything else and
0822 ->runtime_idle() would always call pm_runtime_suspend().
0823
0824 Subsystems can tell the PM core about these devices by calling
0825 pm_runtime_no_callbacks(). This should be done after the device structure is
0826 initialized and before it is registered (although after device registration is
0827 also okay). The routine will set the device's power.no_callbacks flag and
0828 prevent the non-debugging runtime PM sysfs attributes from being created.
0829
0830 When power.no_callbacks is set, the PM core will not invoke the
0831 ->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks.
0832 Instead it will assume that suspends and resumes always succeed and that idle
0833 devices should be suspended.
0834
0835 As a consequence, the PM core will never directly inform the device's subsystem
0836 or driver about runtime power changes. Instead, the driver for the device's
0837 parent must take responsibility for telling the device's driver when the
0838 parent's power state changes.
0839
Note that in some cases it may not be desirable for subsystems/drivers to call
pm_runtime_no_callbacks() for their devices. This could be because a subset of
the runtime PM callbacks needs to be implemented, because a platform-dependent
PM domain could be attached to the device, or because the device is power
managed through a supplier device link. For these reasons, and to avoid
boilerplate code in subsystems/drivers, the PM core allows runtime PM callbacks
to be unassigned. More precisely, if a callback pointer is NULL, the PM core
will act as though there was a callback and it returned 0.
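The NULL-callback rule amounts to a one-line dispatch helper. In this userspace
sketch, model_rpm_callback(), struct model_rpm_ops and example_busy_suspend()
are hypothetical names, not PM core functions; the point is only that a missing
callback is indistinguishable from one that returned 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a runtime PM callback. */
typedef int (*model_rpm_cb)(void *dev);

/* A driver that may implement only a subset of its callbacks. */
struct model_rpm_ops {
	model_rpm_cb runtime_suspend;
	model_rpm_cb runtime_resume;
};

/* A NULL callback is treated as if it existed and returned 0. */
static int model_rpm_callback(model_rpm_cb cb, void *dev)
{
	return cb ? cb(dev) : 0;
}

/* Example driver callback that refuses to suspend the device. */
static int example_busy_suspend(void *dev)
{
	(void)dev;
	return -EBUSY;
}
```

With ops = { example_busy_suspend, NULL }, a suspend attempt reports -EBUSY
while a resume simply "succeeds".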
0848
0849 9. Autosuspend, or automatically-delayed suspends
0850 =================================================
0851
0852 Changing a device's power state isn't free; it requires both time and energy.
0853 A device should be put in a low-power state only when there's some reason to
0854 think it will remain in that state for a substantial time. A common heuristic
0855 says that a device which hasn't been used for a while is liable to remain
0856 unused; following this advice, drivers should not allow devices to be suspended
0857 at runtime until they have been inactive for some minimum period. Even when
0858 the heuristic ends up being non-optimal, it will still prevent devices from
0859 "bouncing" too rapidly between low-power and full-power states.
0860
The term "autosuspend" is a historical remnant. It doesn't mean that the
0862 device is automatically suspended (the subsystem or driver still has to call
0863 the appropriate PM routines); rather it means that runtime suspends will
0864 automatically be delayed until the desired period of inactivity has elapsed.
0865
0866 Inactivity is determined based on the power.last_busy field. Drivers should
0867 call pm_runtime_mark_last_busy() to update this field after carrying out I/O,
0868 typically just before calling pm_runtime_put_autosuspend(). The desired length
0869 of the inactivity period is a matter of policy. Subsystems can set this length
0870 initially by calling pm_runtime_set_autosuspend_delay(), but after device
0871 registration the length should be controlled by user space, using the
0872 /sys/devices/.../power/autosuspend_delay_ms attribute.
0873
0874 In order to use autosuspend, subsystems or drivers must call
0875 pm_runtime_use_autosuspend() (preferably before registering the device), and
0876 thereafter they should use the various `*_autosuspend()` helper functions
0877 instead of the non-autosuspend counterparts::
0878
0879 Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
0880 Instead of: pm_schedule_suspend use: pm_request_autosuspend;
0881 Instead of: pm_runtime_put use: pm_runtime_put_autosuspend;
0882 Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend.
0883
0884 Drivers may also continue to use the non-autosuspend helper functions; they
0885 will behave normally, which means sometimes taking the autosuspend delay into
0886 account (see pm_runtime_idle).
0887
0888 Under some circumstances a driver or subsystem may want to prevent a device
0889 from autosuspending immediately, even though the usage counter is zero and the
0890 autosuspend delay time has expired. If the ->runtime_suspend() callback
0891 returns -EAGAIN or -EBUSY, and if the next autosuspend delay expiration time is
0892 in the future (as it normally would be if the callback invoked
0893 pm_runtime_mark_last_busy()), the PM core will automatically reschedule the
0894 autosuspend. The ->runtime_suspend() callback can't do this rescheduling
0895 itself because no suspend requests of any kind are accepted while the device is
0896 suspending (i.e., while the callback is running).
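The rescheduling rule can be stated as a small predicate. This userspace sketch
is hypothetical (model_should_reschedule() is not a PM core function); it
assumes the caller has already computed the next autosuspend expiration time
from power.last_busy and the autosuspend delay, in nanoseconds, and it ignores
any other conditions the PM core may check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: after ->runtime_suspend() returned 'ret', decide
 * whether the autosuspend should be rescheduled for 'expiration'.
 */
static bool model_should_reschedule(int ret, uint64_t expiration, uint64_t now)
{
	/* Only -EAGAIN and -EBUSY ask for another attempt. */
	if (ret != -EAGAIN && ret != -EBUSY)
		return false;
	/* Reschedule only if the expiration time lies in the future. */
	return expiration > now;
}
```

Any other error, or an expiration time that has already passed, leaves the
device active without a new autosuspend being scheduled by this rule.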
0897
0898 The implementation is well suited for asynchronous use in interrupt contexts.
However, such use inevitably involves races, because the PM core can't
0900 synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
0901 This synchronization must be handled by the driver, using its private lock.
0902 Here is a schematic pseudo-code example::
0903
	void foo_read_or_write(struct foo_priv *foo, void *data)
	{
		lock(&foo->private_lock);
		add_request_to_io_queue(foo, data);
		if (foo->num_pending_requests++ == 0)
			pm_runtime_get(&foo->dev);
		if (!foo->is_suspended)
			foo_process_next_request(foo);
		unlock(&foo->private_lock);
	}

	void foo_io_completion(struct foo_priv *foo, void *req)
	{
		lock(&foo->private_lock);
		if (--foo->num_pending_requests == 0) {
			pm_runtime_mark_last_busy(&foo->dev);
			pm_runtime_put_autosuspend(&foo->dev);
		} else {
			foo_process_next_request(foo);
		}
		unlock(&foo->private_lock);
		/* Send req result back to the user ... */
	}

	int foo_runtime_suspend(struct device *dev)
	{
		/* Assumes struct foo_priv embeds its struct device as 'dev'. */
		struct foo_priv *foo = container_of(dev, struct foo_priv, dev);
		int ret = 0;

		lock(&foo->private_lock);
		if (foo->num_pending_requests > 0) {
			ret = -EBUSY;
		} else {
			/* ... suspend the device ... */
			foo->is_suspended = 1;
		}
		unlock(&foo->private_lock);
		return ret;
	}

	int foo_runtime_resume(struct device *dev)
	{
		struct foo_priv *foo = container_of(dev, struct foo_priv, dev);

		lock(&foo->private_lock);
		/* ... resume the device ... */
		foo->is_suspended = 0;
		pm_runtime_mark_last_busy(&foo->dev);
		if (foo->num_pending_requests > 0)
			foo_process_next_request(foo);
		unlock(&foo->private_lock);
		return 0;
	}
0957
0958 The important point is that after foo_io_completion() asks for an autosuspend,
0959 the foo_runtime_suspend() callback may race with foo_read_or_write().
0960 Therefore foo_runtime_suspend() has to check whether there are any pending I/O
0961 requests (while holding the private lock) before allowing the suspend to
0962 proceed.
0963
0964 In addition, the power.autosuspend_delay field can be changed by user space at
0965 any time. If a driver cares about this, it can call
0966 pm_runtime_autosuspend_expiration() from within the ->runtime_suspend()
0967 callback while holding its private lock. If the function returns a nonzero
0968 value then the delay has not yet expired and the callback should return
0969 -EAGAIN.
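The expiration check can be modeled as follows. This is a userspace sketch, not
kernel code: the model_*() names and MODEL_NSEC_PER_MSEC are hypothetical,
times are plain nanosecond counts supplied by the caller instead of a kernel
clock, and it assumes the expiration time is last_busy plus the autosuspend
delay. As described above, a nonzero result means the delay has not yet
expired, so the ->runtime_suspend() callback returns -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_NSEC_PER_MSEC 1000000ULL

/*
 * Hypothetical model: return the autosuspend expiration time (ns) if it
 * is still in the future, or 0 if the delay has already expired.
 */
static uint64_t model_autosuspend_expiration(unsigned int delay_ms,
					     uint64_t last_busy_ns,
					     uint64_t now_ns)
{
	uint64_t expires = last_busy_ns + (uint64_t)delay_ms * MODEL_NSEC_PER_MSEC;

	return expires > now_ns ? expires : 0;
}

/* What a ->runtime_suspend() callback would decide under its private lock. */
static int model_runtime_suspend_check(unsigned int delay_ms,
				       uint64_t last_busy_ns, uint64_t now_ns)
{
	if (model_autosuspend_expiration(delay_ms, last_busy_ns, now_ns))
		return -EAGAIN;	/* delay not expired yet, e.g. it was raised */
	return 0;		/* OK to proceed with the suspend */
}
```

For instance, with last_busy at 1 s and a delay that user space has just
raised to 2000 ms, a suspend attempt at 2.5 s is refused with -EAGAIN, whereas
with a 500 ms delay it would proceed.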