=========================
CPU hotplug in the Kernel
=========================

:Date: September, 2021
:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
         Rusty Russell <rusty@rustcorp.com.au>,
         Srivatsa Vaddagiri <vatsa@in.ibm.com>,
         Ashok Raj <ashok.raj@intel.com>,
         Joel Schopp <jschopp@austin.ibm.com>,
         Thomas Gleixner <tglx@linutronix.de>

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. There are a couple of
OEMs that support NUMA hardware which is hot-pluggable as well, where
physical node insertion and removal require support for CPU hotplug.

Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off the
system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU-hotplug support is its use today in suspend/resume
support for SMP. Dual-core and HT support means that even a laptop runs an
SMP kernel, which previously did not support these methods.


Command Line Switches
=====================
``maxcpus=n``
  Restrict boot time CPUs to *n*. For example, if you have four CPUs, using
  ``maxcpus=2`` will only boot two. You can choose to bring the other
  CPUs online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  those CPUs cannot be brought online later.

``additional_cpus=n``
  Use this to limit hotpluggable CPUs. This option sets
  ``cpu_possible_mask = cpu_present_mask + additional_cpus``

  This option is limited to the IA64 architecture.

``possible_cpus=n``
  This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

``cpu0_hotplug``
  Allow shutting down CPU0.

  This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are added or removed at any time. Trimming it accurately for your
  system's needs upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. The map can change when physical hotplug is processed
  by the relevant subsystem (e.g. ACPI): a new bit is either added to or
  removed from the map, depending on whether the event is a hot-add or a
  hot-remove. There are no locking rules as of now. Typical usage is to init
  topology during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most use. When setting up per-cpu resources, almost always
use ``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.

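For illustration, a minimal sketch of boot time per-cpu setup which iterates
with ``for_each_possible_cpu()`` so that later hotplug events never require
the storage to grow. The ``subsys_count`` variable and the init function are
hypothetical::

   /* Hypothetical per-CPU counter owned by "subsys". */
   static DEFINE_PER_CPU(unsigned long, subsys_count);

   static void __init subsys_init_counters(void)
   {
           unsigned int cpu;

           /* Cover every CPU that can ever become available. */
           for_each_possible_cpu(cpu)
                   *per_cpu_ptr(&subsys_count, cpu) = 0;
   }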

Using CPU hotplug
=================

The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86. The
configuration is done via the sysfs interface::

 $ ls -lh /sys/devices/system/cpu
 total 0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
 drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
 drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
 -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
 -r--r--r--  1 root root 4.0K Dec 21 16:33 online
 -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
 -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4::

 $ echo 0 > /sys/devices/system/cpu/cpu4/online
  smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and it should no longer be visible in the *top* command. To
bring CPU4 back online::

 $ echo 1 > /sys/devices/system/cpu/cpu4/online
 smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs. CPU0 is often special
and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shut down CPU0. Alternatively the kernel command line option *cpu0_hotplug*
can be used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.

The CPU hotplug coordination
============================

The offline case
----------------

Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU
* Timers are also migrated to a new CPU
* Once all services are migrated, the kernel calls an arch-specific routine
  ``__cpu_disable()`` to perform arch-specific cleanup.


The CPU hotplug API
===================

CPU hotplug state machine
-------------------------

CPU hotplug uses a trivial state machine with a linear state space from
CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
callback.

When a CPU is onlined, the startup callbacks are invoked sequentially until
the state CPUHP_ONLINE is reached. They can also be invoked when the
callbacks of a state are set up or an instance is added to a multi-instance
state.

When a CPU is offlined the teardown callbacks are invoked in the reverse
order sequentially until the state CPUHP_OFFLINE is reached. They can also
be invoked when the callbacks of a state are removed or an instance is
removed from a multi-instance state.

If a usage site requires only a callback in one direction of the hotplug
operations (CPU online or CPU offline) then the other not-required callback
can be set to NULL when the state is set up.
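
For illustration, a minimal sketch of such a callback pair, assuming a
hypothetical "subsys" driver. Both callbacks take the CPU number and return
0 on success; if only one direction is needed the other pointer is simply
passed as NULL when the state is set up::

   /* Invoked during a CPU online operation; a negative errno aborts it. */
   static int subsys_cpu_online(unsigned int cpu)
   {
           /* Set up whatever "subsys" needs on this CPU. */
           return 0;
   }

   /* Invoked during a CPU offline operation; undoes subsys_cpu_online(). */
   static int subsys_cpu_offline(unsigned int cpu)
   {
           /* Tear down the per-CPU setup again. */
           return 0;
   }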

The state space is divided into three sections:

* The PREPARE section

  The PREPARE section covers the state space from CPUHP_OFFLINE to
  CPUHP_BRINGUP_CPU.

  The startup callbacks in this section are invoked before the CPU is
  started during a CPU online operation. The teardown callbacks are invoked
  after the CPU has become dysfunctional during a CPU offline operation.

  The callbacks are invoked on a control CPU as they obviously can't run on
  the hotplugged CPU, which is either not yet started or has become
  dysfunctional already.

  The startup callbacks are used to set up resources which are required to
  bring a CPU successfully online. The teardown callbacks are used to free
  resources or to move pending work to an online CPU after the hotplugged
  CPU became dysfunctional.

  The startup callbacks are allowed to fail. If a callback fails, the CPU
  online operation is aborted and the CPU is brought down to the previous
  state (usually CPUHP_OFFLINE) again.

  The teardown callbacks in this section are not allowed to fail.

* The STARTING section

  The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
  and CPUHP_AP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  with interrupts disabled during a CPU online operation in the early CPU
  setup code. The teardown callbacks are invoked with interrupts disabled
  on the hotplugged CPU during a CPU offline operation shortly before the
  CPU is completely shut down.

  The callbacks in this section are not allowed to fail.

  The callbacks are used for low level hardware initialization/shutdown and
  for core subsystems.

* The ONLINE section

  The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
  CPUHP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  during a CPU online operation. The teardown callbacks are invoked on the
  hotplugged CPU during a CPU offline operation.

  The callbacks are invoked in the context of the per CPU hotplug thread,
  which is pinned on the hotplugged CPU. The callbacks are invoked with
  interrupts and preemption enabled.

  The callbacks are allowed to fail. When a callback fails the hotplug
  operation is aborted and the CPU is brought back to the previous state.

CPU online/offline operations
-----------------------------

A successful online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_ONLINE - 1]->startup()        -> success
  [CPUHP_ONLINE]

A successful offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()    -> success
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()        -> success
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]


A failed online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ---
  [CPUHP_AP_ONLINE + N]->startup()     -> fail
  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()
  ...
  [CPUHP_ONLINE - 1]->startup()
  [CPUHP_ONLINE]

Recursive failures cannot be handled sensibly. Look at the following
example of a recursive failure due to a failed offline operation::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail

The CPU hotplug state machine stops right here and does not try to go back
down again because that would likely result in an endless loop::

  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail

Lather, rinse and repeat. In this case the CPU is left in state::

  [CPUHP_ONLINE - (N - 1)]

which at least lets the system make progress and gives the user a chance to
debug or even resolve the situation.

Allocating a state
------------------

There are two ways to allocate a CPU hotplug state:

* Static allocation

  Static allocation has to be used when the subsystem or driver has
  ordering requirements versus other CPU hotplug states. E.g. the PERF core
  startup callback has to be invoked before the PERF driver startup
  callbacks during a CPU online operation. During a CPU offline operation
  the driver teardown callbacks have to be invoked before the core teardown
  callback. The statically allocated states are described by constants in
  the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.

  Insert the state into the enum at the proper place so the ordering
  requirements are fulfilled. The state constant has to be used for state
  setup and removal.

  Static allocation is also required when the state callbacks are not set
  up at runtime and are part of the initializer of the CPU hotplug state
  array in kernel/cpu.c.

* Dynamic allocation

  When there are no ordering requirements for the state callbacks then
  dynamic allocation is the preferred method. The state number is allocated
  by the setup function and returned to the caller on success.

  Only the PREPARE and ONLINE sections provide a dynamic allocation
  range. The STARTING section does not as most of the callbacks in that
  section have explicit ordering requirements.

Setup of a CPU hotplug state
----------------------------

The core code provides the following functions to set up a state:

* cpuhp_setup_state(state, name, startup, teardown)
* cpuhp_setup_state_nocalls(state, name, startup, teardown)
* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)

For cases where a driver or a subsystem has multiple instances and the same
CPU hotplug state callbacks need to be invoked for each instance, the CPU
hotplug core provides multi-instance support. The advantage over driver
specific instance lists is that the instance related functions are fully
serialized against CPU hotplug operations and provide the automatic
invocations of the state callbacks on add and removal. To set up such a
multi-instance state the following function is available:

* cpuhp_setup_state_multi(state, name, startup, teardown)

The @state argument is either a statically allocated state or one of the
constants for dynamically allocated states - CPUHP_PREPARE_DYN,
CPUHP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for
which a dynamic state should be allocated.

The @name argument is used for sysfs output and for instrumentation. The
naming convention is "subsys:mode" or "subsys/driver:mode",
e.g. "perf:mode" or "perf/x86:mode". The common mode names are:

======== =======================================================
prepare  For states in the PREPARE section

dead     For states in the PREPARE section which do not provide
         a startup callback

starting For states in the STARTING section

dying    For states in the STARTING section which do not provide
         a startup callback

online   For states in the ONLINE section

offline  For states in the ONLINE section which do not provide
         a startup callback
======== =======================================================

As the @name argument is only used for sysfs and instrumentation, other mode
descriptors can be used as well if they describe the nature of the state
better than the common ones.

Examples for @name arguments: "perf/online", "perf/x86:prepare",
"RCU/tree:dying", "sched/waitempty"

The @startup argument is a function pointer to the callback which should be
invoked during a CPU online operation. If the usage site does not require a
startup callback set the pointer to NULL.

The @teardown argument is a function pointer to the callback which should
be invoked during a CPU offline operation. If the usage site does not
require a teardown callback set the pointer to NULL.


The functions differ in how the installed callbacks are treated:

  * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
    and cpuhp_setup_state_multi() only install the callbacks

  * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
    callbacks and invoke the @startup callback (if not NULL) for all online
    CPUs which currently have a state greater than the newly installed
    state. Depending on the state section the callback is either invoked on
    the current CPU (PREPARE section) or on each online CPU (ONLINE
    section) in the context of the CPU's hotplug thread.

    If a callback fails for CPU N then the teardown callbacks for CPUs
    0 .. N-1 are invoked to roll back the operation. The state setup fails,
    the callbacks for the state are not installed and in case of dynamic
    allocation the allocated state is freed.

The state setup and the callback invocations are serialized against CPU
hotplug operations. If the setup function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.
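
For illustration, a minimal sketch of using a ``_cpuslocked()`` variant,
assuming the surrounding (hypothetical) "subsys" code already holds the CPU
hotplug read lock for other reasons::

   cpus_read_lock();
   /* ... other work that must be done under the hotplug read lock ... */
   ret = cpuhp_setup_state_cpuslocked(CPUHP_ONLINE_DYN, "subsys:online",
                                      subsys_cpu_online, subsys_cpu_offline);
   cpus_read_unlock();
   if (ret < 0)
           return ret;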

The function return values:
  ======== ===================================================================
  0        Statically allocated state was successfully set up

  >0       Dynamically allocated state was successfully set up.

           The returned number is the state number which was allocated. If
           the state callbacks have to be removed later, e.g. module
           removal, then this number has to be saved by the caller and used
           as @state argument for the state remove function. For
           multi-instance states the dynamically allocated state number is
           also required as @state argument for the instance add/remove
           operations.

  <0       Operation failed
  ======== ===================================================================
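
For illustration, a minimal sketch of keeping the dynamically allocated state
number around so it can later be handed to the remove function. The "subsys"
names and callbacks are hypothetical::

   static enum cpuhp_state subsys_hp_state;

   static int __init subsys_init(void)
   {
           int ret;

           ret = cpuhp_setup_state(CPUHP_ONLINE_DYN, "subsys:online",
                                   subsys_cpu_online, subsys_cpu_offline);
           if (ret < 0)
                   return ret;
           /* ret > 0 is the allocated state number, needed for removal. */
           subsys_hp_state = ret;
           return 0;
   }

   static void __exit subsys_exit(void)
   {
           cpuhp_remove_state(subsys_hp_state);
   }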

Removal of a CPU hotplug state
------------------------------

To remove a previously set up state, the following functions are provided:

* cpuhp_remove_state(state)
* cpuhp_remove_state_nocalls(state)
* cpuhp_remove_state_nocalls_cpuslocked(state)
* cpuhp_remove_multi_state(state)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state*(). If
the state is in the dynamic range, then the state number is freed and
available for dynamic allocation again.

The functions differ in how the installed callbacks are treated:

  * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
    and cpuhp_remove_multi_state() only remove the callbacks.

  * cpuhp_remove_state() removes the callbacks and invokes the teardown
    callback (if not NULL) for all online CPUs which currently have a state
    greater than the removed state. Depending on the state section the
    callback is either invoked on the current CPU (PREPARE section) or on
    each online CPU (ONLINE section) in the context of the CPU's hotplug
    thread.

    In order to complete the removal, the teardown callback should not fail.

The state removal and the callback invocations are serialized against CPU
hotplug operations. If the remove function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

If a multi-instance state is removed then the caller has to remove all
instances first.


Multi-Instance state instance management
----------------------------------------

Once the multi-instance state is set up, instances can be added to the
state:

  * cpuhp_state_add_instance(state, node)
  * cpuhp_state_add_instance_nocalls(state, node)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state_multi().

The @node argument is a pointer to an hlist_node which is embedded in the
instance's data structure. The pointer is handed to the multi-instance
state callbacks and can be used by the callback to retrieve the instance
via container_of().
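
For illustration, a minimal sketch of a multi-instance callback, assuming a
hypothetical per-device structure with an embedded hlist_node::

   struct subsys_instance {
           struct hlist_node node;
           /* per-instance data */
   };

   static int subsys_instance_online(unsigned int cpu, struct hlist_node *node)
   {
           struct subsys_instance *inst;

           /* Recover the instance the embedded node belongs to. */
           inst = container_of(node, struct subsys_instance, node);

           /* Operate on @inst for the CPU coming online. */
           return 0;
   }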

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_add_instance_nocalls() only adds the instance to the
    multi-instance state's node list.

  * cpuhp_state_add_instance() adds the instance and invokes the startup
    callback (if not NULL) associated with @state for all online CPUs which
    currently have a state greater than @state. The callback is only
    invoked for the instance being added. Depending on the state section
    the callback is either invoked on the current CPU (PREPARE section) or
    on each online CPU (ONLINE section) in the context of the CPU's hotplug
    thread.

    If a callback fails for CPU N then the teardown callbacks for CPUs
    0 .. N-1 are invoked to roll back the operation, the function fails and
    the instance is not added to the node list of the multi-instance state.

To remove an instance from the state's node list these functions are
available:

  * cpuhp_state_remove_instance(state, node)
  * cpuhp_state_remove_instance_nocalls(state, node)

The arguments are the same as for the cpuhp_state_add_instance*()
variants above.

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_remove_instance_nocalls() only removes the instance from the
    state's node list.

  * cpuhp_state_remove_instance() removes the instance and invokes the
    teardown callback (if not NULL) associated with @state for all online
    CPUs which currently have a state greater than @state.  The callback is
    only invoked for the instance being removed.  Depending on the state
    section the callback is either invoked on the current CPU (PREPARE
    section) or on each online CPU (ONLINE section) in the context of the
    CPU's hotplug thread.

    In order to complete the removal, the teardown callback should not fail.

The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations. These functions cannot be used
from within CPU hotplug callbacks and CPU hotplug read locked regions.

Examples
--------

Setup and teardown a statically allocated state in the STARTING section for
notifications on online and offline operations::

   ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
   if (ret < 0)
        return ret;
   ....
   cpuhp_remove_state(CPUHP_SUBSYS_STARTING);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on offline operations::

   state = cpuhp_setup_state(CPUHP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state(state);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on online operations without invoking the callbacks::

   state = cpuhp_setup_state_nocalls(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state_nocalls(state);

Setup, use and teardown a dynamically allocated multi-instance state in the
ONLINE section for notifications on online and offline operations::

   state = cpuhp_setup_state_multi(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   ret = cpuhp_state_add_instance(state, &inst1->node);
   if (ret)
        return ret;
   ....
   ret = cpuhp_state_add_instance(state, &inst2->node);
   if (ret)
        return ret;
   ....
   cpuhp_state_remove_instance(state, &inst1->node);
   ....
   cpuhp_state_remove_instance(state, &inst2->node);
   ....
   cpuhp_remove_multi_state(state);


Testing of hotplug states
=========================

One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the
CPU to a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states`` ::

 $ tail /sys/devices/system/cpu/hotplug/states
 138: mm/vmscan:online
 139: mm/vmstat:online
 140: lib/percpu_cnt:online
 141: acpi/cpu-drv:online
 142: base/cacheinfo:online
 143: virtio/net:online
 144: x86/mce:online
 145: printk:online
 168: sched:active
 169: online

To roll back CPU4 to ``lib/percpu_cnt:online`` and back online just issue::

  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169
  $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  140

It is important to note that the teardown callback of state 140 has not been
invoked. And now get back online::

  $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169

With trace events enabled, the individual steps are visible, too::

  #  TASK-PID   CPU#    TIMESTAMP  FUNCTION
  #     | |       |        |         |
      bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
   cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
   cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
   cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
   cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
   cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
      bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
      bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
   cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
   cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
   cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
   cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
   cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
   cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
   cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
   cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
   cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
   cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
   cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
   cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
      bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================

The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig

``__cpu_up()``
  Arch interface to bring up a CPU

``__cpu_disable()``
  Arch interface to shut down a CPU; no more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is actually supposed to ensure the death of the CPU. Look at some
  example code in other architectures that implement CPU hotplug. The
  processor is taken down from the ``idle()`` loop for that specific
  architecture. ``__cpu_die()`` typically waits for some per_cpu state to be
  set, to positively ensure that the processor dead routine has been called.

User Space Notification
=======================

After a CPU is successfully onlined or offlined, udev events are sent. A udev
rule like::

  SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like::

  #!/bin/sh

  if [ "${ACTION}" = "offline" ]
  then
      echo "CPU ${DEVPATH##*/} offline"

  elif [ "${ACTION}" = "online" ]
  then
      echo "CPU ${DEVPATH##*/} online"

  fi

can process the event further.

Kernel Inline Documentation Reference
=====================================

.. kernel-doc:: include/linux/cpuhotplug.h