0001 =========================
0002 CPU hotplug in the Kernel
0003 =========================
0004
0005 :Date: September, 2021
0006 :Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
0007 Rusty Russell <rusty@rustcorp.com.au>,
0008 Srivatsa Vaddagiri <vatsa@in.ibm.com>,
0009 Ashok Raj <ashok.raj@intel.com>,
0010 Joel Schopp <jschopp@austin.ibm.com>,
0011 Thomas Gleixner <tglx@linutronix.de>
0012
0013 Introduction
0014 ============
0015
Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. Some OEMs also ship
hot-pluggable NUMA hardware, where physical node insertion and removal
require support for CPU hotplug.
0020
Such advances require that the CPUs available to the kernel can be removed,
either for provisioning reasons or for RAS purposes, to keep an offending CPU
off the system execution path. Hence the need for CPU hotplug support in the
Linux kernel.
0025
A more novel use of CPU hotplug support is its role in suspend/resume support
for SMP: all CPUs except the boot CPU are taken offline before suspending and
brought back online afterwards. Dual-core and HT support means that even
laptops run SMP kernels, which previously did not support suspend and resume.
0029
0030
0031 Command Line Switches
0032 =====================
0033 ``maxcpus=n``
  Restrict boot time CPUs to *n*. If you have four CPUs, for example, using
  ``maxcpus=2`` will boot only two of them. The other CPUs can be brought
  online later.
0037
0038 ``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  those CPUs cannot be brought online later.
0042
0043 ``additional_cpus=n``
0044 Use this to limit hotpluggable CPUs. This option sets
0045 ``cpu_possible_mask = cpu_present_mask + additional_cpus``
0046
0047 This option is limited to the IA64 architecture.
0048
0049 ``possible_cpus=n``
0050 This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.
0051
  This option is limited to the X86 and S390 architectures.
0053
0054 ``cpu0_hotplug``
  Allow CPU0 to be shut down.
0056
0057 This option is limited to the X86 architecture.
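
For example, on a machine with eight physical CPUs, booting with::

  maxcpus=4 possible_cpus=8

brings up only the first four CPUs at boot time while keeping eight bits set
in ``cpu_possible_mask``, so the remaining CPUs can be onlined later via
sysfs. This is an illustrative combination; which of these options are
honoured depends on the architecture, as noted above.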
0058
0059 CPU maps
0060 ========
0061
0062 ``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are added or removed at any time. Trimming it accurately to your
  system's needs upfront can save some boot time memory.
0069
0070 ``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.
0076
0077 ``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g. ACPI), the map can change: a bit is either added to or
  removed from it, depending on whether the event is a hot-add or a
  hot-remove. There are currently no locking rules. Typical usage is to init
  the topology during boot, at which time hotplug is disabled.
0084
You really don't need to manipulate any of the system CPU maps. They should
be read-only for most use. When setting up per-cpu resources almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.
0089
0090 Never use anything other than ``cpumask_t`` to represent bitmap of CPUs.
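
As a minimal sketch of the rule above, a driver keeping a per-CPU counter
would size and initialize it against the possible CPUs (``subsys_count`` and
``subsys_counters_init()`` are hypothetical names)::

  #include <linux/percpu.h>
  #include <linux/cpumask.h>

  static DEFINE_PER_CPU(unsigned long, subsys_count);

  static void subsys_counters_init(void)
  {
          unsigned int cpu;

          /* Cover every CPU that can ever appear, not just the online ones. */
          for_each_possible_cpu(cpu)
                  per_cpu(subsys_count, cpu) = 0;
  }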
0091
0092
0093 Using CPU hotplug
0094 =================
0095
0096 The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
0097 available on multiple architectures including ARM, MIPS, PowerPC and X86. The
0098 configuration is done via the sysfs interface::
0099
0100 $ ls -lh /sys/devices/system/cpu
0101 total 0
0102 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu0
0103 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu1
0104 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu2
0105 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu3
0106 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu4
0107 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu5
0108 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu6
0109 drwxr-xr-x 9 root root 0 Dec 21 16:33 cpu7
0110 drwxr-xr-x 2 root root 0 Dec 21 16:33 hotplug
0111 -r--r--r-- 1 root root 4.0K Dec 21 16:33 offline
0112 -r--r--r-- 1 root root 4.0K Dec 21 16:33 online
0113 -r--r--r-- 1 root root 4.0K Dec 21 16:33 possible
0114 -r--r--r-- 1 root root 4.0K Dec 21 16:33 present
0115
0116 The files *offline*, *online*, *possible*, *present* represent the CPU masks.
0117 Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4::
0119
0120 $ echo 0 > /sys/devices/system/cpu/cpu4/online
0121 smpboot: CPU 4 is now offline
0122
Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and should no longer be visible in the *top* command. To
0125 bring CPU4 back online::
0126
0127 $ echo 1 > /sys/devices/system/cpu/cpu4/online
0128 smpboot: Booting Node 0 Processor 4 APIC 0x1
0129
The CPU is usable again. This should work on all CPUs, though CPU0 is often
special and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shut down CPU0. Alternatively the kernel command line option *cpu0_hotplug*
can be used. Some known dependencies of CPU0:
0135
0136 * Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
0137 * PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.
0138
0139 Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
0140 on CPU0.
0141
0142 The CPU hotplug coordination
0143 ============================
0144
0145 The offline case
0146 ----------------
0147
Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:
0151
0152 * If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
0153 will be set to true.
0154 * All processes are migrated away from this outgoing CPU to new CPUs.
0155 The new CPU is chosen from each process' current cpuset, which may be
0156 a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch specific routine
  ``__cpu_disable()`` to perform arch specific cleanup.
0161
0162
0163 The CPU hotplug API
0164 ===================
0165
0166 CPU hotplug state machine
0167 -------------------------
0168
0169 CPU hotplug uses a trivial state machine with a linear state space from
0170 CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
0171 callback.
0172
0173 When a CPU is onlined, the startup callbacks are invoked sequentially until
0174 the state CPUHP_ONLINE is reached. They can also be invoked when the
0175 callbacks of a state are set up or an instance is added to a multi-instance
0176 state.
0177
0178 When a CPU is offlined the teardown callbacks are invoked in the reverse
0179 order sequentially until the state CPUHP_OFFLINE is reached. They can also
0180 be invoked when the callbacks of a state are removed or an instance is
0181 removed from a multi-instance state.
0182
0183 If a usage site requires only a callback in one direction of the hotplug
0184 operations (CPU online or CPU offline) then the other not-required callback
0185 can be set to NULL when the state is set up.
0186
0187 The state space is divided into three sections:
0188
0189 * The PREPARE section
0190
0191 The PREPARE section covers the state space from CPUHP_OFFLINE to
0192 CPUHP_BRINGUP_CPU.
0193
0194 The startup callbacks in this section are invoked before the CPU is
0195 started during a CPU online operation. The teardown callbacks are invoked
0196 after the CPU has become dysfunctional during a CPU offline operation.
0197
  The callbacks are invoked on a control CPU as they obviously can't run on
  the hotplugged CPU, which is either not yet started or has already become
  dysfunctional.
0201
  The startup callbacks are used to set up resources which are required to
0203 bring a CPU successfully online. The teardown callbacks are used to free
0204 resources or to move pending work to an online CPU after the hotplugged
0205 CPU became dysfunctional.
0206
0207 The startup callbacks are allowed to fail. If a callback fails, the CPU
0208 online operation is aborted and the CPU is brought down to the previous
0209 state (usually CPUHP_OFFLINE) again.
0210
0211 The teardown callbacks in this section are not allowed to fail.
0212
0213 * The STARTING section
0214
0215 The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
0216 and CPUHP_AP_ONLINE.
0217
0218 The startup callbacks in this section are invoked on the hotplugged CPU
0219 with interrupts disabled during a CPU online operation in the early CPU
0220 setup code. The teardown callbacks are invoked with interrupts disabled
0221 on the hotplugged CPU during a CPU offline operation shortly before the
0222 CPU is completely shut down.
0223
0224 The callbacks in this section are not allowed to fail.
0225
0226 The callbacks are used for low level hardware initialization/shutdown and
0227 for core subsystems.
0228
0229 * The ONLINE section
0230
0231 The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
0232 CPUHP_ONLINE.
0233
0234 The startup callbacks in this section are invoked on the hotplugged CPU
0235 during a CPU online operation. The teardown callbacks are invoked on the
0236 hotplugged CPU during a CPU offline operation.
0237
0238 The callbacks are invoked in the context of the per CPU hotplug thread,
0239 which is pinned on the hotplugged CPU. The callbacks are invoked with
0240 interrupts and preemption enabled.
0241
0242 The callbacks are allowed to fail. When a callback fails the hotplug
0243 operation is aborted and the CPU is brought back to the previous state.
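
As a rough illustration of how work is typically split between these sections,
a driver might allocate per-CPU resources from a PREPARE-section startup
callback (running on a control CPU) and enable them from an ONLINE-section
startup callback (running on the hotplugged CPU). A hedged sketch, with all
names hypothetical::

  static int subsys_prepare_cpu(unsigned int cpu)
  {
          /*
           * PREPARE section: runs on a control CPU before @cpu is started.
           * Allocate whatever @cpu will need; a failure aborts the online
           * operation.
           */
          return subsys_alloc_percpu_buffer(cpu);
  }

  static int subsys_online_cpu(unsigned int cpu)
  {
          /*
           * ONLINE section: runs on the hotplugged CPU itself, in the per-CPU
           * hotplug thread, with interrupts and preemption enabled.
           */
          subsys_start_work_on(cpu);
          return 0;
  }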
0244
0245 CPU online/offline operations
0246 -----------------------------
0247
0248 A successful online operation looks like this::
0249
0250 [CPUHP_OFFLINE]
0251 [CPUHP_OFFLINE + 1]->startup() -> success
0252 [CPUHP_OFFLINE + 2]->startup() -> success
0253 [CPUHP_OFFLINE + 3] -> skipped because startup == NULL
0254 ...
0255 [CPUHP_BRINGUP_CPU]->startup() -> success
0256 === End of PREPARE section
0257 [CPUHP_BRINGUP_CPU + 1]->startup() -> success
0258 ...
0259 [CPUHP_AP_ONLINE]->startup() -> success
  === End of STARTING section
0261 [CPUHP_AP_ONLINE + 1]->startup() -> success
0262 ...
0263 [CPUHP_ONLINE - 1]->startup() -> success
0264 [CPUHP_ONLINE]
0265
0266 A successful offline operation looks like this::
0267
0268 [CPUHP_ONLINE]
0269 [CPUHP_ONLINE - 1]->teardown() -> success
0270 ...
0271 [CPUHP_AP_ONLINE + 1]->teardown() -> success
  === Start of STARTING section
0273 [CPUHP_AP_ONLINE]->teardown() -> success
0274 ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
0277 === Start of PREPARE section
0278 [CPUHP_BRINGUP_CPU]->teardown()
0279 [CPUHP_OFFLINE + 3]->teardown()
0280 [CPUHP_OFFLINE + 2] -> skipped because teardown == NULL
0281 [CPUHP_OFFLINE + 1]->teardown()
0282 [CPUHP_OFFLINE]
0283
0284 A failed online operation looks like this::
0285
0286 [CPUHP_OFFLINE]
0287 [CPUHP_OFFLINE + 1]->startup() -> success
0288 [CPUHP_OFFLINE + 2]->startup() -> success
0289 [CPUHP_OFFLINE + 3] -> skipped because startup == NULL
0290 ...
0291 [CPUHP_BRINGUP_CPU]->startup() -> success
0292 === End of PREPARE section
0293 [CPUHP_BRINGUP_CPU + 1]->startup() -> success
0294 ...
0295 [CPUHP_AP_ONLINE]->startup() -> success
  === End of STARTING section
0297 [CPUHP_AP_ONLINE + 1]->startup() -> success
0298 ---
0299 [CPUHP_AP_ONLINE + N]->startup() -> fail
0300 [CPUHP_AP_ONLINE + (N - 1)]->teardown()
0301 ...
0302 [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
0304 [CPUHP_AP_ONLINE]->teardown()
0305 ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
0308 === Start of PREPARE section
0309 [CPUHP_BRINGUP_CPU]->teardown()
0310 [CPUHP_OFFLINE + 3]->teardown()
0311 [CPUHP_OFFLINE + 2] -> skipped because teardown == NULL
0312 [CPUHP_OFFLINE + 1]->teardown()
0313 [CPUHP_OFFLINE]
0314
0315 A failed offline operation looks like this::
0316
0317 [CPUHP_ONLINE]
0318 [CPUHP_ONLINE - 1]->teardown() -> success
0319 ...
0320 [CPUHP_ONLINE - N]->teardown() -> fail
0321 [CPUHP_ONLINE - (N - 1)]->startup()
0322 ...
0323 [CPUHP_ONLINE - 1]->startup()
0324 [CPUHP_ONLINE]
0325
0326 Recursive failures cannot be handled sensibly. Look at the following
0327 example of a recursive fail due to a failed offline operation: ::
0328
0329 [CPUHP_ONLINE]
0330 [CPUHP_ONLINE - 1]->teardown() -> success
0331 ...
0332 [CPUHP_ONLINE - N]->teardown() -> fail
0333 [CPUHP_ONLINE - (N - 1)]->startup() -> success
0334 [CPUHP_ONLINE - (N - 2)]->startup() -> fail
0335
0336 The CPU hotplug state machine stops right here and does not try to go back
0337 down again because that would likely result in an endless loop::
0338
0339 [CPUHP_ONLINE - (N - 1)]->teardown() -> success
0340 [CPUHP_ONLINE - N]->teardown() -> fail
0341 [CPUHP_ONLINE - (N - 1)]->startup() -> success
0342 [CPUHP_ONLINE - (N - 2)]->startup() -> fail
0343 [CPUHP_ONLINE - (N - 1)]->teardown() -> success
0344 [CPUHP_ONLINE - N]->teardown() -> fail
0345
Lather, rinse and repeat. In this case the CPU is left in state::
0347
0348 [CPUHP_ONLINE - (N - 1)]
0349
0350 which at least lets the system make progress and gives the user a chance to
0351 debug or even resolve the situation.
0352
0353 Allocating a state
0354 ------------------
0355
0356 There are two ways to allocate a CPU hotplug state:
0357
0358 * Static allocation
0359
0360 Static allocation has to be used when the subsystem or driver has
0361 ordering requirements versus other CPU hotplug states. E.g. the PERF core
0362 startup callback has to be invoked before the PERF driver startup
0363 callbacks during a CPU online operation. During a CPU offline operation
0364 the driver teardown callbacks have to be invoked before the core teardown
0365 callback. The statically allocated states are described by constants in
0366 the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.
0367
  Insert the state into the enum at the proper place so the ordering
  requirements are fulfilled (see the sketch after this list). The state
  constant has to be used for state setup and removal.
0371
0372 Static allocation is also required when the state callbacks are not set
0373 up at runtime and are part of the initializer of the CPU hotplug state
0374 array in kernel/cpu.c.
0375
0376 * Dynamic allocation
0377
0378 When there are no ordering requirements for the state callbacks then
0379 dynamic allocation is the preferred method. The state number is allocated
0380 by the setup function and returned to the caller on success.
0381
0382 Only the PREPARE and ONLINE sections provide a dynamic allocation
0383 range. The STARTING section does not as most of the callbacks in that
0384 section have explicit ordering requirements.
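
For the static case, the new constant is added at the matching spot in the
``cpuhp_state`` enum. A sketch of the placement (``CPUHP_SUBSYS_PREPARE`` is a
hypothetical constant; the surrounding entries merely indicate the section)::

  enum cpuhp_state {
          CPUHP_OFFLINE = 0,
          /* ... other PREPARE section states ... */
          CPUHP_SUBSYS_PREPARE,   /* hypothetical new state */
          /* ... */
          CPUHP_BRINGUP_CPU,      /* last state of the PREPARE section */
          /* ... STARTING and ONLINE section states ... */
          CPUHP_ONLINE,
  };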
0385
0386 Setup of a CPU hotplug state
0387 ----------------------------
0388
The core code provides the following functions to set up a state:
0390
0391 * cpuhp_setup_state(state, name, startup, teardown)
0392 * cpuhp_setup_state_nocalls(state, name, startup, teardown)
0393 * cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
0394 * cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)
0395
0396 For cases where a driver or a subsystem has multiple instances and the same
0397 CPU hotplug state callbacks need to be invoked for each instance, the CPU
0398 hotplug core provides multi-instance support. The advantage over driver
0399 specific instance lists is that the instance related functions are fully
0400 serialized against CPU hotplug operations and provide the automatic
0401 invocations of the state callbacks on add and removal. To set up such a
0402 multi-instance state the following function is available:
0403
0404 * cpuhp_setup_state_multi(state, name, startup, teardown)
0405
The @state argument is either a statically allocated state or one of the
constants for dynamically allocated states - CPUHP_BP_PREPARE_DYN,
CPUHP_AP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for
which a dynamic state should be allocated.
0410
0411 The @name argument is used for sysfs output and for instrumentation. The
0412 naming convention is "subsys:mode" or "subsys/driver:mode",
0413 e.g. "perf:mode" or "perf/x86:mode". The common mode names are:
0414
0415 ======== =======================================================
0416 prepare For states in the PREPARE section
0417
0418 dead For states in the PREPARE section which do not provide
0419 a startup callback
0420
0421 starting For states in the STARTING section
0422
0423 dying For states in the STARTING section which do not provide
0424 a startup callback
0425
0426 online For states in the ONLINE section
0427
0428 offline For states in the ONLINE section which do not provide
0429 a startup callback
0430 ======== =======================================================
0431
0432 As the @name argument is only used for sysfs and instrumentation other mode
0433 descriptors can be used as well if they describe the nature of the state
0434 better than the common ones.
0435
0436 Examples for @name arguments: "perf/online", "perf/x86:prepare",
0437 "RCU/tree:dying", "sched/waitempty"
0438
0439 The @startup argument is a function pointer to the callback which should be
0440 invoked during a CPU online operation. If the usage site does not require a
0441 startup callback set the pointer to NULL.
0442
0443 The @teardown argument is a function pointer to the callback which should
0444 be invoked during a CPU offline operation. If the usage site does not
0445 require a teardown callback set the pointer to NULL.
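
For the non multi-instance setup functions both callbacks take the CPU number
as their only argument and return 0 on success or a negative error code on
failure. A minimal sketch (the ``subsys_*`` names are hypothetical)::

  static int subsys_cpu_online(unsigned int cpu)
  {
          /* Prepare the subsystem for @cpu going online. */
          return 0;
  }

  static int subsys_cpu_offline(unsigned int cpu)
  {
          /* Undo the work done in subsys_cpu_online(). */
          return 0;
  }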
0446
The functions differ in the way the installed callbacks are treated:
0448
0449 * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
0450 and cpuhp_setup_state_multi() only install the callbacks
0451
0452 * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
0453 callbacks and invoke the @startup callback (if not NULL) for all online
  CPUs which currently have a state greater than the newly installed
0455 state. Depending on the state section the callback is either invoked on
0456 the current CPU (PREPARE section) or on each online CPU (ONLINE
0457 section) in the context of the CPU's hotplug thread.
0458
  If a callback fails for CPU N then the teardown callback for CPUs
  0 .. N-1 is invoked to roll back the operation. The state setup fails,
0461 the callbacks for the state are not installed and in case of dynamic
0462 allocation the allocated state is freed.
0463
0464 The state setup and the callback invocations are serialized against CPU
0465 hotplug operations. If the setup function has to be called from a CPU
0466 hotplug read locked region, then the _cpuslocked() variants have to be
0467 used. These functions cannot be used from within CPU hotplug callbacks.
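
For instance, code which already holds the CPU hotplug read lock would use the
``_cpuslocked()`` variant (a sketch; the callback names are hypothetical and
``CPUHP_AP_ONLINE_DYN`` requests a dynamically allocated ONLINE-section
state)::

  cpus_read_lock();
  /* ... work that must not race with CPU hotplug ... */
  ret = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN, "subsys:online",
                                     subsys_cpu_online, subsys_cpu_offline);
  cpus_read_unlock();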
0468
0469 The function return values:
0470 ======== ===================================================================
0471 0 Statically allocated state was successfully set up
0472
0473 >0 Dynamically allocated state was successfully set up.
0474
0475 The returned number is the state number which was allocated. If
0476 the state callbacks have to be removed later, e.g. module
0477 removal, then this number has to be saved by the caller and used
0478 as @state argument for the state remove function. For
0479 multi-instance states the dynamically allocated state number is
0480 also required as @state argument for the instance add/remove
0481 operations.
0482
0483 <0 Operation failed
0484 ======== ===================================================================
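
A dynamically allocated state number is therefore typically stored for later
removal. A sketch of the usual module init/exit pattern (``subsys_hp_state``
and the callback names are hypothetical)::

  static enum cpuhp_state subsys_hp_state;

  static int __init subsys_init(void)
  {
          int ret;

          ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:online",
                                  subsys_cpu_online, subsys_cpu_offline);
          if (ret < 0)
                  return ret;
          /* Remember the allocated state number for cpuhp_remove_state(). */
          subsys_hp_state = ret;
          return 0;
  }

  static void __exit subsys_exit(void)
  {
          cpuhp_remove_state(subsys_hp_state);
  }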
0485
0486 Removal of a CPU hotplug state
0487 ------------------------------
0488
0489 To remove a previously set up state, the following functions are provided:
0490
0491 * cpuhp_remove_state(state)
0492 * cpuhp_remove_state_nocalls(state)
0493 * cpuhp_remove_state_nocalls_cpuslocked(state)
0494 * cpuhp_remove_multi_state(state)
0495
0496 The @state argument is either a statically allocated state or the state
0497 number which was allocated in the dynamic range by cpuhp_setup_state*(). If
0498 the state is in the dynamic range, then the state number is freed and
0499 available for dynamic allocation again.
0500
The functions differ in the way the installed callbacks are treated:
0502
0503 * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
0504 and cpuhp_remove_multi_state() only remove the callbacks.
0505
0506 * cpuhp_remove_state() removes the callbacks and invokes the teardown
  callback (if not NULL) for all online CPUs which currently have a state
0508 greater than the removed state. Depending on the state section the
0509 callback is either invoked on the current CPU (PREPARE section) or on
0510 each online CPU (ONLINE section) in the context of the CPU's hotplug
0511 thread.
0512
0513 In order to complete the removal, the teardown callback should not fail.
0514
0515 The state removal and the callback invocations are serialized against CPU
0516 hotplug operations. If the remove function has to be called from a CPU
0517 hotplug read locked region, then the _cpuslocked() variants have to be
0518 used. These functions cannot be used from within CPU hotplug callbacks.
0519
0520 If a multi-instance state is removed then the caller has to remove all
0521 instances first.
0522
0523 Multi-Instance state instance management
0524 ----------------------------------------
0525
0526 Once the multi-instance state is set up, instances can be added to the
0527 state:
0528
0529 * cpuhp_state_add_instance(state, node)
0530 * cpuhp_state_add_instance_nocalls(state, node)
0531
0532 The @state argument is either a statically allocated state or the state
0533 number which was allocated in the dynamic range by cpuhp_setup_state_multi().
0534
0535 The @node argument is a pointer to an hlist_node which is embedded in the
0536 instance's data structure. The pointer is handed to the multi-instance
0537 state callbacks and can be used by the callback to retrieve the instance
0538 via container_of().
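
Multi-instance callbacks receive the CPU number and the node pointer. A
minimal sketch of the usual ``container_of()`` pattern (``struct
subsys_instance`` and the callback name are hypothetical)::

  struct subsys_instance {
          /* instance specific data */
          struct hlist_node node;
  };

  static int subsys_instance_online(unsigned int cpu, struct hlist_node *node)
  {
          struct subsys_instance *inst;

          inst = container_of(node, struct subsys_instance, node);
          /* Bring this instance's per-CPU part for @cpu online. */
          return 0;
  }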
0539
The functions differ in the way the installed callbacks are treated:
0541
* cpuhp_state_add_instance_nocalls() only adds the instance to the
0543 multi-instance state's node list.
0544
0545 * cpuhp_state_add_instance() adds the instance and invokes the startup
0546 callback (if not NULL) associated with @state for all online CPUs which
  currently have a state greater than @state. The callback is only
  invoked for the instance to be added. Depending on the state section
0549 the callback is either invoked on the current CPU (PREPARE section) or
0550 on each online CPU (ONLINE section) in the context of the CPU's hotplug
0551 thread.
0552
  If a callback fails for CPU N then the teardown callback for CPUs
  0 .. N-1 is invoked to roll back the operation, the function fails and
0555 the instance is not added to the node list of the multi-instance state.
0556
0557 To remove an instance from the state's node list these functions are
0558 available:
0559
0560 * cpuhp_state_remove_instance(state, node)
0561 * cpuhp_state_remove_instance_nocalls(state, node)
0562
The arguments are the same as for the cpuhp_state_add_instance*()
0564 variants above.
0565
The functions differ in the way the installed callbacks are treated:
0567
0568 * cpuhp_state_remove_instance_nocalls() only removes the instance from the
0569 state's node list.
0570
0571 * cpuhp_state_remove_instance() removes the instance and invokes the
0572 teardown callback (if not NULL) associated with @state for all online
  CPUs which currently have a state greater than @state. The callback is
  only invoked for the instance to be removed. Depending on the state
0575 section the callback is either invoked on the current CPU (PREPARE
0576 section) or on each online CPU (ONLINE section) in the context of the
0577 CPU's hotplug thread.
0578
0579 In order to complete the removal, the teardown callback should not fail.
0580
0581 The node list add/remove operations and the callback invocations are
0582 serialized against CPU hotplug operations. These functions cannot be used
0583 from within CPU hotplug callbacks and CPU hotplug read locked regions.
0584
0585 Examples
0586 --------
0587
0588 Setup and teardown a statically allocated state in the STARTING section for
0589 notifications on online and offline operations::
0590
0591 ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
0592 if (ret < 0)
0593 return ret;
0594 ....
0595 cpuhp_remove_state(CPUHP_SUBSYS_STARTING);
0596
0597 Setup and teardown a dynamically allocated state in the ONLINE section
0598 for notifications on offline operations::
0599
  state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
0601 if (state < 0)
0602 return state;
0603 ....
0604 cpuhp_remove_state(state);
0605
0606 Setup and teardown a dynamically allocated state in the ONLINE section
0607 for notifications on online operations without invoking the callbacks::
0608
  state = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
0610 if (state < 0)
0611 return state;
0612 ....
0613 cpuhp_remove_state_nocalls(state);
0614
0615 Setup, use and teardown a dynamically allocated multi-instance state in the
0616 ONLINE section for notifications on online and offline operation::
0617
  state = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
0619 if (state < 0)
0620 return state;
0621 ....
0622 ret = cpuhp_state_add_instance(state, &inst1->node);
0623 if (ret)
0624 return ret;
0625 ....
0626 ret = cpuhp_state_add_instance(state, &inst2->node);
0627 if (ret)
0628 return ret;
0629 ....
  cpuhp_state_remove_instance(state, &inst1->node);
0631 ....
  cpuhp_state_remove_instance(state, &inst2->node);
0633 ....
  cpuhp_remove_multi_state(state);
0635
0636
0637 Testing of hotplug states
0638 =========================
0639
One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the
CPU into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*,
which would lead to a rollback to the online state.
0645
0646 All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states`` ::
0647
0648 $ tail /sys/devices/system/cpu/hotplug/states
0649 138: mm/vmscan:online
0650 139: mm/vmstat:online
0651 140: lib/percpu_cnt:online
0652 141: acpi/cpu-drv:online
0653 142: base/cacheinfo:online
0654 143: virtio/net:online
0655 144: x86/mce:online
0656 145: printk:online
0657 168: sched:active
0658 169: online
0659
To roll back CPU4 to ``lib/percpu_cnt:online`` and bring it back online, just
issue::
0661
0662 $ cat /sys/devices/system/cpu/cpu4/hotplug/state
0663 169
0664 $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
0665 $ cat /sys/devices/system/cpu/cpu4/hotplug/state
0666 140
0667
It is important to note that the rollback stops at the target state: the
teardown callbacks of the states above 140 have been invoked, while the
teardown callback of state 140 itself has not. And now get back online::
0670
0671 $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
0672 $ cat /sys/devices/system/cpu/cpu4/hotplug/state
0673 169
0674
0675 With trace events enabled, the individual steps are visible, too::
0676
0677 # TASK-PID CPU# TIMESTAMP FUNCTION
0678 # | | | | |
0679 bash-394 [001] 22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
0680 cpuhp/4-31 [004] 22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
0681 cpuhp/4-31 [004] 22.990: cpuhp_exit: cpu: 0004 state: 168 step: 168 ret: 0
0682 cpuhp/4-31 [004] 22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
0683 cpuhp/4-31 [004] 22.992: cpuhp_exit: cpu: 0004 state: 144 step: 144 ret: 0
0684 cpuhp/4-31 [004] 22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
0685 cpuhp/4-31 [004] 22.994: cpuhp_exit: cpu: 0004 state: 143 step: 143 ret: 0
0686 cpuhp/4-31 [004] 22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
0687 cpuhp/4-31 [004] 22.996: cpuhp_exit: cpu: 0004 state: 142 step: 142 ret: 0
0688 bash-394 [001] 22.997: cpuhp_exit: cpu: 0004 state: 140 step: 169 ret: 0
0689 bash-394 [005] 95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
0690 cpuhp/4-31 [004] 95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
0691 cpuhp/4-31 [004] 95.542: cpuhp_exit: cpu: 0004 state: 141 step: 141 ret: 0
0692 cpuhp/4-31 [004] 95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
0693 cpuhp/4-31 [004] 95.544: cpuhp_exit: cpu: 0004 state: 142 step: 142 ret: 0
0694 cpuhp/4-31 [004] 95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
0695 cpuhp/4-31 [004] 95.546: cpuhp_exit: cpu: 0004 state: 143 step: 143 ret: 0
0696 cpuhp/4-31 [004] 95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
0697 cpuhp/4-31 [004] 95.548: cpuhp_exit: cpu: 0004 state: 144 step: 144 ret: 0
0698 cpuhp/4-31 [004] 95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
0699 cpuhp/4-31 [004] 95.550: cpuhp_exit: cpu: 0004 state: 145 step: 145 ret: 0
0700 cpuhp/4-31 [004] 95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
0701 cpuhp/4-31 [004] 95.552: cpuhp_exit: cpu: 0004 state: 168 step: 168 ret: 0
0702 bash-394 [005] 95.553: cpuhp_exit: cpu: 0004 state: 169 step: 140 ret: 0
0703
As can be seen, CPU4 went down until timestamp 22.996 and then back up until
0705 95.552. All invoked callbacks including their return codes are visible in the
0706 trace.
0707
0708 Architecture's requirements
0709 ===========================
0710
0711 The following functions and configurations are required:
0712
0713 ``CONFIG_HOTPLUG_CPU``
0714 This entry needs to be enabled in Kconfig
0715
0716 ``__cpu_up()``
0717 Arch interface to bring up a CPU
0718
0719 ``__cpu_disable()``
  Arch interface to shut down a CPU; no more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.
0722
0723 ``__cpu_die()``
  This is supposed to ensure the death of the CPU. Have a look at some example
  code in other architectures that implement CPU hotplug. The processor is
  taken down from the ``idle()`` loop of that specific architecture.
  ``__cpu_die()`` typically waits for some per_cpu state to be set, to make
  sure the processor dead routine has actually been called.
0729
0730 User Space Notification
0731 =======================
0732
After a CPU is successfully onlined or offlined, udev events are sent. A udev
rule like::
0734
0735 SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"
0736
0737 will receive all events. A script like::
0738
0739 #!/bin/sh
0740
0741 if [ "${ACTION}" = "offline" ]
0742 then
0743 echo "CPU ${DEVPATH##*/} offline"
0744
0745 elif [ "${ACTION}" = "online" ]
0746 then
0747 echo "CPU ${DEVPATH##*/} online"
0748
0749 fi
0750
0751 can process the event further.
0752
Kernel Inline Documentation Reference
=====================================
0755
0756 .. kernel-doc:: include/linux/cpuhotplug.h