.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available so that subsystems can
  prepare to handle it. The page allocator is still unable to allocate
  from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory, but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.
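The ordering rule can be modelled outside the kernel. The following is a minimal userspace sketch, not the kernel's implementation: ``struct notifier_block`` is a local stand-in that only mirrors the field names, and ``model_chain_register()`` and ``model_chain_order()`` are hypothetical helpers introduced purely for illustration.

```c
#include <stddef.h>

/* Userspace stand-in that mirrors the shape of a notifier block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *self,
			     unsigned long action, void *arg);
	struct notifier_block *next;
	int priority;
};

/*
 * Hypothetical helper: insert nb so that the list stays sorted by
 * descending priority. Blocks with equal priority keep their
 * registration order.
 */
static void model_chain_register(struct notifier_block **head,
				 struct notifier_block *nb)
{
	while (*head && nb->priority <= (*head)->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/*
 * Hypothetical helper: store the priorities in call order into out[]
 * and return how many entries were written (at most max).
 */
static int model_chain_order(struct notifier_block *head, int *out, int max)
{
	int n = 0;

	for (; head && n < max; head = head->next)
		out[n++] = head->priority;
	return n;
}
```

Registering blocks with priorities 0, 10 and 5 in that order yields a chain that would be walked as 10, 5, 0 in this model.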

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

  struct memory_notify {
    unsigned long start_pfn;
    unsigned long nr_pages;
    int status_change_nid_normal;
    int status_change_nid;
  };

- start_pfn is the first page frame number of the memory being onlined or
  offlined.
- nr_pages is the number of pages being onlined or offlined.
- status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the
  nodemask is (or will be) set or cleared. If this is -1, the nodemask status
  is not changed.
- status_change_nid is set to the node id when N_MEMORY of the nodemask is (or
  will be) set or cleared. This means that a new (memoryless) node gets memory
  by onlining, or that a node loses all of its memory. If this is -1, the
  nodemask status is not changed.

If status_change_nid* >= 0, the callback should create/discard structures for
the node if necessary.
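As an illustration of the rule above, here is a minimal userspace sketch of a callback that creates and discards per-node data based on status_change_nid. It assumes local stand-in definitions so that it builds outside the kernel tree: the ``MEM_*`` and ``NOTIFY_*`` constants are redefined locally, and ``MODEL_MAX_NODES``, ``struct model_node_data``, ``model_nodes`` and ``model_mem_callback()`` are hypothetical names that do not exist in the kernel.

```c
#include <stdlib.h>

/* Local stand-ins so the sketch builds outside the kernel tree. */
#define MEM_OFFLINE		(1 << 2)
#define MEM_GOING_ONLINE	(1 << 3)
#define MEM_CANCEL_ONLINE	(1 << 4)
#define NOTIFY_OK		0x0001
#define NOTIFY_BAD		0x8002
#define MODEL_MAX_NODES		4	/* hypothetical limit for the sketch */

struct notifier_block;			/* opaque here, only passed through */

struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid_normal;
	int status_change_nid;
};

/* Hypothetical per-node bookkeeping owned by the subsystem. */
struct model_node_data {
	unsigned long tracked_pages;
};

static struct model_node_data *model_nodes[MODEL_MAX_NODES];

static int model_mem_callback(struct notifier_block *self,
			      unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;
	int nid = mn->status_change_nid;

	(void)self;

	/* -1 means the node's memory state does not change. */
	if (nid < 0 || nid >= MODEL_MAX_NODES)
		return NOTIFY_OK;

	switch (action) {
	case MEM_GOING_ONLINE:
		/* The node is about to gain its first memory: set up. */
		if (!model_nodes[nid])
			model_nodes[nid] = calloc(1, sizeof(*model_nodes[nid]));
		if (!model_nodes[nid])
			return NOTIFY_BAD;
		break;
	case MEM_CANCEL_ONLINE:
	case MEM_OFFLINE:
		/* Onlining failed or the node lost all memory: tear down. */
		free(model_nodes[nid]);
		model_nodes[nid] = NULL;
		break;
	}
	return NOTIFY_OK;
}
```

The sketch allocates during MEM_GOING_ONLINE so that a failed allocation can still veto the operation, and it releases the data on both MEM_CANCEL_ONLINE and MEM_OFFLINE.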

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used in response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.
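The effect of the return values can be sketched with a small userspace model. It is an illustration under stated assumptions, not the kernel's code path: the constants are local stand-ins, callbacks take only the action argument for brevity, and ``model_notify()``, ``model_online()`` and the ``model_cb_*`` callbacks are hypothetical. Only the MEM_GOING_ONLINE veto is modelled.

```c
#include <stddef.h>

/* Local stand-ins so the sketch builds outside the kernel tree. */
#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP		(NOTIFY_STOP_MASK | NOTIFY_OK)

#define MEM_ONLINE		(1 << 0)
#define MEM_GOING_ONLINE	(1 << 3)
#define MEM_CANCEL_ONLINE	(1 << 4)

/* Simplified callback type: the model passes only the action. */
typedef int (*model_cb_t)(unsigned long action);

static int model_calls;		/* counts callback invocations */

static int model_cb_ok(unsigned long action)
{
	(void)action;
	model_calls++;
	return NOTIFY_OK;
}

/* Vetoes onlining, accepts everything else. */
static int model_cb_veto(unsigned long action)
{
	model_calls++;
	return action == MEM_GOING_ONLINE ? NOTIFY_BAD : NOTIFY_OK;
}

/* Ends chain processing without reporting an error. */
static int model_cb_stop(unsigned long action)
{
	(void)action;
	model_calls++;
	return NOTIFY_STOP;
}

/* Call callbacks in order; stop once one sets NOTIFY_STOP_MASK. */
static int model_notify(const model_cb_t *cbs, size_t n, unsigned long action)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret = cbs[i](action);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Returns 0 if onlining went ahead, -1 if a callback vetoed it. */
static int model_online(const model_cb_t *cbs, size_t n)
{
	if (model_notify(cbs, n, MEM_GOING_ONLINE) == NOTIFY_BAD) {
		model_notify(cbs, n, MEM_CANCEL_ONLINE);
		return -1;
	}
	model_notify(cbs, n, MEM_ONLINE);
	return 0;
}
```

In this model a chain of ``model_cb_veto`` followed by ``model_cb_ok`` fails to online and the second callback only ever sees MEM_CANCEL_ONLINE, whereas a chain starting with ``model_cb_stop`` onlines successfully but later callbacks are never reached.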

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC).

In particular, holding device_hotplug_lock avoids a possible lock inversion
when memory is being added and user space tries to online that memory faster
than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock.
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems()/put_online_mems()
implementation, so code accessing memory can protect itself against that
memory vanishing.