.. _swap_numa:

===========================================
Automatically bind swap device to numa node
===========================================

If the system has more than one swap device and each swap device carries
node information, the kernel can use that information in get_swap_pages()
to prefer a swap device attached to the local node, which gives better
performance.


How to use this feature
=======================

Each swap device has a priority, which decides the order in which devices are
used. Automatic binding requires no change to the priority settings of the
swap devices. For example, on a 2 node machine, assume that 2 swap devices,
swapA and swapB, are going to be swapped on, with swapA attached to node 0 and
swapB attached to node 1. Simply swap them on::

        # swapon /dev/swapA
        # swapon /dev/swapB

Then node 0 will use the two swap devices in the order of swapA then swapB and
node 1 will use the two swap devices in the order of swapB then swapA. Note
that the order in which they are swapped on doesn't matter.

A more complex example on a 4 node machine: assume 6 swap devices are going to
be swapped on. swapA and swapB are attached to node 0, swapC is attached to
node 1, swapD and swapE are attached to node 2 and swapF is attached to node 3.
The way to swap them on is the same as above::

        # swapon /dev/swapA
        # swapon /dev/swapB
        # swapon /dev/swapC
        # swapon /dev/swapD
        # swapon /dev/swapE
        # swapon /dev/swapF

Then node 0 will use them in the order of::

        swapA/swapB -> swapC -> swapD -> swapE -> swapF

swapA and swapB will be used in a round robin mode before any other swap device.

Node 1 will use them in the order of::

        swapC -> swapA -> swapB -> swapD -> swapE -> swapF

Node 2 will use them in the order of::

        swapD/swapE -> swapA -> swapB -> swapC -> swapF

Similarly, swapD and swapE will be used in a round robin mode before any
other swap devices.

Node 3 will use them in the order of::

        swapF -> swapA -> swapB -> swapC -> swapD -> swapE

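The per-node orderings above can be reproduced with a small model. This is only
an illustrative sketch in Python, not kernel code: the function name
``per_node_swap_order`` and the ``(name, node)`` tuples are hypothetical, and it
assumes that every device uses a system-picked priority and that devices are
listed in the order they were swapped on.

```python
def per_node_swap_order(devices, num_nodes):
    """Model the swap device order seen by each node.

    devices: list of (name, node) tuples in swapon order; all devices are
    assumed to have system-picked priorities (hypothetical input format).
    Returns {node: [group, ...]}, where each group is a list of devices
    that are used round robin.
    """
    order = {}
    for node in range(num_nodes):
        # Devices attached to this node are promoted and share one group.
        local = [name for name, dev_node in devices if dev_node == node]
        # All other devices keep their relative order, one device per group.
        remote = [[name] for name, dev_node in devices if dev_node != node]
        order[node] = ([local] if local else []) + remote
    return order


devices = [
    ("swapA", 0), ("swapB", 0), ("swapC", 1),
    ("swapD", 2), ("swapE", 2), ("swapF", 3),
]
order = per_node_swap_order(devices, num_nodes=4)
for node, groups in order.items():
    print(f"node {node}:", " -> ".join("/".join(group) for group in groups))
```

Devices joined by ``/`` in the printed output form one round robin group, as in
the orderings listed above.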

Implementation details
======================

The current code uses a priority based list, swap_avail_list, to decide
which swap device to use; if multiple swap devices share the same
priority, they are used round robin. This change replaces the single
global swap_avail_list with a per-numa-node list, i.e. each numa node
sees its own priority based list of available swap devices. A swap
device's priority can be promoted on its matching node's swap_avail_list.

Currently, a swap device's priority is set as follows: the user can set a
value >=0, or the system will pick one starting from -1 and going downwards.
The priority value in the swap_avail_list is the negated value of the swap
device's priority because plist is sorted from low to high. The new policy
doesn't change the semantics for the priority >=0 cases. The system-picked
values, which previously started from -1 and went downwards, now start from
-2, and -1 is reserved as the promoted value. So if multiple swap devices
are attached to the same node, they will all be promoted to priority -1 on
that node's plist and will be used round robin before any other swap devices.
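
The priority scheme can be modelled in a few lines. This is a minimal sketch in
Python, not the kernel implementation: ``assign_priorities``, ``node_plist`` and
the device tuples are hypothetical names. It also assumes that only devices
with a system-picked priority are promoted, which is one reading of the
statement that the semantics for priority >=0 do not change.

```python
PROMOTED_KEY = 1  # negation of the reserved promoted priority -1


def assign_priorities(devices):
    """devices: list of (name, node, user_prio or None) in swapon order.

    Returns {name: (node, prio, auto)}. System-picked priorities start
    at -2 and go downwards; -1 is reserved for promotion.
    """
    next_auto = -2
    table = {}
    for name, node, user_prio in devices:
        if user_prio is not None:
            table[name] = (node, user_prio, False)
        else:
            table[name] = (node, next_auto, True)
            next_auto -= 1
    return table


def node_plist(table, node):
    """Return [(key, name), ...] for one node, sorted low to high.

    The key is the negated priority, mirroring a list sorted from low to
    high. Assumption: only system-picked devices on their own node are
    promoted.
    """
    entries = []
    for name, (dev_node, prio, auto) in table.items():
        promoted = auto and dev_node == node
        entries.append((PROMOTED_KEY if promoted else -prio, name))
    # sorted() is stable, so devices with equal keys keep swapon order.
    return sorted(entries, key=lambda entry: entry[0])


table = assign_priorities([
    ("swapU", 1, 5),      # user-set priority 5 (hypothetical device)
    ("swapA", 0, None),
    ("swapB", 0, None),
    ("swapC", 1, None),
])
for node in (0, 1):
    print(f"node {node}:", node_plist(table, node))
```

Entries that share a key on a node's list correspond to devices used round
robin. In this model swapA and swapB share key 1 on node 0, while on node 1
only swapC has that key.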