.. SPDX-License-Identifier: GPL-2.0

=====================
Fake NUMA For CPUSets
=====================

:Author: David Rientjes <rientjes@cs.washington.edu>

Using numa=fake and CPUSets for Resource Management

This document describes how the numa=fake x86_64 command-line option can be used
in conjunction with cpusets for coarse memory management.  Using this feature,
you can create fake NUMA nodes that represent contiguous chunks of memory and
assign them to cpusets and their attached tasks.  This is a way of limiting the
amount of system memory that is available to a certain class of tasks.

For more information on the features of cpusets, see
Documentation/admin-guide/cgroup-v1/cpusets.rst.
There are a number of different configurations you can use for your needs.  For
more information on the numa=fake command line option and its various ways of
configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst.

For the purposes of this introduction, we'll assume a very simple NUMA
emulation setup of "numa=fake=4*512,".  This will split our system memory into
four equal chunks of 512M each that we can then assign to cpusets.  As
you become more familiar with using this combination for resource control,
you'll determine a better setup to minimize the number of nodes you have to deal
with.
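
As a sanity check on the arithmetic, the shell function below is a minimal
sketch that prints the address range each fake node would cover when memory is
split into N nodes of SIZE MB each.  The name ``fake_node_ranges`` is
hypothetical (it is not a kernel or distribution tool), and it assumes physical
memory starts at address 0 with no holes; the boundaries the kernel actually
uses are the ones it reports in dmesg::

        # Hypothetical helper: print the address range of each fake node for
        # an N*SIZE split, assuming memory starts at 0 with no holes.
        fake_node_ranges() {
            nodes=$1        # number of fake nodes, e.g. 4
            size_mb=$2      # size of each node in MB, e.g. 512
            size=$((size_mb * 1024 * 1024))
            i=0
            while [ "$i" -lt "$nodes" ]; do
                start=$((i * size))
                end=$(((i + 1) * size))
                printf 'node %d: %016x-%016x (%dMB)\n' \
                    "$i" "$start" "$end" "$size_mb"
                i=$((i + 1))
            done
        }

Running ``fake_node_ranges 4 512`` prints four lines, the second being
``node 1: 0000000020000000-0000000040000000 (512MB)``, which matches the
boundaries in the dmesg output for this machine.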

A machine may be split as follows with "numa=fake=4*512," as reported by dmesg::

        Faking node 0 at 0000000000000000-0000000020000000 (512MB)
        Faking node 1 at 0000000020000000-0000000040000000 (512MB)
        Faking node 2 at 0000000040000000-0000000060000000 (512MB)
        Faking node 3 at 0000000060000000-0000000080000000 (512MB)
        ...
        On node 0 totalpages: 130975
        On node 1 totalpages: 131072
        On node 2 totalpages: 131072
        On node 3 totalpages: 131072
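
The ``totalpages`` figures are counts of 4 KiB pages (the x86_64 base page
size), so 131072 pages is exactly 512 MB.  A minimal sketch of the conversion,
using a hypothetical ``pages_to_mb`` helper whose optional second argument
overrides the assumed 4096-byte page size::

        # Hypothetical helper: convert a page count to whole megabytes.
        # The page size defaults to 4096 bytes (x86_64 base pages);
        # integer division truncates any partial megabyte.
        pages_to_mb() {
            pages=$1
            page_size=${2:-4096}
            echo $((pages * page_size / 1048576))
        }

``pages_to_mb 131072`` prints ``512``.  For node 0, ``pages_to_mb 130975``
prints ``511``, since that node reports 97 fewer pages than the others.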

Now, following the instructions for mounting the cpusets filesystem in
Documentation/admin-guide/cgroup-v1/cpusets.rst, you can assign fake nodes
(i.e. contiguous memory address spaces) to individual cpusets::

        [root@xroads /]# mkdir exampleset
        [root@xroads /]# mount -t cpuset none exampleset
        [root@xroads /]# mkdir exampleset/ddset
        [root@xroads /]# cd exampleset/ddset
        [root@xroads /exampleset/ddset]# echo 0-1 > cpus
        [root@xroads /exampleset/ddset]# echo 0-1 > mems
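
The same steps can be wrapped in a reusable function.  This is a minimal
sketch: ``make_cpuset`` is a hypothetical helper, it assumes the cpuset
filesystem is mounted with ``mount -t cpuset`` as above (so the control files
are named ``cpus`` and ``mems``), and it must be run as root::

        # Hypothetical helper: create a cpuset and restrict it to the given
        # CPUs and (fake) memory nodes. Assumes a cpuset filesystem mounted
        # with "mount -t cpuset", where the files are named cpus and mems.
        make_cpuset() {
            dir=$1      # cpuset directory, e.g. /exampleset/ddset
            cpus=$2     # CPU list, e.g. 0-1
            mems=$3     # memory node list, e.g. 0-1
            mkdir -p "$dir" || return 1
            # A task cannot be attached to a cpuset whose cpus or mems is
            # empty, so set both before writing to "tasks".
            echo "$cpus" > "$dir/cpus" || return 1
            echo "$mems" > "$dir/mems" || return 1
        }

For example, ``make_cpuset /exampleset/ddset 0-1 0-1`` performs the last four
commands of the session above.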

Now this cpuset, 'ddset', will only be allowed access to fake nodes 0 and 1 for
memory allocations (1G).
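
The ``mems`` file takes a list of node numbers and ranges, such as ``0-1`` or
``0-1,3``.  With equally sized fake nodes, the memory available to a cpuset is
the number of listed nodes times the node size.  A sketch of that calculation,
with a hypothetical ``nodelist_mb`` helper that assumes equal node sizes and a
well-formed list::

        # Hypothetical helper: total memory (in MB) covered by a node list
        # such as "0-1,3", assuming every node has the same size.
        # The body runs in a subshell so the IFS change does not leak.
        nodelist_mb() (
            list=$1         # node list, e.g. 0-1 or 0-1,3
            size_mb=$2      # size of one node in MB, e.g. 512
            count=0
            IFS=,
            for item in $list; do
                case $item in
                *-*)    # a range "lo-hi" covers hi - lo + 1 nodes
                    lo=${item%-*}
                    hi=${item#*-}
                    count=$((count + hi - lo + 1)) ;;
                *)      # a single node number
                    count=$((count + 1)) ;;
                esac
            done
            echo $((count * size_mb))
        )

``nodelist_mb 0-1 512`` prints ``1024``, the 1G quoted for 'ddset'.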

You can now assign tasks to these cpusets to limit the memory resources
available to them according to the fake nodes assigned as mems::

        [root@xroads /exampleset/ddset]# echo $$ > tasks
        [root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G &
        [1] 13425
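
To confirm that the shell really is confined, the kernel reports each task's
allowed memory nodes in the ``Mems_allowed_list`` field of
``/proc/<pid>/status`` when it is built with cpuset support.  A minimal sketch
that extracts the field; the ``mems_allowed`` name is hypothetical, and the
optional argument exists only so the parsing can be tried on a saved copy of
the file::

        # Hypothetical helper: print the memory nodes a task may allocate
        # from. Defaults to the current shell's /proc status file.
        mems_allowed() {
            status=${1:-/proc/$$/status}
            # Lines look like "Mems_allowed_list:<TAB>0-1".
            awk '$1 == "Mems_allowed_list:" { print $2 }' "$status"
        }

Run from the shell whose PID was written to ``tasks``, it should print ``0-1``.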

Notice the difference in system memory usage, as reported by /proc/meminfo,
between the restricted cpuset case above and the unrestricted case (i.e. running
the same 'dd' command without assigning it to a fake NUMA cpuset):

        ========        ============    ==========
        Name            Unrestricted    Restricted
        ========        ============    ==========
        MemTotal        3091900 kB      3091900 kB
        MemFree         42113 kB        1513236 kB
        ========        ============    ==========
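
Subtracting ``MemFree`` from ``MemTotal`` makes the comparison concrete: the
unrestricted run leaves 3049787 kB in use, against 1578664 kB for the
restricted one.  ``MemFree`` is system-wide, so both figures also include
memory used by everything else on the machine.  The calculation can be
sketched with a hypothetical ``mem_used_kb`` helper, which reads
``/proc/meminfo`` by default and does not need root::

        # Hypothetical helper: print MemTotal - MemFree (in kB) from a
        # meminfo-format file, defaulting to /proc/meminfo.
        mem_used_kb() {
            awk '$1 == "MemTotal:" { total = $2 }
                 $1 == "MemFree:"  { free = $2 }
                 END { print total - free }' "${1:-/proc/meminfo}"
        }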

This allows for coarse memory management for the tasks you assign to particular
cpusets.  Since cpusets can form a hierarchy, you can nest them to build
combinations that fit the memory management needs of different classes of
tasks.