==============
Cgroup Freezer
==============

The cgroup freezer is useful to batch job management systems, which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer is also useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent, another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.
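The "walk /proc" step can be sketched roughly as follows. This is only an
illustration, assuming a cgroup directory whose ``tasks`` file lists one PID
per line; the function name ``snapshot_tasks`` is hypothetical, and a real
checkpointer would gather far more than the scheduler state printed here.

```shell
# Hypothetical helper: print "<pid> <state>" for each task in a cgroup.
# Assumes $1 is a cgroup directory containing a "tasks" file.
snapshot_tasks() {
    local cgroup_dir=$1 pid state
    while read -r pid; do
        # A task may have exited since the list was read; skip it then.
        [ -r "/proc/$pid/status" ] || continue
        # The "State:" line of /proc/<pid>/status holds the state letter.
        state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
        printf '%s %s\n' "$pid" "$state"
    done < "$cgroup_dir/tasks"
}
```

It would be called with the cgroup's directory, for example
``snapshot_tasks /sys/fs/cgroup/freezer/0``, once the cgroup reports FROZEN.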

Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells::

    $ echo $$
    16644
    $ bash
    $ echo $$
    16690

    From a second, unrelated bash shell:
    $ kill -SIGSTOP 16690
    $ kill -SIGCONT 16690

    <at this point 16690 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.
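The visibility of SIGCONT can also be shown with a small script. This is a
sketch only: the function name ``demo_sigcont_visible`` and the temporary log
file are illustrative, and it assumes ``bash``, ``mktemp`` and a ``sleep``
that accepts fractional seconds.

```shell
# Hypothetical demo: a child installs a SIGCONT handler and records when
# the handler runs, proving the stop/resume cycle is visible to the task.
demo_sigcont_visible() {
    local log child tries=100
    log=$(mktemp)
    # Child: install the handler first, then report readiness and idle.
    # The trailing ":" keeps bash from exec'ing sleep as its last command.
    bash -c 'trap "echo observed >> \"$1\"" CONT
             echo ready >> "$1"
             sleep 1; :' _ "$log" &
    child=$!
    # Wait (bounded) until the handler is known to be installed.
    until grep -q ready "$log" || [ "$tries" -le 0 ]; do
        tries=$((tries - 1))
        sleep 0.05
    done
    kill -STOP "$child"
    kill -CONT "$child"
    wait "$child"
    # Number of times the child saw SIGCONT.
    grep -c observed "$log"
    rm -f "$log"
}
```

Calling ``demo_sigcont_visible`` prints how many times the child's handler
ran. A task resumed through the cgroup freezer receives no such signal.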

Another example of a program which catches and responds to these
signals is gdb. In fact any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The cgroup freezer is hierarchical. Freezing a cgroup freezes all
tasks belonging to the cgroup and all its descendant cgroups. Each
cgroup has its own state (self-state) and the state inherited from the
parent (parent-state). Iff both states are THAWED, the cgroup is
THAWED.

The following cgroupfs files are created by the cgroup freezer.

* freezer.state: Read-write.

  When read, returns the effective state of the cgroup - "THAWED",
  "FREEZING" or "FROZEN". This combines the self-state and the
  parent-state. If either is freezing, the cgroup is freezing
  (FREEZING or FROZEN).

  A FREEZING cgroup transitions into the FROZEN state when all tasks
  belonging to the cgroup and its descendants become frozen. Note that
  a cgroup reverts to FREEZING from FROZEN after a new task is added
  to the cgroup or one of its descendant cgroups until the new task is
  frozen.

  When written, sets the self-state of the cgroup. Two values are
  allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
  if not already freezing, enters the FREEZING state along with all its
  descendant cgroups.

  If THAWED is written, the self-state of the cgroup is changed to
  THAWED. Note that the effective state may not change to THAWED if
  the parent-state is still freezing. If a cgroup's effective state
  becomes THAWED, all its descendants which are freezing because of
  the cgroup also leave the freezing state.

* freezer.self_freezing: Read only.

  Shows the self-state. 0 if the self-state is THAWED; otherwise, 1.
  This value is 1 iff the last write to freezer.state was "FROZEN".

* freezer.parent_freezing: Read only.

  Shows the parent-state. 0 if none of the cgroup's ancestors is
  frozen; otherwise, 1.

The root cgroup is non-freezable and the above interface files don't
exist there.
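The two read-only files can be combined to explain why a cgroup is freezing.
This is a minimal sketch built on the file semantics described above; the
function name ``freezing_reason`` and its output strings are hypothetical.

```shell
# Hypothetical helper: report which state keeps a cgroup freezing.
# Prints "thawed", "self", "parent" or "self+parent".
# Returns 2 if the files are missing (e.g. in the root cgroup) or unexpected.
freezing_reason() {
    local dir=$1 self parent
    self=$(cat "$dir/freezer.self_freezing" 2>/dev/null) || return 2
    parent=$(cat "$dir/freezer.parent_freezing" 2>/dev/null) || return 2
    case "$self$parent" in
        00) echo "thawed" ;;       # both states THAWED: cgroup is THAWED
        10) echo "self" ;;         # last write to freezer.state was FROZEN
        01) echo "parent" ;;       # frozen only through an ancestor
        11) echo "self+parent" ;;  # writing THAWED here would not thaw it
        *)  return 2 ;;
    esac
}
```

For example, ``freezing_reason /sys/fs/cgroup/freezer/0`` printing ``parent``
means that writing THAWED to this cgroup's freezer.state cannot change its
effective state; an ancestor must be thawed instead.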

* Examples of usage::

    # mkdir /sys/fs/cgroup/freezer
    # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
    # mkdir /sys/fs/cgroup/freezer/0
    # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks

  To get the status of the freezer subsystem::

    # cat /sys/fs/cgroup/freezer/0/freezer.state
    THAWED

  To freeze all tasks in the cgroup::

    # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
    # cat /sys/fs/cgroup/freezer/0/freezer.state
    FREEZING
    # cat /sys/fs/cgroup/freezer/0/freezer.state
    FROZEN

  To unfreeze all tasks in the cgroup::

    # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
    # cat /sys/fs/cgroup/freezer/0/freezer.state
    THAWED

This is the basic mechanism, which should do the right thing for user space
tasks in a simple scenario.
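Because a cgroup may report FREEZING for a while before it reports FROZEN, a
script usually has to poll after writing the state. The following is a
minimal sketch of that pattern; the function names, the retry count and the
0.1 second polling interval are assumptions, not part of the freezer
interface.

```shell
# Hypothetical helper: poll freezer.state until it reads the wanted value.
# $1 = cgroup directory, $2 = wanted state, $3 = number of tries (default 50).
wait_for_state() {
    local dir=$1 want=$2 tries=${3:-50} state
    while [ "$tries" -gt 0 ]; do
        read -r state < "$dir/freezer.state" || true
        [ "$state" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1    # still FREEZING (or otherwise not in the wanted state)
}

# Hypothetical helper: request a freeze, then wait for it to complete.
freeze_cgroup() {
    local dir=$1
    echo FROZEN > "$dir/freezer.state" || return 1
    wait_for_state "$dir" FROZEN "${2:-50}"
}
```

A caller might run ``freeze_cgroup /sys/fs/cgroup/freezer/0 || echo "not
frozen yet"`` and decide for itself whether to retry or to write THAWED again.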