.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets.  This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently.  A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.
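
For illustration, a minimal initialization sequence might look like the
following sketch; the my_setup() name and the "my_instance" string are
hypothetical, and both allocators return NULL on failure::

    #include <linux/padata.h>

    static struct padata_instance *pinst;
    static struct padata_shell *ps;

    static int __init my_setup(void)
    {
        /* One instance controls how this driver's jobs are run... */
        pinst = padata_alloc("my_instance");
        if (!pinst)
            return -ENOMEM;

        /* ...and one shell provides an independent serialization stream. */
        ps = padata_alloc_shell(pinst);
        if (!ps) {
            padata_free(pinst);
            return -ENOMEM;
        }

        return 0;
    }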

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways, programmatically with
padata_set_cpumask() or via sysfs.  The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.
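
As an illustration, restricting parallel work to CPUs 0-3 might look like
this sketch; my_restrict_parallel() and the specific CPU policy are
hypothetical::

    #include <linux/cpumask.h>
    #include <linux/gfp.h>
    #include <linux/padata.h>

    static int my_restrict_parallel(struct padata_instance *pinst)
    {
        cpumask_var_t mask;
        int err, cpu;

        if (!alloc_cpumask_var(&mask, GFP_KERNEL))
            return -ENOMEM;

        /* Hypothetical policy: allow parallel work only on CPUs 0-3. */
        cpumask_clear(mask);
        for (cpu = 0; cpu < 4; cpu++)
            cpumask_set_cpu(cpu, mask);

        err = padata_set_cpumask(pinst, PADATA_CPU_PARALLEL, mask);
        free_cpumask_var(mask);
        return err;
    }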

There may be sysfs files for an instance's cpumasks.  For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>.  Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks.  (Each pair consists of a parallel and a serial
cpumask.)  The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above.  The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses.  So it is
legal to supply a cpumask to padata that contains offline CPUs.  Once an
offline CPU in the user-supplied cpumask comes online, padata will begin
using it.

Changing the CPU masks is an expensive operation, so it should not be done
with great frequency.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void                    (*parallel)(struct padata_priv *padata);
        void                    (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done.  Most of its fields are private to
padata, but the structure should be zeroed at initialization time, and the
parallel() and serial() functions should be provided.  Those functions will
be called in the process of getting the work done as we will see
momentarily.
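
For example, a driver might embed padata_priv in its own job structure along
these lines; struct my_request, its data field, and my_request_alloc() are
hypothetical, and my_parallel()/my_serial() are sketched in the following
sections::

    #include <linux/padata.h>
    #include <linux/slab.h>

    struct my_request {
        struct padata_priv padata;   /* must be zeroed before use */
        void *data;                  /* hypothetical per-job payload */
    };

    static void my_parallel(struct padata_priv *padata);
    static void my_serial(struct padata_priv *padata);

    static struct my_request *my_request_alloc(void)
    {
        struct my_request *req = kzalloc(sizeof(*req), GFP_KERNEL);

        if (!req)
            return NULL;
        /* kzalloc() already zeroed the embedded padata_priv. */
        req->padata.parallel = my_parallel;
        req->padata.serial = my_serial;
        return req;
    }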

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above; cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if not, the cb_cpu pointer
is updated to point to the CPU actually chosen).  The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the
serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.
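
Continuing the hypothetical my_request example, a submission might look like
this sketch::

    static int my_submit(struct padata_shell *ps, struct my_request *req)
    {
        int cb_cpu = 0;   /* hypothetical preferred callback CPU */
        int err;

        /* On success, cb_cpu holds the CPU actually chosen. */
        err = padata_do_parallel(ps, &req->padata, &cb_cpu);
        /* -EBUSY means a cpumask change is in flight; callers may retry. */
        return err;
    }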

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs.  parallel() runs with
software interrupts disabled and thus cannot sleep.  The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.
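
A parallel() callback for the hypothetical my_request might look like the
sketch below; my_process() is a hypothetical worker.  It recovers the
enclosing structure with container_of(), does the work without sleeping, and
then hands the job back to padata (padata_do_serial() is described in the
next section)::

    static void my_parallel(struct padata_priv *padata)
    {
        struct my_request *req = container_of(padata, struct my_request,
                                              padata);

        /*
         * Runs with software interrupts disabled, so this must not
         * sleep.  my_process() is a hypothetical worker.
         */
        my_process(req->data);

        /*
         * The parallel phase of this job is finished; hand it back
         * to padata for in-order completion.
         */
        padata_do_serial(padata);
    }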

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point.  The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.  That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.  Note that this call may be
deferred for a while since the padata code takes pains to ensure that jobs
are completed in the order in which they were submitted.
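
Completing the hypothetical example, the serial() callback might look like
this sketch; my_complete() stands in for whatever finishes the job, such as
transmitting a packet::

    static void my_serial(struct padata_priv *padata)
    {
        struct my_request *req = container_of(padata, struct my_request,
                                              padata);

        /*
         * Runs on the requested callback CPU, in submission order,
         * with local software interrupts disabled.  my_complete()
         * is a hypothetical completion step.
         */
        my_complete(req);
        kfree(req);
    }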

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.
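
Continuing the initialization sketch from earlier, teardown simply reverses
the allocation order::

    static void my_teardown(void)
    {
        /* All outstanding jobs must have completed by this point. */
        padata_free_shell(ps);
        padata_free(pinst);
    }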

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished.  padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do four things to run a multithreaded job.  First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section.  This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread.  Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any.  Next, prepare the shared state,
which is typically allocated on the main thread's stack.  Last, call
padata_do_multithreaded(), which will return once the job is finished.
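
Putting this together, a sketch of a multithreaded job over a hypothetical
array follows; the field names come from struct padata_mt_job (see the
Interface section below), while the my_* names and the chunking parameters
are illustrative assumptions.  The caller is marked __init to match
padata_do_multithreaded(), which is only available during boot::

    #include <linux/padata.h>

    struct my_shared_state {
        struct my_item *items;   /* hypothetical work array */
    };

    /* Process items in the half-open range [start, end). */
    static void my_thread_fn(unsigned long start, unsigned long end, void *arg)
    {
        struct my_shared_state *state = arg;
        unsigned long i;

        for (i = start; i < end; i++)
            my_process_item(&state->items[i]);   /* hypothetical worker */
    }

    static void __init my_run_mt_job(struct my_item *items,
                                     unsigned long nr_items)
    {
        struct my_shared_state state = { .items = items };
        struct padata_mt_job job = {
            .thread_fn   = my_thread_fn,
            .fn_arg      = &state,
            .start       = 0,
            .size        = nr_items,
            .align       = 1,      /* no alignment constraint */
            .min_chunk   = 1024,   /* assumed minimum worthwhile chunk */
            .max_threads = 4,      /* assumed cap on helper threads */
        };

        /* Returns only after the main thread and all helpers finish. */
        padata_do_multithreaded(&job);
    }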

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c