The padata parallel execution mechanism
Last updated for 2.6.36

Padata is a mechanism by which the kernel can farm work out to be done in
parallel on multiple CPUs while retaining the ordering of tasks.  It was
developed for use with the IPsec code, which needs to be able to perform
encryption and decryption on large numbers of packets without reordering
those packets.  The crypto developers made a point of writing padata in a
sufficiently general fashion that it could be put to other uses as well.

The first step in using padata is to set up a padata_instance structure for
overall control of how tasks are to be run:

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(struct workqueue_struct *wq,
                                         const struct cpumask *pcpumask,
                                         const struct cpumask *cbcpumask);

The pcpumask describes which processors will be used to execute work
submitted to this instance in parallel.  The cbcpumask defines which
processors are allowed to be used as the serialization callback processor.
The workqueue wq is where the work will actually be done; it should be
a multithreaded queue, naturally.

To allocate a padata instance with the cpu_possible_mask for both
cpumasks, this helper function can be used:

    struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);

Note: Padata maintains two kinds of cpumasks internally: the user-supplied
cpumasks, passed in through padata_alloc or padata_alloc_possible, and the
"usable" cpumasks.  The usable cpumasks are always a subset of the active
CPUs in the user-supplied cpumasks; these are the cpumasks padata actually
uses.  It is therefore legal to supply a cpumask to padata that contains
offline CPUs.  Once an offline CPU in the user-supplied cpumask comes
online, padata will use it.

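
As a quick sketch, instance setup might look something like this (the
workqueue and variable names here are illustrative, not part of the API):

    static struct workqueue_struct *my_wq;
    static struct padata_instance *my_pinst;

    static int my_padata_setup(void)
    {
            my_wq = create_workqueue("my_padata");
            if (!my_wq)
                    return -ENOMEM;
            /* Use all possible CPUs for both parallel and serial work */
            my_pinst = padata_alloc_possible(my_wq);
            if (!my_pinst) {
                    destroy_workqueue(my_wq);
                    return -ENOMEM;
            }
            return 0;
    }
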
There are functions for enabling and disabling the instance:

    int padata_start(struct padata_instance *pinst);
    void padata_stop(struct padata_instance *pinst);

These functions set or clear the "PADATA_INIT" flag; if that flag is not
set, other functions will refuse to work.  padata_start returns zero on
success (flag set) or -EINVAL if the padata cpumask contains no active CPU
(flag not set).  padata_stop clears the flag and blocks until the padata
instance is unused.

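
Starting an instance and checking for the no-active-CPU case might thus
look like this (my_pinst stands for an already-allocated instance; the
name is illustrative):

    if (padata_start(my_pinst)) {
            /* -EINVAL: no active CPU in the cpumasks; not started */
            return -EINVAL;
    }
    /* ... submit and process work ... */
    padata_stop(my_pinst);      /* blocks until the instance is unused */
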
The list of CPUs to be used can be adjusted with these functions:

    int padata_set_cpumasks(struct padata_instance *pinst,
                            cpumask_var_t pcpumask,
                            cpumask_var_t cbcpumask);
    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);
    int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask);
    int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask);

Changing the CPU masks is an expensive operation, though, so it should not
be done with great frequency.

It's possible to change both cpumasks of a padata instance with
padata_set_cpumasks by specifying the cpumasks for parallel execution
(pcpumask) and for the serial callback function (cbcpumask).
padata_set_cpumask is used to change just one of the cpumasks; here
cpumask_type is one of PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL, and
cpumask specifies the new cpumask to use.  To simply add or remove one CPU
from a given cpumask, the functions padata_add_cpu/padata_remove_cpu are
used: cpu specifies the CPU to add or remove, and mask is one of
PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL.

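
As an example, restricting parallel work to CPUs 0 and 1 could be done
like this (a sketch; error handling is abbreviated):

    cpumask_var_t mask;

    if (!alloc_cpumask_var(&mask, GFP_KERNEL))
            return -ENOMEM;
    cpumask_clear(mask);
    cpumask_set_cpu(0, mask);
    cpumask_set_cpu(1, mask);
    /* Replace only the parallel cpumask; the serial one is untouched */
    padata_set_cpumask(my_pinst, PADATA_CPU_PARALLEL, mask);
    free_cpumask_var(mask);
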
Users interested in padata cpumask changes can register with the padata
cpumask change notifier:

    int padata_register_cpumask_notifier(struct padata_instance *pinst,
                                         struct notifier_block *nblock);

To unregister from that notifier:

    int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
                                           struct notifier_block *nblock);

The padata cpumask change notifier notifies about changes of the usable
cpumasks, i.e. the subset of active CPUs in the user-supplied cpumask.

Padata calls the notifier chain with:

    blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
                                 notification_mask,
                                 &pd_new->cpumask);

Here cpumask_change_notifier is the registered notifier chain,
notification_mask is one of PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL, and
cpumask is a pointer to a struct padata_cpumask that contains the new
cpumask information.

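
A notifier registration might be sketched as follows (the callback and
variable names are invented for illustration; struct padata_cpumask holds
the parallel and serial masks as pcpu and cbcpu):

    static int my_cpumask_change(struct notifier_block *nblock,
                                 unsigned long mask_type, void *data)
    {
            struct padata_cpumask *new_masks = data;

            if (mask_type & PADATA_CPU_PARALLEL)
                    ;       /* inspect new_masks->pcpu */
            if (mask_type & PADATA_CPU_SERIAL)
                    ;       /* inspect new_masks->cbcpu */
            return 0;
    }

    static struct notifier_block my_nblock = {
            .notifier_call = my_cpumask_change,
    };

    padata_register_cpumask_notifier(my_pinst, &my_nblock);
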
Actually submitting work to the padata instance requires the creation of a
padata_priv structure:

    struct padata_priv {
        /* Other stuff here... */
        void                    (*parallel)(struct padata_priv *padata);
        void                    (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done.  Most of its fields are private
to padata, but the structure should be zeroed at initialisation time, and
the parallel() and serial() functions should be provided.  Those functions
will be called in the process of getting the work done as we will see
momentarily.

The submission of work is done with:

    int padata_do_parallel(struct padata_instance *pinst,
                           struct padata_priv *padata, int cb_cpu);

The pinst and padata structures must be set up as described above; cb_cpu
specifies which CPU will be used for the final callback when the work is
done; it must be in the current instance's CPU mask.  The return value from
padata_do_parallel() is zero on success, indicating that the work is in
progress.  -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in
that CPU mask or about the instance not currently being active.

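
Submission-time error handling might then look like this (a sketch;
my_request is a hypothetical structure embedding a padata_priv, and cb_cpu
is a CPU known to be in the instance's cpumask):

    int err = padata_do_parallel(my_pinst, &req->padata, cb_cpu);

    switch (err) {
    case 0:
            break;          /* work is in flight */
    case -EBUSY:
            /* cpumask change in progress; try again later */
            break;
    case -EINVAL:
            /* bad cb_cpu, or instance not started */
            break;
    }
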
Each task submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple tasks.  Despite the
fact that the workqueue is used to make these calls, parallel() is run with
software interrupts disabled and thus cannot sleep.  The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

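
Putting that together, an embedding structure and its parallel() function
might look like this (the structure, field, and helper names are invented
for illustration):

    struct my_request {
            struct padata_priv padata;
            struct sk_buff *skb;            /* the actual unit of work */
    };

    static void my_parallel(struct padata_priv *padata)
    {
            struct my_request *req = container_of(padata,
                                                  struct my_request,
                                                  padata);

            /* Softirqs are disabled here, so this must not sleep */
            my_encrypt(req->skb);           /* hypothetical worker */
            padata_do_serial(padata);       /* hand the task back for
                                               ordered completion */
    }
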
Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the task from this point.  The work
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.  When a task does complete, parallel() (or
whatever function actually finishes the job) should inform padata of the
fact with a call to:

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.  That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
done through the workqueue, but with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that tasks are completed in the order in which they were
submitted.

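
The matching serial() side might then be sketched as (names again invented
for illustration):

    static void my_serial(struct padata_priv *padata)
    {
            struct my_request *req = container_of(padata,
                                                  struct my_request,
                                                  padata);

            /* Runs on cb_cpu, in submission order, softirqs disabled */
            my_transmit(req->skb);          /* hypothetical completion */
    }
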
The one remaining function in the padata API should be called to clean up
when a padata instance is no longer needed:

    void padata_free(struct padata_instance *pinst);

This function will busy-wait while any remaining tasks are completed, so it
might be best not to call it while there is work outstanding.  Shutting
down the workqueue, if necessary, should be done separately.
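
Teardown thus mirrors setup; a sketch (variable names illustrative):

    padata_stop(my_pinst);
    padata_free(my_pinst);
    destroy_workqueue(my_wq);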