0001 .. SPDX-License-Identifier: GPL-2.0
0002 
0003 =======================
0004 Energy Model of devices
0005 =======================
0006 
0007 1. Overview
0008 -----------
0009 
0010 The Energy Model (EM) framework serves as an interface between drivers knowing
0011 the power consumed by devices at various performance levels, and the kernel
0012 subsystems willing to use that information to make energy-aware decisions.
0013 
0014 The source of the information about the power consumed by devices can vary greatly
0015 from one platform to another. These power costs can be estimated using
0016 devicetree data in some cases. In others, the firmware will know better.
0017 Alternatively, userspace might be best positioned. And so on. To avoid
0018 having each client subsystem re-implement support for every possible
0019 source of information on its own, the EM framework intervenes as an
0020 abstraction layer which standardizes the format of power cost tables in the
0021 kernel, hence avoiding redundant work.
0022 
0023 The power values might be expressed in micro-Watts or in an 'abstract scale'.
0024 Multiple subsystems might use the EM and it is up to the system integrator to
0025 check that the requirements for the power value scale types are met. An example
0026 can be found in the Energy-Aware Scheduler documentation
0027 Documentation/scheduler/sched-energy.rst. For some subsystems, like thermal or
0028 powercap, power values expressed in an 'abstract scale' might cause issues.
0029 These subsystems are more interested in estimating the power used in the past,
0030 thus real micro-Watts values might be needed. An example of these requirements
0031 can be found in the Intelligent Power Allocation documentation in
0032 Documentation/driver-api/thermal/power_allocator.rst.
0033 Kernel subsystems might implement automatic detection to check whether EM
0034 registered devices have inconsistent scales (based on an EM internal flag).
0035 An important thing to keep in mind is that when the power values are expressed
0036 in an 'abstract scale', deriving real energy in micro-Joules is not possible.
0037 
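The scale restriction above can be illustrated with a small user-space
sketch. It is not kernel code: the helper name ``energy_uj()`` and its
parameters are hypothetical, and it only shows that a power value multiplied
by a duration yields micro-Joules when the power is known to be in
micro-Watts, while an abstract-scale value has to be rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not part of the EM API).
 *
 * Convert a power value and a duration into energy in micro-Joules.
 * Returns 0 on success, or -1 when the power value is in an abstract
 * scale, in which case no real energy can be derived from it.
 */
int energy_uj(uint64_t power, bool microwatts, uint64_t duration_ms,
	      uint64_t *uj)
{
	if (!microwatts)
		return -1;

	/* uW * ms = nJ, so divide by 1000 to obtain uJ. */
	*uj = power * duration_ms / 1000;
	return 0;
}
```

For instance, 500000 uW drawn for 2000 ms gives 1000000 uJ (1 J), whereas
the same call with ``microwatts`` set to false fails.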
0038 The figure below depicts an example of drivers (Arm-specific here, but the
0039 approach is applicable to any architecture) providing power costs to the EM
0040 framework, and interested clients reading the data from it::
0041 
0042        +---------------+  +-----------------+  +---------------+
0043        | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
0044        +---------------+  +-----------------+  +---------------+
0045                |                   | em_cpu_energy()   |
0046                |                   | em_cpu_get()      |
0047                +---------+         |         +---------+
0048                          |         |         |
0049                          v         v         v
0050                         +---------------------+
0051                         |    Energy Model     |
0052                         |     Framework       |
0053                         +---------------------+
0054                            ^       ^       ^
0055                            |       |       | em_dev_register_perf_domain()
0056                 +----------+       |       +---------+
0057                 |                  |                 |
0058         +---------------+  +---------------+  +--------------+
0059         |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
0060         +---------------+  +---------------+  +--------------+
0061                 ^                  ^                 ^
0062                 |                  |                 |
0063         +--------------+   +---------------+  +--------------+
0064         | Device Tree  |   |   Firmware    |  |      ?       |
0065         +--------------+   +---------------+  +--------------+
0066 
0067 In the case of CPU devices, the EM framework manages power cost tables per
0068 'performance domain' in the system. A performance domain is a group of CPUs
0069 whose performance is scaled together. Performance domains generally have a
0070 1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
0071 required to have the same micro-architecture. CPUs in different performance
0072 domains can have different micro-architectures.
0073 
0074 
0075 2. Core APIs
0076 ------------
0077 
0078 2.1 Config options
0079 ^^^^^^^^^^^^^^^^^^
0080 
0081 CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
0082 
0083 
0084 2.2 Registration of performance domains
0085 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0086 
0087 Registration of 'advanced' EM
0088 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0089 
0090 The 'advanced' EM gets its name from the fact that the driver is allowed
0091 to provide a more precise power model. It is not limited to a math formula
0092 implemented in the framework (as in the 'simple' EM case). It can better reflect
0093 the real power measurements performed for each performance state. Thus, this
0094 registration method should be preferred when accounting for static power
0095 (leakage) in the EM is important.
0096 
0097 Drivers are expected to register performance domains into the EM framework by
0098 calling the following API::
0099 
0100   int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
0101                 struct em_data_callback *cb, cpumask_t *cpus, bool microwatts);
0102 
0103 Drivers must provide a callback function returning <frequency, power> tuples
0104 for each performance state. The callback function provided by the driver is free
0105 to fetch data from any relevant location (DT, firmware, ...), and by any means
0106 deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
0107 performance domains using cpumask. For devices other than CPUs, the 'cpus'
0108 argument must be set to NULL.
0109 The last argument, 'microwatts', must be set to the correct value. Kernel
0110 subsystems which use the EM might rely on this flag to check if all EM devices
0111 use the same scale. If there are different scales, these subsystems might decide
0112 to return a warning/error, stop working or panic.
0113 See Section 3. for an example of a driver implementing this
0114 callback, or Section 2.4 for further documentation on this API.
0115 
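The callback contract described above (return a <frequency, power> tuple for
each performance state) can be sketched in user space. Everything in this
block is a stand-in invented for illustration: ``struct perf_state``, the fake
``foo_opps`` table, ``foo_active_power()`` and ``build_table()`` are not
kernel symbols, and the table-building loop is only one plausible way a
framework can consume a callback that rounds the requested frequency up.

```c
#include <stddef.h>

/* Hypothetical stand-in for one performance state; not the kernel type. */
struct perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* uW */
};

/* Fake OPP table of an imaginary device, sorted by ascending frequency. */
static const struct perf_state foo_opps[] = {
	{  500000, 100000 },
	{ 1000000, 300000 },
	{ 1500000, 700000 },
};
#define FOO_NR_OPPS (sizeof(foo_opps) / sizeof(foo_opps[0]))

/*
 * Driver-style callback: round *khz up to the next supported frequency
 * and report the power at that frequency. Returns 0 on success, or -1
 * when no state exists at or above the requested frequency.
 */
int foo_active_power(unsigned long *power, unsigned long *khz)
{
	size_t i;

	for (i = 0; i < FOO_NR_OPPS; i++) {
		if (foo_opps[i].frequency >= *khz) {
			*khz = foo_opps[i].frequency;
			*power = foo_opps[i].power;
			return 0;
		}
	}
	return -1;
}

/*
 * Framework-style consumer: fill nr_states entries by repeatedly asking
 * the callback for the state just above the previous one.
 */
int build_table(struct perf_state *table, size_t nr_states,
		int (*cb)(unsigned long *power, unsigned long *khz))
{
	unsigned long freq = 0, power = 0;
	size_t i;

	for (i = 0; i < nr_states; i++) {
		freq++;	/* request the next frequency above the last one */
		if (cb(&power, &freq))
			return -1;
		table[i].frequency = freq;
		table[i].power = power;
	}
	return 0;
}
```

In this sketch, asking for three states fills the table with the three fake
OPPs in ascending order, and asking for a fourth one fails because the
callback cannot find a higher frequency. This is why the number of states
passed at registration has to match what the callback can actually report.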
0116 Registration of EM using DT
0117 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
0118 
0119 The EM can also be registered using the OPP framework and information in DT
0120 "operating-points-v2". Each OPP entry in DT can be extended with a property
0121 "opp-microwatt" containing a power value in micro-Watts. This OPP DT property
0122 allows a platform to register EM power values which reflect total power
0123 (static + dynamic). These power values might come directly from
0124 experiments and measurements.
0125 
0126 Registration of 'artificial' EM
0127 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0128 
0129 There is an option to provide a custom callback for drivers missing detailed
0130 knowledge about the power value for each performance state. The callback
0131 .get_cost() is optional and provides the 'cost' values used by the EAS.
0132 This is useful for platforms that only provide information on relative
0133 efficiency between CPU types, where one could use the information to
0134 create an abstract power model. But even an abstract power model can
0135 sometimes be hard to fit in, given the input power value size restrictions.
0136 The .get_cost() callback allows providing 'cost' values which reflect the
0137 efficiency of the CPUs. This makes it possible to give EAS information which
0138 has a different relation than what would be forced by the EM internal
0139 formulas calculating 'cost' values. To register an EM for such a platform, the
0140 driver must set the 'microwatts' flag to false, provide the .active_power()
0141 callback and provide the .get_cost() callback. The EM framework handles such a
0142 platform properly during registration. The EM_PERF_DOMAIN_ARTIFICIAL flag is set
0143 for such a platform. Special care should be taken by other frameworks which use
0144 the EM to test and treat this flag properly.
0145 
0146 Registration of 'simple' EM
0147 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
0148 
0149 The 'simple' EM is registered using the framework helper function
0150 cpufreq_register_em_with_opp(). It implements a power model which is tied to
0151 the math formula::
0152 
0153         Power = C * V^2 * f
0154 
0155 An EM registered using this method might not correctly reflect the
0156 physics of a real device, e.g. when static power (leakage) is important.
0157 
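As a worked illustration of the formula above, the following user-space
sketch evaluates ``C * V^2 * f`` for one performance state. The function name
and the choice of units are assumptions of this sketch: the coefficient is
taken in uW/MHz/V^2, the voltage in millivolts and the frequency in MHz, so
that the result comes out in micro-Watts. It is not the helper used by the
framework.

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration only.
 *
 * Evaluate Power = C * V^2 * f with:
 *   coeff      - dynamic power coefficient in uW/MHz/V^2 (assumed unit)
 *   millivolts - supply voltage in mV
 *   mhz        - frequency in MHz
 * Returns the estimated dynamic power in micro-Watts.
 */
uint64_t simple_em_power_uw(uint32_t coeff, uint32_t millivolts, uint32_t mhz)
{
	uint64_t tmp = (uint64_t)coeff * millivolts * millivolts * mhz;

	/* The voltage was squared in mV, so scale mV^2 back to V^2. */
	return tmp / 1000000ULL;
}
```

With a coefficient of 100, a state at 1000 mV and 1000 MHz yields 100000 uW,
while a state at 500 mV and 500 MHz yields 12500 uW. The model only covers
dynamic power, which is why it can misrepresent devices with significant
leakage.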
0158 
0159 2.3 Accessing performance domains
0160 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0161 
0162 There are two API functions which provide access to the energy model:
0163 em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which takes
0164 a device pointer as an argument. Which interface to use depends on the
0165 subsystem, but in the case of CPU devices both functions return the same
0166 performance domain.
0167 
0168 Subsystems interested in the energy model of a CPU can retrieve it using the
0169 em_cpu_get() API. The energy model tables are allocated once upon creation of
0170 the performance domains, and kept in memory untouched.
0171 
0172 The energy consumed by a performance domain can be estimated using the
0173 em_cpu_energy() API. For CPU devices, the estimation is performed assuming that
0174 the schedutil CPUfreq governor is in use. Currently this calculation is
0175 not provided for other types of devices.
0176 
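The idea behind such an estimation can be sketched in user space as follows.
This is a simplified model and not the kernel implementation: the struct, the
function name and the exact arithmetic are assumptions of this sketch. It
assumes a schedutil-like frequency request with roughly 25% headroom above
the highest CPU utilization of the domain, selects the lowest performance
state able to serve that request, and scales the power of that state by the
summed utilization relative to the capacity at that state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one EM performance state. */
struct em_state {
	uint64_t frequency;	/* kHz */
	uint64_t power;		/* uW or abstract scale */
};

/*
 * Illustrative estimate of the energy consumed by a performance domain.
 *
 *   table        - states sorted by ascending frequency
 *   max_util     - highest utilization among the CPUs of the domain
 *   sum_util     - sum of the utilization of all CPUs of the domain
 *   cpu_capacity - capacity of one CPU at the highest frequency
 */
uint64_t estimate_pd_energy(const struct em_state *table, size_t nr_states,
			    uint64_t max_util, uint64_t sum_util,
			    uint64_t cpu_capacity)
{
	/* Default to the highest state if the request exceeds all states. */
	const struct em_state *ps = &table[nr_states - 1];
	uint64_t max_freq = ps->frequency;
	/* Assumed schedutil-like request: utilization plus 25% headroom. */
	uint64_t util = max_util + (max_util >> 2);
	uint64_t freq = max_freq * util / cpu_capacity;
	uint64_t ps_cap;
	size_t i;

	/* Pick the lowest state whose frequency satisfies the request. */
	for (i = 0; i < nr_states; i++) {
		if (table[i].frequency >= freq) {
			ps = &table[i];
			break;
		}
	}

	/* Capacity of a CPU when running at the selected state. */
	ps_cap = cpu_capacity * ps->frequency / max_freq;

	return ps->power * sum_util / ps_cap;
}
```

The result is in the same scale as the power values of the table, so it is
only comparable between domains that use the same scale.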
0177 More details about the above APIs can be found in ``<linux/energy_model.h>``
0178 or in Section 2.4.
0179 
0180 
0181 2.4 Description details of this API
0182 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0183 .. kernel-doc:: include/linux/energy_model.h
0184    :internal:
0185 
0186 .. kernel-doc:: kernel/power/energy_model.c
0187    :export:
0188 
0189 
0190 3. Example driver
0191 -----------------
0192 
0193 The CPUFreq framework supports a dedicated callback for registering
0194 the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
0195 That callback has to be implemented properly for a given driver,
0196 because the framework calls it at the right time during setup.
0197 This section provides a simple example of a CPUFreq driver registering a
0198 performance domain in the Energy Model framework using the (fake) 'foo'
0199 protocol. The driver implements an est_power() function to be provided to the
0200 EM framework::
0201 
0202   -> drivers/cpufreq/foo_cpufreq.c
0203 
0204   static int est_power(struct device *dev, unsigned long *mW,
0205                        unsigned long *KHz)
0206   {
0207           long freq, power;
0208 
0209           /* Use the 'foo' protocol to ceil the frequency */
0210           freq = foo_get_freq_ceil(dev, *KHz);
0211           if (freq < 0)
0212                   return freq;
0213 
0214           /* Estimate the power cost for the dev at the relevant freq. */
0215           power = foo_estimate_power(dev, freq);
0216           if (power < 0)
0217                   return power;
0218 
0219           /* Return the values to the EM framework */
0220           *mW = power;
0221           *KHz = freq;
0222 
0223           return 0;
0224   }
0225 
0226   static void foo_cpufreq_register_em(struct cpufreq_policy *policy)
0227   {
0228           struct em_data_callback em_cb = EM_DATA_CB(est_power);
0229           struct device *cpu_dev;
0230           int nr_opp;
0231 
0232           cpu_dev = get_cpu_device(cpumask_first(policy->cpus));
0233 
0234           /* Find the number of OPPs for this policy */
0235           nr_opp = foo_get_nr_opp(policy);
0236 
0237           /* And register the new performance domain */
0238           em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
0239                                       true);
0240   }
0241 
0242   static struct cpufreq_driver foo_cpufreq_driver = {
0243           .register_em = foo_cpufreq_register_em,
0244   };