.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``amd-pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2021 Advanced Micro Devices, Inc.

:Author: Huang Rui <ray.huang@amd.com>


Introduction
===================
0015 
``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism for modern AMD APU and CPU series in the
Linux kernel. The new mechanism is based on Collaborative Processor
Performance Control (CPPC), which provides finer-grained frequency management
than legacy ACPI hardware P-States. Current AMD CPU/APU platforms use
the ACPI P-states driver to manage CPU frequency and clocks, switching
among only 3 P-states. CPPC replaces the ACPI P-states controls and provides a
flexible, low-latency interface through which the Linux kernel communicates
performance hints directly to the hardware.
0025 
``amd-pstate`` leverages Linux kernel governors such as ``schedutil`` and
``ondemand`` to manage the performance hints provided by the CPPC hardware
functionality, which internally follows the hardware specification (for
details, refer to the AMD64 Architecture Programmer's Manual Volume 2: System
Programming [1]_). Currently, ``amd-pstate`` supports basic frequency control
driven by the kernel governors on some of the Zen2 and Zen3 processors. More
AMD-specific functions will be implemented in the future, after they have been
verified on the hardware and SBIOS.
0034 
0035 
0036 AMD CPPC Overview
0037 =======================
0038 
The Collaborative Processor Performance Control (CPPC) interface enumerates a
continuous, abstract, and unit-less performance value on a scale that is
not tied to a specific performance state or frequency. This is an ACPI
standard [2]_ through which software can specify application performance goals
and hints as a relative target to the infrastructure limits. AMD processors
provide a low-latency register model (MSR) instead of an AML code
interpreter for performance adjustments. ``amd-pstate`` initializes a
``struct cpufreq_driver`` instance, ``amd_pstate_driver``, with the callbacks
that manage each performance update behavior. ::
0048 
0049  Highest Perf ------>+-----------------------+                         +-----------------------+
0050                      |                       |                         |                       |
0051                      |                       |                         |                       |
0052                      |                       |          Max Perf  ---->|                       |
0053                      |                       |                         |                       |
0054                      |                       |                         |                       |
0055  Nominal Perf ------>+-----------------------+                         +-----------------------+
0056                      |                       |                         |                       |
0057                      |                       |                         |                       |
0058                      |                       |                         |                       |
0059                      |                       |                         |                       |
0060                      |                       |                         |                       |
0061                      |                       |                         |                       |
0062                      |                       |      Desired Perf  ---->|                       |
0063                      |                       |                         |                       |
0064                      |                       |                         |                       |
0065                      |                       |                         |                       |
0066                      |                       |                         |                       |
0067                      |                       |                         |                       |
0068                      |                       |                         |                       |
0069                      |                       |                         |                       |
0070                      |                       |                         |                       |
0071                      |                       |                         |                       |
0072   Lowest non-        |                       |                         |                       |
0073   linear perf ------>+-----------------------+                         +-----------------------+
0074                      |                       |                         |                       |
0075                      |                       |       Lowest perf  ---->|                       |
0076                      |                       |                         |                       |
0077   Lowest perf ------>+-----------------------+                         +-----------------------+
0078                      |                       |                         |                       |
0079                      |                       |                         |                       |
0080                      |                       |                         |                       |
0081           0   ------>+-----------------------+                         +-----------------------+
0082 
0083                                      AMD P-States Performance Scale
0084 
0085 
0086 .. _perf_cap:
0087 
0088 AMD CPPC Performance Capability
0089 --------------------------------
0090 
0091 Highest Performance (RO)
0092 .........................
0093 
0094 This is the absolute maximum performance an individual processor may reach,
0095 assuming ideal conditions. This performance level may not be sustainable
0096 for long durations and may only be achievable if other platform components
0097 are in a specific state; for example, it may require other processors to be in
0098 an idle state. This would be equivalent to the highest frequencies
0099 supported by the processor.
0100 
0101 Nominal (Guaranteed) Performance (RO)
0102 ......................................
0103 
0104 This is the maximum sustained performance level of the processor, assuming
0105 ideal operating conditions. In the absence of an external constraint (power,
0106 thermal, etc.), this is the performance level the processor is expected to
0107 be able to maintain continuously. All cores/processors are expected to be
0108 able to sustain their nominal performance state simultaneously.
0109 
0110 Lowest non-linear Performance (RO)
0111 ...................................
0112 
This is the lowest performance level at which nonlinear power savings are
achieved, for example, due to the combined effects of voltage and frequency
scaling. Above this threshold, lower performance levels should generally be
more energy efficient than higher performance levels. This register
effectively conveys the most efficient performance level to ``amd-pstate``.
0118 
0119 Lowest Performance (RO)
0120 ........................
0121 
0122 This is the absolute lowest performance level of the processor. Selecting a
0123 performance level lower than the lowest nonlinear performance level may
0124 cause an efficiency penalty but should reduce the instantaneous power
0125 consumption of the processor.
0126 
0127 AMD CPPC Performance Control
0128 ------------------------------
0129 
``amd-pstate`` passes performance goals to the hardware through these
registers, which drive the behavior of the desired performance target.
0132 
0133 Minimum requested performance (RW)
0134 ...................................
0135 
0136 ``amd-pstate`` specifies the minimum allowed performance level.
0137 
0138 Maximum requested performance (RW)
0139 ...................................
0140 
``amd-pstate`` specifies a limit on the maximum performance that is expected
to be supplied by the hardware.
0143 
0144 Desired performance target (RW)
0145 ...................................
0146 
``amd-pstate`` specifies a desired target in the CPPC performance scale as
a relative number. This can be expressed as a percentage of nominal
performance (infrastructure max). Below the nominal sustained performance
level, desired performance expresses the average performance level of the
processor, subject to hardware constraints. Above the nominal performance
level, the processor must provide at least the nominal performance requested
and go higher if current operating conditions allow.
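
As a rough illustration of the relative scale, the sketch below expresses a
desired performance value as a percentage of nominal performance. The helper
name is hypothetical, and the sample values (nominal 117, highest 166) are
borrowed from the ``cpupower`` output shown in this document; it only
demonstrates the arithmetic and is not taken from the driver.

```python
def percent_of_nominal(desired_perf, nominal_perf):
    """Return desired_perf as a percentage of nominal_perf.

    Hypothetical helper for illustration; both arguments are unit-less
    values on the CPPC performance scale.
    """
    if nominal_perf <= 0:
        raise ValueError("nominal_perf must be positive")
    return 100.0 * desired_perf / nominal_perf


if __name__ == "__main__":
    # A request equal to nominal is 100%; anything above 100% lies in the
    # range where the hardware may go higher only if conditions allow.
    for perf in (39, 117, 166):
        print(perf, round(percent_of_nominal(perf, 117), 1))
```

Values above 100 correspond to requests above the nominal level, where the
hardware is only expected to exceed nominal performance opportunistically.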
0154 
0155 Energy Performance Preference (EPP) (RW)
0156 .........................................
0157 
This attribute provides a hint to the hardware about whether software wants to
bias toward performance (0x0) or energy efficiency (0xff).
0160 
0161 
0162 Key Governors Support
0163 =======================
0164 
``amd-pstate`` can be used with all the (generic) scaling governors listed
by the ``scaling_available_governors`` policy attribute in ``sysfs``. It is
then responsible for the configuration of the policy objects corresponding to
CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
to the policy objects) with accurate information on the maximum and minimum
operating frequencies supported by the hardware. Users can check the
``scaling_cur_freq`` attribute, whose value comes from the ``CPUFreq`` core.
0172 
``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
frequency control. The processor configuration in ``amd-pstate`` is fine-tuned
for ``schedutil`` working with the CPU CFS scheduler. ``amd-pstate``
registers the ``adjust_perf`` callback to implement performance update behavior
similar to CPPC. The callback is set up in ``sugov_start``, which populates
the CPU's ``update_util_data`` pointer to assign ``sugov_update_single_perf`` as
the utilization update callback function in the CPU scheduler. The CPU scheduler
calls ``cpufreq_update_util`` and assigns the target performance according
to the ``struct sugov_cpu`` that the utilization update belongs to.
``amd-pstate`` then updates the desired performance according to the value
assigned by the CPU scheduler.
0184 
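The flow above ends with the driver turning a scheduler utilization value into
a request on the CPPC performance scale. The following is a minimal sketch of
one way that scaling could work, not the driver's actual code. The function
and parameter names are hypothetical, and raising the minimum request to the
lowest non-linear level is an assumption of this sketch.

```python
def ceil_div(a, b):
    """Integer division rounding up (a >= 0, b > 0)."""
    return -(-a // b)


def adjust_perf(min_util, target_util, capacity,
                highest_perf, lowest_nonlinear_perf):
    """Map scheduler utilization onto (min_perf, des_perf, max_perf).

    Illustrative only: utilization values are scaled by
    highest_perf / capacity, and the result is clamped to the allowed range.
    """
    des_perf = highest_perf
    if target_util < capacity:
        des_perf = ceil_div(highest_perf * target_util, capacity)

    min_perf = highest_perf
    if min_util < capacity:
        min_perf = ceil_div(highest_perf * min_util, capacity)

    # Assumption: never request less than the most efficient level.
    min_perf = max(min_perf, lowest_nonlinear_perf)
    max_perf = max(highest_perf, min_perf)

    # Keep the desired value inside [min_perf, max_perf].
    des_perf = min(max(des_perf, min_perf), max_perf)
    return min_perf, des_perf, max_perf


if __name__ == "__main__":
    # Half of a capacity of 1024, on a scale topping out at 166.
    print(adjust_perf(0, 512, 1024, 166, 39))
```

With a capacity of 1024 and a highest performance of 166, a half-loaded CPU
maps to a desired value of 83 in this sketch, while very low utilization is
raised to the lowest non-linear level.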
0185 
0186 Processor Support
0187 =======================
0188 
The ``amd-pstate`` initialization will fail if the ``_CPC`` entry in the ACPI
SBIOS does not exist for the detected processor. It uses ``acpi_cpc_valid``
to check the existence of ``_CPC``. All Zen-based processors support the legacy
ACPI hardware P-States function, so when ``amd-pstate`` fails initialization,
the kernel falls back to initializing the ``acpi-cpufreq`` driver.
0194 
There are two types of hardware implementations for ``amd-pstate``: one is
`Full MSR Support <perf_cap_>`_ and the other is `Shared Memory Support
<perf_cap_>`_. The :c:macro:`X86_FEATURE_CPPC` feature flag indicates which
type is present. (For details, refer to the Processor Programming
Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors [3]_.)
``amd-pstate`` registers different ``static_call`` instances for the different
hardware implementations.
0202 
0203 Currently, some of the Zen2 and Zen3 processors support ``amd-pstate``. In the
0204 future, it will be supported on more and more AMD processors.
0205 
0206 Full MSR Support
0207 -----------------
0208 
Some new Zen3 processors, such as Cezanne, provide the MSR registers directly
when the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set.
``amd-pstate`` can access the MSR registers to implement the fast switch
function in ``CPUFreq``, which reduces the latency of frequency control in
interrupt context. The functions with a ``pstate_xxx`` prefix represent the
operations on MSR registers.
0215 
0216 Shared Memory Support
0217 ----------------------
0218 
0219 If the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, the
0220 processor supports the shared memory solution. In this case, ``amd-pstate``
0221 uses the ``cppc_acpi`` helper methods to implement the callback functions
0222 that are defined on ``static_call``. The functions with the ``cppc_xxx`` prefix
0223 represent the operations of ACPI CPPC helpers for the shared memory solution.
0224 
0225 
AMD P-States and ACPI hardware P-States can always be supported on the same
processor. However, AMD P-States has the higher priority: if it is enabled
with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, the hardware will
respond to requests from AMD P-States.
0230 
0231 
0232 User Space Interface in ``sysfs``
0233 ==================================
0234 
0235 ``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
0236 control its functionality at the system level. They are located in the
0237 ``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::
0238 
0239  root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
0240  /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
0241  /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
0242  /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq
0243 
0244 
0245 ``amd_pstate_highest_perf / amd_pstate_max_freq``
0246 
0247 Maximum CPPC performance and CPU frequency that the driver is allowed to
0248 set, in percent of the maximum supported CPPC performance level (the highest
0249 performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
0250 In some ASICs, the highest CPPC performance is not the one in the ``_CPC``
0251 table, so we need to expose it to sysfs. If boost is not active, but
0252 still supported, this maximum frequency will be larger than the one in
0253 ``cpuinfo``.
0254 This attribute is read-only.
0255 
0256 ``amd_pstate_lowest_nonlinear_freq``
0257 
0258 The lowest non-linear CPPC CPU frequency that the driver is allowed to set,
0259 in percent of the maximum supported CPPC performance level. (Please see the
0260 lowest non-linear performance in `AMD CPPC Performance Capability
0261 <perf_cap_>`_.)
0262 This attribute is read-only.
0263 
0264 Other performance and frequency values can be read back from
0265 ``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.
0266 
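The attributes above are plain text files, so they can be read with ordinary
file I/O. The sketch below collects every ``amd_pstate_*`` file in a policy
directory into a dictionary. The function name is hypothetical, and it assumes
each file holds a single decimal integer (any other content is kept as a
string).

```python
from pathlib import Path


def read_amd_pstate_attrs(policy_dir):
    """Return {attribute_name: value} for amd_pstate_* files in policy_dir.

    Values are converted to int when possible (an assumption about the
    file format); otherwise the stripped text is returned unchanged.
    """
    attrs = {}
    for path in sorted(Path(policy_dir).glob("amd_pstate_*")):
        text = path.read_text().strip()
        try:
            attrs[path.name] = int(text)
        except ValueError:
            attrs[path.name] = text
    return attrs


if __name__ == "__main__":
    policy = Path("/sys/devices/system/cpu/cpufreq/policy0")
    if policy.is_dir():
        for name, value in read_amd_pstate_attrs(policy).items():
            print(f"{name}: {value}")
    else:
        print("no cpufreq policy0 directory found")
```

Pointing the function at a different ``policyX`` directory reads the
attributes for that policy.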
0267 
0268 ``amd-pstate`` vs ``acpi-cpufreq``
0269 ======================================
0270 
On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables
provided by the platform firmware are used for CPU performance scaling, but
they only provide 3 P-states on AMD processors.
However, on modern AMD APU and CPU series, the hardware provides Collaborative
Processor Performance Control according to the ACPI protocol, customized
for AMD platforms. That is, it offers fine-grained and continuous frequency
ranges instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
module that supports the new AMD P-States mechanism on most future AMD
platforms. The AMD P-States mechanism is the more performant and
energy-efficient frequency management method on AMD processors.
0281 
0282 Kernel Module Options for ``amd-pstate``
0283 =========================================
0284 
``shared_mem``
Use the ``shared_mem`` module parameter to enable the driver on the related
processors manually with **amd_pstate.shared_mem=1**.
Because of a performance issue on processors with `Shared Memory Support
<perf_cap_>`_, it is currently disabled by default and will be re-enabled
once the performance issue with this solution is addressed.
0291 
0292 To check whether the current processor is using `Full MSR Support <perf_cap_>`_
0293 or `Shared Memory Support <perf_cap_>`_ : ::
0294 
0295   ray@hr-test1:~$ lscpu | grep cppc
0296   Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
0297 
0298 If the CPU flags have ``cppc``, then this processor supports `Full MSR Support
0299 <perf_cap_>`_. Otherwise, it supports `Shared Memory Support <perf_cap_>`_.
0300 
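The same check can be scripted. The sketch below looks for ``cppc`` as a whole
token in a CPU flags line, which avoids matching it as a substring of some
other flag. The function names are hypothetical, and reading
``/proc/cpuinfo`` instead of calling ``lscpu`` is a choice made for this
sketch.

```python
def cppc_backend(flags_line):
    """Classify a 'flags: ...' line following the rule described above."""
    # Drop the "flags:" / "Flags:" label if present, then split into tokens.
    flags = flags_line.split(":", 1)[-1].split()
    if "cppc" in flags:
        return "Full MSR Support"
    return "Shared Memory Support"


def cpu_flags_line(cpuinfo_path="/proc/cpuinfo"):
    """Return the first 'flags' line from cpuinfo, or '' if none is found."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line
    except OSError:
        pass
    return ""


if __name__ == "__main__":
    print(cppc_backend(cpu_flags_line()))
```

Note that the sketch only mirrors the rule stated above; it does not verify
that the processor has a valid ``_CPC`` entry.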
0301 
0302 ``cpupower`` tool support for ``amd-pstate``
0303 ===============================================
0304 
0305 ``amd-pstate`` is supported by the ``cpupower`` tool, which can be used to dump
0306 frequency information. Development is in progress to support more and more
0307 operations for the new ``amd-pstate`` module with this tool. ::
0308 
0309  root@hr-test1:/home/ray# cpupower frequency-info
0310  analyzing CPU 0:
0311    driver: amd-pstate
0312    CPUs which run at the same hardware frequency: 0
0313    CPUs which need to have their frequency coordinated by software: 0
0314    maximum transition latency: 131 us
0315    hardware limits: 400 MHz - 4.68 GHz
0316    available cpufreq governors: ondemand conservative powersave userspace performance schedutil
0317    current policy: frequency should be within 400 MHz and 4.68 GHz.
0318                    The governor "schedutil" may decide which speed to use
0319                    within this range.
0320    current CPU frequency: Unable to call hardware
0321    current CPU frequency: 4.02 GHz (asserted by call to kernel)
0322    boost state support:
0323      Supported: yes
0324      Active: yes
0325      AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
0326      AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
0327      AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
0328      AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.
0329 
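The performance and frequency pairs in this output appear to be related by a
simple proportion anchored at the nominal point. The sketch below assumes that
linear relation (it is an assumption of this example, not a statement about
the driver) and uses a hypothetical helper name. It reproduces the highest and
lowest non-linear frequencies shown above, but gives about 423 MHz for the
lowest performance level rather than the reported 400 MHz, so the proportion
should be treated as an approximation.

```python
def perf_to_freq_mhz(perf, nominal_perf, nominal_freq_mhz):
    """Scale a CPPC perf value to MHz, assuming freq is proportional to perf."""
    if nominal_perf <= 0:
        raise ValueError("nominal_perf must be positive")
    return nominal_freq_mhz * perf / nominal_perf


if __name__ == "__main__":
    # Nominal point taken from the sample output: perf 117 at 3.30 GHz.
    for label, perf in (("highest", 166),
                        ("lowest non-linear", 39),
                        ("lowest", 15)):
        mhz = perf_to_freq_mhz(perf, 117, 3300)
        print(f"{label}: {mhz:.0f} MHz")
```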
0330 
0331 Diagnostics and Tuning
0332 =======================
0333 
0334 Trace Events
0335 --------------
0336 
0337 There are two static trace events that can be used for ``amd-pstate``
0338 diagnostics. One of them is the ``cpu_frequency`` trace event generally used
0339 by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
0340 specific to ``amd-pstate``.  The following sequence of shell commands can
0341 be used to enable them and see their output (if the kernel is
0342 configured to support event tracing). ::
0343 
0344  root@hr-test1:/home/ray# cd /sys/kernel/tracing/
0345  root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
0346  root@hr-test1:/sys/kernel/tracing# cat trace
0347  # tracer: nop
0348  #
0349  # entries-in-buffer/entries-written: 47827/42233061   #P:2
0350  #
0351  #                                _-----=> irqs-off
0352  #                               / _----=> need-resched
0353  #                              | / _---=> hardirq/softirq
0354  #                              || / _--=> preempt-depth
0355  #                              ||| /     delay
0356  #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
0357  #              | |         |   ||||      |         |
0358           <idle>-0       [015] dN...  4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
0359           <idle>-0       [007] d.h..  4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
0360              cat-2161    [000] d....  4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
0361             sshd-2125    [004] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
0362           <idle>-0       [007] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
0363           <idle>-0       [003] d.s..  4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
0364           <idle>-0       [011] d.s..  4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true
0365 
0366 The ``cpu_frequency`` trace event will be triggered either by the ``schedutil`` scaling
0367 governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
0368 policies with other scaling governors).
0369 
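For quick analysis outside the tracer tool, the ``key=value`` fields of an
``amd_pstate_perf`` line can be pulled apart with a regular expression. The
sketch below is a hypothetical helper, not part of the kernel tooling, and it
assumes the field layout matches the sample output above.

```python
import re

# Matches fields such as "amd_min_perf=85" or "changed=false".
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_amd_pstate_perf(line):
    """Parse one trace line into a dict, or return None if it is not
    an amd_pstate_perf event."""
    marker = "amd_pstate_perf:"
    if marker not in line:
        return None
    payload = line.split(marker, 1)[1]
    event = {}
    for key, value in _FIELD.findall(payload):
        if value in ("true", "false"):
            event[key] = (value == "true")
        elif value.isdigit():
            event[key] = int(value)
        else:
            event[key] = value
    return event


if __name__ == "__main__":
    sample = ("<idle>-0       [015] dN...  4995.979886: amd_pstate_perf: "
              "amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 "
              "changed=false fast_switch=true")
    print(parse_amd_pstate_perf(sample))
```

Lines that are not ``amd_pstate_perf`` events, such as the header comments in
the trace file, yield ``None`` and can simply be skipped.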
0370 
0371 Tracer Tool
0372 -------------
0373 
``amd_pstate_trace.py`` can record and parse the ``amd-pstate`` trace log, then
generate performance plots. This utility can be used to debug and tune the
performance of the ``amd-pstate`` driver. The tracer tool needs to import the
Intel pstate tracer.

The tracer tool is located in ``linux/tools/power/x86/amd_pstate_tracer``. It
can be used in two ways. If a trace file is available, parse the file directly
with the command ::
0382 
0383  ./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name>
0384 
Alternatively, generate a trace file with root privileges, then parse and plot
it with the command ::
0386 
0387  sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes]
0388 
The test results can be found in ``results/test_name``. The following is an
excerpt of the output. ::
0391 
0392  common_cpu  common_secs  common_usecs  min_perf  des_perf  max_perf  freq    mperf   apef    tsc       load   duration_ms  sample_num  elapsed_time  common_comm
0393  CPU_005     712          116384        39        49        166       0.7565  9645075 2214891 38431470  25.1   11.646       469         2.496         kworker/5:0-40
0394  CPU_006     712          116408        39        49        166       0.6769  8950227 1839034 37192089  24.06  11.272       470         2.496         kworker/6:0-1264
0395 
0396 
0397 Reference
0398 ===========
0399 
.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
       https://www.amd.com/system/files/TechDocs/24593.pdf

.. [2] Advanced Configuration and Power Interface Specification,
       https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf

.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors,
       https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip