Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that consist
of atom cpus and core cpus. Each cpu type has a dedicated event list. Some
events are available only on core cpus, some only on atom cpus, and some
on both.

The kernel exports two new cpu pmus via sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

A 'cpus' file is created under each directory. For example,

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.

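A script that needs to know which cpus belong to which pmu can expand these
lists itself. The following is a minimal Python sketch, not part of perf. The
helper name parse_cpu_list is hypothetical, and it assumes the file holds a
comma-separated list of single cpus and inclusive ranges, as in the examples
above.

```python
def parse_cpu_list(text):
    """Expand a cpu list such as '0-15' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # An inclusive range, e.g. '16-23'.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# The strings below are the example outputs quoted in this document.
core_cpus = parse_cpu_list("0-15\n")
atom_cpus = parse_cpu_list("16-23\n")
print("core cpus:", core_cpus)
print("atom cpus:", atom_cpus)
```

On a real system the input text would come from reading
/sys/devices/cpu_core/cpus and /sys/devices/cpu_atom/cpus.
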
As before, use perf-list to list the symbolic events.

perf list

inst_retired.any
        [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
inst_retired.any
        [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

'Unit: xxx' is added to the brief description to indicate which pmu
the event belongs to. The same event name can be supported on
different pmus.

Enable hybrid event with a specific pmu

To enable a core-only event or an atom-only event, the following syntax is
supported:

        cpu_core/<event name>/
or
        cpu_atom/<event name>/

For example, to count the 'cycles' event on core cpus:

        perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically

When an event is created and it is available on both atom and core,
two events are created automatically: one for atom and one for core.
Most hardware events and cache events are available on both cpu_core
and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
hybrid platform, the kernel needs to know where the event comes from
(atom or core). The original perf event type PERF_TYPE_HARDWARE can't
carry pmu information, so this type is now extended to be a PMU-aware
type. The PMU type ID is stored in attr.config[63:32].

The PMU type ID is retrieved from sysfs:
/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type

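Reading the type ID from a script can be sketched as follows. This is an
illustration only: the helper name read_pmu_type and its sysfs_root parameter
are hypothetical, and the sketch assumes each 'type' file holds a single
decimal integer.

```python
from pathlib import Path


def read_pmu_type(pmu_name, sysfs_root="/sys/devices"):
    """Return the PMU type ID found in <sysfs_root>/<pmu_name>/type as an int."""
    # sysfs_root is a parameter only so the helper can be tried against a
    # fake directory tree; it is an assumption of this sketch.
    return int(Path(sysfs_root, pmu_name, "type").read_text().strip())


for name in ("cpu_core", "cpu_atom"):
    try:
        print(name, "type =", read_pmu_type(name))
    except OSError:
        # The directory is expected to be missing on non-hybrid systems.
        print(name, "pmu not found")
```
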
The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE:                 0xEEEEEEEE000000AA
                                    AA: hardware event ID
                                    EEEEEEEE: PMU type ID

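The layout above amounts to a shift and an OR. A minimal Python sketch,
assuming the hypothetical helper name hybrid_hw_config and using the PMU type
IDs 4 and 8 only as example values (real IDs come from the sysfs 'type'
files):

```python
PERF_COUNT_HW_CPU_CYCLES = 0    # hardware event ID for cycles
PERF_COUNT_HW_INSTRUCTIONS = 1  # hardware event ID for instructions


def hybrid_hw_config(pmu_type, hw_event_id):
    """Pack a PMU type ID into config[63:32] and the event ID into the low byte."""
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    if not 0 <= hw_event_id <= 0xFF:
        raise ValueError("hardware event ID must fit in the AA byte")
    return (pmu_type << 32) | hw_event_id


# Example PMU type IDs only; they are assumptions of this sketch.
print(hex(hybrid_hw_config(4, PERF_COUNT_HW_CPU_CYCLES)))
print(hex(hybrid_hw_config(8, PERF_COUNT_HW_INSTRUCTIONS)))
```
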
Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be a
PMU-aware type. The PMU type ID is stored in attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE:                 0xEEEEEEEE00DDCCBB
                                    BB: hardware cache ID
                                    CC: hardware cache op ID
                                    DD: hardware cache op result ID
                                    EEEEEEEE: PMU type ID

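The cache layout can be packed the same way, one byte per field. The
following Python sketch uses a hypothetical helper name, hybrid_cache_config.
The small constants mirror the generic cache enums in linux/perf_event.h, and
the PMU type IDs 4 and 8 are example values only.

```python
# Generic cache IDs, ops and results (values as in linux/perf_event.h).
PERF_COUNT_HW_CACHE_L1D = 0
PERF_COUNT_HW_CACHE_L1I = 1
PERF_COUNT_HW_CACHE_OP_READ = 0
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0
PERF_COUNT_HW_CACHE_RESULT_MISS = 1


def hybrid_cache_config(pmu_type, cache_id, op_id, result_id):
    """Build 0xEEEEEEEE00DDCCBB from PMU type (EE), result (DD), op (CC), cache (BB)."""
    for field in (cache_id, op_id, result_id):
        if not 0 <= field <= 0xFF:
            raise ValueError("cache fields must each fit in one byte")
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    return (pmu_type << 32) | (result_id << 16) | (op_id << 8) | cache_id


# L1 data cache read misses on an example PMU with type ID 8.
print(hex(hybrid_cache_config(8, PERF_COUNT_HW_CACHE_L1D,
                              PERF_COUNT_HW_CACHE_OP_READ,
                              PERF_COUNT_HW_CACHE_RESULT_MISS)))
```
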
When a hardware event is enabled without a specified pmu, such as
'perf stat -e cycles -a' (system-wide in this example), two events
are created automatically.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

type 0 is PERF_TYPE_HARDWARE.
0x4 in 0x400000000 indicates the cpu_core pmu.
0x8 in 0x800000000 indicates the cpu_atom pmu (the atom pmu type ID is
not a fixed value, so read it from sysfs).

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
and creates 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).

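Going the other way, a config value from such a dump can be split back into
its PMU type ID and event bits. This is a minimal Python sketch with a
hypothetical helper name, split_hybrid_config. The name table uses the type
IDs from this example (4 and 8); on a real system it would be built from the
sysfs 'type' files.

```python
def split_hybrid_config(config):
    """Return (pmu_type, event_bits) for a PMU-aware attr.config value."""
    pmu_type = (config >> 32) & 0xFFFFFFFF  # config[63:32]
    event_bits = config & 0xFFFFFFFF        # config[31:0]
    return pmu_type, event_bits


# Example mapping only, taken from the dumps in this document.
example_pmu_names = {4: "cpu_core", 8: "cpu_atom"}

for config in (0x400000000, 0x800000000):
    pmu_type, event_bits = split_hybrid_config(config)
    name = example_pmu_names.get(pmu_type, "unknown")
    print(hex(config), "->", name, "event id", event_bits)
```
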
The perf-stat result displays two events:

 Performance counter stats for 'system wide':

           6,744,979      cpu_core/cycles/
           1,965,552      cpu_atom/cycles/

The first 'cycles' is the core event, the second 'cycles' is the atom event.

Thread mode example:

perf-stat reports scaled counts for hybrid events, with a percentage
displayed. The percentage is the event's running time divided by its
enabled time.

For example, 'triad_loop' runs on cpu16 (an atom cpu), yet the scaled
value for core cycles is 233,066,666 and the percentage is 0.43%.

perf stat -e cycles \-- taskset -c 16 ./triad_loop

As before, two events are created.

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x400000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

and

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x800000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

 Performance counter stats for 'taskset -c 16 ./triad_loop':

       233,066,666      cpu_core/cycles/                                              (0.43%)
       604,097,080      cpu_atom/cycles/                                              (99.57%)

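The relation between a raw count, the scaled count and the percentage can be
sketched as follows. This is an illustration, not perf's code: the helper
names are hypothetical, the input numbers are made up, and the integer
rounding is an assumption. It relies only on the TOTAL_TIME_ENABLED and
TOTAL_TIME_RUNNING values requested in read_format above.

```python
def running_percentage(time_enabled, time_running):
    """Percentage of the enabled time during which the event was counting."""
    if time_enabled == 0:
        return 0.0
    return 100.0 * time_running / time_enabled


def scaled_count(raw_count, time_enabled, time_running):
    """Extrapolate a raw count to the full enabled time.

    Returns None when the event never ran, since there is nothing to scale.
    """
    if time_running == 0:
        return None
    return raw_count * time_enabled // time_running


# Hypothetical values: an event that ran for 4,300 ns out of 1,000,000 ns.
raw, enabled, running = 1_000, 1_000_000, 4_300
print(scaled_count(raw, enabled, running))
print(f"({running_percentage(enabled, running):.2f}%)")
```

A small percentage therefore means the displayed count is mostly
extrapolation, which is what happens to the core event when the task is
pinned to an atom cpu.
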
perf-record:

If no '-e' is specified for perf record on a hybrid platform, it creates
two default 'cycles' events and adds them to the event list: one for
core, the other for atom.

perf-stat:

If no '-e' is specified for perf stat on a hybrid platform, then besides
the software events, the following events are created and added to the
event list in order:

cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/

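The ordering of that list is "each event, first on cpu_core, then on
cpu_atom". A short Python sketch that reproduces it, for example to build an
explicit '-e' argument, is given here. The function name
default_hybrid_events is hypothetical and this is not how perf itself builds
the list.

```python
HYBRID_PMUS = ["cpu_core", "cpu_atom"]
DEFAULT_HW_EVENTS = ["cycles", "instructions", "branches", "branch-misses"]


def default_hybrid_events(events=DEFAULT_HW_EVENTS, pmus=HYBRID_PMUS):
    """Return '<pmu>/<event>/' strings, iterating over events first, then pmus."""
    return [f"{pmu}/{event}/" for event in events for pmu in pmus]


# Joined with commas, the result can be passed to 'perf stat -e'.
print(",".join(default_hybrid_events()))
```
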
Both perf-stat and perf-record also support enabling a hybrid event
with a specific pmu.

e.g.
perf stat -e cpu_core/cycles/
perf stat -e cpu_atom/cycles/
perf stat -e cpu_core/r1a/
perf stat -e cpu_atom/L1-icache-loads/
perf stat -e cpu_core/cycles/,cpu_atom/instructions/
perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' returns a warning and
disables grouping, because the pmus in the group do not match
(cpu_core vs. cpu_atom).
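
The grouping rule can be checked in a script before invoking perf. The
following Python sketch only illustrates the rule stated above and is not
perf's implementation. The helper names pmu_of and group_pmus_match are
hypothetical, and events are assumed to be given in the '<pmu>/<event>/'
form.

```python
def pmu_of(event):
    """Return the pmu prefix of '<pmu>/<event>/', or None if there is none."""
    name, sep, _rest = event.partition("/")
    return name if sep else None


def group_pmus_match(events):
    """True when every event in the group names the same pmu."""
    return len({pmu_of(event) for event in events}) <= 1


print(group_pmus_match(["cpu_core/cycles/", "cpu_core/instructions/"]))
print(group_pmus_match(["cpu_core/cycles/", "cpu_atom/instructions/"]))
```
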