Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that consist
of atom cpus and core cpus. Each cpu type has a dedicated event list. Some
events are available only on core cpus, some only on atom cpus, and some
on both.

The kernel exports two new cpu pmus via sysfs:
  /sys/devices/cpu_core
  /sys/devices/cpu_atom

A 'cpus' file is created under each directory. For example,

  cat /sys/devices/cpu_core/cpus
  0-15

  cat /sys/devices/cpu_atom/cpus
  16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
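
As an illustration, the 'cpus' contents shown above can be expanded into
explicit cpu numbers. This is a minimal sketch, not part of perf: the helper
names parse_cpu_list and pmu_of_cpu are hypothetical, and the handling of
comma-separated lists goes beyond the single-range example in this document.

```python
def parse_cpu_list(text):
    """Expand a cpu list such as "0-15" or "0-3,8" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pmu_of_cpu(cpu, pmu_cpus):
    """Return the name of the pmu whose cpu list contains `cpu`, or None."""
    for name, text in pmu_cpus.items():
        if cpu in parse_cpu_list(text):
            return name
    return None


# Contents of the two 'cpus' files from the example above.
example = {"cpu_core": "0-15", "cpu_atom": "16-23"}
print(pmu_of_cpu(3, example))   # cpu_core
print(pmu_of_cpu(16, example))  # cpu_atom
```

On a real hybrid system the two strings would be read from
/sys/devices/cpu_core/cpus and /sys/devices/cpu_atom/cpus instead of being
hard-coded.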

As before, use perf-list to list the symbolic events.

  perf list

  inst_retired.any
        [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
  inst_retired.any
        [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

'Unit: xxx' is added to the brief description to indicate which pmu the
event belongs to. The same event name can be supported on different pmus.

Enable hybrid event with a specific pmu

To enable a core-only or atom-only event, the following syntax is supported:

  cpu_core/<event name>/
or
  cpu_atom/<event name>/

For example, count the 'cycles' event on core cpus:

  perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically

When an event is created without specifying a pmu and the event is
available on both atom and core, two events are created automatically:
one for atom and one for core. Most hardware events and cache events are
available on both cpu_core and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
hybrid platform, the kernel needs to know which pmu the event comes from
(atom or core). The original perf event type PERF_TYPE_HARDWARE can't
carry pmu information, so this type is now extended to be PMU aware.
The PMU type ID is stored in attr.config[63:32].

The PMU type ID is retrieved from sysfs:
  /sys/devices/cpu_atom/type
  /sys/devices/cpu_core/type

The new attr.config layout for PERF_TYPE_HARDWARE:

  PERF_TYPE_HARDWARE:                 0xEEEEEEEE000000AA
                                      AA: hardware event ID
                                      EEEEEEEE: PMU type ID
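
The layout above can be sketched as a small encoder. This is only an
illustration under stated assumptions: hw_config and read_pmu_type are
hypothetical helper names, not perf or kernel functions, and the type IDs
4 and 8 are the example values used in this document.

```python
PMU_SHIFT = 32  # the PMU type ID occupies attr.config[63:32]


def hw_config(pmu_type, event_id):
    """Build attr.config for a PMU-aware PERF_TYPE_HARDWARE event."""
    if not 0 <= event_id <= 0xFF:
        raise ValueError("hardware event ID must fit in the AA byte")
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    return (pmu_type << PMU_SHIFT) | event_id


def read_pmu_type(pmu, sysfs_root="/sys/devices"):
    """Read the PMU type ID from sysfs (only meaningful on a hybrid system)."""
    with open(f"{sysfs_root}/{pmu}/type") as f:
        return int(f.read())


# 'cycles' has hardware event ID 0; 4 and 8 are example PMU type IDs.
print(hex(hw_config(4, 0)))  # 0x400000000
print(hex(hw_config(8, 0)))  # 0x800000000
```

On a real system the first argument would come from
read_pmu_type("cpu_core") or read_pmu_type("cpu_atom") rather than a
hard-coded number.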

Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
PMU aware. The PMU type ID is stored in attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

  PERF_TYPE_HW_CACHE:                 0xEEEEEEEE00DDCCBB
                                      BB: hardware cache ID
                                      CC: hardware cache op ID
                                      DD: hardware cache op result ID
                                      EEEEEEEE: PMU type ID
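
The cache layout can be sketched the same way. hw_cache_config is a
hypothetical helper name; the three constants mirror the values of
PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ and
PERF_COUNT_HW_CACHE_RESULT_MISS from the perf_event uapi header, and the
PMU type ID 4 is the cpu_core example value used in this document.

```python
# Values as defined in the perf_event uapi header.
CACHE_L1D = 0      # PERF_COUNT_HW_CACHE_L1D
OP_READ = 0        # PERF_COUNT_HW_CACHE_OP_READ
RESULT_MISS = 1    # PERF_COUNT_HW_CACHE_RESULT_MISS


def hw_cache_config(pmu_type, cache_id, op_id, result_id):
    """Build attr.config for a PMU-aware PERF_TYPE_HW_CACHE event."""
    for name, value in (("cache", cache_id), ("op", op_id), ("result", result_id)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} ID must fit in one byte")
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    # BB = cache ID, CC = op ID, DD = op result ID, EEEEEEEE = PMU type ID
    return (pmu_type << 32) | (result_id << 16) | (op_id << 8) | cache_id


# L1D read misses on the pmu with example type ID 4.
print(hex(hw_cache_config(4, CACHE_L1D, OP_READ, RESULT_MISS)))  # 0x400010000
```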

When a hardware event is enabled without a specified pmu, for example
'perf stat -e cycles -a' (system-wide in this example), two events are
created automatically.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

Both events have type 0, which is PERF_TYPE_HARDWARE.
0x4 in 0x400000000 indicates the cpu_core pmu.
0x8 in 0x800000000 indicates the cpu_atom pmu (the atom pmu type ID is
not fixed and may differ between systems).

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
and creates 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).
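
To make the decoding explicit, the two config values can be split back into
their fields. This is a minimal sketch: split_config is a hypothetical
helper, and the mapping of type IDs 4 and 8 to pmu names simply restates
this example (the IDs on another system may differ).

```python
def split_config(config):
    """Split a PMU-aware attr.config into (PMU type ID, low 32 bits)."""
    pmu_type = config >> 32
    event = config & 0xFFFFFFFF
    return pmu_type, event


# PMU type IDs as seen in this example only.
pmu_names = {4: "cpu_core", 8: "cpu_atom"}

for config in (0x400000000, 0x800000000):
    pmu_type, event = split_config(config)
    print(f"{config:#x}: pmu {pmu_names[pmu_type]} (type {pmu_type}), event {event}")
```

Both values decode to event 0 ('cycles'), which is why the two events
differ only in the upper 32 bits.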

perf-stat displays the results as two events:

  Performance counter stats for 'system wide':

           6,744,979      cpu_core/cycles/
           1,965,552      cpu_atom/cycles/

The first 'cycles' is the core event, the second 'cycles' is the atom event.

Thread mode example:

perf-stat reports the scaled counts for hybrid events, together with a
percentage. The percentage is the event's running time divided by its
enabled time.

In this example, 'triad_loop' runs on cpu16 (an atom cpu), and the scaled
value for core cycles is 233,066,666 with a percentage of 0.43%.

  perf stat -e cycles \-- taskset -c 16 ./triad_loop

As before, two events are created.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------

  Performance counter stats for 'taskset -c 16 ./triad_loop':

         233,066,666      cpu_core/cycles/                                              (0.43%)
         604,097,080      cpu_atom/cycles/                                              (99.57%)
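
The relation between the raw count, the scaled count and the percentage can
be sketched as follows. The percentage formula is the one stated above
(running time divided by enabled time); the scaling rule (raw count
multiplied by enabled/running) is the usual perf convention and is an
assumption as far as this document goes. scale_count is a hypothetical
helper and the input numbers are made up for illustration; they are not
taken from the run above.

```python
def scale_count(raw, enabled, running):
    """Return (scaled count, percentage) from TOTAL_TIME_ENABLED/RUNNING values.

    Returns (None, 0.0) when the event never ran, since no estimate is possible.
    """
    if running == 0:
        return None, 0.0
    percentage = 100.0 * running / enabled
    if running < enabled:
        # The event ran only part of the time: extrapolate to the full period.
        scaled = round(raw * enabled / running)
    else:
        scaled = raw
    return scaled, percentage


# Made-up values: the event was scheduled for a quarter of the enabled time.
scaled, pct = scale_count(raw=1000, enabled=2_000_000, running=500_000)
print(scaled, f"({pct:.2f}%)")  # 4000 (25.00%)
```

A small percentage, as for cpu_core/cycles/ in the output above, means the
reported count is extrapolated from a short running time.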

perf-record:

If no '-e' is specified in perf record on a hybrid platform, it creates
two default 'cycles' events and adds them to the event list. One is for
core, the other is for atom.

perf-stat:

If no '-e' is specified in perf stat on a hybrid platform, then besides
the software events, the following events are created and added to the
event list in order:

  cpu_core/cycles/,
  cpu_atom/cycles/,
  cpu_core/instructions/,
  cpu_atom/instructions/,
  cpu_core/branches/,
  cpu_atom/branches/,
  cpu_core/branch-misses/,
  cpu_atom/branch-misses/

Both perf-stat and perf-record also support enabling a hybrid event with
a specific pmu.

e.g.
  perf stat -e cpu_core/cycles/
  perf stat -e cpu_atom/cycles/
  perf stat -e cpu_core/r1a/
  perf stat -e cpu_atom/L1-icache-loads/
  perf stat -e cpu_core/cycles/,cpu_atom/instructions/
  perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' returns a warning and
disables grouping, because the pmus in the group do not match
(cpu_core vs. cpu_atom).
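
The grouping rule can be illustrated with a small check that collects the
pmu names used in a group string. This is only a sketch of the idea, not
how perf parses events: pmus_in_group and group_is_consistent are
hypothetical helpers, and the regular expression only handles the
'pmu/event/' form used in the examples above.

```python
import re


def pmus_in_group(group):
    """Return the set of pmu names used in a '{pmu/event/,pmu/event/}' string."""
    inner = group.strip().strip("{}")
    return set(re.findall(r"(\w+)/[^/]*/", inner))


def group_is_consistent(group):
    """True when all events in the group name the same pmu."""
    return len(pmus_in_group(group)) <= 1


print(group_is_consistent("{cpu_core/cycles/,cpu_core/instructions/}"))  # True
print(group_is_consistent("{cpu_core/cycles/,cpu_atom/instructions/}"))  # False
```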