=========================================
user_events: User-based Event Tracing
=========================================

:Author: Beau Belgrave

Overview
--------
User-based trace events allow user processes to create events and trace data
that can be viewed via existing tools, such as ftrace and perf.
To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.

Programs can view the status of the events via
/sys/kernel/debug/tracing/user_events_status and can both register and write
data out via /sys/kernel/debug/tracing/user_events_data.

Programs can also use /sys/kernel/debug/tracing/dynamic_events to register and
delete user-based events via the u: prefix. The format of the command to
dynamic_events is the same as the ioctl with the u: prefix applied.

Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process gives back two ints to the program for each event. The first int is the
status index. This index describes which byte in the
/sys/kernel/debug/tracing/user_events_status file represents this event. The
second int is the write index. This index identifies the event when write() or
writev() is called on the /sys/kernel/debug/tracing/user_events_data file.

The structures referenced in this document are contained within the
/include/uapi/linux/user_events.h file in the source tree.

**NOTE:** *Both user_events_status and user_events_data are under the tracefs
filesystem and may be mounted at different paths than above.*

Registering
-----------
Registering within a user process is done via ioctl() out to the
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
DIAG_IOCSREG.

This command takes a struct user_reg as an argument::

  struct user_reg {
      u32 size;
      u64 name_args;
      u32 status_index;
      u32 write_index;
  };

The struct user_reg requires two inputs. The first is the size of the
structure, which ensures forward and backward compatibility. The second
(name_args) is the address of the command string to issue for registering.
Upon success two outputs are set: the status index and the write index.
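
The registration call described above can be sketched as follows. This is a
minimal illustration rather than the reference implementation: struct user_reg
is mirrored locally with fixed-width types (an assumption, so the sketch
compiles without the kernel header), the ioctl request value is passed in by
the caller instead of being defined here, and fill_user_reg() and
register_user_event() are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Assumption: a local mirror of struct user_reg as shown above. Real code
 * should include linux/user_events.h and use its definition and layout.
 */
struct user_reg {
    uint32_t size;
    uint64_t name_args;
    uint32_t status_index;
    uint32_t write_index;
};

/* Hypothetical helper: fill in the two inputs the kernel expects. */
static void fill_user_reg(struct user_reg *reg, const char *command)
{
    memset(reg, 0, sizeof(*reg));
    reg->size = sizeof(*reg);
    reg->name_args = (uint64_t)(uintptr_t)command;
}

/*
 * Hypothetical helper: register an event on an already opened
 * user_events_data fd. "request" is expected to be DIAG_IOCSREG from the
 * kernel header. Returns 0 on success, -1 on failure (errno set by ioctl()).
 */
static int register_user_event(int fd, unsigned long request,
                               const char *command,
                               uint32_t *status_index, uint32_t *write_index)
{
    struct user_reg reg;

    fill_user_reg(&reg, command);

    if (ioctl(fd, request, &reg) == -1)
        return -1;

    /* Outputs are only meaningful after a successful ioctl(). */
    *status_index = reg.status_index;
    *write_index = reg.write_index;
    return 0;
}
```

A caller would typically open user_events_data, pass DIAG_IOCSREG as the
request together with a command string (for instance the made-up
"test u32 count"), and keep the fd open for later writes.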

User-based events show up under tracefs like any other event, under the
subsystem named "user_events". This means tools that wish to attach to the
events need to use /sys/kernel/debug/tracing/events/user_events/[name]/enable
or perf record -e user_events:[name] when attaching/recording.

**NOTE:** *The write_index returned is only valid for the FD that was used.*

Command Format
^^^^^^^^^^^^^^
The command string format is as follows::

  name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]

Supported Flags
^^^^^^^^^^^^^^^
None yet.

Field Format
^^^^^^^^^^^^
::

  type name [size]

Basic types are supported (__data_loc, u32, u64, int, char, char[20], etc).
User programs are encouraged to use clearly sized types like u32.

**NOTE:** *Long is not supported since size can vary between user and kernel.*

The size is only valid for types that start with a struct prefix.
This allows user programs to describe custom structs out to tools, if required.

For example, a struct in C that looks like this::

  struct mytype {
      char data[20];
  };

Would be represented by the following field::

  struct mytype myname 20
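
To make the command and field formats concrete, the following sketch joins an
event name and a list of field descriptions into one command string. The
helper name build_command(), the event name "test" and the field "u32 count"
are made up for illustration. No flags are emitted, since none are supported
yet.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "name Field1;Field2..." into buf.
 * Returns 0 on success and -1 if buf is too small.
 */
static int build_command(char *buf, size_t len, const char *name,
                         const char *const *fields, size_t nfields)
{
    size_t used;
    size_t i;
    int n;

    n = snprintf(buf, len, "%s", name);
    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    for (i = 0; i < nfields; i++) {
        /* A space separates the name from the fields, ';' the fields. */
        n = snprintf(buf + used, len - used, "%s%s",
                     i == 0 ? " " : ";", fields[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }

    return 0;
}
```

With the fields "u32 count" and "struct mytype myname 20" this yields
"test u32 count;struct mytype myname 20". Per the overview, the same string
with a u: prefix is what dynamic_events expects.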

Deleting
--------
Deleting an event from within a user process is done via ioctl() out to the
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
DIAG_IOCSDEL.

This command only requires a single string specifying the event to delete by
its name. Delete will only succeed if there are no references left to the
event (in both user and kernel space). Because of this, user programs should
request deletes through a separate file from the one used for registration.
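
A delete request might look like the sketch below. It opens its own file
descriptor for the request, in line with the advice above. The helper name
delete_user_event() is hypothetical, and the ioctl request value is passed in
by the caller (expected to be DIAG_IOCSDEL from linux/user_events.h) rather
than defined here.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to delete an event by name using a
 * file descriptor opened only for this request.
 * Returns 0 on success and -1 on failure (errno from open() or ioctl()).
 */
static int delete_user_event(const char *data_path, unsigned long request,
                             const char *name)
{
    int saved_errno;
    int ret;
    int fd;

    fd = open(data_path, O_RDWR);
    if (fd == -1)
        return -1;

    ret = ioctl(fd, request, name);

    /* Keep the ioctl() error visible even if close() changes errno. */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return ret == -1 ? -1 : 0;
}
```

The delete is still expected to fail while any other descriptor or tool holds
a reference to the event, so callers should be prepared to retry after those
references are dropped.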

Status
------
When tools attach/record user-based events the status of the event is updated
in realtime. This allows user programs to only incur the cost of the write() or
writev() calls when something is actively attached to the event.

User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
check the status for each event that is registered. The byte to check in the
file is given back after the register ioctl() via user_reg.status_index.
Currently the size of user_events_status is a single page; however, custom
kernel configurations can change this size to allow more user-based events. In
all cases the size of the file is a multiple of the page size.

For example, if the register ioctl() gives back a status_index of 3 you would
check byte 3 of the returned mmap data to see if anything is attached to that
event.
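
The status check described above can be sketched as follows. The helper names
map_status_page() and event_is_enabled() are hypothetical. The sketch assumes
the default single-page status file and that a read-only open is sufficient
for a read-only mapping, so adjust both if your kernel configuration differs.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first page of the status file read-only.
 * Returns NULL on failure. The mapping remains valid after the fd is closed.
 */
static const volatile unsigned char *map_status_page(const char *status_path)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *page;
    int fd;

    if (page_size <= 0)
        return NULL;

    fd = open(status_path, O_RDONLY);
    if (fd == -1)
        return NULL;

    page = mmap(NULL, (size_t)page_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);

    if (page == MAP_FAILED)
        return NULL;

    return (const volatile unsigned char *)page;
}

/* Non-zero when anything is attached to the event at status_index. */
static int event_is_enabled(const volatile unsigned char *page,
                            unsigned int status_index)
{
    return page[status_index] != 0;
}
```

The page is declared volatile because the kernel updates it while the program
runs, so each check needs to read the byte again.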

Administrators can easily check the status of all registered events by reading
the user_events_status file directly via a terminal. The output is as follows::

  Byte:Name [# Comments]
  ...

  Active: ActiveCount
  Busy: BusyCount
  Max: MaxCount

For example, on a system that has a single event the output looks like this::

  1:test

  Active: 1
  Busy: 0
  Max: 4096

If a user enables the user event via ftrace, the output would change to this::

  1:test # Used by ftrace

  Active: 1
  Busy: 1
  Max: 4096

**NOTE:** *A status index of 0 will never be returned. This allows user
programs to have an index that can be used on error cases.*

Status Bits
^^^^^^^^^^^
The byte being checked will be non-zero if anything is attached. Programs can
check specific bits in the byte to see what mechanism has been attached.

The following values are defined to aid in checking what has been attached:

**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).

**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1).
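
As an illustration of the bit checks, the sketch below tests one status byte
against the two values listed above. The macro definitions here are local
stand-ins derived from the stated bit positions (an assumption; real code
should take them from linux/user_events.h), and the helper names are
hypothetical.

```c
/*
 * Assumption: local definitions following the bit positions given above
 * (bit 0 for ftrace, bit 1 for perf).
 */
#ifndef EVENT_STATUS_FTRACE
#define EVENT_STATUS_FTRACE (1 << 0)
#endif
#ifndef EVENT_STATUS_PERF
#define EVENT_STATUS_PERF (1 << 1)
#endif

/* Hypothetical helper: true when ftrace is attached to the event. */
static int attached_to_ftrace(unsigned char status)
{
    return (status & EVENT_STATUS_FTRACE) != 0;
}

/* Hypothetical helper: true when perf is attached to the event. */
static int attached_to_perf(unsigned char status)
{
    return (status & EVENT_STATUS_PERF) != 0;
}
```

Programs that only care whether anything at all is attached can keep testing
the whole byte against zero and skip the per-bit checks.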

Writing Data
------------
After registering an event the same fd that was used to register can be used
to write an entry for that event. The write_index returned must be at the start
of the data, then the remaining data is treated as the payload of the event.

For example, if the write_index returned was 1 and the payload of the event is
a single int, then the data would have to be 8 bytes (2 ints) in size, with the
first 4 bytes being equal to 1 and the last 4 bytes being equal to the value of
the payload.

In memory this would look like this::

  int index;
  int payload;
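
The two-int layout above can be written with a single write() call, as in the
following sketch. The helper name write_int_event() is hypothetical, and the
fd is assumed to be the one the event was registered on.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: emit an event whose payload is a single int.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t write_int_event(int fd, int write_index, int payload)
{
    int data[2];

    data[0] = write_index;  /* the index must come first */
    data[1] = payload;      /* everything after it is the payload */

    return write(fd, data, sizeof(data));
}
```

On platforms where int is 4 bytes, a successful call reports 8 bytes written,
matching the description above.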

User programs might have well-known structs that they wish to emit as
payloads. In those cases writev() can be used, with the first vector being
the index and the following vector(s) being the actual event payload.

For example, given a struct like this::

  struct payload {
      int src;
      int dst;
      int flags;
  };

It's advised for user programs to do the following::

  struct iovec io[2];
  struct payload e;  /* filled in by the program before writing */

  /* First vector: the write_index returned by the register ioctl() */
  io[0].iov_base = &write_index;
  io[0].iov_len = sizeof(write_index);
  /* Second vector: the event payload */
  io[1].iov_base = &e;
  io[1].iov_len = sizeof(e);

  writev(fd, io, 2);

**NOTE:** *The write_index is not emitted out into the trace being recorded.*
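
Combining the status byte with writev() gives the pattern the Status section
describes, where the write cost is only paid when something is attached. The
sketch below is one way to express it. The helper name emit_if_enabled() is
hypothetical, and status_page is assumed to be the mmap()ed
user_events_status data.

```c
#include <sys/types.h>
#include <sys/uio.h>

struct payload {
    int src;
    int dst;
    int flags;
};

/*
 * Hypothetical helper: write the payload only when the status byte for the
 * event is non-zero. Returns 0 when nothing is attached (no system call is
 * made), otherwise the result of writev().
 */
static ssize_t emit_if_enabled(int fd,
                               const volatile unsigned char *status_page,
                               unsigned int status_index, int write_index,
                               const struct payload *e)
{
    struct iovec io[2];

    if (status_page[status_index] == 0)
        return 0;

    /* First vector: the write index. Second vector: the payload. */
    io[0].iov_base = &write_index;
    io[0].iov_len = sizeof(write_index);
    io[1].iov_base = (void *)e;  /* writev() only reads from this buffer */
    io[1].iov_len = sizeof(*e);

    return writev(fd, io, 2);
}
```

A return value of 0 therefore means the event was skipped, while a positive
value is the total number of bytes handed to the kernel, index included.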

Example Code
------------
See sample code in samples/user_events.