=========================================
user_events: User-based Event Tracing
=========================================

:Author: Beau Belgrave

Overview
--------
User-based trace events allow user processes to create events and trace data
that can be viewed via existing tools, such as ftrace and perf.
To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.

Programs can view the status of the events via
/sys/kernel/debug/tracing/user_events_status and can both register and write
data out via /sys/kernel/debug/tracing/user_events_data.

Programs can also use /sys/kernel/debug/tracing/dynamic_events to register and
delete user-based events via the u: prefix. The format of the command written
to dynamic_events is the same as the ioctl command string, with the u: prefix
applied.
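
As a minimal sketch, assuming the caller has already opened dynamic_events for writing, registering through this interface amounts to writing the command string with the u: prefix in front. The helper name dyn_event_register() and its 512-byte buffer limit are hypothetical choices for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "u:<cmd>" to fd, which the caller opened on
 * the tracefs dynamic_events file. Returns 0 on success, -1 on error.
 */
static int dyn_event_register(int fd, const char *cmd)
{
	char buf[512];
	int len = snprintf(buf, sizeof(buf), "u:%s", cmd);

	/* Refuse to send a truncated command */
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;

	/* The whole command must go out in a single write() */
	return write(fd, buf, (size_t)len) == len ? 0 : -1;
}
```

A caller would typically open the file with O_WRONLY | O_APPEND, since opening it with O_TRUNC (like a shell ``>`` redirect) may clear existing dynamic events, and then call, for example, dyn_event_register(fd, "test u32 count").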

Typically, programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process gives back two ints to the program for each event. The first int is the
status index, which describes which byte in the
/sys/kernel/debug/tracing/user_events_status file represents this event. The
second int is the write index, which identifies the event when data is passed
to write() or writev() on the /sys/kernel/debug/tracing/user_events_data file.

The structures referenced in this document are contained within the
include/uapi/linux/user_events.h file in the source tree.

**NOTE:** *Both user_events_status and user_events_data are under the tracefs
filesystem and may be mounted at different paths than above.*

Registering
-----------
Registering within a user process is done via ioctl() on the
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
DIAG_IOCSREG.

This command takes a struct user_reg as an argument::

  struct user_reg {
        u32 size;
        u64 name_args;
        u32 status_index;
        u32 write_index;
  };

The struct user_reg requires two inputs: the first is the size of the
structure, which ensures forward and backward compatibility; the second,
name_args, holds a pointer to the command string to issue for registering.
Upon success, two outputs are set: the status index and the write index.
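
The registration steps above can be sketched as follows. This is a minimal sketch, assuming the struct user_reg layout shown above: it re-declares the struct locally only so the example compiles on its own, and real code should include the uapi header instead. The helper register_event() is hypothetical, and the DIAG_IOCSREG request value is passed in as a parameter for the same reason.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of the layout documented above (assumption; prefer the
 * uapi user_events.h header in real code). */
struct user_reg {
	uint32_t size;          /* input: size of this struct */
	uint64_t name_args;     /* input: pointer to the command string */
	uint32_t status_index;  /* output: byte in user_events_status */
	uint32_t write_index;   /* output: prefix for write()/writev() */
};

/*
 * Hypothetical helper. reg_request should be DIAG_IOCSREG; it is a
 * parameter only so this sketch builds without the uapi header.
 * Returns 0 on success, or -1 with errno set by ioctl().
 */
static int register_event(int data_fd, unsigned long reg_request,
			  const char *cmd, uint32_t *status_index,
			  uint32_t *write_index)
{
	struct user_reg reg;

	memset(&reg, 0, sizeof(reg));
	reg.size = sizeof(reg);
	reg.name_args = (uint64_t)(uintptr_t)cmd;

	if (ioctl(data_fd, reg_request, &reg) == -1)
		return -1;

	*status_index = reg.status_index;
	*write_index = reg.write_index;
	return 0;
}
```

With the header available, a call would look like register_event(data_fd, DIAG_IOCSREG, "test u32 count", &status_index, &write_index). The outputs are only copied on success, so the caller's variables keep their initial values on failure.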

User-based events show up under tracefs like any other event, under the
subsystem named "user_events". This means tools that wish to attach to the
events need to use /sys/kernel/debug/tracing/events/user_events/[name]/enable
or perf record -e user_events:[name] when attaching/recording.
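
For the ftrace route, a minimal sketch of building the enable path and toggling it follows. The helper names are hypothetical, and the tracefs mount point is a parameter because, as noted above, tracefs may be mounted somewhere other than /sys/kernel/debug/tracing.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "<tracefs>/events/user_events/<name>/enable".
 * Returns 0 on success, -1 if buf is too small.
 */
static int user_event_enable_path(char *buf, size_t len, const char *tracefs,
				  const char *name)
{
	int n = snprintf(buf, len, "%s/events/user_events/%s/enable",
			 tracefs, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to attach ftrace to the event or "0" to detach it, using an
 * fd the caller opened (O_WRONLY) on the path built above. */
static int set_event_enabled(int enable_fd, int on)
{
	return write(enable_fd, on ? "1" : "0", 1) == 1 ? 0 : -1;
}
```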

**NOTE:** *The write_index returned is only valid for the FD that was used to
register the event.*

Command Format
^^^^^^^^^^^^^^
The command string format is as follows::

  name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]
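
A minimal sketch of assembling a command string in this format follows: the name, an optional flag list after a colon, then a space and the fields separated by semicolons. The helper format_user_event_cmd() is hypothetical; since no flags are defined yet, callers would normally pass NULL for flags.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "name[:FLAGS] [Field1[;Field2...]]".
 * flags may be NULL; fields may be empty (nfields == 0).
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int format_user_event_cmd(char *buf, size_t len, const char *name,
				 const char *flags, const char *const *fields,
				 size_t nfields)
{
	size_t used;
	int n;

	n = snprintf(buf, len, "%s%s%s", name, flags ? ":" : "",
		     flags ? flags : "");
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	for (size_t i = 0; i < nfields; i++) {
		/* First field follows a space; later ones are ';' separated */
		n = snprintf(buf + used, len - used, "%s%s",
			     i == 0 ? " " : ";", fields[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

For example, the fields "u32 count" and "char[20] msg" for an event named test produce "test u32 count;char[20] msg".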

Supported Flags
^^^^^^^^^^^^^^^
None yet.

Field Format
^^^^^^^^^^^^
::

  type name [size]

Basic types are supported (__data_loc, u32, u64, int, char, char[20], etc.).
User programs are encouraged to use clearly sized types like u32.

**NOTE:** *Long is not supported since its size can vary between user and
kernel.*

The size is only valid for types that start with a struct prefix.
This allows user programs to describe custom structs out to tools, if required.

For example, a struct in C that looks like this::

  struct mytype {
    char data[20];
  };

would be represented by the following field::

  struct mytype myname 20
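
To keep such a field description in sync with the C definition, the size can be taken from sizeof rather than written by hand. A minimal sketch; format_struct_field() and the STRUCT_FIELD macro are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* The example struct from above */
struct mytype {
	char data[20];
};

/* Hypothetical helper: describe a field as "struct <type> <name> <size>" */
static int format_struct_field(char *buf, size_t len, const char *type,
			       const char *name, size_t size)
{
	int n = snprintf(buf, len, "struct %s %s %zu", type, name, size);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Stringify the type name and take the size from the real definition */
#define STRUCT_FIELD(buf, len, type, name) \
	format_struct_field(buf, len, #type, name, sizeof(struct type))
```

STRUCT_FIELD(buf, sizeof(buf), mytype, "myname") yields the field string shown above, and it follows any later change to struct mytype automatically.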

Deleting
--------
Deleting an event from within a user process is done via ioctl() on the
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
DIAG_IOCSDEL.

This command only requires a single string specifying the event to delete by
its name. Delete will only succeed if there are no references left to the
event (in both user and kernel space). For this reason, user programs should
issue deletes on a separate file descriptor from the one used for registration.
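
Following that advice, a minimal sketch of a delete helper opens its own user_events_data descriptor, issues the request and closes it again. The helper delete_event() is hypothetical, and as with registration the DIAG_IOCSDEL value is passed in only so the sketch compiles without the uapi header.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: delete the event "name" through a freshly opened
 * user_events_data fd, not the fd used for registration.
 * del_request should be DIAG_IOCSDEL.
 * Returns 0 on success, or -1 with errno set (e.g. while still referenced).
 */
static int delete_event(const char *data_path, unsigned long del_request,
			const char *name)
{
	int fd, ret, saved_errno;

	fd = open(data_path, O_RDWR);
	if (fd == -1)
		return -1;

	/* The delete command takes the event name string directly */
	ret = ioctl(fd, del_request, name);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;   /* report the ioctl() error, not close()'s */

	return ret == -1 ? -1 : 0;
}
```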

Status
------
When tools attach/record user-based events, the status of the event is updated
in real time. This allows user programs to only incur the cost of the write()
or writev() calls when something is actively attached to the event.

User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
check the status for each event that is registered. The byte to check in the
file is given back after the register ioctl() via user_reg.status_index.
Currently the size of user_events_status is a single page; however, custom
kernel configurations can change this size to allow more user-based events. In
all cases, the size of the file is a multiple of the page size.

For example, if the register ioctl() gives back a status_index of 3, you would
check byte 3 of the returned mmap data to see if anything is attached to that
event.
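
A minimal sketch of this mmap()-and-check pattern follows. The helper names map_status() and event_enabled() are hypothetical, and mapping exactly one page assumes the default size of user_events_status; a kernel configured with a larger status area would need a larger mapping.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one page of user_events_status read-only.
 * Returns NULL on failure. */
static const volatile uint8_t *map_status(const char *path, size_t *len_out)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int fd;

	if (page <= 0)
		return NULL;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return NULL;

	p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);              /* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return NULL;

	*len_out = (size_t)page;
	return p;
}

/*
 * Non-zero when anything is attached to the event at status_index.
 * volatile: the kernel updates these bytes while the program runs.
 */
static int event_enabled(const volatile uint8_t *status, size_t len,
			 uint32_t status_index)
{
	/* 0 is never handed out as a status index; treat it as unregistered */
	if (status_index == 0 || status_index >= len)
		return 0;
	return status[status_index] != 0;
}
```

The program maps the file once after registering and then calls event_enabled() before each write, paying for the system call only when a tool is attached.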

Administrators can easily check the status of all registered events by reading
the user_events_status file directly via a terminal. The output is as follows::

  Byte:Name [# Comments]
  ...

  Active: ActiveCount
  Busy: BusyCount
  Max: MaxCount

For example, on a system that has a single event, the output looks like this::

  1:test

  Active: 1
  Busy: 0
  Max: 4096

If a user enables the user event via ftrace, the output would change to this::

  1:test # Used by ftrace

  Active: 1
  Busy: 1
  Max: 4096
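
For scripted checks, each Byte:Name line can be split with sscanf(). A minimal sketch; the helper parse_status_line() and its 64-byte name buffer are hypothetical choices, not part of the interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one "Byte:Name [# Comments]" line of
 * user_events_status into its status byte and event name.
 * Returns 1 for an event line, 0 otherwise (blank, Active/Busy/Max).
 */
static int parse_status_line(const char *line, unsigned int *byte,
			     char name[64])
{
	/* "%u:" rejects summary lines such as "Active: 1"; "%63s" stops at
	 * the first space, so any "# ..." comment is ignored */
	return sscanf(line, "%u:%63s", byte, name) == 2;
}
```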

**NOTE:** *A status index of 0 will never be returned. This allows user
programs to have an index that can be used in error cases.*

Status Bits
^^^^^^^^^^^
The byte being checked will be non-zero if anything is attached. Programs can
check specific bits in the byte to see what mechanism has been attached.

The following values are defined to aid in checking what has been attached:

**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).

**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1).
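
A minimal sketch of decoding a status byte with these bits follows. The fallback definitions mirror the bit positions listed above and are only used if the uapi header is not included; attached_by() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Fallbacks matching the bits described above (assumption: real code
 * gets these from the uapi header). */
#ifndef EVENT_STATUS_FTRACE
#define EVENT_STATUS_FTRACE (1 << 0)
#endif
#ifndef EVENT_STATUS_PERF
#define EVENT_STATUS_PERF (1 << 1)
#endif

/* Hypothetical helper: describe what is attached, given a status byte */
static const char *attached_by(uint8_t status)
{
	if (status == 0)
		return "nothing";
	if ((status & EVENT_STATUS_FTRACE) && (status & EVENT_STATUS_PERF))
		return "ftrace+perf";
	if (status & EVENT_STATUS_FTRACE)
		return "ftrace";
	if (status & EVENT_STATUS_PERF)
		return "perf";
	return "other";   /* non-zero, but neither of the two bits above */
}
```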

Writing Data
------------
After registering an event, the same fd that was used to register can be used
to write an entry for that event. The write_index returned must be at the start
of the data; the remaining data is treated as the payload of the event.

For example, if the returned write_index was 1 and the event has a single int
payload, then the data written must be 8 bytes (2 ints) in size: the first 4
bytes equal to 1 and the last 4 bytes equal to the payload value.

In memory this would look like this::

  int index;
  int payload;
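
The same layout can be written with a single write() call by wrapping both ints in a struct. A minimal sketch; struct int_event and write_int_event() are hypothetical names.

```c
#include <sys/types.h>
#include <unistd.h>

/* In-memory layout from above: the write_index, then the int payload */
struct int_event {
	int index;
	int payload;
};

/* Hypothetical helper: emit one event carrying a single int payload */
static int write_int_event(int data_fd, int write_index, int value)
{
	struct int_event ev = { .index = write_index, .payload = value };

	/* 8 bytes total; only the payload ends up in the recorded trace */
	return write(data_fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev) ? 0 : -1;
}
```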

User programs might have well-known structs that they wish to emit as
payloads. In those cases writev() can be used, with the first vector being the
index and the following vector(s) being the actual event payload.

For example, given a struct like this::

  struct payload {
        int src;
        int dst;
        int flags;
  };

user programs are advised to do the following::

  #include <sys/uio.h>

  struct iovec io[2];
  struct payload e;

  /* First vector: the write_index returned by the register ioctl() */
  io[0].iov_base = &write_index;
  io[0].iov_len = sizeof(write_index);

  /* Following vector(s): the event payload itself */
  io[1].iov_base = &e;
  io[1].iov_len = sizeof(e);

  writev(fd, io, 2);
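
Combining this with the status check gives the intended pattern: test the status byte first and only pay for writev() when something is attached. A minimal sketch; emit_payload() is a hypothetical helper, and status is assumed to point at the mmap()ed user_events_status data.

```c
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

struct payload {
	int src;
	int dst;
	int flags;
};

/*
 * Hypothetical helper: emit *e only when something is attached.
 * status_index and write_index come from the register ioctl().
 * Returns 1 if written, 0 if skipped, -1 on error.
 */
static int emit_payload(int data_fd, const volatile uint8_t *status,
			uint32_t status_index, uint32_t write_index,
			const struct payload *e)
{
	struct iovec io[2];

	/* Nothing attached: skip the system call entirely */
	if (status[status_index] == 0)
		return 0;

	io[0].iov_base = &write_index;          /* not recorded in the trace */
	io[0].iov_len = sizeof(write_index);
	io[1].iov_base = (void *)e;             /* iov_base is not const */
	io[1].iov_len = sizeof(*e);

	return writev(data_fd, io, 2) == -1 ? -1 : 1;
}
```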

**NOTE:** *The write_index is not emitted into the trace being recorded.*

Example Code
------------
See sample code in samples/user_events.