0001 ==============
0002 OSNOISE Tracer
0003 ==============
0004
0005 In the context of high-performance computing (HPC), the Operating System
0006 Noise (*osnoise*) refers to the interference experienced by an application
0007 due to activities inside the operating system. In the context of Linux,
0008 NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
0009 system. Moreover, hardware-related jobs can also cause noise, for example,
0010 via SMIs.
0011
0012 hwlat_detector is one of the tools used to identify the most complex
0013 source of noise: *hardware noise*.
0014
0015 In a nutshell, the hwlat_detector creates a thread that runs
0016 periodically for a given period. At the beginning of a period, the thread
0017 disables interrupt and starts sampling. While running, the hwlatd
0018 thread reads the time in a loop. As interrupts are disabled, threads,
0019 IRQs, and SoftIRQs cannot interfere with the hwlatd thread. Hence, the
0020 cause of any gap between two different reads of the time roots either on
0021 NMI or in the hardware itself. At the end of the period, hwlatd enables
0022 interrupts and reports the max observed gap between the reads. It also
0023 prints a NMI occurrence counter. If the output does not report NMI
0024 executions, the user can conclude that the hardware is the culprit for
0025 the latency. The hwlat detects the NMI execution by observing
0026 the entry and exit of a NMI.
0027
0028 The osnoise tracer leverages the hwlat_detector by running a
0029 similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing
0030 all the sources of *osnoise* during its execution. Using the same approach
0031 of hwlat, osnoise takes note of the entry and exit point of any
0032 source of interferences, increasing a per-cpu interference counter. The
0033 osnoise tracer also saves an interference counter for each source of
0034 interference. The interference counter for NMI, IRQs, SoftIRQs, and
0035 threads is increased anytime the tool observes these interferences' entry
0036 events. When a noise happens without any interference from the operating
0037 system level, the hardware noise counter increases, pointing to a
0038 hardware-related noise. In this way, osnoise can account for any
0039 source of interference. At the end of the period, the osnoise tracer
0040 prints the sum of all noise, the max single noise, the percentage of CPU
0041 available for the thread, and the counters for the noise sources.
0042
0043 Usage
0044 -----
0045
0046 Write the ASCII text "osnoise" into the current_tracer file of the
0047 tracing system (generally mounted at /sys/kernel/tracing).
0048
0049 For example::
0050
0051 [root@f32 ~]# cd /sys/kernel/tracing/
0052 [root@f32 tracing]# echo osnoise > current_tracer
0053
0054 It is possible to follow the trace by reading the trace file::
0055
0056 [root@f32 tracing]# cat trace
0057 # tracer: osnoise
0058 #
0059 # _-----=> irqs-off
0060 # / _----=> need-resched
0061 # | / _---=> hardirq/softirq
0062 # || / _--=> preempt-depth MAX
0063 # || / SINGLE Interference counters:
0064 # |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+
0065 # TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD
0066 # | | | |||| | | | | | | | | | |
0067 <...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1
0068 <...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3
0069 <...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21
0070 <...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0
0071 <...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41
0072 <...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2
0073 <...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1
0074 <...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19
0075
0076 In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
0077 tracer prints a message at the end of each period for each CPU that is
0078 running an osnoise/ thread. The osnoise specific fields report:
0079
0080 - The RUNTIME IN US reports the amount of time in microseconds that
0081 the osnoise thread kept looping reading the time.
0082 - The NOISE IN US reports the sum of noise in microseconds observed
0083 by the osnoise tracer during the associated runtime.
0084 - The % OF CPU AVAILABLE reports the percentage of CPU available for
0085 the osnoise thread during the runtime window.
0086 - The MAX SINGLE NOISE IN US reports the maximum single noise observed
0087 during the runtime window.
0088 - The Interference counters display how many each of the respective
0089 interference happened during the runtime window.
0090
0091 Note that the example above shows a high number of HW noise samples.
0092 The reason being is that this sample was taken on a virtual machine,
0093 and the host interference is detected as a hardware interference.
0094
0095 Tracer options
0096 ---------------------
0097
0098 The tracer has a set of options inside the osnoise directory, they are:
0099
0100 - osnoise/cpus: CPUs at which a osnoise thread will execute.
0101 - osnoise/period_us: the period of the osnoise thread.
0102 - osnoise/runtime_us: how long an osnoise thread will look for noise.
0103 - osnoise/stop_tracing_us: stop the system tracing if a single noise
0104 higher than the configured value happens. Writing 0 disables this
0105 option.
0106 - osnoise/stop_tracing_total_us: stop the system tracing if total noise
0107 higher than the configured value happens. Writing 0 disables this
0108 option.
0109 - tracing_threshold: the minimum delta between two time() reads to be
0110 considered as noise, in us. When set to 0, the default value will
0111 be used, which is currently 5 us.
0112
0113 Additional Tracing
0114 ------------------
0115
0116 In addition to the tracer, a set of tracepoints were added to
0117 facilitate the identification of the osnoise source.
0118
0119 - osnoise:sample_threshold: printed anytime a noise is higher than
0120 the configurable tolerance_ns.
0121 - osnoise:nmi_noise: noise from NMI, including the duration.
0122 - osnoise:irq_noise: noise from an IRQ, including the duration.
0123 - osnoise:softirq_noise: noise from a SoftIRQ, including the
0124 duration.
0125 - osnoise:thread_noise: noise from a thread, including the duration.
0126
0127 Note that all the values are *net values*. For example, if while osnoise
0128 is running, another thread preempts the osnoise thread, it will start a
0129 thread_noise duration at the start. Then, an IRQ takes place, preempting
0130 the thread_noise, starting a irq_noise. When the IRQ ends its execution,
0131 it will compute its duration, and this duration will be subtracted from
0132 the thread_noise, in such a way as to avoid the double accounting of the
0133 IRQ execution. This logic is valid for all sources of noise.
0134
0135 Here is one example of the usage of these tracepoints::
0136
0137 osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
0138 osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
0139 migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
0140 osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8812 ns interferences 2
0141
0142 In this example, a noise sample of 8 microseconds was reported in the last
0143 line, pointing to two interferences. Looking backward in the trace, the
0144 two previous entries were about the migration thread running after a
0145 timer IRQ execution. The first event is not part of the noise because
0146 it took place one millisecond before.
0147
0148 It is worth noticing that the sum of the duration reported in the
0149 tracepoints is smaller than eight us reported in the sample_threshold.
0150 The reason roots in the overhead of the entry and exit code that happens
0151 before and after any interference execution. This justifies the dual
0152 approach: measuring thread and tracing.