###############
Timerlat tracer
###############

The timerlat tracer aims to help preemptive kernel developers find
sources of wakeup latencies of real-time threads. Like cyclictest,
the tracer sets a periodic timer that wakes up a thread. The thread then
computes a *wakeup latency* value as the difference between the *current
time* and the *absolute time* that the timer was set to expire. The main
goal of timerlat is to trace these latencies in a way that helps kernel
developers find their causes.

Usage
-----

Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

        [root@f32 ~]# cd /sys/kernel/tracing/
        [root@f32 tracing]# echo timerlat > current_tracer

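The step above can be wrapped in a small helper that first checks whether
the running kernel offers the tracer. This is only a sketch: the function
name enable_timerlat and the TRACING_DIR override are assumptions made for
illustration, while available_tracers and current_tracer are the standard
tracefs files.

```shell
# Sketch: enable the timerlat tracer if the running kernel offers it.
# TRACING_DIR is an assumed override; it defaults to the usual tracefs mount.
enable_timerlat() {
    tl_dir="${TRACING_DIR:-/sys/kernel/tracing}"
    # available_tracers lists the tracers built into the kernel.
    if ! grep -qw timerlat "$tl_dir/available_tracers" 2>/dev/null; then
        echo "timerlat is not available in $tl_dir" >&2
        return 1
    fi
    # Selecting the tracer is the same write shown in the example above.
    echo timerlat > "$tl_dir/current_tracer"
}
```

Calling enable_timerlat as root on a kernel built with the timerlat tracer
should have the same effect as the echo command above.
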
It is possible to follow the trace by reading the trace file::

  [root@f32 tracing]# cat trace
  # tracer: timerlat
  #
  #                              _-----=> irqs-off
  #                             / _----=> need-resched
  #                            | / _---=> hardirq/softirq
  #                            || / _--=> preempt-depth
  #                            || /
  #                            ||||             ACTIVATION
  #         TASK-PID      CPU# ||||   TIMESTAMP    ID            CONTEXT                LATENCY
  #            | |         |   ||||      |         |                  |                       |
          <idle>-0       [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
           <...>-867     [000] ....    54.029339: #1     context thread timer_latency     11700 ns
          <idle>-0       [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
           <...>-868     [001] ....    54.029353: #1     context thread timer_latency      9820 ns
          <idle>-0       [000] d.h1    54.030328: #2     context    irq timer_latency       769 ns
           <...>-867     [000] ....    54.030330: #2     context thread timer_latency      3070 ns
          <idle>-0       [001] d.h1    54.030344: #2     context    irq timer_latency       935 ns
           <...>-868     [001] ....    54.030347: #2     context thread timer_latency      4351 ns


The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread. The ACTIVATION
ID field serves to relate the *irq* execution to its respective *thread*
execution.

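As an illustration of how the ACTIVATION ID relates the two lines, the
sketch below pairs each *irq* line with its *thread* line. The function
name pair_timerlat is hypothetical, and the field positions are inferred
from the sample output above rather than from a documented format. Because
the sample shows the same ID on different CPUs, the pairing key is the CPU
plus the ID.

```shell
# Sketch: pair irq and thread timer_latency lines by CPU and activation ID.
# Reads trace text from the given files, or from stdin if none are given.
pair_timerlat() {
    awk '
    # Only consider lines ending in "<context> timer_latency <value> ns".
    NF >= 6 && $(NF-2) == "timer_latency" {
        cpu = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\[[0-9]+\]$/) { cpu = $i; break }
        id = $(NF-5); ctx = $(NF-3); lat = $(NF-1)
        if (ctx == "irq") {
            irq[cpu, id] = lat            # remember the irq-context latency
        } else if (ctx == "thread" && ((cpu, id) in irq)) {
            print cpu, id, "irq=" irq[cpu, id], "thread=" lat
            delete irq[cpu, id]
        }
    }' "$@"
}
```

For instance, ``cat trace | pair_timerlat`` prints one line per activation
with both latencies side by side.
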
The *irq*/*thread* split is important to clarify which context an
unexpectedly high value comes from. The *irq* context can be
delayed by hardware-related actions, such as SMIs, NMIs, IRQs,
or by threads masking interrupts. Once the timer fires, the wakeup
can also be delayed by blocking caused by threads, for example, by
postponing the scheduler execution via preempt_disable(), by the
scheduler execution itself, or by masking interrupts. Threads can also
be delayed by interference from other threads and IRQs.

Tracer options
---------------------

The timerlat tracer is built on top of the osnoise tracer, so its
configuration is also done in the osnoise/ config directory. The
timerlat configs are:

 - cpus: CPUs on which a timerlat thread will execute.
 - timerlat_period_us: the period of the timerlat thread.
 - stop_tracing_us: stop the system tracing if a
   timer latency at the *irq* context higher than the configured
   value happens. Writing 0 disables this option.
 - stop_tracing_total_us: stop the system tracing if a
   timer latency at the *thread* context higher than the configured
   value happens. Writing 0 disables this option.
 - print_stack: save the stack of the IRQ occurrence. The stack is printed
   after the *thread context* event, or at the IRQ handler if *stop_tracing_us*
   is hit.

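The options listed above are plain files, so a configuration can be applied
with a few writes. The following is a minimal sketch; the helper name
configure_timerlat, its argument order, and the TRACING_DIR override are
assumptions for illustration, and the values passed are up to the user.

```shell
# Sketch: apply a timerlat configuration through the osnoise/ directory.
# Arguments: <cpus> <period_us> <stop_tracing_us> <stop_tracing_total_us>
# TRACING_DIR is an assumed override; it defaults to the usual tracefs mount.
configure_timerlat() {
    tl_cfg="${TRACING_DIR:-/sys/kernel/tracing}/osnoise"
    echo "$1" > "$tl_cfg/cpus"                    # CPUs that run a timerlat thread
    echo "$2" > "$tl_cfg/timerlat_period_us"      # period of the timerlat thread
    echo "$3" > "$tl_cfg/stop_tracing_us"         # irq-context threshold, 0 disables
    echo "$4" > "$tl_cfg/stop_tracing_total_us"   # thread-context threshold, 0 disables
}
```

For instance, ``configure_timerlat 0-3 1000 0 25`` would run timerlat threads
on CPUs 0 to 3 with a 1000 us period and stop tracing once a thread-context
latency exceeds 25 us.
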
timerlat and osnoise
----------------------------

The timerlat tracer can also take advantage of the osnoise: trace events.
For example::

        [root@f32 ~]# cd /sys/kernel/tracing/
        [root@f32 tracing]# echo timerlat > current_tracer
        [root@f32 tracing]# echo 1 > events/osnoise/enable
        [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
        [root@f32 tracing]# tail -10 trace
             cc1-87882   [005] d..h...   548.771078: #402268 context    irq timer_latency     13585 ns
             cc1-87882   [005] dNLh1..   548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
             cc1-87882   [005] dNLh2..   548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
             cc1-87882   [005] d...3..   548.771102: thread_noise:      cc1:87882 start 548.771078243 duration 9909 ns
      timerlat/5-1035    [005] .......   548.771104: #402268 context thread timer_latency     39960 ns

In this case, the timer latency has not a single root cause but
several. First, the timer IRQ was delayed for 13 us, which may point
to a long IRQ disabled section (see IRQ stacktrace section). Then the
timer interrupt that wakes up the timerlat thread took 7597 ns, and
the qxl:21 device IRQ took 7139 ns. Finally, the cc1 thread noise took
9909 ns of time before the context switch. Such pieces of evidence
help the developer choose other tracing methods to debug and optimize
the system.

It is worth mentioning that the *duration* values reported
by the osnoise: events are *net* values. For example, the
thread_noise does not include the duration of the overhead caused
by the IRQ execution (the two IRQs above account for 14736 ns). But
the values reported by the timerlat tracer (timer_latency)
are *gross* values.

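Because the osnoise: durations are net values, the IRQ overhead that
thread_noise leaves out can be recovered by adding up the irq_noise events.
A minimal sketch follows; the function name sum_irq_noise is hypothetical,
and it assumes each irq_noise line ends in "duration <value> ns" as in the
sample output above.

```shell
# Sketch: add up the net durations (in ns) of all irq_noise events.
# Reads trace text from the given files, or from stdin if none are given.
sum_irq_noise() {
    awk '
    # The duration value is the second-to-last field of an irq_noise line.
    /irq_noise:/ && NF >= 2 { total += $(NF-1) }
    END { print total + 0 }' "$@"
}
```

Running ``tail -10 trace | sum_irq_noise`` on the example above adds the
local_timer:236 and qxl:21 durations.
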
The art below illustrates a CPU timeline and how the timerlat tracer
observes it at the top and the osnoise: events at the bottom. Each "-"
in the timelines means circa 1 us, and the time moves ==>::

      External     timer irq                   thread
       clock        latency                    latency
       event        13585 ns                   39960 ns
         |             ^                         ^
         v             |                         |
         |-------------|                         |
         |-------------+-------------------------|
                       ^                         ^
  ========================================================================
                    [tmr irq]  [dev irq]
  [another thread...^       v..^       v.......][timerlat/ thread]  <-- CPU timeline
  =========================================================================
                    |-------|  |-------|
                            |--^       v-------|
                            |          |       |
                            |          |       + thread_noise: 9909 ns
                            |          +-> irq_noise: 7139 ns
                            +-> irq_noise: 7597 ns

IRQ stacktrace
---------------------------

The osnoise/print_stack option is helpful for the cases in which a thread
noise is the major factor for the timer latency, because preemption or
IRQs were disabled. For example::

        [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
        [root@f32 tracing]# echo 500 > osnoise/print_stack
        [root@f32 tracing]# echo timerlat > current_tracer
        [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
          insmod-1026    [007] dN.h1..   200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
          insmod-1026    [007] d..h1..   200.202587: #29800 context    irq timer_latency      1616 ns
          insmod-1026    [007] dN.h2..   200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
          insmod-1026    [007] dN.h3..   200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
          insmod-1026    [007] d...3..   200.203444: thread_noise:   insmod:1026 start 200.202586933 duration 838681 ns
      timerlat/7-1001    [007] .......   200.203445: #29800 context thread timer_latency    859978 ns
      timerlat/7-1001    [007] ....1..   200.203446: <stack trace>
  => timerlat_irq
  => __hrtimer_run_queues
  => hrtimer_interrupt
  => __sysvec_apic_timer_interrupt
  => asm_call_irq_on_stack
  => sysvec_apic_timer_interrupt
  => asm_sysvec_apic_timer_interrupt
  => delay_tsc
  => dummy_load_1ms_pd_init
  => do_one_initcall
  => do_init_module
  => __do_sys_finit_module
  => do_syscall_64
  => entry_SYSCALL_64_after_hwframe

In this case, it is possible to see that the thread added the highest
contribution to the *timer latency*. The stack trace, saved during
the timerlat IRQ handler, points to a function named
dummy_load_1ms_pd_init, which contained the following code (on purpose)::

        static int __init dummy_load_1ms_pd_init(void)
        {
                /* Busy-wait for 1 ms with preemption disabled. */
                preempt_disable();
                mdelay(1);
                preempt_enable();
                return 0;
        }
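
Spotting the dominant contribution in a longer trace can be automated. The
sketch below prints the osnoise: event with the largest duration; the
function name largest_noise is hypothetical, and the parsing assumes lines
shaped like the irq_noise and thread_noise samples above.

```shell
# Sketch: report the *_noise event with the largest net duration.
# Reads trace text from the given files, or from stdin if none are given.
largest_noise() {
    awk '
    # osnoise events end in "duration <value> ns".
    NF >= 6 && $(NF-2) == "duration" {
        for (i = 1; i < NF; i++)
            if ($i ~ /_noise:$/ && $(NF-1) + 0 > max) {
                max = $(NF-1) + 0   # duration in ns
                name = $i           # e.g. "thread_noise:"
                src = $(i+1)        # e.g. "insmod:1026"
            }
    }
    END { if (name != "") print name, src, max " ns" }' "$@"
}
```

Running ``largest_noise per_cpu/cpu7/trace`` on the example above would
single out the thread_noise event caused by insmod.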