=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================
.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care about what can be done inside a callback. A
  callback can be called outside the protective scope of RCU.

There are helper functions to help protect against recursion and to make
sure that RCU is watching. These are explained below.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback,
as well as which protections the callback performs itself and does not
require ftrace to handle.

Only one field needs to be set when registering an ftrace_ops with ftrace:

.. code-block:: c

 struct ftrace_ops ops = {
       .func                    = my_callback_func,
       .flags                   = MY_FTRACE_FLAGS,
       .private                 = any_private_data_structure,
 };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above are declared by including the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called is dependent upon architecture and scheduling
of services. The callback itself will have to handle any synchronization if it
must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by functions after unregister_ftrace_function() returns.
Note that to provide this guarantee, unregister_ftrace_function()
may take some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

   void callback_func(unsigned long ip, unsigned long parent_ip,
                      struct ftrace_ops *op, struct pt_regs *regs);

@ip
        This is the instruction pointer of the function that is being traced
        (where the fentry or mcount is within the function).

@parent_ip
        This is the instruction pointer of the function that called the
        function being traced (where the call of the function occurred).

@op
        This is a pointer to the ftrace_ops that was used to register the
        callback. This can be used to pass data to the callback via the
        private pointer.

@regs
        If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        flags are set in the ftrace_ops structure, then this will be pointing
        to the pt_regs structure like it would be if a breakpoint was placed
        at the start of the function where ftrace was tracing. Otherwise it
        either contains garbage, or NULL.
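As an illustration of passing data through the private pointer, the following
is a minimal sketch that compiles in userspace. The struct ftrace_ops and
ftrace_func_t definitions here are stand-ins that model only the three fields
shown in this document; they are not the kernel definitions. struct my_stats
and model_traced_call() are hypothetical names introduced for the example.
In a kernel module the real definitions come from linux/ftrace.h.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-ins (assumptions, NOT the kernel definitions): only the
 * fields shown in this document are modelled.
 */
struct pt_regs;
struct ftrace_ops;
typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
                              struct ftrace_ops *op, struct pt_regs *regs);
struct ftrace_ops {
        ftrace_func_t func;
        unsigned long flags;
        void *private;
};

/* Hypothetical per-ops state that the callback wants to reach. */
struct my_stats {
        unsigned long hits;
        unsigned long last_ip;
        unsigned long last_parent_ip;
};

/* Callback with the prototype documented above. */
static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                             struct ftrace_ops *op, struct pt_regs *regs)
{
        struct my_stats *stats = op->private;

        (void)regs;     /* no SAVE_REGS flag set, so regs is never touched */

        /* A real callback runs on many CPUs at once and would need
         * atomic or per-CPU counters here. */
        stats->hits++;
        stats->last_ip = ip;
        stats->last_parent_ip = parent_ip;
}

static struct my_stats stats;
static struct ftrace_ops ops = {
        .func    = my_callback_func,
        .private = &stats,
};

/* Hypothetical driver: invokes the callback the way ftrace would. */
static void model_traced_call(struct ftrace_ops *op, unsigned long ip,
                              unsigned long parent_ip)
{
        assert(op && op->func);
        op->func(ip, parent_ip, op, NULL);
}
```

Because the ops pointer handed to the callback is the one that was registered,
each ftrace_ops can carry its own state without any global lookup.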

Protect your callback
=====================

As functions can be called from anywhere, and it is possible that a function
called by a callback may also be traced, and call that same callback,
recursion protection must be used. There are two helper functions that
can help in this regard. If you start your code with:

.. code-block:: c

        int bit;

        bit = ftrace_test_recursion_trylock(ip, parent_ip);
        if (bit < 0)
                return;

and end it with:

.. code-block:: c

        ftrace_test_recursion_unlock(bit);

then the code in between will be safe to use, even if it ends up calling a
function that the callback is tracing. Note, on success,
ftrace_test_recursion_trylock() will disable preemption, and
ftrace_test_recursion_unlock() will enable it again (if it was previously
enabled). The instruction pointer (ip) and its parent (parent_ip) are passed to
ftrace_test_recursion_trylock() to record where the recursion happened
(if CONFIG_FTRACE_RECORD_RECURSION is set).

Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for recursion for the callback and no recursion test needs to be done.
But this comes at the expense of slightly more overhead from an extra
function call.

If your callback accesses any data or critical section that requires RCU
protection, it is best to make sure that RCU is "watching", otherwise
that data or critical section will not be protected as expected. In this
case add:

.. code-block:: c

        if (!rcu_is_watching())
                return;

Alternatively, if the FTRACE_OPS_FL_RCU flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for rcu_is_watching() for the callback and no other test needs to be done.
But this comes at the expense of slightly more overhead from an extra
function call.
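To see why the trylock/unlock pair stops runaway recursion, here is a small
userspace model. It is only a sketch: model_recursion_trylock() and
model_recursion_unlock() are hypothetical stand-ins that mimic the calling
convention of the kernel helpers with a single flag, whereas the kernel tracks
recursion per context and also handles preemption. The callback is reduced to
two parameters for brevity.

```c
#include <assert.h>

/*
 * Userspace model (assumption): one flag stands in for the kernel's
 * recursion tracking. Preemption is not modelled.
 */
static int model_in_callback;

static int model_recursion_trylock(unsigned long ip, unsigned long parent_ip)
{
        (void)ip;
        (void)parent_ip;
        if (model_in_callback)
                return -1;      /* already inside the callback: recursion */
        model_in_callback = 1;
        return 0;               /* "bit" handed back to the unlock helper */
}

static void model_recursion_unlock(int bit)
{
        assert(bit == 0);
        model_in_callback = 0;
}

static unsigned long callback_bodies_run;
static unsigned long recursions_rejected;

static void my_callback_func(unsigned long ip, unsigned long parent_ip);

/* Hypothetical traced function: every call to it fires the callback. */
static void model_traced_helper(void)
{
        my_callback_func(0x1000, 0x2000);
}

static void my_callback_func(unsigned long ip, unsigned long parent_ip)
{
        int bit;

        bit = model_recursion_trylock(ip, parent_ip);
        if (bit < 0) {
                recursions_rejected++;
                return;
        }

        callback_bodies_run++;
        model_traced_helper();  /* would recurse forever without the guard */

        model_recursion_unlock(bit);
}
```

In this model each outer invocation runs the callback body once, and the nested
invocation triggered from inside the body returns early instead of recursing.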


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
        If the callback requires reading or modifying the pt_regs
        passed to the callback, then it must set this flag. Registering
        an ftrace_ops with this flag set on an architecture that does not
        support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        Similar to SAVE_REGS, but registering an ftrace_ops on an
        architecture that does not support passing of regs will not fail
        with this flag set. The callback must then check whether regs is
        NULL to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION
        By default, it is expected that the callback can handle recursion.
        But if the callback is not that worried about overhead, then
        setting this bit will add the recursion protection around the
        callback by calling a helper function that will do the recursion
        protection and only call the callback if it did not recurse.

        Note, if this flag is not set, and recursion does occur, it could
        cause the system to crash, and possibly reboot via a triple fault.

        Note, if this flag is set, then the callback will always be called
        with preemption disabled. If it is not set, then it is possible
        (but not guaranteed) that the callback will be called in
        preemptable context.

FTRACE_OPS_FL_IPMODIFY
        Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
        the traced function (have another function called instead of the
        traced function), it requires setting this flag. This is what live
        kernel patching uses. Without this flag the pt_regs->ip cannot be
        modified.

        Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
        registered to any given function at a time.

FTRACE_OPS_FL_RCU
        If this is set, then the callback will only be called by functions
        where RCU is "watching". This is required if the callback function
        performs any rcu_read_lock() operation.

        RCU stops watching when the system goes idle, while a CPU is taken
        down and comes back online, and when going from kernel space to
        user space and back to kernel space. During these transitions,
        a callback may be executed and RCU synchronization will not protect
        it.

FTRACE_OPS_FL_PERMANENT
        If this is set on any ftrace ops, then tracing cannot be disabled by
        writing 0 to the proc sysctl ftrace_enabled. Equally, a callback with
        the flag set cannot be registered if ftrace_enabled is 0.

        Livepatch uses it so as not to lose the function redirection, so the
        system stays protected.
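The NULL check required with FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED can be
sketched as follows. This is a userspace model, not kernel code: struct pt_regs
is a one-field stand-in (the real layout is architecture specific), and the
counters are hypothetical names used only to make the two paths observable.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions, not the kernel definitions). */
struct pt_regs {
        unsigned long ip;       /* the text above refers to pt_regs->ip */
};
struct ftrace_ops;

static unsigned long calls_with_regs;
static unsigned long calls_without_regs;
static unsigned long last_regs_ip;

static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                             struct ftrace_ops *op, struct pt_regs *regs)
{
        (void)ip;
        (void)parent_ip;
        (void)op;

        if (!regs) {
                /* The architecture could not supply registers: take a
                 * path that does not need them. */
                calls_without_regs++;
                return;
        }

        calls_with_regs++;
        last_regs_ip = regs->ip;  /* reading only; changing ip needs IPMODIFY */
}
```

A callback written this way can be registered with the same ftrace_ops on
architectures with and without register saving, at the cost of carrying a
fallback path.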


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or by ip if it is known.

.. code-block:: c

   int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                         int len, int reset);

@ops
        The ops to set the filter with.

@buf
        The string that holds the function filter text.

@len
        The length of the string.

@reset
        Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is non-zero, all functions will be enabled for
tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

   ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

   ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

   ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

   int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to the list of functions not to be traced. This is a
separate list from the filter list, and this function does not modify the
filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is done the same way as clearing the filter list:

.. code-block:: c

  ret = ftrace_set_notrace(&ops, NULL, 0, 1);
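The precedence between the two lists can be summarised with a small userspace
model. It is a sketch under simplifying assumptions: names are matched exactly
(no globs), and struct model_lists, model_list_has() and model_should_trace()
are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one ops' lists: exact names only, no globs. */
struct model_lists {
        const char *const *filter;
        size_t nr_filter;
        const char *const *notrace;
        size_t nr_notrace;
};

static bool model_list_has(const char *const *list, size_t nr,
                           const char *name)
{
        size_t i;

        for (i = 0; i < nr; i++)
                if (strcmp(list[i], name) == 0)
                        return true;
        return false;
}

/* Would the callback be called for the function called "name"? */
static bool model_should_trace(const struct model_lists *l, const char *name)
{
        assert(l && name);

        /* The "notrace" list takes precedence over the "filter" list. */
        if (model_list_has(l->notrace, l->nr_notrace, name))
                return false;

        /* An empty filter list means all functions are enabled. */
        if (l->nr_filter == 0)
                return true;

        return model_list_has(l->filter, l->nr_filter, name);
}
```

With a filter of "schedule" and "try_to_wake_up" and a notrace list holding
only "schedule", the model reports that only try_to_wake_up would call the
callback.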

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch will happen during the
ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, NULL, 0, 1);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter will have a short time where all functions will call
the callback, between the time of the reset and the time of the
new setting of the filter.
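The difference between the two sequences can be made concrete with a
userspace model that records whether an empty filter (meaning "all functions")
was ever visible while the callback was registered. Everything prefixed with
model_ is hypothetical and only mimics the @buf and @reset semantics described
above, with exact names instead of globs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FUNCS 8

/* Hypothetical model of one ops' filter list. */
struct model_filter {
        const char *funcs[MODEL_MAX_FUNCS];
        size_t nr;
        bool registered;        /* the callback is live */
        bool saw_trace_all;     /* empty filter was visible while live */
};

/*
 * Mimics the @buf/@reset semantics of ftrace_set_filter(): the list is
 * only observed in its final state for this call.
 */
static int model_set_filter(struct model_filter *f, const char *buf, int reset)
{
        size_t nr;

        assert(f);
        nr = reset ? 0 : f->nr;
        if (buf) {
                if (nr == MODEL_MAX_FUNCS)
                        return -1;
                f->funcs[nr++] = buf;
        }
        f->nr = nr;
        if (f->registered && f->nr == 0)
                f->saw_trace_all = true;
        return 0;
}

static void model_register(struct model_filter *f)
{
        f->registered = true;
        if (f->nr == 0)
                f->saw_trace_all = true;
}

/* First sequence from the text: replace the filter in one call. */
static bool model_replace_in_one_call(void)
{
        struct model_filter f = { .nr = 0 };

        model_set_filter(&f, "schedule", 1);
        model_register(&f);
        model_set_filter(&f, "try_to_wake_up", 1);
        return f.saw_trace_all;
}

/* Second sequence: reset to "all functions", then add the new one. */
static bool model_reset_then_add(void)
{
        struct model_filter f = { .nr = 0 };

        model_set_filter(&f, "schedule", 1);
        model_register(&f);
        model_set_filter(&f, NULL, 1);
        model_set_filter(&f, "try_to_wake_up", 0);
        return f.saw_trace_all;
}
```

In the model, only the second sequence ever exposes the "all functions" state,
which is why replacing the filter in a single call with a non-zero @reset is
the safer choice when the callback is already registered.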