.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
    :local:
    :depth: 2


Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects, and
as a way to explicitly execute programs in the kernel for their side effects. The
command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue
to be defined in the UAPI header, aliased to the same value.
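
As a minimal sketch (not taken from the kernel sources), the following C code
issues the command directly through ``syscall(2)``. It assumes an
already-loaded program whose file descriptor is ``prog_fd``; the helper names
``ptr_to_u64()`` and ``prog_run_once()`` are hypothetical. Real applications
would more commonly use libbpf's ``bpf_prog_test_run_opts()`` wrapper.

.. code-block:: c

    #define _GNU_SOURCE
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /* bpf_attr carries userspace pointers as 64-bit integers. */
    static __u64 ptr_to_u64(const void *ptr)
    {
        return (__u64)(unsigned long)ptr;
    }

    /*
     * Run an already-loaded BPF program once against @data.
     * Returns 0 and stores the program's return code in *retval on
     * success, or -1 with errno set on failure.
     */
    static int prog_run_once(int prog_fd, const void *data, __u32 data_len,
                             __u32 *retval)
    {
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));  /* unused fields must be zero */
        attr.test.prog_fd = prog_fd;
        attr.test.data_in = ptr_to_u64(data);
        attr.test.data_size_in = data_len;
        attr.test.repeat = 1;

        /* BPF_PROG_RUN and BPF_PROG_TEST_RUN have the same value. */
        if (syscall(__NR_bpf, BPF_PROG_RUN, &attr, sizeof(attr)) < 0)
            return -1;

        *retval = attr.test.retval;
        return 0;
    }

On success, ``*retval`` holds the program's return code; failures (for example
an invalid ``prog_fd``) are reported as -1 with ``errno`` set.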

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel then executes
the program and returns the results to userspace. Note that programs will not
have any side effects while being run in this mode; in particular, packets will
not actually be redirected or dropped, and the program's return code is simply
returned to userspace. A separate mode for live execution of XDP programs is
documented below.
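
To illustrate the input context and packet buffer, here is a sketch of a
non-live XDP test run that passes a ``struct xdp_md`` context alongside the
packet and asks for both the (possibly modified) packet and the final context
back. The ``xdp_test_run`` struct and its helper functions are hypothetical,
and the context layout used (``data`` and ``data_meta`` left at 0, ``data_end``
set to the packet length) is an assumption about what the kernel accepts for
XDP contexts.

.. code-block:: c

    #define _GNU_SOURCE
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /*
     * Everything needed for one non-live XDP test run. attr stores
     * pointers into this struct, so it must not be copied after init.
     */
    struct xdp_test_run {
        union bpf_attr attr;
        struct xdp_md ctx_in;   /* context supplied to the program */
        struct xdp_md ctx_out;  /* context as left by the program */
    };

    static __u64 ptr_to_u64(const void *ptr)
    {
        return (__u64)(unsigned long)ptr;
    }

    static void xdp_test_run_init(struct xdp_test_run *run, int prog_fd,
                                  const void *pkt, __u32 pkt_len,
                                  void *out, __u32 out_len, __u32 ifindex)
    {
        memset(run, 0, sizeof(*run));

        /* Assumed layout: no metadata (data = data_meta = 0). */
        run->ctx_in.data_end = pkt_len;
        run->ctx_in.ingress_ifindex = ifindex;  /* apparent arrival device */

        run->attr.test.prog_fd = prog_fd;
        run->attr.test.data_in = ptr_to_u64(pkt);
        run->attr.test.data_size_in = pkt_len;
        run->attr.test.data_out = ptr_to_u64(out);  /* packet after the run */
        run->attr.test.data_size_out = out_len;
        run->attr.test.ctx_in = ptr_to_u64(&run->ctx_in);
        run->attr.test.ctx_size_in = sizeof(run->ctx_in);
        run->attr.test.ctx_out = ptr_to_u64(&run->ctx_out);
        run->attr.test.ctx_size_out = sizeof(run->ctx_out);
        run->attr.test.repeat = 1;
    }

    /* Returns 0 on success; the XDP action is then in run->attr.test.retval. */
    static int xdp_test_run_exec(struct xdp_test_run *run)
    {
        return syscall(__NR_bpf, BPF_PROG_RUN, &run->attr,
                       sizeof(run->attr)) < 0 ? -1 : 0;
    }

After a successful call, ``run.attr.test.retval`` holds the XDP action the
program returned and ``run.attr.test.data_size_out`` the length of the packet
written to ``out``; even for ``XDP_DROP`` or ``XDP_TX``, nothing is actually
dropped or transmitted.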

Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
which can be used to execute XDP programs in such a way that packets are
actually processed by the kernel after the XDP program runs, as if they had
arrived on a physical interface. This mode is activated by setting the
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.
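
A sketch of how the attributes for a live run might be filled in, using the
``union bpf_attr`` layout from ``<linux/bpf.h>``. The helper names are
hypothetical, and passing ``0`` for ``batch_size`` to get the default batch
size is an assumption. ``data_out`` and ``ctx_out`` are deliberately left at
zero because this mode rejects them, as described in the list below.

.. code-block:: c

    #define _GNU_SOURCE
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    static __u64 ptr_to_u64(const void *ptr)
    {
        return (__u64)(unsigned long)ptr;
    }

    /*
     * Fill @attr for a live-frame XDP run. @ctx and @pkt must stay valid
     * until the syscall returns. A @batch_size of 0 is assumed to select
     * the kernel's default batch size.
     */
    static void xdp_live_run_init(union bpf_attr *attr, struct xdp_md *ctx,
                                  int prog_fd, const void *pkt, __u32 pkt_len,
                                  __u32 ifindex, __u32 repeat,
                                  __u32 batch_size)
    {
        memset(attr, 0, sizeof(*attr));
        memset(ctx, 0, sizeof(*ctx));
        ctx->data_end = pkt_len;
        ctx->ingress_ifindex = ifindex;  /* packets appear to arrive here */

        attr->test.prog_fd = prog_fd;
        attr->test.flags = BPF_F_TEST_XDP_LIVE_FRAMES;
        attr->test.data_in = ptr_to_u64(pkt);
        attr->test.data_size_in = pkt_len;
        attr->test.ctx_in = ptr_to_u64(ctx);
        attr->test.ctx_size_in = sizeof(*ctx);
        /* data_out/ctx_out stay zero: live mode rejects them. */
        attr->test.repeat = repeat;
        attr->test.batch_size = batch_size;
    }

    /* Returns 0 on success, -1 with errno set on a fatal error. */
    static int xdp_live_run(union bpf_attr *attr)
    {
        return syscall(__NR_bpf, BPF_PROG_RUN, attr, sizeof(*attr)) < 0 ? -1 : 0;
    }

With a large ``repeat`` count, the supplied packet becomes a stream of frames
handled according to the program's return code, which is the traffic-generator
use case this mode is optimised for.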

The live packet mode is optimised for high-performance execution of the
supplied XDP program many times (suitable for, e.g., running as a traffic
generator), which means the semantics are not quite as straightforward as in
the regular test run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters is rejected when running in this mode. In addition,
  not all failures will be reported back to userspace directly; specifically,
  only fatal errors in setup or during execution (like memory allocation
  errors) will halt execution and return an error. If an error occurs in packet
  processing, like a failure to redirect to a given interface, execution will
  continue with the next repetition; these errors can be detected via the same
  tracepoints as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages containing as many pages as the batch size. Each memory page will be
  initialised with the initial packet data supplied by userspace at
  ``BPF_PROG_RUN`` invocation. When possible, the pages will be recycled on
  future program invocations, to improve performance. Pages will generally be
  recycled a full batch at a time, except when a packet is dropped (by return
  code or because of, say, a redirection error), in which case that page will
  be recycled immediately. If a packet ends up being passed to the regular
  networking stack (because the XDP program returns ``XDP_PASS``, or because it
  ends up being redirected to an interface that injects it into the stack), the
  page will be released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
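
The batching and recycling rules above can be explored with a toy userspace
model. This is not kernel code and rests on simplifying assumptions: the pool
is a LIFO free list of ``batch_size`` pages, a page released by an
``XDP_PASS``-like action is modelled as being immediately replaced by a freshly
initialised page, and transmission completes at the end of each batch. The
simulated program always overwrites the first packet byte, so the model counts
how many invocations saw the original bytes versus a previously modified page.
All ``toy_*`` names are hypothetical.

.. code-block:: c

    #include <string.h>

    #define TOY_PKT_LEN        4
    #define TOY_DEFAULT_BATCH  64   /* default batch size from the text */
    #define TOY_MAX_BATCH      256  /* maximum batch size from the text */

    enum toy_action { TOY_DROP, TOY_TX, TOY_PASS };

    struct toy_stats {
        unsigned int batches;  /* batches executed */
        unsigned int fresh;    /* runs that saw the original packet bytes */
        unsigned int stale;    /* runs that saw bytes modified by an earlier run */
    };

    /*
     * Toy model, not kernel code. Each simulated program run checks
     * whether its page still holds the original packet, overwrites
     * byte 0 and returns @action. Returns -1 if @batch_size is too big.
     */
    static int toy_live_run(const unsigned char pkt[TOY_PKT_LEN],
                            unsigned int repeat, unsigned int batch_size,
                            enum toy_action action, struct toy_stats *st)
    {
        unsigned char pages[TOY_MAX_BATCH][TOY_PKT_LEN];
        unsigned int free_list[TOY_MAX_BATCH], nfree = 0;  /* LIFO */
        unsigned int pending[TOY_MAX_BATCH], npending = 0;
        unsigned int done = 0, i;

        if (batch_size == 0)
            batch_size = TOY_DEFAULT_BATCH;
        if (batch_size > TOY_MAX_BATCH)
            return -1;

        /* Set-up: one page per batch slot, initialised with the packet. */
        for (i = 0; i < batch_size; i++) {
            memcpy(pages[i], pkt, TOY_PKT_LEN);
            free_list[nfree++] = i;
        }

        memset(st, 0, sizeof(*st));
        while (done < repeat) {
            st->batches++;
            for (i = 0; i < batch_size && done < repeat; i++, done++) {
                unsigned int pg = free_list[--nfree];

                /* Recycling resets pointers only, never the bytes. */
                if (memcmp(pages[pg], pkt, TOY_PKT_LEN) == 0)
                    st->fresh++;
                else
                    st->stale++;
                pages[pg][0] = (unsigned char)~pkt[0];  /* program rewrites data */

                switch (action) {
                case TOY_DROP:  /* recycled immediately */
                    free_list[nfree++] = pg;
                    break;
                case TOY_TX:    /* held until the end of the batch */
                    pending[npending++] = pg;
                    break;
                case TOY_PASS:  /* released; modelled as a fresh page */
                    memcpy(pages[pg], pkt, TOY_PKT_LEN);
                    free_list[nfree++] = pg;
                    break;
                }
            }
            /* End of batch: buffered frames are sent, pages recycled. */
            while (npending > 0)
                free_list[nfree++] = pending[--npending];
        }
        return 0;
    }

Under these assumptions, an always-drop program run 128 times in batches of 64
sees the original packet only once, while an always-transmit program sees it 64
times (once per page in the first batch) and the modified version afterwards,
which is why a program that rewrites packets must handle both cases.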