======================
Userspace verbs access
======================

  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
  provides direct userspace access to IB hardware via "verbs," as
  described in chapter 11 of the InfiniBand Architecture Specification.

  To use the verbs, the libibverbs library, available from
  https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
  device-independent API for using the ib_uverbs interface.
  libibverbs also requires the appropriate device-dependent kernel and
  userspace drivers for your InfiniBand hardware.  For example, to use
  a Mellanox HCA, you will need the ib_mthca kernel module and the
  libmthca userspace driver to be installed.
  
User-kernel communication
=========================

  Userspace communicates with the kernel for slow-path resource
  management operations via the /dev/infiniband/uverbsN character
  devices.  Fast-path operations are typically performed by writing
  directly to hardware registers mmap()ed into userspace, with no
  system call or context switch into the kernel.

  Commands are sent to the kernel via write()s on these device files.
  The ABI is defined in include/uapi/rdma/ib_user_verbs.h.
  The structs for commands that require a response from the kernel
  contain a 64-bit field used to pass a pointer to an output buffer.
  Status is returned to userspace as the return value of the write()
  system call.
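  As a rough illustration of the command framing described above, the
  sketch below packs a header followed by a 64-bit response pointer into
  a buffer.  The struct layouts are local stand-ins written for this
  example and are only assumed to resemble the real definitions; the ABI
  header remains the authoritative source.  pack_cmd() and the demo_*
  names are hypothetical, and nothing is sent to the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for a command header: an assumption for illustration. */
struct demo_cmd_hdr {
	uint32_t command;	/* command number */
	uint16_t in_words;	/* total input size, in 32-bit words */
	uint16_t out_words;	/* response buffer size, in 32-bit words */
};

/* Hypothetical command body: one 64-bit field carrying the userspace
 * address of the output buffer. */
struct demo_cmd_body {
	uint64_t response;
};

/*
 * Pack header and body into buf.  Returns the number of bytes a caller
 * would hand to write(), or 0 if the arguments do not fit.
 */
size_t pack_cmd(void *buf, size_t buflen, uint32_t command,
		void *resp, size_t resp_len)
{
	struct demo_cmd_hdr hdr;
	struct demo_cmd_body body;
	size_t total = sizeof(hdr) + sizeof(body);

	assert(resp_len % 4 == 0);
	if (buflen < total || resp_len / 4 > UINT16_MAX)
		return 0;

	hdr.command = command;
	hdr.in_words = (uint16_t)(total / 4);
	hdr.out_words = (uint16_t)(resp_len / 4);
	/* The pointer is widened to 64 bits regardless of the native
	 * pointer size, matching the 64-bit field the text describes. */
	body.response = (uint64_t)(uintptr_t)resp;

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + sizeof(hdr), &body, sizeof(body));
	return total;
}
```

  A caller holding an open uverbs file descriptor would then pass the
  buffer and the returned length to write() and inspect the return
  value for status.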
  
Resource management
===================

  Since creation and destruction of all IB resources is done by
  commands passed through a file descriptor, the kernel can keep track
  of which resources are attached to a given userspace context.  The
  ib_uverbs module maintains idr tables that are used to translate
  between kernel pointers and opaque userspace handles, so that kernel
  pointers are never exposed to userspace and userspace cannot trick
  the kernel into following a bogus pointer.

  This also allows the kernel to clean up when a process exits and
  to prevent one process from touching another process's resources.
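  The handle translation idea can be modelled in plain C.  The sketch
  below is a userspace toy, not the kernel's idr API: a fixed-size array
  stands in for the idr table, an integer stands in for the userspace
  context, and every demo_* name is hypothetical.  It shows why a bogus
  or foreign handle never turns into a pointer dereference, and how all
  of a context's objects can be released together.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HANDLES 16

/* One slot per handle: the object and the context that owns it. */
struct demo_slot {
	void *obj;
	int owner_ctx;
	int in_use;
};

struct demo_table {
	struct demo_slot slots[DEMO_MAX_HANDLES];
};

/* Store obj and return a small integer handle, or -1 if the table is full. */
int demo_alloc_handle(struct demo_table *t, void *obj, int ctx)
{
	assert(obj != NULL);
	for (int i = 0; i < DEMO_MAX_HANDLES; i++) {
		if (!t->slots[i].in_use) {
			t->slots[i].obj = obj;
			t->slots[i].owner_ctx = ctx;
			t->slots[i].in_use = 1;
			return i;
		}
	}
	return -1;
}

/* Translate a handle back to its object.  Out-of-range, unused and
 * foreign handles all yield NULL instead of a pointer. */
void *demo_lookup(const struct demo_table *t, int handle, int ctx)
{
	if (handle < 0 || handle >= DEMO_MAX_HANDLES)
		return NULL;
	if (!t->slots[handle].in_use || t->slots[handle].owner_ctx != ctx)
		return NULL;
	return t->slots[handle].obj;
}

/* Release every handle owned by ctx, as cleanup on process exit would.
 * Returns the number of handles released. */
int demo_release_ctx(struct demo_table *t, int ctx)
{
	int released = 0;

	for (int i = 0; i < DEMO_MAX_HANDLES; i++) {
		if (t->slots[i].in_use && t->slots[i].owner_ctx == ctx) {
			t->slots[i].in_use = 0;
			t->slots[i].obj = NULL;
			released++;
		}
	}
	return released;
}
```

  Because callers only ever hold the integer, validation happens in one
  place (demo_lookup() here) before any object is touched.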
  
Memory pinning
==============

  Direct userspace I/O requires that memory regions that are potential
  I/O targets be kept resident at the same physical address.  The
  ib_uverbs module manages pinning and unpinning memory regions via
  get_user_pages() and put_page() calls.  It also accounts for the
  amount of memory pinned in the process's pinned_vm, and checks that
  unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

  Pages that are pinned multiple times are counted each time they are
  pinned, so the value of pinned_vm may be an overestimate of the
  number of pages pinned by a process.
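  The accounting rule can be sketched as a small userspace model.  This
  is not the kernel code path: struct demo_mm and the demo_* functions
  are hypothetical, with pinned_pages playing the role of pinned_vm and
  memlock_bytes playing the role of the RLIMIT_MEMLOCK limit.  Note that
  charging the same region twice counts its pages twice, which is the
  overestimate mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process accounting state. */
struct demo_mm {
	uint64_t pinned_pages;	/* plays the role of pinned_vm */
	uint64_t memlock_bytes;	/* plays the role of RLIMIT_MEMLOCK */
	int privileged;		/* nonzero: the limit is not enforced */
};

/* Number of pages touched by the byte range [addr, addr + len). */
uint64_t demo_pages_spanned(uint64_t addr, uint64_t len, uint64_t page_size)
{
	assert(page_size != 0 && len != 0);
	uint64_t first = addr / page_size;
	uint64_t last = (addr + len - 1) / page_size;

	return last - first + 1;
}

/*
 * Charge a region against the limit.  Returns 0 on success, or -1 if
 * an unprivileged caller would exceed the limit (nothing is charged).
 */
int demo_pin(struct demo_mm *mm, uint64_t addr, uint64_t len,
	     uint64_t page_size)
{
	uint64_t npages = demo_pages_spanned(addr, len, page_size);
	uint64_t limit_pages = mm->memlock_bytes / page_size;

	if (!mm->privileged && mm->pinned_pages + npages > limit_pages)
		return -1;

	/* No deduplication: a page pinned twice is counted twice. */
	mm->pinned_pages += npages;
	return 0;
}
```

  In a real program the limit could be read with getrlimit() and
  RLIMIT_MEMLOCK, and the page size with sysconf(_SC_PAGESIZE); the
  model above takes both as plain parameters to stay self-contained.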
  
/dev files
==========

  To create the appropriate character device files automatically with
  udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

  can be used.  This will create device nodes named::

    /dev/infiniband/uverbs0

  and so on.  Since the InfiniBand userspace verbs should be safe for
  use by non-privileged processes, it may be useful to add an
  appropriate MODE or GROUP to the udev rule.
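  One way to confirm that the permissions chosen in the udev rule have
  the intended effect is to test the node from an unprivileged process.
  The helpers below are a minimal sketch with hypothetical names; they
  only build the path shown above and ask access() whether the caller
  may open it for reading and writing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Build "/dev/infiniband/uverbsN" into buf.  Returns 0 on success,
 * or -1 if the name does not fit. */
int demo_uverbs_path(char *buf, size_t buflen, unsigned int index)
{
	assert(buf != NULL && buflen > 0);
	int n = snprintf(buf, buflen, "/dev/infiniband/uverbs%u", index);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Returns 1 if the calling process may open the node read/write,
 * 0 if it may not or if the node does not exist. */
int demo_uverbs_accessible(unsigned int index)
{
	char path[64];

	if (demo_uverbs_path(path, sizeof(path), index) != 0)
		return 0;
	return access(path, R_OK | W_OK) == 0;
}
```

  A result of 0 from demo_uverbs_accessible(0) for an ordinary user
  suggests either that no device is present or that the MODE or GROUP
  on the rule is more restrictive than intended.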