===============
RDMA Controller
===============

.. Contents

   1. Overview
     1-1. What is RDMA controller?
     1-2. Why RDMA controller needed?
     1-3. How is RDMA controller implemented?
   2. Usage Examples

1. Overview
===========

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows the user to limit the RDMA/IB-specific resources
that a given set of processes can use. These processes are grouped using the
RDMA controller.

The RDMA controller defines two resources which can be limited for the
processes of a cgroup.

1-2. Why RDMA controller needed?
--------------------------------

Currently, user space applications can easily consume all RDMA verb-specific
resources such as AH, CQ, QP, MR etc. As a result, applications in other
cgroups or kernel space ULPs may not even get a chance to allocate any RDMA
resources. This can lead to service unavailability.

The RDMA controller is therefore needed to limit the resource consumption of
processes. It also allows the different RDMA resources to be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows limits to be configured on resources. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
Each resource pool is currently limited to 64 resources by the RDMA cgroup;
this can be extended later if required.

The resource pool object is linked to the cgroup css. In most use cases there
are typically 0 to 4 resource pool instances per cgroup, per device, but
nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no known use
case or requirement for such a configuration either.

Since RDMA resources can be allocated by any process and can be freed by any
of the child processes that share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership does not really exist due to the shared
nature of RDMA resources. Linking resources to the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary use
case.

Whenever RDMA resource charging occurs, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed when uncharging the
resource. This allows a process that migrated with active RDMA resources to
charge new resources to its new owner cgroup. It also allows a resource of a
process that has migrated to a new cgroup to be uncharged from the cgroup it
was previously charged to, even though that is not a primary use case.

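The charge/uncharge contract described above can be illustrated with a small
user-space model. This is only a sketch in Python, not the kernel code: the
names ``Cgroup``, ``Process``, ``charge`` and ``uncharge`` are hypothetical.
Only the idea that charging returns the owner cgroup, which must be handed
back when uncharging, is taken from the text.

```python
from collections import defaultdict


class Cgroup:
    """Hypothetical stand-in for an RDMA cgroup; tracks usage per device."""

    def __init__(self, name):
        self.name = name
        # usage[device][resource] -> number of resources currently charged
        self.usage = defaultdict(lambda: defaultdict(int))


class Process:
    """Hypothetical process that belongs to exactly one cgroup at a time."""

    def __init__(self, cgroup):
        self.cgroup = cgroup


def charge(process, device, resource):
    """Charge one resource to the process's current cgroup.

    Returns the owner cgroup so the caller can store it with the resource.
    """
    owner = process.cgroup
    owner.usage[device][resource] += 1
    return owner


def uncharge(owner, device, resource):
    """Uncharge from the cgroup returned by charge(), not the current one."""
    owner.usage[device][resource] -= 1


# A process allocates a resource, migrates, then allocates another one.
cg1, cg2 = Cgroup("1"), Cgroup("2")
proc = Process(cg1)
owner_a = charge(proc, "mlx4_0", "hca_object")   # charged to cgroup 1
proc.cgroup = cg2                                # process migrates
owner_b = charge(proc, "mlx4_0", "hca_object")   # charged to cgroup 2

# Freeing the first resource uncharges cgroup 1, its original owner.
uncharge(owner_a, "mlx4_0", "hca_object")
print(cg1.usage["mlx4_0"]["hca_object"], cg2.usage["mlx4_0"]["hca_object"])
# prints: 0 1
```

Because the owner is stored with the resource, the migration in the middle
needs no transfer of ownership: each resource is simply uncharged from the
cgroup that was charged for it.
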
A resource pool object is created in the following situations:

(a) The user sets a limit and no resource pool exists yet for the device of
    interest in the cgroup.
(b) No resource limits are configured, but the IB/RDMA stack tries to charge
    a resource. The pool is created so that resources charged while an
    application runs without limits are uncharged correctly if limits are
    enforced later; otherwise the usage count would drop below zero.

The resource pool is destroyed when all of its resource limits are set to max
and the last resource is deallocated.

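The pool lifecycle above can be sketched as a toy model. Everything here is
an assumption made for illustration: ``CgroupModel`` and its methods are
hypothetical names, ``float("inf")`` stands in for ``max``, and the model also
removes an idle pool when its last limit is reset to max, which the text only
implies.

```python
MAX = float("inf")            # stands in for the "max" limit value
RESOURCES = ("hca_handle", "hca_object")


class CgroupModel:
    """Hypothetical model of one cgroup's resource pools, keyed by device."""

    def __init__(self):
        self.pools = {}

    def _get_or_create(self, device):
        # Situations (a) and (b): the pool is created on first use.
        return self.pools.setdefault(device, {
            "limit": {r: MAX for r in RESOURCES},
            "usage": {r: 0 for r in RESOURCES},
        })

    def _maybe_destroy(self, device):
        # Destroy only when every limit is max and nothing is charged.
        pool = self.pools[device]
        if (all(v == MAX for v in pool["limit"].values())
                and all(v == 0 for v in pool["usage"].values())):
            del self.pools[device]

    def set_limit(self, device, resource, value):
        self._get_or_create(device)["limit"][resource] = value
        self._maybe_destroy(device)

    def try_charge(self, device, resource):
        pool = self._get_or_create(device)
        if pool["usage"][resource] + 1 > pool["limit"][resource]:
            return False
        pool["usage"][resource] += 1
        return True

    def uncharge(self, device, resource):
        pool = self.pools[device]
        pool["usage"][resource] -= 1
        self._maybe_destroy(device)


cg = CgroupModel()
cg.try_charge("mlx4_0", "hca_handle")         # (b): pool created, no limits
cg.set_limit("mlx4_0", "hca_handle", 1)       # limit enforced later
print(cg.try_charge("mlx4_0", "hca_handle"))  # False: limit of 1 reached
cg.uncharge("mlx4_0", "hca_handle")           # usage back to 0, pool kept
cg.set_limit("mlx4_0", "hca_handle", MAX)     # all limits max, nothing charged
print("mlx4_0" in cg.pools)                   # False: pool destroyed
```

Creating the pool on the first unlimited charge is what keeps the usage count
consistent: the later ``uncharge`` finds the count it needs to decrement.
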
The user should set all limits to max in order to remove/unconfigure the
resource pool for a particular device.

The IB stack honors the limits enforced by the RDMA controller. When an
application queries the maximum resource limits of an IB device, it gets the
minimum of what the user configured for the given cgroup and what the IB
device supports.

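The "minimum of both" rule can be written down in a few lines. This is an
illustrative sketch only: ``effective_max`` is a hypothetical helper, and the
device capacity of 65536 is an invented number, not a property of any real
device.

```python
def effective_max(cgroup_limit, device_max):
    """Return the limit an application would see for one resource.

    cgroup_limit is an int, or the string "max" when no limit is configured;
    device_max is what the device itself supports.
    """
    if cgroup_limit == "max":
        return device_max
    return min(cgroup_limit, device_max)


# Hypothetical device supporting 65536 objects.
print(effective_max(2000, 65536))     # 2000: the cgroup limit is lower
print(effective_max("max", 65536))    # 65536: only the device limit applies
print(effective_max(100000, 65536))   # 65536: the device limit is lower
```
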
The following resources can be accounted by the RDMA controller:

==========  =============================
hca_handle  Maximum number of HCA Handles
hca_object  Maximum number of HCA Objects
==========  =============================

2. Usage Examples
=================

(a) Configure resource limit::

    echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
    echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit::

    cat /sys/fs/cgroup/rdma/2/rdma.max
    #Output:
    mlx4_0 hca_handle=2 hca_object=2000
    ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage::

    cat /sys/fs/cgroup/rdma/2/rdma.current
    #Output:
    mlx4_0 hca_handle=1 hca_object=20
    ocrdma1 hca_handle=1 hca_object=23

(d) Delete resource limit::

    echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
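
For scripting around these files, the ``device key=value ...`` lines shown in
examples (b) and (c) are easy to parse. The following is a minimal sketch, not
an official tool: ``parse_rdma_file`` and ``headroom`` are hypothetical helper
names, and the sample text is copied from the outputs above rather than read
from a live system.

```python
def parse_rdma_file(text):
    """Parse rdma.max / rdma.current style text into {device: {res: value}}.

    Values are ints, or None where the file reports "max".
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        device, pairs = fields[0], fields[1:]
        result[device] = {}
        for pair in pairs:
            name, _, value = pair.partition("=")
            result[device][name] = None if value == "max" else int(value)
    return result


def headroom(limits, usage):
    """Remaining allocations per device/resource; None means unlimited."""
    out = {}
    for device, res in limits.items():
        out[device] = {
            name: None if limit is None
            else limit - usage.get(device, {}).get(name, 0)
            for name, limit in res.items()
        }
    return out


# Sample text copied from examples (b) and (c).
rdma_max = """\
mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max
"""
rdma_current = """\
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23
"""

left = headroom(parse_rdma_file(rdma_max), parse_rdma_file(rdma_current))
print(left["mlx4_0"])    # {'hca_handle': 1, 'hca_object': 1980}
print(left["ocrdma1"])   # {'hca_handle': 2, 'hca_object': None}
```

On a real system the two strings would instead be read from the cgroup's
``rdma.max`` and ``rdma.current`` files.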