===============
RDMA Controller
===============

.. Contents

   1. Overview
     1-1. What is RDMA controller?
     1-2. Why RDMA controller needed?
     1-3. How is RDMA controller implemented?
   2. Usage Examples

1. Overview
===========

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows the user to limit the RDMA/IB specific resources
that a given set of processes can use. These processes are grouped using the
RDMA controller.

The RDMA controller defines two resources, hca_handle and hca_object, which
can be limited for the processes of a cgroup.

1-2. Why RDMA controller needed?
--------------------------------

Currently, user space applications can easily consume all the RDMA verb
specific resources such as AH, CQ, QP, MR etc. As a result, applications in
other cgroups or kernel space ULPs may not even get a chance to allocate any
RDMA resources. This can lead to service unavailability.

Therefore, an RDMA controller is needed through which the resource
consumption of processes can be limited. Through this controller, different
RDMA resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows resource limits to be configured. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
The RDMA cgroup limits each such resource pool to 64 resources, which can be
extended later if required.

This resource pool object is linked to the cgroup css. In most use cases
there are typically 0 to 4 resource pool instances per cgroup, per device,
but nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no known use
case or requirement for such a configuration either.

Since RDMA resources can be allocated by any process and can be freed by any
of the child processes that share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from
one cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources around the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed when uncharging the
resource. This also allows a process that migrated with active RDMA resources
to charge new resources to its new owner cgroup. It also allows a resource of
a process that has migrated to a new cgroup to be uncharged from the
previously charged cgroup, even though that is not a primary use case.

A resource pool object is created in the following situations:

(a) The user sets a limit and no resource pool yet exists for the device
    of interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
    charge the resource. The pool is created so that usage is tracked while
    applications run without limits and resources are uncharged correctly
    if limits are enforced later; otherwise the usage count would drop
    below zero.

A resource pool is destroyed when all of its resource limits are set to max
and its last resource is deallocated.

To remove/unconfigure the resource pool for a particular device, the user
should set all of its limits to max.

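As a rough illustration of the condition above, the following shell sketch
checks whether every limit on one ``rdma.max`` line is already ``max``. The
helper name ``all_limits_max`` is hypothetical and not part of any RDMA tool;
the line format is assumed to be the one shown in the usage examples of this
document.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every resource limit on one
# rdma.max line (for example "mlx4_0 hca_handle=max hca_object=max")
# is set to "max".
all_limits_max() {
    line=$1
    set -- $line                   # split the line into words
    [ "$#" -gt 0 ] || return 1     # an empty line has no device name
    shift                          # drop the device name
    for kv in "$@"; do
        # Strip everything up to and including "=" and compare the value.
        [ "${kv#*=}" = "max" ] || return 1
    done
    return 0
}

if all_limits_max "mlx4_0 hca_handle=max hca_object=max"; then
    echo "mlx4_0: no limits configured"
fi
```

A script could run such a check on each line of ``rdma.max`` to see which
devices still carry an explicit limit.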
The IB stack honors the limits enforced by the RDMA controller. When an
application queries the maximum resource limits of an IB device, the stack
returns the minimum of what the user has configured for the given cgroup and
what the IB device supports.

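The minimum rule above can be illustrated with a small shell sketch. The
function name ``effective_limit`` and the device maximum of 65536 are
hypothetical, chosen only for illustration; the actual calculation is done by
the IB stack.

```shell
#!/bin/sh
# Hypothetical helper, not part of any RDMA tool: print the limit an
# application would observe, given the cgroup limit and the device maximum.
# A cgroup limit of "max" means the cgroup imposes no limit of its own.
effective_limit() {
    cg_limit=$1
    dev_max=$2
    if [ "$cg_limit" = "max" ] || [ "$cg_limit" -gt "$dev_max" ]; then
        echo "$dev_max"
    else
        echo "$cg_limit"
    fi
}

effective_limit 2000 65536    # cgroup limit is lower: prints 2000
effective_limit max 65536     # no cgroup limit: prints 65536
```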
The following resources can be accounted by the RDMA controller.

  ==========    =============================
  hca_handle    Maximum number of HCA Handles
  hca_object    Maximum number of HCA Objects
  ==========    =============================

2. Usage Examples
=================

(a) Configure resource limit::

        echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
        echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

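The example above assumes that the cgroups ``1`` and ``2`` already exist. A
minimal sketch of creating a cgroup, setting a limit and then attaching a
process could look as follows. The function name ``setup_rdma_cgroup`` is
hypothetical, and the sketch assumes the RDMA controller is mounted under
``/sys/fs/cgroup/rdma`` as in the paths above; ``cgroup.procs`` is the
standard cgroup file for attaching a process.

```shell
#!/bin/sh
# Hypothetical helper: create a cgroup directory, write one limit line to
# its rdma.max file, then move a process into the cgroup.
# Usage: setup_rdma_cgroup CGROUP_DIR PID LIMIT_LINE
setup_rdma_cgroup() {
    cg_dir=$1
    pid=$2
    limit=$3
    mkdir -p "$cg_dir" || return 1
    # Set the limit first so the process is constrained as soon as it joins.
    echo "$limit" > "$cg_dir/rdma.max" || return 1
    echo "$pid" > "$cg_dir/cgroup.procs"
}

# Example (requires root and a mounted RDMA controller):
# setup_rdma_cgroup /sys/fs/cgroup/rdma/1 "$$" "mlx4_0 hca_handle=2 hca_object=2000"
```

Writing the limit before attaching the process is a design choice of this
sketch, so that no RDMA resource is allocated by the process before the limit
is in place.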
(b) Query resource limit::

        cat /sys/fs/cgroup/rdma/2/rdma.max
        #Output:
        mlx4_0 hca_handle=2 hca_object=2000
        ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage::

        cat /sys/fs/cgroup/rdma/2/rdma.current
        #Output:
        mlx4_0 hca_handle=1 hca_object=20
        ocrdma1 hca_handle=1 hca_object=23

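When the usage needs to be consumed by a script, a single value can be
extracted from this output. The sketch below is one way to do it; the
function name ``rdma_value`` is hypothetical, and it assumes the
``device key=value ...`` line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one resource for one device from
# a file in rdma.current / rdma.max format. Prints nothing if the device or
# the resource is not present.
# Usage: rdma_value FILE DEVICE RESOURCE
rdma_value() {
    awk -v dev="$2" -v res="$3" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n == 2 && kv[1] == res) {
                    print kv[2]
                    exit
                }
            }
        }' "$1"
}

# Example:
# rdma_value /sys/fs/cgroup/rdma/2/rdma.current mlx4_0 hca_object
```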
(d) Delete resource limit::

        echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max