0001 ==========================================
0002 I915 VM_BIND feature design and use cases
0003 ==========================================
0004 
0005 VM_BIND feature
0006 ================
0007 The DRM_I915_GEM_VM_BIND/UNBIND ioctls allow a UMD to bind/unbind GEM buffer
0008 objects (BOs) or sections of a BO at specified GPU virtual addresses on a
0009 specified address space (VM). These mappings (also referred to as persistent
0010 mappings) will be persistent across multiple GPU submissions (execbuf calls)
0011 issued by the UMD, without the user having to provide a list of all required
0012 mappings during each submission (as required by the older execbuf mode).
0013 
0014 The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
0015 signaling the completion of the bind/unbind operation.
0016 
0017 The VM_BIND feature is advertised to the user via I915_PARAM_VM_BIND_VERSION.
0018 The user has to opt in to the VM_BIND mode of binding for an address space (VM)
0019 at VM creation time via the I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
0020 
0021 VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
0022 not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
0023 asynchronously, when a valid out fence is specified.
0024 
0025 VM_BIND features include:
0026 
0027 * Multiple Virtual Address (VA) mappings can map to the same physical pages
0028   of an object (aliasing).
0029 * VA mapping can map to a partial section of the BO (partial binding).
0030 * Support capture of persistent mappings in the dump upon GPU error.
0031 * Support for userptr gem objects (no special uapi is required for this).
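
The aliasing and partial binding features above can be pictured with a small
user-space model. This is only a sketch: ``struct mapping``, ``va_lookup()``,
``va_alias()`` and ``example_maps`` are hypothetical names made up for
illustration and are not part of the i915 uAPI or driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of one persistent mapping; not the i915 uAPI. */
struct mapping {
	uint64_t va;        /* GPU virtual address where the mapping starts */
	uint32_t bo;        /* illustrative BO handle */
	uint64_t bo_offset; /* offset into the BO where the mapping starts */
	uint64_t length;    /* length of the mapping in bytes */
};

/* Translate a GPU VA to a (BO, offset) pair; returns false if unmapped. */
static bool va_lookup(const struct mapping *maps, size_t n, uint64_t va,
		      uint32_t *bo, uint64_t *offset)
{
	for (size_t i = 0; i < n; i++) {
		if (va >= maps[i].va && va - maps[i].va < maps[i].length) {
			*bo = maps[i].bo;
			*offset = maps[i].bo_offset + (va - maps[i].va);
			return true;
		}
	}
	return false;
}

/* Two VAs alias when both resolve to the same byte of the same BO. */
static bool va_alias(const struct mapping *maps, size_t n,
		     uint64_t va1, uint64_t va2)
{
	uint32_t bo1, bo2;
	uint64_t off1, off2;

	return va_lookup(maps, n, va1, &bo1, &off1) &&
	       va_lookup(maps, n, va2, &bo2, &off2) &&
	       bo1 == bo2 && off1 == off2;
}

/*
 * Example table: 16 KiB of BO 1 is mapped at 0x10000, and one 4 KiB
 * section of the same BO (starting at offset 0x1000) is also mapped at
 * 0x80000. The second entry is both a partial binding and an alias.
 */
static const struct mapping example_maps[] = {
	{ .va = 0x10000, .bo = 1, .bo_offset = 0,      .length = 0x4000 },
	{ .va = 0x80000, .bo = 1, .bo_offset = 0x1000, .length = 0x1000 },
};
```

In this model, VA 0x11000 and VA 0x80000 both resolve to offset 0x1000 of BO 1,
which is the aliasing case described above.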
0032 
0033 TLB flush consideration
0034 ------------------------
0035 The i915 driver flushes the TLB for each submission and when an object's
0036 pages are released. The VM_BIND/UNBIND operation will not do any additional
0037 TLB flush. Any VM_BIND mapping added will be in the working set for subsequent
0038 submissions on that VM and will not be in the working set for currently running
0039 batches (that would require additional TLB flushes, which are not supported).
0040 
0041 Execbuf ioctl in VM_BIND mode
0042 -------------------------------
0043 A VM in VM_BIND mode will not support the older execbuf mode of binding.
0044 The execbuf ioctl handling in VM_BIND mode differs significantly from the
0045 older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
0046 Hence, a new execbuf3 ioctl has been added to support VM_BIND mode (See
0047 struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
0048 execlist. Hence, there is no support for implicit sync. It is expected that the
0049 work below will be able to support the requirements of object dependency
0050 setting in all use cases:
0051 
0052 "dma-buf: Add an API for exporting sync files"
0053 (https://lwn.net/Articles/859290/)
0054 
0055 The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
0056 works with the execbuf3 ioctl for submission. All BOs mapped on that VM (through
0057 VM_BIND calls) at the time of the execbuf3 call are deemed required for that
0058 submission.
0059 
0060 The execbuf3 ioctl specifies the batch addresses directly, instead of as
0061 object handles as in the execbuf2 ioctl. The execbuf3 ioctl will also not
0062 support many of the older features like in/out/submit fences, fence array,
0063 default gem context and many more (See struct drm_i915_gem_execbuffer3).
0064 
0065 In VM_BIND mode, VA allocation is completely managed by the user instead of
0066 the i915 driver. Hence VA assignment and eviction are not applicable in
0067 VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
0068 be using the i915_vma active reference tracking. It will instead use the dma-resv
0069 object for that (See `VM_BIND dma_resv usage`_).
0070 
0071 So, a lot of existing code supporting the execbuf2 ioctl, like relocations, VA
0072 evictions, vma lookup table, implicit sync, vma active reference tracking etc.,
0073 is not applicable to the execbuf3 ioctl. Hence, all execbuf3 specific handling
0074 should be in a separate file and only functionality common to these ioctls
0075 should be shared code where possible.
0076 
0077 VM_PRIVATE objects
0078 -------------------
0079 By default, BOs can be mapped on multiple VMs and can also be dma-buf
0080 exported. Hence these BOs are referred to as Shared BOs.
0081 During each execbuf submission, the request fence must be added to the
0082 dma-resv fence list of all shared BOs mapped on the VM.
0083 
0084 The VM_BIND feature introduces an optimization where the user can create a BO
0085 that is private to a specified VM via the I915_GEM_CREATE_EXT_VM_PRIVATE flag
0086 during BO creation. Unlike Shared BOs, these VM private BOs can only be mapped
0087 on the VM they are private to and can't be dma-buf exported.
0088 All private BOs of a VM share the dma-resv object. Hence during each execbuf
0089 submission, they need only one dma-resv fence list updated. Thus, the fast
0090 path (where required mappings are already bound) submission latency is O(1)
0091 w.r.t. the number of VM private BOs.
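
The cost difference between Shared BOs and VM private BOs can be made concrete
by counting the dma-resv fence lists that one submission has to touch. The
sketch below is a counting model only: ``struct bo_model`` and
``resv_updates_per_submission()`` are hypothetical names, and the loop exists
just to do the counting (a driver would not walk the private BOs at all).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a BO for counting purposes; not an i915 structure. */
struct bo_model {
	bool vm_private; /* true if the BO was created private to the VM */
};

/*
 * Count the dma-resv fence lists that one submission has to update:
 * one per shared BO, plus a single one for the dma-resv object shared
 * by all of the VM's private BOs (if any are mapped).
 */
static size_t resv_updates_per_submission(const struct bo_model *bos, size_t n)
{
	size_t shared = 0;
	bool any_private = false;

	for (size_t i = 0; i < n; i++) {
		if (bos[i].vm_private)
			any_private = true;
		else
			shared++;
	}
	return shared + (any_private ? 1 : 0);
}
```

With three private BOs and one shared BO mapped, the model reports two fence
list updates; adding more private BOs does not change that number.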
0092 
0093 VM_BIND locking hierarchy
0094 -------------------------
0095 The locking design here supports the older (execlist based) execbuf mode, the
0096 newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
0097 system allocator support (See `Shared Virtual Memory (SVM) support`_).
0098 The older execbuf mode and the newer VM_BIND mode without page faults manage
0099 residency of backing storage using dma_fence. The VM_BIND mode with page faults
0100 and the system allocator support do not use any dma_fence at all.
0101 
0102 VM_BIND locking order is as below.
0103 
0104 1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
0105    vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
0106    mapping.
0107 
0108    In future, when GPU page faults are supported, we can potentially use a
0109    rwsem instead, so that multiple page fault handlers can take the read side
0110    lock to lookup the mapping and hence can run in parallel.
0111    The older execbuf mode of binding does not need this lock.
0112 
0113 2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
0114    be held while binding/unbinding a vma in the async worker and while updating
0115    dma-resv fence list of an object. Note that private BOs of a VM will all
0116    share a dma-resv object.
0117 
0118    The future system allocator support will use the HMM prescribed locking
0119    instead.
0120 
0121 3) Lock-C: Spinlock/s to protect some of the VM's lists like the list of
0122    invalidated vmas (due to eviction and userptr invalidation) etc.
0123 
0124 When GPU page faults are supported, the execbuf path does not take any of these
0125 locks. There we will simply smash the new batch buffer address into the ring and
0126 then tell the scheduler to run that. The lock taking only happens from the page
0127 fault handler, where we take lock-A in read mode, whichever lock-B we need to
0128 find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
0129 system allocator) and some additional locks (lock-D) for taking care of page
0130 table races. Page fault mode should not need to ever manipulate the vm lists,
0131 so won't ever need lock-C.
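
The documented order (Lock-A, then Lock-B, then Lock-C) can be expressed as a
rank check. The following is a minimal sketch of such a check, not driver code:
``enum lock_rank``, ``struct lock_tracker`` and the two helpers are hypothetical,
and the model deliberately ignores that several Lock-B instances (one per
dma-resv object) may be held at once.

```c
#include <stdbool.h>

/* Ranks mirror the documented order: Lock-A, then Lock-B, then Lock-C. */
enum lock_rank { LOCK_NONE = 0, LOCK_A = 1, LOCK_B = 2, LOCK_C = 3 };

/* Hypothetical per-thread tracker of the locks currently held. */
struct lock_tracker {
	enum lock_rank held[3];
	int depth;
};

/* Returns true if taking 'rank' now respects the A -> B -> C order. */
static bool tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	enum lock_rank top = t->depth ? t->held[t->depth - 1] : LOCK_NONE;

	if (t->depth == 3 || rank <= top)
		return false; /* out of order, or already holding this rank */
	t->held[t->depth++] = rank;
	return true;
}

/* Locks are released in the reverse order of acquisition. */
static bool tracker_release(struct lock_tracker *t, enum lock_rank rank)
{
	if (!t->depth || t->held[t->depth - 1] != rank)
		return false;
	t->depth--;
	return true;
}
```

A path that takes Lock-C first and then asks for Lock-A is rejected by
``tracker_acquire()``, which is the inversion the ordering above rules out.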
0132 
0133 VM_BIND LRU handling
0134 ---------------------
0135 We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
0136 performance degradation. We will also need support for bulk LRU movement of
0137 VM_BIND objects to avoid additional latencies in execbuf path.
0138 
0139 The page table pages are similar to VM_BIND mapped objects (See
0140 `Evictable page table allocations`_), are maintained per VM and need to
0141 be pinned in memory when the VM is made active (i.e., upon an execbuf call with
0142 that VM). So, bulk LRU movement of page table pages is also needed.
0143 
0144 VM_BIND dma_resv usage
0145 -----------------------
0146 Fences need to be added to all VM_BIND mapped objects. During each execbuf
0147 submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
0148 over sync (See enum dma_resv_usage). One can override it with either
0149 DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
0150 dependency setting.
0151 
0152 Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
0153 DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
0154 check. Instead, the execbuf3 out fence should be used for end of batch check
0155 (See struct drm_i915_gem_execbuffer3).
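
Why a BOOKKEEP fence is invisible to such checks follows from the ordering of
enum dma_resv_usage: a query at a given usage level sees fences of that level
and of every lower level, and BOOKKEEP is the highest. The sketch below models
this with a local copy of the ordering. The names ``resv_usage_model``,
``fence_model`` and ``model_test_signaled()`` are hypothetical, and treating the
WAIT/BUSY ioctls as a READ-level query is an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Local mirror of the ordering of the kernel's enum dma_resv_usage, for
 * illustration only. A query at a given usage level sees fences of that
 * level and of every lower level.
 */
enum resv_usage_model {
	USAGE_KERNEL,
	USAGE_WRITE,
	USAGE_READ,
	USAGE_BOOKKEEP,
};

/* Hypothetical stand-in for one fence slot in a reservation object. */
struct fence_model {
	enum resv_usage_model usage;
	bool signaled;
};

/* Models a "test signaled" query: true when every visible fence is signaled. */
static bool model_test_signaled(const struct fence_model *f, size_t n,
				enum resv_usage_model query)
{
	for (size_t i = 0; i < n; i++) {
		if (f[i].usage <= query && !f[i].signaled)
			return false;
	}
	return true;
}
```

With a single unsignaled BOOKKEEP fence, a READ-level query reports the object
as idle while a BOOKKEEP-level query reports it as busy, which is why the
execbuf3 out fence is the right tool for the end of batch check.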
0156 
0157 Also, in VM_BIND mode, use the dma-resv APIs for determining object activeness
0158 (See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
0159 older i915_vma active reference tracking, which is deprecated. This should make
0160 it easier to get working with the current TTM backend.
0161 
0162 Mesa use case
0163 --------------
0164 VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
0165 hence improving performance of CPU-bound applications. It also allows us to
0166 implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
0167 reducing CPU overhead becomes more impactful.
0168 
0169 
0170 Other VM_BIND use cases
0171 ========================
0172 
0173 Long running Compute contexts
0174 ------------------------------
0175 dma-fences are expected to complete in a reasonable amount of time.
0176 Compute, on the other hand, can be long running. Hence it is appropriate for
0177 compute to use user/memory fences (See `User/Memory Fence`_), and dma-fence usage
0178 must be limited to in-kernel consumption only.
0179 
0180 Where GPU page faults are not available, the kernel driver, upon buffer
0181 invalidation, will initiate a suspend (preemption) of the long running context,
0182 finish the invalidation, revalidate the BO and then resume the compute context.
0183 This is done by having a per-context preempt fence which is enabled when someone
0184 tries to wait on it and triggers the context preemption.
0185 
0186 User/Memory Fence
0187 ~~~~~~~~~~~~~~~~~~
0188 A User/Memory fence is an <address, value> pair. To signal the user fence, the
0189 specified value is written at the specified virtual address and the waiting
0190 process is woken up. A user fence can be signaled either by the GPU or by a kernel
0191 async worker (like upon bind completion). The user can wait on a user fence with
0192 a new user fence wait ioctl.
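
The <address, value> pair can be modelled in user space with C11 atomics. This
is a sketch under stated assumptions, not the proposed uAPI: ``struct
user_fence`` and both helpers are hypothetical, the equality comparison is an
assumption (the text above does not say how a waiter compares values), and
polling stands in for the wait ioctl and its wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a user/memory fence: an <address, value> pair. */
struct user_fence {
	_Atomic uint64_t *addr; /* location the signaler writes to */
	uint64_t value;         /* value that marks the fence as signaled */
};

/* Signal: write the fence value at the fence address. */
static void user_fence_signal(const struct user_fence *f)
{
	atomic_store_explicit(f->addr, f->value, memory_order_release);
}

/* Poll: the fence counts as signaled once the expected value is visible. */
static bool user_fence_is_signaled(const struct user_fence *f)
{
	return atomic_load_explicit(f->addr, memory_order_acquire) == f->value;
}
```

The release store paired with the acquire load means that anything the
signaler wrote before signaling is visible to a waiter that observes the fence
value.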
0193 
0194 Here is some prior work on this:
0195 https://patchwork.freedesktop.org/patch/349417/
0196 
0197 Low Latency Submission
0198 ~~~~~~~~~~~~~~~~~~~~~~~
0199 Allows a compute UMD to directly submit GPU jobs instead of going through the
0200 execbuf ioctl. This is made possible by VM_BIND not being synchronized against
0201 execbuf. VM_BIND allows bind/unbind of mappings required for the directly
0202 submitted jobs.
0203 
0204 Debugger
0205 ---------
0206 With the debug event interface, a user space process (debugger) is able to keep
0207 track of and act upon resources created by another process (debugged) and
0208 attached to the GPU via the vm_bind interface.
0209 
0210 GPU page faults
0211 ----------------
0212 GPU page faults, when supported (in future), will only be supported in the
0213 VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
0214 binding will require using dma-fence to ensure residency, the GPU page faults
0215 mode, when supported, will not use any dma-fence as residency is purely managed
0216 by installing and removing/invalidating page table entries.
0217 
0218 Page level hints settings
0219 --------------------------
0220 VM_BIND allows hints to be set per mapping instead of per BO. Possible hints
0221 include placement and atomicity. A sub-BO level placement hint will be even more
0222 relevant with upcoming GPU on-demand page fault support.
0223 
0224 Page level Cache/CLOS settings
0225 -------------------------------
0226 VM_BIND allows cache/CLOS settings per mapping instead of per BO.
0227 
0228 Evictable page table allocations
0229 ---------------------------------
0230 Make pagetable allocations evictable and manage them similarly to VM_BIND
0231 mapped objects. Page table pages are similar to persistent mappings of a
0232 VM (the differences here are that the page table pages will not have an i915_vma
0233 structure and, after swapping pages back in, the parent page link needs to be
0234 updated).
0235 
0236 Shared Virtual Memory (SVM) support
0237 ------------------------------------
0238 The VM_BIND interface can be used to map system memory directly (without the gem
0239 BO abstraction) using the HMM interface. SVM is only supported with GPU page
0240 faults enabled.
0241 
0242 VM_BIND UAPI
0243 =============
0244 
0245 .. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h