=========================================
I915 GuC Submission/DRM Scheduler Section
=========================================

Upstream plan
=============
For upstream the overall plan for landing GuC submission and integrating the
i915 with the DRM scheduler is:

* Merge basic GuC submission
    * Basic submission support for all gen11+ platforms
    * Not enabled by default on any current platforms but can be enabled via
      modparam enable_guc
    * Lots of rework will be needed to integrate with the DRM scheduler, so
      there is no need to nitpick everything in the code; it just needs to be
      functional, have no major coding style / layering errors, and not
      regress execlists
    * Update IGTs / selftests as needed to work with GuC submission
    * Enable CI on supported platforms for a baseline
    * Rework / get CI healthy for GuC submission in place as needed
* Merge new parallel submission uAPI
    * Bonding uAPI is completely incompatible with GuC submission, plus it
      has severe design issues in general, which is why we want to retire it
      no matter what
    * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT context setup
      step which configures a slot with N contexts
    * After I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT a user can submit N
      batches to a slot in a single execbuf IOCTL and the batches run on the
      GPU in parallel
    * Initially only for GuC submission but execlists can be supported if
      needed
* Convert the i915 to use the DRM scheduler
    * GuC submission backend fully integrated with DRM scheduler
        * All request queues removed from backend (e.g. all backpressure
          handled in DRM scheduler)
        * Resets / cancels hook into DRM scheduler
        * Watchdog hooks into DRM scheduler
        * Lots of complexity of the GuC backend can be pulled out once
          integrated with DRM scheduler (e.g. state machine gets
          simpler, locking gets simpler, etc...)
    * Execlists backend will do the minimum required to hook into the DRM
      scheduler
        * Legacy interface
        * Features like timeslicing / preemption / virtual engines would
          be difficult to integrate with the DRM scheduler and these
          features are not required for GuC submission as the GuC does
          these things for us
        * ROI low on fully integrating into DRM scheduler
        * Fully integrating would add lots of complexity to DRM
          scheduler
    * Port i915 priority inheritance / boosting feature into DRM scheduler
        * Used for i915 page flip, may be useful to other DRM drivers as
          well
        * Will be an optional feature in the DRM scheduler
    * Remove in-order completion assumptions from DRM scheduler
        * Even when using the DRM scheduler the backends will handle
          preemption, timeslicing, etc... so it is possible for jobs to
          finish out of order
    * Pull out i915 priority levels and use DRM priority levels
    * Optimize DRM scheduler as needed

TODOs for GuC submission upstream
=================================

* Need an update to GuC firmware / i915 to enable error state capture
* Open source tool to decode GuC logs
* Public GuC spec

New uAPI for basic GuC submission
=================================
No major changes are required to the uAPI for basic GuC submission. The only
change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
This attribute indicates the 2k i915 user priority levels are statically mapped
into 3 levels as follows:

* -1k to -1 Low priority
* 0 Medium priority
* 1 to 1k High priority

This is needed because the GuC only has 4 priority bands. The highest priority
band is reserved for the kernel. This also aligns with the DRM scheduler
priority levels.
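
The static map above can be sketched in a few lines of C. This is only an
illustration of the three-way split described in this section: the enum and
function names are hypothetical and are not part of the i915 uAPI, and range
checking of the user priority is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical band names for illustration; not taken from the uAPI. */
enum example_prio_band {
	EXAMPLE_PRIO_LOW,
	EXAMPLE_PRIO_MEDIUM,
	EXAMPLE_PRIO_HIGH,
};

/*
 * Collapse an i915 user priority into one of the three bands listed
 * above: negative values are low, zero is medium, positive values are
 * high. Validation of the allowed user priority range is omitted.
 */
enum example_prio_band example_user_prio_to_band(int32_t user_prio)
{
	if (user_prio < 0)
		return EXAMPLE_PRIO_LOW;
	if (user_prio > 0)
		return EXAMPLE_PRIO_HIGH;
	return EXAMPLE_PRIO_MEDIUM;
}
```

With such a map, two contexts whose user priorities differ but share a sign
land in the same band, so userspace cannot rely on finer-grained ordering
when this capability is reported.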

Spec references:
----------------
* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t

New parallel submission uAPI
============================
The existing bonding uAPI is completely broken with GuC submission because
whether a submission is a single context submit or a parallel submit isn't
known until execbuf time, when it is activated via I915_SUBMIT_FENCE. To submit
multiple contexts in parallel with the GuC, the context must be explicitly
registered with N contexts and all N contexts must be submitted in a single
command to the GuC. The GuC interfaces do not support dynamically changing
between N contexts as the bonding uAPI does. Hence the need for a new parallel
submission interface. Also, the legacy bonding uAPI is quite confusing and not
intuitive at all. Furthermore, I915_SUBMIT_FENCE is by design a future fence,
so not really something we should continue to support.

The new parallel submission uAPI consists of 3 parts:

* Export engines logical mapping
* A 'set_parallel' extension to configure contexts for parallel
  submission
* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL

Export engines logical mapping
------------------------------
Certain use cases require BBs to be placed on engine instances in logical order
(e.g. split-frame on gen11+). The logical mapping of engine instances can change
based on fusing. Rather than making UMDs aware of fusing, simply expose the
logical mapping with the existing query engine info IOCTL. Also, the GuC
submission interface currently only supports submitting multiple contexts to
engines in logical order, which is a new requirement compared to execlists.
Lastly, all current platforms have at most 2 engine instances and the logical
order is the same as uAPI order. This will change on platforms with more than 2
engine instances.

A single bit will be added to drm_i915_engine_info.flags indicating that the
logical instance has been returned and a new field,
drm_i915_engine_info.logical_instance, returns the logical instance.
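
As a rough sketch of how a UMD might consume this, the lookup below walks a
list of engine entries and returns the uAPI instance that sits at a given
logical position. The struct is a simplified stand-in for
drm_i915_engine_info, and the flag name, its bit position, and the field
widths are assumptions made for illustration only; the real definitions live
in the uAPI header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "logical_instance is valid". */
#define EXAMPLE_ENGINE_INFO_HAS_LOGICAL_INSTANCE (1u << 0)

/* Simplified stand-in for one entry returned by the engine info query. */
struct example_engine_info {
	uint16_t engine_class;
	uint16_t engine_instance;	/* instance in uAPI order */
	uint64_t flags;
	uint16_t logical_instance;	/* valid only if the flag is set */
};

/*
 * Find the uAPI instance of class `cls` at logical position `logical`.
 * Returns 0 and stores the instance on success, or -1 if no entry
 * matches or the entry did not report a logical instance.
 */
int example_find_logical(const struct example_engine_info *info,
			 size_t count, uint16_t cls, uint16_t logical,
			 uint16_t *uapi_instance)
{
	for (size_t i = 0; i < count; i++) {
		if (info[i].engine_class != cls)
			continue;
		if (!(info[i].flags & EXAMPLE_ENGINE_INFO_HAS_LOGICAL_INSTANCE))
			continue;
		if (info[i].logical_instance == logical) {
			*uapi_instance = info[i].engine_instance;
			return 0;
		}
	}
	return -1;
}
```

A UMD that needs BBs placed in logical order could call such a helper for
logical positions 0..N-1 and build its engine list from the results, without
knowing anything about fusing.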

A 'set_parallel' extension to configure contexts for parallel submission
------------------------------------------------------------------------
The 'set_parallel' extension configures a slot for parallel submission of N BBs.
It is a setup step that must be called before using any of the contexts. See
I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
similar existing examples. Once a slot is configured for parallel submission the
execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially only
GuC submission is supported. Execlists support can be added later if needed.

Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
drm_i915_context_engines_parallel_submit to the uAPI to implement this
extension.

.. kernel-doc:: include/uapi/drm/i915_drm.h
   :functions: i915_context_engines_parallel_submit

Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
-------------------------------------------------------------------
Contexts that have been configured with the 'set_parallel' extension can only
submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects
in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST is
set. The number of BBs is implicit based on the slot submitted and how it has
been configured by 'set_parallel' or other extensions. No uAPI changes are
required to the execbuf2 IOCTL.
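
The rule for locating the BBs in the object list can be sketched as below.
This is a minimal illustration, not the driver's implementation: the function
name is hypothetical, and EXAMPLE_EXEC_BATCH_FIRST is a placeholder standing
in for I915_EXEC_BATCH_FIRST, whose real value comes from the uAPI header.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for I915_EXEC_BATCH_FIRST; the real bit is in i915_drm.h. */
#define EXAMPLE_EXEC_BATCH_FIRST (1ull << 0)

/*
 * Compute the index of the first BB in an exec object list of
 * `n_objects` entries when the slot is configured for `n_batches` BBs.
 * The BBs occupy [first, first + n_batches). Returns 0 on success or
 * -1 if the list cannot hold the configured number of BBs.
 */
int example_first_batch_index(size_t n_objects, size_t n_batches,
			      uint64_t flags, size_t *first)
{
	if (n_batches == 0 || n_batches > n_objects)
		return -1;

	if (flags & EXAMPLE_EXEC_BATCH_FIRST)
		*first = 0;			/* first N objects */
	else
		*first = n_objects - n_batches;	/* last N objects */

	return 0;
}
```

For example, with 5 objects and a slot configured for 2 BBs, the BBs are
objects 3 and 4 by default, or objects 0 and 1 when the batch-first flag is
set.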