0001 ===================================================
0002 Scalable Vector Extension support for AArch64 Linux
0003 ===================================================
0004
0005 Author: Dave Martin <Dave.Martin@arm.com>
0006
0007 Date: 4 August 2017
0008
0009 This document outlines briefly the interface provided to userspace by Linux in
0010 order to support use of the ARM Scalable Vector Extension (SVE), including
0011 interactions with Streaming SVE mode added by the Scalable Matrix Extension
0012 (SME).
0013
0014 This is an outline of the most important features and issues only and is not
0015 intended to be exhaustive.
0016
0017 This document does not aim to describe the SVE architecture or programmer's
0018 model. To aid understanding, a minimal description of relevant programmer's
0019 model features for SVE is included in Appendix A.
0020
0021
0022 1. General
0023 -----------
0024
0025 * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
0026 tracked per-thread.
0027
0028 * In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present
0029 in the system. When it is not present and these interfaces are used to
0030 access streaming mode state, FFR is read and written as zero.
0031
0032 * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
0033 AT_HWCAP entry. Presence of this flag implies the presence of the SVE
0034 instructions and registers, and the Linux-specific system interfaces
0035 described in this document. SVE is reported in /proc/cpuinfo as "sve".
0036
0037 * Support for the execution of SVE instructions in userspace can also be
0038 detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
0039 instruction, and checking that the value of the SVE field is nonzero. [3]
0040
0041 It does not guarantee the presence of the system interfaces described in the
0042 following sections: software that needs to verify that those interfaces are
0043 present must check for HWCAP_SVE instead.
0044
0045 * On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
0046 be reported in the AT_HWCAP2 aux vector entry. In addition to this,
0047 optional extensions to SVE2 may be reported by the presence of:
0048
0049 HWCAP2_SVE2
0050 HWCAP2_SVEAES
0051 HWCAP2_SVEPMULL
0052 HWCAP2_SVEBITPERM
0053 HWCAP2_SVESHA3
0054 HWCAP2_SVESM4
0055
0056 This list may be extended over time as the SVE architecture evolves.
0057
0058 These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
0059 which userspace can read using an MRS instruction. See elf_hwcaps.txt and
0060 cpu-feature-registers.txt for details.
0061
0062 * On hardware that supports the SME extensions, HWCAP2_SME will also be
0063 reported in the AT_HWCAP2 aux vector entry. Among other things SME adds
0064 streaming mode which provides a subset of the SVE feature set using a
0065 separate SME vector length and the same Z/V registers. See sme.rst
0066 for more details.
0067
0068 * Debuggers should restrict themselves to interacting with the target via the
0069 NT_ARM_SVE regset. The recommended way of detecting support for this regset
0070 is to connect to a target process first and then attempt a
0071 ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is
0072 present and streaming SVE mode is in use, the FPSIMD subset of registers
0073 will be read via NT_ARM_SVE, and NT_ARM_SVE writes will exit streaming mode
0074 in the target.
0075
0076 * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
0077 between userspace and the kernel, the register value is encoded in memory in
0078 an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
0079 byte offset i from the start of the memory representation. This affects for
0080 example the signal frame (struct sve_context) and ptrace interface
0081 (struct user_sve_header) and associated data.
0082
0083 Beware that on big-endian systems this results in a different byte order than
0084 for the FPSIMD V-registers, which are stored as single host-endian 128-bit
0085 values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
0086 byte offset i. (struct fpsimd_context, struct user_fpsimd_state).
0087
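The HWCAP_SVE check described in this section can be sketched in C as follows.
This is a minimal illustration only: the helper names hwcap_has_sve() and
sve_supported() are hypothetical, and the fallback definition of HWCAP_SVE is
an assumption added so that the sketch also builds on hosts whose headers do
not define it. The result is meaningful only when running on AArch64.

```c
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP */

/*
 * Assumption: fallback for build hosts whose headers lack HWCAP_SVE.
 * On AArch64 the C library headers normally provide it.
 */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif

/* Pure helper (hypothetical name): does this AT_HWCAP value advertise SVE? */
bool hwcap_has_sve(unsigned long hwcap)
{
	return (hwcap & HWCAP_SVE) != 0;
}

/*
 * Hypothetical helper: true if the kernel reports SVE for this process.
 * Per this document, this implies both the instructions/registers and
 * the Linux-specific interfaces are available.
 */
bool sve_supported(void)
{
	return hwcap_has_sve(getauxval(AT_HWCAP));
}
```

A program would typically call sve_supported() once at startup and select a
non-SVE code path when it returns false.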
0088
0089 2. Vector length terminology
0090 -----------------------------
0091
0092 The size of an SVE vector (Z) register is referred to as the "vector length".
0093
0094 To avoid confusion about the units used to express vector length, the kernel
0095 adopts the following conventions:
0096
0097 * Vector length (VL) = size of a Z-register in bytes
0098
0099 * Vector quadwords (VQ) = size of a Z-register in units of 128 bits
0100
0101 (So, VL = 16 * VQ.)
0102
0103 The VQ convention is used where the underlying granularity is important, such
0104 as in data structure definitions. In most other situations, the VL convention
0105 is used. This is consistent with the meaning of the "VL" pseudo-register in
0106 the SVE instruction set architecture.
0107
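The relationship VL = 16 * VQ can be illustrated with a few small helpers.
These are local stand-ins with hypothetical names, written so the sketch is
self-contained; real code would normally use sve_vq_from_vl() and
sve_vl_valid() from the kernel headers, and sve_vl_valid() applies further
range limits that this sketch deliberately omits.

```c
#include <stdbool.h>

/* One quadword is 128 bits, i.e. 16 bytes. */
#define EXAMPLE_VQ_BYTES 16u

/* A vector length in bytes must be a nonzero multiple of 16. */
bool example_vl_is_whole_vq(unsigned int vl)
{
	return vl != 0 && vl % EXAMPLE_VQ_BYTES == 0;
}

/* Convert a vector length in bytes to quadwords. */
unsigned int example_vq_from_vl(unsigned int vl)
{
	return vl / EXAMPLE_VQ_BYTES;
}

/* Convert a quadword count to a vector length in bytes. */
unsigned int example_vl_from_vq(unsigned int vq)
{
	return vq * EXAMPLE_VQ_BYTES;
}

/* Size of one Z-register in bits for a given VL in bytes. */
unsigned int example_zreg_bits(unsigned int vl)
{
	return vl * 8;
}
```

For example, a 256-bit implementation has VL = 32 and VQ = 2.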
0108
0109 3. System call behaviour
0110 -------------------------
0111
0112 * On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of
0113 Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR
0114 become unspecified on return from a syscall.
0115
0116 * The SVE registers are not used to pass arguments to or receive results from
0117 any syscall.
0118
0119 * In practice the affected registers/bits will be preserved or will be replaced
0120 with zeros on return from a syscall, but userspace should not make
0121 assumptions about this. The kernel behaviour may vary on a case-by-case
0122 basis.
0123
0124 * All other SVE state of a thread, including the currently configured vector
0125 length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
0126 length (if any), is preserved across all syscalls, subject to the specific
0127 exceptions for execve() described in section 6.
0128
0129 In particular, on return from a fork() or clone(), the parent and new child
0130 process or thread share identical SVE configuration, matching that of the
0131 parent before the call.
0132
0133
0134 4. Signal handling
0135 -------------------
0136
0137 * A new signal frame record sve_context encodes the SVE registers on signal
0138 delivery. [1]
0139
0140 * This record is supplementary to fpsimd_context. The FPSR and FPCR registers
0141 are only present in fpsimd_context. For convenience, the content of V0..V31
0142 is duplicated between sve_context and fpsimd_context.
0143
0144 * The record contains a flags field. If the SVE_SIG_FLAG_SM flag is set in
0145 this field, the thread is in streaming mode and the vector length and
0146 register data (if present) describe the streaming SVE vector length and
0147 data.
0148
0149 * The signal frame record for SVE always contains basic metadata, in particular
0150 the thread's vector length (in sve_context.vl).
0151
0152 * The SVE registers may or may not be included in the record, depending on
0153 whether the registers are live for the thread. The registers are present if
0154 and only if:
0155 sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).
0156
0157 * If the registers are present, the remainder of the record has a vl-dependent
0158 size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to
0159 the members.
0160
0161 * Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
0162 layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
0163 start of the register's representation in memory.
0164
0165 * If the SVE context is too big to fit in sigcontext.__reserved[], then extra
0166 space is allocated on the stack and an extra_context record is written in
0167 __reserved[] referencing this space. sve_context is then written in the
0168 extra space. Refer to [1] for further details about this mechanism.
0169
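Locating a record in the signal frame and applying the "registers present"
size test can be sketched as below. This is an illustration under stated
assumptions, not the kernel's code: struct example_ctx_head is a local mirror
of the 8-byte record header defined in [1], all function names are
hypothetical, and the sketch does not follow an extra_context record into the
extra space. In a real handler the buffer would be sigcontext.__reserved[],
the magic would be the SVE record's magic number from [1], and
full_context_size would come from
SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror (assumption) of the record header: magic, then total size. */
struct example_ctx_head {
	uint32_t magic;
	uint32_t size;	/* size of the whole record in bytes, header included */
};

/*
 * Scan records stored back to back in buf[0..len) and return the byte
 * offset of the first record whose magic matches, or -1 if none is found.
 * A record with magic 0 terminates the chain.
 */
long example_find_record(const unsigned char *buf, size_t len, uint32_t magic)
{
	size_t off = 0;

	while (off + sizeof(struct example_ctx_head) <= len) {
		struct example_ctx_head head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0)
			break;		/* terminator record */
		if (head.size < sizeof(head) || head.size > len - off)
			break;		/* malformed chain: stop scanning */
		if (head.magic == magic)
			return (long)off;
		off += head.size;
	}
	return -1;
}

/*
 * Register data is present if and only if the record is at least as
 * large as the full context size for its vector length.
 */
int example_sve_regs_present(uint32_t record_size, size_t full_context_size)
{
	return record_size >= full_context_size;
}
```

A record that holds only the metadata fails the size test, which is how a
handler can tell that the SVE registers were not live.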
0170
0171 5. Signal return
0172 -----------------
0173
0174 When returning from a signal handler:
0175
0176 * If there is no sve_context record in the signal frame, or if the record is
0177 present but contains no register data as described in the previous section,
0178 then the SVE registers/bits become non-live and take unspecified values.
0179
0180 * If sve_context is present in the signal frame and contains full register
0181 data, the SVE registers become live and are populated with the specified
0182 data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31
0183 are always restored from the corresponding members of fpsimd_context.vregs[]
0184 and not from sve_context. The remaining bits are restored from sve_context.
0185
0186 * Inclusion of fpsimd_context in the signal frame remains mandatory,
0187 irrespective of whether sve_context is present or not.
0188
0189 * The vector length cannot be changed via signal return. If sve_context.vl in
0190 the signal frame does not match the current vector length, the signal return
0191 attempt is treated as illegal, resulting in a forced SIGSEGV.
0192
0193 * It is permitted to enter or leave streaming mode by setting or clearing
0194 the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
0195 when doing so sve_context.vl and any register data are appropriate for the
0196 vector length in the new mode.
0197
0198
0199 6. prctl extensions
0200 --------------------
0201
0202 Some new prctl() calls are added to allow programs to manage the SVE vector
0203 length:
0204
0205 prctl(PR_SVE_SET_VL, unsigned long arg)
0206
0207 Sets the vector length of the calling thread and related flags, where
0208 arg == vl | flags. Other threads of the calling process are unaffected.
0209
0210 vl is the desired vector length, where sve_vl_valid(vl) must be true.
0211
0212 flags:
0213
0214 PR_SVE_VL_INHERIT
0215
0216 Inherit the current vector length across execve(). Otherwise, the
0217 vector length is reset to the system default at execve(). (See
0218 Section 9.)
0219
0220 PR_SVE_SET_VL_ONEXEC
0221
0222 Defer the requested vector length change until the next execve()
0223 performed by this thread.
0224
0225 The effect is equivalent to implicit execution of the following
0226 call immediately after the next execve() (if any) by the thread:
0227
0228 prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)
0229
0230 This allows launching of a new program with a different vector
0231 length, while avoiding runtime side effects in the caller.
0232
0233
0234 Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
0235 immediately.
0236
0237
0238 Return value: a nonnegative value on success, or a negative value on error:
0239 EINVAL: SVE not supported, invalid vector length requested, or
0240 invalid flags.
0241
0242
0243 On success:
0244
0245 * Either the calling thread's vector length or the deferred vector length
0246 to be applied at the next execve() by the thread (dependent on whether
0247 PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
0248 supported by the system that is less than or equal to vl. If vl ==
0249 SVE_VL_MAX, the value set will be the largest value supported by the
0250 system.
0251
0252 * Any previously outstanding deferred vector length change in the calling
0253 thread is cancelled.
0254
0255 * The returned value describes the resulting configuration, encoded as for
0256 PR_SVE_GET_VL. The vector length reported in this value is the new
0257 current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
0258 present in arg; otherwise, the reported vector length is the deferred
0259 vector length that will be applied at the next execve() by the calling
0260 thread.
0261
0262 * Changing the vector length causes all of P0..P15, FFR and all bits of
0263 Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
0264 unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current
0265 vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
0266 flag, does not constitute a change to the vector length for this purpose.
0267
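The arg == vl | flags encoding and a deferred (on-exec) request can be
sketched as follows. The helper names are hypothetical and the sketch assumes
that <sys/prctl.h> provides the PR_SVE_* constants. On a system without SVE
the prctl() call simply fails with EINVAL.

```c
#include <sys/prctl.h>	/* prctl(), PR_SVE_SET_VL, PR_SVE_SET_VL_ONEXEC */

/* Build the PR_SVE_SET_VL argument: arg == vl | flags. */
unsigned long example_sve_set_vl_arg(unsigned int vl, unsigned long flags)
{
	return (unsigned long)vl | flags;
}

/*
 * Ask for vector length vl to be applied at the next execve() by this
 * thread, leaving the current vector length untouched.
 *
 * Returns the nonnegative prctl() result (encoded as for PR_SVE_GET_VL,
 * describing the deferred vector length) or -1 with errno set.
 */
int example_defer_vl_to_exec(unsigned int vl)
{
	return prctl(PR_SVE_SET_VL,
		     example_sve_set_vl_arg(vl, PR_SVE_SET_VL_ONEXEC));
}
```

A launcher would typically call example_defer_vl_to_exec() in the child
between fork() and execve(), so that the caller's own register state is not
disturbed by a vector length change.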
0268
0269 prctl(PR_SVE_GET_VL)
0270
0271 Gets the vector length of the calling thread.
0272
0273 The following flag may be OR-ed into the result:
0274
0275 PR_SVE_VL_INHERIT
0276
0277 Vector length will be inherited across execve().
0278
0279 There is no way to determine whether there is an outstanding deferred
0280 vector length change (which would only normally be the case between a
0281 fork() or vfork() and the corresponding execve() in typical use).
0282
0283 To extract the vector length from the result, bitwise and it with
0284 PR_SVE_VL_LEN_MASK.
0285
0286 Return value: a nonnegative value on success, or a negative value on error:
0287 EINVAL: SVE not supported.
0288
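Decoding the value returned by PR_SVE_GET_VL (or by a successful
PR_SVE_SET_VL) can be sketched as below. The struct and function names are
hypothetical; the masks are the ones named in the text above and are assumed
to be provided by <sys/prctl.h>.

```c
#include <stdbool.h>
#include <sys/prctl.h>	/* prctl(), PR_SVE_GET_VL, PR_SVE_VL_* */

/* Hypothetical container for the decoded result. */
struct example_sve_vl_info {
	unsigned int vl;	/* vector length in bytes */
	bool inherit;		/* PR_SVE_VL_INHERIT is set */
};

/* Decode a nonnegative PR_SVE_GET_VL / PR_SVE_SET_VL return value. */
struct example_sve_vl_info example_decode_vl(int ret)
{
	struct example_sve_vl_info info = {
		.vl = (unsigned int)ret & PR_SVE_VL_LEN_MASK,
		.inherit = (ret & PR_SVE_VL_INHERIT) != 0,
	};

	return info;
}

/*
 * Query the calling thread. Returns 0 and fills *info on success, or -1
 * on error (errno is EINVAL when SVE is not supported).
 */
int example_get_vl(struct example_sve_vl_info *info)
{
	int ret = prctl(PR_SVE_GET_VL);

	if (ret < 0)
		return -1;
	*info = example_decode_vl(ret);
	return 0;
}
```

Masking with PR_SVE_VL_LEN_MASK before using the length matters because flag
bits such as PR_SVE_VL_INHERIT share the same return value.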
0289
0290 7. ptrace extensions
0291 ---------------------
0292
0293 * New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
0294 PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
0295 streaming mode SVE registers and NT_ARM_SVE describes the
0296 non-streaming mode SVE registers.
0297
0298 In this description a register set is referred to as being "live" when
0299 the target is in the appropriate streaming or non-streaming mode and is
0300 using data beyond the subset shared with the FPSIMD Vn registers.
0301
0302 Refer to [2] for definitions.
0303
0304 The regset data starts with struct user_sve_header, containing:
0305
0306 size
0307
0308 Size of the complete regset, in bytes.
0309 This depends on vl and possibly on other things in the future.
0310
0311 If a call to PTRACE_GETREGSET requests less data than the value of
0312 size, the caller can allocate a larger buffer and retry in order to
0313 read the complete regset.
0314
0315 max_size
0316
0317 Maximum size in bytes that the regset can grow to for the target
0318 thread. The regset won't grow bigger than this even if the target
0319 thread changes its vector length etc.
0320
0321 vl
0322
0323 Target thread's current vector length, in bytes.
0324
0325 max_vl
0326
0327 Maximum possible vector length for the target thread.
0328
0329 flags
0330
0331 at most one of
0332
0333 SVE_PT_REGS_FPSIMD
0334
0335 SVE registers are not live (GETREGSET) or are to be made
0336 non-live (SETREGSET).
0337
0338 The payload is of type struct user_fpsimd_state, with the same
0339 meaning as for NT_PRFPREG, starting at offset
0340 SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.
0341
0342 Extra data might be appended in the future: the size of the
0343 payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).
0344
0345 vq should be obtained using sve_vq_from_vl(vl).
0346
0347 or
0348
0349 SVE_PT_REGS_SVE
0350
0351 SVE registers are live (GETREGSET) or are to be made live
0352 (SETREGSET).
0353
0354 The payload contains the SVE register data, starting at offset
0355 SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
0356 size SVE_PT_SVE_SIZE(vq, flags);
0357
0358 ... OR-ed with zero or more of the following flags, which have the same
0359 meaning and behaviour as the corresponding PR_SVE_* flags:
0360
0361 SVE_PT_VL_INHERIT
0362
0363 SVE_PT_VL_ONEXEC (SETREGSET only).
0364
0365 If neither FPSIMD nor SVE flags are provided then no register
0366 payload is available; this is only possible when SME is implemented.
0367
0368
0369 * The effects of changing the vector length and/or flags are equivalent to
0370 those documented for PR_SVE_SET_VL.
0371
0372 The caller must make a further GETREGSET call if it needs to know what VL is
0373 actually set by SETREGSET, unless it is known in advance that the requested
0374 VL is supported.
0375
0376 * In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on
0377 the header fields. The SVE_PT_SVE_*() macros are provided to facilitate
0378 access to the members.
0379
0380 * In either case, for SETREGSET it is permissible to omit the payload, in which
0381 case only the vector length and flags are changed (along with any
0382 consequences of those changes).
0383
0384 * In systems supporting SME, when in streaming mode a GETREGSET for
0385 NT_ARM_SVE will return only the user_sve_header with no register data.
0386 Similarly, a GETREGSET for NT_ARM_SSVE will not return any register data
0387 when not in streaming mode.
0388
0389 * A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD.
0390
0391 * For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
0392 requested VL is not supported, the effect will be the same as if the
0393 payload were omitted, except that an EIO error is reported. No
0394 attempt is made to translate the payload data to the correct layout
0395 for the vector length actually set. The thread's FPSIMD state is
0396 preserved, but the remaining bits of the SVE registers become
0397 unspecified. It is up to the caller to translate the payload layout
0398 for the actual VL and retry.
0399
0400 * Where SME is implemented it is not possible to GETREGSET the register
0401 state for normal SVE when in streaming mode, nor the streaming mode
0402 register state when in normal mode, regardless of the implementation defined
0403 behaviour of the hardware for sharing data between the two modes.
0404
0405 * Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
0406 streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
0407 if the target was not in streaming mode.
0408
0409 * The effect of writing a partial, incomplete payload is unspecified.
0410
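The "allocate a larger buffer and retry" approach described for the size
field can be sketched as follows. This is an illustration under stated
assumptions: struct example_sve_header is a local mirror of
struct user_sve_header from [2] so that the sketch builds on any host, the
function names are hypothetical, and the fallback value for NT_ARM_SVE is
only used when the C library headers do not define it. The tracee is assumed
to be attached and stopped already.

```c
#include <elf.h>	/* NT_ARM_SVE on recent C libraries */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_SVE
#define NT_ARM_SVE 0x405	/* assumption: fallback for older headers */
#endif

/* Local mirror (assumption) of struct user_sve_header from [2]. */
struct example_sve_header {
	uint32_t size;		/* size of the complete regset, in bytes */
	uint32_t max_size;	/* maximum size the regset can grow to */
	uint16_t vl;		/* current vector length, in bytes */
	uint16_t max_vl;	/* maximum possible vector length */
	uint16_t flags;
	uint16_t reserved;
};

/* True if a buffer of buf_len bytes could not hold the complete regset. */
int example_regset_truncated(const struct example_sve_header *hdr,
			     size_t buf_len)
{
	return hdr->size > buf_len;
}

/*
 * Read the NT_ARM_SVE regset of a stopped tracee, growing the buffer
 * until it holds header.size bytes. Returns a malloc()ed buffer that the
 * caller must free(), or NULL on failure.
 */
void *example_read_sve_regset(pid_t pid, size_t *len_out)
{
	size_t len = sizeof(struct example_sve_header);
	void *buf = NULL;

	for (;;) {
		void *tmp = realloc(buf, len);
		struct iovec iov;

		if (!tmp)
			break;
		buf = tmp;
		iov.iov_base = buf;
		iov.iov_len = len;
		if (ptrace(PTRACE_GETREGSET, pid,
			   (void *)(uintptr_t)NT_ARM_SVE, &iov) == -1)
			break;
		if (iov.iov_len < sizeof(struct example_sve_header))
			break;		/* not even a full header returned */
		if (!example_regset_truncated(buf, len)) {
			*len_out = iov.iov_len;
			return buf;
		}
		/* Retry with the size the kernel reported. */
		len = ((struct example_sve_header *)buf)->size;
	}
	free(buf);
	return NULL;
}
```

Starting with a header-sized buffer keeps the first call cheap; a debugger
that already knows max_size could allocate that amount once instead.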
0411
0412 8. ELF coredump extensions
0413 ---------------------------
0414
0415 * NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
0416 each thread of the dumped process. The contents will be equivalent to the
0417 data that would have been read if a PTRACE_GETREGSET of the corresponding
0418 type were executed for each thread when the coredump was generated.
0419
0420 9. System runtime configuration
0421 --------------------------------
0422
0423 * To mitigate the ABI impact of expansion of the signal frame, a policy
0424 mechanism is provided for administrators, distro maintainers and developers
0425 to set the default vector length for userspace processes:
0426
0427 /proc/sys/abi/sve_default_vector_length
0428
0429 Writing the text representation of an integer to this file sets the system
0430 default vector length to the specified value, unless the value is greater
0431 than the maximum vector length supported by the system in which case the
0432 default vector length is set to that maximum.
0433
0434 The result can be determined by reopening the file and reading its
0435 contents.
0436
0437 At boot, the default vector length is initially set to 64 or the maximum
0438 supported vector length, whichever is smaller. This determines the initial
0439 vector length of the init process (PID 1).
0440
0441 Reading this file returns the current system default vector length.
0442
0443 * At every execve() call, the new vector length of the new process is set to
0444 the system default vector length, unless
0445
0446 * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
0447 calling thread, or
0448
0449 * a deferred vector length change is pending, established via the
0450 PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).
0451
0452 * Modifying the system default vector length does not affect the vector length
0453 of any existing process or thread that does not make an execve() call.
0454
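Reading the system default vector length back from procfs can be sketched as
below. The function name and the EXAMPLE_ macro are hypothetical; the path is
the one given in this section. On systems without SVE the file is expected to
be absent, in which case the helper returns -1.

```c
#include <stdio.h>

#define EXAMPLE_SVE_DEFAULT_VL_PATH "/proc/sys/abi/sve_default_vector_length"

/*
 * Read a decimal integer from the given file. Returns the value, or -1
 * if the file cannot be opened or does not start with an integer.
 */
long example_read_vl_file(const char *path)
{
	FILE *f = fopen(path, "r");
	long vl = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &vl) != 1)
		vl = -1;
	fclose(f);
	return vl;
}
```

A caller would pass EXAMPLE_SVE_DEFAULT_VL_PATH. After writing a new value to
the file, reopening and rereading it this way shows the value that was
actually set, which may have been clamped to the system maximum.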
0455
0456 Appendix A. SVE programmer's model (informative)
0457 =================================================
0458
0459 This section provides a minimal description of the additions made by SVE to the
0460 ARMv8-A programmer's model that are relevant to this document.
0461
0462 Note: This section is for information only and not intended to be complete or
0463 to replace any architectural specification.
0464
0465 A.1. Registers
0466 ---------------
0467
0468 In A64 state, SVE adds the following:
0469
0470 * 32 8VL-bit vector registers Z0..Z31
0471 For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.
0472
0473 A register write using a Vn register name zeros all bits of the corresponding
0474 Zn except for bits [127:0].
0475
0476 * 16 VL-bit predicate registers P0..P15
0477
0478 * 1 VL-bit special-purpose predicate register FFR (the "first-fault register")
0479
0480 * a VL "pseudo-register" that determines the size of each vector register
0481
0482 The SVE instruction set architecture provides no way to write VL directly.
0483 Instead, it can be modified only by EL1 and above, by writing appropriate
0484 system registers.
0485
0486 * The value of VL can be configured at runtime by EL1 and above:
0487 16 <= VL <= VLmax, where VL must be a multiple of 16.
0488
0489 * The maximum vector length is determined by the hardware:
0490 16 <= VLmax <= 256.
0491
0492 (The SVE architecture specifies 256, but permits future architecture
0493 revisions to raise this limit.)
0494
0495 * FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
0496 operations in a similar way to the way in which they interact with ARMv8
0497 floating-point operations::
0498
0499          8VL-1                        128              0  bit index
0500         +----          ////            -----------------+
0501      Z0 |                               :       V0      |
0502       :                                          :
0503      Z7 |                               :       V7      |
0504      Z8 |                               :     * V8      |
0505       :                                       :  :
0506     Z15 |                               :     *V15      |
0507     Z16 |                               :      V16      |
0508       :                                          :
0509     Z31 |                               :      V31      |
0510         +----          ////            -----------------+
0511                                                  31    0
0512          VL-1                  0                +-------+
0513         +----       ////      --+          FPSR |       |
0514      P0 |                       |               +-------+
0515       : |                       |         *FPCR |       |
0516     P15 |                       |               +-------+
0517         +----       ////      --+
0518     FFR |                       |               +-----+
0519         +----       ////      --+            VL |     |
0520                                                 +-----+
0521
0522 (*) callee-save:
0523 This only applies to bits [63:0] of Z-/V-registers.
0524 FPCR contains callee-save and caller-save bits. See [4] for details.
0525
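The register sizes listed above translate into byte counts as sketched below,
with VL given in bytes as in section 2. The function names are hypothetical,
and the totals cover only the Z, P and FFR register contents: they exclude
FPSR, FPCR and any header or padding that the kernel's signal frame or ptrace
layouts add.

```c
#include <stddef.h>

/* One Z-register is 8*VL bits, i.e. VL bytes. */
size_t example_zreg_bytes(size_t vl)
{
	return vl;
}

/* One predicate register (P0..P15 or FFR) is VL bits, i.e. VL/8 bytes. */
size_t example_preg_bytes(size_t vl)
{
	return vl / 8;
}

/* Raw size of Z0..Z31, P0..P15 and FFR together, in bytes. */
size_t example_sve_state_bytes(size_t vl)
{
	return 32 * example_zreg_bytes(vl)	/* Z0..Z31 */
	     + 16 * example_preg_bytes(vl)	/* P0..P15 */
	     + example_preg_bytes(vl);		/* FFR */
}
```

With the minimum VL of 16 this gives 546 bytes, and with VL = 256 it gives
8736 bytes, which illustrates why the signal frame can outgrow
sigcontext.__reserved[].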
0526
0527 A.2. Procedure call standard
0528 -----------------------------
0529
0530 The ARMv8-A base procedure call standard is extended as follows with respect to
0531 the additional SVE register state:
0532
0533 * All SVE register bits that are not shared with FP/SIMD are caller-save.
0534
0535 * Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.
0536
0537 This follows from the way these bits are mapped to V8..V15, whose bits
0538 [63:0] are callee-save in the base procedure call standard.
0539
0540
0541 Appendix B. ARMv8-A FP/SIMD programmer's model
0542 ===============================================
0543
0544 Note: This section is for information only and not intended to be complete or
0545 to replace any architectural specification.
0546
0547 Refer to [4] for more information.
0548
0549 ARMv8-A defines the following floating-point / SIMD register state:
0550
0551 * 32 128-bit vector registers V0..V31
0552 * 2 32-bit status/control registers FPSR, FPCR
0553
0554 ::
0555
0556         127             0  bit index
0557         +---------------+
0558      V0 |               |
0559       : :               :
0560      V7 |               |
0561    * V8 |               |
0562    :  : :               :
0563    *V15 |               |
0564     V16 |               |
0565       : :               :
0566     V31 |               |
0567         +---------------+
0568
0569          31    0
0570         +-------+
0571    FPSR |       |
0572         +-------+
0573   *FPCR |       |
0574         +-------+
0575
0576 (*) callee-save:
0577 This only applies to bits [63:0] of V-registers.
0578 FPCR contains a mixture of callee-save and caller-save bits.
0579
0580
0581 References
0582 ==========
0583
0584 [1] arch/arm64/include/uapi/asm/sigcontext.h
0585 AArch64 Linux signal ABI definitions
0586
0587 [2] arch/arm64/include/uapi/asm/ptrace.h
0588 AArch64 Linux ptrace ABI definitions
0589
0590 [3] Documentation/arm64/cpu-feature-registers.rst
0591
0592 [4] ARM IHI0055C
0593 http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
0594 http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
0595 Procedure Call Standard for the ARM 64-bit Architecture (AArch64)