===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5-based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to the system register configuration.
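
The relationship between a pointer and its logical tag can be shown with
plain bit manipulation. The sketch below is illustrative only: the
helpers ``ptr_logical_tag()`` and ``ptr_with_tag()`` are hypothetical
(not a kernel or C library API) and merely compute address values.
Dereferencing a tagged pointer additionally requires the tagged address
ABI (``PR_TAGGED_ADDR_ENABLE``) to be enabled.

.. code-block:: c

    #include <stdint.h>

    #define MTE_TAG_SHIFT 56        /* logical tag lives in bits 59-56 */
    #define MTE_TAG_MASK  (UINT64_C(0xf) << MTE_TAG_SHIFT)

    /* Hypothetical helper: extract the 4-bit logical tag of an address. */
    static inline unsigned int ptr_logical_tag(uint64_t addr)
    {
            return (unsigned int)((addr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
    }

    /* Hypothetical helper: replace bits 59-56, leaving all other bits alone. */
    static inline uint64_t ptr_with_tag(uint64_t addr, unsigned int tag)
    {
            return (addr & ~MTE_TAG_MASK) |
                   ((uint64_t)(tag & 0xf) << MTE_TAG_SHIFT);
    }

Bits 63-60 are left untouched: TBI makes address translation ignore the
whole top byte, but only bits 59-56 form the MTE logical tag.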

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and the Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and is preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping causes these system calls to fail with ``-EINVAL``.
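
The following is a minimal sketch of requesting a tagged anonymous
mapping while degrading gracefully on systems without MTE. It assumes
glibc's ``getauxval()`` and defines ``HWCAP2_MTE`` and ``PROT_MTE``
locally, with the values used in the example at the end of this
document, if the headers lack them. The function name
``map_maybe_tagged()`` is hypothetical. On non-AArch64 builds it never
attempts ``PROT_MTE``, because ``0x20`` carries no such meaning there.

.. code-block:: c

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>

    #ifndef HWCAP2_MTE
    #define HWCAP2_MTE (1 << 18)    /* arch/arm64/include/uapi/asm/hwcap.h */
    #endif
    #ifndef PROT_MTE
    #define PROT_MTE 0x20           /* arch/arm64/include/uapi/asm/mman.h */
    #endif

    /*
     * Hypothetical helper: map len bytes of anonymous memory, using PROT_MTE
     * when the CPU advertises MTE. *tagged reports whether PROT_MTE applied.
     * Returns NULL if even the untagged fallback mapping fails.
     */
    static void *map_maybe_tagged(size_t len, int *tagged)
    {
            void *p = MAP_FAILED;

    #if defined(__aarch64__)
            if (getauxval(AT_HWCAP2) & HWCAP2_MTE)
                    p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_MTE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    #endif
            *tagged = (p != MAP_FAILED);
            if (p == MAP_FAILED)    /* no MTE, or -EINVAL: map untagged */
                    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            return p == MAP_FAILED ? NULL : p;
    }

Because the allocation tags of new ``PROT_MTE`` pages are 0, untagged
(tag 0) pointers returned by this helper can be used directly in either
case.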

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: Memory ranges advised with ``MADV_DONTNEED`` or ``MADV_FREE``
via ``madvise()`` may have their allocation tags cleared (set to 0) at
any point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are four
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  core dump.

- *Asynchronous* - The kernel raises a ``SIGSEGV`` in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).

- *Asymmetric* - Reads are handled as for synchronous mode while writes
  are handled as for asynchronous mode.
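
A program that wants to report which kind of tag check fault it received
can inspect ``si_code`` in a ``SA_SIGINFO`` handler. The sketch below
assumes the values ``SEGV_MTEAERR`` (8) and ``SEGV_MTESERR`` (9) from
``include/uapi/asm-generic/siginfo.h``, defining them only if the C
library headers do not. ``mte_fault_kind()`` and
``install_mte_handler()`` are hypothetical names.

.. code-block:: c

    #define _GNU_SOURCE
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    #ifndef SEGV_MTEAERR
    #define SEGV_MTEAERR 8  /* asynchronous MTE fault, .si_addr = 0 */
    #endif
    #ifndef SEGV_MTESERR
    #define SEGV_MTESERR 9  /* synchronous MTE fault, .si_addr = fault address */
    #endif

    /* Hypothetical helper: classify a siginfo_t as an MTE fault kind. */
    static const char *mte_fault_kind(const siginfo_t *si)
    {
            if (si->si_signo != SIGSEGV)
                    return "not SIGSEGV";
            switch (si->si_code) {
            case SEGV_MTESERR:
                    return "synchronous tag check fault";
            case SEGV_MTEAERR:
                    return "asynchronous tag check fault";
            default:
                    return "other SIGSEGV";
            }
    }

    static void mte_handler(int sig, siginfo_t *si, void *uctx)
    {
            const char *kind = mte_fault_kind(si);

            (void)sig;
            (void)uctx;
            /* write() is async-signal-safe, printf() is not */
            write(STDERR_FILENO, kind, strlen(kind));
            write(STDERR_FILENO, "\n", 1);
            _exit(1);       /* returning would re-run a synchronous fault */
    }

    /* Hypothetical helper: install mte_handler() for SIGSEGV. */
    static int install_mte_handler(void)
    {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = mte_handler;
            sa.sa_flags = SA_SIGINFO;
            sigemptyset(&sa.sa_mask);
            return sigaction(SIGSEGV, &sa, NULL);
    }

The handler exits instead of returning because, for a synchronous fault,
the access was not performed and returning would simply retry it.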

The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults
  (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

If no modes are specified, tag check faults are ignored. If a single
mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking mode" section below.
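
To illustrate how the ``flags`` argument is composed, the helper below
builds it from the bit-field constants, which are copied from
``include/uapi/linux/prctl.h`` as in the example at the end of this
document. The helper name ``mte_ctrl_flags()`` is hypothetical. Its
result would be passed as the second argument of
``prctl(PR_SET_TAGGED_ADDR_CTRL, ...)``.

.. code-block:: c

    #include <stdint.h>

    #define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
    #define PR_MTE_TCF_SHIFT        1
    #define PR_MTE_TCF_NONE         (0UL << PR_MTE_TCF_SHIFT)
    #define PR_MTE_TCF_SYNC         (1UL << PR_MTE_TCF_SHIFT)
    #define PR_MTE_TCF_ASYNC        (2UL << PR_MTE_TCF_SHIFT)
    #define PR_MTE_TCF_MASK         (3UL << PR_MTE_TCF_SHIFT)
    #define PR_MTE_TAG_SHIFT        3
    #define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Hypothetical helper: enable the tagged address ABI, request the given
     * tag check fault modes and include the tags in include_mask for IRG.
     */
    static unsigned long mte_ctrl_flags(int want_sync, int want_async,
                                        uint16_t include_mask)
    {
            unsigned long flags = PR_TAGGED_ADDR_ENABLE;

            if (want_sync)
                    flags |= PR_MTE_TCF_SYNC;
            if (want_async)
                    flags |= PR_MTE_TCF_ASYNC;
            flags |= ((unsigned long)include_mask << PR_MTE_TAG_SHIFT) &
                     PR_MTE_TAG_MASK;
            return flags;
    }

For instance, ``mte_ctrl_flags(1, 1, 0xfffe)`` evaluates to ``0x7fff7``,
the same value the example program passes to ``prctl()``.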

The current tag check fault configuration can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If
multiple modes were requested, all of them are reported.
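
Decoding the value returned by ``PR_GET_TAGGED_ADDR_CTRL`` reverses the
encoding, using the ``PR_MTE_TCF_MASK`` and ``PR_MTE_TAG_MASK``
bit-fields. In the sketch below the constants are redefined locally only
if missing, and ``struct mte_ctrl`` and ``mte_decode_ctrl()`` are
hypothetical.

.. code-block:: c

    #ifndef PR_TAGGED_ADDR_ENABLE
    #define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
    #endif
    #ifndef PR_MTE_TCF_SYNC
    #define PR_MTE_TCF_SYNC         (1UL << 1)
    #define PR_MTE_TCF_ASYNC        (2UL << 1)
    #endif
    #ifndef PR_MTE_TAG_SHIFT
    #define PR_MTE_TAG_SHIFT        3
    #define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)
    #endif

    /* Hypothetical decoded view of the PR_GET_TAGGED_ADDR_CTRL result. */
    struct mte_ctrl {
            int tagged_addr_abi;            /* PR_TAGGED_ADDR_ENABLE set */
            int sync;                       /* PR_MTE_TCF_SYNC requested */
            int async;                      /* PR_MTE_TCF_ASYNC requested */
            unsigned int include_mask;      /* tags included for IRG */
    };

    static struct mte_ctrl mte_decode_ctrl(unsigned long ctrl)
    {
            struct mte_ctrl c;

            c.tagged_addr_abi = !!(ctrl & PR_TAGGED_ADDR_ENABLE);
            c.sync = !!(ctrl & PR_MTE_TCF_SYNC);
            c.async = !!(ctrl & PR_MTE_TCF_ASYNC);
            c.include_mask = (unsigned int)((ctrl & PR_MTE_TAG_MASK) >>
                                            PR_MTE_TAG_SHIFT);
            return c;
    }

In a real program the argument would be the (non-negative) return value
of ``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)``.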

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. the ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses; however, it cannot always guarantee it. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
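
The two masks are bitwise complements within 16 bits, one bit per tag
value. The hypothetical helpers below only illustrate that arithmetic.
Linux performs the actual conversion to ``GCR_EL1.Exclude`` internally.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helper: prctl() include mask -> exclude-style mask. */
    static uint16_t mte_exclude_from_include(uint16_t include_mask)
    {
            return (uint16_t)~include_mask;
    }

    /*
     * Hypothetical helper: is tag t (0-15) in the include mask? Note that
     * with an empty include mask the CPU still generates tag 0.
     */
    static int mte_tag_allowed(uint16_t include_mask, unsigned int t)
    {
            return t < 16 && ((include_mask >> t) & 1);
    }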

Per-CPU preferred tag checking mode
-----------------------------------

On some CPUs the performance of MTE in stricter tag checking modes
is similar to that of less strict tag checking modes. This makes it
worthwhile to enable stricter checks on those CPUs when a less strict
checking mode is requested, in order to gain the error detection
benefits of the stricter checks without the performance downsides. To
support this scenario, a privileged user may configure a stricter
tag checking mode as the CPU's preferred tag checking mode.

The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async``, ``sync`` or ``asymm``. The
default preferred mode for each CPU is ``async``.
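
A privileged tool could set the preference for one CPU as sketched
below. The helpers ``mte_pref_path()`` and ``mte_set_preferred()`` are
hypothetical. The setter validates the mode against the three values
listed above before writing it and returns ``-1`` with ``errno`` set on
failure, for example ``EACCES`` when not privileged or ``ENOENT`` if the
attribute is not present.

.. code-block:: c

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical helper: build the sysfs path for one CPU. */
    static int mte_pref_path(char *buf, size_t len, int cpu)
    {
            int n = snprintf(buf, len,
                             "/sys/devices/system/cpu/cpu%d/mte_tcf_preferred",
                             cpu);

            return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /* Hypothetical helper: write "async", "sync" or "asymm" for one CPU. */
    static int mte_set_preferred(int cpu, const char *mode)
    {
            char path[128];
            ssize_t n;
            int fd;

            if (strcmp(mode, "async") && strcmp(mode, "sync") &&
                strcmp(mode, "asymm")) {
                    errno = EINVAL;
                    return -1;
            }
            if (mte_pref_path(path, sizeof(path), cpu)) {
                    errno = ENAMETOOLONG;
                    return -1;
            }
            fd = open(path, O_WRONLY);
            if (fd < 0)
                    return -1;
            n = write(fd, mode, strlen(mode));
            close(fd);
            return n == (ssize_t)strlen(mode) ? 0 : -1;
    }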

To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
flags, 0, 0, 0)`` system call. If both synchronous and asynchronous
modes are requested then asymmetric mode may also be selected by the
kernel. If the CPU's preferred tag checking mode is in the task's set
of provided tag checking modes, that mode will be selected. Otherwise,
the kernel selects one of the modes from the task's mode set using the
following preference order:

1. Asynchronous
2. Asymmetric
3. Synchronous

Note that there is no way for userspace to request multiple modes and
also disable asymmetric mode.
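
The selection rule can be modelled as a small pure function. This is a
sketch of the behaviour described above, not the kernel's
implementation. The ``enum mte_mode`` values and the bit-set encoding of
the requested modes are hypothetical.

.. code-block:: c

    /* Hypothetical encoding of tag checking modes as bits in a set. */
    enum mte_mode {
            MTE_NONE  = 0,
            MTE_ASYNC = 1 << 0,
            MTE_ASYMM = 1 << 1,
            MTE_SYNC  = 1 << 2,
    };

    /*
     * Model of the selection rule: 'requested' holds MTE_SYNC and/or
     * MTE_ASYNC as passed to prctl(); 'preferred' is the CPU's
     * mte_tcf_preferred value.
     */
    static enum mte_mode mte_select_mode(unsigned int requested,
                                         enum mte_mode preferred)
    {
            /* asymmetric becomes eligible only when sync and async are set */
            if ((requested & MTE_SYNC) && (requested & MTE_ASYNC))
                    requested |= MTE_ASYMM;

            if (requested & preferred)
                    return preferred;

            /* otherwise fall back to the order: async, asymm, sync */
            if (requested & MTE_ASYNC)
                    return MTE_ASYNC;
            if (requested & MTE_ASYMM)
                    return MTE_ASYMM;
            if (requested & MTE_SYNC)
                    return MTE_SYNC;
            return MTE_NONE;        /* no mode requested: faults ignored */
    }

With the default preference (``async``), a task requesting both
``PR_MTE_TCF_SYNC`` and ``PR_MTE_TCF_ASYNC`` therefore runs in
asynchronous mode.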

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes, with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK``, which will have the data and tags cleared (set
to 0).

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte, each byte corresponding to one 16-byte MTE tag
granule in the tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.
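
Since each buffer byte holds the tag of one 16-byte granule, the buffer
size needed to cover an arbitrary tracee range follows from rounding the
start down and the end up to the granule size. The helper
``mte_tag_bytes_for_range()`` below is a hypothetical illustration of
that arithmetic.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define MTE_GRANULE_SIZE 16

    /*
     * Hypothetical helper: number of tag bytes (one per granule) needed in
     * the tracer's buffer to cover [addr, addr + len) in the tracee. The
     * start is aligned down, as the kernel does for an unaligned addr.
     */
    static size_t mte_tag_bytes_for_range(uint64_t addr, size_t len)
    {
            uint64_t start, end;

            if (len == 0)
                    return 0;
            start = addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
            end = (addr + len + MTE_GRANULE_SIZE - 1) &
                  ~(uint64_t)(MTE_GRANULE_SIZE - 1);
            return (size_t)((end - start) / MTE_GRANULE_SIZE);
    }

For example, 16 bytes starting at an address 8 bytes into a granule
straddle two granules and therefore need two tag bytes.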

``ptrace()`` return value (as returned by the system call; the C library
wrapper instead returns ``-1`` and sets ``errno`` to the positive error code):

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
  address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.
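
A tracer-side wrapper might look like the sketch below. It assumes the
arm64 request values ``PTRACE_PEEKMTETAGS`` (33) and
``PTRACE_POKEMTETAGS`` (34) from ``arch/arm64/include/uapi/asm/ptrace.h``,
a tracee that is already attached and stopped, and an AArch64 build
(elsewhere it fails with ``ENOSYS``). ``peek_mte_tags()`` and
``peek_error_str()`` are hypothetical names. Because there are no
transient errors, the wrapper does not retry.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    #ifndef PTRACE_PEEKMTETAGS
    #define PTRACE_PEEKMTETAGS 33   /* arm64 only */
    #define PTRACE_POKEMTETAGS 34
    #endif

    /*
     * Hypothetical wrapper: read up to len tags starting at the granule that
     * contains addr. Returns the number of tags copied, or -1 with errno set.
     */
    static long peek_mte_tags(pid_t pid, void *addr, unsigned char *tags,
                              size_t len)
    {
    #if defined(__aarch64__)
            struct iovec iov = { .iov_base = tags, .iov_len = len };

            if (ptrace(PTRACE_PEEKMTETAGS, pid, addr, &iov) != 0)
                    return -1;      /* no transient errors: do not retry */
            return (long)iov.iov_len;       /* may be smaller than len */
    #else
            (void)pid; (void)addr; (void)tags; (void)len;
            errno = ENOSYS;         /* the request only exists on arm64 */
            return -1;
    #endif
    }

    /* Hypothetical helper: describe the documented errno values. */
    static const char *peek_error_str(int err)
    {
            switch (err) {
            case EPERM:      return "process cannot be traced";
            case EIO:        return "tracee address range not accessible";
            case EFAULT:     return "tracer iovec or buffer not accessible";
            case EOPNOTSUPP: return "tracee address has no valid tags";
            default:         return "unexpected error";
            }
    }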

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with
``addr == NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the
tagged address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).
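
For example, a tracer could read a stopped tracee's control word as
follows. This sketch assumes that ``NT_ARM_TAGGED_ADDR_CTRL`` has the
value ``0x409`` (from ``include/uapi/linux/elf.h``) and that the tracee
is already attached and stopped. ``get_tagged_addr_ctrl()`` is a
hypothetical name. The value it returns has the same layout as the
``prctl(PR_GET_TAGGED_ADDR_CTRL, ...)`` result.

.. code-block:: c

    #include <errno.h>
    #include <elf.h>
    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    #ifndef NT_ARM_TAGGED_ADDR_CTRL
    #define NT_ARM_TAGGED_ADDR_CTRL 0x409   /* include/uapi/linux/elf.h */
    #endif

    /*
     * Hypothetical helper: fetch the tracee's tagged address / MTE control
     * word. Returns 0 on success, -1 with errno set on failure.
     */
    static int get_tagged_addr_ctrl(pid_t pid, unsigned long *ctrl)
    {
            struct iovec iov = { .iov_base = ctrl, .iov_len = sizeof(*ctrl) };

            /* for PTRACE_GETREGSET, addr carries the regset (note) type */
            if (ptrace(PTRACE_GETREGSET, pid,
                       (void *)(uintptr_t)NT_ARM_TAGGED_ADDR_CTRL, &iov) == -1)
                    return -1;
            if (iov.iov_len != sizeof(*ctrl)) {     /* one 8-byte element */
                    errno = EIO;
                    return -1;
            }
            return 0;
    }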

Core dump support
-----------------

The allocation tags for user memory mapped with ``PROT_MTE`` are dumped
in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The
program header for such a segment is defined as:

:``p_type``: ``PT_AARCH64_MEMTAG_MTE``
:``p_flags``: 0
:``p_offset``: segment file offset
:``p_vaddr``: segment virtual address, same as the corresponding
    ``PT_LOAD`` segment
:``p_paddr``: 0
:``p_filesz``: segment size in file, calculated as ``p_memsz / 32``
    (two 4-bit tags cover 32 bytes of memory)
:``p_memsz``: segment size in memory, same as the corresponding
    ``PT_LOAD`` segment
:``p_align``: 0

The tags are stored in the core file at ``p_offset`` as two 4-bit tags
in a byte. With the tag granule of 16 bytes, a 4K page requires 128
bytes in the core file.
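
A core file reader can derive the expected size of a
``PT_AARCH64_MEMTAG_MTE`` segment from its ``p_memsz`` and locate the
tag for a given address. The text above does not specify which nibble
holds which granule's tag. The lookup helper below *assumes* that the
lower-addressed granule is in the low nibble, so treat it as a
hypothetical sketch to be checked against the kernel sources.
``memtag_filesz()`` and ``memtag_lookup()`` are hypothetical names.

.. code-block:: c

    #include <stdint.h>

    /* p_filesz of a PT_AARCH64_MEMTAG_MTE segment: one byte per 32 bytes. */
    static uint64_t memtag_filesz(uint64_t p_memsz)
    {
            return p_memsz / 32;
    }

    /*
     * Hypothetical lookup of the allocation tag for addr in a segment that
     * starts at p_vaddr, given the tag bytes read from p_offset.
     * ASSUMPTION: the even (lower-addressed) granule is in bits 3-0 of a
     * byte and the odd granule in bits 7-4.
     */
    static unsigned int memtag_lookup(const uint8_t *tags, uint64_t p_vaddr,
                                      uint64_t addr)
    {
            uint64_t granule = (addr - p_vaddr) / 16;
            uint8_t byte = tags[granule / 2];

            return (granule & 1) ? (byte >> 4) : (byte & 0xf);
    }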

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h (only defined here if the
     * C library headers do not already provide it)
     */
    #ifndef HWCAP2_MTE
    #define HWCAP2_MTE              (1 << 18)
    #endif

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #ifndef PROT_MTE
    #define PROT_MTE                0x20
    #endif

    /*
     * From include/uapi/linux/prctl.h
     */
    #ifndef PR_SET_TAGGED_ADDR_CTRL
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    #endif
    #ifndef PR_MTE_TCF_SHIFT
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
    #endif

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous or asynchronous MTE
             * tag check faults (based on per-CPU preference) and allow all
             * non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
                      (0xfffeUL << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. Alternatively, the flag
             * could be passed directly to mmap(), skipping this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly, the next store will generate an
             * exception: a[16] lies in the next 16-byte granule, whose
             * allocation tag is still 0.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /*
             * This should not be printed in synchronous mode. In asynchronous
             * mode the SIGSEGV is delivered later, so it may still appear.
             */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }