===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to system register configuration.
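
The placement of the logical tag in bits 59-56 can be illustrated with
plain integer arithmetic. The following is a minimal sketch, not kernel
or libc code; the helper names ``mte_logical_tag()`` and
``mte_set_logical_tag()`` and the ``MTE_TAG_*`` macros are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: the logical tag occupies bits 59-56 of the address. */
#define MTE_TAG_SHIFT   56
#define MTE_TAG_MASK    (0xfULL << MTE_TAG_SHIFT)

/* Return the 4-bit logical tag carried in a 64-bit virtual address. */
static inline unsigned int mte_logical_tag(uint64_t addr)
{
        return (unsigned int)((addr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
}

/* Return addr with its logical tag replaced by tag (0..15). */
static inline uint64_t mte_set_logical_tag(uint64_t addr, unsigned int tag)
{
        assert(tag <= 0xf);
        return (addr & ~MTE_TAG_MASK) | ((uint64_t)tag << MTE_TAG_SHIFT);
}
```

These helpers only manipulate the pointer value. They do not change the
allocation tag stored for the memory granule, so an access through a
re-tagged pointer can still mismatch.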

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` returned by these system
calls.

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are four
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).

- *Asymmetric* - Reads are handled as for synchronous mode while writes
  are handled as for asynchronous mode.
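
A ``SIGSEGV`` handler can tell the fault kinds listed above apart by
looking at ``si_code``. The sketch below shows one way to do that; the
``mte_fault_kind`` enum and ``mte_classify_fault()`` are hypothetical
names, and the fallback ``SEGV_MTE*`` values are an assumption based on
``include/uapi/asm-generic/siginfo.h`` for toolchains whose headers
predate MTE.

```c
#include <assert.h>
#include <signal.h>

/* Assumed fallback values for libc headers that predate MTE. */
#ifndef SEGV_MTEAERR
#define SEGV_MTEAERR 8
#endif
#ifndef SEGV_MTESERR
#define SEGV_MTESERR 9
#endif

/* Hypothetical classification used only in this sketch. */
enum mte_fault_kind {
        MTE_FAULT_NONE,         /* not an MTE tag check fault */
        MTE_FAULT_SYNC,         /* si_addr holds the faulting address */
        MTE_FAULT_ASYNC,        /* si_addr is 0, the address is unknown */
};

/* Classify a signal from its number and si_code. */
static inline enum mte_fault_kind mte_classify_fault(int signo, int si_code)
{
        assert(signo > 0);      /* signal numbers are positive */

        if (signo != SIGSEGV)
                return MTE_FAULT_NONE;
        if (si_code == SEGV_MTESERR)
                return MTE_FAULT_SYNC;
        if (si_code == SEGV_MTEAERR)
                return MTE_FAULT_ASYNC;
        return MTE_FAULT_NONE;
}
```

A handler installed with ``SA_SIGINFO`` would typically call this as
``mte_classify_fault(signo, info->si_code)`` and only trust
``info->si_addr`` when the result is ``MTE_FAULT_SYNC``.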

The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
                         (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

If no modes are specified, tag check faults are ignored. If a single
mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking mode" section below.

The current tag check fault configuration can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If
multiple modes were requested then all will be reported.
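
The value returned by ``PR_GET_TAGGED_ADDR_CTRL`` can be decoded by
masking it with ``PR_MTE_TCF_MASK``. The following sketch assumes the
``prctl.h`` values shown in the example at the end of this document;
``mte_tcf_modes()`` is a hypothetical helper, not an existing API.

```c
#include <assert.h>

/* Values from include/uapi/linux/prctl.h, guarded for newer headers. */
#ifndef PR_TAGGED_ADDR_ENABLE
# define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
#endif
#ifndef PR_MTE_TCF_SHIFT
# define PR_MTE_TCF_SHIFT       1
#endif
#ifndef PR_MTE_TCF_NONE
# define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
#endif
#ifndef PR_MTE_TCF_SYNC
# define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
#endif
#ifndef PR_MTE_TCF_ASYNC
# define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
#endif
#ifndef PR_MTE_TCF_MASK
# define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
#endif

/*
 * Name the tag check fault modes reported in ctrl, the non-negative
 * return value of prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0).
 */
static inline const char *mte_tcf_modes(long ctrl)
{
        assert(ctrl >= 0);      /* caller must handle prctl() errors first */

        switch ((unsigned long)ctrl & PR_MTE_TCF_MASK) {
        case PR_MTE_TCF_NONE:
                return "none";
        case PR_MTE_TCF_SYNC:
                return "sync";
        case PR_MTE_TCF_ASYNC:
                return "async";
        default:
                /* both bits set: multiple modes were requested */
                return "sync+async";
        }
}
```

The result reflects what the thread requested, not which mode the kernel
has currently selected for the CPU the thread runs on.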

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses, however it cannot always guarantee it. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
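
The relationship between the include mask passed to ``prctl()`` and the
exclude mask seen by the hardware is a 16-bit complement. The sketch
below works through that arithmetic; ``mte_tag_flags()`` and
``mte_exclude_mask()`` are hypothetical helpers introduced for
illustration, and the ``PR_MTE_TAG_*`` values are assumed to match
``include/uapi/linux/prctl.h``.

```c
#include <assert.h>
#include <stdint.h>

/* Values from include/uapi/linux/prctl.h, guarded for newer headers. */
#ifndef PR_MTE_TAG_SHIFT
# define PR_MTE_TAG_SHIFT       3
#endif
#ifndef PR_MTE_TAG_MASK
# define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
#endif

/*
 * Place a 16-bit include bitmap (bit n set means tag n may be
 * generated) into the PR_MTE_TAG_MASK field of the prctl() flags.
 */
static inline unsigned long mte_tag_flags(unsigned int include_mask)
{
        assert(include_mask <= 0xffff);
        return ((unsigned long)include_mask << PR_MTE_TAG_SHIFT) &
               PR_MTE_TAG_MASK;
}

/* Exclude mask corresponding to a given include mask. */
static inline uint16_t mte_exclude_mask(unsigned int include_mask)
{
        assert(include_mask <= 0xffff);
        return (uint16_t)(~include_mask & 0xffff);
}
```

For instance, an include mask of ``0xfffe`` allows every tag except 0
and corresponds to an exclude mask of ``0x0001``.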

Per-CPU preferred tag checking mode
-----------------------------------

On some CPUs the performance of MTE in stricter tag checking modes
is similar to that of less strict tag checking modes. This makes it
worthwhile to enable stricter checks on those CPUs when a less strict
checking mode is requested, in order to gain the error detection
benefits of the stricter checks without the performance downsides. To
support this scenario, a privileged user may configure a stricter
tag checking mode as the CPU's preferred tag checking mode.

The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async``, ``sync`` or ``asymm``. The
default preferred mode for each CPU is ``async``.

To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
flags, 0, 0, 0)`` system call. If both synchronous and asynchronous
modes are requested then asymmetric mode may also be selected by the
kernel. If the CPU's preferred tag checking mode is in the task's set
of provided tag checking modes, that mode will be selected. Otherwise,
the kernel selects one of the modes in the task's mode set using the
preference order:

        1. Asynchronous
        2. Asymmetric
        3. Synchronous

Note that there is no way for userspace to request multiple modes and
also disable asymmetric mode.
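
The selection rules in this section can be modelled as a small pure
function. This is a sketch of the behaviour as described in the text
above, not the kernel's implementation; the ``mte_mode`` enum, its bit
values and ``mte_resolve_mode()`` are hypothetical.

```c
#include <assert.h>

/* Hypothetical mode bits for this sketch; these are not UAPI values. */
enum mte_mode {
        MTE_MODE_NONE  = 0,
        MTE_MODE_SYNC  = 1 << 0,
        MTE_MODE_ASYNC = 1 << 1,
        MTE_MODE_ASYMM = 1 << 2,
};

/*
 * requested: OR of MTE_MODE_SYNC and/or MTE_MODE_ASYNC set by the task.
 * preferred: the single mode configured for the current CPU.
 * Returns MTE_MODE_NONE when no mode was requested (faults ignored).
 */
static inline enum mte_mode mte_resolve_mode(unsigned int requested,
                                             enum mte_mode preferred)
{
        assert(preferred == MTE_MODE_SYNC || preferred == MTE_MODE_ASYNC ||
               preferred == MTE_MODE_ASYMM);

        /* Userspace can only request sync and/or async. */
        requested &= (unsigned int)(MTE_MODE_SYNC | MTE_MODE_ASYNC);

        /* Requesting both makes asymmetric mode eligible as well. */
        if (requested == (unsigned int)(MTE_MODE_SYNC | MTE_MODE_ASYNC))
                requested |= MTE_MODE_ASYMM;

        /* The CPU's preferred mode wins if the task allows it. */
        if (requested & (unsigned int)preferred)
                return preferred;

        /* Otherwise fall back in the documented preference order. */
        if (requested & MTE_MODE_ASYNC)
                return MTE_MODE_ASYNC;
        if (requested & MTE_MODE_ASYMM)
                return MTE_MODE_ASYMM;
        if (requested & MTE_MODE_SYNC)
                return MTE_MODE_SYNC;
        return MTE_MODE_NONE;
}
```

With this model, a task that requests only synchronous mode keeps
running in synchronous mode on a CPU whose preferred mode is ``async``,
because the preferred mode is not in the task's set.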

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
to 0).

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.

``ptrace()`` return value:

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
  address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.
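
Because the buffer holds one tag per byte and each tag covers a 16-byte
granule, a tracer can size ``iov_len`` from the tracee range it wants to
inspect. The sketch below shows that calculation under the alignment
rule noted above; ``mte_ptrace_tag_count()`` and ``MTE_GRANULE_SIZE``
are hypothetical names, not part of the ``ptrace()`` interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the 16-byte MTE tag granule. */
#define MTE_GRANULE_SIZE        16u

/*
 * Number of tags, and therefore buffer bytes, needed to cover the
 * tracee range [addr, addr + len). The start is aligned down to a
 * granule and the end is rounded up to the next granule boundary.
 */
static inline size_t mte_ptrace_tag_count(uint64_t addr, size_t len)
{
        uint64_t start = addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
        uint64_t end;

        if (len == 0)
                return 0;

        assert(addr <= UINT64_MAX - len);       /* range must not wrap */
        end = addr + len;

        return (size_t)((end - start + MTE_GRANULE_SIZE - 1) /
                        MTE_GRANULE_SIZE);
}
```

A tracer would allocate a buffer of this many bytes, store its size in
``iov_len`` and, after a successful call, read back ``iov_len`` to learn
how many tags were actually transferred.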

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).

Core dump support
-----------------

The allocation tags for user memory mapped with ``PROT_MTE`` are dumped
in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The
program header for such a segment is defined as:

:``p_type``: ``PT_AARCH64_MEMTAG_MTE``
:``p_flags``: 0
:``p_offset``: segment file offset
:``p_vaddr``: segment virtual address, same as the corresponding
  ``PT_LOAD`` segment
:``p_paddr``: 0
:``p_filesz``: segment size in file, calculated as ``p_memsz / 32``
  (two 4-bit tags cover 32 bytes of memory)
:``p_memsz``: segment size in memory, same as the corresponding
  ``PT_LOAD`` segment
:``p_align``: 0

The tags are stored in the core file at ``p_offset`` as two 4-bit tags
in a byte. With the tag granule of 16 bytes, a 4K page requires 128
bytes in the core file.
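
The packed layout can be illustrated with two small helpers, one for the
``p_memsz / 32`` size calculation and one for looking up the tag of a
given byte offset. This is a sketch only: ``mte_core_filesz()`` and
``mte_core_tag_at()`` are hypothetical names, and the nibble order (the
lower-address granule in the low 4 bits of each byte) is an assumption
that the text above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* p_filesz for a PT_AARCH64_MEMTAG_MTE segment covering memsz bytes. */
static inline uint64_t mte_core_filesz(uint64_t memsz)
{
        assert(memsz % 32 == 0);        /* whole packed bytes only */
        return memsz / 32;
}

/*
 * Tag of the 16-byte granule containing byte offset off, counted from
 * p_vaddr, given the packed tag bytes read from p_offset.
 * ASSUMPTION: the even-numbered granule is in the low nibble.
 */
static inline unsigned int mte_core_tag_at(const uint8_t *packed,
                                           uint64_t off)
{
        uint64_t granule = off / 16;
        uint8_t byte;

        assert(packed != NULL);
        byte = packed[granule / 2];

        return (granule & 1) ? (unsigned int)(byte >> 4)
                             : (unsigned int)(byte & 0xf);
}
```

For a 4K page, ``mte_core_filesz(4096)`` gives the 128 bytes mentioned
above.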

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #define HWCAP2_MTE              (1 << 18)

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #define PROT_MTE                 0x20

    /*
     * From include/uapi/linux/prctl.h
     */
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous or asynchronous MTE
             * tag check faults (based on per-CPU preference) and allow all
             * non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
                      (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. The flag could be passed
             * directly to mmap() and skip this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly the next instruction will generate an
             * exception.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }