.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables when an
application changes protection domains.

Pkeys Userspace (PKU) is a feature which can be found on:
        * Intel server CPUs, Skylake and later
        * Intel client CPUs, Tiger Lake (11th Gen Core) and later
        * Future AMD CPUs

Pkeys work by dedicating 4 previously reserved bits in each page table entry to
a "protection key", giving 16 possible keys.

Protections for each key are defined with a per-CPU user-accessible register
(PKRU).  Each of these is a 32-bit register storing two bits (Access Disable
and Write Disable) for each of 16 keys.

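As an illustration of the register layout described above, the following
sketch decodes the two bits that belong to one key from a 32-bit PKRU value.
It assumes the x86 bit order, in which Access Disable for key N is bit 2N and
Write Disable is bit 2N+1.  The names pkru_key_bits() and pkru_dump() are
hypothetical helpers chosen for this example, not kernel or libc interfaces.

```c
#include <stdint.h>
#include <stdio.h>

#define PKRU_NR_KEYS      16
#define PKRU_BITS_PER_KEY 2
#define PKRU_AD           0x1u  /* Access Disable: bit 2*key     */
#define PKRU_WD           0x2u  /* Write Disable:  bit 2*key + 1 */

/* Extract the AD/WD bits for one key (caller ensures 0 <= key < 16). */
unsigned int pkru_key_bits(uint32_t pkru, int key)
{
    return (pkru >> (key * PKRU_BITS_PER_KEY)) & (PKRU_AD | PKRU_WD);
}

/* Print the effective data-access rights for every key. */
void pkru_dump(uint32_t pkru)
{
    for (int key = 0; key < PKRU_NR_KEYS; key++) {
        unsigned int bits = pkru_key_bits(pkru, key);

        /* A write is blocked if either bit is set; a read only by AD. */
        printf("key %2d: read %s, write %s\n", key,
               (bits & PKRU_AD) ? "disabled" : "enabled",
               (bits & (PKRU_AD | PKRU_WD)) ? "disabled" : "enabled");
    }
}
```

For example, pkru_key_bits(0x8, 1) yields PKRU_WD, because bit 3 is the Write
Disable bit of key 1.
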
Being a CPU register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
register.  The feature is only available in 64-bit mode, even though there is
theoretically space in the PAE PTEs.  These permissions are enforced on data
access only and have no effect on instruction fetches.

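A minimal sketch of how userspace might wrap the two instructions, assuming an
x86-64 build with GCC or Clang.  The helper names are hypothetical.  The
instructions are emitted as raw opcode bytes so that older assemblers accept
them, and cpu_pkeys_enabled() checks CPUID leaf 7 (ECX bit 3 for PKU, bit 4
for OSPKE) because executing either instruction without OS support raises an
invalid-opcode fault.

```c
#include <cpuid.h>
#include <stdint.h>

#define CPUID7_ECX_PKU   (1u << 3)  /* CPU implements protection keys  */
#define CPUID7_ECX_OSPKE (1u << 4)  /* OS has enabled them (CR4.PKE=1) */

/* Return 1 if RDPKRU/WRPKRU may be executed on this CPU, else 0. */
int cpu_pkeys_enabled(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & CPUID7_ECX_PKU) && (ecx & CPUID7_ECX_OSPKE);
}

/* RDPKRU: requires ECX = 0, returns PKRU in EAX and clears EDX. */
uint32_t read_pkru(void)
{
    uint32_t eax, edx;

    __asm__ volatile(".byte 0x0f,0x01,0xee"
                     : "=a"(eax), "=d"(edx)
                     : "c"(0));
    (void)edx;
    return eax;
}

/* WRPKRU: requires ECX = 0 and EDX = 0, loads PKRU from EAX. */
void write_pkru(uint32_t pkru)
{
    /* The "memory" clobber keeps the compiler from moving loads or
     * stores across the permission change. */
    __asm__ volatile(".byte 0x0f,0x01,0xef"
                     :
                     : "a"(pkru), "c"(0), "d"(0)
                     : "memory");
}
```

A caller would typically test cpu_pkeys_enabled() once at startup and only
then call read_pkru() or write_pkru().
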
Syscalls
========

There are 3 system calls which directly interact with pkeys::

        int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
        int pkey_free(int pkey);
        int pkey_mprotect(unsigned long start, size_t len,
                          unsigned long prot, int pkey);

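The prototypes above describe the raw system-call interface.  As a sketch, an
application could reach them through syscall(2) as shown below, assuming a
Linux toolchain whose headers define SYS_pkey_alloc, SYS_pkey_free and
SYS_pkey_mprotect.  The sys_ prefix and try_alloc_and_free() are hypothetical
names, used only to avoid clashing with wrappers that a C library may already
provide.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

int sys_pkey_alloc(unsigned long flags, unsigned long init_access_rights)
{
    return (int)syscall(SYS_pkey_alloc, flags, init_access_rights);
}

int sys_pkey_free(int pkey)
{
    return (int)syscall(SYS_pkey_free, pkey);
}

int sys_pkey_mprotect(void *start, size_t len, unsigned long prot, int pkey)
{
    return (int)syscall(SYS_pkey_mprotect, start, len, prot, pkey);
}

/*
 * Allocate a key with no restrictions and release it again.
 * Returns 0 on success or a negative errno value, for instance when
 * the CPU or kernel has no protection-key support.
 */
int try_alloc_and_free(void)
{
    int pkey = sys_pkey_alloc(0, 0);

    if (pkey < 0)
        return -errno;
    if (sys_pkey_free(pkey) < 0)
        return -errno;
    return 0;
}
```

Checking the return value matters here: on hardware without PKU, the
allocation fails and the application needs a fallback such as plain
mprotect().
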
Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
by a key.  In this example, WRPKRU is wrapped by a C function
called pkey_set()::

        int real_prot = PROT_READ|PROT_WRITE;
        pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
        ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
        ... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain write access, do the update, then remove its write access::

        pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
        *ptr = foo; // assign something
        pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

When the application frees the memory, it can also free the pkey, since
the key is no longer in use::

        munmap(ptr, PAGE_SIZE);
        pkey_free(pkey);

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.

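The read-modify-write step inside such a wrapper can be sketched as a pure
function on a PKRU value.  This is an illustration under the assumption that
each key owns two adjacent bits starting at bit 2 * pkey.  It is not the
selftest's implementation, and pkru_with_rights() is a hypothetical name.

```c
#include <stdint.h>

#define PKRU_BITS_PER_KEY 2
#define PKRU_KEY_MASK     0x3u  /* Access Disable | Write Disable */

/*
 * Return pkru with the two bits for pkey replaced by rights.
 * The caller ensures 0 <= pkey < 16; bits of rights outside the
 * two-bit field are ignored.
 */
uint32_t pkru_with_rights(uint32_t pkru, int pkey, unsigned int rights)
{
    unsigned int shift = (unsigned int)pkey * PKRU_BITS_PER_KEY;

    pkru &= ~(PKRU_KEY_MASK << shift);          /* drop the old rights */
    pkru |= (rights & PKRU_KEY_MASK) << shift;  /* install the new     */
    return pkru;
}

/*
 * A pkey_set(pkey, rights) wrapper would then amount to
 *     wrpkru(pkru_with_rights(rdpkru(), pkey, rights));
 * where rdpkru() and wrpkru() stand for hypothetical helpers that
 * execute the corresponding instructions.
 */
```

Keeping the bit manipulation separate from the instruction wrappers makes it
possible to test the logic on machines without PKU.
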
Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this::

        mprotect(ptr, size, PROT_NONE);
        something(ptr);

you can expect the same effects with protection keys when doing this::

        pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
        pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
        something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

        *ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

        read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKUERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
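
To illustrate how a program might tell the two si_code values apart, the
sketch below installs a SIGSEGV handler with SA_SIGINFO, records si_code, and
recovers with siglongjmp().  probe_write() and segv_reason() are hypothetical
helpers.  The code uses the spelling SEGV_PKUERR from the Linux uapi headers,
and the fallback value 4 is an assumption taken from those headers for C
libraries that do not define it.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef SEGV_PKUERR
#define SEGV_PKUERR 4  /* assumed value, from the Linux uapi headers */
#endif

static sigjmp_buf fault_jmp;
static volatile sig_atomic_t fault_si_code;

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    fault_si_code = info->si_code;
    siglongjmp(fault_jmp, 1);  /* skip the faulting store */
}

/* Write one byte to addr; return 0 on success, else the SIGSEGV si_code. */
int probe_write(volatile char *addr)
{
    struct sigaction sa, old;
    int code = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(fault_jmp, 1) == 0)
        *addr = 1;
    else
        code = fault_si_code;

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    return code;
}

/* Map a SIGSEGV si_code to a short description. */
const char *segv_reason(int si_code)
{
    switch (si_code) {
    case SEGV_PKUERR:
        return "protection key violation";
    case SEGV_ACCERR:
        return "page permission violation";
    default:
        return "other";
    }
}
```

On a page mapped with plain PROT_NONE, probe_write() returns SEGV_ACCERR; on
pkey-capable hardware, a page whose key has Write Disable set would be
expected to produce SEGV_PKUERR instead.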