.. SPDX-License-Identifier: GPL-2.0

==============
5-level paging
==============

Overview
========
The original x86-64 architecture was limited by 4-level paging to 256 TiB
of virtual address space and 64 TiB of physical address space. We are
already bumping into this limit: some vendors offer servers with 64 TiB of
memory today.

To overcome the limitation, upcoming hardware will introduce support for
5-level paging. It is a straightforward extension of the current page
table structure, adding one more layer of translation.

It bumps the limits to 128 PiB of virtual address space and 4 PiB of
physical address space. This "ought to be enough for anybody" ©.

QEMU 2.9 and later support 5-level paging.

The virtual memory layout for 5-level paging is described in
Documentation/x86/x86_64/mm.rst.


Enabling 5-level paging
=======================
CONFIG_X86_5LEVEL=y enables the feature.

A kernel built with CONFIG_X86_5LEVEL=y is still able to boot on 4-level
hardware. In this case the additional page table level -- p4d -- will be
folded at runtime.

User-space and large virtual address space
==========================================
On x86, 5-level paging enables a 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. At least some JIT
compilers are known to use the higher bits of pointers to encode their own
information. With 5-level paging, these bits collide with valid pointers
and lead to crashes.

To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from the full address space by
specifying a hint address (with or without MAP_FIXED) above 47-bit.

If the hint address is set above 47-bit, but MAP_FIXED is not specified, we
try to look for an unmapped area at the specified address. If it's already
occupied, we look for an unmapped area in the *full* address space, rather
than in the 47-bit window.

A high hint address only affects the allocation in question, not any
future mmap()s.

Specifying a high hint address on an older kernel or on a machine without
5-level paging support is safe. The hint will be ignored and the kernel
will fall back to allocation from the 47-bit address space.

This approach makes it easy to make an application's memory allocator aware
of the large address space without manually tracking allocated virtual
address space.

One important case we need to handle here is interaction with MPX.
MPX (without the MAWA extension) cannot handle addresses above 47-bit, so
we need to make sure that MPX cannot be enabled if we already have a VMA
above the boundary, and forbid creating such VMAs once MPX is enabled.