==============================
Memory Layout on AArch64 Linux
==============================

Author: Catalin Marinas <catalin.marinas@arm.com>

This document describes the virtual memory layout used by the AArch64
Linux kernel. The architecture allows up to 4 levels of translation
tables with a 4KB page size and up to 3 levels with a 64KB page size.

AArch64 Linux uses either 3 levels or 4 levels of translation tables
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
(256TB) virtual addresses, respectively, for both user and kernel. With
64KB pages, only 2 levels of translation tables are used, allowing
42-bit (4TB) virtual addresses, but the memory layout is the same.

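The bit counts above follow from the table geometry. As a minimal
sketch (the helper name ``va_bits`` is hypothetical, not a kernel
symbol), assuming each translation table fills one page of 8-byte
descriptors, every level resolves ``page_shift - 3`` bits on top of the
in-page offset:

.. code-block:: c

   #include <assert.h>

   /*
    * Hypothetical helper: number of VA bits covered by `levels` full
    * translation levels with pages of 2^page_shift bytes. Assumes one
    * page per table and 8-byte descriptors, so a table holds
    * 2^(page_shift - 3) entries.
    */
   unsigned int va_bits(unsigned int page_shift, unsigned int levels)
   {
           assert(page_shift > 3 && levels >= 1);

           /* in-page offset bits + index bits of every level */
           return page_shift + levels * (page_shift - 3);
   }

With these assumptions, ``va_bits(12, 3)`` gives 39, ``va_bits(12, 4)``
gives 48 and ``va_bits(16, 2)`` gives 42. For 64KB pages with 3 levels
the formula yields 55, which is more than the 48 or 52 bits in use, so
the first-level table is only partially used in that configuration.
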
ARMv8.2 adds optional support for Large Virtual Address space. This is
only available when running with a 64KB page size and expands the
number of descriptors in the first level of translation.

In the 48-bit configuration, user addresses have bits 63:48 set to 0
while the kernel addresses have the same bits set to 1. TTBRx selection
is given by bit 63 of the virtual address. The swapper_pg_dir contains
only kernel (global) mappings while the user pgd contains only user
(non-global) mappings. The swapper_pg_dir address is written to TTBR1
and never written to TTBR0.

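The address split described above can be sketched as follows. This is
an illustration only: ``SKETCH_VA_BITS`` and the function names are
hypothetical, and the 48-bit configuration is assumed.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Assumption for this sketch: 48-bit virtual addresses. */
   #define SKETCH_VA_BITS 48

   static_assert(SKETCH_VA_BITS < 64, "upper bits must exist");

   /* User addresses have bits 63:48 all clear. */
   bool is_user_va(uint64_t va)
   {
           return (va >> SKETCH_VA_BITS) == 0;
   }

   /* Kernel addresses have bits 63:48 all set. */
   bool is_kernel_va(uint64_t va)
   {
           return (va >> SKETCH_VA_BITS) == 0xffff;
   }

   /* Bit 63 selects the translation table base: 0 -> TTBR0, 1 -> TTBR1. */
   unsigned int ttbr_index(uint64_t va)
   {
           return (unsigned int)(va >> 63);
   }

An address with a mix of 0 and 1 in bits 63:48, such as
``0x0001000000000000``, is neither a user nor a kernel address under
these checks.
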
AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::

  Start                 End                     Size            Use
  -----------------------------------------------------------------------
  0000000000000000      0000ffffffffffff         256TB          user
  ffff000000000000      ffff7fffffffffff         128TB          kernel logical memory map
 [ffff600000000000      ffff7fffffffffff]         32TB          [kasan shadow region]
  ffff800000000000      ffff800007ffffff         128MB          modules
  ffff800008000000      fffffbffefffffff         124TB          vmalloc
  fffffbfff0000000      fffffbfffdffffff         224MB          fixed mappings (top down)
  fffffbfffe000000      fffffbfffe7fffff           8MB          [guard region]
  fffffbfffe800000      fffffbffff7fffff          16MB          PCI I/O space
  fffffbffff800000      fffffbffffffffff           8MB          [guard region]
  fffffc0000000000      fffffdffffffffff           2TB          vmemmap
  fffffe0000000000      ffffffffffffffff           2TB          [guard region]

AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::

  Start                 End                     Size            Use
  -----------------------------------------------------------------------
  0000000000000000      000fffffffffffff           4PB          user
  fff0000000000000      ffff7fffffffffff          ~4PB          kernel logical memory map
 [fffd800000000000      ffff7fffffffffff]        512TB          [kasan shadow region]
  ffff800000000000      ffff800007ffffff         128MB          modules
  ffff800008000000      fffffbffefffffff         124TB          vmalloc
  fffffbfff0000000      fffffbfffdffffff         224MB          fixed mappings (top down)
  fffffbfffe000000      fffffbfffe7fffff           8MB          [guard region]
  fffffbfffe800000      fffffbffff7fffff          16MB          PCI I/O space
  fffffbffff800000      fffffbffffffffff           8MB          [guard region]
  fffffc0000000000      ffffffdfffffffff          ~4TB          vmemmap
  ffffffe000000000      ffffffffffffffff         128GB          [guard region]

Translation table lookup with 4KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |         |         |         |         |
   |                 |         |         |         |         v
   |                 |         |         |         |   [11:0]  in-page offset
   |                 |         |         |         +-> [20:12] L3 index
   |                 |         |         +-----------> [29:21] L2 index
   |                 |         +---------------------> [38:30] L1 index
   |                 +-------------------------------> [47:39] L0 index
   +-------------------------------------------------> [63] TTBR0/1

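The 4KB lookup above amounts to a handful of shifts and masks. The
following sketch extracts each field from a virtual address; the
``struct va_4k`` type and ``split_va_4k`` function are hypothetical
names introduced for illustration, not kernel interfaces.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Four 9-bit indices plus a 12-bit offset cover a 48-bit VA. */
   static_assert(12 + 4 * 9 == 48, "4KB pages, 4 levels, 48-bit VA");

   /* Hypothetical container for the fields of a 4KB-page VA. */
   struct va_4k {
           unsigned int ttbr;      /* bit 63: 0 = TTBR0, 1 = TTBR1 */
           unsigned int l0;        /* bits [47:39] */
           unsigned int l1;        /* bits [38:30] */
           unsigned int l2;        /* bits [29:21] */
           unsigned int l3;        /* bits [20:12] */
           unsigned int offset;    /* bits [11:0]  */
   };

   struct va_4k split_va_4k(uint64_t va)
   {
           struct va_4k s;

           s.ttbr   = (unsigned int)(va >> 63);
           s.l0     = (unsigned int)((va >> 39) & 0x1ff);
           s.l1     = (unsigned int)((va >> 30) & 0x1ff);
           s.l2     = (unsigned int)((va >> 21) & 0x1ff);
           s.l3     = (unsigned int)((va >> 12) & 0x1ff);
           s.offset = (unsigned int)(va & 0xfff);
           return s;
   }

For the start of the vmalloc region in the 48-bit table,
``0xffff800008000000``, this gives TTBR1, an L0 index of 256, an L1
index of 0, an L2 index of 64, and zero for both the L3 index and the
offset.
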
Translation table lookup with 64KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |    |               |              |
   |                 |    |               |              v
   |                 |    |               |            [15:0]  in-page offset
   |                 |    |               +----------> [28:16] L3 index
   |                 |    +--------------------------> [41:29] L2 index
   |                 +-------------------------------> [47:42] L1 index (48-bit)
   |                                                   [51:42] L1 index (52-bit)
   +-------------------------------------------------> [63] TTBR0/1

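With 64KB pages the only field whose width depends on the VA size is
the L1 index: 6 bits for 48-bit VAs and 10 bits for 52-bit VAs. A
minimal sketch of that difference, using hypothetical helper names:

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* L1 index: bits [47:42] for 48-bit VAs, bits [51:42] for 52-bit. */
   unsigned int l1_index_64k(uint64_t va, unsigned int va_bits)
   {
           assert(va_bits == 48 || va_bits == 52);

           unsigned int width = va_bits - 42;      /* 6 or 10 bits */

           return (unsigned int)((va >> 42) & ((1u << width) - 1));
   }

   /* L2 index: bits [41:29], 13 bits wide. */
   unsigned int l2_index_64k(uint64_t va)
   {
           return (unsigned int)((va >> 29) & 0x1fff);
   }

   /* L3 index: bits [28:16], 13 bits wide. */
   unsigned int l3_index_64k(uint64_t va)
   {
           return (unsigned int)((va >> 16) & 0x1fff);
   }

   /* In-page offset: bits [15:0]. */
   unsigned int offset_64k(uint64_t va)
   {
           return (unsigned int)(va & 0xffff);
   }

For the highest 52-bit user address, ``0x000fffffffffffff``, the L1
index is 1023 when decoded as a 52-bit VA, whereas a 48-bit decode only
looks at bits [47:42] and yields 63.
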
When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 get mapped next to the HYP idmap page, as do vectors when
ARM64_SPECTRE_V3A is enabled for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

52-bit VA support in the kernel
-------------------------------
If the ARMv8.2-LVA optional feature is present and we are running
with a 64KB page size, then it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

This fallback mechanism requires the kernel .text to be placed in the
higher addresses so that its addresses are invariant to 48/52-bit VAs.
Because the kasan shadow is a fraction of the entire kernel VA space,
the end of the kasan shadow must also be in the higher half of the
kernel VA space for both 48/52-bit. (When switching from 48-bit to
52-bit, the end of the kasan shadow is invariant and dependent on ~0UL,
whilst the start address will "grow" towards the lower addresses.)

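The "growing" start address can be checked against the two layout
tables. The sketch below assumes that the shadow covers one eighth of
the kernel VA space, a ratio inferred from the table sizes (32TB of
256TB, and 512TB of 4PB), and that the shadow ends just below the
modules region. The macro and function names are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Exclusive end of the shadow, as read from both layout tables. */
   #define SKETCH_KASAN_SHADOW_END   0xffff800000000000ULL

   /* Assumed ratio: one shadow byte per 2^3 bytes of kernel VA space. */
   #define SKETCH_KASAN_SCALE_SHIFT  3

   /* Start of the shadow region for a kernel VA space of 2^va_bits bytes. */
   uint64_t kasan_shadow_start(unsigned int va_bits)
   {
           assert(va_bits == 48 || va_bits == 52);

           uint64_t shadow_size = 1ULL << (va_bits - SKETCH_KASAN_SCALE_SHIFT);

           /* The end is fixed, so a larger shadow extends downwards. */
           return SKETCH_KASAN_SHADOW_END - shadow_size;
   }

Under these assumptions the function returns ``0xffff600000000000`` for
48-bit and ``0xfffd800000000000`` for 52-bit, matching the start
addresses listed in the tables.
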
In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
is kept constant at 0xFFF0000000000000 (corresponding to 52-bit);
this obviates the need for an extra variable read. The physvirt
offset and vmemmap offsets are computed at early boot to enable
this logic.

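The constant quoted above is simply the lowest kernel address of a
52-bit VA space. As a small illustration (``page_offset_for`` is a
hypothetical name, not the kernel's own definition), the value for a
given VA size has all bits from ``va_bits`` upwards set:

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Lowest kernel address for a VA space of 2^va_bits bytes. */
   uint64_t page_offset_for(unsigned int va_bits)
   {
           assert(va_bits > 0 && va_bits < 64);

           /* Set bits 63:va_bits, clear the rest. */
           return ~0ULL << va_bits;
   }

``page_offset_for(52)`` is ``0xfff0000000000000`` and
``page_offset_for(48)`` is ``0xffff000000000000``. The 48-bit value lies
above the 52-bit one, so a kernel that falls back to 48-bit still finds
its linear map inside the range that starts at the fixed PAGE_OFFSET.
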
As a single binary will need to support both 48-bit and 52-bit VA
spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
also must be sized large enough to accommodate a fixed PAGE_OFFSET.

Most code in the kernel should not need to consider the VA_BITS. For
code that does need to know the VA size, the variables are
defined as follows:

VA_BITS         constant        the *maximum* VA space size

VA_BITS_MIN     constant        the *minimum* VA space size

vabits_actual   variable        the *actual* VA space size

Maximum and minimum sizes can be useful to ensure that buffers are
sized large enough or that addresses are positioned close enough for
the "worst" case.

52-bit userspace VAs
--------------------
To maintain compatibility with software that relies on the ARMv8.0
VA space maximum size of 48 bits, the kernel will, by default,
return virtual addresses to userspace from a 48-bit range.

Software can "opt in" to receiving VAs from a 52-bit space by
specifying an mmap hint address that lies above the 48-bit range.

For example:

.. code-block:: c

   maybe_high_address = mmap((void *)~0UL, size, prot, flags, ...);

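A fuller userspace sketch of the same call is shown below. The
protection and mapping flags are assumptions chosen for illustration
(an anonymous, private, read-write mapping), and both function names
are hypothetical. The kernel is free to return a 48-bit address even
with the hint, for instance on hardware without 52-bit support, so the
result has to be inspected rather than assumed.

.. code-block:: c

   #define _DEFAULT_SOURCE
   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <sys/mman.h>

   /*
    * Map `size` bytes of anonymous memory while passing a hint above
    * the 48-bit range. Returns MAP_FAILED on error, like mmap().
    */
   void *map_with_high_hint(size_t size)
   {
           assert(size > 0);

           return mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   }

   /* Non-zero if the address does not fit in 48 bits. */
   int is_above_48bit(const void *addr)
   {
           return ((uintptr_t)addr >> 48) != 0;
   }

A caller would check the return value against ``MAP_FAILED`` first and
then use ``is_above_48bit()`` to find out whether the mapping landed in
the extended range.
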
It is also possible to build a debug kernel that returns addresses
from a 52-bit space by enabling the following kernel config options:

.. code-block:: sh

   CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y

Note that this option is only intended for debugging applications
and should not be used in production.