.. SPDX-License-Identifier: GPL-2.0

.. _physical_memory_model:

=====================
Physical Memory Model
=====================

Physical memory in a system may be addressed in different ways. The
simplest case is when the physical memory starts at address 0 and
spans a contiguous range up to the maximal address. This range may,
however, contain small holes that are not accessible to the CPU.
There may also be several contiguous ranges at completely distinct
addresses. On NUMA systems, in addition, different memory banks are
attached to different CPUs.

Linux abstracts this diversity using one of two memory models:
FLATMEM and SPARSEMEM. Each architecture defines which memory models
it supports, what the default memory model is, and whether that
default can be overridden manually.

All the memory models track the status of physical page frames using
struct page objects arranged in one or more arrays.

Regardless of the selected memory model, there is a one-to-one
mapping between the physical page frame number (PFN) and the
corresponding `struct page`.

Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
helpers that convert a PFN to a `struct page` and vice versa.

FLATMEM
=======

The simplest memory model is FLATMEM. This model is suitable for
non-NUMA systems with contiguous, or mostly contiguous, physical
memory.

In the FLATMEM memory model, there is a global `mem_map` array that
maps the entire physical memory. For most architectures, the holes
have entries in the `mem_map` array. The `struct page` objects
corresponding to the holes are never fully initialized.

To allocate the `mem_map` array, architecture-specific setup code should
call the :c:func:`free_area_init` function. However, the `mem_map` array
is not usable until the call to :c:func:`memblock_free_all`, which hands
all the memory to the page allocator.

An architecture may free parts of the `mem_map` array that do not cover
actual physical pages. In such a case, the architecture-specific
:c:func:`pfn_valid` implementation should take the holes in the
`mem_map` into account.

With FLATMEM, the conversion between a PFN and the `struct page` is
straightforward: `PFN - ARCH_PFN_OFFSET` is an index into the
`mem_map` array.

`ARCH_PFN_OFFSET` defines the first page frame number for
systems whose physical memory starts at an address other than 0.
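
The FLATMEM conversion can be modelled in user space with a few lines of
C. This is only a sketch: the one-field `struct page`, the
`ARCH_PFN_OFFSET` value, the array size and the `flat_*` helper names
are assumptions made for illustration and are not the kernel's
definitions.

.. code-block:: c

   #include <assert.h>

   /* Stand-in for the kernel's struct page; the real one is much larger. */
   struct page {
       unsigned long flags;
   };

   /* Assumed values: memory starts at PFN 0x80000 and spans 16 frames. */
   #define ARCH_PFN_OFFSET 0x80000UL
   #define NR_FRAMES       16UL

   /* One entry per page frame, holes included. */
   static struct page mem_map[NR_FRAMES];

   /* PFN - ARCH_PFN_OFFSET is the index into mem_map. */
   static struct page *flat_pfn_to_page(unsigned long pfn)
   {
       assert(pfn >= ARCH_PFN_OFFSET && pfn - ARCH_PFN_OFFSET < NR_FRAMES);
       return mem_map + (pfn - ARCH_PFN_OFFSET);
   }

   /* The inverse: the index in mem_map plus ARCH_PFN_OFFSET. */
   static unsigned long flat_page_to_pfn(const struct page *page)
   {
       return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
   }

Both directions are plain pointer arithmetic on a single array, which is
why FLATMEM is the cheapest model when physical memory has few holes.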

SPARSEMEM
=========

SPARSEMEM is the most versatile memory model available in Linux, and it
is the only memory model that supports several advanced features such
as hot-plug and hot-remove of physical memory, alternative memory
maps for non-volatile memory devices, and deferred initialization of
the memory map for larger systems.

The SPARSEMEM model presents the physical memory as a collection of
sections. A section is represented with struct mem_section
that contains `section_mem_map`, which is, logically, a pointer to an
array of struct pages. However, it is stored in an encoded form,
together with other information that aids section management. The
section size and the maximal number of sections are specified using the
`SECTION_SIZE_BITS` and `MAX_PHYSMEM_BITS` constants defined by each
architecture that supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is the
actual width of a physical address that an architecture supports,
`SECTION_SIZE_BITS` is an arbitrary value.

The maximal number of sections is denoted `NR_MEM_SECTIONS` and
defined as

.. math::

   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}

The struct mem_section objects are arranged in a two-dimensional array
called `mem_section`. The size and placement of this array depend
on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
sections:

* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_section`
  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
  single `mem_section` object.
* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_section`
  array is dynamically allocated. Each row contains PAGE_SIZE worth of
  `mem_section` objects and the number of rows is calculated to fit
  all the memory sections.
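
The dynamically allocated layout can be sketched in user space as
follows. This is a simplified model, not the kernel code: the
`PAGE_SIZE` and `NR_MEM_SECTIONS` values, the two-field
`struct mem_section`, the `roots` variable and the `section_lookup()`
helper are all assumptions, and `calloc()` stands in for the kernel's
early allocators.

.. code-block:: c

   #include <assert.h>
   #include <stdlib.h>

   #define PAGE_SIZE       4096UL        /* assumed page size */
   #define NR_MEM_SECTIONS (1UL << 19)   /* assumed number of sections */

   /* Simplified stand-in for the kernel structure. */
   struct mem_section {
       unsigned long section_mem_map;
       void *usage;
   };

   /* Each row holds PAGE_SIZE worth of mem_section objects. */
   #define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof(struct mem_section))
   #define NR_SECTION_ROOTS \
       ((NR_MEM_SECTIONS + SECTIONS_PER_ROOT - 1) / SECTIONS_PER_ROOT)

   /* Array of row pointers; rows are allocated only when needed. */
   static struct mem_section **roots;

   /* Return the entry for section nr, optionally allocating its row. */
   static struct mem_section *section_lookup(unsigned long nr, int create)
   {
       unsigned long root = nr / SECTIONS_PER_ROOT;

       if (nr >= NR_MEM_SECTIONS)
           return NULL;
       assert(root < NR_SECTION_ROOTS);

       if (!roots) {
           if (!create)
               return NULL;
           roots = calloc(NR_SECTION_ROOTS, sizeof(*roots));
           if (!roots)
               return NULL;
       }
       if (!roots[root]) {
           if (!create)
               return NULL;
           roots[root] = calloc(SECTIONS_PER_ROOT,
                                sizeof(struct mem_section));
           if (!roots[root])
               return NULL;
       }
       return &roots[root][nr % SECTIONS_PER_ROOT];
   }

Rows that cover address ranges with no memory are never allocated, which
is the point of the two-level layout on systems with a large, sparsely
populated physical address space.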

The architecture setup code should call sparse_init() to
initialize the memory sections and the memory maps.

With SPARSEMEM there are two possible ways to convert a PFN to the
corresponding `struct page`: "classic sparse" and "sparse
vmemmap". The selection is made at build time and is determined by
the value of `CONFIG_SPARSEMEM_VMEMMAP`.

Classic sparse encodes the section number of a page in page->flags
and uses the high bits of a PFN to access the section that maps that page
frame. Inside a section, the PFN is the index into the array of pages.
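
The split of a PFN into a section number and a position inside the
section can be sketched as below. The `PAGE_SHIFT` and
`SECTION_SIZE_BITS` values are assumptions, and
`pfn_to_section_offset()` is a hypothetical helper added for
illustration. The sketch indexes a section by the low PFN bits; it does
not model how the kernel encodes `section_mem_map`.

.. code-block:: c

   #include <assert.h>

   #define PAGE_SHIFT        12   /* assumed 4 KiB pages */
   #define SECTION_SIZE_BITS 27   /* assumed 128 MiB sections */

   #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
   #define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)

   /* The high bits of a PFN select the section. */
   static unsigned long pfn_to_section_nr(unsigned long pfn)
   {
       return pfn >> PFN_SECTION_SHIFT;
   }

   /* First PFN covered by a section. */
   static unsigned long section_nr_to_pfn(unsigned long nr)
   {
       /* Guard against shifting bits out of the unsigned long. */
       assert(nr <= (~0UL >> PFN_SECTION_SHIFT));
       return nr << PFN_SECTION_SHIFT;
   }

   /* Hypothetical helper: position of the frame inside its section. */
   static unsigned long pfn_to_section_offset(unsigned long pfn)
   {
       return pfn & (PAGES_PER_SECTION - 1);
   }

With these values a section covers 2^15 = 32768 page frames, so PFN
32768 is the first frame of section 1.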

Sparse vmemmap uses a virtually mapped memory map to optimize
pfn_to_page and page_to_pfn operations. There is a global `struct
page *vmemmap` pointer that points to a virtually contiguous array of
`struct page` objects. A PFN is an index into that array, and the
offset of the `struct page` from `vmemmap` is the PFN of that
page.
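
In a user-space model the vmemmap conversion reduces to the pointer
arithmetic below. The small `backing` array stands in for the reserved
virtual range, and the one-field `struct page`, `NR_FRAMES` and the
`vmemmap_*` helper names are assumptions made for illustration.

.. code-block:: c

   #include <assert.h>

   /* Stand-in for the kernel's struct page. */
   struct page {
       unsigned long flags;
   };

   #define NR_FRAMES 64UL   /* assumed size of the modelled range */

   /* Hypothetical backing store for the virtually contiguous array. */
   static struct page backing[NR_FRAMES];
   static struct page *vmemmap = backing;

   /* A PFN is an index into the array that vmemmap points to. */
   static struct page *vmemmap_pfn_to_page(unsigned long pfn)
   {
       assert(pfn < NR_FRAMES);
       return vmemmap + pfn;
   }

   /* The offset of a struct page from vmemmap is its PFN. */
   static unsigned long vmemmap_page_to_pfn(const struct page *page)
   {
       return (unsigned long)(page - vmemmap);
   }

No section lookup is needed in either direction, which is the
optimization that sparse vmemmap provides over classic sparse.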

To use vmemmap, an architecture has to reserve a range of virtual
addresses that will map the physical pages containing the memory
map and make sure that `vmemmap` points to that range. In addition,
the architecture should implement the :c:func:`vmemmap_populate` method
that will allocate the physical memory and create page tables for the
virtual memory map. If an architecture does not have any special
requirements for the vmemmap mappings, it can use the default
:c:func:`vmemmap_populate_basepages` provided by the generic memory
management.

The virtually mapped memory map allows storing `struct page` objects
for persistent memory devices in pre-allocated storage on those
devices. This storage is represented with struct vmem_altmap
that is eventually passed to vmemmap_populate() through a long chain
of function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with the :c:func:`vmemmap_alloc_block_buf` helper to
allocate the memory map on the persistent memory device.

ZONE_DEVICE
===========

The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device driver identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
that the page objects for these address ranges are never marked online,
and that a reference must be taken against the device, not just the page,
to keep the memory pinned for active use. `ZONE_DEVICE`, via
:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
:c:func:`get_user_pages` service for the given range of PFNs. Since the
page reference count never drops below 1, the page is never tracked as
free memory and the page's `struct list_head lru` space is repurposed
for back referencing to the host device / driver that mapped the memory.

While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users need
a smaller granularity for populating the `mem_map`. Given that
`ZONE_DEVICE` memory is never marked online, its memory ranges are
never exposed through the sysfs memory hotplug API on memory block
boundaries. The implementation relies on
this lack of user-API constraint to allow sub-section sized memory
ranges to be specified to :c:func:`arch_add_memory`, the top-half of
memory hotplug. Sub-section support allows for 2MB as the cross-arch
common alignment granularity for :c:func:`devm_memremap_pages`.
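
The 2MB granularity can be expressed as an alignment check on a PFN
range, as sketched below. The `PAGE_SHIFT` value is an assumption, and
`range_is_subsection_aligned()` and `nr_subsections()` are hypothetical
helpers written for this example, not kernel functions.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>

   #define PAGE_SHIFT           12   /* assumed 4 KiB pages */
   #define SUBSECTION_SHIFT     21   /* 2^21 bytes = 2MB */
   #define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
   #define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)

   /* True if both the start and the length are multiples of 2MB. */
   static bool range_is_subsection_aligned(unsigned long start_pfn,
                                           unsigned long nr_pages)
   {
       unsigned long mask = PAGES_PER_SUBSECTION - 1;

       return nr_pages != 0 && !(start_pfn & mask) && !(nr_pages & mask);
   }

   /* Number of 2MB sub-sections covered by an aligned range. */
   static unsigned long nr_subsections(unsigned long start_pfn,
                                       unsigned long nr_pages)
   {
       assert(range_is_subsection_aligned(start_pfn, nr_pages));
       return nr_pages >> PFN_SUBSECTION_SHIFT;
   }

With 4 KiB pages a sub-section is 512 page frames, so a range starting
at PFN 512 and spanning 1024 frames covers two sub-sections.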

The users of `ZONE_DEVICE` are:

* pmem: Map platform persistent memory to be used as a direct-I/O target
  via DAX mappings.

* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
  event callbacks to allow a device driver to coordinate memory management
  events related to device memory, typically GPU memory. See
  Documentation/mm/hmm.rst.

* p2pdma: Create `struct page` objects to allow peer devices in a
  PCI/PCIe topology to coordinate direct-DMA operations between themselves,
  i.e. bypass host memory.