.. _pagemap:

=============================
Examining Process Page Tables
=============================

pagemap is a set of kernel interfaces (available since 2.6.25) that allow
userspace programs to examine the page tables and related information by
reading files in ``/proc``.

There are four components to pagemap:

 * ``/proc/pid/pagemap``.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, containing the following data (from
   ``fs/proc/task_mmu.c``, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see
      :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
    * Bit  56    page exclusively mapped (since 4.2)
    * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
      :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
    * Bits 58-60 zero
    * Bit  61    page is file-page or shared-anon (since 3.5)
    * Bit  62    page swapped
    * Bit  63    page present

   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
   In 4.0 and 4.1 opens by unprivileged users fail with -EPERM.  Starting from
   4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
   The reason is that information about PFNs helps in exploiting the
   Rowhammer vulnerability.

   If the page is not present but in swap, then the PFN field contains an
   encoding of the swap file number and the page's offset into the
   swap. Unmapped pages return a null PFN. This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use ``/proc/pid/maps`` to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

   The page-types tool in the tools/vm directory can be used to query the
   number of times a page is mapped.

 * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from ``fs/proc/page.c``, above kpageflags_read):

    0. LOCKED
    1. ERROR
    2. REFERENCED
    3. UPTODATE
    4. DIRTY
    5. LRU
    6. ACTIVE
    7. SLAB
    8. WRITEBACK
    9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP
    23. OFFLINE
    24. ZERO_PAGE
    25. IDLE
    26. PGTABLE

 * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
   memory cgroup each page is charged to, indexed by PFN. Only available when
   CONFIG_MEMCG is set.
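
The bit layout of a ``/proc/pid/pagemap`` entry listed above can be unpacked
with a few shifts and masks. The Python sketch below is illustrative only: the
function name ``decode_pagemap_entry`` and the dictionary keys are assumptions
made for this example and are not part of any kernel or library API. Keep in
mind that on 4.2 and later the PFN field reads as zero without CAP_SYS_ADMIN.

```python
# Bit positions follow the pagemap entry layout documented above.
# All names here are hypothetical helpers chosen for this example.
PFN_MASK = (1 << 55) - 1  # bits 0-54


def decode_pagemap_entry(entry):
    """Split one 64-bit pagemap value into its documented fields."""
    present = bool((entry >> 63) & 1)
    swapped = bool((entry >> 62) & 1)
    info = {
        "present": present,
        "swapped": swapped,
        "file_or_shared_anon": bool((entry >> 61) & 1),
        "uffd_wp": bool((entry >> 57) & 1),
        "exclusive": bool((entry >> 56) & 1),
        "soft_dirty": bool((entry >> 55) & 1),
    }
    if present:
        # Bits 0-54 hold the page frame number.
        info["pfn"] = entry & PFN_MASK
    elif swapped:
        # Bits 0-4 hold the swap type, bits 5-54 the swap offset.
        info["swap_type"] = entry & 0x1F
        info["swap_offset"] = (entry & PFN_MASK) >> 5
    return info
```

The ``pfn`` key is only filled in for present pages and the swap keys only for
swapped pages, mirroring the "if present" and "if swapped" qualifiers in the
list above.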

Short descriptions of the page flags
====================================

0 - LOCKED
   The page is being locked for exclusive access, e.g. by undergoing read/write
   IO.
7 - SLAB
   The page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
   When a compound page is used, SLUB/SLQB will only set this flag on the head
   page; SLOB will not flag it at all.
10 - BUDDY
    A free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and _only_ for the first page.
15 - COMPOUND_HEAD
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages
    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
    the SLUB etc. memory allocators and various device drivers.
    However in this interface, only huge/giga pages are made visible
    to end users.
16 - COMPOUND_TAIL
    A compound page tail (see description above).
17 - HUGE
    This is an integral part of a HugeTLB page.
19 - HWPOISON
    Hardware detected memory corruption on this page: don't touch the data!
20 - NOPAGE
    No page frame exists at the requested address.
21 - KSM
    Identical memory pages dynamically shared between one or more processes.
22 - THP
    Contiguous pages which construct transparent hugepages.
23 - OFFLINE
    The page is logically offline.
24 - ZERO_PAGE
    Zero page for pfn_zero or huge_zero page.
25 - IDLE
    The page has not been accessed since it was marked idle (see
    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
    Note that this flag may be stale in case the page was accessed via
    a PTE. To make sure the flag is up-to-date one has to read
    ``/sys/kernel/mm/page_idle/bitmap`` first.
26 - PGTABLE
    The page is in use as a page table.

IO related page flags
---------------------

1 - ERROR
   IO error occurred.
3 - UPTODATE
   The page has up-to-date data.
   i.e. for a file-backed page: (in-memory data revision >= on-disk one)
4 - DIRTY
   The page has been written to, hence contains new data.
   i.e. for a file-backed page: (in-memory data revision > on-disk one)
8 - WRITEBACK
   The page is being synced to disk.

LRU related page flags
----------------------

5 - LRU
   The page is in one of the LRU lists.
6 - ACTIVE
   The page is in the active LRU list.
18 - UNEVICTABLE
   The page is in the unevictable (non-)LRU list. It is somehow pinned and
   not a candidate for LRU page reclaims, e.g. ramfs pages,
   shmctl(SHM_LOCK) and mlock() memory segments.
2 - REFERENCED
   The page has been referenced since last LRU list enqueue/requeue.
9 - RECLAIM
   The page will be reclaimed soon after its pageout IO completes.
11 - MMAP
   A memory mapped page.
12 - ANON
   A memory mapped page that is not part of a file.
13 - SWAPCACHE
   The page is mapped to swap space, i.e. has an associated swap entry.
14 - SWAPBACKED
   The page is backed by swap/RAM.

The page-types tool in the tools/vm directory can be used to query the
above flags.
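
For quick experiments without page-types, a 64-bit value read from
``/proc/kpageflags`` can be turned into flag names by testing each bit. The
Python sketch below assumes the bit numbering given in this document; the names
``KPF_NAMES`` and ``flag_names`` are hypothetical and exist only for this
example.

```python
# Flag names indexed by bit number, as listed for /proc/kpageflags above.
KPF_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "OFFLINE", "ZERO_PAGE", "IDLE",
    "PGTABLE",
]


def flag_names(flags):
    """Return the names of all documented bits that are set in flags."""
    return [name for bit, name in enumerate(KPF_NAMES) if (flags >> bit) & 1]
```

Bits above 26 are ignored by this helper, since this document does not assign
them a meaning.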

Using pagemap to do something useful
====================================

The general procedure for using pagemap to find out about a process's memory
usage goes like this:

 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
 4. Read a u64 for each page from pagemap.
 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
    just read, seek to that entry in the file, and read the data you want.
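
Steps 3 and 4 above amount to seeking to ``(virtual address / page size) * 8``
and reading one u64 per page. The following Python sketch shows that
arithmetic; it is not a reference implementation. The function name
``read_pagemap_entries`` and the default page size of 4096 are assumptions made
for this example, and the function accepts any seekable binary file object so
that it can be exercised without access to ``/proc``.

```python
import struct

ENTRY_SIZE = 8  # pagemap holds one 64-bit value per virtual page


def read_pagemap_entries(pagemap, start, end, page_size=4096):
    """Return the pagemap values for the pages covering [start, end).

    pagemap is a seekable binary file object; page_size defaults to an
    assumed 4096 bytes and should match the running system.
    """
    first_page = start // page_size
    last_page = (end - 1) // page_size
    count = last_page - first_page + 1
    # Offsets and sizes stay multiples of 8 bytes, as the interface requires.
    pagemap.seek(first_page * ENTRY_SIZE)
    data = pagemap.read(count * ENTRY_SIZE)
    usable = len(data) - len(data) % ENTRY_SIZE
    # "=Q" is an unsigned 64-bit integer in native byte order.
    return [value for (value,) in struct.iter_unpack("=Q", data[:usable])]
```

On a live system the file object could come from
``open("/proc/self/pagemap", "rb", buffering=0)``, with ``start`` and ``end``
taken from one line of ``/proc/self/maps`` and the page size from
``os.sysconf("SC_PAGE_SIZE")``. Unbuffered mode is suggested here so that each
read reaches the kernel with exactly the requested offset and length.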

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
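
The USS tally described above can be sketched in Python as follows. This is a
minimal illustration under stated assumptions: ``unique_set_size`` is a
hypothetical helper, ``entries`` is a list of 64-bit pagemap values already
read for the mappings of interest, ``kpagecount`` is a seekable binary file
object laid out like ``/proc/kpagecount``, and the page size defaults to an
assumed 4096 bytes.

```python
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54 of a pagemap entry


def unique_set_size(entries, kpagecount, page_size=4096):
    """Return the bytes held in present pages that are mapped exactly once."""
    unique_pages = 0
    for entry in entries:
        if not (entry >> 63) & 1:  # bit 63: page present
            continue
        pfn = entry & PFN_MASK
        # kpagecount is indexed by PFN, one 64-bit count per page.
        kpagecount.seek(pfn * 8)
        raw = kpagecount.read(8)
        if len(raw) == 8 and struct.unpack("=Q", raw)[0] == 1:
            unique_pages += 1
    return unique_pages * page_size
```

This only gives meaningful results when the PFNs are visible, that is, with
CAP_SYS_ADMIN on 4.0 and later. On 4.2 and later, bit 56 of the pagemap entry
(page exclusively mapped) may be a way to obtain similar information without
reading PFNs at all.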

Exceptions for Shared Memory
============================

Page table entries for shared pages are cleared when the pages are zapped or
swapped out. This makes swapped out pages indistinguishable from never-allocated
ones.

In kernel space, the swap location can still be retrieved from the page cache.
However, values stored only on the normal PTE get lost irretrievably when the
page is swapped out (i.e. SOFT_DIRTY).

In user space, whether the page is present, swapped or none can be deduced with
the help of lseek and/or mincore system calls.

lseek() can differentiate between accessed pages (present or swapped out) and
holes (none/non-allocated) by specifying the SEEK_DATA flag on the file where
the pages are backed. For anonymous shared pages, the file can be found in
``/proc/pid/map_files/``.

mincore() can differentiate between pages in memory (present, including swap
cache) and out of memory (swapped out or none/non-allocated).
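
The lseek() half of this approach can be sketched in Python with
``os.lseek`` and the ``os.SEEK_DATA`` / ``os.SEEK_HOLE`` constants, which are
available on Linux. The helper name ``data_extents`` is an assumption made for
this example. mincore() is not covered here because the Python standard
library does not wrap it.

```python
import errno
import os


def data_extents(fd):
    """Return (start, end) byte ranges of fd that hold data, skipping holes.

    Note that this moves the file offset of fd as a side effect.
    """
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as err:
            if err.errno == errno.ENXIO:  # no data at or after offset
                break
            raise
        # The end of a data range is the next hole (or end of file).
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents
```

Ranges reported this way correspond to accessed pages (present or swapped out)
and the gaps between them to holes. The granularity of the reported ranges
depends on the backing filesystem, so treat the output as an approximation
rather than an exact per-page answer.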

Other notes
===========

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.

Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures). Since Linux 3.11 their meaning changes
after the first clear of the soft-dirty bits. Since Linux 4.2 they are used
for flags unconditionally.