.. _pagemap:

=============================
Examining Process Page Tables
=============================

pagemap is a set of kernel interfaces, added in Linux 2.6.25, that allow
userspace programs to examine the page tables and related information by
reading files in ``/proc``.

There are four components to pagemap:

 * ``/proc/pid/pagemap``.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, with the following layout (from
   ``fs/proc/task_mmu.c``, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see
      :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
    * Bit  56    page exclusively mapped (since 4.2)
    * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
      :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
    * Bits 58-60 zero
    * Bit  61    page is file-page or shared-anon (since 3.5)
    * Bit  62    page swapped
    * Bit  63    page present

   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
   In 4.0 and 4.1, opens by unprivileged users fail with -EPERM.  Starting
   with 4.2, the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
   The reason is that information about PFNs helps in exploiting the
   Rowhammer vulnerability.

   If the page is not present but in swap, then the PFN field contains an
   encoding of the swap file number and the page's offset into the
   swap. Unmapped pages return a null PFN. This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use ``/proc/pid/maps`` to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

The page-types tool in the ``tools/vm`` directory can be used to query the
number of times a page is mapped.
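
The ``/proc/pid/pagemap`` entry layout listed above can be decoded with plain
masks and shifts. The following Python function is a minimal sketch under the
assumption of a current kernel (4.2 or later, where bits 55-60 always carry
flags); the function name and the dictionary keys are illustrative only and are
not part of any kernel or library API.

```python
# Masks for one 64-bit /proc/pid/pagemap entry, following the layout above.
PFN_MASK = (1 << 55) - 1          # bits 0-54: PFN, or swap type + offset
SWAP_TYPE_MASK = (1 << 5) - 1     # bits 0-4: swap type (when swapped)
SOFT_DIRTY = 1 << 55
EXCLUSIVE = 1 << 56
UFFD_WP = 1 << 57
FILE_OR_SHARED_ANON = 1 << 61
SWAPPED = 1 << 62
PRESENT = 1 << 63


def decode_pagemap_entry(entry):
    """Split one pagemap entry into named fields (names are illustrative)."""
    info = {
        "present": bool(entry & PRESENT),
        "swapped": bool(entry & SWAPPED),
        "file_or_shared_anon": bool(entry & FILE_OR_SHARED_ANON),
        "exclusive": bool(entry & EXCLUSIVE),
        "uffd_wp": bool(entry & UFFD_WP),
        "soft_dirty": bool(entry & SOFT_DIRTY),
    }
    if info["present"]:
        # Reads as 0 for callers without CAP_SYS_ADMIN (Linux 4.2+).
        info["pfn"] = entry & PFN_MASK
    elif info["swapped"]:
        info["swap_type"] = entry & SWAP_TYPE_MASK
        info["swap_offset"] = (entry & PFN_MASK) >> 5   # bits 5-54
    return info
```

For a swapped-out page the PFN bits are reinterpreted, so the sketch reports
either ``pfn`` or ``swap_type``/``swap_offset``, never both.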

 * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from ``fs/proc/page.c``, above kpageflags_read):

    0. LOCKED
    1. ERROR
    2. REFERENCED
    3. UPTODATE
    4. DIRTY
    5. LRU
    6. ACTIVE
    7. SLAB
    8. WRITEBACK
    9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP
    23. OFFLINE
    24. ZERO_PAGE
    25. IDLE
    26. PGTABLE

 * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
   memory cgroup each page is charged to, indexed by PFN. Only available when
   CONFIG_MEMCG is set.
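
As a sketch of how the PFN-indexed files above can be consumed, the Python
helpers below read the 64-bit entry for one PFN (at byte offset PFN * 8, in
native byte order) and translate a ``/proc/kpageflags`` value into the flag
names listed above. The helper names are hypothetical; reading these files
generally requires root, and any bits not listed above are simply ignored by
``kpageflags_names``.

```python
import struct

# Names of the /proc/kpageflags bits, indexed by bit number as listed above.
KPAGEFLAGS = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "OFFLINE", "ZERO_PAGE", "IDLE",
    "PGTABLE",
]


def read_pfn_entry(path, pfn):
    """Read the native-endian u64 for `pfn` from a PFN-indexed file such as
    /proc/kpagecount, /proc/kpageflags or /proc/kpagecgroup."""
    with open(path, "rb", buffering=0) as f:
        f.seek(pfn * 8)          # one 8-byte entry per PFN
        data = f.read(8)         # a single aligned 8-byte read
    if len(data) != 8:
        raise ValueError(f"no entry for PFN {pfn} in {path}")
    return struct.unpack("=Q", data)[0]


def kpageflags_names(flags):
    """Names of the documented flags that are set in one kpageflags value."""
    return [name for bit, name in enumerate(KPAGEFLAGS) if (flags >> bit) & 1]
```

For example, ``kpageflags_names(read_pfn_entry("/proc/kpageflags", pfn))``
lists the flags of a PFN obtained from ``/proc/pid/pagemap``.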

Short descriptions of the page flags
====================================

0 - LOCKED
   The page is being locked for exclusive access, e.g. by undergoing read/write
   IO.
7 - SLAB
   The page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
   When a compound page is used, SLUB/SLQB will only set this flag on the head
   page; SLOB will not flag it at all.
10 - BUDDY
    A free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and *only* for the first page.
15 - COMPOUND_HEAD
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages
    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
    the SLUB and other memory allocators, and various device drivers.
    However, in this interface only huge/giga pages are made visible
    to end users.
16 - COMPOUND_TAIL
    A compound page tail (see description above).
17 - HUGE
    This is an integral part of a HugeTLB page.
19 - HWPOISON
    Hardware-detected memory corruption on this page: don't touch the data!
20 - NOPAGE
    No page frame exists at the requested address.
21 - KSM
    Identical memory pages dynamically shared between one or more processes.
22 - THP
    Contiguous pages that make up transparent hugepages.
23 - OFFLINE
    The page is logically offline.
24 - ZERO_PAGE
    Zero page for pfn_zero or huge_zero page.
25 - IDLE
    The page has not been accessed since it was marked idle (see
    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
    Note that this flag may be stale if the page was accessed via
    a PTE. To make sure the flag is up to date, one has to read
    ``/sys/kernel/mm/page_idle/bitmap`` first.
26 - PGTABLE
    The page is in use as a page table.
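
Two of the descriptions above imply some extra bookkeeping for userspace. The
Python sketch below groups kpageflags values of consecutive PFNs into compound
pages using the "HTTT" head/tail layout, and locates the bit for a PFN in the
idle page tracking bitmap (bit ``pfn % 64`` of the 64-bit word at index
``pfn // 64``, native byte order, as described in the idle page tracking
documentation). All function names are hypothetical; tails that appear before
any head in the scanned range are ignored, and reading the bitmap requires root
and CONFIG_IDLE_PAGE_TRACKING.

```python
import os
import struct

COMPOUND_HEAD = 1 << 15
COMPOUND_TAIL = 1 << 16


def compound_extents(start_pfn, flags_seq):
    """Group kpageflags values of consecutive PFNs (starting at start_pfn)
    into (head_pfn, npages) tuples, following the "HTTT" layout."""
    extents = []
    for i, flags in enumerate(flags_seq):
        pfn = start_pfn + i
        if flags & COMPOUND_HEAD:
            extents.append([pfn, 1])          # a new compound page starts here
        elif (flags & COMPOUND_TAIL and extents
              and extents[-1][0] + extents[-1][1] == pfn):
            extents[-1][1] += 1               # tail directly follows the current one
    return [tuple(e) for e in extents]


def idle_bitmap_position(pfn):
    """(byte offset, bit index) of `pfn` in /sys/kernel/mm/page_idle/bitmap."""
    return (pfn // 64) * 8, pfn % 64


def page_idle_bit(pfn, path="/sys/kernel/mm/page_idle/bitmap"):
    """Read the idle bit for `pfn` with one aligned 8-byte read."""
    offset, bit = idle_bitmap_position(pfn)
    fd = os.open(path, os.O_RDONLY)
    try:
        word = struct.unpack("=Q", os.pread(fd, 8, offset))[0]
    finally:
        os.close(fd)
    return bool((word >> bit) & 1)
```

For a complete compound page, ``npages`` equals 2^N, so its order N is
``npages.bit_length() - 1``.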

IO related page flags
---------------------

1 - ERROR
   IO error occurred.
3 - UPTODATE
   The page has up-to-date data,
   i.e. for a file-backed page: (in-memory data revision >= on-disk one).
4 - DIRTY
   The page has been written to, hence contains new data,
   i.e. for a file-backed page: (in-memory data revision >  on-disk one).
8 - WRITEBACK
   The page is being synced to disk.
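
Taken together, the UPTODATE and DIRTY definitions pin down how a file-backed
page relates to its on-disk copy: UPTODATE means ``>=``, DIRTY means ``>``, so
UPTODATE without DIRTY means the revisions are equal. The Python sketch below
encodes that reasoning; ``file_page_io_state`` and its labels are illustrative,
not kernel terminology. WRITEBACK and ERROR are reported as extra labels rather
than folded into the comparison.

```python
ERROR = 1 << 1
UPTODATE = 1 << 3
DIRTY = 1 << 4
WRITEBACK = 1 << 8


def file_page_io_state(flags):
    """Summarize the IO-related kpageflags bits of a file-backed page."""
    if flags & DIRTY:
        labels = ["newer-than-disk"]   # in-memory revision > on-disk one
    elif flags & UPTODATE:
        labels = ["same-as-disk"]      # >= but not >, hence equal
    else:
        labels = ["not-uptodate"]      # no claim that in-memory data is valid
    if flags & WRITEBACK:
        labels.append("writeback")     # being synced to disk right now
    if flags & ERROR:
        labels.append("io-error")
    return labels
```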

LRU related page flags
----------------------

5 - LRU
   The page is in one of the LRU lists.
6 - ACTIVE
   The page is in the active LRU list.
18 - UNEVICTABLE
   The page is in the unevictable (non-)LRU list. It is somehow pinned and
   not a candidate for LRU page reclaim, e.g. ramfs pages,
   shmctl(SHM_LOCK) and mlock() memory segments.
2 - REFERENCED
   The page has been referenced since the last LRU list enqueue/requeue.
9 - RECLAIM
   The page will be reclaimed soon after its pageout IO completes.
11 - MMAP
   A memory mapped page.
12 - ANON
   A memory mapped page that is not part of a file.
13 - SWAPCACHE
   The page is mapped to swap space, i.e. has an associated swap entry.
14 - SWAPBACKED
   The page is backed by swap/RAM.
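
The LRU-related bits can be combined to guess which LRU list a page is on. The
Python sketch below assumes the same decision order as the kernel's internal
``page_lru()`` logic: unevictable first, then anon versus file by SWAPBACKED,
then active versus inactive. ``lru_list_name`` and the returned names are
illustrative, not part of any interface.

```python
LRU = 1 << 5
ACTIVE = 1 << 6
SWAPBACKED = 1 << 14
UNEVICTABLE = 1 << 18


def lru_list_name(flags):
    """Guess the LRU list of a page from its kpageflags bits, or None."""
    if not flags & LRU:
        return None                                # not on any LRU list
    if flags & UNEVICTABLE:
        return "unevictable"
    kind = "anon" if flags & SWAPBACKED else "file"
    state = "active" if flags & ACTIVE else "inactive"
    return f"{state}_{kind}"
```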

The page-types tool in the ``tools/vm`` directory can be used to query the
above flags.

Using pagemap to do something useful
====================================

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in: all of them, a particular
    library, the stack, the heap, etc.
 3. Open ``/proc/pid/pagemap`` and seek to the entries of the pages you would
    like to examine (the entry for virtual page number V is at offset V * 8).
 4. Read a u64 for each page from pagemap.
 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
    just read, seek to that entry in the file (offset PFN * 8), and read the
    data you want.
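
The steps above can be sketched in Python as a handful of small helpers. This
is a minimal illustration, not the page-types implementation: all function names
are hypothetical, each entry is read as a native-endian u64 with aligned
``os.pread`` calls, and ``/proc/kpagecount`` is reopened per page for brevity.

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PRESENT = 1 << 63
PFN_MASK = (1 << 55) - 1


def parse_maps_line(line):
    """Step 1: split a /proc/<pid>/maps line into (start, end, pathname)."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].rstrip("\n") if len(fields) > 5 else ""
    return start, end, path


def pagemap_span(start, end, page_size=PAGE_SIZE):
    """Step 3: byte offset and length of the pagemap entries for [start, end)."""
    first = start // page_size
    last = (end + page_size - 1) // page_size
    return first * 8, (last - first) * 8


def read_u64s(path, offset, length):
    """Steps 4-5: one aligned pread, unpacked as native-endian u64 values."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, length, offset)
    finally:
        os.close(fd)
    n = len(data) // 8
    return struct.unpack(f"={n}Q", data[:n * 8])


def mapping_page_counts(pid, wanted_path):
    """Steps 1-5: map PFN -> kpagecount for each present page of the mappings
    whose pathname equals `wanted_path` (e.g. "[heap]")."""
    counts = {}
    with open(f"/proc/{pid}/maps") as maps:
        for start, end, path in map(parse_maps_line, maps):
            if path != wanted_path:                 # step 2: select maps
                continue
            offset, length = pagemap_span(start, end)
            for entry in read_u64s(f"/proc/{pid}/pagemap", offset, length):
                pfn = entry & PFN_MASK
                if entry & PRESENT and pfn:         # PFN is 0 without CAP_SYS_ADMIN
                    counts[pfn] = read_u64s("/proc/kpagecount", pfn * 8, 8)[0]
    return counts
```

Run as root, ``mapping_page_counts("self", "[heap]")`` returns the map count of
every resident heap page; without CAP_SYS_ADMIN it returns an empty dict,
because every PFN reads as zero.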

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
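
A minimal Python sketch of that USS tally follows. It takes the raw pagemap
entries gathered over every map plus a ``mapcount_of(pfn)`` callable, so the
counting logic can be checked without root; ``count_unique_pages``,
``uss_bytes`` and ``kpagecount_reader`` are hypothetical names. Like the text,
it counts a page as unique only when kpagecount reports exactly one mapping, and
it skips entries whose PFN reads as zero.

```python
import os
import struct

PRESENT = 1 << 63
PFN_MASK = (1 << 55) - 1


def count_unique_pages(entries, mapcount_of):
    """Count present pages whose frame kpagecount reports as mapped once."""
    unique = 0
    for entry in entries:
        pfn = entry & PFN_MASK
        if not entry & PRESENT or pfn == 0:
            continue            # swapped, unmapped, or PFN hidden (no CAP_SYS_ADMIN)
        if mapcount_of(pfn) == 1:
            unique += 1
    return unique


def uss_bytes(entries, mapcount_of, page_size):
    """Unique set size in bytes: uniquely mapped pages times the page size."""
    return count_unique_pages(entries, mapcount_of) * page_size


def kpagecount_reader(path="/proc/kpagecount"):
    """Return mapcount_of(pfn) backed by an open kpagecount file (root only).
    The descriptor stays open for as long as the returned function is used."""
    fd = os.open(path, os.O_RDONLY)

    def mapcount_of(pfn):
        return struct.unpack("=Q", os.pread(fd, 8, pfn * 8))[0]

    return mapcount_of
```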

Exceptions for Shared Memory
============================

Page table entries for shared pages are cleared when the pages are zapped or
swapped out. This makes swapped-out pages indistinguishable from never-allocated
ones.

In kernel space, the swap location can still be retrieved from the page cache.
However, values stored only in the normal PTE get lost irretrievably when the
page is swapped out (e.g. SOFT_DIRTY).

In user space, whether the page is present, swapped or none can be deduced with
the help of the lseek() and/or mincore() system calls.

lseek() can differentiate between accessed pages (present or swapped out) and
holes (none/non-allocated) when SEEK_DATA is passed as the ``whence`` argument
on the file backing the pages. For anonymous shared pages, the file can be
found in ``/proc/pid/map_files/``.
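
Python exposes SEEK_DATA and SEEK_HOLE through ``os.lseek`` on platforms that
support them, so the lseek() check can be sketched as below. ``data_ranges`` is
a hypothetical helper that returns the byte ranges holding data; the gaps
between them are holes. Opening entries under ``/proc/pid/map_files/`` may
require elevated privileges, and mapping a byte offset back to a page of the
mapping assumes you account for the mapping's file offset from
``/proc/pid/maps``.

```python
import errno
import os


def data_ranges(path):
    """(start, end) byte ranges of `path` that contain data, found with
    lseek(SEEK_DATA) / lseek(SEEK_HOLE); everything else is a hole."""
    ranges = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:    # no more data after pos
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            ranges.append((start, end))
            pos = end
    finally:
        os.close(fd)
    return ranges
```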

mincore() can differentiate between pages in memory (present, including swap
cache) and out of memory (swapped out or none/non-allocated).
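
Combining the two calls gives the present/swapped/none decision this section
describes. Python's standard library has no mincore() wrapper, so the sketch
below calls it from libc through ``ctypes`` on a writable ``mmap.mmap`` object;
``resident_pages`` and ``classify_shared_page`` are hypothetical helpers, and
``has_data`` is assumed to come from an lseek(SEEK_DATA) check on the backing
file.

```python
import ctypes
import ctypes.util
import mmap
import os

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
# int mincore(void *addr, size_t length, unsigned char *vec);
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
_libc.mincore.restype = ctypes.c_int


def resident_pages(m):
    """Call mincore() on a writable mmap.mmap object; one bool per page."""
    npages = (len(m) + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    vec = (ctypes.c_ubyte * npages)()
    view = (ctypes.c_char * len(m)).from_buffer(m)   # exposes the mapping's address
    try:
        rc = _libc.mincore(ctypes.addressof(view), len(m), vec)
    finally:
        del view                                      # release the buffer export
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return [bool(b & 1) for b in vec]


def classify_shared_page(resident, has_data):
    """Combine mincore() and lseek(SEEK_DATA) results for one page."""
    if resident:
        return "present"      # in memory, including the swap cache
    if has_data:
        return "swapped"      # backed by data but not in memory
    return "none"             # a hole: never allocated
```

For example, ``classify_shared_page(False, True)`` yields ``"swapped"``: the
backing file has data for the page, but mincore() reports it as not in memory.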

Other notes
===========

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.

Before Linux 3.11, pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures). Since Linux 3.11 their meaning changes
after the first clear of soft-dirty bits. Since Linux 4.2 they are used for
flags unconditionally.