.. _split_page_table_lock:

=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. This approach leads to poor page fault scalability for
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With split page table lock, each table has its own lock to serialize
access to it. At the moment, split locks are used for PTE and PMD
tables. Access to higher-level tables is protected by mm->page_table_lock.

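As a rough illustration of the idea, the following userspace sketch models
the choice between one mm-wide lock and a per-table lock. It is not kernel
code: struct toy_mm, struct toy_table, toy_table_lockptr() and
toy_set_entry() are hypothetical names, the table size is assumed, and
pthread mutexes stand in for spinlocks.

.. code-block:: c

    #include <pthread.h>
    #include <stdbool.h>

    /* Hypothetical userspace stand-ins; none of these are kernel types. */
    struct toy_table {
        pthread_mutex_t ptl;              /* per-table lock (split mode) */
        unsigned long entries[512];       /* assumed table size */
    };

    struct toy_mm {
        pthread_mutex_t page_table_lock;  /* single mm-wide lock */
        bool use_split_locks;
    };

    /* Pick the lock that serializes access to one table. */
    static pthread_mutex_t *toy_table_lockptr(struct toy_mm *mm,
                                              struct toy_table *table)
    {
        return mm->use_split_locks ? &table->ptl : &mm->page_table_lock;
    }

    /* Update one entry under whichever lock protects the table. */
    static void toy_set_entry(struct toy_mm *mm, struct toy_table *table,
                              unsigned int idx, unsigned long val)
    {
        pthread_mutex_t *ptl = toy_table_lockptr(mm, table);

        pthread_mutex_lock(ptl);
        table->entries[idx] = val;
        pthread_mutex_unlock(ptl);
    }

In split mode, threads that touch different tables take different locks and
do not contend; with a single lock, every update serializes on
page_table_lock.
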
There are helpers to lock/unlock a table and other accessor functions:

- pte_offset_map_lock()
    maps the PTE and takes the PTE table lock; returns a pointer to the
    mapped PTE and passes back a pointer to the taken lock;
- pte_unmap_unlock()
    unlocks and unmaps the PTE table;
- pte_alloc_map_lock()
    allocates the PTE table if needed and takes the lock; returns a
    pointer to the mapped PTE and passes back a pointer to the taken
    lock, or returns NULL if allocation failed;
- pte_lockptr()
    returns a pointer to the PTE table lock;
- pmd_lock()
    takes the PMD table lock; returns a pointer to the taken lock;
- pmd_lockptr()
    returns a pointer to the PMD table lock.

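The calling pattern of the lock/unlock helpers can be sketched in userspace
as follows. This is a model, not the kernel implementation: struct cv_table,
cv_entry_map_lock(), cv_entry_unmap_unlock() and cv_set_entry() are
hypothetical names, and the sketch assumes the convention used by
pte_offset_map_lock(), where the entry pointer is the return value and the
taken lock is passed back through a pointer argument.

.. code-block:: c

    #include <pthread.h>

    #define CV_ENTRIES 512   /* assumed table size, for illustration only */

    /* Hypothetical stand-in for a PTE table with its own lock. */
    struct cv_table {
        pthread_mutex_t ptl;
        unsigned long entries[CV_ENTRIES];
    };

    /*
     * Take the table lock, pass it back through ptlp, and return a
     * pointer to the requested entry.
     */
    static unsigned long *cv_entry_map_lock(struct cv_table *table,
                                            unsigned int idx,
                                            pthread_mutex_t **ptlp)
    {
        pthread_mutex_t *ptl = &table->ptl;

        pthread_mutex_lock(ptl);
        *ptlp = ptl;
        return &table->entries[idx];
    }

    /* Release the lock that cv_entry_map_lock() handed out. */
    static void cv_entry_unmap_unlock(unsigned long *entry,
                                      pthread_mutex_t *ptl)
    {
        (void)entry;    /* nothing to unmap in this model */
        pthread_mutex_unlock(ptl);
    }

    /* Typical caller: lock, modify one entry, unlock. */
    static void cv_set_entry(struct cv_table *table, unsigned int idx,
                             unsigned long val)
    {
        pthread_mutex_t *ptl;
        unsigned long *entry = cv_entry_map_lock(table, idx, &ptl);

        *entry = val;
        cv_entry_unmap_unlock(entry, ptl);
    }

The caller never names the lock itself; it only keeps the pointer it was
given and hands it back on unlock.
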
Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it is enabled for PTE
tables and the architecture supports it (see below).

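The two enabling rules can be written down as compile-time predicates. The
sketch below is a standalone illustration: the macro and function names are
hypothetical, the values of NR_CPUS and CONFIG_SPLIT_PTLOCK_CPUS are example
values, and ARCH_ENABLE_SPLIT_PMD_PTLOCK is a plain 0/1 stand-in for the
architecture's configuration option.

.. code-block:: c

    #include <stdbool.h>

    /* Example values, assumed here; real builds take them from Kconfig. */
    #ifndef NR_CPUS
    #define NR_CPUS 8
    #endif
    #ifndef CONFIG_SPLIT_PTLOCK_CPUS
    #define CONFIG_SPLIT_PTLOCK_CPUS 4
    #endif
    #ifndef ARCH_ENABLE_SPLIT_PMD_PTLOCK
    #define ARCH_ENABLE_SPLIT_PMD_PTLOCK 1   /* hypothetical stand-in */
    #endif

    /* PTE split lock: on when the threshold does not exceed NR_CPUS. */
    #define SPLIT_PTE_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)

    /* PMD split lock: needs the PTE split lock and architecture support. */
    #define SPLIT_PMD_PTLOCKS \
        (SPLIT_PTE_PTLOCKS && ARCH_ENABLE_SPLIT_PMD_PTLOCK)

    /* Runtime mirror of the same rules, handy for trying other values. */
    static bool split_pte_enabled(unsigned int nr_cpus, unsigned int threshold)
    {
        return threshold <= nr_cpus;
    }

    static bool split_pmd_enabled(unsigned int nr_cpus, unsigned int threshold,
                                  bool arch_support)
    {
        return split_pte_enabled(nr_cpus, threshold) && arch_support;
    }
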
Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use split lock only at the PMD
level, not at the PUD level.

Hugetlb-specific helpers:

- huge_pte_lock()
    takes the PMD split lock for a PMD_SIZE page, mm->page_table_lock
    otherwise;
- huge_pte_lockptr()
    returns a pointer to the table lock.

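The selection rule described for huge_pte_lock() can be modelled in userspace
as shown below. This is only a sketch: struct hd_mm, struct hd_pmd_table and
hd_huge_pte_lockptr() are hypothetical names, pthread mutexes stand in for
spinlocks, and HD_PMD_SIZE assumes a 2 MiB PMD-level page.

.. code-block:: c

    #include <pthread.h>

    #define HD_PMD_SIZE (2UL << 20)   /* assumed 2 MiB PMD-level page size */

    /* Hypothetical stand-ins for mm_struct and a PMD table. */
    struct hd_mm {
        pthread_mutex_t page_table_lock;
    };

    struct hd_pmd_table {
        pthread_mutex_t ptl;
    };

    /*
     * PMD-sized huge pages use the split lock of their PMD table; any
     * other huge page size falls back to the mm-wide lock.
     */
    static pthread_mutex_t *hd_huge_pte_lockptr(unsigned long huge_page_size,
                                                struct hd_mm *mm,
                                                struct hd_pmd_table *table)
    {
        if (huge_page_size == HD_PMD_SIZE)
            return &table->ptl;
        return &mm->page_table_lock;
    }
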
Support of split page table lock by an architecture
===================================================

No special step is needed to enable PTE split page table lock: everything
required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages, and this field
shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

Enabling PMD split lock requires a pgtable_pmd_page_ctor() call on PMD table
allocation and a pgtable_pmd_page_dtor() call on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail; the
failure must be handled properly.

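One way to picture the required error handling is the userspace model below,
in which an allocation path undoes its work when the constructor fails. All
names (struct model_page, model_pmd_page_ctor(), model_pmd_page_dtor(),
model_pmd_alloc_one(), model_pmd_free()) are hypothetical, the sizes are
assumed, and the fail_alloc flag exists only to simulate a failed lock
allocation; real architecture code uses the kernel's page allocator instead
of malloc().

.. code-block:: c

    #include <stdbool.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for the struct page of a page table. */
    struct model_page {
        void *ptl;      /* separately allocated lock */
        void *table;    /* memory backing the table itself */
    };

    /* Constructor: allocating the lock may fail (forced by fail_alloc). */
    static bool model_pmd_page_ctor(struct model_page *page, bool fail_alloc)
    {
        page->ptl = fail_alloc ? NULL : malloc(64);
        return page->ptl != NULL;
    }

    /* Destructor: release what the constructor allocated. */
    static void model_pmd_page_dtor(struct model_page *page)
    {
        free(page->ptl);
        page->ptl = NULL;
    }

    /* Allocation path: free the table again if the constructor fails. */
    static struct model_page *model_pmd_alloc_one(bool fail_alloc)
    {
        struct model_page *page = calloc(1, sizeof(*page));

        if (!page)
            return NULL;
        page->table = calloc(512, sizeof(unsigned long));
        if (!page->table) {
            free(page);
            return NULL;
        }
        if (!model_pmd_page_ctor(page, fail_alloc)) {
            free(page->table);
            free(page);
            return NULL;
        }
        return page;
    }

    /* Freeing path: run the destructor before releasing the table. */
    static void model_pmd_free(struct model_page *page)
    {
        model_pmd_page_dtor(page);
        free(page->table);
        free(page);
    }
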
page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

- if spinlock_t fits into a long, we use page->ptl as the spinlock itself, so
  we can avoid indirect access and save a cache line;
- if spinlock_t is bigger than a long, we use page->ptl as a pointer to a
  spinlock_t and allocate it dynamically. This allows using split lock with
  DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled, but costs one more cache line
  for indirect access.

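The size-based choice can be illustrated with a small standalone sketch. The
lock layouts, the NEEDS_ALLOC() macro, the slot structures and the helper
functions below are all hypothetical; they only show how a field the size of
a long can hold either the lock itself or a pointer to it.

.. code-block:: c

    #include <stdbool.h>
    #include <stdlib.h>

    /* Hypothetical lock layouts: a compact one and a debug-style one. */
    struct small_lock {
        unsigned int val;
    };

    struct debug_lock {
        unsigned int val;
        unsigned int magic;
        void *owner;
        const char *name;
    };

    /* Allocate separately only when the lock is larger than a long. */
    #define NEEDS_ALLOC(lock_type) (sizeof(lock_type) > sizeof(long))

    /* Lock embedded in the slot: no indirection on access. */
    struct slot_embedded {
        union {
            unsigned long private;
            struct small_lock ptl;
        };
    };

    /* Slot holds a pointer: one extra dereference to reach the lock. */
    struct slot_indirect {
        union {
            unsigned long private;
            struct debug_lock *ptl;
        };
    };

    static struct small_lock *embedded_lockptr(struct slot_embedded *slot)
    {
        return &slot->ptl;
    }

    static struct debug_lock *indirect_lockptr(struct slot_indirect *slot)
    {
        return slot->ptl;
    }

    /* Dynamic allocation of the large lock; may fail. */
    static bool indirect_lock_alloc(struct slot_indirect *slot)
    {
        slot->ptl = calloc(1, sizeof(*slot->ptl));
        return slot->ptl != NULL;
    }

    static void indirect_lock_free(struct slot_indirect *slot)
    {
        free(slot->ptl);
        slot->ptl = NULL;
    }

Either way the slot stays the size of one machine word, which is the point
of the trick: the containing structure does not grow when a larger debug
lock is configured.
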
The spinlock_t is allocated in pgtable_pte_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.

Please never access page->ptl directly; use the appropriate helper.