============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking, there are five major subheadings.

- Slab allocation of small objects of unknown type (kmalloc)
- Slab allocation of small objects of known type
- Page allocation
- Per-CPU Allocator Activity
- External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc       call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree         call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are becoming significantly
internally fragmented as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and the
call sites responsible for the leaked allocations.
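
The correlation described above can be sketched as a small script that pairs
each kmalloc with a later kfree of the same ptr. This is only an illustration:
the helper names and the sample lines are hypothetical, and the parser assumes
each event ends in ``name: key=value ...`` pairs matching the formats listed
above. A pointer that is still live when the trace ends is a candidate for
inspection, not proof of a leak.

```python
import re
from collections import Counter

# Assumed layout: "<event>: key=value key=value ..." at the end of a line.
EVENT_RE = re.compile(r"(\w+):?\s+(\w+=\S+(?:\s+\w+=\S+)*)\s*$")
ALLOC_EVENTS = {"kmalloc", "kmalloc_node"}


def parse_event(line):
    """Return (event_name, fields) or None if the line holds no event."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    fields = dict(pair.split("=", 1) for pair in m.group(2).split())
    return m.group(1), fields


def unfreed_by_call_site(lines):
    """Count allocations with no matching kfree, grouped by call_site."""
    live = {}  # ptr -> call_site of the allocation that returned it
    for line in lines:
        parsed = parse_event(line)
        if parsed is None:
            continue
        event, fields = parsed
        if event in ALLOC_EVENTS:
            live[fields["ptr"]] = fields["call_site"]
        elif event == "kfree":
            live.pop(fields["ptr"], None)
    return Counter(live.values())


# Invented sample data, for illustration only.
SAMPLE = [
    "app-1 [000] 1.000001: kmalloc: call_site=ffffffff81000010 "
    "ptr=ffff888000001000 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL",
    "app-1 [000] 1.000002: kmalloc: call_site=ffffffff81000020 "
    "ptr=ffff888000002000 bytes_req=100 bytes_alloc=128 gfp_flags=GFP_KERNEL",
    "app-1 [000] 1.000003: kfree: call_site=ffffffff81000030 "
    "ptr=ffff888000001000",
    "app-1 [000] 1.000004: kmalloc: call_site=ffffffff81000020 "
    "ptr=ffff888000003000 bytes_req=100 bytes_alloc=128 gfp_flags=GFP_KERNEL",
]

if __name__ == "__main__":
    for site, count in unfreed_by_call_site(SAMPLE).most_common():
        print(site, count)
```

In the sample, both allocations from the second call site are never freed, so
that call site is the one reported.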


2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc       call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free        call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.
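
One way to use call_site as a stand-in for the missing cache name is to tally
allocations per (call_site, bytes_alloc) pair. The sketch below assumes the
event formats listed above; the function name and the sample lines are
hypothetical. A call site that always reports the same bytes_alloc is likely,
though not guaranteed, to be allocating from a single cache.

```python
import re
from collections import Counter

# Assumed layout: "<event>: key=value key=value ..." at the end of a line.
EVENT_RE = re.compile(r"(\w+):?\s+(\w+=\S+(?:\s+\w+=\S+)*)\s*$")
CACHE_ALLOC_EVENTS = {"kmem_cache_alloc", "kmem_cache_alloc_node"}


def allocations_by_site_and_size(lines):
    """Count cache allocations per (call_site, bytes_alloc) pair."""
    tally = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None or m.group(1) not in CACHE_ALLOC_EVENTS:
            continue
        fields = dict(pair.split("=", 1) for pair in m.group(2).split())
        tally[(fields["call_site"], int(fields["bytes_alloc"]))] += 1
    return tally


# Invented sample data, for illustration only.
SAMPLE = [
    "kmem_cache_alloc: call_site=ffffffff81100000 ptr=ffff888000010000 "
    "bytes_req=184 bytes_alloc=192 gfp_flags=GFP_KERNEL",
    "kmem_cache_alloc: call_site=ffffffff81100000 ptr=ffff8880000100c0 "
    "bytes_req=184 bytes_alloc=192 gfp_flags=GFP_KERNEL",
    "kmem_cache_alloc_node: call_site=ffffffff81200000 ptr=ffff888000020000 "
    "bytes_req=1000 bytes_alloc=1024 gfp_flags=GFP_KERNEL node=0",
    "kmem_cache_free: call_site=ffffffff81300000 ptr=ffff888000010000",
]

if __name__ == "__main__":
    for (site, size), count in allocations_by_site_and_size(SAMPLE).most_common():
        print(site, size, count)
```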

3. Page allocation
==================
::

  mm_page_alloc              page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked  page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free               page=%p pfn=%lu order=%d
  mm_page_free_batched       page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important because
a high rate of it implies heavy activity on the zone->lock. Taking this lock
impairs performance by disabling interrupts, dirtying cache lines between
CPUs and serialising many CPUs.
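
To judge how much allocator activity goes through the zone->lock, the two
allocation events can be counted side by side. The following is a rough sketch
under the event formats listed above; the function name, the summary keys and
the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "<event>: key=value key=value ..." at the end of a line.
EVENT_RE = re.compile(r"(\w+):?\s+(\w+=\S+(?:\s+\w+=\S+)*)\s*$")


def page_alloc_summary(lines):
    """Count mm_page_alloc events and split zone-locked events by refill flag."""
    summary = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue
        event = m.group(1)
        fields = dict(pair.split("=", 1) for pair in m.group(2).split())
        if event == "mm_page_alloc":
            summary["mm_page_alloc"] += 1
        elif event == "mm_page_alloc_zone_locked":
            if fields.get("percpu_refill") == "1":
                summary["zone_locked_refill"] += 1
            else:
                summary["zone_locked_direct"] += 1
    return summary


# Invented sample data, for illustration only.
SAMPLE = [
    "mm_page_alloc: page=00000000a1b2c3d4 pfn=4096 order=0 migratetype=1 gfp_flags=GFP_KERNEL",
    "mm_page_alloc: page=00000000a1b2c3d5 pfn=4097 order=0 migratetype=1 gfp_flags=GFP_KERNEL",
    "mm_page_alloc: page=00000000a1b2c3d6 pfn=8192 order=3 migratetype=0 gfp_flags=GFP_KERNEL",
    "mm_page_alloc_zone_locked: page=00000000a1b2c3d4 pfn=4096 order=0 migratetype=1 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked: page=00000000a1b2c3d5 pfn=4097 order=0 migratetype=1 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked: page=00000000a1b2c3d6 pfn=8192 order=3 migratetype=0 cpu=1 percpu_refill=0",
]

if __name__ == "__main__":
    for key, count in sorted(page_alloc_summary(SAMPLE).items()):
        print(key, count)
```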

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU in bulk under the
LRU lock and freed in batch with a page list. Significant amounts of activity
here could indicate that the system is under memory pressure and can also
indicate contention on the lruvec->lru_lock.

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked  page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain         page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The event
triggered is mm_page_alloc_zone_locked for each page allocated, with the
event indicating whether it is for a percpu_refill or not.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers a mm_page_pcpu_drain event.

The events are emitted individually so that pages can be tracked
between allocation and freeing. A run of drain or refill events that occur
consecutively implies that the zone->lock was taken once. Large numbers of
per-CPU refills and drains could imply an imbalance between CPUs where too
much work is being concentrated in one place. It could also indicate that the
per-CPU lists should be larger. Finally, large numbers of refills on one CPU
and drains on another could be a factor in causing large numbers of cache
line bounces due to writes between CPUs, and it is worth investigating whether
pages can be allocated and freed on the same CPU through some algorithm change.
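
Since a run of consecutive refill or drain events implies one acquisition of
the zone->lock, counting runs per CPU gives a rough estimate of lock traffic
and of any refill/drain imbalance between CPUs. The sketch below is an
approximation: it only looks at the two events listed above, uses their cpu
field to separate CPUs, and treats any change of kind on a CPU as a new run.
The function names and sample lines are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import groupby

# Assumed layout: "<event>: key=value key=value ..." at the end of a line.
EVENT_RE = re.compile(r"(\w+):?\s+(\w+=\S+(?:\s+\w+=\S+)*)\s*$")


def classify(line):
    """Return (cpu, kind) for per-CPU refill or drain events, else None."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    fields = dict(pair.split("=", 1) for pair in m.group(2).split())
    if m.group(1) == "mm_page_alloc_zone_locked":
        if fields.get("percpu_refill") == "1":
            return fields["cpu"], "refill"
        return None
    if m.group(1) == "mm_page_pcpu_drain":
        return fields["cpu"], "drain"
    return None


def count_runs(lines):
    """Count runs of consecutive refill or drain events on each CPU."""
    per_cpu = defaultdict(list)  # cpu -> ordered list of event kinds
    for line in lines:
        classified = classify(line)
        if classified is not None:
            cpu, kind = classified
            per_cpu[cpu].append(kind)
    runs = Counter()
    for cpu, kinds in per_cpu.items():
        for kind, _group in groupby(kinds):
            runs[(cpu, kind)] += 1
    return runs


def _refill(cpu, pfn):
    return ("mm_page_alloc_zone_locked: page=0000000000000001 pfn=%d order=0 "
            "migratetype=0 cpu=%d percpu_refill=1" % (pfn, cpu))


def _drain(cpu, pfn):
    return ("mm_page_pcpu_drain: page=0000000000000001 pfn=%d order=0 "
            "cpu=%d migratetype=0" % (pfn, cpu))


# Invented sample data: CPU 0 refills, drains, then refills again; CPU 1 drains.
SAMPLE = [
    _refill(0, 100), _refill(0, 101), _refill(0, 102),
    _drain(1, 200),
    _drain(0, 100), _drain(0, 101),
    _refill(0, 103),
]

if __name__ == "__main__":
    for (cpu, kind), count in sorted(count_runs(SAMPLE).items()):
        print("cpu", cpu, kind, "runs:", count)
```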

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag  page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important although
it is avoided where possible. If the system is using huge pages and needs
to be able to resize the pool over the lifetime of the system, this value
is important.

Large numbers of this event imply that memory is fragmenting and
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes where
pageblock_size is usually the default hugepage size.
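
As a worked example of the increment formula, the sketch below computes one
step in kilobytes. The inputs are assumptions chosen for illustration: a
2 MiB pageblock (2048 kB) and two online NUMA nodes. The function name is
hypothetical.

```python
def min_free_kbytes_step(pageblock_kb, nr_online_nodes):
    """Return one min_free_kbytes increment: 3 * pageblock_size * nr_online_nodes."""
    return 3 * pageblock_kb * nr_online_nodes


# Assumed values: 2 MiB pageblocks, two online nodes.
STEP_KB = min_free_kbytes_step(pageblock_kb=2048, nr_online_nodes=2)

if __name__ == "__main__":
    print("raise min_free_kbytes in steps of", STEP_KB, "kB")
```

With these inputs each step is 3 * 2048 * 2 = 12288 kB, so min_free_kbytes
would be raised 12 MiB at a time until the event becomes rare enough.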