.. _memory_allocation:

=======================
Memory Allocation Guide
=======================

Linux provides a variety of APIs for memory allocation. You can
allocate small chunks using `kmalloc` or `kmem_cache_alloc` families,
large virtually contiguous areas using `vmalloc` and its derivatives,
or you can directly request pages from the page allocator with
`alloc_pages`. It is also possible to use more specialized allocators,
for instance `cma_alloc` or `zs_malloc`.

Most of the memory allocation APIs use GFP flags to express how that
memory should be allocated. The GFP acronym stands for "get free
pages", the underlying memory allocation function.

The diversity of the allocation APIs, combined with the numerous GFP
flags, makes the question "How should I allocate memory?" hard to
answer, although very likely you should use

::

  kzalloc(<size>, GFP_KERNEL);

Of course, there are cases when other allocation APIs and different GFP
flags must be used.

Get Free Page flags
===================

The GFP flags control the allocator's behavior. They tell which memory
zones can be used, how hard the allocator should try to find free
memory, whether the memory can be accessed by userspace, and so on. The
:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
reference documentation for the GFP flags and their combinations, and
here we briefly outline their recommended usage:

  * Most of the time ``GFP_KERNEL`` is what you need. Memory for the
    kernel data structures, DMAable memory, inode cache, all these and
    many other allocation types can use ``GFP_KERNEL``. Note that
    using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that
    direct reclaim may be triggered under memory pressure; the calling
    context must be allowed to sleep.
  * If the allocation is performed from an atomic context, e.g. an
    interrupt handler, use ``GFP_NOWAIT``. This flag prevents direct
    reclaim and IO or filesystem operations. Consequently, under memory
    pressure a ``GFP_NOWAIT`` allocation is likely to fail. Allocations
    which have a reasonable fallback should also set ``__GFP_NOWARN``.
  * If you think that accessing memory reserves is justified and the kernel
    will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``.
  * Untrusted allocations triggered from userspace should be subject
    to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set. There
    is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
    allocations that should be accounted.
  * Userspace allocations should use one of the ``GFP_USER``,
    ``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer
    the flag name, the less restrictive it is.

    ``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory
    will be directly accessible by the kernel and implies that the
    data is movable.

    ``GFP_HIGHUSER`` means that the allocated memory is not movable,
    but it is not required to be directly accessible by the kernel. An
    example may be a hardware allocation that maps data directly into
    userspace but has no addressing limitations.

    ``GFP_USER`` means that the allocated memory is not movable and it
    must be directly accessible by the kernel.

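As a rough illustration of the recommendations above, the following
sketch picks a GFP mask for three common situations. ``struct foo_event``
and the ``foo_*`` helpers are hypothetical names used only for this
example; they are not part of any kernel API.

.. code-block:: c

  #include <linux/gfp.h>
  #include <linux/slab.h>
  #include <linux/types.h>

  /* Hypothetical object type, used only for illustration. */
  struct foo_event {
      u32 id;
      u64 timestamp;
  };

  /* Process context, sleeping allowed: GFP_KERNEL is the usual choice. */
  static struct foo_event *foo_event_alloc(void)
  {
      return kzalloc(sizeof(struct foo_event), GFP_KERNEL);
  }

  /*
   * Atomic context (e.g. an interrupt handler): no direct reclaim.
   * The caller is assumed to cope with NULL, so the allocation
   * failure warning is suppressed.
   */
  static struct foo_event *foo_event_alloc_atomic(void)
  {
      return kzalloc(sizeof(struct foo_event), GFP_NOWAIT | __GFP_NOWARN);
  }

  /* Allocation triggered by userspace: subject to kmem accounting. */
  static struct foo_event *foo_event_alloc_user(void)
  {
      return kzalloc(sizeof(struct foo_event), GFP_KERNEL_ACCOUNT);
  }

All three helpers can return NULL, so callers still have to check the
result.
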
You may notice that quite a few allocations in the existing code
specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
prevent recursion deadlocks caused by direct memory reclaim calling
back into the FS or IO paths and blocking on already held
resources. Since 4.12 the preferred way to address this issue is to
use the scope APIs described in
:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.

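A minimal sketch of the scope API, assuming a caller that must not
recurse into the filesystem while it allocates. The helper name
``foo_alloc_in_fs_context()`` is hypothetical; memalloc_nofs_save() and
memalloc_nofs_restore() are the scope calls themselves.

.. code-block:: c

  #include <linux/sched/mm.h>
  #include <linux/slab.h>

  /* Hypothetical helper, used only for illustration. */
  static void *foo_alloc_in_fs_context(size_t size)
  {
      unsigned int nofs_flags;
      void *buf;

      /* Allocations made inside the scope behave as if GFP_NOFS was used. */
      nofs_flags = memalloc_nofs_save();
      buf = kmalloc(size, GFP_KERNEL);
      memalloc_nofs_restore(nofs_flags);

      return buf;
  }

In real code the scope usually covers a whole critical section, for
instance the region where a filesystem lock is held, rather than a
single allocation.
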
Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
used to ensure that the allocated memory is accessible by hardware
with limited addressing capabilities. So unless you are writing a
driver for a device with such restrictions, avoid using these flags.
And even for hardware with such restrictions it is preferable to use
the `dma_alloc*` APIs.

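The sketch below shows one way a driver might express a 32-bit
addressing limit through the DMA API instead of ``GFP_DMA32``. The
``foo_*`` names and ``FOO_RING_BYTES`` are hypothetical, and the 32-bit
mask is only an assumed restriction for this example.

.. code-block:: c

  #include <linux/dma-mapping.h>
  #include <linux/errno.h>

  /* Hypothetical buffer size for this example. */
  #define FOO_RING_BYTES 4096

  static int foo_setup_ring(struct device *dev, void **ring,
                            dma_addr_t *ring_dma)
  {
      int ret;

      /* Tell the DMA layer that the device can only address 32 bits. */
      ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
      if (ret)
          return ret;

      *ring = dma_alloc_coherent(dev, FOO_RING_BYTES, ring_dma, GFP_KERNEL);
      if (!*ring)
          return -ENOMEM;

      return 0;
  }

  static void foo_teardown_ring(struct device *dev, void *ring,
                                dma_addr_t ring_dma)
  {
      dma_free_coherent(dev, FOO_RING_BYTES, ring, ring_dma);
  }
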
GFP flags and reclaim behavior
------------------------------

Memory allocations may trigger direct or background reclaim and it is
useful to understand how hard the page allocator will try to satisfy a
given request.

  * ``GFP_KERNEL & ~__GFP_RECLAIM`` - optimistic allocation without *any*
    attempt to free memory at all. The most lightweight mode, which does
    not even kick the background reclaim. Should be used carefully because
    it might deplete the memory and the next user might hit the more
    aggressive reclaim.

  * ``GFP_KERNEL & ~__GFP_DIRECT_RECLAIM`` (or ``GFP_NOWAIT``) - optimistic
    allocation without any attempt to free memory from the current
    context but can wake kswapd to reclaim memory if the zone is below
    the low watermark. Can be used from either atomic contexts or when
    the request is a performance optimization and there is another
    fallback for a slow path.

  * ``(GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM`` (aka ``GFP_ATOMIC``) -
    non-sleeping allocation with an expensive fallback so it can access
    some portion of memory reserves. Usually used from interrupt/bottom-half
    context with an expensive slow path fallback.

  * ``GFP_KERNEL`` - both background and direct reclaim are allowed and the
    **default** page allocator behavior is used. That means that non-costly
    allocation requests are basically no-fail, but there is no guarantee of
    that behavior, so failures have to be checked properly by callers
    (e.g. an OOM killer victim is currently allowed to fail).

  * ``GFP_KERNEL | __GFP_NORETRY`` - overrides the default allocator behavior
    and all allocation requests fail early rather than cause disruptive
    reclaim (one round of reclaim in this implementation). The OOM killer
    is not invoked.

  * ``GFP_KERNEL | __GFP_RETRY_MAYFAIL`` - overrides the default allocator
    behavior and all allocation requests try really hard. The request
    will fail if the reclaim cannot make any progress. The OOM killer
    won't be triggered.

  * ``GFP_KERNEL | __GFP_NOFAIL`` - overrides the default allocator behavior
    and all allocation requests will loop endlessly until they succeed.
    This might be really dangerous, especially for larger orders.

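The reclaim modifiers above are often combined into an "optimistic
attempt plus fallback" pattern. The following is a sketch under that
assumption; ``foo_alloc_table()`` is a hypothetical helper, not an
existing kernel function.

.. code-block:: c

  #include <linux/slab.h>
  #include <linux/vmalloc.h>

  /* Hypothetical helper: cheap attempt first, slow path second. */
  static void *foo_alloc_table(size_t size)
  {
      void *table;

      /* Fail early instead of reclaiming hard, and stay quiet on failure. */
      table = kmalloc(size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
      if (table)
          return table;

      /* Fallback: virtually contiguous memory from vmalloc(). */
      return vmalloc(size);
  }

Because the result may come from either allocator, it has to be
released with kvfree().
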
Selecting memory allocator
==========================

The most straightforward way to allocate memory is to use a function
from the kmalloc() family. To be on the safe side, it is best to use
routines that set memory to zero, like kzalloc(). If you need to
allocate memory for an array, there are kmalloc_array() and kcalloc()
helpers. The helpers struct_size(), array_size() and array3_size() can
be used to safely calculate object sizes without overflowing.

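To make "without overflowing" concrete, here is a userspace model of
the saturating arithmetic behind array_size() and struct_size(). It is
not the kernel implementation: ``model_array_size()``,
``model_struct_size()`` and ``struct foo_msg`` are hypothetical names,
and the overflow builtins assume GCC or Clang.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Model of array_size(): a * b, or SIZE_MAX if the product overflows. */
  size_t model_array_size(size_t a, size_t b)
  {
      size_t bytes;

      if (__builtin_mul_overflow(a, b, &bytes))
          return SIZE_MAX;
      return bytes;
  }

  /* Model of struct_size(): header + elem * count, saturating at SIZE_MAX. */
  size_t model_struct_size(size_t header, size_t elem, size_t count)
  {
      size_t bytes = model_array_size(elem, count);

      if (__builtin_add_overflow(bytes, header, &bytes))
          return SIZE_MAX;
      return bytes;
  }

  /* Hypothetical message type with a flexible array member. */
  struct foo_msg {
      uint32_t len;
      uint64_t payload[];
  };

A saturated size of ``SIZE_MAX`` can never be satisfied, so the
allocation fails cleanly instead of returning a buffer that is too
small. In kernel code the corresponding call would look like
``kzalloc(struct_size(msg, payload, count), GFP_KERNEL)``, with ``msg``
being a ``struct foo_msg *``.
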
The maximal size of a chunk that can be allocated with `kmalloc` is
limited. The actual limit depends on the hardware and the kernel
configuration, but it is a good practice to use `kmalloc` for objects
smaller than page size.

The address of a chunk allocated with `kmalloc` is aligned to at least
ARCH_KMALLOC_MINALIGN bytes.  For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.

Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
to kmalloc_array(), a helper for resizing arrays is provided in the form of
krealloc_array().

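A short sketch of growing an array with krealloc_array(); the helper
``foo_grow()`` is a hypothetical name used for illustration.

.. code-block:: c

  #include <linux/errno.h>
  #include <linux/slab.h>
  #include <linux/types.h>

  /* Grow a u32 array to new_n elements; the old array is kept on failure. */
  static int foo_grow(u32 **arr, size_t new_n)
  {
      u32 *tmp;

      tmp = krealloc_array(*arr, new_n, sizeof(**arr), GFP_KERNEL);
      if (!tmp)
          return -ENOMEM; /* *arr is still valid and owned by the caller */

      *arr = tmp;
      return 0;
  }

The temporary pointer matters: assigning the result straight to
``*arr`` would lose the original allocation if the resize fails.
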
For large allocations you can use vmalloc() and vzalloc(), or directly
request pages from the page allocator. The memory allocated by `vmalloc`
and related functions is not physically contiguous.

If you are not sure whether the allocation size is too large for
`kmalloc`, it is possible to use kvmalloc() and its derivatives. They
try to allocate memory with `kmalloc` and, if that allocation fails,
retry with `vmalloc`. There are restrictions on which GFP
flags can be used with `kvmalloc`; please see kvmalloc_node() reference
documentation. Note that `kvmalloc` may return memory that is not
physically contiguous.

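For example, a table whose size depends on runtime input could be
handled as in the sketch below. The ``foo_*`` helpers are hypothetical;
kvcalloc() is one of the kvmalloc() derivatives and returns zeroed
memory.

.. code-block:: c

  #include <linux/mm.h>
  #include <linux/slab.h>
  #include <linux/types.h>

  /* Hypothetical helper: nr zeroed counters, kmalloc or vmalloc backed. */
  static u64 *foo_alloc_counters(size_t nr)
  {
      return kvcalloc(nr, sizeof(u64), GFP_KERNEL);
  }

  static void foo_free_counters(u64 *counters)
  {
      /* kvfree() handles both kmalloc and vmalloc backed memory. */
      kvfree(counters);
  }

Since the buffer may not be physically contiguous, this pattern is only
suitable for memory that is accessed through its kernel virtual
address.
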
If you need to allocate many identical objects you can use the slab
cache allocator. The cache should be set up with kmem_cache_create() or
kmem_cache_create_usercopy() before it can be used. The second function
should be used if a part of the cache might be copied to userspace.
After the cache is created, kmem_cache_alloc() and its convenience
wrappers can allocate memory from that cache.

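The life cycle of a slab cache can be sketched as follows. ``struct
foo_req``, the cache name and the ``foo_*`` functions are hypothetical,
and ``SLAB_HWCACHE_ALIGN`` is just one plausible choice of cache flag.

.. code-block:: c

  #include <linux/errno.h>
  #include <linux/init.h>
  #include <linux/slab.h>
  #include <linux/types.h>

  /* Hypothetical object type, used only for illustration. */
  struct foo_req {
      u64 sector;
      u32 flags;
  };

  static struct kmem_cache *foo_req_cache;

  /* Create the cache once, e.g. from module init. */
  static int __init foo_cache_init(void)
  {
      foo_req_cache = kmem_cache_create("foo_req", sizeof(struct foo_req),
                                        0, SLAB_HWCACHE_ALIGN, NULL);
      return foo_req_cache ? 0 : -ENOMEM;
  }

  /* Allocate one zeroed object from the cache. */
  static struct foo_req *foo_req_get(void)
  {
      return kmem_cache_zalloc(foo_req_cache, GFP_KERNEL);
  }

  /* Return an object to the cache it came from. */
  static void foo_req_put(struct foo_req *req)
  {
      kmem_cache_free(foo_req_cache, req);
  }

  /* Destroy the cache after every object has been freed. */
  static void foo_cache_exit(void)
  {
      kmem_cache_destroy(foo_req_cache);
  }
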
When the allocated memory is no longer needed it must be freed. You can
use kvfree() for the memory allocated with `kmalloc`, `vmalloc` and
`kvmalloc`. Objects allocated from slab caches should be freed with
kmem_cache_free(). And don't forget to destroy the cache with
kmem_cache_destroy() once all of its objects have been freed.