.. SPDX-License-Identifier: GPL-2.0

=============
Page Pool API
=============

The page_pool allocator is optimized for the XDP mode that uses one frame
per page, but it can fall back to the regular page allocator APIs.

Basic use involves replacing alloc_pages() calls with page_pool_alloc_pages()
calls. Drivers should use page_pool_dev_alloc_pages() in place of
dev_alloc_pages().
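
As a minimal sketch of that replacement (the ``my_rxq`` structure and its
members are hypothetical driver details, not part of the API):

.. code-block:: c

    /* Hypothetical Rx refill helper: only the allocator call changes */
    static int my_rxq_refill_one(struct my_rxq *rxq)
    {
        struct page *page;

        /* before: page = dev_alloc_pages(0); */
        page = page_pool_dev_alloc_pages(rxq->page_pool);
        if (!page)
            return -ENOMEM;

        rxq->rx_buf[rxq->next_to_use++] = page;
        return 0;
    }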

The API keeps track of in-flight pages, in order to let API users know
when it is safe to free a page_pool object. Thus, API users
must call page_pool_release_page() when a page leaves the page_pool or
call page_pool_put_page() where appropriate in order to maintain correct
accounting.

An API user must call page_pool_put_page() exactly once on a page: it
will either recycle the page, or, in the refcnt > 1 case, release the
DMA mapping and update the in-flight state accounting.
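
For instance, a driver tearing down an Rx ring can drain its outstanding
pages with one put call per page, so the in-flight accounting can reach
zero; the ring walk below is a hypothetical sketch:

.. code-block:: c

    /* Hypothetical teardown: return every outstanding Rx page exactly once */
    for (i = 0; i < DESC_NUM; i++) {
        struct page *page = rxq->rx_buf[i];

        if (!page)
            continue;
        /* false: we are not in a softirq/NAPI-protected context here */
        page_pool_put_full_page(page_pool, page, false);
        rxq->rx_buf[i] = NULL;
    }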

Architecture overview
=====================

.. code-block:: none

    +------------------+
    |      Driver      |
    +------------------+
             ^
             |
             |
             |
             v
    +--------------------------------------------+
    |               request memory               |
    +--------------------------------------------+
        ^                                  ^
        |                                  |
        | Pool empty                       | Pool has entries
        |                                  |
        v                                  v
    +-----------------------+     +------------------------+
    | alloc (and map) pages |     |  get page from cache   |
    +-----------------------+     +------------------------+
          ^                                  ^
          |                                  |
          | cache available                  | No entries, refill
          |                                  | from ptr-ring
          |                                  |
          v                                  v
    +-----------------+           +------------------+
    |   Fast cache    |           |  ptr-ring cache  |
    +-----------------+           +------------------+

API interface
=============
The number of pools created **must** match the number of hardware queues
unless hardware restrictions make that impossible. Anything else would
defeat the purpose of page pool, which is to allocate pages quickly from a
cache without locking. This lockless guarantee naturally comes from running
under a NAPI softirq. The protection doesn't strictly have to be NAPI; any
guarantee that allocating a page will cause no race conditions is enough.
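
In practice this usually means creating one pool in each Rx queue's setup
path, roughly as in the sketch below (the ``priv`` layout is hypothetical,
and ``pp_params`` is filled in as shown in the registration example later
in this document):

.. code-block:: c

    /* Hypothetical driver: one page_pool per hardware Rx queue */
    for (i = 0; i < priv->num_rx_queues; i++) {
        priv->rxq[i].page_pool = page_pool_create(&pp_params);
        if (IS_ERR(priv->rxq[i].page_pool))
            return PTR_ERR(priv->rxq[i].page_pool);
    }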

* page_pool_create(): Create a pool.

  * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
  * order: 2^order pages on allocation
  * pool_size: size of the ptr_ring
  * nid: preferred NUMA node for allocation
  * dev: struct device. Used on DMA operations
  * dma_dir: DMA direction
  * max_len: max DMA sync memory size
  * offset: DMA address offset

* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
  driver bumps the refcnt > 1, this will unmap the page. If the page refcnt is
  1, the allocator owns the page and will try to recycle it in one of the pool
  caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
  using dma_sync_single_range_for_device().

* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync
  the entire memory area configured in pool->max_len.

* page_pool_recycle_direct(): Similar to page_pool_put_full_page(), but the
  caller must guarantee a safe context (e.g. NAPI), since it will recycle the
  page directly into the pool fast cache.

* page_pool_release_page(): Unmap the page (if mapped) and account for it in
  the inflight counters.

* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
  caches.

* page_pool_get_dma_addr(): Retrieve the stored DMA address.

* page_pool_get_dma_dir(): Retrieve the stored DMA direction.

* page_pool_put_page_bulk(): Tries to refill a number of pages into the
  ptr_ring cache holding the ptr_ring producer lock. If the ptr_ring is full,
  page_pool_put_page_bulk() will release the leftover pages to the page
  allocator. page_pool_put_page_bulk() is suitable to be run inside the driver
  NAPI tx completion loop for the XDP_REDIRECT use case; a sketch of this
  follows the list below.
  Please note the caller must not use the data area after running
  page_pool_put_page_bulk(), as this function overwrites it.

* page_pool_get_stats(): Retrieve statistics about the page_pool. This API
  is only available if the kernel has been configured with
  ``CONFIG_PAGE_POOL_STATS=y``. A pointer to a caller-allocated ``struct
  page_pool_stats`` structure is passed to this API, which fills it in. The
  caller can then report those stats to the user (perhaps via ethtool,
  debugfs, etc.). See below for an example usage of this API.

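As referenced in the page_pool_put_page_bulk() entry above, a bulk return
from an XDP_REDIRECT Tx completion path might look like the following
sketch (``BULK_MAX``, ``txq`` and ``get_completed_page()`` are hypothetical
driver details, not kernel APIs):

.. code-block:: c

    /* Hypothetical Tx completion: collect pages, then bulk-return them */
    void *data[BULK_MAX];
    struct page *page;
    int count = 0;

    while ((page = get_completed_page(txq))) {
        data[count++] = page;
        if (count == BULK_MAX) {
            /* data[] contents must not be reused after this call */
            page_pool_put_page_bulk(page_pool, data, count);
            count = 0;
        }
    }
    if (count)
        page_pool_put_page_bulk(page_pool, data, count);
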
Stats API and structures
------------------------
If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the API
``page_pool_get_stats()`` and the structures described below are available.
It takes a pointer to a ``struct page_pool`` and a pointer to a ``struct
page_pool_stats`` allocated by the caller.

The API will fill in the provided ``struct page_pool_stats`` with
statistics about the page_pool.

The stats structure has the following fields::

    struct page_pool_stats {
        struct page_pool_alloc_stats alloc_stats;
        struct page_pool_recycle_stats recycle_stats;
    };

The ``struct page_pool_alloc_stats`` has the following fields:

* ``fast``: successful fast path allocations
* ``slow``: slow path order-0 allocations
* ``slow_high_order``: slow path high order allocations
* ``empty``: ptr ring was empty, so a slow path allocation was forced
* ``refill``: an allocation which triggered a refill of the cache
* ``waive``: pages obtained from the ptr ring that cannot be added to
  the cache due to a NUMA mismatch

The ``struct page_pool_recycle_stats`` has the following fields:

* ``cached``: recycling placed the page in the page pool cache
* ``cache_full``: page pool cache was full
* ``ring``: page placed into the ptr ring
* ``ring_full``: page released from page pool because the ptr ring was full
* ``released_refcnt``: page released (and not recycled) because refcnt > 1

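As a rough illustration, a driver might fold these counters into totals
before reporting them; the helper below is a hypothetical sketch, not a
kernel API:

.. code-block:: c

    /* Hypothetical helper: fold page_pool stats into two totals */
    static void pp_stats_totals(const struct page_pool_stats *stats,
                                u64 *alloc_total, u64 *recycle_total)
    {
        *alloc_total = stats->alloc_stats.fast +
                       stats->alloc_stats.slow +
                       stats->alloc_stats.slow_high_order;
        *recycle_total = stats->recycle_stats.cached +
                         stats->recycle_stats.ring;
    }
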
Coding examples
===============

Registration
------------

.. code-block:: c

    /* Page pool registration */
    struct page_pool_params pp_params = { 0 };
    struct xdp_rxq_info xdp_rxq;
    int err;

    pp_params.order = 0;
    /* internal DMA mapping in page_pool */
    pp_params.flags = PP_FLAG_DMA_MAP;
    pp_params.pool_size = DESC_NUM;
    pp_params.nid = NUMA_NO_NODE;
    pp_params.dev = priv->dev;
    pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;

    /* page_pool_create() returns an ERR_PTR() on failure */
    page_pool = page_pool_create(&pp_params);
    if (IS_ERR(page_pool)) {
        err = PTR_ERR(page_pool);
        goto err_out;
    }

    err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0);
    if (err)
        goto err_out;

    err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
    if (err)
        goto err_out;

NAPI poller
-----------

.. code-block:: c

    /* NAPI Rx poller */
    enum dma_data_direction dma_dir;

    dma_dir = page_pool_get_dma_dir(dring->page_pool);
    while (done < budget) {
        if (some error)
            page_pool_recycle_direct(page_pool, page);
        if (packet_is_xdp) {
            if (act == XDP_DROP)
                page_pool_recycle_direct(page_pool, page);
        } else if (packet_is_skb) {
            page_pool_release_page(page_pool, page);
            new_page = page_pool_dev_alloc_pages(page_pool);
        }
    }

Stats
-----

.. code-block:: c

    #ifdef CONFIG_PAGE_POOL_STATS
    /* retrieve stats */
    struct page_pool_stats stats = { 0 };
    if (page_pool_get_stats(page_pool, &stats)) {
        /* perhaps the driver reports statistics with ethtool */
        ethtool_print_allocation_stats(&stats.alloc_stats);
        ethtool_print_recycle_stats(&stats.recycle_stats);
    }
    #endif

Driver unload
-------------

.. code-block:: c

    /* Driver unload */
    page_pool_put_full_page(page_pool, page, false);
    xdp_rxq_info_unreg(&xdp_rxq);