.. SPDX-License-Identifier: GPL-2.0

=============
Page Pool API
=============

The page_pool allocator is optimized for the XDP mode that uses one frame
per page, but it can also fall back to the regular page allocator APIs.

Basic use involves replacing alloc_pages() calls with
page_pool_alloc_pages() calls.  Drivers should likewise use
page_pool_dev_alloc_pages() in place of dev_alloc_pages().

The API keeps track of in-flight pages in order to let API users know when
it is safe to free a page_pool object.  Thus, API users must call
page_pool_release_page() when a page leaves the page_pool, or call
page_pool_put_page() where appropriate, in order to maintain correct
accounting.

An API user must call page_pool_put_page() exactly once per page: it will
either recycle the page, or, if refcnt > 1, release the DMA mapping and
the in-flight state accounting.
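
As a minimal sketch of this contract (the ``pool`` pointer and the
surrounding receive logic are assumed for illustration):

.. code-block:: c

    struct page *page;

    /* Replacement for dev_alloc_pages(); may be served from the
     * fast cache, the ptr-ring, or the regular page allocator.
     */
    page = page_pool_dev_alloc_pages(pool);
    if (!page)
        return -ENOMEM;

    /* ... receive a frame into the page ... */

    /* Exactly one put per page keeps the in-flight accounting
     * correct; -1 requests a full-length DMA sync, and
     * allow_direct=false is the safe choice outside NAPI context.
     */
    page_pool_put_page(pool, page, -1, false);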

Architecture overview
=====================

.. code-block:: none

    +------------------+
    |       Driver     |
    +------------------+
            ^
            |
            |
            |
            v
    +--------------------------------------------+
    |                request memory              |
    +--------------------------------------------+
        ^                                  ^
        |                                  |
        | Pool empty                       | Pool has entries
        |                                  |
        v                                  v
    +-----------------------+     +------------------------+
    | alloc (and map) pages |     |  get page from cache   |
    +-----------------------+     +------------------------+
                                    ^                    ^
                                    |                    |
                                    | cache available    | No entries, refill
                                    |                    | from ptr-ring
                                    |                    |
                                    v                    v
                          +-----------------+     +------------------+
                          |   Fast cache    |     |  ptr-ring cache  |
                          +-----------------+     +------------------+

API interface
=============
The number of pools created **must** match the number of hardware queues
unless hardware restrictions make that impossible, since anything else
would defeat the purpose of page pool, which is to allocate pages quickly
from a cache without locking.
This lockless guarantee naturally comes from running under a NAPI softirq.
The protection does not strictly have to be NAPI; any guarantee that
allocating a page will cause no race conditions is enough.
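
As a sketch (the ``priv``, ``num_rx_queues`` and ``rxq`` names are
driver-specific assumptions), one pool per hardware Rx queue might be set
up as follows:

.. code-block:: c

    int i;

    for (i = 0; i < priv->num_rx_queues; i++) {
        /* One page_pool instance per queue, used only from that
         * queue's NAPI context, hence no locking is needed.
         */
        priv->rxq[i].page_pool = page_pool_create(&pp_params);
        if (IS_ERR(priv->rxq[i].page_pool))
            return PTR_ERR(priv->rxq[i].page_pool);
    }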

* page_pool_create(): Create a pool.
    * flags:      PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
    * order:      2^order pages on allocation
    * pool_size:  size of the ptr_ring
    * nid:        preferred NUMA node for allocation
    * dev:        struct device. Used on DMA operations
    * dma_dir:    DMA direction
    * max_len:    max DMA sync memory size
    * offset:     DMA address offset

* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
  driver bumps the refcnt > 1, this will unmap the page. If the page refcnt is 1,
  the allocator owns the page and will try to recycle it in one of the pool
  caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
  using dma_sync_single_range_for_device().

* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync
  the entire memory area configured in pool->max_len.

* page_pool_recycle_direct(): Similar to page_pool_put_full_page(), but the
  caller must guarantee a safe context (e.g. NAPI), since it will recycle the
  page directly into the pool fast cache.

* page_pool_release_page(): Unmap the page (if mapped) and account for it on
  the in-flight counters.

* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
  caches.

* page_pool_get_dma_addr(): Retrieve the stored DMA address.

* page_pool_get_dma_dir(): Retrieve the stored DMA direction.

* page_pool_put_page_bulk(): Tries to refill a number of pages into the
  ptr_ring cache while holding the ptr_ring producer lock. If the ptr_ring is
  full, page_pool_put_page_bulk() will release the leftover pages to the page
  allocator. page_pool_put_page_bulk() is suitable to be run inside the driver
  NAPI tx completion loop for the XDP_REDIRECT use case; see the sketch after
  this list. Please note that the caller must not use the data area after
  running page_pool_put_page_bulk(), as this function overwrites it.

* page_pool_get_stats(): Retrieve statistics about the page_pool. This API
  is only available if the kernel has been configured with
  ``CONFIG_PAGE_POOL_STATS=y``. A pointer to a caller-allocated ``struct
  page_pool_stats`` is passed to this API, which fills it in. The
  caller can then report those stats to the user (perhaps via ethtool,
  debugfs, etc.). See below for an example usage of this API.
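
A sketch of bulk recycling from a tx completion loop; the ``bulk`` array,
its fill level ``count`` and the ``get_completed_page()`` helper are
hypothetical driver details:

.. code-block:: c

    void *bulk[XDP_BULK_QUEUE_SIZE];
    struct page *page;
    int count = 0;

    /* gather pages from completed XDP_REDIRECT frames */
    while (count < XDP_BULK_QUEUE_SIZE &&
           (page = get_completed_page(dring)))
        bulk[count++] = page;

    /* Refills the ptr_ring in one producer-lock section; any
     * leftovers go back to the page allocator.  bulk[] must not be
     * read after this call, as its contents are overwritten.
     */
    page_pool_put_page_bulk(page_pool, bulk, count);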

Stats API and structures
------------------------
If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the API
``page_pool_get_stats()`` and the structures described below are available.
It takes a pointer to a ``struct page_pool`` and a pointer to a ``struct
page_pool_stats`` allocated by the caller.

The API will fill in the provided ``struct page_pool_stats`` with
statistics about the page_pool.

The stats structure has the following fields::

    struct page_pool_stats {
        struct page_pool_alloc_stats alloc_stats;
        struct page_pool_recycle_stats recycle_stats;
    };

The ``struct page_pool_alloc_stats`` has the following fields:
  * ``fast``: successful fast path allocations
  * ``slow``: slow path order-0 allocations
  * ``slow_high_order``: slow path high order allocations
  * ``empty``: the ptr ring was empty, so a slow path allocation was forced
  * ``refill``: an allocation which triggered a refill of the cache
  * ``waive``: pages obtained from the ptr ring that cannot be added to
    the cache due to a NUMA mismatch

The ``struct page_pool_recycle_stats`` has the following fields:
  * ``cached``: recycling placed the page in the page pool cache
  * ``cache_full``: the page pool cache was full
  * ``ring``: the page was placed into the ptr ring
  * ``ring_full``: the page was released from the page pool because the ptr
    ring was full
  * ``released_refcnt``: the page was released (and not recycled) because
    refcnt > 1
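
As an example of consuming these counters, a hypothetical helper that
flattens them into a ``u64`` array (e.g. for ``ethtool -S`` output);
``driver_fill_page_pool_stats()`` is not part of the API:

.. code-block:: c

    static void driver_fill_page_pool_stats(const struct page_pool_stats *stats,
                                            u64 *data)
    {
        /* field names match the structures documented above */
        data[0]  = stats->alloc_stats.fast;
        data[1]  = stats->alloc_stats.slow;
        data[2]  = stats->alloc_stats.slow_high_order;
        data[3]  = stats->alloc_stats.empty;
        data[4]  = stats->alloc_stats.refill;
        data[5]  = stats->alloc_stats.waive;
        data[6]  = stats->recycle_stats.cached;
        data[7]  = stats->recycle_stats.cache_full;
        data[8]  = stats->recycle_stats.ring;
        data[9]  = stats->recycle_stats.ring_full;
        data[10] = stats->recycle_stats.released_refcnt;
    }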

Coding examples
===============

Registration
------------

.. code-block:: c

    /* Page pool registration */
    struct page_pool_params pp_params = { 0 };
    struct xdp_rxq_info xdp_rxq;
    struct page_pool *page_pool;
    int err;

    pp_params.order = 0;
    /* internal DMA mapping in page_pool */
    pp_params.flags = PP_FLAG_DMA_MAP;
    pp_params.pool_size = DESC_NUM;
    pp_params.nid = NUMA_NO_NODE;
    pp_params.dev = priv->dev;
    pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
    page_pool = page_pool_create(&pp_params);
    if (IS_ERR(page_pool)) {
        err = PTR_ERR(page_pool);
        goto err_out;
    }

    err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0);
    if (err)
        goto err_out;

    err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
    if (err)
        goto err_out;

NAPI poller
-----------

.. code-block:: c

    /* NAPI Rx poller; some_error, packet_is_xdp and verdict stand in
     * for driver-specific receive logic.
     */
    enum dma_data_direction dma_dir;

    dma_dir = page_pool_get_dma_dir(dring->page_pool);
    while (done < budget) {
        if (some_error)
            page_pool_recycle_direct(page_pool, page);
        if (packet_is_xdp) {
            if (verdict == XDP_DROP)
                page_pool_recycle_direct(page_pool, page);
        } else { /* the packet goes to the network stack as an skb */
            page_pool_release_page(page_pool, page);
            new_page = page_pool_dev_alloc_pages(page_pool);
        }
    }

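Note that in the skb path the page leaves the pool for good:
page_pool_release_page() unmaps it and removes it from the in-flight
accounting, since the skb's lifetime is no longer tied to the pool, and a
fresh page is then allocated to take its place in the receive ring.
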
Stats
-----

.. code-block:: c

    #ifdef CONFIG_PAGE_POOL_STATS
    /* retrieve stats */
    struct page_pool_stats stats = { 0 };
    if (page_pool_get_stats(page_pool, &stats)) {
        /* perhaps the driver reports statistics with ethtool */
        ethtool_print_allocation_stats(&stats.alloc_stats);
        ethtool_print_recycle_stats(&stats.recycle_stats);
    }
    #endif

Driver unload
-------------

.. code-block:: c

    /* Driver unload */
    page_pool_put_full_page(page_pool, page, false);
    xdp_rxq_info_unreg(&xdp_rxq);