.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.

One of the biggest issues is that PCI doesn't require forwarding
transactions between hierarchy domains, and in PCIe, each Root Port
defines a separate hierarchy domain. To make things worse, there is no
simple way to determine if a given Root Complex supports this or not
(see PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
only supports doing P2P when the endpoints involved are all behind the
same PCI bridge. Such devices are all in the same PCI hierarchy domain,
and the spec guarantees that all transactions within a hierarchy are
routable. It does not require routing between hierarchies.
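The "same PCI bridge" rule can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: ``struct node``,
``hops_to()`` and ``p2p_distance()`` are hypothetical names, the tree is
built by hand, and the hop counts are not meant to match the values the
kernel computes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device or bridge; not a kernel type. */
struct node {
	const struct node *upstream;	/* parent in the tree, NULL at the top */
	int is_bridge;			/* non-zero for a PCI bridge */
};

/* Hops from n up to ancestor, or -1 if ancestor is not upstream of n. */
static int hops_to(const struct node *n, const struct node *ancestor)
{
	int hops = 0;

	for (; n != NULL; n = n->upstream, hops++)
		if (n == ancestor)
			return hops;
	return -1;
}

/*
 * Total hops between a and b through their closest common upstream
 * bridge, or -1 when they share no bridge (P2P is then treated as
 * unsupported in this model).
 */
static int p2p_distance(const struct node *a, const struct node *b)
{
	const struct node *up;

	assert(a != NULL && b != NULL);
	for (up = a->upstream; up != NULL; up = up->upstream) {
		int hb = hops_to(b, up);

		if (up->is_bridge && hb > 0)
			return hops_to(a, up) + hb;
	}
	return -1;
}
```

In this model, two endpoints below the same switch get a non-negative
distance, while endpoints whose only common ancestor is the Root Complex
(modelled with ``is_bridge`` set to 0) get -1.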

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent, so these
pages come with a few corner-case gotchas and developers need to be
careful about what they do with them.


Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (e.g.,
it may be typical for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is a client, a provider and an orchestrator
  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client) and it can also use the CMB to hold
  submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.


Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.
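The register-then-publish order described above can be sketched as a
userspace model. Everything here is an assumption made for illustration:
``struct provider_model``, ``model_add_resource()``, ``model_publish()``
and ``provider_setup()`` are hypothetical stand-ins and do not reproduce
the kernel prototypes, which are declared in ``include/linux/pci-p2pdma.h``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the provider's per-device P2P state. */
struct provider_model {
	size_t registered_bytes;	/* bytes registered as a P2P DMA resource */
	bool published;			/* visible to orchestrator drivers? */
};

/* Stand-in for the registration step; returns 0 on success, -1 on error. */
static int model_add_resource(struct provider_model *p, size_t bar_size,
			      size_t offset, size_t size)
{
	assert(p != NULL);
	/* The registered range must be non-empty and lie inside the BAR. */
	if (size == 0 || offset > bar_size || size > bar_size - offset)
		return -1;
	p->registered_bytes += size;
	return 0;
}

/* Stand-in for the optional publish step. */
static void model_publish(struct provider_model *p, bool publish)
{
	/* In this model only registered memory can be advertised. */
	p->published = publish && p->registered_bytes > 0;
}

/* Probe-time sequence: register the whole BAR, then publish it. */
static int provider_setup(struct provider_model *p, size_t bar_size)
{
	int rc = model_add_resource(p, bar_size, 0, bar_size);

	if (rc)
		return rc;
	model_publish(p, true);
	return 0;
}
```

A real provider would make the equivalent calls from its probe path and
would skip the publish step for resources that have side effects.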


Client Drivers
--------------

A client driver typically only has to conditionally change its DMA map
routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
way does not need to be unmapped.

The client may also, optionally, make use of
:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
functions and when to use the regular mapping functions. In some
situations, it may be more appropriate to use a flag to indicate that a
given request uses P2P memory and map it accordingly. It is important to
ensure that struct pages that back P2P memory stay out of code that
does not have support for them, as other code may treat the pages as
regular memory, which may not be appropriate.
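The flag-based variant of this decision can be sketched as a userspace
model. It is a minimal illustration, not driver code:
``struct request_model``, ``choose_map_path()`` and ``needs_unmap()`` are
hypothetical names, and the two enum values merely stand for the P2P and
the regular mapping calls named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which mapping routine the client should call for a request. */
enum map_path { MAP_REGULAR, MAP_P2P };

/* Hypothetical request descriptor; not a kernel structure. */
struct request_model {
	bool uses_p2p_memory;	/* set by whoever built the request */
	int nents;		/* number of scatter-gather entries */
};

/* Select the mapping routine from the per-request flag. */
static enum map_path choose_map_path(const struct request_model *req)
{
	assert(req != NULL);
	return req->uses_p2p_memory ? MAP_P2P : MAP_REGULAR;
}

/*
 * Per the text above, memory mapped through the P2P path does not
 * need to be unmapped, so only the regular path requires an unmap
 * when the request completes.
 */
static bool needs_unmap(const struct request_model *req)
{
	return choose_map_path(req) == MAP_REGULAR;
}
```

Keeping the decision in one helper makes it harder for a P2P-backed
request to reach code that assumes regular memory.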


Orchestrator Drivers
--------------------

The first task an orchestrator driver must perform is to compile a list
of all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider to use, it may check compatibility using
:c:func:`pci_p2pdma_distance()`. Otherwise, it may find a memory provider
that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one provider is supported, the one nearest to all the clients
will be chosen first. If more than one provider is an equal distance away,
the one returned will be chosen at random (the choice is not merely
arbitrary but truly random). :c:func:`pci_p2pmem_find()` returns the PCI
device to use for the provider with a reference taken, and therefore when
it's no longer needed it should be returned with pci_dev_put().
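The selection rule (nearest provider first, ties broken at random) can be
sketched in userspace as follows. This is an illustration under
assumptions, not the kernel's implementation: ``pick_provider()`` is a
hypothetical name, distances are supplied by the caller, a negative
distance marks an incompatible provider, and ``rand()`` stands in for
whatever random source the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Return the index of the nearest compatible provider, or -1 if none
 * is compatible. distance[i] < 0 marks provider i as incompatible.
 */
static int pick_provider(const int *distance, size_t n)
{
	int best = -1;
	size_t ties = 0;
	size_t i;

	assert(distance != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (distance[i] < 0)
			continue;
		if (best < 0 || distance[i] < distance[best]) {
			/* Strictly closer: this provider becomes the only candidate. */
			best = (int)i;
			ties = 1;
		} else if (distance[i] == distance[best]) {
			/* Equal distance: keep the k-th tie with probability 1/k. */
			ties++;
			if ((size_t)rand() % ties == 0)
				best = (int)i;
		}
	}
	return best;
}
```

With distances ``{5, 2, -1, 7}`` the model always selects index 1; with
``{3, 3, 9}`` it returns index 0 or 1, and which one may differ between
runs depending on the random seed.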

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
allocating scatter-gather lists with P2P memory.
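The resulting lifecycle (take a provider reference, allocate, free, drop
the reference) can be sketched as a userspace model. All names here are
hypothetical stand-ins: ``struct p2p_provider_model``, ``model_find()``,
``model_alloc()``, ``model_free()``, ``model_put()`` and
``run_transfer()`` only mimic the ordering, and freeing the memory before
dropping the reference is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider device; not struct pci_dev. */
struct p2p_provider_model {
	int refcount;		/* references currently held on the device */
	size_t pool_free;	/* bytes of P2P memory still available */
};

/* Stand-in for the find step: returns the provider with a reference taken. */
static struct p2p_provider_model *model_find(struct p2p_provider_model *p)
{
	assert(p != NULL);
	p->refcount++;
	return p;
}

/* Stand-in for allocation: 0 on success, -1 if the request cannot be met. */
static int model_alloc(struct p2p_provider_model *p, size_t size)
{
	if (size == 0 || size > p->pool_free)
		return -1;
	p->pool_free -= size;
	return 0;
}

/* Stand-in for returning memory to the provider's pool. */
static void model_free(struct p2p_provider_model *p, size_t size)
{
	p->pool_free += size;
}

/* Stand-in for dropping the reference taken by model_find(). */
static void model_put(struct p2p_provider_model *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* One transfer: the reference is dropped on success and on failure. */
static int run_transfer(struct p2p_provider_model *candidate, size_t size)
{
	struct p2p_provider_model *p = model_find(candidate);
	int rc = model_alloc(p, size);

	if (rc == 0) {
		/* The DMA would be set up and run here. */
		model_free(p, size);
	}
	model_put(p);
	return rc;
}
```

The point of the model is that every exit path balances the reference
taken when the provider was found.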

Struct Page Caveats
-------------------

Driver writers should be very careful about not passing these special
struct pages to code that isn't prepared for them. At this time, the kernel
interfaces do not have any checks for ensuring this. This obviously
precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it. Thus, the order of loads and stores should not be important
and ioreadX(), iowriteX() and friends should not be necessary.


P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export: