===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as less common features like
scratchpad and message registers. Scratchpad registers are readable and
writable registers that are accessible from either side of the device, so that
peers can exchange a small amount of information at a fixed address. Message
registers can be used for the same purpose. Additionally, they are provided
with special status bits to make sure the information isn't overwritten by
another peer. Doorbell registers provide a way for peers to send interrupt
events. Memory windows allow translated read and write access to the peer
memory.

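As a rough illustration of the doorbell and scratchpad semantics described
above, the following user-space C sketch models one side of an NTB. It is a
toy model, not kernel code: every `toy_` name, the four-entry scratchpad array
and the masking behavior are assumptions made for this example and do not
correspond to any real register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one side of an NTB; not a kernel structure. */
struct toy_ntb_side {
	uint64_t db_pending;	/* doorbell bits rung by the peer */
	uint64_t db_mask;	/* masked bits do not raise an interrupt */
	uint32_t spad[4];	/* scratchpads readable/writable by both sides */
};

/* The peer rings doorbell bits on this side. */
static void toy_peer_db_set(struct toy_ntb_side *side, uint64_t bits)
{
	side->db_pending |= bits;
}

/* An interrupt would be raised only for pending bits that are not masked. */
static bool toy_db_irq_pending(const struct toy_ntb_side *side)
{
	return (side->db_pending & ~side->db_mask) != 0;
}

/* The local side acknowledges doorbell bits after handling them. */
static void toy_db_clear(struct toy_ntb_side *side, uint64_t bits)
{
	side->db_pending &= ~bits;
}
```

In this model a peer would typically write a value into `spad[]` first and
ring a doorbell bit afterwards, so that the receiving side knows when to look
at the scratchpad.
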
NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers.  The term "client" is used here to mean an upper layer
component making use of the NTB API.  The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver.  After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed.  The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

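The probe/remove flow described above can be pictured with a small user-space
model. This is only a sketch of the callback pattern, assuming a single client
and a fixed-size device table; the `toy_` types and functions are hypothetical
and are not the NTB core API, which is built on the Linux device model.

```c
#include <stddef.h>

#define TOY_MAX_DEVS 4

struct toy_ntb_dev {
	int id;
};

struct toy_ntb_client {
	void (*probe)(struct toy_ntb_client *client, struct toy_ntb_dev *dev);
	void (*remove)(struct toy_ntb_client *client, struct toy_ntb_dev *dev);
	int bound;	/* devices currently bound, maintained by the callbacks */
};

struct toy_ntb_core {
	struct toy_ntb_client *client;
	struct toy_ntb_dev *devs[TOY_MAX_DEVS];
};

/* Registering a client probes it against devices that already exist. */
static void toy_register_client(struct toy_ntb_core *core,
				struct toy_ntb_client *client)
{
	size_t i;

	core->client = client;
	for (i = 0; i < TOY_MAX_DEVS; i++)
		if (core->devs[i])
			client->probe(client, core->devs[i]);
}

/* A hardware driver adding a device triggers probe on the client. */
static int toy_add_device(struct toy_ntb_core *core, struct toy_ntb_dev *dev)
{
	size_t i;

	for (i = 0; i < TOY_MAX_DEVS; i++) {
		if (!core->devs[i]) {
			core->devs[i] = dev;
			if (core->client)
				core->client->probe(core->client, dev);
			return 0;
		}
	}
	return -1;	/* no free slot */
}

/* Removing a device calls the client's remove callback first. */
static void toy_remove_device(struct toy_ntb_core *core,
			      struct toy_ntb_dev *dev)
{
	size_t i;

	for (i = 0; i < TOY_MAX_DEVS; i++) {
		if (core->devs[i] == dev) {
			if (core->client)
				core->client->remove(core->client, dev);
			core->devs[i] = NULL;
			return;
		}
	}
}

/* Example callbacks: they only count bound devices. */
static void toy_probe(struct toy_ntb_client *client, struct toy_ntb_dev *dev)
{
	(void)dev;
	client->bound++;
}

static void toy_remove(struct toy_ntb_client *client, struct toy_ntb_dev *dev)
{
	(void)dev;
	client->bound--;
}
```

The point of the model is the ordering: probe is called both for devices that
exist when the client registers and for devices that appear later, and remove
is called before a device goes away.
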
NTB Typical client driver implementation
----------------------------------------

The primary purpose of NTB is to share a piece of memory between at least two
systems. So the NTB device features like Scratchpad/Message registers are
mainly used to perform the proper memory window initialization. Typically
there are two types of memory window interfaces supported by the NTB API:
inbound translation configured on the local NTB port and outbound translation
configured by the peer, on the peer NTB port. The first type is
depicted in the following figure::

 Inbound translation:

 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

A typical scenario for initializing the first type of memory window looks like
this: 1) allocate a memory region, 2) put the translated address into the NTB
config, 3) somehow notify the peer device of the performed initialization,
4) the peer device maps the corresponding outbound memory window to get access
to the shared memory region.

The second type of interface, in which the shared windows are initialized by
a peer device, is depicted in the following figure::

 Outbound translation:

 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical scenario for initializing the second type of interface would be:
1) allocate a memory region, 2) somehow deliver the translated address to the
peer device, 3) the peer puts the translated address into the NTB config,
4) the peer device maps the outbound memory window to get access to the shared
memory region.

As one can see, the described scenarios can be combined into one portable
algorithm.

 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the
     allocated region (it may fail if local memory window initialization is
     unsupported)
  3) Send the translated address and memory window index to the peer device

 Peer device:
  1) Initialize the memory window with the received address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map the outbound memory window

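The portable algorithm above works because both sides attempt their own
initialization step and tolerate failure, so the window ends up configured
whichever side supports it. A minimal user-space sketch of that control flow
follows. It assumes each port is reduced to a capability flag and a single
translation address; the `toy_` names are hypothetical and the address is
"sent" simply by passing it as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_mw_port {
	bool can_set_trans;	/* this side is able to program a translation */
	uint64_t xlat_addr;	/* programmed translation address, 0 if unset */
};

/* Returns 0 on success, -1 if this port cannot program translations. */
static int toy_set_trans(struct toy_mw_port *port, uint64_t addr)
{
	if (!port->can_set_trans)
		return -1;
	port->xlat_addr = addr;
	return 0;
}

/*
 * Runs local step 2, then peer step 1, with the same address.
 * Returns true if at least one side managed to program the translation.
 */
static bool toy_setup_window(struct toy_mw_port *local,
			     struct toy_mw_port *peer, uint64_t addr)
{
	/* Local step 2: may fail if inbound setup is unsupported. */
	bool local_ok = toy_set_trans(local, addr) == 0;
	/* Local step 3 and peer step 1: the peer tries with the same address. */
	bool peer_ok = toy_set_trans(peer, addr) == 0;

	return local_ok || peer_ok;
}
```

A client following this pattern only has to treat the combined result as
fatal; an individual failure on one side is expected on hardware that supports
just one of the two translation types.
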
In accordance with this scenario, the NTB Memory Window API can be used as
follows:

 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges that can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve the parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if setting the local translated address is not supported)
  5) Send the translated base address (usually together with the memory
     window number) to the peer device using, for instance, scratchpad or
     message registers.

 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
     received from the other device (related to pidx) for the specified
     memory window. It may fail if the received address, for instance,
     exceeds the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
     window and so get access to the shared memory.

It is also worth noting that ntb_mw_count(pidx) should return the same value
as ntb_peer_mw_count() on the peer with port index pidx.

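Steps 2) and 3) of the local device list amount to fitting a requested buffer
into the alignment and size restrictions of a window. The sketch below shows
that arithmetic in plain user-space C. The `toy_mw_limits` structure and its
three fields are assumptions standing in for whatever the alignment query
returns on a given system; the helpers are hypothetical and not part of the
NTB API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed restrictions of one memory window. */
struct toy_mw_limits {
	uint64_t addr_align;	/* required alignment of the base address */
	uint64_t size_align;	/* required alignment of the window size */
	uint64_t size_max;	/* largest size the window can translate */
};

/* Round val up to a multiple of align; align must be non-zero. */
static uint64_t toy_round_up(uint64_t val, uint64_t align)
{
	return (val + align - 1) / align * align;
}

/* Returns the size to allocate, or 0 if the request cannot fit. */
static uint64_t toy_mw_buf_size(const struct toy_mw_limits *lim, uint64_t want)
{
	uint64_t size = toy_round_up(want, lim->size_align);

	return size <= lim->size_max ? size : 0;
}

/* Checks that a candidate base address satisfies the alignment. */
static bool toy_mw_addr_ok(const struct toy_mw_limits *lim, uint64_t addr)
{
	return addr % lim->addr_align == 0;
}
```

For example, with 4 KiB alignment for both address and size and a 1 MiB
maximum, a 5000-byte request is rounded up to 8192 bytes, while a request one
byte larger than 1 MiB is rejected.
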
NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev.  These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data.  The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data.  The NTB Netdev then creates an Ethernet device using a
Transport queue pair.  Network data is copied between socket buffers and the
Transport queue pair buffer.  The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as a simple example NTB client.
Ping Pong enables the link when started, waits for the NTB link to come up, and
then proceeds to read and write the doorbell and scratchpad registers of the
NTB.  The peers interrupt each other using a bit mask of doorbell bits, which
is shifted by one in each round, to test the behavior of multiple doorbell bits
and interrupt vectors.  In each round, before writing the peer doorbell
register, the Ping Pong driver also reads the first local scratchpad and
writes the value plus one to the first peer scratchpad.

Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
        registers.  By default, Ping Pong will not attempt to exercise such
        hardware.  You may override this behavior at your own risk by setting
        unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
        interrupt event and setting the peer doorbell register for the next
        round.
* init\_db - Specify the doorbell bits to start a new series of rounds.  A new
        series begins once all the doorbell bits have been shifted out of
        range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
        then to observe debugging output on the console.

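The per-round behavior and the role of init\_db can be sketched as a small
state machine. The code below is a user-space approximation written for this
document, not an excerpt from the driver: the `toy_pp` structure, the
`db_valid` field (the set of doorbell bits the hardware implements) and the
choice to drop bits as soon as they leave the valid range are all assumptions.

```c
#include <stdint.h>

struct toy_pp {
	uint64_t db_valid;	/* doorbell bits the hardware implements */
	uint64_t init_db;	/* bits that start each series (init_db) */
	uint64_t db_bits;	/* bits to ring in the current round */
	uint32_t peer_spad0;	/* value last written to peer scratchpad 0 */
};

/* Begin the first series with the init_db bits. */
static void toy_pp_start(struct toy_pp *pp)
{
	pp->db_bits = pp->init_db & pp->db_valid;
	pp->peer_spad0 = 0;
}

/*
 * One round: write local scratchpad value plus one to the peer, return the
 * doorbell bits to ring, then shift the bit mask for the next round.
 */
static uint64_t toy_pp_round(struct toy_pp *pp, uint32_t local_spad0)
{
	uint64_t ring = pp->db_bits;

	pp->peer_spad0 = local_spad0 + 1;

	pp->db_bits = (pp->db_bits << 1) & pp->db_valid;
	if (!pp->db_bits)	/* all bits shifted out of range: new series */
		pp->db_bits = pp->init_db & pp->db_valid;

	return ring;
}
```

With three valid doorbell bits and init\_db set to 0x3, this model rings 0x3,
0x6 and 0x4 in successive rounds and then starts a new series at 0x3.
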
NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
        A directory in debugfs will be created for each
        NTB device probed by the tool.  This directory is shortened to *hw*
        below.
* *hw*/db
        This file is used to read, set, and clear the local doorbell.  Not
        all operations may be supported by all hardware.  To read the doorbell,
        read the file.  To set the doorbell, write `s` followed by the bits to
        set (e.g. `echo 's 0x0101' > db`).  To clear the doorbell, write `c`
        followed by the bits to clear.
* *hw*/mask
        This file is used to read, set, and clear the local doorbell mask.
        See *db* for details.
* *hw*/peer\_db
        This file is used to read, set, and clear the peer doorbell.
        See *db* for details.
* *hw*/peer\_mask
        This file is used to read, set, and clear the peer doorbell
        mask.  See *db* for details.
* *hw*/spad
        This file is used to read and write local scratchpads.  To read
        the values of all scratchpads, read the file.  To write values, write a
        series of pairs of scratchpad number and value
        (e.g. `echo '4 0x123 7 0xabc' > spad` sets scratchpads `4` and `7`
        to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
        This file is used to read and write peer scratchpads.  See
        *spad* for details.

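A test program that drives these files needs to build the same command strings
that the `echo` examples write. The helpers below are a minimal user-space
sketch of that formatting, based only on the syntax described in the list
above; the function names are hypothetical, and writing the resulting string
to the right debugfs file is left to the caller.

```c
#include <stdint.h>
#include <stdio.h>

/* Build a doorbell command such as "s 0x101" or "c 0x1". */
static int toy_tool_db_cmd(char *buf, size_t len, char op, uint64_t bits)
{
	if (op != 's' && op != 'c')
		return -1;
	return snprintf(buf, len, "%c 0x%llx", op, (unsigned long long)bits);
}

/* Build a scratchpad command from index/value pairs, e.g. "4 0x123 7 0xabc". */
static int toy_tool_spad_cmd(char *buf, size_t len, const unsigned int *idx,
			     const uint32_t *val, size_t n)
{
	size_t off = 0, i;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		int ret = snprintf(buf + off, len - off, "%s%u 0x%x",
				   i ? " " : "", idx[i], (unsigned int)val[i]);

		if (ret < 0 || (size_t)ret >= len - off)
			return -1;	/* output would be truncated */
		off += (size_t)ret;
	}
	return (int)off;
}
```

The doorbell helper prints the value without leading zeros, so 0x0101 becomes
`s 0x101`, which denotes the same bits as the example in the list.
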
NTB MSI Test Client (ntb\_msi\_test)
------------------------------------

The MSI test client serves to test and debug the MSI library which
allows for passing MSI interrupts across NTB memory windows. The
test client is interacted with through the debugfs filesystem:

* *debugfs*/ntb\_tool/*hw*/
        A directory in debugfs will be created for each
        NTB device probed by the tool.  This directory is shortened to *hw*
        below.
* *hw*/port
        This file describes the local port number.
* *hw*/irq*_occurrences
        One occurrences file exists for each interrupt and, when read,
        returns the number of times the interrupt has been triggered.
* *hw*/peer*/port
        This file describes the port number for each peer.
* *hw*/peer*/count
        This file describes the number of interrupts that can be
        triggered on each peer.
* *hw*/peer*/trigger
        Writing an interrupt number (any number less than the value
        specified in count) will trigger the interrupt on the
        specified peer. The occurrences file for that interrupt on the
        peer should be incremented.

NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver.  After
registering, client probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
        If the peer NTB is to be accessed via a memory window, then use
        this memory window to access the peer NTB.  A value of zero or positive
        starts from the first mw idx, and a negative value starts from the last
        mw idx.  Both sides MUST set the same value here!  The default value is
        `-1`.
* b2b\_mw\_share
        If the peer NTB is to be accessed via a memory window, and if
        the memory window is large enough, still allow the client to use the
        second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64
        If using B2B topology on Xeon hardware, use
        this 64-bit address on the bus between the NTB devices for the window
        at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
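
The sign convention of b2b\_mw\_idx can be made concrete with a few lines of
C. This is a sketch of the rule as stated above (non-negative values count
from the first window, negative values from the last), written as a
hypothetical user-space helper; the range check and the -1 error value are
assumptions added for the example.

```c
/* Returns the resolved window index, or -1 if it is out of range. */
static int toy_resolve_b2b_mw_idx(int b2b_mw_idx, int mw_count)
{
	int idx = b2b_mw_idx;

	if (idx < 0)
		idx += mw_count;	/* -1 is the last window, -2 the one before */

	if (idx < 0 || idx >= mw_count)
		return -1;

	return idx;
}
```

With three memory windows, the default value of -1 resolves to window 2 and a
value of 0 resolves to window 0. Since both sides must set the same value,
they resolve to the same window only when they also see the same window count.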