.. SPDX-License-Identifier: GPL-2.0

============================================================
Linux kernel driver for Elastic Network Adapter (ENA) family
============================================================

Overview
========

ENA is a networking interface designed to make good use of modern CPU
features and system architectures.

The ENA device exposes a lightweight management interface with a
minimal set of memory-mapped registers and an extensible command set
through an Admin Queue.

The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extensible feature set.

Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.

ENA devices enable high-speed, low-overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation,
and CPU cache-line optimized data placement.

The ENA driver supports industry-standard TCP/IP offload features such as
checksum offload. Receive-side scaling (RSS) is supported for multi-core
scaling.

The ENA driver and its corresponding devices implement health-monitoring
mechanisms, such as a watchdog, that enable the device and driver to
recover in a manner transparent to the application. They also provide
debug logs.

Some ENA devices support a working mode called Low-Latency Queue
(LLQ), which saves several more microseconds of latency.

ENA Source Code Directory Structure
===================================

=================   ======================================================
ena_com.[ch]        Management communication layer. This layer is
                    responsible for handling all the management
                    (admin) communication between the device and the
                    driver.
ena_eth_com.[ch]    Tx/Rx data path.
ena_admin_defs.h    Definition of ENA management interface.
ena_eth_io_defs.h   Definition of ENA data path interface.
ena_common_defs.h   Common definitions for ena_com layer.
ena_regs_defs.h     Definition of ENA PCI memory-mapped (MMIO) registers.
ena_netdev.[ch]     Main Linux kernel driver.
ena_ethtool.c       ethtool callbacks.
ena_pci_id_tbl.h    Supported device IDs.
=================   ======================================================

Management Interface
====================

The ENA management interface is exposed by means of:

- PCIe Configuration Space
- Device Registers
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
- Asynchronous Event Notification Queue (AENQ)

ENA device MMIO registers are accessed only during driver
initialization and are not used during normal device operation
afterwards.

The AQ is used to submit management commands, and the
results/responses are reported asynchronously through the ACQ.
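
Because results arrive asynchronously, the driver has to match each ACQ
entry to the command that produced it. The following user-space model is
a minimal sketch of that bookkeeping, assuming that each command carries
an identifier which its completion echoes back. The structure and function
names and the queue depth are hypothetical and are not taken from
ena_com.c:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical model of AQ/ACQ bookkeeping; not the ena_com.c code. */
    #define ADMIN_QUEUE_DEPTH 4 /* hypothetical queue depth */

    /* Per-command completion context, indexed by command ID. */
    struct admin_comp_ctx {
        bool in_use;     /* slot holds an outstanding command */
        bool completed;  /* completion has arrived through the ACQ */
        uint16_t status; /* status copied from the completion entry */
    };

    struct admin_queue {
        struct admin_comp_ctx ctx[ADMIN_QUEUE_DEPTH];
    };

    /* Reserve a slot for a new command and return its command ID, or -1
     * if all slots are busy. Writing the descriptor into the AQ ring and
     * ringing the doorbell are omitted. */
    int admin_submit(struct admin_queue *aq)
    {
        for (int id = 0; id < ADMIN_QUEUE_DEPTH; id++) {
            if (!aq->ctx[id].in_use) {
                aq->ctx[id].in_use = true;
                aq->ctx[id].completed = false;
                return id;
            }
        }
        return -1;
    }

    /* Handle one ACQ entry: match it to its command by ID. */
    bool admin_complete(struct admin_queue *aq, int command_id,
                        uint16_t status)
    {
        if (command_id < 0 || command_id >= ADMIN_QUEUE_DEPTH ||
            !aq->ctx[command_id].in_use)
            return false; /* unknown or stale completion */
        aq->ctx[command_id].status = status;
        aq->ctx[command_id].completed = true;
        return true;
    }

    /* Collect a finished command's status and release its slot. Returns
     * false while the command is still waiting for its completion. */
    bool admin_take_result(struct admin_queue *aq, int command_id,
                           uint16_t *status)
    {
        struct admin_comp_ctx *c;

        if (command_id < 0 || command_id >= ADMIN_QUEUE_DEPTH)
            return false;
        c = &aq->ctx[command_id];
        if (!c->in_use || !c->completed)
            return false;
        *status = c->status;
        c->in_use = false;
        return true;
    }

Because the lookup is by ID, this model accepts completions in a different
order from the one in which the commands were submitted.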

ENA introduces a small set of management commands with room for
vendor-specific extensions. Most management operations are
framed as a generic Get/Set Feature command.

The following admin queue commands are supported:

- Create I/O submission queue
- Create I/O completion queue
- Destroy I/O submission queue
- Destroy I/O completion queue
- Get feature
- Set feature
- Configure AENQ
- Get statistics

Refer to ena_admin_defs.h for the list of supported Get/Set Feature
properties.

The Asynchronous Event Notification Queue (AENQ) is a unidirectional
queue used by the ENA device to send the driver events that cannot
be reported through the ACQ. AENQ events are subdivided into groups,
and each group may have multiple syndromes. The events are:

====================    ===============
Group                   Syndrome
====================    ===============
Link state change       **X**
Fatal error             **X**
Notification            Suspend traffic
Notification            Resume traffic
Keep-Alive              **X**
====================    ===============
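
One way to process grouped events is a table of handlers indexed by
group, with a fallback handler for groups the driver does not implement.
The sketch below assumes that layout. The group and syndrome numbering and
all identifiers are hypothetical (the real definitions live in
ena_admin_defs.h), and the example handlers only count events:

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical group numbering; see ena_admin_defs.h for the real one. */
    enum aenq_group {
        AENQ_LINK_CHANGE = 0,
        AENQ_FATAL_ERROR,
        AENQ_NOTIFICATION,
        AENQ_KEEP_ALIVE,
        AENQ_MAX_GROUPS
    };

    /* Hypothetical syndromes of the Notification group. */
    enum aenq_notification_syndrome {
        AENQ_SUSPEND_TRAFFIC = 0,
        AENQ_RESUME_TRAFFIC
    };

    struct aenq_event {
        unsigned int group;
        unsigned int syndrome; /* only meaningful for groups with syndromes */
    };

    typedef void (*aenq_handler)(void *data, const struct aenq_event *ev);

    struct aenq_handlers {
        aenq_handler handlers[AENQ_MAX_GROUPS];
        aenq_handler unimplemented; /* fallback for unhandled groups */
    };

    /* Route one event to the handler registered for its group. */
    void aenq_dispatch(const struct aenq_handlers *h, void *data,
                       const struct aenq_event *ev)
    {
        aenq_handler fn = NULL;

        if (ev->group < AENQ_MAX_GROUPS)
            fn = h->handlers[ev->group];
        if (!fn)
            fn = h->unimplemented;
        if (fn)
            fn(data, ev);
    }

    /* Example handlers that only count events (illustration only). */
    struct aenq_stats {
        unsigned int link_changes;
        unsigned int keep_alives;
        unsigned int unknown;
    };

    static void on_link_change(void *data, const struct aenq_event *ev)
    {
        (void)ev;
        ((struct aenq_stats *)data)->link_changes++;
    }

    static void on_keep_alive(void *data, const struct aenq_event *ev)
    {
        (void)ev;
        ((struct aenq_stats *)data)->keep_alives++;
    }

    static void on_unimplemented(void *data, const struct aenq_event *ev)
    {
        (void)ev;
        ((struct aenq_stats *)data)->unknown++;
    }

    const struct aenq_handlers example_handlers = {
        .handlers = {
            [AENQ_LINK_CHANGE] = on_link_change,
            [AENQ_KEEP_ALIVE] = on_keep_alive,
        },
        .unimplemented = on_unimplemented,
    };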

ACQ and AENQ share the same MSI-X vector.

Keep-Alive is a special mechanism that allows monitoring of the device's
health. A Keep-Alive event is delivered by the device every second.
The driver maintains a watchdog (WD) handler which logs the current state and
statistics. If keep-alive events are not delivered as expected, the WD
resets the device and the driver.
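
Leaving out logging and the reset itself, the watchdog check amounts to
comparing the time since the last Keep-Alive event against a timeout. The
sketch below assumes a monotonic millisecond clock and a hypothetical
timeout of six seconds; the names are hypothetical, and the driver's
actual timeout and time base may differ:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical timeout: several missed 1-second Keep-Alive events. */
    #define KEEP_ALIVE_TIMEOUT_MS 6000u

    struct keep_alive_state {
        uint64_t last_keep_alive_ms; /* arrival time of last Keep-Alive */
        bool reset_requested;
    };

    /* AENQ Keep-Alive handler: record the arrival time.
     * now_ms must come from a monotonic clock. */
    void keep_alive_event(struct keep_alive_state *ka, uint64_t now_ms)
    {
        ka->last_keep_alive_ms = now_ms;
    }

    /* Periodic watchdog check; returns true once a device/driver reset
     * should be triggered. */
    bool watchdog_check(struct keep_alive_state *ka, uint64_t now_ms)
    {
        if (now_ms - ka->last_keep_alive_ms > KEEP_ALIVE_TIMEOUT_MS)
            ka->reset_requested = true;
        return ka->reset_requested;
    }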

Data Path Interface
===================

I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
SQ, respectively). Each SQ has a completion queue (CQ) associated
with it.

The SQs and CQs are implemented as descriptor rings in contiguous
physical memory.

The ENA driver supports two Queue Operation modes for Tx SQs:

- **Regular mode:**
  In this mode, the Tx SQs reside in the host's memory. The ENA
  device fetches the ENA Tx descriptors and packet data from host
  memory.

- **Low Latency Queue (LLQ) mode or "push-mode":**
  In this mode, the driver pushes the transmit descriptors and the
  first 96 bytes of the packet directly to the ENA device memory
  space. The rest of the packet payload is fetched by the
  device. For this operation mode, the driver uses a dedicated PCI
  device memory BAR, which is mapped with write-combine capability.

  **Note that** not all ENA devices support LLQ, and this feature is negotiated
  with the device upon initialization. If the ENA device does not
  support LLQ mode, the driver falls back to the regular mode.

The Rx SQs support only the regular mode.
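
In LLQ mode, each Tx packet is effectively split into a part that the
driver writes to device memory and a remainder that the device fetches
from host memory. The sketch below models that split, treating the 96-byte
push size given above as a fixed constant. The names are hypothetical, and
the descriptors, which are also pushed, are not modeled:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    #define LLQ_PUSH_BYTES 96u /* push size described above */

    /* Hypothetical summary of how one packet reaches the device. */
    struct tx_split {
        size_t pushed;  /* bytes the driver writes to device memory */
        size_t fetched; /* bytes the device reads from host memory */
    };

    struct tx_split tx_split_packet(size_t pkt_len, bool llq_enabled)
    {
        struct tx_split s;

        if (!llq_enabled) {
            /* Regular mode: the device fetches all packet data. */
            s.pushed = 0;
            s.fetched = pkt_len;
        } else {
            /* Push mode: the head is pushed, the rest is fetched. */
            s.pushed = pkt_len < LLQ_PUSH_BYTES ? pkt_len : LLQ_PUSH_BYTES;
            s.fetched = pkt_len - s.pushed;
        }
        return s;
    }

A packet no longer than the push size is delivered entirely by the push,
so the device does not need to fetch any of its data from host memory.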

The driver supports multi-queue for both Tx and Rx. This has various
benefits:

- Reduced CPU/thread/process contention on a given Ethernet interface.
- Reduced cache miss rate on completion, particularly for data
  cache lines that hold the sk_buff structures.
- Increased process-level parallelism when handling received packets.
- Increased data cache hit rate, by steering kernel processing of
  packets to the CPU where the application thread consuming the
  packet is running.
- Hardware interrupt redirection.

Interrupt Modes
===============

The driver assigns a single MSI-X vector per queue pair (for both Tx
and Rx directions). The driver assigns an additional dedicated MSI-X vector
for management (for ACQ and AENQ).

Management interrupt registration is performed when the Linux kernel
probes the adapter, and it is deregistered when the adapter is
removed. I/O queue interrupt registration is performed when the Linux
interface of the adapter is opened, and it is deregistered when the
interface is closed.

The management interrupt is named::

   ena-mgmnt@pci:<PCI domain:bus:slot.function>

and for each queue pair, an interrupt is named::

   <interface name>-Tx-Rx-<queue index>
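
Both names can be built with ``snprintf()``. In this sketch, the helper
names are hypothetical, and the PCI address is assumed to be already
formatted as a string such as ``0000:00:05.0`` (in the kernel,
``pci_name()`` returns this form):

.. code-block:: c

    #include <stdio.h>

    /* Hypothetical helper: name of the management (ACQ + AENQ) interrupt. */
    int ena_mgmnt_irq_name(char *buf, size_t len, const char *pci_addr)
    {
        return snprintf(buf, len, "ena-mgmnt@pci:%s", pci_addr);
    }

    /* Hypothetical helper: name of the interrupt serving Tx/Rx queue
     * pair qid on interface ifname. */
    int ena_io_irq_name(char *buf, size_t len, const char *ifname, int qid)
    {
        return snprintf(buf, len, "%s-Tx-Rx-%d", ifname, qid);
    }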

The ENA device operates in auto-mask and auto-clear interrupt
modes. That is, once an MSI-X interrupt is delivered to the host, its
Cause bit is automatically cleared and the interrupt is masked. The
interrupt is unmasked by the driver after NAPI processing is complete.
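
Auto-masking fits the NAPI budget contract: the vector stays masked while
each poll pass uses its full budget, and it is unmasked only when a pass
finishes with budget to spare. The following user-space model illustrates
that contract. The names are hypothetical, a counter stands in for
completion descriptors, and it is not the driver's poll routine:

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical per-queue-pair state. */
    struct queue_pair {
        int pending;         /* completions waiting to be processed */
        bool irq_masked;     /* vector masked (device auto-mask) */
        bool napi_scheduled; /* poll routine is scheduled */
    };

    /* Interrupt handler. The device has already auto-masked the vector
     * (modeled here by setting irq_masked), so the handler only
     * schedules NAPI. */
    void msix_irq(struct queue_pair *qp)
    {
        qp->irq_masked = true;
        qp->napi_scheduled = true;
    }

    /* One NAPI poll pass: process at most budget completions. */
    int napi_poll(struct queue_pair *qp, int budget)
    {
        int done = qp->pending < budget ? qp->pending : budget;

        qp->pending -= done;
        if (done < budget) {
            /* Out of work: stop polling and unmask the vector so the
             * next completion raises a new MSI-X interrupt. */
            qp->napi_scheduled = false;
            qp->irq_masked = false;
        }
        /* done == budget: stay scheduled, interrupt still masked. */
        return done;
    }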

Interrupt Moderation
====================

The ENA driver and device can operate in conventional or adaptive interrupt
moderation mode.

**In conventional mode**, the driver instructs the device to postpone
interrupt posting according to a static interrupt delay value. The
interrupt delay value can be configured through `ethtool(8)`. The
following `ethtool` parameters are supported by the driver: ``tx-usecs``
and ``rx-usecs``.

**In adaptive mode**, the interrupt delay value is updated by the driver
dynamically and adjusted every NAPI cycle according to the nature of the
traffic.

Adaptive coalescing can be switched on/off through `ethtool(8)`'s
:code:`adaptive-rx on|off` parameter.

More information about Adaptive Interrupt Moderation (DIM) can be found in
Documentation/networking/net_dim.rst.
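
Programmatically, these ``ethtool`` settings correspond to the
``ETHTOOL_GCOALESCE`` and ``ETHTOOL_SCOALESCE`` commands of the
``SIOCETHTOOL`` ioctl, using the ``rx_coalesce_usecs``,
``tx_coalesce_usecs``, and ``use_adaptive_rx_coalesce`` fields of
``struct ethtool_coalesce`` from ``<linux/ethtool.h>``. The sketch below
performs a read-modify-write so that the other coalescing fields keep
their current values. The helper names are hypothetical, and which values
are accepted is up to the driver:

.. code-block:: c

    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    /* Hypothetical helper: change only the fields discussed above. */
    void apply_coalesce(struct ethtool_coalesce *ec, unsigned int rx_usecs,
                        unsigned int tx_usecs, int adaptive_rx)
    {
        ec->rx_coalesce_usecs = rx_usecs;
        ec->tx_coalesce_usecs = tx_usecs;
        ec->use_adaptive_rx_coalesce = adaptive_rx ? 1 : 0;
    }

    /* Hypothetical helper: read-modify-write the coalescing settings of
     * ifname. Returns 0 on success, -1 on error. */
    int set_coalesce(const char *ifname, unsigned int rx_usecs,
                     unsigned int tx_usecs, int adaptive_rx)
    {
        struct ethtool_coalesce ec;
        struct ifreq ifr;
        int fd, ret;

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = (char *)&ec;

        /* Read the current settings first. */
        memset(&ec, 0, sizeof(ec));
        ec.cmd = ETHTOOL_GCOALESCE;
        ret = ioctl(fd, SIOCETHTOOL, &ifr);
        if (ret == 0) {
            /* Modify and write them back. */
            apply_coalesce(&ec, rx_usecs, tx_usecs, adaptive_rx);
            ec.cmd = ETHTOOL_SCOALESCE;
            ret = ioctl(fd, SIOCETHTOOL, &ifr);
        }
        close(fd);
        return ret;
    }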
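
RX copybreak
============

The ``rx_copybreak`` value is initialized by default to
``ENA_DEFAULT_RX_COPYBREAK`` and can be configured with the
``ETHTOOL_STUNABLE`` command of the ``SIOCETHTOOL`` ioctl. Received
packets shorter than ``rx_copybreak`` bytes are copied into a newly
allocated SKB, so that the original Rx buffer can be reused.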
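
A user-space sketch of that ioctl follows. It uses
``struct ethtool_tunable`` with the ``ETHTOOL_RX_COPYBREAK`` tunable ID and
the ``ETHTOOL_TUNABLE_U32`` type from ``<linux/ethtool.h>``. The helper
names are hypothetical, and setting the tunable typically requires
``CAP_NET_ADMIN``:

.. code-block:: c

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    /* Hypothetical helper: build an ETHTOOL_STUNABLE request carrying a
     * u32 rx_copybreak value in the flexible data[] area that follows
     * the header. The caller frees the result. */
    struct ethtool_tunable *make_rx_copybreak_req(uint32_t value)
    {
        struct ethtool_tunable *t = calloc(1, sizeof(*t) + sizeof(value));

        if (!t)
            return NULL;
        t->cmd = ETHTOOL_STUNABLE;
        t->id = ETHTOOL_RX_COPYBREAK;
        t->type_id = ETHTOOL_TUNABLE_U32;
        t->len = sizeof(value);
        memcpy(t->data, &value, sizeof(value));
        return t;
    }

    /* Hypothetical helper: send the request for interface ifname.
     * Returns 0 on success, -1 on error. */
    int set_rx_copybreak(const char *ifname, uint32_t value)
    {
        struct ethtool_tunable *t = make_rx_copybreak_req(value);
        struct ifreq ifr;
        int fd, ret;

        if (!t)
            return -1;
        fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) {
            free(t);
            return -1;
        }
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = (char *)t;
        ret = ioctl(fd, SIOCETHTOOL, &ifr);
        close(fd);
        free(t);
        return ret;
    }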

Statistics
==========

The user can obtain ENA device and driver statistics using `ethtool`.
The driver can collect regular or extended statistics (including
per-queue stats) from the device.

In addition, the driver logs the statistics to syslog upon device reset.

MTU
===

The driver supports an arbitrarily large MTU with a maximum that is
negotiated with the device. The driver configures the MTU using the
SetFeature command (ENA_ADMIN_MTU property). The user can change the MTU
via `ip(8)` and similar legacy tools.
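
Because the upper bound comes from the device, an MTU request has to be
checked against the negotiated maximum before it is applied. The following
is a minimal sketch with hypothetical names, in which the caller supplies
both bounds. In current kernels, such range checks are typically performed
by the networking core against the ``min_mtu`` and ``max_mtu`` fields of
``struct net_device``:

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>

    /* Hypothetical MTU limits; max_mtu is the value the device reported. */
    struct mtu_limits {
        uint32_t min_mtu;
        uint32_t max_mtu;
    };

    /* Return 0 if new_mtu may be sent to the device, -EINVAL otherwise. */
    int validate_mtu(const struct mtu_limits *lim, uint32_t new_mtu)
    {
        if (new_mtu < lim->min_mtu || new_mtu > lim->max_mtu)
            return -EINVAL;
        return 0;
    }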

Stateless Offloads
==================

The ENA driver supports:

- IPv4 header checksum offload
- TCP/UDP over IPv4/IPv6 checksum offloads
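
For reference, the IPv4 header checksum that the device computes on the
driver's behalf is the 16-bit ones' complement checksum described in
RFC 791 and RFC 1071. TCP and UDP checksums use the same sum over a
pseudo-header plus the segment. The software sketch below only
illustrates the computation being offloaded; it is not driver code, and the
function name is hypothetical:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Ones' complement checksum over len bytes (RFC 1071). When computing
     * an IPv4 header checksum, the checksum field must be zero. */
    uint16_t inet_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;

        /* Sum 16-bit big-endian words. */
        for (size_t i = 0; i + 1 < len; i += 2)
            sum += ((uint32_t)data[i] << 8) | data[i + 1];
        if (len & 1)
            sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with 0 */

        /* Fold carries back into the low 16 bits. */
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);

        return (uint16_t)~sum;
    }

A receiver can validate a header by running the same function over it with
the checksum field in place: a result of zero means the checksum is
correct.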

RSS
===

- The ENA device supports RSS that allows flexible Rx traffic
  steering.
- Toeplitz and CRC32 hash functions are supported.
- Different combinations of L2/L3/L4 fields can be configured as
  inputs for hash functions.
- The driver configures RSS settings using the AQ SetFeature command
  (ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
  ENA_ADMIN_RSS_INDIRECTION_TABLE_CONFIG properties).
- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash
  function delivered in the Rx CQ descriptor is set in the received
  SKB.
- The user can provide a hash key and hash function, and configure the
  indirection table, through `ethtool(8)`.
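
The Toeplitz hash can be computed in software, which is useful for
predicting the Rx queue a flow will land on. The sketch below implements
the standard Toeplitz algorithm and then maps the hash to a queue through
an indirection table indexed by the low-order bits of the hash. That
mapping is a common scheme and an assumption here, since this document
does not specify how ENA indexes its table. The 40-byte key is the widely
published Microsoft RSS sample key, used only as an example; the key
programmed into the device may differ, and all names are hypothetical:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Widely published Microsoft RSS sample key (example only). */
    const uint8_t rss_sample_key[40] = {
        0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
        0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
        0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
        0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
        0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
    };

    /* Toeplitz hash: for every set input bit (MSB first), XOR in the
     * 32-bit window of the key starting at that bit position. For a full
     * result, key_len must be at least data_len + 4. */
    uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                           const uint8_t *data, size_t data_len)
    {
        uint32_t window = ((uint32_t)key[0] << 24) |
                          ((uint32_t)key[1] << 16) |
                          ((uint32_t)key[2] << 8) | key[3];
        size_t next_bit = 32; /* next key bit to shift into the window */
        uint32_t hash = 0;

        for (size_t i = 0; i < data_len; i++) {
            for (int b = 7; b >= 0; b--) {
                if (data[i] & (1u << b))
                    hash ^= window;
                /* Slide the window one bit along the key. */
                window <<= 1;
                if (next_bit / 8 < key_len &&
                    (key[next_bit / 8] & (0x80u >> (next_bit % 8))))
                    window |= 1;
                next_bit++;
            }
        }
        return hash;
    }

    /* Map a hash to an Rx queue; table_size must be a power of two. */
    unsigned int rss_select_queue(uint32_t hash, const unsigned int *table,
                                  size_t table_size)
    {
        return table[hash & (table_size - 1)];
    }

For an IPv4/TCP flow, the input is conventionally the source address,
destination address, source port, and destination port in network byte
order. Microsoft's published RSS verification vectors can be used to
check such an implementation.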

DATA PATH
=========

Tx
--

:code:`ena_start_xmit()` is called by the stack. This function does the following:

- Maps data buffers (``skb->data`` and frags).
- Populates ``ena_buf`` for the push buffer (if the driver and device are
  in push mode).
- Prepares ENA bufs for the remaining frags.
- Allocates a new request ID from the empty ``req_id`` ring. The request
  ID is the index of the packet in the Tx info array and is used to
  support out-of-order Tx completions.
- Adds the packet to the proper place in the Tx ring.
- Calls :code:`ena_com_prepare_tx()`, an ENA communication layer function
  that converts the ``ena_bufs`` to ENA descriptors (and adds meta ENA
  descriptors as needed).

  * This function also copies the ENA descriptors and the push buffer
    to the device memory space (if in push mode).

- Writes a doorbell to the ENA device.
- When the ENA device finishes sending the packet, a completion
  interrupt is raised.
- The interrupt handler schedules NAPI.
- The :code:`ena_clean_tx_irq()` function is called. This function handles the
  completion descriptors generated by the ENA device, with a single
  completion descriptor per completed packet.

  * ``req_id`` is retrieved from the completion descriptor. The ``tx_info`` of
    the packet is retrieved via the ``req_id``. The data buffers are
    unmapped and ``req_id`` is returned to the empty ``req_id`` ring.
  * The function stops when all completion descriptors have been processed
    or the budget is reached.
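
The ``req_id`` mechanism is what allows completions to arrive out of
order. At transmit time, a packet's request ID is drawn from a ring of free
IDs. The ID is returned to that ring when its completion is processed, so
IDs need not come back in the order they were handed out. The following
self-contained model uses hypothetical names and sizes and omits DMA
mapping, descriptors, and doorbells:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define TX_RING_SIZE 8u /* hypothetical ring size */

    struct tx_info {
        bool in_flight;
        uint32_t len; /* stands in for the skb and its mapped buffers */
    };

    struct tx_ring {
        struct tx_info info[TX_RING_SIZE];
        uint16_t free_ids[TX_RING_SIZE]; /* ring of unused request IDs */
        uint32_t next_to_use;            /* where the next ID is taken */
        uint32_t next_to_clean;          /* where the next ID is returned */
    };

    void tx_ring_init(struct tx_ring *r)
    {
        for (uint16_t i = 0; i < TX_RING_SIZE; i++) {
            r->free_ids[i] = i;
            r->info[i].in_flight = false;
        }
        r->next_to_use = 0;
        r->next_to_clean = 0;
    }

    /* Transmit: take a free request ID; returns -1 if none is left. */
    int tx_send(struct tx_ring *r, uint32_t len)
    {
        uint16_t id;

        if (r->next_to_use - r->next_to_clean == TX_RING_SIZE)
            return -1;
        id = r->free_ids[r->next_to_use % TX_RING_SIZE];
        r->next_to_use++;
        r->info[id].in_flight = true;
        r->info[id].len = len;
        return id;
    }

    /* Process up to budget completions, given as req_ids in any order.
     * Returns the number of completions handled. */
    int tx_clean(struct tx_ring *r, const uint16_t *completed_ids,
                 int count, int budget)
    {
        int done = 0;

        while (done < count && done < budget) {
            uint16_t id = completed_ids[done];

            if (id >= TX_RING_SIZE || !r->info[id].in_flight)
                break; /* invalid req_id; a real driver would reset */
            r->info[id].in_flight = false; /* unmap buffers, free skb */
            r->free_ids[r->next_to_clean % TX_RING_SIZE] = id;
            r->next_to_clean++;
            done++;
        }
        return done;
    }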

Rx
--

- A packet is received from the ENA device, which raises an interrupt.
- The interrupt handler schedules NAPI.
- The :code:`ena_clean_rx_irq()` function is called. This function calls
  :code:`ena_com_rx_pkt()`, an ENA communication layer function, which returns the
  number of descriptors used for a new packet, and zero if
  no new packet is found.
- :code:`ena_rx_skb()` checks the packet length:

  * If the packet is small (len < rx_copybreak), the driver allocates
    an SKB for the new packet and copies the packet payload into the
    SKB data buffer.

    - In this way, the original data buffer is not passed to the stack
      and is reused for future Rx packets.

  * Otherwise, the function unmaps the Rx buffer, sets the first
    descriptor as the linear part of the ``skb``, and sets the other
    descriptors as the frags of the ``skb``.

- The new SKB is updated with the necessary information (protocol,
  hardware checksum verification result, etc.) and then passed to the
  network stack, using the NAPI interface function :code:`napi_gro_receive()`.
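
The copybreak decision in :code:`ena_rx_skb()` trades a memory copy for
buffer reuse. The model below captures only that decision. The structures
and function names are hypothetical stand-ins for the SKB and DMA-buffer
handling described above, and multi-descriptor packets are reduced to a
single buffer:

.. code-block:: c

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for a posted Rx buffer. */
    struct rx_buffer {
        unsigned char *data; /* buffer posted to the Rx SQ */
        size_t len;          /* bytes written by the device */
    };

    /* Hypothetical stand-in for the SKB handed to the stack. */
    struct rx_packet {
        unsigned char *data;
        size_t len;
        bool copied; /* true if the payload was copied (buffer reused) */
    };

    /* Small packets are copied, so buf->data can be reposted as-is.
     * Large packets take ownership of buf->data, and the caller must
     * post a newly allocated buffer in its place.
     * Returns false on allocation failure. */
    bool rx_build_packet(struct rx_buffer *buf, size_t rx_copybreak,
                         struct rx_packet *pkt)
    {
        if (buf->len < rx_copybreak) {
            pkt->data = malloc(buf->len ? buf->len : 1);
            if (!pkt->data)
                return false;
            memcpy(pkt->data, buf->data, buf->len);
            pkt->copied = true; /* buf->data stays with the ring */
        } else {
            pkt->data = buf->data; /* hand the buffer to the stack */
            buf->data = NULL;      /* ring slot needs a new buffer */
            pkt->copied = false;
        }
        pkt->len = buf->len;
        return true;
    }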