.. SPDX-License-Identifier: GPL-2.0

============================================================
Linux kernel driver for Elastic Network Adapter (ENA) family
============================================================

Overview
========

ENA is a networking interface designed to make good use of modern CPU
features and system architectures.

The ENA device exposes a lightweight management interface with a
minimal set of memory-mapped registers and an extensible command set
through an Admin Queue.

The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extensible feature set.

Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.

ENA devices enable high-speed and low-overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation,
and CPU cacheline optimized data placement.

The ENA driver supports industry-standard TCP/IP offload features such
as checksum offload. Receive-side scaling (RSS) is supported for
multi-core scaling.

The ENA driver and its corresponding devices implement health
monitoring mechanisms, such as a watchdog that enables the device and
driver to recover in a manner transparent to the application, as well
as debug logs.

Some ENA devices support a working mode called Low-Latency Queue (LLQ),
which further reduces latency by several microseconds.

ENA Source Code Directory Structure
===================================

================= ======================================================
ena_com.[ch]      Management communication layer. This layer is
                  responsible for handling all the management
                  (admin) communication between the device and the
                  driver.
ena_eth_com.[ch]  Tx/Rx data path.
ena_admin_defs.h  Definition of ENA management interface.
ena_eth_io_defs.h Definition of ENA data path interface.
ena_common_defs.h Common definitions for ena_com layer.
ena_regs_defs.h   Definition of ENA PCI memory-mapped (MMIO) registers.
ena_netdev.[ch]   Main Linux kernel driver.
ena_ethtool.c     ethtool callbacks.
ena_pci_id_tbl.h  Supported device IDs.
================= ======================================================

Management Interface
====================

The ENA management interface is exposed by means of:

- PCIe Configuration Space
- Device Registers
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
- Asynchronous Event Notification Queue (AENQ)

ENA device MMIO registers are accessed only during driver
initialization and are not used during further normal device
operation.

The AQ is used for submitting management commands, and the
results/responses are reported asynchronously through the ACQ.

ENA introduces a small set of management commands with room for
vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.

The following admin queue commands are supported:

- Create I/O submission queue
- Create I/O completion queue
- Destroy I/O submission queue
- Destroy I/O completion queue
- Get feature
- Set feature
- Configure AENQ
- Get statistics

Refer to ena_admin_defs.h for the list of supported Get/Set Feature
properties.

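
The AQ/ACQ contract (submit a command now, collect its response later) can be modeled in a few lines of C. This is a minimal userspace sketch, not the ena_com implementation. The queue depth, the slot structure, pairing each response with its command through a command ID, and all names (``aq_model``, ``aq_submit()``, ``aq_complete()``, ``aq_poll()``) are assumptions made for illustration.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical model: depth and field names are illustrative only. */
   #define AQ_DEPTH 8u

   struct aq_slot {
       bool in_use;     /* command submitted, response not yet consumed */
       bool completed;  /* response has arrived through the "ACQ"       */
       int status;      /* status carried by the response              */
   };

   struct aq_model {
       struct aq_slot slots[AQ_DEPTH];
   };

   /* Driver side: submit a command; returns its ID, or -1 if the AQ is full. */
   static int aq_submit(struct aq_model *aq)
   {
       for (unsigned int id = 0; id < AQ_DEPTH; id++) {
           if (!aq->slots[id].in_use) {
               aq->slots[id].in_use = true;
               aq->slots[id].completed = false;
               return (int)id;
           }
       }
       return -1;
   }

   /* Device side: post the response for command @id to the completion queue. */
   static void aq_complete(struct aq_model *aq, unsigned int id, int status)
   {
       assert(id < AQ_DEPTH && aq->slots[id].in_use);
       aq->slots[id].completed = true;
       aq->slots[id].status = status;
   }

   /*
    * Driver side: check whether command @id has completed. On success the
    * status is stored in *status and the slot is released for reuse.
    */
   static bool aq_poll(struct aq_model *aq, unsigned int id, int *status)
   {
       assert(id < AQ_DEPTH && aq->slots[id].in_use);
       if (!aq->slots[id].completed)
           return false;
       *status = aq->slots[id].status;
       aq->slots[id].in_use = false;
       return true;
   }

In this model, a response for a later command may arrive before the response for an earlier one. Each response is matched to its command by ID, so the driver-side poll does not depend on completion order.
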
The Asynchronous Event Notification Queue (AENQ) is a uni-directional
queue used by the ENA device to send the driver events that cannot be
reported through the ACQ. AENQ events are subdivided into groups. Each
group may have multiple syndromes. The events are:

==================== ===============
Group                Syndrome
==================== ===============
Link state change    **X**
Fatal error          **X**
Notification         Suspend traffic
Notification         Resume traffic
Keep-Alive           **X**
==================== ===============

ACQ and AENQ share the same MSI-X vector.

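
AENQ events are organized by group, so one natural way to consume them is a table of per-group handlers. Groups that define syndromes examine the syndrome inside their handler. The sketch below is illustrative only. The enum values, the event structure, the reaction to each event, and the handler names are hypothetical and do not reproduce the definitions in ena_admin_defs.h.

.. code-block:: c

   #include <assert.h>

   /* Hypothetical group numbering, for illustration only. */
   enum aenq_group {
       AENQ_LINK_CHANGE,
       AENQ_FATAL_ERROR,
       AENQ_NOTIFICATION,
       AENQ_KEEP_ALIVE,
       AENQ_GROUPS_MAX
   };

   /* Hypothetical syndromes of the Notification group. */
   enum aenq_notification_syndrome {
       AENQ_SUSPEND_TRAFFIC,
       AENQ_RESUME_TRAFFIC
   };

   struct aenq_event {
       enum aenq_group group;
       int syndrome;  /* only meaningful for groups that define syndromes */
       int link_up;   /* only meaningful for link state change events     */
   };

   struct drv_state {
       int link_up;
       int traffic_suspended;
       unsigned long keep_alive_events;
       int reset_requested;
   };

   typedef void (*aenq_handler)(struct drv_state *s, const struct aenq_event *e);

   static void on_link_change(struct drv_state *s, const struct aenq_event *e)
   {
       s->link_up = e->link_up;
   }

   static void on_fatal_error(struct drv_state *s, const struct aenq_event *e)
   {
       (void)e;
       s->reset_requested = 1; /* hypothetical reaction */
   }

   static void on_notification(struct drv_state *s, const struct aenq_event *e)
   {
       if (e->syndrome == AENQ_SUSPEND_TRAFFIC)
           s->traffic_suspended = 1;
       else if (e->syndrome == AENQ_RESUME_TRAFFIC)
           s->traffic_suspended = 0;
   }

   static void on_keep_alive(struct drv_state *s, const struct aenq_event *e)
   {
       (void)e;
       s->keep_alive_events++;
   }

   static const aenq_handler aenq_handlers[AENQ_GROUPS_MAX] = {
       [AENQ_LINK_CHANGE]  = on_link_change,
       [AENQ_FATAL_ERROR]  = on_fatal_error,
       [AENQ_NOTIFICATION] = on_notification,
       [AENQ_KEEP_ALIVE]   = on_keep_alive,
   };

   /* Dispatch one event to its group handler; unknown groups are ignored. */
   static void aenq_dispatch(struct drv_state *s, const struct aenq_event *e)
   {
       if ((unsigned int)e->group < AENQ_GROUPS_MAX && aenq_handlers[e->group])
           aenq_handlers[e->group](s, e);
   }
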
Keep-Alive is a special mechanism that allows monitoring the device's health.
A Keep-Alive event is delivered by the device every second.
The driver maintains a watchdog (WD) handler which logs the current state and
statistics. If the keep-alive events aren't delivered as expected, the WD resets
the device and the driver.

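
The watchdog's keep-alive check reduces to comparing the time of the most recent Keep-Alive event against a timeout. This is a minimal sketch. It assumes a millisecond time base and a hypothetical ``timeout_ms`` tolerance, because this document does not state the driver's actual timeout. In this model, ``ka_on_event()`` would be called from the AENQ Keep-Alive handler and ``ka_timed_out()`` from the periodic watchdog.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   struct ka_state {
       uint64_t last_event_ms; /* time of the most recent Keep-Alive event */
       uint64_t timeout_ms;    /* hypothetical tolerance before a reset    */
   };

   /* Record the arrival time of a Keep-Alive event. */
   static void ka_on_event(struct ka_state *ka, uint64_t now_ms)
   {
       ka->last_event_ms = now_ms;
   }

   /*
    * Periodic watchdog check: returns true when no Keep-Alive event has
    * been seen within the timeout, i.e. device and driver should be reset.
    */
   static bool ka_timed_out(const struct ka_state *ka, uint64_t now_ms)
   {
       assert(now_ms >= ka->last_event_ms);
       return now_ms - ka->last_event_ms > ka->timeout_ms;
   }

Events are expected every second, so a timeout of several seconds tolerates a few late or lost events before the watchdog triggers a reset. For example, with a hypothetical 6000 ms timeout, a check made 6001 ms after the last event reports a timeout.
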
Data Path Interface
===================

I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
SQ, respectively). Each SQ has a completion queue (CQ) associated
with it.

The SQs and CQs are implemented as descriptor rings in contiguous
physical memory.

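
Descriptor rings in contiguous memory are commonly managed with free-running producer and consumer counters and a power-of-two size, so that wrap-around becomes a bit mask. The sketch below shows that general technique with hypothetical names. It does not describe the ENA descriptor format or the driver's actual ring bookkeeping.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical ring; size must be a power of two, at most 32768. */
   struct desc_ring {
       uint16_t size;  /* number of descriptors         */
       uint16_t tail;  /* free-running producer counter */
       uint16_t head;  /* free-running consumer counter */
   };

   static uint16_t ring_in_use(const struct desc_ring *r)
   {
       /* 16-bit unsigned subtraction stays correct across counter wrap. */
       return (uint16_t)(r->tail - r->head);
   }

   static uint16_t ring_space(const struct desc_ring *r)
   {
       return (uint16_t)(r->size - ring_in_use(r));
   }

   /* Reserve one descriptor; returns its slot index, or -1 if full. */
   static int ring_produce(struct desc_ring *r)
   {
       if (ring_space(r) == 0)
           return -1;
       int idx = r->tail & (r->size - 1);
       r->tail++;
       return idx;
   }

   /* Release the oldest descriptor; returns its slot index, or -1 if empty. */
   static int ring_consume(struct desc_ring *r)
   {
       if (ring_in_use(r) == 0)
           return -1;
       int idx = r->head & (r->size - 1);
       r->head++;
       return idx;
   }

Because the 16-bit counters wrap naturally, no separate "full" flag is needed. ``tail - head`` always equals the number of descriptors in use, even after the counters overflow.
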
The ENA driver supports two Queue Operation modes for Tx SQs:

- **Regular mode:**
  In this mode the Tx SQs reside in the host's memory. The ENA
  device fetches the ENA Tx descriptors and packet data from host
  memory.

- **Low Latency Queue (LLQ) mode or "push-mode":**
  In this mode the driver pushes the transmit descriptors and the
  first 96 bytes of the packet directly to the ENA device memory
  space. The rest of the packet payload is fetched by the
  device. For this operation mode, the driver uses a dedicated PCI
  device memory BAR, which is mapped with write-combine capability.

**Note that** not all ENA devices support LLQ, and this feature is negotiated
with the device upon initialization. If the ENA device does not
support LLQ mode, the driver falls back to the regular mode.

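
How a packet is divided between the two paths can be expressed as a small function. This is a minimal sketch that assumes the 96-byte push size stated above and a hypothetical ``tx_layout`` result type. It counts only packet bytes and ignores the descriptors that are pushed alongside the header.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>

   #define LLQ_PUSH_MAX 96u /* push size in LLQ mode, as stated in the text */

   struct tx_layout {
       size_t pushed;  /* bytes written directly to device memory   */
       size_t fetched; /* bytes the device fetches from host memory */
   };

   /* Split a packet of @len bytes between push and fetch. */
   static struct tx_layout tx_split(size_t len, bool llq_enabled)
   {
       struct tx_layout l = { 0, len }; /* regular mode: device fetches all */

       if (llq_enabled) {
           l.pushed = len < LLQ_PUSH_MAX ? len : LLQ_PUSH_MAX;
           l.fetched = len - l.pushed;
       }
       return l;
   }
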
The Rx SQs support only the regular mode.

The driver supports multi-queue for both Tx and Rx. This has various
benefits:

- Reduced CPU/thread/process contention on a given Ethernet interface.
- Cache miss rate on completion is reduced, particularly for data
  cache lines that hold the sk_buff structures.
- Increased process-level parallelism when handling received packets.
- Increased data cache hit rate, by steering kernel processing of
  packets to the CPU where the application thread consuming the
  packet is running.
- In-hardware interrupt redirection.

Interrupt Modes
===============

The driver assigns a single MSI-X vector per queue pair (for both Tx
and Rx directions). The driver assigns an additional dedicated MSI-X vector
for management (for ACQ and AENQ).

Management interrupt registration is performed when the Linux kernel
probes the adapter, and it is de-registered when the adapter is
removed. I/O queue interrupt registration is performed when the Linux
interface of the adapter is opened, and it is de-registered when the
interface is closed.

The management interrupt is named::

   ena-mgmnt@pci:<PCI domain:bus:slot.function>

and for each queue pair, an interrupt is named::

   <interface name>-Tx-Rx-<queue index>

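
The naming patterns above can be reproduced with ``snprintf()``. This userspace sketch uses hypothetical helper names. It assumes the usual Linux formatting of a PCI address as ``domain:bus:slot.function`` in hexadecimal, for example ``0000:00:05.0``.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <string.h>

   /* Hypothetical helper: name of the management interrupt. */
   static int mgmnt_irq_name(char *buf, size_t len, unsigned int domain,
                             unsigned int bus, unsigned int slot,
                             unsigned int func)
   {
       return snprintf(buf, len, "ena-mgmnt@pci:%04x:%02x:%02x.%x",
                       domain, bus, slot, func);
   }

   /* Hypothetical helper: name of the interrupt for queue pair @qid. */
   static int io_irq_name(char *buf, size_t len, const char *ifname,
                          unsigned int qid)
   {
       return snprintf(buf, len, "%s-Tx-Rx-%u", ifname, qid);
   }

With these helpers, the device at ``0000:00:05.0`` gets ``ena-mgmnt@pci:0000:00:05.0``, and queue pair 3 of ``eth0`` gets ``eth0-Tx-Rx-3``. These are the names listed in ``/proc/interrupts``.
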
The ENA device operates in auto-mask and auto-clear interrupt
modes. That is, once MSI-X is delivered to the host, its Cause bit is
automatically cleared and the interrupt is masked. The interrupt is
unmasked by the driver after NAPI processing is complete.

Interrupt Moderation
====================

The ENA driver and device can operate in conventional or adaptive interrupt
moderation mode.

**In conventional mode** the driver instructs the device to postpone interrupt
posting according to a static interrupt delay value. The interrupt delay
value can be configured through `ethtool(8)`. The following `ethtool`
parameters are supported by the driver: ``tx-usecs``, ``rx-usecs``.

**In adaptive interrupt moderation mode** the interrupt delay value is
updated by the driver dynamically and adjusted every NAPI cycle
according to the nature of the traffic.

Adaptive coalescing can be switched on/off through `ethtool(8)`'s
:code:`adaptive-rx on|off` parameter.

More information about Dynamic Interrupt Moderation (DIM) can be found in
Documentation/networking/net_dim.rst.

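
To illustrate what adjusting the delay every NAPI cycle means, the sketch below implements a deliberately simple heuristic. It steps between a few delay levels depending on how many packets the last cycle processed. This is not the net_dim algorithm. The delay levels, thresholds, and names are hypothetical.

.. code-block:: c

   #include <assert.h>

   /* Hypothetical delay levels in microseconds, lowest latency first. */
   static const unsigned int delay_levels_us[] = { 0, 8, 32, 64 };
   #define NUM_LEVELS (sizeof(delay_levels_us) / sizeof(delay_levels_us[0]))

   /* Hypothetical thresholds on packets handled per NAPI cycle. */
   #define BUSY_PKTS 48u
   #define IDLE_PKTS 8u

   struct moder_state {
       unsigned int level; /* index into delay_levels_us */
   };

   /*
    * Called once per NAPI cycle with the number of packets processed.
    * Heavy traffic raises the delay (fewer interrupts); light traffic
    * lowers it (lower latency). Returns the new delay in microseconds.
    */
   static unsigned int moder_update(struct moder_state *s, unsigned int pkts)
   {
       if (pkts >= BUSY_PKTS && s->level + 1 < NUM_LEVELS)
           s->level++;
       else if (pkts <= IDLE_PKTS && s->level > 0)
           s->level--;
       return delay_levels_us[s->level];
   }

net_dim does not use fixed thresholds like these. It compares traffic statistics between consecutive samples to decide the direction of each step; see the document referenced above.
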
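RX copybreak
============

The ``rx_copybreak`` value is initialized by default to
``ENA_DEFAULT_RX_COPYBREAK`` and can be configured with the
``ETHTOOL_STUNABLE`` command of the ``SIOCETHTOOL`` ioctl. Received
packets shorter than ``rx_copybreak`` are copied into a newly allocated
SKB, as described in the Rx data path below.
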
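
From userspace, an ``ETHTOOL_STUNABLE`` request consists of a ``struct ethtool_tunable`` header followed by the value. The sketch below builds such a request for ``ETHTOOL_RX_COPYBREAK``, a 32-bit tunable (``ETHTOOL_TUNABLE_U32``), and issues it with ``SIOCETHTOOL``. The helper names are hypothetical and error handling is minimal. The ioctl typically requires ``CAP_NET_ADMIN``.

.. code-block:: c

   #include <assert.h>
   #include <linux/ethtool.h>
   #include <linux/sockios.h>
   #include <net/if.h>
   #include <stdint.h>
   #include <stdlib.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <sys/socket.h>
   #include <unistd.h>

   /* Hypothetical helper: build the request; the caller frees it. */
   static struct ethtool_tunable *build_copybreak_req(uint32_t value)
   {
       struct ethtool_tunable *tun = calloc(1, sizeof(*tun) + sizeof(value));

       if (!tun)
           return NULL;
       tun->cmd = ETHTOOL_STUNABLE;
       tun->id = ETHTOOL_RX_COPYBREAK;
       tun->type_id = ETHTOOL_TUNABLE_U32;
       tun->len = sizeof(value);
       memcpy(tun->data, &value, sizeof(value)); /* value follows the header */
       return tun;
   }

   /* Hypothetical helper: set rx_copybreak on @ifname. Returns 0 or -1. */
   static int set_rx_copybreak(const char *ifname, uint32_t value)
   {
       struct ethtool_tunable *tun = build_copybreak_req(value);
       struct ifreq ifr;
       int fd, ret;

       if (!tun)
           return -1;
       fd = socket(AF_INET, SOCK_DGRAM, 0);
       if (fd < 0) {
           free(tun);
           return -1;
       }
       memset(&ifr, 0, sizeof(ifr));
       strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
       ifr.ifr_data = (char *)tun;
       ret = ioctl(fd, SIOCETHTOOL, &ifr);
       close(fd);
       free(tun);
       return ret < 0 ? -1 : 0;
   }

For example, ``set_rx_copybreak("eth0", 256)`` asks the driver to copy received packets shorter than 256 bytes.
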
Statistics
==========

The user can obtain ENA device and driver statistics using `ethtool`.
The driver can collect regular or extended statistics (including
per-queue stats) from the device.

In addition, the driver logs the stats to syslog upon device reset.

MTU
===

The driver supports an arbitrarily large MTU with a maximum that is
negotiated with the device. The driver configures the MTU using the
SetFeature command (ENA_ADMIN_MTU property). The user can change the MTU
via `ip(8)` or legacy tools such as `ifconfig(8)`.

Stateless Offloads
==================

The ENA driver supports:

- IPv4 header checksum offload
- TCP/UDP over IPv4/IPv6 checksum offloads

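
For reference, the IPv4 header checksum covered by this offload is the standard Internet checksum (RFC 1071): the ones' complement of the ones' complement sum of the header's 16-bit words. The function below is a plain software version of that algorithm. It is shown only to make clear what work is offloaded, and its name is hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /*
    * Internet checksum over @len bytes (RFC 1071). For an IPv4 header,
    * compute it with the checksum field set to zero; running it over a
    * received header (checksum field included) yields 0 if it is valid.
    */
   static uint16_t inet_checksum(const uint8_t *data, size_t len)
   {
       uint32_t sum = 0;

       for (size_t i = 0; i + 1 < len; i += 2)
           sum += (uint32_t)data[i] << 8 | data[i + 1]; /* big-endian word */
       if (len & 1)
           sum += (uint32_t)data[len - 1] << 8;      /* pad odd byte */
       while (sum >> 16)
           sum = (sum & 0xffff) + (sum >> 16);       /* fold carries */
       return (uint16_t)~sum;
   }

TCP and UDP checksums use the same algorithm over a pseudo-header plus the segment, which is the additional work covered by the TCP/UDP offloads.
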
RSS
===

- The ENA device supports RSS that allows flexible Rx traffic
  steering.
- Toeplitz and CRC32 hash functions are supported.
- Different combinations of L2/L3/L4 fields can be configured as
  inputs for hash functions.
- The driver configures RSS settings using the AQ SetFeature command
  (ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
  ENA_ADMIN_RSS_INDIRECTION_TABLE_CONFIG properties).
- If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash
  function delivered in the Rx CQ descriptor is set in the received
  SKB.
- The user can provide a hash key, hash function, and configure the
  indirection table through `ethtool(8)`.

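
The Toeplitz hash and the indirection-table lookup can be sketched in software as follows. This is the generic Toeplitz algorithm as commonly used for RSS. Its input is the source IPv4 address, destination IPv4 address, source port, and destination port, in network byte order. The following are assumptions about a typical setup, not a description of ENA device internals: the key length, the table size, and the rule that the low bits of the hash index the indirection table. The function names are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /*
    * Toeplitz hash of @len input bytes. @key must be at least len + 4
    * bytes long (40-byte keys are common for RSS).
    */
   static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                                 const uint8_t *in, size_t len)
   {
       uint32_t hash = 0;
       /* 32-bit window over the key, starting at its first bit. */
       uint32_t window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
                         (uint32_t)key[2] << 8 | key[3];

       assert(key_len >= len + 4);
       for (size_t i = 0; i < len; i++) {
           for (int bit = 7; bit >= 0; bit--) {
               if (in[i] & (1u << bit))
                   hash ^= window;
               /* Slide the window left by one key bit. */
               window <<= 1;
               if (key[i + 4] & (1u << bit))
                   window |= 1;
           }
       }
       return hash;
   }

   /* Pick an Rx queue: low bits of the hash index the indirection table. */
   static unsigned int rss_select_queue(uint32_t hash, const uint8_t *indir,
                                        size_t indir_size)
   {
       /* indir_size is assumed to be a power of two. */
       return indir[hash & (indir_size - 1)];
   }

For a fixed key and table, every packet of a flow produces the same hash and therefore lands on the same Rx queue. Changing the key or the table contents through `ethtool(8)` redistributes flows across queues.
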
DATA PATH
=========

Tx
--

:code:`ena_start_xmit()` is called by the stack. This function does the following:

- Maps data buffers (``skb->data`` and frags).
- Populates ``ena_buf`` for the push buffer (if the driver and device are
  in push mode).
- Prepares ENA bufs for the remaining frags.
- Allocates a new request ID from the empty ``req_id`` ring. The request
  ID is the index of the packet in the Tx info. This is used for
  out-of-order Tx completions.
- Adds the packet to the proper place in the Tx ring.
- Calls :code:`ena_com_prepare_tx()`, an ENA communication layer function
  that converts the ``ena_bufs`` to ENA descriptors (and adds meta ENA
  descriptors as needed).

  * This function also copies the ENA descriptors and the push buffer
    to the device memory space (if in push mode).

- Writes a doorbell to the ENA device.
- When the ENA device finishes sending the packet, a completion
  interrupt is raised.
- The interrupt handler schedules NAPI.
- The :code:`ena_clean_tx_irq()` function is called. This function handles the
  completion descriptors generated by the ENA, with a single
  completion descriptor per completed packet.

  * ``req_id`` is retrieved from the completion descriptor. The ``tx_info`` of
    the packet is retrieved via the ``req_id``. The data buffers are
    unmapped and ``req_id`` is returned to the empty ``req_id`` ring.
  * The function stops when all completion descriptors have been processed
    or the budget is reached.

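
The free ``req_id`` ring lets Tx completions arrive in any order. Each completion names the ``req_id`` it finished, and that ID is pushed back for reuse. This is a minimal sketch of such a free-ID ring with hypothetical names and a hypothetical fixed ring size. It is not the driver's actual data structure.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define TX_RING_SIZE 4u /* hypothetical; must be a power of two */

   struct req_id_ring {
       uint16_t ids[TX_RING_SIZE]; /* queue of free request IDs             */
       uint32_t alloc_idx;         /* free-running index of the next free ID */
       uint32_t free_idx;          /* free-running index where freed IDs go  */
   };

   static void req_id_ring_init(struct req_id_ring *r)
   {
       for (uint16_t i = 0; i < TX_RING_SIZE; i++)
           r->ids[i] = i; /* initially every ID is free */
       r->alloc_idx = 0;
       r->free_idx = TX_RING_SIZE;
   }

   /* Take a free request ID; returns -1 when all IDs are in flight. */
   static int req_id_alloc(struct req_id_ring *r)
   {
       if (r->free_idx == r->alloc_idx)
           return -1;
       return r->ids[r->alloc_idx++ & (TX_RING_SIZE - 1)];
   }

   /* Return a completed request ID; completions may arrive in any order. */
   static void req_id_free(struct req_id_ring *r, uint16_t id)
   {
       assert(id < TX_RING_SIZE);
       assert(r->free_idx - r->alloc_idx < TX_RING_SIZE);
       r->ids[r->free_idx++ & (TX_RING_SIZE - 1)] = id;
   }

In this model, the returned ID would index a per-packet ``tx_info`` array. The ID therefore identifies which buffers to unmap regardless of the order in which completions arrive.
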
Rx
--

- When the ENA device receives a packet, an interrupt is raised.
- The interrupt handler schedules NAPI.
- The :code:`ena_clean_rx_irq()` function is called. This function calls
  :code:`ena_com_rx_pkt()`, an ENA communication layer function, which returns the
  number of descriptors used for a new packet, and zero if
  no new packet is found.
- :code:`ena_rx_skb()` checks packet length:

  * If the packet is small (len < rx_copybreak), the driver allocates
    an SKB for the new packet, and copies the packet payload into the
    SKB data buffer.

    - In this way the original data buffer is not passed to the stack
      and is reused for future Rx packets.

  * Otherwise the function unmaps the Rx buffer, sets the first
    descriptor as `skb`'s linear part and the other descriptors as the
    `skb`'s frags.

- The new SKB is updated with the necessary information (protocol,
  hardware checksum verification result, etc.), and then passed to the
  network stack, using the NAPI interface function :code:`napi_gro_receive()`.
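
The copybreak decision in :code:`ena_rx_skb()` can be modeled with ordinary buffers. In this minimal userspace sketch, ``malloc()`` stands in for SKB allocation and a hypothetical ``rx_buf`` structure stands in for a mapped Rx buffer. The comparison follows the ``len < rx_copybreak`` rule stated above.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdlib.h>
   #include <string.h>

   struct rx_buf {
       unsigned char *data; /* buffer owned by the Rx ring (hypothetical) */
       size_t len;          /* bytes received into it                    */
   };

   struct rx_result {
       unsigned char *payload; /* what is handed to the "stack"; NULL on failure */
       bool copied;            /* true: payload is a fresh copy                  */
       bool buf_reusable;      /* true: the ring keeps the original buffer       */
   };

   static struct rx_result rx_deliver(struct rx_buf *buf, size_t rx_copybreak)
   {
       struct rx_result r = { NULL, false, false };

       if (buf->len < rx_copybreak) {
           /* Small packet: copy it and keep the original buffer in the ring. */
           r.payload = malloc(buf->len ? buf->len : 1);
           if (!r.payload)
               return r;
           memcpy(r.payload, buf->data, buf->len);
           r.copied = true;
           r.buf_reusable = true;
       } else {
           /* Large packet: hand the buffer itself up; the ring must refill. */
           r.payload = buf->data;
           buf->data = NULL;
       }
       return r;
   }

Copying a small packet costs one short memcpy but avoids unmapping and re-allocating a full receive buffer. That trade-off is why a threshold is worth tuning.
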