.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
        Michał Mirosław <mirq-linux@rere.qmqm.pl>



Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim.  Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them.  Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

 1. The netdev->hw_features set contains features whose state may be
    changed (enabled or disabled) for a particular device at the user's
    request.  This set should be initialized in the ndo_init callback and
    not changed later.

 2. The netdev->features set contains features that are currently enabled
    for a device.  This should be changed only by the network core or in
    the error paths of the ndo_set_features callback.

 3. The netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (it limits the netdev->features set).  This is
    currently used for all VLAN devices, whether tags are stripped or
    inserted in hardware or software.

 4. The netdev->wanted_features set contains the feature set requested by
    the user.  This set is filtered by the ndo_fix_features callback
    whenever it or some device-specific conditions change.  This set is
    internal to the networking core and should not be referenced in drivers.

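As a rough illustration of how the sets listed above relate, the following
userspace C sketch models a device with plain bit masks.  Everything prefixed
demo_ or DEMO_ is hypothetical: the bit positions are made up (the real ones
are in include/linux/netdev_features.h), the device abilities are invented,
and the VLAN helper only mirrors the "limits the netdev->features set" remark
rather than the core's actual code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's feature bit mask type. */
typedef uint64_t demo_features_t;

/* Made-up bit positions; real values are in include/linux/netdev_features.h. */
#define DEMO_F_SG       (1ULL << 0)
#define DEMO_F_IP_CSUM  (1ULL << 1)
#define DEMO_F_TSO      (1ULL << 2)
#define DEMO_F_RXCSUM   (1ULL << 3)
#define DEMO_F_HIGHDMA  (1ULL << 4)

struct demo_netdev {
	demo_features_t hw_features;     /* may be toggled at the user's request */
	demo_features_t features;        /* currently enabled */
	demo_features_t vlan_features;   /* inheritable by child VLAN devices */
	demo_features_t wanted_features; /* core-internal, never touched here */
};

/* What an ndo_init-style callback might set up (assumed device abilities). */
static void demo_init(struct demo_netdev *dev)
{
	dev->hw_features = DEMO_F_SG | DEMO_F_IP_CSUM | DEMO_F_TSO | DEMO_F_RXCSUM;
	/* Start with everything toggleable on, plus one always-on feature. */
	dev->features = dev->hw_features | DEMO_F_HIGHDMA;
	dev->vlan_features = DEMO_F_SG | DEMO_F_IP_CSUM;
}

/* Simplified: a VLAN child gets the enabled features limited by vlan_features. */
static demo_features_t demo_vlan_child_features(const struct demo_netdev *dev)
{
	return dev->features & dev->vlan_features;
}
```

In this model DEMO_F_HIGHDMA is enabled but absent from hw_features, so it
is not user-toggleable, and a VLAN child would see only scatter-gather and
IPv4 checksumming.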


Part II: Controlling enabled features
=====================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features().  If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.

The following events trigger recalculation:

 1. device's registration, after ndo_init returned success
 2. user requested changes in features state
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held.  Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock.  This should not be done
from ndo_*_features callbacks.  A driver should not modify netdev->features
directly; it influences the resulting set through its ndo_fix_features
callback (the error paths of ndo_set_features are the one exception).

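The recalculation described in this part can be sketched in userspace C as
follows.  This is a simplified model, not the core's code: all demo_ names
are hypothetical, the core's own filtering is reduced to a no-op, a counter
stands in for the NETDEV_FEAT_CHANGE notification, and the way the starting
set is combined from the wanted and current sets is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_features_t;

/* Made-up bits standing in for NETIF_F_SG and NETIF_F_TSO. */
#define DEMO_F_SG   (1ULL << 0)
#define DEMO_F_TSO  (1ULL << 1)

struct demo_netdev;

struct demo_ops {
	demo_features_t (*fix_features)(struct demo_netdev *dev, demo_features_t f);
	int (*set_features)(struct demo_netdev *dev, demo_features_t f);
};

struct demo_netdev {
	demo_features_t hw_features;
	demo_features_t features;
	demo_features_t wanted_features;
	const struct demo_ops *ops;
	int feat_change_events; /* stands in for NETDEV_FEAT_CHANGE */
};

/* Stand-in for netdev_fix_features(); this sketch imposes no core limits. */
static demo_features_t demo_core_fix(struct demo_netdev *dev, demo_features_t f)
{
	(void)dev;
	return f;
}

/* Example driver filter: in this model TSO is dropped when SG is off. */
static demo_features_t demo_driver_fix(struct demo_netdev *dev, demo_features_t f)
{
	(void)dev;
	if (!(f & DEMO_F_SG))
		f &= ~DEMO_F_TSO;
	return f;
}

/* No set_features callback: a missing callback counts as success. */
static const struct demo_ops demo_driver_ops = {
	.fix_features = demo_driver_fix,
	.set_features = NULL,
};

static int demo_update_features(struct demo_netdev *dev)
{
	/* Assumed starting point: non-toggleable bits kept, wanted bits added. */
	demo_features_t f = (dev->features & ~dev->hw_features) |
			    dev->wanted_features;
	int err = 0;

	if (dev->ops && dev->ops->fix_features)
		f = dev->ops->fix_features(dev, f);
	f = demo_core_fix(dev, f);

	if (f == dev->features)
		return 0;               /* nothing to do, no notification */

	if (dev->ops && dev->ops->set_features)
		err = dev->ops->set_features(dev, f);
	if (err == 0)
		dev->features = f;      /* only replaced on success */

	dev->feat_change_events++;      /* the set might have changed */
	return err;
}
```

With wanted_features containing only the TSO bit, the driver filter removes
TSO because SG was not requested, so the device ends up with neither
enabled.  A second call finds nothing to change and issues no further
notification.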


Part III: Implementation hints
==============================

 * ndo_fix_features:

All dependencies between features should be resolved here.  The resulting
set can be reduced further by limitations imposed by the networking core (as
coded in netdev_fix_features()).  For this reason it is safer to disable a
feature when its dependencies are not met instead of forcing the dependency
on.

This callback should not modify hardware or driver state (it should be
stateless).  It can be called multiple times between successive
ndo_set_features calls.

The callback must not alter features contained in the NETIF_F_SOFT_FEATURES
or NETIF_F_NEVER_CHANGE sets.  The exception is NETIF_F_VLAN_CHALLENGED, but
care must be taken as the change won't affect already configured VLANs.

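A minimal userspace sketch of such a filter is shown here.  It assumes a
made-up hardware rule (the TSO engine needs both scatter-gather and IPv4
checksumming), uses invented DEMO_ bit values, and takes only the requested
set as input, whereas a real callback also receives the device.  It resolves
the dependency by dropping the dependent feature and leaves the bit that
stands in for a software feature untouched.

```c
#include <stdint.h>

typedef uint64_t demo_features_t;

/* Made-up bit positions, for illustration only. */
#define DEMO_F_SG       (1ULL << 0)
#define DEMO_F_IP_CSUM  (1ULL << 1)
#define DEMO_F_TSO      (1ULL << 2)
#define DEMO_F_GSO      (1ULL << 3)  /* stands in for a software feature */

/* Hypothetical rule: TSO on this device needs SG and IPv4 checksumming. */
#define DEMO_TSO_DEPS   (DEMO_F_SG | DEMO_F_IP_CSUM)

/*
 * Stateless: reads nothing but its argument and touches no hardware, so it
 * is safe to call any number of times.
 */
static demo_features_t demo_fix_features(demo_features_t requested)
{
	demo_features_t f = requested;

	/* Disable the dependent feature rather than forcing its deps on. */
	if ((f & DEMO_TSO_DEPS) != DEMO_TSO_DEPS)
		f &= ~DEMO_F_TSO;

	return f;
}
```

Because the function only ever clears DEMO_F_TSO, applying it twice gives
the same result as applying it once, and the software feature bit passes
through unchanged.
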
 * ndo_set_features:

Hardware should be reconfigured to match the passed feature set.  The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features.  In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: a successful return is zero; >0 means a silent error.)

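The error path described above might look like the following userspace
sketch.  The hardware is simulated: hw_state, hw_broken and demo_hw_write()
are hypothetical and exist only so that a failure can be provoked.  On
failure the callback records what the hardware actually ended up with and
returns a positive value, which the note above describes as a silent error.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_features_t;

/* Made-up bit positions, for illustration only. */
#define DEMO_F_RXCSUM  (1ULL << 0)
#define DEMO_F_TSO     (1ULL << 1)

struct demo_netdev {
	demo_features_t features;  /* models netdev->features */
	demo_features_t hw_state;  /* what the simulated hardware has enabled */
	demo_features_t hw_broken; /* bits the simulated hardware refuses */
};

/* Simulated register write: 0 on success, -1 if the hardware refuses. */
static int demo_hw_write(struct demo_netdev *dev, demo_features_t bit, int on)
{
	if (on && (dev->hw_broken & bit))
		return -1;
	if (on)
		dev->hw_state |= bit;
	else
		dev->hw_state &= ~bit;
	return 0;
}

static int demo_set_features(struct demo_netdev *dev, demo_features_t wanted)
{
	static const demo_features_t bits[] = { DEMO_F_RXCSUM, DEMO_F_TSO };
	demo_features_t changed = dev->features ^ wanted;
	int failed = 0;
	size_t i;

	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(changed & bits[i]))
			continue;
		if (demo_hw_write(dev, bits[i], (wanted & bits[i]) != 0))
			failed = 1;
	}

	if (!failed)
		return 0;  /* the core stores 'wanted' in features itself */

	/* Error path: make features reflect the resulting hardware state. */
	dev->features = (dev->features & ~changed) | (dev->hw_state & changed);
	return 1;          /* >0: silent error */
}
```

On success the sketch leaves the features field alone, since storing the new
set is the core's job. Only the failure branch writes to it.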


Part IV: Features
=================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see the comments near the top of
include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill a TCP/UDP-like checksum anywhere in the
packets, whatever headers there might be.

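The superset relation can be pictured with a small userspace sketch.  It is
deliberately coarse and rests on an assumed reading of the skbuff.h comments:
the IPv4 and IPv6 flags are taken to cover only their own network protocol,
the transport protocol is ignored, and all DEMO_ names and bit values are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_features_t;

/* Made-up bit positions, for illustration only. */
#define DEMO_F_IP_CSUM    (1ULL << 0)
#define DEMO_F_IPV6_CSUM  (1ULL << 1)
#define DEMO_F_HW_CSUM    (1ULL << 2)

enum demo_l3 { DEMO_L3_IPV4, DEMO_L3_IPV6, DEMO_L3_OTHER };

/* Can the device fill in the transmit checksum for this kind of packet? */
static bool demo_can_offload_csum(demo_features_t f, enum demo_l3 l3)
{
	/* The generic feature covers any headers. */
	if (f & DEMO_F_HW_CSUM)
		return true;

	switch (l3) {
	case DEMO_L3_IPV4:
		return (f & DEMO_F_IP_CSUM) != 0;
	case DEMO_L3_IPV6:
		return (f & DEMO_F_IPV6_CSUM) != 0;
	default:
		return false;
	}
}
```

A device with both protocol-specific flags still cannot help with a packet
whose network header is neither IPv4 nor IPv6 in this model, while the
generic flag alone is enough for every case.
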
 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR
bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size.  On segmentation, it splits the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if it is
shorter than gso_size).

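The arithmetic behind the UDP case can be sketched in userspace C.  The
struct and function names are hypothetical; the only outside fact relied on
is that the UDP length field counts the 8-byte UDP header plus the payload.

```c
#include <stddef.h>

#define DEMO_UDP_HDR_LEN 8u  /* size of a UDP header in bytes */

struct demo_udp_gso_plan {
	size_t nsegs;         /* number of datagrams produced */
	size_t last_payload;  /* payload bytes in the final datagram */
	size_t full_udp_len;  /* UDP length field of each full-sized segment */
	size_t last_udp_len;  /* fixed-up UDP length field of the final one */
};

/* Plan how a payload larger than gso_size would be cut up. */
static struct demo_udp_gso_plan demo_plan_udp_gso(size_t payload_len,
						  size_t gso_size)
{
	struct demo_udp_gso_plan p = { 0, 0, 0, 0 };

	if (gso_size == 0 || payload_len <= gso_size)
		return p;  /* nothing to segment */

	p.nsegs = (payload_len + gso_size - 1) / gso_size;  /* round up */
	p.last_payload = payload_len - (p.nsegs - 1) * gso_size;
	p.full_udp_len = DEMO_UDP_HDR_LEN + gso_size;
	p.last_udp_len = DEMO_UDP_HDR_LEN + p.last_payload;
	return p;
}
```

For a 3000-byte payload and a gso_size of 1400, the plan is three datagrams
carrying 1400, 1400 and 200 bytes, so only the last header needs a different
length (208 instead of 1408).
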
 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

These features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack.  A driver should not change its behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
e.g. software tunnels.

It is also used in a few legacy drivers that implement their own locking.
Don't use it for new (hardware) drivers.

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers.  Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs.  This may not be useful, though.]

*  rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data.  This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

*  rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
bad packets on it.  Some NICs may receive more packets if also put into normal
PROMISC mode.

*  rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.

* hsr-tag-ins-offload

This should be set for devices which insert an HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tag automatically.

* hsr-tag-rm-offload

This should be set for devices which remove HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically.

* hsr-fwd-offload

This should be set for devices which forward HSR (High-availability Seamless
Redundancy) frames from one port to another in hardware.

* hsr-dup-offload

This should be set for devices which duplicate outgoing HSR (High-availability
Seamless Redundancy) or PRP (Parallel Redundancy Protocol) frames
automatically in hardware.