.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset.  The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).
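
The device's side of this contract can be sketched as a small userspace model.
This is an illustration only, not kernel code: struct tx_pkt, oc_sum() and
model_device_fill() are hypothetical names, and offsets are taken from the
start of the packet buffer for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields used by TX checksum offload. */
struct tx_pkt {
	uint8_t *data;		/* start of the packet bytes */
	size_t len;		/* total packet length in bytes */
	size_t csum_start;	/* offset where summing begins */
	size_t csum_offset;	/* checksum field offset from csum_start */
};

/* Ones-complement sum of big-endian 16-bit words, folded to 16 bits. */
uint16_t oc_sum(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* zero-pad an odd tail */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Model of the device: sum csum_start..end, store the complement. */
uint16_t model_device_fill(struct tx_pkt *pkt)
{
	uint8_t *field = pkt->data + pkt->csum_start + pkt->csum_offset;
	uint16_t csum;

	assert(pkt->csum_start + pkt->csum_offset + 2 <= pkt->len);
	/* The current contents of the field are part of the sum. */
	csum = (uint16_t)~oc_sum(pkt->data + pkt->csum_start,
				 pkt->len - pkt->csum_start, 0);
	field[0] = csum >> 8;
	field[1] = csum & 0xff;
	return csum;
}
```

If the checksum field holds zero before the fill, summing the same span again
afterwards with oc_sum() yields 0xffff, which is how a receiver would
recognise a valid checksum.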

Because csum_offset cannot be negative, the previous value of the checksum
field is always included in the checksum computation; the field can therefore
be used to supply any needed corrections to the checksum (such as the sum of
the pseudo-header for UDP or TCP).
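
As a rough userspace illustration of this seeding, the sketch below writes the
folded pseudo-header sum into a UDP checksum field, lets a modelled device sum
over it, and then verifies the result as a receiver would.  The addresses,
ports and payload are made-up sample values, and oc_sum() and
seeded_offload_example() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words, folded to 16 bits. */
uint16_t oc_sum(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* zero-pad an odd tail */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Returns the checksum that ends up in the UDP header. */
uint16_t seeded_offload_example(void)
{
	/* IPv4 pseudo-header: src, dst, zero + protocol 17, UDP length 12 */
	const uint8_t pseudo[12] = {192, 0, 2, 1, 192, 0, 2, 2, 0, 17, 0, 12};
	/* UDP header (checksum field at bytes 6..7) plus 4 payload bytes */
	uint8_t udp[12] = {0x30, 0x39, 0x00, 0x35, 0x00, 0x0c, 0x00, 0x00,
			   'p', 'i', 'n', 'g'};
	uint16_t seed, csum;

	/* Stack: seed the field with the uncomplemented pseudo-header sum. */
	seed = oc_sum(pseudo, sizeof(pseudo), 0);
	udp[6] = seed >> 8;
	udp[7] = seed & 0xff;

	/* Device: sums csum_start..end, which includes the seed. */
	csum = (uint16_t)~oc_sum(udp, sizeof(udp), 0);
	udp[6] = csum >> 8;
	udp[7] = csum & 0xff;

	/* Receiver: pseudo-header plus datagram must sum to 0xffff. */
	assert(oc_sum(udp, sizeof(udp),
		      oc_sum(pseudo, sizeof(pseudo), 0)) == 0xffff);
	return csum;
}
```

The device never sees the pseudo-header itself; the seed carries its
contribution into the sum.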

This interface only allows a single checksum to be offloaded.  Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by filling in
skb->csum_start and skb->csum_offset as described above and setting
skb->csum_not_inet; see the skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software.  This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive.  It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more.  Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB.  If the hardware tries to deduce these itself
(as some NICs do), the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).
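
The driver-side check might look roughly like the following userspace model.
It assumes hypothetical fixed-function hardware that always checksums the
first TCP or UDP header its parser finds; struct tx_req and
hw_matches_request() are invented for this sketch and do not exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what a driver would read from the SKB. */
struct tx_req {
	size_t l4_hdr_off;	/* where the hardware parser finds the L4 header */
	uint8_t l4_proto;	/* 6 = TCP, 17 = UDP */
	size_t csum_start;	/* requested by the stack */
	size_t csum_offset;	/* requested by the stack */
};

/*
 * Return true if the hardware would checksum exactly the span the stack
 * asked for.  On false, the driver must checksum in software instead.
 */
bool hw_matches_request(const struct tx_req *req)
{
	size_t field_off;

	assert(req != NULL);
	switch (req->l4_proto) {
	case 6:
		field_off = 16;	/* checksum field offset in a TCP header */
		break;
	case 17:
		field_off = 6;	/* checksum field offset in a UDP header */
		break;
	default:
		return false;	/* hardware cannot deduce anything */
	}
	return req->csum_start == req->l4_hdr_off &&
	       req->csum_offset == field_off;
}
```

A plain TCP packet passes the check, while a request for an inner TCP checksum
behind an outer UDP tunnel header fails it, because the stack's csum_start
points past the header the hardware would pick.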

The stack should, for the most part, assume that checksum offload is supported
by the underlying device.  The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.  That
function checks the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) against netdev->features and, if
they are not supported or enabled on the device, performs the corresponding
offload in software.  In the case of TX Checksum Offload, that means calling
skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo-header, because everything else gets
'cancelled out' by the checksum field.  This is because the sum was
complemented before being written to the checksum field.

More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand.  This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).
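
The arithmetic can be checked with a small userspace model.  This is only a
sketch of the idea, not the kernel's lco_csum(): the packet layout is made up,
the outer pseudo-header is ignored, and oc_sum(), lco_outer_csum() and
lco_example() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words, folded to 16 bits. */
uint16_t oc_sum(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* zero-pad an odd tail */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Outer checksum from the headers alone; the payload is never read. */
uint16_t lco_outer_csum(const uint8_t *outer, size_t csum_start,
			size_t csum_offset)
{
	const uint8_t *f = outer + csum_start + csum_offset;
	uint16_t seed = (uint16_t)((f[0] << 8) | f[1]);

	assert((csum_start & 1) == 0);	/* keep 16-bit words aligned */
	/* Start from the complement of the inner seed, add our headers. */
	return (uint16_t)~oc_sum(outer, csum_start, (uint16_t)~seed);
}

/* Returns the sum over the whole packet once both fields are filled. */
uint16_t lco_example(void)
{
	uint8_t pkt[20] = {
		/* outer header, checksum field at bytes 6..7 (still zero) */
		0x12, 0xb5, 0x12, 0xb5, 0x00, 0x14, 0x00, 0x00,
		/* inner header, bytes 10..11 hold an arbitrary seed */
		0x00, 0x50, 0x1a, 0x2b, 0x00, 0x0c, 0x00, 0x00,
		/* payload */
		'd', 'a', 't', 'a',
	};
	const size_t csum_start = 8, csum_offset = 2;
	uint16_t outer, inner;

	/* Stack: fill the outer checksum before the inner one exists. */
	outer = lco_outer_csum(pkt, csum_start, csum_offset);
	pkt[6] = outer >> 8;
	pkt[7] = outer & 0xff;

	/* Later, the device (or software) fills in the inner checksum. */
	inner = (uint16_t)~oc_sum(pkt + csum_start,
				  sizeof(pkt) - csum_start, 0);
	pkt[10] = inner >> 8;
	pkt[11] = inner & 0xff;

	return oc_sum(pkt, sizeof(pkt), 0);
}
```

lco_example() returns 0xffff, meaning the outer checksum verifies over the
entire packet even though it was computed before the inner checksum was known.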

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum().  The IPv6
equivalents are handled in the same way, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded.  It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use.  For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.