.. SPDX-License-Identifier: GPL-2.0

=============
DCCP protocol
=============


.. Contents
   - Introduction
   - Missing features
   - Socket options
   - Sysctl variables
   - IOCTLs
   - Other tunables
   - Notes


Introduction
============
Datagram Congestion Control Protocol (DCCP) is an unreliable, connection-oriented
protocol designed to solve issues present in UDP and TCP, particularly for
real-time and multimedia (streaming) traffic.
It consists of a base protocol (RFC 4340) and pluggable congestion control
modules called CCIDs. As with pluggable TCP congestion control, at least one CCID
needs to be enabled for the protocol to function properly. In the Linux
implementation, the mandatory CCID is the TCP-like CCID2 (RFC 4341). Additional
CCIDs, such as the TCP-friendly CCID3 (RFC 4342), are optional.
For a brief introduction to CCIDs and suggestions for choosing a CCID to match
a given application, see section 10 of RFC 4340.

DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol
is at http://www.ietf.org/html.charters/dccp-charter.html


Missing features
================
The Linux DCCP implementation does not currently support all the features that are
specified in RFCs 4340 to 4342.

Known bugs are listed at:

http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP

For more up-to-date versions of the DCCP implementation, please consider using
the experimental DCCP test tree; instructions for checking it out are at:
http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree


Socket options
==============
DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
a policy ID as argument and can only be set before the connection is established
(changes during an established connection are not supported). Currently, two
policies are defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which sends
packets in arrival (FIFO) order, and a priority-based variant
(DCCPQ_POLICY_PRIO). The latter allows passing a u32 priority value as ancillary
data to sendmsg(), where higher numbers indicate a higher packet priority
(similar to SO_PRIORITY). This ancillary data needs to be formatted using a
cmsg(3) message header filled in as follows::

    cmsg->cmsg_level = SOL_DCCP;
    cmsg->cmsg_type  = DCCP_SCM_PRIORITY;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(uint32_t));  /* or CMSG_LEN(4) */
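
A minimal sketch of the complete sequence in C, assuming a DCCP socket ``fd``
that is not yet connected. The helper names ``use_prio_qpolicy()``,
``attach_priority()`` and ``send_with_priority()`` are hypothetical, and the
numeric constants are fallback copies assumed to match <linux/dccp.h>; remove
them if that header is included::

    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Assumed values from <linux/dccp.h>; drop these if that header is used. */
    #ifndef SOL_DCCP
    #define SOL_DCCP 269
    #endif
    #define DCCP_SOCKOPT_QPOLICY_ID 16
    #define DCCPQ_POLICY_PRIO       1
    #define DCCP_SCM_PRIORITY       1

    /* Select the priority-based dequeuing policy; must precede connect(). */
    int use_prio_qpolicy(int fd)
    {
        int policy = DCCPQ_POLICY_PRIO;

        return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_QPOLICY_ID,
                          &policy, sizeof(policy));
    }

    /* Store a u32 priority as a DCCP_SCM_PRIORITY cmsg in the caller's
     * (cmsghdr-aligned) control buffer and hook that buffer into msg. */
    int attach_priority(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                        uint32_t prio)
    {
        struct cmsghdr *cmsg;

        if (ctrl_len < CMSG_SPACE(sizeof(uint32_t)))
            return -1;
        memset(ctrl, 0, ctrl_len);
        msg->msg_control = ctrl;
        msg->msg_controllen = CMSG_SPACE(sizeof(uint32_t));

        cmsg = CMSG_FIRSTHDR(msg);
        cmsg->cmsg_level = SOL_DCCP;
        cmsg->cmsg_type = DCCP_SCM_PRIORITY;
        cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
        memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));
        return 0;
    }

    /* Send one datagram with the given priority (higher = more important). */
    ssize_t send_with_priority(int fd, const void *data, size_t len,
                               uint32_t prio)
    {
        union {
            char buf[CMSG_SPACE(sizeof(uint32_t))];
            struct cmsghdr align;   /* forces cmsghdr alignment */
        } ctrl;
        struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

        if (attach_priority(&msg, ctrl.buf, sizeof(ctrl.buf), prio) < 0)
            return -1;
        return sendmsg(fd, &msg, 0);
    }

For example, after ``use_prio_qpolicy(fd)`` and connect(), a packet sent with
``send_with_priority(fd, buf, n, 10)`` is preferred by the "prio" policy over
packets sent with priority 1.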

DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A value
of zero always means an unbounded queue length. If different from zero, the
interpretation of this parameter depends on the current dequeuing policy (see
above): the "simple" policy enforces a fixed queue size by returning EAGAIN,
whereas the "prio" policy enforces a fixed queue length by dropping the
lowest-priority packet first. The default value for this parameter is
initialised from /proc/sys/net/dccp/default/tx_qlen.
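
The two bounded-queue behaviours can be illustrated with a small user-space
model. This is a hypothetical sketch written for illustration, not the kernel's
queueing code: ``struct model_queue`` and ``model_enqueue()`` are invented
names, and the fixed ``MODEL_CAP`` array limit exists only to keep the example
short::

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of the two policies; not the kernel implementation. */
    enum model_policy { MODEL_SIMPLE, MODEL_PRIO };

    #define MODEL_CAP 16            /* storage limit of this model only */

    struct model_queue {
        enum model_policy policy;
        size_t txqlen;              /* 0 means unbounded, as for TXQLEN */
        size_t len;                 /* packets currently queued */
        uint32_t prio[MODEL_CAP];   /* priority of each queued packet */
    };

    /* Queue a packet with priority prio. Returns 0, or -EAGAIN when the
     * "simple" policy finds the queue full. */
    int model_enqueue(struct model_queue *q, uint32_t prio)
    {
        size_t i, lowest;

        if (q->txqlen != 0 && q->len >= q->txqlen) {
            if (q->policy == MODEL_SIMPLE)
                return -EAGAIN;     /* caller must retry later */

            /* "prio": drop the lowest-priority queued packet first */
            lowest = 0;
            for (i = 1; i < q->len; i++)
                if (q->prio[i] < q->prio[lowest])
                    lowest = i;
            for (i = lowest; i + 1 < q->len; i++)
                q->prio[i] = q->prio[i + 1];
            q->len--;
        }
        if (q->len >= MODEL_CAP)
            return -ENOBUFS;        /* model storage exhausted */
        q->prio[q->len++] = prio;
        return 0;
    }

With ``txqlen = 2``, a third enqueue under the "simple" model fails with
``-EAGAIN``, whereas the "prio" model discards the lowest-priority queued packet
and accepts the new one.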

DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
the socket will fall back to 0 (which means that no meaningful service code
is present). On active sockets this is set before connect(); specifying more
than one code has no effect (all subsequent service codes are ignored). The
case is different for passive sockets, where multiple service codes (up to 32)
can be set before calling bind().
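
A minimal C sketch for registering service codes, assuming (as the kernel's
``__be32`` handling of this option suggests) that codes are passed as 32-bit
values in network byte order, with the first entry being the primary code. The
helper names are hypothetical and the constants are assumed copies of
<linux/dccp.h> values::

    #include <arpa/inet.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/socket.h>

    #ifndef SOL_DCCP
    #define SOL_DCCP 269
    #endif
    #define DCCP_SOCKOPT_SERVICE      2
    #define DCCP_SERVICE_LIST_MAX_LEN 32

    /* Convert host-order codes into the network-order array handed to
     * setsockopt(). Returns the option length in bytes, or 0 if the list
     * is empty or longer than 32 entries. */
    size_t pack_service_codes(uint32_t out[DCCP_SERVICE_LIST_MAX_LEN],
                              const uint32_t *codes, size_t n)
    {
        size_t i;

        if (n == 0 || n > DCCP_SERVICE_LIST_MAX_LEN)
            return 0;
        for (i = 0; i < n; i++)
            out[i] = htonl(codes[i]);
        return n * sizeof(uint32_t);
    }

    /* One code for an active socket, up to 32 for a passive one. */
    int set_service_codes(int fd, const uint32_t *codes, size_t n)
    {
        uint32_t list[DCCP_SERVICE_LIST_MAX_LEN];
        size_t len = pack_service_codes(list, codes, n);

        if (len == 0)
            return -1;
        return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE,
                          list, (socklen_t)len);
    }

A listening server would call ``set_service_codes(fd, codes, 2)`` before bind();
an active socket would pass a single code before connect().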

DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
size (application payload size) in bytes; see RFC 4340, section 14.

DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs
supported by the endpoint. The option value is an array of type uint8_t whose
size is passed as option length. The minimum array size is 4 elements; the
value returned in the optlen argument always reflects the true number of
built-in CCIDs.
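
Both read-only options can be queried with getsockopt(). The following sketch
assumes an ``int`` for the MPS value and a ``uint8_t`` array of at least 4
entries for the CCID list, as described above; the helper names and fallback
constants are illustrative::

    #include <stdint.h>
    #include <sys/socket.h>

    #ifndef SOL_DCCP
    #define SOL_DCCP 269
    #endif
    #define DCCP_SOCKOPT_GET_CUR_MPS     5
    #define DCCP_SOCKOPT_AVAILABLE_CCIDS 12

    /* Current maximum packet size in bytes, or -1 on error (errno set). */
    int get_cur_mps(int fd)
    {
        int mps;
        socklen_t len = sizeof(mps);

        if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_GET_CUR_MPS, &mps, &len) < 0)
            return -1;
        return mps;
    }

    /* Fill ccids[] (cap >= 4 entries) with the built-in CCIDs.
     * Returns the number of CCIDs reported in optlen, or -1 on error. */
    int get_available_ccids(int fd, uint8_t *ccids, socklen_t cap)
    {
        socklen_t len = cap;

        if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_AVAILABLE_CCIDS,
                       ccids, &len) < 0)
            return -1;
        return (int)len;
    }

For example, ``uint8_t ccids[4]; int n = get_available_ccids(fd, ccids, 4);``
leaves the built-in CCID numbers in ``ccids[0]`` to ``ccids[n - 1]``.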

DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same
time, combining the operation of the next two socket options. This option is
preferable to the latter two, since applications will often use the same
type of CCID for both directions, and mixed use of CCIDs is not currently well
understood. This socket option takes as argument at least one uint8_t value, or
an array of uint8_t values, which must match available CCIDs (see above). CCIDs
must be registered on the socket before calling connect() or listen().

DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
Please note that the getsockopt argument type here is ``int``, not uint8_t.

DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.
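
A sketch of registering a CCID list and reading back the result, assuming the
list is ordered from most to least preferred. ``set_ccid_prefs()`` and
``get_ccid()`` are hypothetical helpers; note the ``uint8_t`` array on the set
path and the ``int`` on the get path::

    #include <errno.h>
    #include <stdint.h>
    #include <sys/socket.h>

    #ifndef SOL_DCCP
    #define SOL_DCCP 269
    #endif
    #define DCCP_SOCKOPT_CCID    13
    #define DCCP_SOCKOPT_TX_CCID 14
    #define DCCP_SOCKOPT_RX_CCID 15

    /* Register a CCID preference list for both directions, e.g. {2} or
     * {3, 2}. Must be called before connect() or listen(). */
    int set_ccid_prefs(int fd, const uint8_t *prefs, socklen_t n)
    {
        if (n == 0) {
            errno = EINVAL;     /* at least one CCID is required */
            return -1;
        }
        return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_CCID, prefs, n);
    }

    /* CCID in use for one direction: optname is DCCP_SOCKOPT_TX_CCID or
     * DCCP_SOCKOPT_RX_CCID. The value comes back as an int. */
    int get_ccid(int fd, int optname)
    {
        int ccid;
        socklen_t len = sizeof(ccid);

        if (getsockopt(fd, SOL_DCCP, optname, &ccid, &len) < 0)
            return -1;
        return ccid;
    }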

DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
timewait state when closing the connection (RFC 4340, 8.3). The usual case is
that the closing server sends a CloseReq, whereupon the client holds timewait
state. When this boolean socket option is on, the server sends a Close instead
and will enter TIMEWAIT. This option must be set after accept() returns.
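
Because the option applies to the accepted socket, a server might wrap accept()
as in this sketch. ``accept_with_server_timewait()`` is a hypothetical helper,
and the boolean is assumed to be passed as an ``int``::

    #include <stddef.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #ifndef SOL_DCCP
    #define SOL_DCCP 269
    #endif
    #define DCCP_SOCKOPT_SERVER_TIMEWAIT 6

    /* Accept a connection and make this server side hold TIMEWAIT on close.
     * Returns the connected socket, or -1 on error. */
    int accept_with_server_timewait(int listen_fd)
    {
        int on = 1;
        int fd = accept(listen_fd, NULL, NULL);

        if (fd < 0)
            return -1;
        /* Set on the accepted socket, i.e. after accept() has returned. */
        if (setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVER_TIMEWAIT,
                       &on, sizeof(on)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }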

DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
partial checksum coverage (RFC 4340, sec. 9.2). By default, checksums always
cover the entire packet and only fully covered application data is accepted by
the receiver. Hence, when using this feature on the sender, it must also be
enabled at the receiver, with a suitable choice of CsCov.

DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
range 0..15 are acceptable. The default setting is 0 (full coverage);
values between 1..15 indicate partial coverage.

DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
sets a threshold, where again values 0..15 are acceptable. The default
of 0 means that all packets with a partial coverage will be discarded.
Values in the range 1..15 indicate that packets with at least this
coverage value are also acceptable. The higher the number, the more
restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage
settings are inherited by the child socket after accept().
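
A short sketch of both sides, assuming ``int`` option values.
``cscov_acceptable()`` is a hypothetical helper that restates the receiver rule
described above in code; it is not a kernel function::

    #include <errno.h>
    #include <stdbool.h>
    #include <sys/socket.h>

    #ifndef SOL_DCCP
    #define SOL_DCCP 269
    #endif
    #define DCCP_SOCKOPT_SEND_CSCOV 10
    #define DCCP_SOCKOPT_RECV_CSCOV 11

    /* Set the sender coverage or the receiver threshold (both 0..15). */
    int set_cscov(int fd, int optname, int cscov)
    {
        if (cscov < 0 || cscov > 15) {
            errno = EINVAL;
            return -1;
        }
        return setsockopt(fd, SOL_DCCP, optname, &cscov, sizeof(cscov));
    }

    /* Receiver rule: full coverage (0) is always accepted; partial
     * coverage only if the threshold is non-zero and CsCov reaches it. */
    bool cscov_acceptable(int pkt_cscov, int min_cscov)
    {
        if (pkt_cscov == 0)
            return true;
        return min_cscov != 0 && pkt_cscov >= min_cscov;
    }

For instance, a receiver with a threshold of 4 accepts fully covered packets and
packets with CsCov 4..15, and discards those with CsCov 1..3.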

The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.

DCCP_SOCKOPT_CCID_RX_INFO
    Returns a ``struct tfrc_rx_info`` in optval; both the optval buffer and
    optlen must be at least sizeof(struct tfrc_rx_info).

DCCP_SOCKOPT_CCID_TX_INFO
    Returns a ``struct tfrc_tx_info`` in optval; both the optval buffer and
    optlen must be at least sizeof(struct tfrc_tx_info).

On unidirectional connections it is useful to close the unused half-connection
via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs.
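
For example, an application that only sends could close its receiving half. In
this sketch ``close_unused_half()`` is a hypothetical helper, and which half
counts as unused depends on the application's role::

    #include <stdbool.h>
    #include <sys/socket.h>

    /* A pure sender never reads (SHUT_RD); a pure receiver never
     * writes (SHUT_WR). Returns 0 on success, -1 on error. */
    int close_unused_half(int fd, bool sender_only)
    {
        return shutdown(fd, sender_only ? SHUT_RD : SHUT_WR);
    }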


Sysctl variables
================
Several DCCP default parameters can be managed by the following sysctls
(sysctl net.dccp.default or /proc/sys/net/dccp/default):

request_retries
    The number of active connection initiation retries (the number of
    Requests minus one) before timing out. It also governs the behaviour
    of the other, passive side: this variable also sets the number of
    times DCCP repeats sending a Response when the initial handshake does
    not progress from RESPOND to OPEN (i.e. when no Ack is received after
    the initial Request). This value should be greater than 0; values
    below 10 are suggested. Analogue of tcp_syn_retries.

retries1
    How often a DCCP Response is retransmitted until the listening DCCP
    side considers its connecting peer dead. Analogue of tcp_retries1.

retries2
    The number of times a general DCCP packet is retransmitted. This is
    relevant for retransmitted acknowledgments and feature negotiation;
    data packets are never retransmitted. Analogue of tcp_retries2.

tx_ccid = 2
    Default CCID for the sender-receiver half-connection. Depending on the
    choice of CCID, the Send Ack Vector feature is enabled automatically.

rx_ccid = 2
    Default CCID for the receiver-sender half-connection; see tx_ccid.

seq_window = 100
    The initial sequence window (sec. 7.5.2) of the sender. This influences
    the local ackno validity and the remote seqno validity windows (7.5.1).
    Values in the range Wmin = 32 (RFC 4340, 7.5.2) up to 2^32-1 can be set.

tx_qlen = 5
    The size of the transmit buffer in packets. A value of 0 corresponds
    to an unbounded transmit buffer.

sync_ratelimit = 125 ms
    The timeout between subsequent DCCP-Sync packets sent in response to
    sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit
    of this parameter is milliseconds; a value of 0 disables rate-limiting.
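
Applications can read the current defaults directly from the /proc paths above.
A minimal sketch, assuming each file holds a single decimal integer;
``read_dccp_default()`` is a hypothetical helper that returns -1 when the file
is absent, for example when DCCP is not loaded::

    #include <stdio.h>

    /* Read /proc/sys/net/dccp/default/<name> into *value.
     * Returns 0 on success, -1 if the file is missing or unparsable. */
    int read_dccp_default(const char *name, long long *value)
    {
        char path[128];
        FILE *f;
        int ok;

        if (snprintf(path, sizeof(path), "/proc/sys/net/dccp/default/%s",
                     name) >= (int)sizeof(path))
            return -1;              /* name too long for the buffer */
        f = fopen(path, "r");
        if (f == NULL)
            return -1;
        ok = fscanf(f, "%lld", value) == 1;
        fclose(f);
        return ok ? 0 : -1;
    }

On a system with the defaults listed above, ``read_dccp_default("tx_qlen", &v)``
would store 5 in ``v``.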


IOCTLs
======
FIONREAD
    Works as in udp(7): returns in the ``int`` argument pointer the size of
    the next pending datagram in bytes, or 0 when no datagram is pending.

SIOCOUTQ
    Returns the number of unsent data bytes in the socket send queue as ``int``
    into the buffer specified by the argument pointer.
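
Both ioctls write their result into an ``int``. The following sketch uses
hypothetical helper names and takes SIOCOUTQ from <linux/sockios.h>; since UDP
implements the same two ioctls, it can also be tried on a UDP socket::

    #include <sys/ioctl.h>
    #include <linux/sockios.h>   /* SIOCINQ, SIOCOUTQ */

    /* Size in bytes of the next pending datagram (0 if none), or -1. */
    int next_datagram_size(int fd)
    {
        int n;

        if (ioctl(fd, FIONREAD, &n) < 0)
            return -1;
        return n;
    }

    /* Number of unsent bytes in the send queue, or -1 on error. */
    int unsent_bytes(int fd)
    {
        int n;

        if (ioctl(fd, SIOCOUTQ, &n) < 0)
            return -1;
        return n;
    }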


Other tunables
==============
Per-route rto_min support
    CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
    of the RTO timer. This setting can be modified via the 'rto_min' option
    of iproute2; for example::

        > ip route change 10.0.0.0/24 rto_min 250j dev wlan0
        > ip route add 10.0.0.254/32 rto_min 800j dev wlan0
        > ip route show dev wlan0

    CCID-3 also supports the rto_min setting: it is used to define the lower
    bound for the expiry of the nofeedback timer. This can be useful on LANs
    with very low RTTs (e.g., loopback, Gbit ethernet).


Notes
=====
DCCP does not currently traverse NAT successfully on many devices. This is
because the checksum covers the pseudo-header, as in TCP and UDP, so a NAT
device must also update the DCCP checksum when it rewrites addresses. Linux
NAT support for DCCP has been added.