.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and set the TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

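The call above ignores the return value. A minimal sketch with error
checking follows; ``ktls_attach_ulp()`` is a hypothetical helper name, and the
fallback values for ``SOL_TCP`` and ``TCP_ULP`` are assumptions for libc
headers that do not define them. On the kernels we are aware of, the call
typically fails with ``ENOENT`` when the tls module is not available and
with ``ENOTCONN`` when the socket is not yet connected, but treat those
error codes as assumptions rather than a guarantee.

.. code-block:: c

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>

  /* Fallbacks for older libc headers (assumed values from the Linux UAPI). */
  #ifndef SOL_TCP
  #define SOL_TCP 6
  #endif
  #ifndef TCP_ULP
  #define TCP_ULP 31
  #endif

  /* Hypothetical helper: attach the "tls" ULP to a TCP socket.
   * Returns 0 on success, or -1 with errno set by setsockopt(). */
  static int ktls_attach_ulp(int sock)
  {
          if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0) {
                  fprintf(stderr, "TCP_ULP: %s\n", strerror(errno));
                  return -1;
          }
          return 0;
  }

A caller would normally fall back to the userspace record layer when this
helper returns -1.
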
Setting the TLS ULP allows us to set and get TLS socket options. Currently
only symmetric encryption is handled in the kernel; the handshake stays in
userspace. After the TLS handshake is complete, we have all the parameters
required to move the data path to the kernel. There are separate socket
options for moving the transmit path and the receive path into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };


  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
         TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.

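Because the two directions differ only in the option name, the setup can be
wrapped in one function. The following is a sketch, not a definitive
implementation: ``ktls_install_aes_gcm_128()`` and its parameter names are
hypothetical, the ``SOL_TLS`` fallback value is an assumption for older libc
headers, and the caller is assumed to pass buffers of at least the sizes
named by the ``TLS_CIPHER_AES_GCM_128_*`` constants.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  /* Fallback for libc headers that lack SOL_TLS (assumed value). */
  #ifndef SOL_TLS
  #define SOL_TLS 282
  #endif

  /* Hypothetical helper: install TLS 1.2 AES-GCM-128 parameters for one
   * direction. 'direction' is TLS_TX or TLS_RX.
   * Returns 0 on success, or -1 with errno set by setsockopt(). */
  static int ktls_install_aes_gcm_128(int sock, int direction,
                                      const unsigned char *key,
                                      const unsigned char *salt,
                                      const unsigned char *iv,
                                      const unsigned char *rec_seq)
  {
          struct tls12_crypto_info_aes_gcm_128 ci;

          memset(&ci, 0, sizeof(ci));
          ci.info.version = TLS_1_2_VERSION;
          ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
          memcpy(ci.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
          memcpy(ci.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
          memcpy(ci.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
          memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

          return setsockopt(sock, SOL_TLS, direction, &ci, sizeof(ci));
  }

The helper would be called once with ``TLS_TX`` and the write keys, and once
with ``TLS_RX`` and the read keys.
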
Sending TLS application data
----------------------------

After setting the TLS_TX socket option, all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

Where possible, send() encrypts the data directly from the userspace buffer
provided into the encrypted kernel send buffer.

The sendfile() system call sends the file's data in TLS records of the maximum
length (2^14 bytes).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);

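As a rough illustration of what the 2^14 limit means for a given file size,
the record count can be estimated as below. ``tls_record_count()`` and
``TLS_MAX_PAYLOAD`` are hypothetical names, and the estimate assumes every
record except possibly the last one is filled to the maximum.

.. code-block:: c

  #include <stddef.h>

  /* Maximum TLS payload per record: 2^14 bytes. */
  #define TLS_MAX_PAYLOAD 16384u

  /* Hypothetical helper: number of records needed to carry 'len' bytes,
   * assuming all records but the last are full. Written as a division
   * plus remainder check so it cannot overflow for large 'len'. */
  static size_t tls_record_count(size_t len)
  {
          return len / TLS_MAX_PAYLOAD + (len % TLS_MAX_PAYLOAD != 0);
  }
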
TLS records are created and sent after each send() call, unless
MSG_MORE is passed.  MSG_MORE delays creation of a record until a call
is made without MSG_MORE, or until the maximum record size is reached.

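A minimal sketch of using MSG_MORE to place two buffers in the same record,
assuming their combined length stays below the maximum record size.
``ktls_send_two_parts()`` is a hypothetical helper; for brevity it does not
retry short writes.

.. code-block:: c

  #include <stddef.h>
  #include <sys/types.h>
  #include <sys/socket.h>

  /* Hypothetical helper: send a header and a body so that, size
   * permitting, both end up in a single TLS record.
   * Returns the number of bytes accepted, or -1 with errno set. */
  static ssize_t ktls_send_two_parts(int sock,
                                     const void *hdr, size_t hdr_len,
                                     const void *body, size_t body_len)
  {
          ssize_t n1, n2;

          /* MSG_MORE: do not close the record yet, more data follows. */
          n1 = send(sock, hdr, hdr_len, MSG_MORE);
          if (n1 < 0)
                  return -1;

          /* No MSG_MORE: the record is created and sent now. */
          n2 = send(sock, body, body_len, 0);
          if (n2 < 0)
                  return -1;

          return n1 + n2;
  }
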
The kernel needs to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, so that
either the entire send() call returns -ENOMEM (or blocks waiting
for memory), or the encryption always succeeds.  If send() returns
-ENOMEM and some data was left on the socket buffer from a previous
call using MSG_MORE, the MSG_MORE data is left on the socket buffer.

Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, all data returned by the recv
family of socket calls is decrypted using the TLS parameters provided.
A full TLS record must be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, sizeof(buffer), 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur.  If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.

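An application may want to turn these error codes into diagnostics. The
sketch below maps the three values listed above to short descriptions;
``ktls_recv_error_hint()`` is a hypothetical helper, and the same errno
values can of course also be produced by causes unrelated to TLS.

.. code-block:: c

  #include <errno.h>

  /* Hypothetical helper: describe an errno value seen after a failed
   * recv() on a socket with TLS_RX installed. */
  static const char *ktls_recv_error_hint(int err)
  {
          switch (err) {
          case EINVAL:
                  return "record version does not match configured version";
          case EMSGSIZE:
                  return "received record is too big";
          case EBADMSG:
                  return "record decryption failed";
          default:
                  return "other error";
          }
  }
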
Send TLS control messages
-------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
        struct msghdr msg = {0};
        int cmsg_len = sizeof(record_type);
        struct cmsghdr *cmsg;
        char buf[CMSG_SPACE(cmsg_len)];
        struct iovec msg_iov;   /* Vector of data to send/receive into.  */

        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_TLS;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(cmsg_len);
        *CMSG_DATA(cmsg) = record_type;
        msg.msg_controllen = cmsg->cmsg_len;

        msg_iov.iov_base = data;
        msg_iov.iov_len = length;
        msg.msg_iov = &msg_iov;
        msg.msg_iovlen = 1;

        return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.

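As an example of unencrypted control message data, a TLS alert consists of
two bytes, a level followed by a description. The sketch below builds a
``close_notify`` alert; the macro and function names are hypothetical, and
the numeric values are taken from the TLS 1.2 alert protocol.

.. code-block:: c

  #include <stddef.h>

  /* Values from the TLS alert protocol; names are illustrative only. */
  #define TLS_RECORD_TYPE_ALERT   21
  #define TLS_ALERT_LEVEL_WARNING 1
  #define TLS_ALERT_CLOSE_NOTIFY  0

  /* Hypothetical helper: fill 'out' with a close_notify alert body.
   * Returns the number of bytes written (always 2). */
  static size_t tls_build_close_notify(unsigned char out[2])
  {
          out[0] = TLS_ALERT_LEVEL_WARNING;
          out[1] = TLS_ALERT_CLOSE_NOTIFY;
          return 2;
  }

The resulting two bytes would be passed as the data, with record type 21,
to the control message function shown above.
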
Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg.  If no cmsg buffer is provided, an error is
returned if a control message is received.  Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = sizeof(buffer);

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  /* CMSG_FIRSTHDR() returns NULL when no control data was delivered. */
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may be == to application data (23).
  } else {
      // Buffer contains application data.
  }

recv() never returns data from TLS records of different types in a
single call.

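Once the record type has been extracted from the cmsg, the caller usually
dispatches on it. The sketch below names the record types mentioned in this
document (21, 22 and 23) plus ``change_cipher_spec`` (20) from the TLS
specification; ``tls_record_type_name()`` is a hypothetical helper.

.. code-block:: c

  /* Hypothetical helper: human-readable name for a TLS record type. */
  static const char *tls_record_type_name(unsigned char record_type)
  {
          switch (record_type) {
          case 20:
                  return "change_cipher_spec";
          case 21:
                  return "alert";
          case 22:
                  return "handshake";
          case 23:
                  return "application_data";
          default:
                  return "unknown";
          }
  }

A receive loop might hand handshake and alert records to the userspace TLS
library and pass application data straight to the application.
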
Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls.
Since it doesn't implement a full record layer, control
messages are not supported.

Optional optimizations
----------------------

There are certain condition-specific optimizations the TLS ULP can make,
if requested. Those optimizations are either not universally beneficial
or may impact correctness, hence they require an opt-in.
All options are set per-socket using setsockopt(), and their
state can be checked using getsockopt() and via socket diag (``ss``).

TLS_TX_ZEROCOPY_RO
~~~~~~~~~~~~~~~~~~

For device offload only. Allow sendfile() data to be transmitted directly
to the NIC without making an in-kernel copy. This allows true zero-copy
behavior when device offload is enabled.

The application must make sure that the data is not modified between being
submitted and transmission completing. In other words, this is mostly
applicable if the data sent on a socket via sendfile() is read-only.

Modifying the data may result in different versions of the data being used
for the original TCP transmission and TCP retransmissions. To the receiver
this will look like TLS records had been tampered with and will result
in record authentication failures.

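A minimal sketch of opting in, assuming the option takes an unsigned integer
flag (0 or 1) at the ``SOL_TLS`` level. ``ktls_set_tx_zerocopy_ro()`` is a
hypothetical helper, and the fallback macro values are assumptions for
headers that predate the option.

.. code-block:: c

  #include <sys/socket.h>
  #include <linux/tls.h>

  /* Fallbacks for older headers (assumed values). */
  #ifndef SOL_TLS
  #define SOL_TLS 282
  #endif
  #ifndef TLS_TX_ZEROCOPY_RO
  #define TLS_TX_ZEROCOPY_RO 3
  #endif

  /* Hypothetical helper: enable or disable zero-copy sendfile() on a
   * socket that already has TLS_TX installed.
   * Returns 0 on success, or -1 with errno set by setsockopt(). */
  static int ktls_set_tx_zerocopy_ro(int sock, int enable)
  {
          unsigned int val = enable ? 1 : 0;

          return setsockopt(sock, SOL_TLS, TLS_TX_ZEROCOPY_RO,
                            &val, sizeof(val));
  }
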
TLS_RX_EXPECT_NO_PAD
~~~~~~~~~~~~~~~~~~~~

TLS 1.3 only. Expect the sender to not pad records. This allows the data
to be decrypted directly into user space buffers with TLS 1.3.

This optimization is safe to enable only if the remote end is trusted;
otherwise it becomes an attack vector that can double the TLS processing
cost.

If the decrypted record turns out to have been padded, or is not a data
record, it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.

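The state of an option can be read back with getsockopt(). The sketch below
does so for ``TLS_RX_EXPECT_NO_PAD``, assuming the kernel reports the state as
an ``int``; ``ktls_get_rx_expect_no_pad()`` is a hypothetical helper and the
fallback macro values are assumptions for older headers.

.. code-block:: c

  #include <sys/socket.h>
  #include <linux/tls.h>

  /* Fallbacks for older headers (assumed values). */
  #ifndef SOL_TLS
  #define SOL_TLS 282
  #endif
  #ifndef TLS_RX_EXPECT_NO_PAD
  #define TLS_RX_EXPECT_NO_PAD 4
  #endif

  /* Hypothetical helper: query whether TLS_RX_EXPECT_NO_PAD is set.
   * Returns the option value (0 or 1), or -1 with errno set. */
  static int ktls_get_rx_expect_no_pad(int sock)
  {
          int val = 0;
          socklen_t len = sizeof(val);

          if (getsockopt(sock, SOL_TLS, TLS_RX_EXPECT_NO_PAD, &val, &len) < 0)
                  return -1;
          return val;
  }
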
Statistics
==========

The TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography

- ``TlsDecryptRetry`` -
  number of RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will
  also increment for non-data records.

- ``TlsRxNoPadViolation`` -
  number of data RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction.
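
The counters can be read programmatically. The sketch below assumes each
line of the file holds a counter name followed by whitespace and an unsigned
decimal value; ``tls_stat_parse_line()`` and ``tls_stat_lookup()`` are
hypothetical helpers, not an existing API.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical helper: split one "Name   value" line.
   * Returns 0 on success, -1 if the line does not match. */
  static int tls_stat_parse_line(const char *line, char name[64],
                                 unsigned long *value)
  {
          return sscanf(line, "%63s %lu", name, value) == 2 ? 0 : -1;
  }

  /* Hypothetical helper: look up a single counter in a tls_stat-style
   * file. Returns 0 and stores the value on success, -1 if the file
   * cannot be opened or the counter is not present. */
  static int tls_stat_lookup(const char *path, const char *counter,
                             unsigned long *value)
  {
          char line[128], name[64];
          unsigned long v;
          int found = -1;
          FILE *f = fopen(path, "r");

          if (!f)
                  return -1;
          while (fgets(line, sizeof(line), f)) {
                  if (tls_stat_parse_line(line, name, &v) == 0 &&
                      strcmp(name, counter) == 0) {
                          *value = v;
                          found = 0;
                          break;
                  }
          }
          fclose(f);
          return found;
  }

For example, ``tls_stat_lookup("/proc/net/tls_stat", "TlsDecryptError", &n)``
would fetch the decryption error count on a system where the file exists.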