0001 .. SPDX-License-Identifier: GPL-2.0
0002
0003 ======================
0004 RxRPC Network Protocol
0005 ======================
0006
0007 The RxRPC protocol driver provides a reliable two-phase transport on top of UDP
0008 that can be used to perform RxRPC remote operations. This is done over sockets
0009 of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and
0010 receive data, aborts and errors.
0011
0012 Contents of this document:
0013
0014 (#) Overview.
0015
0016 (#) RxRPC protocol summary.
0017
0018 (#) AF_RXRPC driver model.
0019
0020 (#) Control messages.
0021
0022 (#) Socket options.
0023
0024 (#) Security.
0025
0026 (#) Example client usage.
0027
0028 (#) Example server usage.
0029
0030 (#) AF_RXRPC kernel interface.
0031
0032 (#) Configurable parameters.
0033
0034
0035 Overview
0036 ========
0037
0038 RxRPC is a two-layer protocol. There is a session layer which provides
0039 reliable virtual connections using UDP over IPv4 (or IPv6) as the transport
0040 layer, but implements a real network protocol; and there's the presentation
0041 layer which renders structured data to binary blobs and back again using XDR
0042 (as does SunRPC)::
0043
		+-------------+
		| Application |
		+-------------+
		|     XDR     |		Presentation
		+-------------+
		|    RxRPC    |		Session
		+-------------+
		|     UDP     |		Transport
		+-------------+
0053
0054
0055 AF_RXRPC provides:
0056
0057 (1) Part of an RxRPC facility for both kernel and userspace applications by
0058 making the session part of it a Linux network protocol (AF_RXRPC).
0059
0060 (2) A two-phase protocol. The client transmits a blob (the request) and then
0061 receives a blob (the reply), and the server receives the request and then
0062 transmits the reply.
0063
0064 (3) Retention of the reusable bits of the transport system set up for one call
0065 to speed up subsequent calls.
0066
0067 (4) A secure protocol, using the Linux kernel's key retention facility to
0068 manage security on the client end. The server end must of necessity be
0069 more active in security negotiations.
0070
0071 AF_RXRPC does not provide XDR marshalling/presentation facilities. That is
0072 left to the application. AF_RXRPC only deals in blobs. Even the operation ID
0073 is just the first four bytes of the request blob, and as such is beyond the
0074 kernel's interest.
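
As an illustration of the point above, a request blob might be assembled as
sketched here. This assumes the operation ID is marshalled as an XDR unsigned
integer (four bytes, big-endian), as an XDR-based application such as AFS
would do. The helper names put_be32() and build_request() are hypothetical
and are not part of any AF_RXRPC interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in big-endian byte order, as XDR does. */
void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Build a request blob: a 4-byte operation ID followed by arguments that the
 * application has already marshalled.  Returns the blob length, or 0 if the
 * buffer is too small.
 */
size_t build_request(uint8_t *buf, size_t buflen, uint32_t op_id,
		     const void *args, size_t args_len)
{
	if (buflen < 4 || args_len > buflen - 4)
		return 0;
	put_be32(buf, op_id);
	if (args_len)
		memcpy(buf + 4, args, args_len);
	return 4 + args_len;
}
```

The kernel never inspects these bytes; the resulting buffer is simply handed
to sendmsg() as the request data.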
0075
0076
0077 Sockets of AF_RXRPC family are:
0078
0079 (1) created as type SOCK_DGRAM;
0080
0081 (2) provided with a protocol of the type of underlying transport they're going
0082 to use - currently only PF_INET is supported.
0083
0084
0085 The Andrew File System (AFS) is an example of an application that uses this and
0086 that has both kernel (filesystem) and userspace (utility) components.
0087
0088
0089 RxRPC Protocol Summary
0090 ======================
0091
0092 An overview of the RxRPC protocol:
0093
0094 (#) RxRPC sits on top of another networking protocol (UDP is the only option
0095 currently), and uses this to provide network transport. UDP ports, for
0096 example, provide transport endpoints.
0097
0098 (#) RxRPC supports multiple virtual "connections" from any given transport
0099 endpoint, thus allowing the endpoints to be shared, even to the same
0100 remote endpoint.
0101
0102 (#) Each connection goes to a particular "service". A connection may not go
0103 to multiple services. A service may be considered the RxRPC equivalent of
0104 a port number. AF_RXRPC permits multiple services to share an endpoint.
0105
0106 (#) Client-originating packets are marked, thus a transport endpoint can be
0107 shared between client and server connections (connections have a
0108 direction).
0109
0110 (#) Up to a billion connections may be supported concurrently between one
0111 local transport endpoint and one service on one remote endpoint. An RxRPC
0112 connection is described by seven numbers::
0113
	Local address	}
	Local port	} Transport (UDP) address
	Remote address	}
	Remote port	}
	Direction
	Connection ID
	Service ID
0121
0122 (#) Each RxRPC operation is a "call". A connection may make up to four
0123 billion calls, but only up to four calls may be in progress on a
0124 connection at any one time.
0125
0126 (#) Calls are two-phase and asymmetric: the client sends its request data,
0127 which the service receives; then the service transmits the reply data
0128 which the client receives.
0129
(#) The data blobs are of indefinite size; the end of a phase is marked with a
flag in the packet. However, the number of packets of data making up one
blob may not exceed 4 billion, as this would cause the sequence number to
wrap.
0134
0135 (#) The first four bytes of the request data are the service operation ID.
0136
0137 (#) Security is negotiated on a per-connection basis. The connection is
0138 initiated by the first data packet on it arriving. If security is
0139 requested, the server then issues a "challenge" and then the client
0140 replies with a "response". If the response is successful, the security is
0141 set for the lifetime of that connection, and all subsequent calls made
0142 upon it use that same security. In the event that the server lets a
0143 connection lapse before the client, the security will be renegotiated if
0144 the client uses the connection again.
0145
0146 (#) Calls use ACK packets to handle reliability. Data packets are also
0147 explicitly sequenced per call.
0148
0149 (#) There are two types of positive acknowledgment: hard-ACKs and soft-ACKs.
0150 A hard-ACK indicates to the far side that all the data received to a point
0151 has been received and processed; a soft-ACK indicates that the data has
0152 been received but may yet be discarded and re-requested. The sender may
0153 not discard any transmittable packets until they've been hard-ACK'd.
0154
0155 (#) Reception of a reply data packet implicitly hard-ACK's all the data
0156 packets that make up the request.
0157
(#) A call is complete when the request has been sent, the reply has been
received and the final hard-ACK on the last packet of the reply has
reached the server.
0161
(#) A call may be aborted by either end at any time up to its completion.
0163
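The hard-ACK rule in the summary above can be modelled with a small piece of
bookkeeping. This is only a sketch under stated assumptions: struct tx_state
and its helpers are hypothetical names, not the kernel's data structures, and
sequence numbers are assumed to start at 1 so that a hard_ack of 0 means
nothing has been acknowledged yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call transmit state; not the kernel's representation. */
struct tx_state {
	uint32_t hard_ack;	/* highest sequence number hard-ACK'd */
	uint32_t top;		/* highest sequence number queued for Tx */
};

/* A packet must be kept for retransmission until it has been hard-ACK'd. */
bool tx_must_retain(const struct tx_state *tx, uint32_t seq)
{
	return seq > tx->hard_ack && seq <= tx->top;
}

/* Advance the hard-ACK point; returns how many packets may now be freed. */
uint32_t tx_apply_hard_ack(struct tx_state *tx, uint32_t acked_up_to)
{
	uint32_t freed;

	if (acked_up_to <= tx->hard_ack || acked_up_to > tx->top)
		return 0;	/* stale or out-of-range ACK: ignore it */
	freed = acked_up_to - tx->hard_ack;
	tx->hard_ack = acked_up_to;
	return freed;
}

/* The first reply data packet implicitly hard-ACKs the whole request. */
uint32_t tx_reply_seen(struct tx_state *tx)
{
	return tx_apply_hard_ack(tx, tx->top);
}
```

Soft-ACKs deliberately do not appear in this model: a soft-ACK'd packet may
still be re-requested, so it stays in the retained range.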
0164
0165 AF_RXRPC Driver Model
0166 =====================
0167
0168 About the AF_RXRPC driver:
0169
0170 (#) The AF_RXRPC protocol transparently uses internal sockets of the transport
0171 protocol to represent transport endpoints.
0172
0173 (#) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC
0174 connections are handled transparently. One client socket may be used to
0175 make multiple simultaneous calls to the same service. One server socket
0176 may handle calls from many clients.
0177
0178 (#) Additional parallel client connections will be initiated to support extra
0179 concurrent calls, up to a tunable limit.
0180
0181 (#) Each connection is retained for a certain amount of time [tunable] after
0182 the last call currently using it has completed in case a new call is made
0183 that could reuse it.
0184
(#) Each internal UDP socket is retained for a certain amount of time
[tunable] after the last connection using it is discarded, in case a new
connection is made that could use it.
0188
0189 (#) A client-side connection is only shared between calls if they have
0190 the same key struct describing their security (and assuming the calls
0191 would otherwise share the connection). Non-secured calls would also be
0192 able to share connections with each other.
0193
0194 (#) A server-side connection is shared if the client says it is.
0195
0196 (#) ACK'ing is handled by the protocol driver automatically, including ping
0197 replying.
0198
0199 (#) SO_KEEPALIVE automatically pings the other side to keep the connection
0200 alive [TODO].
0201
0202 (#) If an ICMP error is received, all calls affected by that error will be
0203 aborted with an appropriate network error passed through recvmsg().
0204
0205
0206 Interaction with the user of the RxRPC socket:
0207
0208 (#) A socket is made into a server socket by binding an address with a
0209 non-zero service ID.
0210
0211 (#) In the client, sending a request is achieved with one or more sendmsgs,
0212 followed by the reply being received with one or more recvmsgs.
0213
0214 (#) The first sendmsg for a request to be sent from a client contains a tag to
0215 be used in all other sendmsgs or recvmsgs associated with that call. The
0216 tag is carried in the control data.
0217
0218 (#) connect() is used to supply a default destination address for a client
0219 socket. This may be overridden by supplying an alternate address to the
0220 first sendmsg() of a call (struct msghdr::msg_name).
0221
(#) If connect() is called on an unbound client, a random local port will be
bound before the operation takes place.
0224
0225 (#) A server socket may also be used to make client calls. To do this, the
0226 first sendmsg() of the call must specify the target address. The server's
0227 transport endpoint is used to send the packets.
0228
0229 (#) Once the application has received the last message associated with a call,
0230 the tag is guaranteed not to be seen again, and so it can be used to pin
0231 client resources. A new call can then be initiated with the same tag
0232 without fear of interference.
0233
(#) In the server, a request is received with one or more recvmsgs, then the
reply is transmitted with one or more sendmsgs, and then the final ACK
is received with a last recvmsg.
0237
0238 (#) When sending data for a call, sendmsg is given MSG_MORE if there's more
0239 data to come on that call.
0240
0241 (#) When receiving data for a call, recvmsg flags MSG_MORE if there's more
0242 data to come for that call.
0243
0244 (#) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg
0245 to indicate the terminal message for that call.
0246
0247 (#) A call may be aborted by adding an abort control message to the control
0248 data. Issuing an abort terminates the kernel's use of that call's tag.
0249 Any messages waiting in the receive queue for that call will be discarded.
0250
0251 (#) Aborts, busy notifications and challenge packets are delivered by recvmsg,
0252 and control data messages will be set to indicate the context. Receiving
0253 an abort or a busy message terminates the kernel's use of that call's tag.
0254
0255 (#) The control data part of the msghdr struct is used for a number of things:
0256
0257 (#) The tag of the intended or affected call.
0258
0259 (#) Sending or receiving errors, aborts and busy notifications.
0260
0261 (#) Notifications of incoming calls.
0262
0263 (#) Sending debug requests and receiving debug replies [TODO].
0264
(#) When the kernel has received and set up an incoming call, it sends a
message to the server application to let it know there's a new call awaiting
its acceptance [recvmsg reports a special control message]. The server
application then uses sendmsg to assign a tag to the new call. Once that
is done, the first part of the request data will be delivered by recvmsg.
0270
0271 (#) The server application has to provide the server socket with a keyring of
0272 secret keys corresponding to the security types it permits. When a secure
0273 connection is being set up, the kernel looks up the appropriate secret key
0274 in the keyring and then sends a challenge packet to the client and
0275 receives a response packet. The kernel then checks the authorisation of
0276 the packet and either aborts the connection or sets up the security.
0277
0278 (#) The name of the key a client will use to secure its communications is
0279 nominated by a socket option.
0280
0281
0282 Notes on sendmsg:
0283
0284 (#) MSG_WAITALL can be set to tell sendmsg to ignore signals if the peer is
0285 making progress at accepting packets within a reasonable time such that we
0286 manage to queue up all the data for transmission. This requires the
0287 client to accept at least one packet per 2*RTT time period.
0288
0289 If this isn't set, sendmsg() will return immediately, either returning
0290 EINTR/ERESTARTSYS if nothing was consumed or returning the amount of data
0291 consumed.
0292
0293
0294 Notes on recvmsg:
0295
(1) If there's a sequence of data messages belonging to a particular call on
0297 the receive queue, then recvmsg will keep working through them until:
0298
0299 (a) it meets the end of that call's received data,
0300
0301 (b) it meets a non-data message,
0302
0303 (c) it meets a message belonging to a different call, or
0304
0305 (d) it fills the user buffer.
0306
0307 If recvmsg is called in blocking mode, it will keep sleeping, awaiting the
0308 reception of further data, until one of the above four conditions is met.
0309
0310 (2) MSG_PEEK operates similarly, but will return immediately if it has put any
0311 data in the buffer rather than sleeping until it can fill the buffer.
0312
0313 (3) If a data message is only partially consumed in filling a user buffer,
0314 then the remainder of that message will be left on the front of the queue
0315 for the next taker. MSG_TRUNC will never be flagged.
0316
0317 (4) If there is more data to be had on a call (it hasn't copied the last byte
0318 of the last data message in that phase yet), then MSG_MORE will be
0319 flagged.
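
To make the recvmsg notes above concrete, the following sketch walks the
control data returned by one recvmsg() call and records the call tag, any
abort code, and the MSG_MORE/MSG_EOR flags. struct rx_event and rx_classify()
are hypothetical names. The control message types come from <linux/rxrpc.h>;
SOL_RXRPC is defined locally as 272 if the C library does not provide it,
which is an assumption about the build environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rxrpc.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272	/* value used by the kernel's linux/socket.h */
#endif

/* Hypothetical summary of what one recvmsg() call reported. */
struct rx_event {
	unsigned long	call_id;	/* from RXRPC_USER_CALL_ID */
	bool		has_call_id;
	bool		aborted;	/* RXRPC_ABORT was attached */
	uint32_t	abort_code;
	bool		terminal;	/* MSG_EOR: last message for the call */
	bool		more;		/* MSG_MORE: more data in this phase */
};

/* Summarise the flags and control messages of a msghdr filled by recvmsg(). */
void rx_classify(struct msghdr *msg, struct rx_event *ev)
{
	struct cmsghdr *cmsg;

	memset(ev, 0, sizeof(*ev));
	ev->terminal = !!(msg->msg_flags & MSG_EOR);
	ev->more = !!(msg->msg_flags & MSG_MORE);

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level != SOL_RXRPC)
			continue;
		switch (cmsg->cmsg_type) {
		case RXRPC_USER_CALL_ID:
			memcpy(&ev->call_id, CMSG_DATA(cmsg),
			       sizeof(ev->call_id));
			ev->has_call_id = true;
			break;
		case RXRPC_ABORT:
			memcpy(&ev->abort_code, CMSG_DATA(cmsg),
			       sizeof(ev->abort_code));
			ev->aborted = true;
			break;
		default:
			break;	/* other message types are ignored here */
		}
	}
}
```

A receive loop would typically call recvmsg(), pass the msghdr to a helper
like this, and release whatever it pinned with the tag once terminal is set.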
0320
0321
0322 Control Messages
0323 ================
0324
0325 AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex
0326 calls, to invoke certain actions and to report certain conditions. These are:
0327
======================= === =========== ===============================
MESSAGE ID              SRT DATA        MEANING
======================= === =========== ===============================
RXRPC_USER_CALL_ID      sr- User ID     App's call specifier
RXRPC_ABORT             srt Abort code  Abort code to issue/received
RXRPC_ACK               -rt n/a         Final ACK received
RXRPC_NET_ERROR         -rt error num   Network error on call
RXRPC_BUSY              -rt n/a         Call rejected (server busy)
RXRPC_LOCAL_ERROR       -rt error num   Local error encountered
RXRPC_NEW_CALL          -r- n/a         New call received
RXRPC_ACCEPT            s-- n/a         Accept new call
RXRPC_EXCLUSIVE_CALL    s-- n/a         Make an exclusive client call
RXRPC_UPGRADE_SERVICE   s-- n/a         Client call can be upgraded
RXRPC_TX_LENGTH         s-- data len    Total length of Tx data
======================= === =========== ===============================
0343
0344 (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message)
0345
0346 (#) RXRPC_USER_CALL_ID
0347
0348 This is used to indicate the application's call ID. It's an unsigned long
0349 that the app specifies in the client by attaching it to the first data
0350 message or in the server by passing it in association with an RXRPC_ACCEPT
0351 message. recvmsg() passes it in conjunction with all messages except
0352 those of the RXRPC_NEW_CALL message.
0353
0354 (#) RXRPC_ABORT
0355
This can be used by an application to abort a call by passing it to
0357 sendmsg, or it can be delivered by recvmsg to indicate a remote abort was
0358 received. Either way, it must be associated with an RXRPC_USER_CALL_ID to
0359 specify the call affected. If an abort is being sent, then error EBADSLT
0360 will be returned if there is no call with that user ID.
0361
0362 (#) RXRPC_ACK
0363
0364 This is delivered to a server application to indicate that the final ACK
0365 of a call was received from the client. It will be associated with an
0366 RXRPC_USER_CALL_ID to indicate the call that's now complete.
0367
0368 (#) RXRPC_NET_ERROR
0369
0370 This is delivered to an application to indicate that an ICMP error message
0371 was encountered in the process of trying to talk to the peer. An
0372 errno-class integer value will be included in the control message data
0373 indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
0374 affected.
0375
0376 (#) RXRPC_BUSY
0377
0378 This is delivered to a client application to indicate that a call was
0379 rejected by the server due to the server being busy. It will be
0380 associated with an RXRPC_USER_CALL_ID to indicate the rejected call.
0381
0382 (#) RXRPC_LOCAL_ERROR
0383
0384 This is delivered to an application to indicate that a local error was
0385 encountered and that a call has been aborted because of it. An
0386 errno-class integer value will be included in the control message data
0387 indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
0388 affected.
0389
0390 (#) RXRPC_NEW_CALL
0391
0392 This is delivered to indicate to a server application that a new call has
0393 arrived and is awaiting acceptance. No user ID is associated with this,
0394 as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT.
0395
0396 (#) RXRPC_ACCEPT
0397
0398 This is used by a server application to attempt to accept a call and
0399 assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID
0400 to indicate the user ID to be assigned. If there is no call to be
0401 accepted (it may have timed out, been aborted, etc.), then sendmsg will
0402 return error ENODATA. If the user ID is already in use by another call,
0403 then error EBADSLT will be returned.
0404
0405 (#) RXRPC_EXCLUSIVE_CALL
0406
0407 This is used to indicate that a client call should be made on a one-off
0408 connection. The connection is discarded once the call has terminated.
0409
0410 (#) RXRPC_UPGRADE_SERVICE
0411
0412 This is used to make a client call to probe if the specified service ID
0413 may be upgraded by the server. The caller must check msg_name returned to
0414 recvmsg() for the service ID actually in use. The operation probed must
0415 be one that takes the same arguments in both services.
0416
0417 Once this has been used to establish the upgrade capability (or lack
0418 thereof) of the server, the service ID returned should be used for all
0419 future communication to that server and RXRPC_UPGRADE_SERVICE should no
0420 longer be set.
0421
0422 (#) RXRPC_TX_LENGTH
0423
0424 This is used to inform the kernel of the total amount of data that is
0425 going to be transmitted by a call (whether in a client request or a
0426 service response). If given, it allows the kernel to encrypt from the
0427 userspace buffer directly to the packet buffers, rather than copying into
0428 the buffer and then encrypting in place. This may only be given with the
0429 first sendmsg() providing data for a call. EMSGSIZE will be generated if
0430 the amount of data actually given is different.
0431
0432 This takes a parameter of __s64 type that indicates how much will be
0433 transmitted. This may not be less than zero.
0434
0435 The symbol RXRPC__SUPPORTED is defined as one more than the highest control
0436 message type supported. At run time this can be queried by means of the
0437 RXRPC_SUPPORTED_CMSG socket option (see below).
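
As a sketch of how two of these control messages combine, the helper below
prepares a dataless message that would abort a call when passed to sendmsg().
rxrpc_prepare_abort() and ABORT_CTRL_SIZE are hypothetical names. The abort
code is taken to be a 32-bit value, and SOL_RXRPC is defined locally as 272 if
the C library lacks it; both are assumptions about the build environment
rather than statements from this document.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rxrpc.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272	/* value used by the kernel's linux/socket.h */
#endif

/* Room for the call ID plus one 32-bit abort code. */
#define ABORT_CTRL_SIZE \
	(CMSG_SPACE(sizeof(unsigned long)) + CMSG_SPACE(sizeof(uint32_t)))

/*
 * Fill in msg so that sendmsg(fd, msg, 0) would abort the call tagged with
 * call_id.  control must point to at least ABORT_CTRL_SIZE bytes of storage
 * aligned for struct cmsghdr.
 */
void rxrpc_prepare_abort(struct msghdr *msg, void *control,
			 unsigned long call_id, uint32_t abort_code)
{
	struct cmsghdr *cmsg;

	memset(msg, 0, sizeof(*msg));		/* no data, no address */
	memset(control, 0, ABORT_CTRL_SIZE);
	msg->msg_control = control;
	msg->msg_controllen = ABORT_CTRL_SIZE;

	/* Which call is affected. */
	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	/* The abort code to issue. */
	cmsg = CMSG_NXTHDR(msg, cmsg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_ABORT;
	cmsg->cmsg_len = CMSG_LEN(sizeof(abort_code));
	memcpy(CMSG_DATA(cmsg), &abort_code, sizeof(abort_code));
}
```

If sendmsg() then fails with EBADSLT, no call with that user ID exists, as
described under RXRPC_ABORT above.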
0438
0439
Socket Options
==============
0443
0444 AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
0445
0446 (#) RXRPC_SECURITY_KEY
0447
0448 This is used to specify the description of the key to be used. The key is
0449 extracted from the calling process's keyrings with request_key() and
0450 should be of "rxrpc" type.
0451
0452 The optval pointer points to the description string, and optlen indicates
0453 how long the string is, without the NUL terminator.
0454
0455 (#) RXRPC_SECURITY_KEYRING
0456
0457 Similar to above but specifies a keyring of server secret keys to use (key
0458 type "keyring"). See the "Security" section.
0459
0460 (#) RXRPC_EXCLUSIVE_CONNECTION
0461
0462 This is used to request that new connections should be used for each call
0463 made subsequently on this socket. optval should be NULL and optlen 0.
0464
0465 (#) RXRPC_MIN_SECURITY_LEVEL
0466
0467 This is used to specify the minimum security level required for calls on
0468 this socket. optval must point to an int containing one of the following
0469 values:
0470
0471 (a) RXRPC_SECURITY_PLAIN
0472
0473 Encrypted checksum only.
0474
0475 (b) RXRPC_SECURITY_AUTH
0476
0477 Encrypted checksum plus packet padded and first eight bytes of packet
0478 encrypted - which includes the actual packet length.
0479
0480 (c) RXRPC_SECURITY_ENCRYPT
0481
0482 Encrypted checksum plus entire packet padded and encrypted, including
0483 actual packet length.
0484
0485 (#) RXRPC_UPGRADEABLE_SERVICE
0486
0487 This is used to indicate that a service socket with two bindings may
0488 upgrade one bound service to the other if requested by the client. optval
0489 must point to an array of two unsigned short ints. The first is the
0490 service ID to upgrade from and the second the service ID to upgrade to.
0491
0492 (#) RXRPC_SUPPORTED_CMSG
0493
0494 This is a read-only option that writes an int into the buffer indicating
0495 the highest control message type supported.
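
The options above might be wrapped as in the following sketch. The function
names are hypothetical. SOL_RXRPC is defined locally as 272 if the C library
does not provide it, which is an assumption about the build environment, and
error handling is left to the caller through the usual -1/errno convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <linux/rxrpc.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272	/* value used by the kernel's linux/socket.h */
#endif

/* Returns the highest supported control message type, or -1 with errno set. */
int rxrpc_highest_cmsg(int fd)
{
	int highest = 0;
	socklen_t len = sizeof(highest);

	if (getsockopt(fd, SOL_RXRPC, RXRPC_SUPPORTED_CMSG,
		       &highest, &len) < 0)
		return -1;
	return highest;
}

/* True if the running kernel reports support for the given message type. */
bool rxrpc_cmsg_supported(int fd, int cmsg_type)
{
	int highest = rxrpc_highest_cmsg(fd);

	return highest >= 0 && cmsg_type <= highest;
}

/* Require at least the given security level for calls made on fd. */
int rxrpc_require_security(int fd, int level)
{
	return setsockopt(fd, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
			  &level, sizeof(level));
}
```

An application could, for example, check rxrpc_cmsg_supported(fd,
RXRPC_TX_LENGTH) once after creating the socket and only attach that control
message when the check succeeds.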
0496
0497
Security
========
0501
Currently, only the Kerberos 4 equivalent protocol has been implemented
(security index 2 - rxkad). This requires the rxkad module to be loaded and,
on the client, tickets of the appropriate type to be obtained from the AFS
kaserver or the Kerberos server and installed as "rxrpc" type keys. This is
normally done using the klog program. An example simple klog program can be
found at:
0508
0509 http://people.redhat.com/~dhowells/rxrpc/klog.c
0510
0511 The payload provided to add_key() on the client should be of the following
0512 form::
0513
	struct rxrpc_key_sec2_v1 {
		uint16_t	security_index;	/* 2 */
		uint16_t	ticket_length;	/* length of ticket[] */
		uint32_t	expiry;		/* time at which expires */
		uint8_t		kvno;		/* key version number */
		uint8_t		__pad[3];
		uint8_t		session_key[8];	/* DES session key */
		uint8_t		ticket[0];	/* the encrypted ticket */
	};
0523
0524 Where the ticket blob is just appended to the above structure.
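
Assembling such a payload could look like the sketch below. The structure is
repeated so that the example is self-contained, using a C99 flexible array
member in place of ticket[0]. rxkad_build_payload() is a hypothetical helper,
and the fields are stored in host byte order, which is an assumption. The
returned buffer and length are what would be handed to add_key() for an
"rxrpc" type key.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct rxrpc_key_sec2_v1 {
	uint16_t	security_index;	/* 2 */
	uint16_t	ticket_length;	/* length of ticket[] */
	uint32_t	expiry;		/* time at which expires */
	uint8_t		kvno;		/* key version number */
	uint8_t		__pad[3];
	uint8_t		session_key[8];	/* DES session key */
	uint8_t		ticket[];	/* the encrypted ticket */
};

/*
 * Allocate a payload with the ticket appended to the fixed header.
 * Returns NULL on allocation failure; the caller frees the result.
 */
struct rxrpc_key_sec2_v1 *
rxkad_build_payload(const uint8_t session_key[8], uint8_t kvno,
		    uint32_t expiry, const void *ticket, uint16_t ticket_len,
		    size_t *payload_len)
{
	size_t len = sizeof(struct rxrpc_key_sec2_v1) + ticket_len;
	struct rxrpc_key_sec2_v1 *p = calloc(1, len);

	if (!p)
		return NULL;
	p->security_index = 2;		/* rxkad */
	p->ticket_length = ticket_len;
	p->expiry = expiry;
	p->kvno = kvno;
	memcpy(p->session_key, session_key, sizeof(p->session_key));
	if (ticket_len)
		memcpy(p->ticket, ticket, ticket_len);
	*payload_len = len;
	return p;
}
```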
0525
0526
For the server, keys of type "rxrpc_s" must be made available to the server.
They have a description of "<serviceID>:<securityIndex>" (e.g. "52:2" for an
rxkad key for the AFS VL service). When such a key is created, it should be
given the server's secret key as the instantiation data (see the example
below)::

	add_key("rxrpc_s", "52:2", secret_key, 8, keyring);
0534
0535 A keyring is passed to the server socket by naming it in a sockopt. The server
0536 socket then looks the server secret keys up in this keyring when secure
0537 incoming connections are made. This can be seen in an example program that can
0538 be found at:
0539
0540 http://people.redhat.com/~dhowells/rxrpc/listen.c
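
A server that handles several services may prefer to generate the
"<serviceID>:<securityIndex>" description rather than hard-code it. The
helper below is a minimal sketch of that; rxrpc_server_key_desc() is a
hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<serviceID>:<securityIndex>" into buf.
 * Returns 0 on success or -1 if buf is too small.
 */
int rxrpc_server_key_desc(char *buf, size_t buflen,
			  unsigned int service_id, unsigned int security_index)
{
	int n = snprintf(buf, buflen, "%u:%u", service_id, security_index);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For the AFS VL service with rxkad this yields "52:2", matching the
description used in the add_key() example above.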
0541
0542
Example Client Usage
====================
0546
0547 A client would issue an operation by:
0548
0549 (1) An RxRPC socket is set up by::
0550
	client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
0552
0553 Where the third parameter indicates the protocol family of the transport
0554 socket used - usually IPv4 but it can also be IPv6 [TODO].
0555
0556 (2) A local address can optionally be bound::
0557
	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= 0,  /* we're a client */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(struct sockaddr_in), /* size of transport address */
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7000), /* AFS callback */
		.transport.sin.sin_addr.s_addr	= htonl(INADDR_ANY), /* all local interfaces */
	};
	bind(client, (struct sockaddr *)&srx, sizeof(srx));
0567
0568 This specifies the local UDP port to be used. If not given, a random
0569 non-privileged port will be used. A UDP port may be shared between
0570 several unrelated RxRPC sockets. Security is handled on a basis of
0571 per-RxRPC virtual connection.
0572
0573 (3) The security is set::
0574
	const char *key = "AFS:cambridge.redhat.com";
	setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));
0577
0578 This issues a request_key() to get the key representing the security
0579 context. The minimum security level can be set::
0580
	unsigned int sec = RXRPC_SECURITY_ENCRYPT;
	setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
		   &sec, sizeof(sec));
0584
0585 (4) The server to be contacted can then be specified (alternatively this can
0586 be done through sendmsg)::
0587
	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID,
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(struct sockaddr_in), /* size of transport address */
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7005), /* AFS volume manager */
		.transport.sin.sin_addr		= ...,
	};
	connect(client, (struct sockaddr *)&srx, sizeof(srx));
0597
(5) The request data should then be posted to the client socket using a series
of sendmsg() calls, each with the following control message attached:
0600
0601 ================== ===================================
0602 RXRPC_USER_CALL_ID specifies the user ID for this call
0603 ================== ===================================
0604
MSG_MORE should be passed in the flags argument of sendmsg() on all but the
last part of the request. Multiple requests may be made simultaneously.
0607
0608 An RXRPC_TX_LENGTH control message can also be specified on the first
0609 sendmsg() call.
0610
0611 If a call is intended to go to a destination other than the default
0612 specified through connect(), then msghdr::msg_name should be set on the
0613 first request message of that call.
0614
(6) The reply data will then be posted to the client socket for recvmsg() to
pick up. MSG_MORE will be flagged by recvmsg() if there's more reply data
for a particular call to be read. MSG_EOR will be set on the terminal
read for a call.

All data will be delivered with the following control message attached:

================== ===================================
RXRPC_USER_CALL_ID specifies the user ID for this call
================== ===================================
0623
0624 If an abort or error occurred, this will be returned in the control data
0625 buffer instead, and MSG_EOR will be flagged to indicate the end of that
0626 call.
0627
0628 A client may ask for a service ID it knows and ask that this be upgraded to a
0629 better service if one is available by supplying RXRPC_UPGRADE_SERVICE on the
0630 first sendmsg() of a call. The client should then check srx_service in the
0631 msg_name filled in by recvmsg() when collecting the result. srx_service will
0632 hold the same value as given to sendmsg() if the upgrade request was ignored by
0633 the service - otherwise it will be altered to indicate the service ID the
0634 server upgraded to. Note that the upgraded service ID is chosen by the server.
0635 The caller has to wait until it sees the service ID in the reply before sending
0636 any more calls (further calls to the same destination will be blocked until the
0637 probe is concluded).
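
The request-sending step of the client procedure can be sketched as follows.
struct rxrpc_tx and its two helpers are hypothetical names. The socket is
assumed to have been connect()ed already, so msg_name is left unset, and
SOL_RXRPC is defined locally as 272 if the C library does not provide it,
which is an assumption about the build environment.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <linux/rxrpc.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272	/* value used by the kernel's linux/socket.h */
#endif

/* Hypothetical holder for one sendmsg() worth of request data. */
struct rxrpc_tx {
	struct msghdr	msg;
	struct iovec	iov;
	union {
		struct cmsghdr	align;	/* forces cmsghdr alignment */
		char		buf[CMSG_SPACE(sizeof(unsigned long))];
	} ctrl;
};

/* Point tx at one chunk of request data and tag it with call_id. */
void rxrpc_tx_prepare(struct rxrpc_tx *tx, unsigned long call_id,
		      const void *data, size_t len)
{
	struct cmsghdr *cmsg;

	memset(tx, 0, sizeof(*tx));
	tx->iov.iov_base = (void *)data;	/* sendmsg() only reads it */
	tx->iov.iov_len = len;
	tx->msg.msg_iov = &tx->iov;
	tx->msg.msg_iovlen = 1;
	tx->msg.msg_control = tx->ctrl.buf;
	tx->msg.msg_controllen = sizeof(tx->ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&tx->msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));
}

/* Send the chunk; every chunk but the last of a request carries MSG_MORE. */
ssize_t rxrpc_tx_send(int fd, struct rxrpc_tx *tx, int last)
{
	return sendmsg(fd, &tx->msg, last ? 0 : MSG_MORE);
}
```

Because the msghdr points into the structure that contains it, a struct
rxrpc_tx should be prepared in place rather than copied after preparation.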
0638
0639
0640 Example Server Usage
0641 ====================
0642
0643 A server would be set up to accept operations in the following manner:
0644
0645 (1) An RxRPC socket is created by::
0646
	server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
0648
0649 Where the third parameter indicates the address type of the transport
0650 socket used - usually IPv4.
0651
0652 (2) Security is set up if desired by giving the socket a keyring with server
0653 secret keys in it::
0654
	keyring = add_key("keyring", "AFSkeys", NULL, 0,
			  KEY_SPEC_PROCESS_KEYRING);

	/* unsigned char: several of these byte values exceed 0x7f */
	const unsigned char secret_key[8] = {
		0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 };
	add_key("rxrpc_s", "52:2", secret_key, 8, keyring);

	setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7);
0663
0664 The keyring can be manipulated after it has been given to the socket. This
0665 permits the server to add more keys, replace keys, etc. while it is live.
0666
0667 (3) A local address must then be bound::
0668
	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID, /* RxRPC service ID */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(struct sockaddr_in), /* size of transport address */
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7000), /* AFS callback */
		.transport.sin.sin_addr.s_addr	= htonl(INADDR_ANY), /* all local interfaces */
	};
	bind(server, (struct sockaddr *)&srx, sizeof(srx));
0678
0679 More than one service ID may be bound to a socket, provided the transport
0680 parameters are the same. The limit is currently two. To do this, bind()
0681 should be called twice.
0682
0683 (4) If service upgrading is required, first two service IDs must have been
0684 bound and then the following option must be set::
0685
	unsigned short service_ids[2] = { from_ID, to_ID };
	setsockopt(server, SOL_RXRPC, RXRPC_UPGRADEABLE_SERVICE,
		   service_ids, sizeof(service_ids));
0689
0690 This will automatically upgrade connections on service from_ID to service
0691 to_ID if they request it. This will be reflected in msg_name obtained
0692 through recvmsg() when the request data is delivered to userspace.
0693
0694 (5) The server is then set to listen out for incoming calls::
0695
	listen(server, 100);
0697
0698 (6) The kernel notifies the server of pending incoming connections by sending
0699 it a message for each. This is received with recvmsg() on the server
0700 socket. It has no data, and has a single dataless control message
0701 attached::
0702
	RXRPC_NEW_CALL
0704
0705 The address that can be passed back by recvmsg() at this point should be
0706 ignored since the call for which the message was posted may have gone by
0707 the time it is accepted - in which case the first call still on the queue
0708 will be accepted.
0709
0710 (7) The server then accepts the new call by issuing a sendmsg() with two
0711 pieces of control data and no actual data:
0712
================== ==============================
RXRPC_ACCEPT       indicate connection acceptance
RXRPC_USER_CALL_ID specify user ID for this call
================== ==============================
0717
0718 (8) The first request data packet will then be posted to the server socket for
0719 recvmsg() to pick up. At that point, the RxRPC address for the call can
0720 be read from the address fields in the msghdr struct.
0721
0722 Subsequent request data will be posted to the server socket for recvmsg()
0723 to collect as it arrives. All but the last piece of the request data will
0724 be delivered with MSG_MORE flagged.
0725
0726 All data will be delivered with the following control message attached:
0727
0728
0729 ================== ===================================
0730 RXRPC_USER_CALL_ID specifies the user ID for this call
0731 ================== ===================================
0732
0733 (9) The reply data should then be posted to the server socket using a series
0734 of sendmsg() calls, each with the following control messages attached:
0735
0736 ================== ===================================
0737 RXRPC_USER_CALL_ID specifies the user ID for this call
0738 ================== ===================================
0739
MSG_MORE should be passed in the flags argument of sendmsg() on all but the
last message for a particular call.
0742
0743 (10) The final ACK from the client will be posted for retrieval by recvmsg()
0744 when it is received. It will take the form of a dataless message with two
0745 control messages attached:
0746
================== ===================================
RXRPC_USER_CALL_ID specifies the user ID for this call
RXRPC_ACK          indicates final ACK (no data)
================== ===================================
0751
0752 MSG_EOR will be flagged to indicate that this is the final message for
0753 this call.
0754
0755 (11) Up to the point the final packet of reply data is sent, the call can be
0756 aborted by calling sendmsg() with a dataless message with the following
0757 control messages attached:
0758
================== ===================================
RXRPC_USER_CALL_ID specifies the user ID for this call
RXRPC_ABORT        indicates abort code (4 byte data)
================== ===================================
0763
0764 Any packets waiting in the socket's receive queue will be discarded if
0765 this is issued.
0766
0767 Note that all the communications for a particular service take place through
0768 the one server socket, using control messages on sendmsg() and recvmsg() to
0769 determine the call affected.
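
As a minimal userspace sketch of the abort in step (11), the control data
might be assembled as shown below. build_abort_msg() and the SKETCH_*
constants are illustrative names, not part of any API; their numeric values are
assumed to mirror SOL_RXRPC, RXRPC_USER_CALL_ID and RXRPC_ABORT, and real code
should take those from <linux/rxrpc.h> instead. The sketch only fills in the
msghdr for a dataless message; it does not open a socket or call sendmsg():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed values; real code should use the definitions in <linux/rxrpc.h>. */
#define SKETCH_SOL_RXRPC          272
#define SKETCH_RXRPC_USER_CALL_ID 1
#define SKETCH_RXRPC_ABORT        2

/* Room for two control messages: the call ID and the 4-byte abort code. */
#define SKETCH_ABORT_CTRL_SPACE \
	(CMSG_SPACE(sizeof(unsigned long)) + CMSG_SPACE(sizeof(uint32_t)))

/* Fill in a dataless msghdr carrying the call ID and the abort code. */
static void build_abort_msg(struct msghdr *msg, void *control,
			    size_t control_len, unsigned long call_id,
			    uint32_t abort_code)
{
	struct cmsghdr *cmsg;

	assert(control_len >= SKETCH_ABORT_CTRL_SPACE);

	/* Zero everything so that CMSG_NXTHDR() sees no stale lengths. */
	memset(msg, 0, sizeof(*msg));
	memset(control, 0, control_len);
	msg->msg_control = control;
	msg->msg_controllen = SKETCH_ABORT_CTRL_SPACE;

	/* First control message: which call is affected. */
	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SKETCH_SOL_RXRPC;
	cmsg->cmsg_type = SKETCH_RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	/* Second control message: the abort code (4 bytes of data). */
	cmsg = CMSG_NXTHDR(msg, cmsg);
	cmsg->cmsg_level = SKETCH_SOL_RXRPC;
	cmsg->cmsg_type = SKETCH_RXRPC_ABORT;
	cmsg->cmsg_len = CMSG_LEN(sizeof(abort_code));
	memcpy(CMSG_DATA(cmsg), &abort_code, sizeof(abort_code));
}
```

The resulting msghdr has no iovec, matching the "dataless message" described
above, and would be handed to sendmsg() on the server socket.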
0770
0771
0772 AF_RXRPC Kernel Interface
0773 =========================
0774
0775 The AF_RXRPC module also provides an interface for use by in-kernel utilities
0776 such as the AFS filesystem. This permits such a utility to:
0777
0778 (1) Use different keys directly on individual client calls on one socket
0779 rather than having to open a whole slew of sockets, one for each key it
0780 might want to use.
0781
0782 (2) Avoid having RxRPC call request_key() at the point of issue of a call or
0783 opening of a socket. Instead the utility is responsible for requesting a
0784 key at the appropriate point. AFS, for instance, would do this during VFS
0785 operations such as open() or unlink(). The key is then handed through
0786 when the call is initiated.
0787
0788 (3) Request the use of something other than GFP_KERNEL to allocate memory.
0789
0790 (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be
0791 intercepted before they get put into the socket Rx queue and the socket
0792 buffers manipulated directly.
0793
0794 To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
0795 bind an address as appropriate and listen if it's to be a server socket, but
0796 then it passes this to the kernel interface functions.
0797
0798 The kernel interface functions are as follows:
0799
0800 (#) Begin a new client call::
0801
0802 struct rxrpc_call *
0803 rxrpc_kernel_begin_call(struct socket *sock,
0804 struct sockaddr_rxrpc *srx,
0805 struct key *key,
0806 unsigned long user_call_ID,
0807 s64 tx_total_len,
0808 gfp_t gfp,
0809 rxrpc_notify_rx_t notify_rx,
0810 bool upgrade,
0811 bool intr,
0812 unsigned int debug_id);
0813
0814 This allocates the infrastructure to make a new RxRPC call and assigns
0815 call and connection numbers. The call will be made on the UDP port that
0816 the socket is bound to. The call will go to the destination address of a
0817 connected client socket unless an alternative is supplied (srx is
0818 non-NULL).
0819
0820 If a key is supplied then this will be used to secure the call instead of
0821 the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls
0822 secured in this way will still share connections if at all possible.
0823
0824 The user_call_ID is equivalent to that supplied to sendmsg() in the
0825 control data buffer. It is entirely feasible to use this to point to a
0826 kernel data structure.
0827
0828 tx_total_len is the amount of data the caller is intending to transmit
0829 with this call (or -1 if unknown at this point). Setting the data size
0830 allows the kernel to encrypt directly to the packet buffers, thereby
0831 saving a copy. The value may not be less than -1.
0832
0833 notify_rx is a pointer to a function to be called when events such as
0834 incoming data packets or remote aborts happen.
0835
0836 upgrade should be set to true if a client operation should request that
0837 the server upgrade the service to a better one. The resultant service ID
0838 is returned by rxrpc_kernel_recv_data().
0839
0840 intr should be set to true if the call should be interruptible. If this
0841 is not set, this function may not return until a channel has been
0842 allocated; if it is set, the function may return -ERESTARTSYS.
0843
0844 debug_id is the call debugging ID to be used for tracing. This can be
0845 obtained by atomically incrementing rxrpc_debug_id.
0846
0847 If this function is successful, an opaque reference to the RxRPC call is
0848 returned. The caller now holds a reference on this and it must be
0849 properly ended.
0850
0851 (#) End a client call::
0852
0853 void rxrpc_kernel_end_call(struct socket *sock,
0854 struct rxrpc_call *call);
0855
0856 This is used to end a previously begun call. The user_call_ID is expunged
0857 from AF_RXRPC's knowledge and will not be seen again in association with
0858 the specified call.
0859
0860 (#) Send data through a call::
0861
0862 typedef void (*rxrpc_notify_end_tx_t)(struct sock *sk,
0863 unsigned long user_call_ID,
0864 struct sk_buff *skb);
0865
0866 int rxrpc_kernel_send_data(struct socket *sock,
0867 struct rxrpc_call *call,
0868 struct msghdr *msg,
0869 size_t len,
0870 rxrpc_notify_end_tx_t notify_end_tx);
0871
0872 This is used to supply either the request part of a client call or the
0873 reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the
0874 data buffers to be used. msg_iov may not be NULL and must point
0875 exclusively to in-kernel virtual addresses. msg.msg_flags may be given
0876 MSG_MORE if there will be subsequent data sends for this call.
0877
0878 The msg must not specify a destination address, control data or any flags
0879 other than MSG_MORE. len is the total amount of data to transmit.
0880
0881 notify_end_tx can be NULL or it can be used to specify a function to be
0882 called when the call changes state to end the Tx phase. This function is
0883 called with the call-state spinlock held to prevent any reply or final ACK
0884 from being delivered first.
0885
0886 (#) Receive data from a call::
0887
0888 int rxrpc_kernel_recv_data(struct socket *sock,
0889 struct rxrpc_call *call,
0890 void *buf,
0891 size_t size,
0892 size_t *_offset,
0893 bool want_more,
0894 u32 *_abort,
0895 u16 *_service);
0896
0897 This is used to receive data from either the reply part of a client call
0898 or the request part of a service call. buf and size specify how much
0899 data is desired and where to store it. ``*_offset`` is added on to buf and
0900 subtracted from size internally; the amount copied into the buffer is
0901 added to ``*_offset`` before returning.
0902
0903 want_more should be true if further data will be required after this is
0904 satisfied and false if this is the last item of the receive phase.
0905
0906 There are three normal returns: 0 if the buffer was filled and want_more
0907 was true; 1 if the buffer was filled, the last DATA packet has been
0908 emptied and want_more was false; and -EAGAIN if the function needs to be
0909 called again.
0910
0911 If the last DATA packet is processed but the buffer contains less than
0912 the amount requested, -EBADMSG is returned. If want_more wasn't set, but
0913 more data was available, -EMSGSIZE is returned.
0914
0915 If a remote ABORT is detected, the abort code received will be stored in
0916 ``*_abort`` and -ECONNABORTED will be returned.
0917
0918 The service ID that the call ended up with is returned into ``*_service``.
0919 This can be used to see if a call got a service upgrade.
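
The offset handling and return values described for rxrpc_kernel_recv_data()
can be illustrated with a small userspace model. struct sketch_call and
sketch_recv_data() are hypothetical and are not the kernel implementation; the
model omits aborts and the service ID, and as a simplification it treats any
unread byte of the phase as "more data available":

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one receive phase; not a kernel structure. */
struct sketch_call {
	const unsigned char *data;	/* every byte the peer sends in this phase */
	size_t total;			/* length of the whole phase */
	size_t arrived;			/* bytes that have arrived so far */
	size_t consumed;		/* bytes already copied out to the caller */
};

static int sketch_recv_data(struct sketch_call *call, void *buf, size_t size,
			    size_t *_offset, bool want_more)
{
	size_t want, avail, n;

	assert(*_offset <= size);
	assert(call->consumed <= call->arrived && call->arrived <= call->total);

	/* *_offset is added on to buf and subtracted from size. */
	want = size - *_offset;
	avail = call->arrived - call->consumed;
	n = want < avail ? want : avail;

	memcpy((unsigned char *)buf + *_offset, call->data + call->consumed, n);
	call->consumed += n;
	*_offset += n;		/* the amount copied is added to *_offset */

	if (*_offset < size)	/* buffer not yet filled */
		return call->arrived == call->total ? -EBADMSG : -EAGAIN;
	if (want_more)		/* buffer filled, caller will ask for more */
		return 0;
	/* Buffer filled and the caller expected this to be the last item. */
	return call->consumed == call->total ? 1 : -EMSGSIZE;
}
```

A caller would typically keep the same buf, size and ``*_offset`` while
-EAGAIN is returned, and reset ``*_offset`` to zero before asking for the
next item.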
0920
0921 (#) Abort a call::
0924
0925 void rxrpc_kernel_abort_call(struct socket *sock,
0926 struct rxrpc_call *call,
0927 u32 abort_code);
0928
0929 This is used to abort a call if it's still in an abortable state. The
0930 abort code specified will be placed in the ABORT message sent.
0931
0932 (#) Intercept received RxRPC messages::
0933
0934 typedef void (*rxrpc_interceptor_t)(struct sock *sk,
0935 unsigned long user_call_ID,
0936 struct sk_buff *skb);
0937
0938 void
0939 rxrpc_kernel_intercept_rx_messages(struct socket *sock,
0940 rxrpc_interceptor_t interceptor);
0941
0942 This installs an interceptor function on the specified AF_RXRPC socket.
0943 All messages that would otherwise wind up in the socket's Rx queue are
0944 then diverted to this function. Note that care must be taken to process
0945 the messages in the right order to maintain DATA message sequentiality.
0946
0947 The interceptor function itself is provided with the address of the socket
0948 handling the incoming message, the ID assigned by the kernel utility to
0949 the call and the socket buffer containing the message.
0950
0951 The skb->mark field indicates the type of message:
0952
0953 =============================== =======================================
0954 Mark Meaning
0955 =============================== =======================================
0956 RXRPC_SKB_MARK_DATA Data message
0957 RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call
0958 RXRPC_SKB_MARK_BUSY Client call rejected as server busy
0959 RXRPC_SKB_MARK_REMOTE_ABORT Call aborted by peer
0960 RXRPC_SKB_MARK_NET_ERROR Network error detected
0961 RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered
0962 RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance
0963 =============================== =======================================
0964
0965 The remote abort message can be probed with rxrpc_kernel_get_abort_code().
0966 The two error messages can be probed with rxrpc_kernel_get_error_number().
0967 A new call can be accepted with rxrpc_kernel_accept_call().
0968
0969 Data messages can have their contents extracted with the usual bunch of
0970 socket buffer manipulation functions. A data message can be determined to
0971 be the last one in a sequence with rxrpc_kernel_is_data_last(). When a
0972 data message has been used up, rxrpc_kernel_data_consumed() should be
0973 called on it.
0974
0975 Messages should be handed to rxrpc_kernel_free_skb() for disposal. It
0976 is possible to get extra refs on all types of message for later freeing,
0977 but this may pin the state of a call until the message is finally freed.
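
The way an interceptor might choose its next step from the message type can
be sketched as a plain lookup. The enums below are hypothetical stand-ins: the
numeric values of the real RXRPC_SKB_MARK_* constants are not given here, and
sketch_followup_for_mark() merely restates the pairing of marks and helper
functions described above:

```c
#include <assert.h>

/* Hypothetical mirror of the mark table; real values are not assumed. */
enum sketch_skb_mark {
	SKETCH_MARK_DATA,
	SKETCH_MARK_FINAL_ACK,
	SKETCH_MARK_BUSY,
	SKETCH_MARK_REMOTE_ABORT,
	SKETCH_MARK_NET_ERROR,
	SKETCH_MARK_LOCAL_ERROR,
	SKETCH_MARK_NEW_CALL,
};

/* Which helper the text names for a message before it is freed. */
enum sketch_followup {
	SKETCH_DO_NOTHING,		/* no helper named; just free the skb */
	SKETCH_DO_CONSUME_DATA,		/* rxrpc_kernel_data_consumed() */
	SKETCH_DO_GET_ABORT_CODE,	/* rxrpc_kernel_get_abort_code() */
	SKETCH_DO_GET_ERROR_NUMBER,	/* rxrpc_kernel_get_error_number() */
	SKETCH_DO_ACCEPT_CALL,		/* rxrpc_kernel_accept_call() */
};

static enum sketch_followup sketch_followup_for_mark(enum sketch_skb_mark mark)
{
	assert(mark >= SKETCH_MARK_DATA && mark <= SKETCH_MARK_NEW_CALL);

	switch (mark) {
	case SKETCH_MARK_DATA:
		return SKETCH_DO_CONSUME_DATA;
	case SKETCH_MARK_REMOTE_ABORT:
		return SKETCH_DO_GET_ABORT_CODE;
	case SKETCH_MARK_NET_ERROR:
	case SKETCH_MARK_LOCAL_ERROR:
		return SKETCH_DO_GET_ERROR_NUMBER;
	case SKETCH_MARK_NEW_CALL:
		return SKETCH_DO_ACCEPT_CALL;
	case SKETCH_MARK_FINAL_ACK:
	case SKETCH_MARK_BUSY:
		break;
	}
	return SKETCH_DO_NOTHING;
}
```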
0978
0979 (#) Accept an incoming call::
0980
0981 struct rxrpc_call *
0982 rxrpc_kernel_accept_call(struct socket *sock,
0983 unsigned long user_call_ID);
0984
0985 This is used to accept an incoming call and to assign it a call ID. This
0986 function is similar to rxrpc_kernel_begin_call() and calls accepted must
0987 be ended in the same way.
0988
0989 If this function is successful, an opaque reference to the RxRPC call is
0990 returned. The caller now holds a reference on this and it must be
0991 properly ended.
0992
0993 (#) Reject an incoming call::
0994
0995 int rxrpc_kernel_reject_call(struct socket *sock);
0996
0997 This is used to reject the first incoming call on the socket's queue with
0998 a BUSY message. -ENODATA is returned if there were no incoming calls.
0999 Other errors may be returned if the call had been aborted (-ECONNABORTED)
1000 or had timed out (-ETIME).
1001
1002 (#) Allocate a null key for doing anonymous security::
1003
1004 struct key *rxrpc_get_null_key(const char *keyname);
1005
1006 This is used to allocate a null RxRPC key that can be used to indicate
1007 anonymous security for a particular domain.
1008
1009 (#) Get the peer address of a call::
1010
1011 void rxrpc_kernel_get_peer(struct socket *sock, struct rxrpc_call *call,
1012 struct sockaddr_rxrpc *_srx);
1013
1014 This is used to find the remote peer address of a call.
1015
1016 (#) Set the total transmit data size on a call::
1017
1018 void rxrpc_kernel_set_tx_length(struct socket *sock,
1019 struct rxrpc_call *call,
1020 s64 tx_total_len);
1021
1022 This sets the amount of data that the caller is intending to transmit on a
1023 call. It's intended to be used for setting the reply size as the request
1024 size should be set when the call is begun. tx_total_len may not be less
1025 than zero.
1026
1027 (#) Get call RTT::
1028
1029 u64 rxrpc_kernel_get_rtt(struct socket *sock, struct rxrpc_call *call);
1030
1031 This returns the round-trip time (RTT) to the peer in use by a call. The
1032 value returned is in nanoseconds.
1033
1034 (#) Check call still alive::
1035
1036 bool rxrpc_kernel_check_life(struct socket *sock,
1037 struct rxrpc_call *call,
1038 u32 *_life);
1039 void rxrpc_kernel_probe_life(struct socket *sock,
1040 struct rxrpc_call *call);
1041
1042 The first function passes back in ``*_life`` a number that is updated when
1043 ACKs are received from the peer (notably including PING RESPONSE ACKs
1044 which we can elicit by sending PING ACKs to see if the call still exists
1045 on the server). The caller should compare the values from two successive
1046 invocations, a suitable interval apart, to see if the call is still alive.
1047 The function returns true as long as the call hasn't reached completion.
1048
1049 This allows the caller to work out if the server is still contactable and
1050 if the call is still alive on the server while waiting for the server to
1051 process a client operation.
1052
1053 The second function causes a ping ACK to be transmitted to try to provoke
1054 the peer into responding, which would then cause the value returned by the
1055 first function to change. Note that this must be called in TASK_RUNNING
1056 state.
1057
1058 (#) Get remote client epoch::
1059
1060 u32 rxrpc_kernel_get_epoch(struct socket *sock,
1061 struct rxrpc_call *call);
1062
1063 This allows the epoch that's contained in packets of an incoming client
1064 call to be queried. This value is returned. The function is always
1065 successful if the call is still in progress. It shouldn't be called once
1066 the call has expired. Note that calling this on a local client call only
1067 returns the local epoch.
1068
1069 This value can be used to determine if the remote client has been
1070 restarted as it shouldn't change otherwise.
1071
1072 (#) Set the maximum lifespan on a call::
1073
1074 void rxrpc_kernel_set_max_life(struct socket *sock,
1075 struct rxrpc_call *call,
1076 unsigned long hard_timeout);
1077
1078 This sets the maximum lifespan on a call to hard_timeout (which is in
1079 jiffies). In the event of the timeout occurring, the call will be
1080 aborted and -ETIME or -ETIMEDOUT will be returned.
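
Since hard_timeout is expressed in jiffies, a caller thinking in seconds has
to scale by the tick rate. The helper below is a userspace sketch of that
arithmetic only: sketch_secs_to_jiffies() is a hypothetical name, hz stands in
for the kernel's HZ, and saturating at ULONG_MAX on overflow is this sketch's
own choice:

```c
#include <assert.h>
#include <limits.h>

/* Convert a lifespan in seconds to jiffies for a given tick rate. */
static unsigned long sketch_secs_to_jiffies(unsigned long secs,
					    unsigned long hz)
{
	assert(hz > 0);

	/* Saturate rather than wrap if the product would overflow. */
	if (secs > ULONG_MAX / hz)
		return ULONG_MAX;
	return secs * hz;
}
```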
1081
1082 (#) Apply the RXRPC_MIN_SECURITY_LEVEL sockopt to a socket from within the
1083 kernel::
1084
1085 int rxrpc_sock_set_min_security_level(struct sock *sk,
1086 unsigned int val);
1087
1088 This specifies the minimum security level required for calls on this
1089 socket.
1090
1091
1092 Configurable Parameters
1093 =======================
1094
1095 The RxRPC protocol driver has a number of configurable parameters that can be
1096 adjusted through sysctls in /proc/sys/net/rxrpc/:
1097
1098 (#) req_ack_delay
1099
1100 The amount of time in milliseconds after receiving a packet with the
1101 request-ack flag set before we honour the flag and actually send the
1102 requested ack.
1103
1104 Usually the other side won't stop sending packets until the advertised
1105 reception window is full (to a maximum of 255 packets), so delaying the
1106 ACK permits several packets to be ACK'd in one go.
1107
1108 (#) soft_ack_delay
1109
1110 The amount of time in milliseconds after receiving a new packet before we
1111 generate a soft-ACK to tell the sender that it doesn't need to resend.
1112
1113 (#) idle_ack_delay
1114
1115 The amount of time in milliseconds after all the packets currently in the
1116 receive queue have been consumed before we generate a hard-ACK to tell
1117 the sender it can free its buffers, assuming no other reason occurs that
1118 we would send an ACK.
1119
1120 (#) resend_timeout
1121
1122 The amount of time in milliseconds after transmitting a packet before we
1123 transmit it again, assuming no ACK is received from the receiver telling
1124 us they got it.
1125
1126 (#) max_call_lifetime
1127
1128 The maximum amount of time in seconds that a call may be in progress
1129 before we preemptively kill it.
1130
1131 (#) dead_call_expiry
1132
1133 The amount of time in seconds before we remove a dead call from the call
1134 list. Dead calls are kept around for a little while for the purpose of
1135 repeating ACK and ABORT packets.
1136
1137 (#) connection_expiry
1138
1139 The amount of time in seconds after a connection was last used before we
1140 remove it from the connection list. While a connection is in existence,
1141 it serves as a placeholder for negotiated security; when it is deleted,
1142 the security must be renegotiated.
1143
1144 (#) transport_expiry
1145
1146 The amount of time in seconds after a transport was last used before we
1147 remove it from the transport list. While a transport is in existence, it
1148 serves to anchor the peer data and keeps the connection ID counter.
1149
1150 (#) rxrpc_rx_window_size
1151
1152 The size of the receive window in packets. This is the maximum number of
1153 unconsumed received packets we're willing to hold in memory for any
1154 particular call.
1155
1156 (#) rxrpc_rx_mtu
1157
1158 The maximum packet MTU size that we're willing to receive in bytes. This
1159 indicates to the peer whether we're willing to accept jumbo packets.
1160
1161 (#) rxrpc_rx_jumbo_max
1162
1163 The maximum number of packets that we're willing to accept in a jumbo
1164 packet. Non-terminal packets in a jumbo packet must contain a four-byte
1165 header plus exactly 1412 bytes of data. The terminal packet must contain
1166 a four-byte header plus any amount of data. In any event, a jumbo packet
1167 may not exceed rxrpc_rx_mtu in size.
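
The size rule for jumbo packets can be worked through with a little
arithmetic. The sketch below counts only the subpacket headers and data
described above and ignores any outer RxRPC, UDP or IP headers; the function
names are hypothetical and the limits passed in are whatever the two
parameters are currently set to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_JUMBO_HDR  4u	/* four-byte header on each subpacket */
#define SKETCH_JUMBO_DATA 1412u	/* exact data size of a non-terminal subpacket */

/* Bytes used by n subpackets whose terminal one carries last_len data bytes. */
static size_t sketch_jumbo_size(unsigned int n, size_t last_len)
{
	assert(n >= 1);

	return (size_t)(n - 1) * (SKETCH_JUMBO_HDR + SKETCH_JUMBO_DATA) +
	       SKETCH_JUMBO_HDR + last_len;
}

/* Check a jumbo packet against the subpacket-count and size limits. */
static bool sketch_jumbo_acceptable(unsigned int n, size_t last_len,
				    unsigned int jumbo_max, size_t rx_mtu)
{
	return n <= jumbo_max && sketch_jumbo_size(n, last_len) <= rx_mtu;
}
```

For example, three subpackets with 100 bytes in the last one occupy
2 * 1416 + 104 = 2936 bytes under this accounting.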