==================
Tag matching logic
==================

The MPI standard defines a set of rules, known as tag-matching, for matching
source send operations to destination receives. The following source and
destination parameters must match:

* Communicator
* User tag - wild card may be specified by the receiver
* Source rank - wild card may be specified by the receiver
* Destination rank - wild

The ordering rules require that when more than one pair of send and receive
message envelopes may match, the pair that includes the earliest posted-send
and the earliest posted-receive is the pair that must be used to satisfy the
matching operation. However, this does not imply that tags are consumed in
the order they are created; for example, a later-generated tag may be consumed
if earlier tags cannot satisfy the matching rules.
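The matching and ordering rules above can be sketched in C as follows. This is a simplified model, not code from any MPI library or driver: the ``ANY_SOURCE``/``ANY_TAG`` wildcard values, the ``struct envelope`` layout and the function names are hypothetical, and the destination rank is left out because it is implied by the process that posts the receive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wildcard values; real MPI libraries define their own. */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)

/* Simplified message envelope. The destination rank is implied by the
 * process holding the receive, so it is not stored here. */
struct envelope {
    int comm;   /* communicator id */
    int source; /* sender rank, or ANY_SOURCE in a receive */
    int tag;    /* user tag, or ANY_TAG in a receive */
};

/* True if an incoming send envelope satisfies a posted receive envelope.
 * Only the receive side may use wildcards. */
static bool envelope_matches(const struct envelope *recv,
                             const struct envelope *send)
{
    if (recv->comm != send->comm)
        return false;
    if (recv->source != ANY_SOURCE && recv->source != send->source)
        return false;
    if (recv->tag != ANY_TAG && recv->tag != send->tag)
        return false;
    return true;
}

/* Receives are stored in posting order; scanning from the front returns
 * the earliest-posted receive that matches, or -1 if none does. */
static int first_matching_recv(const struct envelope *recvs, size_t n,
                               const struct envelope *send)
{
    for (size_t i = 0; i < n; i++)
        if (envelope_matches(&recvs[i], send))
            return (int)i;
    return -1;
}
```

Scanning receives in posting order enforces the earliest-posted rule, yet a send from rank 5 with tag 9 skips an older receive that asks for rank 3, tag 7 and is consumed by a later ``ANY_SOURCE``/tag 9 receive, which is the out-of-order consumption described above.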

When a message is sent from the sender to the receiver, the communication
library may process the operation either before or after the corresponding
matching receive is posted. If a matching receive has already been posted, the
message is called an expected message; otherwise it is called an unexpected
message. Implementations frequently use different matching schemes for these
two cases.

To keep the MPI library memory footprint down, MPI implementations typically
use two different protocols for this purpose:

1. The Eager protocol - the complete message is sent when the send is
   processed by the sender. A send completion is reported in the send_cq,
   notifying the sender that the buffer can be reused.

2. The Rendezvous protocol - the sender sends the tag-matching header, and
   possibly a portion of the data, when first notifying the receiver. When the
   corresponding buffer is posted, the responder uses the information from the
   header to initiate an RDMA READ operation directly into the matching buffer.
   A fin message must be received before the buffer can be reused.
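A minimal sketch of how a sender might choose between the two protocols and decide when its buffer is free again, assuming a single message-size cutoff. The ``EAGER_THRESHOLD`` value and all names below are hypothetical; real libraries typically tune such a cutoff per transport.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cutoff between the two protocols, in bytes. */
#define EAGER_THRESHOLD ((size_t)8192)

enum protocol { PROTO_EAGER, PROTO_RENDEZVOUS };

/* Small messages are sent eagerly; larger ones use rendezvous. */
static enum protocol choose_protocol(size_t len)
{
    return len <= EAGER_THRESHOLD ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

/* Events the sender has observed for one outstanding send. */
struct send_state {
    enum protocol proto;
    bool send_cqe_seen; /* send completion reported in the send_cq */
    bool fin_received;  /* fin message received from the responder */
};

/* Eager: reusable once the send completion arrives in the send_cq.
 * Rendezvous: reusable only after the fin message, because the responder
 * reads the data with RDMA READ some time after the header was sent. */
static bool send_buffer_reusable(const struct send_state *s)
{
    return s->proto == PROTO_EAGER ? s->send_cqe_seen : s->fin_received;
}
```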

Tag matching implementation
===========================

Two types of matching objects are used: the posted receive list and the
unexpected message list. The application posts receive buffers to the posted
receive list through calls to the MPI receive routines, and posts send
messages using the MPI send routines. The head of the posted receive list may
be maintained by the hardware, with the software expected to shadow this list.

When a send is initiated and the message arrives at the receive side, and
there is no pre-posted receive for it, the message is passed to the software
and placed in the unexpected message list. Otherwise, the match is processed,
including rendezvous processing if appropriate, and the data is delivered to
the specified receive buffer. This allows receive-side MPI tag matching to
overlap with computation.
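The arrival path described above can be modeled in software as a search of the posted receive list in posting order, falling back to the unexpected message list. This sketch ignores the hardware/software split and rendezvous handling; the ``struct entry`` and ``struct queue`` types, the wildcard values and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define ANY_SOURCE (-1) /* hypothetical wildcard values */
#define ANY_TAG    (-1)

/* One list element: a posted receive or an arrived message. */
struct entry {
    int comm, source, tag;
    struct entry *next;
};

/* FIFO list preserving posting/arrival order. */
struct queue { struct entry *head, *tail; };

static void queue_push(struct queue *q, struct entry *e)
{
    e->next = NULL;
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
}

/* Wildcards are honored only on the receive side. */
static bool env_match(const struct entry *recv, const struct entry *msg)
{
    return recv->comm == msg->comm &&
           (recv->source == ANY_SOURCE || recv->source == msg->source) &&
           (recv->tag == ANY_TAG || recv->tag == msg->tag);
}

/* Unlink and return the earliest posted receive matching msg, or NULL. */
static struct entry *take_posted(struct queue *posted, const struct entry *msg)
{
    struct entry *prev = NULL;
    for (struct entry *e = posted->head; e; prev = e, e = e->next) {
        if (!env_match(e, msg))
            continue;
        if (prev)
            prev->next = e->next;
        else
            posted->head = e->next;
        if (posted->tail == e)
            posted->tail = prev;
        e->next = NULL;
        return e;
    }
    return NULL;
}

/* Expected message: return the matched receive so data can be delivered.
 * Unexpected message: append it to the unexpected list and return NULL. */
static struct entry *on_arrival(struct queue *posted, struct queue *unexpected,
                                struct entry *msg)
{
    struct entry *recv = take_posted(posted, msg);
    if (!recv)
        queue_push(unexpected, msg);
    return recv;
}
```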

When a receive is posted, the communication library first checks the software
unexpected message list for a matching message. If a match is found, the data
is delivered to the user buffer using a software-controlled protocol. The UCX
implementation uses either an eager or a rendezvous protocol, depending on
data size. If no match is found, the entire pre-posted receive list is
maintained by the hardware, and there is space to add one more pre-posted
receive to this list, the receive is passed to the hardware.
Software is expected to shadow this list to help with processing MPI cancel
operations. In addition, because hardware and software are not expected to be
tightly synchronized with respect to the tag-matching operation, this shadow
list is used to detect the case where a pre-posted receive is passed to the
hardware while the matching unexpected message is being passed from the
hardware to the software.
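The posting path can be sketched the same way: search the unexpected list first, then hand the receive to the hardware only if every earlier posted receive is already there and there is room, and always record it in the software shadow list. The hardware is modeled as a simple occupancy counter; ``struct tm_state``, ``hw_capacity``, ``all_posted_in_hw`` and the other names are hypothetical, and for brevity the sketch never resets ``all_posted_in_hw`` (an implementation would do so once the software-only receives are consumed).

```c
#include <stdbool.h>
#include <stddef.h>

#define ANY_SOURCE (-1) /* hypothetical wildcard values */
#define ANY_TAG    (-1)

struct entry {
    int comm, source, tag;
    bool in_hw;          /* receive was handed to the modeled hardware */
    struct entry *next;
};

struct queue { struct entry *head, *tail; };

/* Software shadow of the posted list plus a model of the hardware list. */
struct tm_state {
    struct queue posted;     /* every posted receive, in posting order */
    struct queue unexpected; /* unmatched arrivals, in arrival order */
    size_t hw_used, hw_capacity;
    bool all_posted_in_hw;   /* false once any receive stays software-only;
                              * this sketch never sets it back to true */
};

enum post_result { POST_MATCHED_UNEXPECTED, POST_TO_HW, POST_SW_ONLY };

static void queue_push(struct queue *q, struct entry *e)
{
    e->next = NULL;
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
}

static bool env_match(const struct entry *recv, const struct entry *msg)
{
    return recv->comm == msg->comm &&
           (recv->source == ANY_SOURCE || recv->source == msg->source) &&
           (recv->tag == ANY_TAG || recv->tag == msg->tag);
}

/* Unlink and return the earliest unexpected message matching recv. */
static struct entry *take_unexpected(struct queue *q, const struct entry *recv)
{
    struct entry *prev = NULL;
    for (struct entry *e = q->head; e; prev = e, e = e->next) {
        if (!env_match(recv, e))
            continue;
        if (prev)
            prev->next = e->next;
        else
            q->head = e->next;
        if (q->tail == e)
            q->tail = prev;
        e->next = NULL;
        return e;
    }
    return NULL;
}

static enum post_result post_recv(struct tm_state *s, struct entry *recv,
                                  struct entry **matched)
{
    /* 1. Check the software unexpected list first. */
    *matched = take_unexpected(&s->unexpected, recv);
    if (*matched)
        return POST_MATCHED_UNEXPECTED; /* software protocol delivers data */

    /* 2. Pass to hardware only if the whole list is in hardware and fits. */
    recv->in_hw = s->all_posted_in_hw && s->hw_used < s->hw_capacity;
    if (recv->in_hw)
        s->hw_used++;
    else
        s->all_posted_in_hw = false;

    /* 3. Always shadow the receive in software (e.g. for MPI cancel). */
    queue_push(&s->posted, recv);
    return recv->in_hw ? POST_TO_HW : POST_SW_ONLY;
}
```

The "entire list in hardware" condition also keeps the hardware from matching a newer receive ahead of an older software-only one, which would break the earliest-posted ordering rule.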