.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
        Dave Engebretsen <engebret@us.ibm.com>,
        Adam Reznechek <adreznec@linux.vnet.ibm.com>,
        Steven Royer <seroyer@linux.vnet.ibm.com>,
        Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference is the Linux on Power Architecture Platform Reference
(LoPAPR):

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides an
interface between the hypervisor and a management partition. The interface
resembles a message-passing interface. The management partition is intended
as an alternative to system management based on a Hardware Management
Console (HMC).

The primary hardware management solution developed by IBM relies on an
appliance server named the Hardware Management Console (HMC), packaged as
an external tower or rack-mounted personal computer. In a Power Systems
environment, a single HMC can manage multiple POWER processor-based
systems.

Management Application
----------------------

In the management partition, a management application exists which enables
a system administrator to configure the system’s partitioning
characteristics via a command line interface (CLI) or Representational
State Transfer (REST) APIs.

The management application runs on a Linux logical partition on a
POWER8 or newer processor-based server that is virtualized by PowerVM.
System configuration, maintenance, and control functions which
traditionally require an HMC can be implemented in the management
application using a combination of HMC to hypervisor interfaces and
existing operating system methods. This tool provides a subset of the
functions implemented by the HMC and enables basic partition configuration.
The set of HMC to hypervisor messages supported by the management
application component is passed to the hypervisor over a VMC interface,
which is defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical Partitioning Configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor. It
creates the pipes that enable virtualization management software. This
device is presented to a designated management partition as a virtual
device.

This communication device uses the Command/Response Queue (CRQ) and
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running prior to
sending/receiving any of the protocol messages.

This driver also utilizes Transport Event CRQs. These CRQ messages are sent
when the hypervisor detects that one of the peer partitions has abnormally
terminated, or that one side has called H_FREE_CRQ to close its CRQ.

Two new classes of CRQ messages are introduced for the VMC device. VMC
Administrative messages are used by each partition using the VMC to
communicate capabilities to its partner. HMC Interface messages are used
for the actual flow of HMC messages between the management partition and
the hypervisor. As most HMC messages far exceed the size of a CRQ buffer,
a virtual DMA (RDMA) of the HMC message data is done prior to each HMC
Interface CRQ message. Only the management partition drives RDMA
operations; the hypervisor never directly causes the movement of message
data.

Terminology
-----------

RDMA
        Remote Direct Memory Access is a DMA transfer from the server to its
        client or from the server to its partner partition. DMA refers
        to both physical I/O to and from memory operations and to memory
        to memory move operations.
CRQ
        Command/Response Queue, a facility which is used to communicate
        between partner partitions. Transport events which are signaled
        from the hypervisor to a partition are also reported in this queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example for the management application
implementation where a device driver is used to interface to the VMC
device. This driver provides a new device, for example /dev/ibmvmc,
with interfaces to open, close, read, write, and perform ioctls against
the VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and number of
resources available in both the management partition and the hypervisor.
Finally, the hypervisor requests that the management partition create an
initial pool of VMC buffers, one buffer for each possible HMC connection,
which will be used for management application session initialization.
Until this initialization sequence completes, the device returns EBUSY to
open() calls. EIO is returned for all other open() failures.

::

        Management Partition            Hypervisor
                        CRQ INIT
        ---------------------------------------->
                   CRQ INIT COMPLETE
        <----------------------------------------
                      CAPABILITIES
        ---------------------------------------->
                 CAPABILITIES RESPONSE
        <----------------------------------------
              ADD BUFFER (HMC IDX=0,1,..)         _
        <----------------------------------------  |
                  ADD BUFFER RESPONSE              | - Perform # HMCs Iterations
        ----------------------------------------> -
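
As a rough illustration of the sequence above, the following Python sketch
models the management-partition side of initialization as a small state
machine. The class, state, and message names, and the ``max_hmcs``,
``hmc_index``, and ``size`` fields, are hypothetical and chosen for
readability; they are not the driver's data structures or the CRQ wire
format.

```python
import errno
from enum import Enum, auto


class State(Enum):
    CRQ_INIT_SENT = auto()
    CAPABILITIES_SENT = auto()
    ADDING_BUFFERS = auto()
    READY = auto()


class VmcInitModel:
    """Toy model of the management-partition side of VMC initialization."""

    def __init__(self):
        # CRQ INIT is sent when the driver loads, so the model starts here.
        self.state = State.CRQ_INIT_SENT
        self.max_hmcs = 0
        self.buffers = {}  # HMC index -> initial buffer (hypothetical layout)

    def on_message(self, msg, **fields):
        """Advance on a hypervisor message and return the reply, if any."""
        if self.state is State.CRQ_INIT_SENT and msg == "CRQ_INIT_COMPLETE":
            self.state = State.CAPABILITIES_SENT
            return "CAPABILITIES"
        if (self.state is State.CAPABILITIES_SENT
                and msg == "CAPABILITIES_RESPONSE"):
            self.max_hmcs = fields["max_hmcs"]
            self.state = State.ADDING_BUFFERS
            return None
        if self.state is State.ADDING_BUFFERS and msg == "ADD_BUFFER":
            self.buffers[fields["hmc_index"]] = bytearray(fields["size"])
            # One buffer per possible HMC connection completes the sequence.
            if len(self.buffers) == self.max_hmcs:
                self.state = State.READY
            return "ADD_BUFFER_RESPONSE"
        raise OSError(errno.EIO, f"unexpected {msg} in {self.state.name}")

    def open(self):
        """Mimic open(): EBUSY until the initialization sequence is done."""
        if self.state is not State.READY:
            raise OSError(errno.EBUSY, "VMC initialization not complete")
        return True
```

In this model, ``open()`` raises ``EBUSY`` until one ADD BUFFER message has
been answered for each possible HMC connection, which mirrors the behaviour
described in the text.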

VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session level
connection can be established. The application layer performs an open() to
the VMC device and executes an ioctl() against it, indicating the HMC ID
(32 bytes of data) for this session. If the VMC device is in an invalid
state, EIO will be returned for the ioctl(). The device driver creates a
new HMC session value (ranging from 1 to 255) and HMC index value (ranging
from 0 to 254) for this HMC ID. The driver then does an RDMA of the HMC ID
to the hypervisor, and then sends an Interface Open message to the
hypervisor to establish the session over the VMC. After the hypervisor
receives this information, it sends Add Buffer messages to the management
partition to seed an initial pool of buffers for the new HMC connection.
Finally, the hypervisor sends an Interface Open Response message to
indicate that it is ready for normal runtime messaging. The following
illustrates this VMC flow:

::

        Management Partition             Hypervisor
                      RDMA HMC ID
        ---------------------------------------->
                    Interface Open
        ---------------------------------------->
                      Add Buffer                  _
        <----------------------------------------  |
                  Add Buffer Response              | - Perform N Iterations
        ----------------------------------------> -
                Interface Open Response
        <----------------------------------------
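
From the application's point of view, the flow above reduces to an open()
followed by one ioctl(). The Python sketch below shows one way a management
application might do this. It is only a sketch under stated assumptions:
the ioctl request number is passed in by the caller because it must come
from the driver's header, NUL padding of short IDs is an assumption, and
the function names and retry policy are hypothetical.

```python
import errno
import fcntl
import os
import time

HMC_ID_LEN = 32  # the text specifies 32 bytes of HMC ID data


def pack_hmc_id(hmc_id: bytes) -> bytes:
    """Return hmc_id as exactly HMC_ID_LEN bytes.

    Padding short IDs with NUL bytes is an assumption of this sketch.
    """
    if not 0 < len(hmc_id) <= HMC_ID_LEN:
        raise ValueError(f"HMC ID must be 1..{HMC_ID_LEN} bytes")
    return hmc_id.ljust(HMC_ID_LEN, b"\0")


def open_session(path, set_hmc_id_request, hmc_id, retries=10, delay=0.5):
    """Open the VMC device and bind an HMC ID to the new session.

    set_hmc_id_request is the ioctl request number; it is a parameter
    here because this sketch does not assume its value.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    payload = pack_hmc_id(hmc_id)
    for attempt in range(retries):
        try:
            fd = os.open(path, os.O_RDWR)
            break
        except OSError as exc:
            # EBUSY: channel initialization has not finished yet.
            if exc.errno != errno.EBUSY or attempt == retries - 1:
                raise
            time.sleep(delay)
    try:
        fcntl.ioctl(fd, set_hmc_id_request, payload)
    except OSError:
        # EIO here indicates the device is in an invalid state.
        os.close(fd)
        raise
    return fd
```

Only EBUSY from open() is retried, since the text describes it as a
transient condition during initialization; every other error is passed
straight to the caller.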

VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations. When
sending data to the hypervisor, the management application performs a
write() to the VMC device, and the driver RDMAs the data to the hypervisor
and then sends a Signal Message. If a write() is attempted before VMC
device buffers have been made available by the hypervisor, or no buffers
are currently available, EBUSY is returned in response to the write(). A
write() will return EIO for all other errors, such as an invalid device
state. When the hypervisor sends a message to the management partition,
the data is put into a VMC buffer and a Signal Message is sent to the VMC
driver in the management partition. The driver RDMAs the buffer into the
partition and passes the data up to the appropriate management application
via a read() to the VMC device. The read() request blocks if there is no
buffer available to read. The management application may use select() to
wait for the VMC device to become ready with data to read.

::

        Management Partition             Hypervisor
                        MSG RDMA
        ---------------------------------------->
                        SIGNAL MSG
        ---------------------------------------->
                        SIGNAL MSG
        <----------------------------------------
                        MSG RDMA
        <----------------------------------------
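
The runtime rules above (retry a write() on EBUSY, treat EIO as fatal, and
wait for readable data with select()) can be sketched from the
application side as follows. The helper names, retry counts, and the
``max_len`` parameter are hypothetical, and the ``_write`` argument exists
only so the retry logic can be exercised without the real device.

```python
import errno
import os
import select
import time


def write_message(fd, data, retries=20, delay=0.1, _write=os.write):
    """write() one HMC message, retrying while the driver reports EBUSY."""
    for attempt in range(retries):
        try:
            return _write(fd, data)
        except OSError as exc:
            # EBUSY means no VMC buffer is available yet; EIO and all
            # other errors are passed to the caller without a retry.
            if exc.errno != errno.EBUSY or attempt == retries - 1:
                raise
            time.sleep(delay)


def read_message(fd, max_len, timeout=None):
    """Wait with select() until fd is readable, then read() one message.

    Returns None if the timeout expires before data is available.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    return os.read(fd, max_len)
```

Using select() with a timeout lets the application avoid blocking
indefinitely in read() when the hypervisor has nothing to send.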

VMC Interface Close
-------------------

HMC session level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

        Management Partition             Hypervisor
                     INTERFACE CLOSE
        ---------------------------------------->
                INTERFACE CLOSE RESPONSE
        <----------------------------------------
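
Because the session is torn down only when close() is called, an
application may want to guarantee the call even on error paths. A minimal
Python sketch, assuming the descriptor was obtained from an earlier open()
of the VMC device (the ``vmc_session`` name is hypothetical):

```python
import os
from contextlib import contextmanager


@contextmanager
def vmc_session(fd):
    """Yield an open VMC descriptor and always close() it on exit."""
    try:
        yield fd
    finally:
        # close() is what triggers the Interface Close flow described above.
        os.close(fd)
```

Wrapping the session in a ``with`` block means the descriptor is closed
whether the body returns normally or raises.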

Additional Information
======================

For more information on CRQ messages, VMC messages, HMC interface buffers,
and signal messages, refer to Section F of the Linux on Power Architecture
Platform Reference.