.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as supported guests (termed Logical Partitions
  or LPARs). It supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host,
  though it only implements a subset of the PAPR specification called LoPAPR [2]_.

On the PPC64 architecture, a guest kernel running on top of a PAPR hypervisor
is called a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0)
and must issue hypercalls to the hypervisor whenever it needs to perform an
action that is hypervisor privileged [3]_ or to request other services
managed by the hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the
guest. The guest issues a hcall with the necessary input operands. After
performing the privileged operation, the hypervisor returns a status code
and output operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and a PAPR
hypervisor is covered in section 14.5.3 of ref [2]_. The switch to the
hypervisor context is done via the **HVSC** instruction, which expects the
hcall opcode in *r3* and any in-arguments for the hcall in registers
*r4-r12*. If values have to be passed through a memory buffer, the data
stored in that buffer should be in Big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
'HVSC' instruction, the return value of the hcall is available in *r3* and
any out values are returned in registers *r4-r12*. As with in-arguments,
any out values stored in a memory buffer will be in Big-endian byte order.
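
The byte-order rule for memory buffers can be illustrated with a minimal sketch. It is written in Python only for illustration, and the buffer layout it uses (one 64-bit field followed by two 32-bit fields) is hypothetical, not taken from any particular hcall. The ``>`` prefix of the ``struct`` format string forces Big-endian order, which is what a little-endian guest has to convert to before passing a buffer to the hypervisor.

```python
import struct

# Hypothetical buffer layout for illustration: one 64-bit field followed by
# two 32-bit fields. '>' forces Big-endian byte order regardless of the host.
BUF_FORMAT = ">QII"

def pack_hcall_buffer(value64: int, a32: int, b32: int) -> bytes:
    """Serialize values into the Big-endian layout the hypervisor expects."""
    return struct.pack(BUF_FORMAT, value64, a32, b32)

def unpack_hcall_buffer(buf: bytes) -> tuple:
    """Decode a Big-endian buffer filled in by the hypervisor."""
    return struct.unpack(BUF_FORMAT, buf)

buf = pack_hcall_buffer(0x0102030405060708, 1, 2)
print(buf.hex())  # → 01020304050607080000000100000002 (most significant byte first)
```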

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch specific header [4]_, to issue hcalls from the Linux
kernel running as a pseries guest.

Register Conventions
====================

Any hcall should follow the same register conventions as described in
section 2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_.
The table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
|   r0     |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
|   r1     |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
|   r2     |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
|   r3     |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
|  r4-r10  |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
|   r11    |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
|   r12    |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
|   r13    |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
|  r14-r31 |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
|    LR    |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
|   CTR    |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
|   XER    |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
|  CR0-1   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR2-4   |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR5-7   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  Others  |    N     |                                           |
+----------+----------+-------------------------------------------+
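
Because volatile registers may hold different values once the hcall returns, a caller that still needs their contents must save them beforehand. The following minimal sketch encodes the table above as data and reports which of a caller's live registers fall in that category; the helper name and the register spellings (``CR0`` through ``CR7`` for the individual condition register fields) are illustrative assumptions.

```python
# Volatility of each register across an hcall, transcribed from the table
# above (True = volatile, i.e. the hcall may change its value).
VOLATILE = {
    "r0": True, "r1": False, "r2": False, "r3": True,
    **{f"r{n}": True for n in range(4, 11)},    # r4-r10: in and out values
    "r11": True, "r12": True, "r13": False,
    **{f"r{n}": False for n in range(14, 32)},  # r14-r31: local variables
    "LR": True, "CTR": True, "XER": True,
    "CR0": True, "CR1": True,
    "CR2": False, "CR3": False, "CR4": False,
    "CR5": True, "CR6": True, "CR7": True,
}

def must_save_before_hcall(live_registers):
    """Return the live registers the caller must save itself because the
    hcall may modify them. Registers absent from the table fall under
    'Others' and are treated as non-volatile."""
    return sorted(r for r in live_registers if VOLATILE.get(r, False))

print(must_save_before_hcall(["r3", "r14", "r31", "LR", "CR2", "CR6"]))
# → ['CR6', 'LR', 'r3']
```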

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources, like PCI devices and
NVDIMMs, that are available for use by LPARs as Dynamic Resources (DR). When
a DR is allocated to an LPAR, PHYP creates a data structure called a Dynamic
Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via
an opaque 32-bit number called the DRC-Index. The DRC-Index value is provided
to the LPAR via the device tree, where it is present as an attribute in the
device tree node associated with the DR.
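
Device-tree property values are stored as Big-endian 32-bit cells, so the DRC-Index attribute has to be decoded as such. A minimal sketch, assuming the raw property bytes have already been read from the DR's device tree node; the property name ``ibm,my-drc-index`` mentioned in the comment is an assumption made for illustration, since this document does not name the attribute.

```python
import struct

def parse_drc_index(prop_value: bytes) -> int:
    """Decode a DRC-Index from the raw value of its device-tree property.

    Assumption: the property (for example "ibm,my-drc-index") holds exactly
    one 32-bit cell, and device-tree cells are stored Big-endian."""
    if len(prop_value) != 4:
        raise ValueError(f"expected one 32-bit cell, got {len(prop_value)} bytes")
    (drc_index,) = struct.unpack(">I", prop_value)
    return drc_index

# Made-up property bytes for the example.
print(hex(parse_drc_index(bytes([0x81, 0x00, 0x00, 0x01]))))  # → 0x81000001
```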

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3*,
indicating success or failure of the hcall. In case of a failure, an error
code indicates the cause of the error. These codes are defined and
documented in the arch specific header [4]_.

In some cases a hcall can potentially take a long time and need to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet
finished servicing the hcall.

To make such hcalls, the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a value other than
*H_CONTINUE*.
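
The continue-token protocol described above amounts to a simple loop. The minimal sketch below drives a hypothetical mock hcall (``mock_hcall``, which finishes a fixed amount of work per invocation) instead of a real hypervisor; the numeric status values are placeholders, and the real ones should be taken from the arch specific header [4]_.

```python
from dataclasses import dataclass

# Placeholder status codes for this sketch; the real values are in [4].
H_SUCCESS = 0
H_CONTINUE = 18

@dataclass
class HcallResult:
    status: int
    continue_token: int

def mock_hcall(work_items: list, continue_token: int, batch: int = 2) -> HcallResult:
    """Hypothetical long-running hcall: processes up to `batch` items per call
    and returns H_CONTINUE with a token recording how far it got."""
    end = min(continue_token + batch, len(work_items))
    for i in range(continue_token, end):
        work_items[i] = "done"
    if end < len(work_items):
        return HcallResult(H_CONTINUE, end)
    return HcallResult(H_SUCCESS, 0)

def issue_until_complete(hcall, *args) -> tuple:
    """Issue `hcall` with continue-token 0, then re-issue it with the token the
    hypervisor returned until the status is not H_CONTINUE.
    Returns (final status, number of calls made)."""
    token = 0
    calls = 0
    while True:
        result = hcall(*args, continue_token=token)
        calls += 1
        if result.status != H_CONTINUE:
            return result.status, calls
        token = result.continue_token

items = ["pending"] * 5
print(issue_until_complete(mock_hcall, items))  # → (0, 3)
```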

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values please look into the arch specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided
buffer. The metadata area stores configuration information such as label
information, bad-blocks etc. The metadata area is located out-of-band of the
NVDIMM storage area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes from the provided buffer to the
metadata area associated with it, at the specified offset.
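
The metadata semantics above can be modelled with a toy NVDIMM whose metadata area is a byte array separate from its storage. This is a hypothetical model for illustration only: the class and method names are invented, the numeric status values are placeholders, and the choice of error code for each invalid request is an assumption (an out-of-range offset maps to *H_P2*, following the usual reading of *H_P2* as "parameter 2 is invalid").

```python
# Placeholder status codes; the real values are in the arch specific header [4].
H_SUCCESS, H_PARAMETER, H_P2 = 0, -4, -55

class ToyNvdimm:
    """Hypothetical NVDIMM model whose metadata area is kept out-of-band
    of (separate from) its storage area."""

    def __init__(self, storage_size: int, metadata_size: int):
        self.storage = bytearray(storage_size)
        self.metadata = bytearray(metadata_size)

    def read_metadata(self, offset: int, num_bytes: int):
        """Model of H_SCM_READ_METADATA: returns (status, bytes read)."""
        if not 0 <= offset < len(self.metadata):
            return H_P2, b""            # assumed: bad offset -> H_P2
        if num_bytes <= 0 or offset + num_bytes > len(self.metadata):
            return H_PARAMETER, b""     # assumed error code
        return H_SUCCESS, bytes(self.metadata[offset:offset + num_bytes])

    def write_metadata(self, offset: int, data: bytes) -> int:
        """Model of H_SCM_WRITE_METADATA: copies `data` in at `offset`."""
        if not 0 <= offset < len(self.metadata):
            return H_P2
        if not data or offset + len(data) > len(self.metadata):
            return H_PARAMETER
        self.metadata[offset:offset + len(data)] = data
        return H_SUCCESS

dimm = ToyNvdimm(storage_size=64, metadata_size=16)
status = dimm.write_metadata(4, b"label")
print(dimm.read_metadata(4, 5))  # → (0, b'label')
```

Writing the label leaves ``dimm.storage`` untouched, mirroring the separate access path for the metadata area.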

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*[startingScmBlockIndex, startingScmBlockIndex + numScmBlocksToBind)* to the
guest at *targetLogicalMemoryAddress* within the guest physical address
space. If *targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.
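
Since a bind maps a contiguous range of SCM blocks onto a contiguous guest physical range, the address of any block in the range follows from the address the hcall reports and the SCM block size. A minimal sketch with hypothetical helper and constant names; the 256 MiB block size is an assumed value, as this document does not state the block size.

```python
# Sentinel asking the hypervisor to choose the target address (from the text above).
ANY_TARGET_ADDRESS = 0xFFFFFFFF_FFFFFFFF

# Assumed SCM block size for the example (256 MiB); not given in this document.
SCM_BLOCK_SIZE = 256 * 1024 * 1024

def block_range(start_block: int, num_blocks: int) -> range:
    """SCM block indexes covered by a bind: start_block .. start_block + num_blocks - 1."""
    return range(start_block, start_block + num_blocks)

def guest_address_of_block(bound_address: int, blocks: range, block: int) -> int:
    """Guest physical address of `block`, given the address the bind reported."""
    if block not in blocks:
        raise ValueError("block is outside the bound range")
    return bound_address + (block - blocks.start) * SCM_BLOCK_SIZE

blocks = block_range(start_block=8, num_blocks=4)
print(list(blocks))  # → [8, 9, 10, 11]
print(hex(guest_address_of_block(0x400_0000_0000, blocks, 10)))  # → 0x40020000000
```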

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.
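
Hcalls such as *H_SCM_UNBIND_MEM* may report that the hypervisor is busy, in which case the guest retries later. The minimal sketch below retries while the status is *H_Busy* or one of the long-busy codes, sleeping for the delay each code's name suggests (about 1 ms or 10 ms). The numeric status values are placeholders, the retry limit is arbitrary, and the lambda standing in for the hcall is hypothetical.

```python
import time

# Placeholder status codes; the real values are in the arch specific header [4].
H_SUCCESS = 0
H_BUSY = 1
H_LONG_BUSY_ORDER_1_MSEC = 9900
H_LONG_BUSY_ORDER_10_MSEC = 9901

# Delay in seconds before retrying after each busy status.
RETRY_DELAY = {
    H_BUSY: 0.0,
    H_LONG_BUSY_ORDER_1_MSEC: 0.001,
    H_LONG_BUSY_ORDER_10_MSEC: 0.010,
}

def call_with_busy_retry(hcall, *args, max_attempts: int = 100):
    """Re-issue `hcall(*args)` while it reports a busy status, sleeping for the
    delay that status suggests. Returns the first non-busy status."""
    for _ in range(max_attempts):
        status = hcall(*args)
        if status not in RETRY_DELAY:
            return status
        time.sleep(RETRY_DELAY[status])
    raise TimeoutError("hcall still busy after max_attempts tries")

# Hypothetical unbind that is busy twice before succeeding.
responses = iter([H_LONG_BUSY_ORDER_1_MSEC, H_BUSY, H_SUCCESS])
print(call_with_busy_retry(lambda drc, addr, n: next(responses),
                           0x81000001, 0x40000000000, 4))  # → 0
```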

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM Block index, return the guest physical address
to which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.
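
The two query hcalls answer the same question in opposite directions. The toy model below keeps a guest-side record of bindings and implements both lookups. It is a hypothetical illustration: the class and method names are invented, the status values are placeholders, and it assumes that any address inside a bound block (not only the block's first byte) maps back to that block, which this document does not specify.

```python
# Placeholder status codes; the real values are in the arch specific header [4].
H_SUCCESS, H_NOT_FOUND = 0, -98

class ToyBindingTable:
    """Hypothetical record of which SCM blocks are bound at which guest
    physical addresses, answering both query directions."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.by_block = {}  # (drc_index, block) -> guest physical address

    def bind(self, drc_index, start_block, num_blocks, target_address):
        for i in range(num_blocks):
            self.by_block[(drc_index, start_block + i)] = target_address + i * self.block_size

    def query_block_binding(self, drc_index, block):
        """Like H_SCM_QUERY_BLOCK_MEM_BINDING: (drcIndex, block) -> address."""
        addr = self.by_block.get((drc_index, block))
        return (H_SUCCESS, addr) if addr is not None else (H_NOT_FOUND, None)

    def query_logical_binding(self, address):
        """Like H_SCM_QUERY_LOGICAL_MEM_BINDING: address -> (drcIndex, block)."""
        for (drc, block), base in self.by_block.items():
            if base <= address < base + self.block_size:
                return H_SUCCESS, (drc, block)
        return H_NOT_FOUND, None

table = ToyBindingTable(block_size=0x10000000)
table.bind(0x81000001, start_block=8, num_blocks=4, target_address=0x40000000000)
status, addr = table.query_block_binding(0x81000001, 9)
print(status, hex(addr))  # → 0 0x40010000000
```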

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap either all SCM blocks belonging to all
NVDIMMs, or all SCM blocks belonging to a single NVDIMM identified by its
drcIndex, from the LPAR memory.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and the overall
health of the PMEM device. The asserted bits in the health-bitmap indicate
one or more states (described in the table below) of the PMEM device, and the
health-bit-valid-bitmap indicates which bits in the health-bitmap are valid.
The bits are reported in reverse bit ordering, where bit 0 is the most
significant bit; for example, a value of 0xC400000000000000 indicates that
bits 0, 1, and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
|  Bit |               Definition                                              |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+
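
Decoding the health bitmaps requires translating the reverse (MSB-0) bit numbering into masks. A minimal sketch that does so and names the asserted, valid states using abridged descriptions from the table above; the function names are hypothetical.

```python
# Abridged bit meanings from the Health Bitmap Flags table above.
HEALTH_FLAGS = {
    0: "unable to persist memory contents",
    1: "failed to persist memory contents",
    2: "contents persisted from previous IPL",
    3: "contents not persisted from previous IPL",
    4: "memory life remaining critically low",
    5: "will be garded off next IPL",
    6: "cannot persist due to platform health",
    7: "unable to persist in certain conditions",
    8: "encrypted",
    9: "erase or secure erase completed",
}

def ppc_bit(n: int) -> int:
    """Mask for bit `n` in reverse (MSB-0) ordering of a 64-bit value."""
    return 1 << (63 - n)

def set_bits(bitmap: int) -> list:
    """Bit numbers, in MSB-0 ordering, that are set in a 64-bit bitmap."""
    return [n for n in range(64) if bitmap & ppc_bit(n)]

def decode_health(health_bitmap: int, valid_bitmap: int) -> list:
    """Names of asserted health states, ignoring bits not marked valid."""
    return [HEALTH_FLAGS.get(n, f"reserved bit {n}")
            for n in set_bits(health_bitmap & valid_bitmap)]

print(set_bits(0xC400000000000000))  # → [0, 1, 5]
print(decode_health(ppc_bit(4) | ppc_bit(8), 0xFF80000000000000))
# → ['memory life remaining critically low', 'encrypted']
```

Masking with the valid bitmap first means a bit the hypervisor did not mark valid is never reported, even if it happens to be set in the health-bitmap.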

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall needs
to be issued multiple times in order to be completely serviced. The
*continue-token* from the output is to be passed in the argument list of
subsequent hcalls to the hypervisor until the hcall is completely serviced,
at which point H_SUCCESS or an error code is returned by the hypervisor.

References
==========

.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture