0001 ===============================
0002 Adjunct Processor (AP) facility
0003 ===============================
0004
0005
0006 Introduction
0007 ============
0008 The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised
0009 of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards.
0010 The AP devices provide cryptographic functions to all CPUs assigned to a
0011 linux system running in an IBM Z system LPAR.
0012
0013 The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap
0014 is to make AP cards available to KVM guests using the VFIO mediated device
0015 framework. This implementation relies considerably on the s390 virtualization
0016 facilities which do most of the hard work of providing direct access to AP
0017 devices.
0018
0019 AP Architectural Overview
0020 =========================
0021 To facilitate the comprehension of the design, let's start with some
0022 definitions:
0023
0024 * AP adapter
0025
0026 An AP adapter is an IBM Z adapter card that can perform cryptographic
0027 functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
0028 assigned to the LPAR in which a linux host is running will be available to
0029 the linux host. Each adapter is identified by a number from 0 to 255; however,
0030 the maximum adapter number is determined by machine model and/or adapter type.
0031 When installed, an AP adapter is accessed by AP instructions executed by any
0032 CPU.
0033
0034 The AP adapter cards are assigned to a given LPAR via the system's Activation
0035 Profile which can be edited via the HMC. When the linux host system is IPL'd
0036 in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and
0037 creates a sysfs device for each assigned adapter. For example, if AP adapters
0038 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following
0039 sysfs device entries::
0040
0041 /sys/devices/ap/card04
0042 /sys/devices/ap/card0a
0043
0044 Symbolic links to these devices will also be created in the AP bus devices
0045 sub-directory::
0046
   /sys/bus/ap/devices/[card04]
   /sys/bus/ap/devices/[card0a]
0049
0050 * AP domain
0051
0052 An adapter is partitioned into domains. An adapter can hold up to 256 domains
0053 depending upon the adapter type and hardware configuration. A domain is
0054 identified by a number from 0 to 255; however, the maximum domain number is
determined by machine model and/or adapter type. A domain can be thought of
0056 as a set of hardware registers and memory used for processing AP commands. A
0057 domain can be configured with a secure private key used for clear key
0058 encryption. A domain is classified in one of two ways depending upon how it
0059 may be accessed:
0060
0061 * Usage domains are domains that are targeted by an AP instruction to
0062 process an AP command.
0063
0064 * Control domains are domains that are changed by an AP command sent to a
0065 usage domain; for example, to set the secure private key for the control
0066 domain.
0067
0068 The AP usage and control domains are assigned to a given LPAR via the system's
0069 Activation Profile which can be edited via the HMC. When a linux host system
0070 is IPL'd in the LPAR, the AP bus module detects the AP usage and control
0071 domains assigned to the LPAR. The domain number of each usage domain and
0072 adapter number of each AP adapter are combined to create AP queue devices
0073 (see AP Queue section below). The domain number of each control domain will be
0074 represented in a bitmask and stored in a sysfs file
0075 /sys/bus/ap/ap_control_domain_mask. The bits in the mask, from most to least
0076 significant bit, correspond to domains 0-255.
0077
0078 * AP Queue
0079
0080 An AP queue is the means by which an AP command is sent to a usage domain
0081 inside a specific adapter. An AP queue is identified by a tuple
0082 comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
0083 APQI corresponds to a given usage domain number within the adapter. This tuple
0084 forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
0085 instructions include a field containing the APQN to identify the AP queue to
0086 which the AP command is to be sent for processing.
0087
0088 The AP bus will create a sysfs device for each APQN that can be derived from
0089 the cross product of the AP adapter and usage domain numbers detected when the
0090 AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
0091 domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
0092 following sysfs entries::
0093
0094 /sys/devices/ap/card04/04.0006
0095 /sys/devices/ap/card04/04.0047
0096 /sys/devices/ap/card0a/0a.0006
0097 /sys/devices/ap/card0a/0a.0047
0098
0099 The following symbolic links to these devices will be created in the AP bus
0100 devices subdirectory::
0101
0102 /sys/bus/ap/devices/[04.0006]
0103 /sys/bus/ap/devices/[04.0047]
0104 /sys/bus/ap/devices/[0a.0006]
0105 /sys/bus/ap/devices/[0a.0047]
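
As a rough illustration of the naming scheme above, the following Python sketch
derives the queue device names from the cross product of adapter and usage
domain numbers. The helper name 'queue_device_names' is ours and is not part of
the AP bus; the format (two hex digits for the APID, four for the APQI) is
inferred from the example entries above.

```python
from itertools import product


def queue_device_names(adapters, usage_domains):
    """Return one 'xx.yyyy' queue device name per (APID, APQI) pair."""
    return [f"{apid:02x}.{apqi:04x}"
            for apid, apqi in product(adapters, usage_domains)]


# Adapters 4 and 10 (0x0a), usage domains 6 and 71 (0x47), as in the text.
names = queue_device_names([4, 10], [6, 71])
for name in names:
    card = name.split(".")[0]
    print(f"/sys/devices/ap/card{card}/{name}")
```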
0106
0107 * AP Instructions:
0108
0109 There are three AP instructions:
0110
0111 * NQAP: to enqueue an AP command-request message to a queue
0112 * DQAP: to dequeue an AP command-reply message from a queue
0113 * PQAP: to administer the queues
0114
0115 AP instructions identify the domain that is targeted to process the AP
0116 command; this must be one of the usage domains. An AP command may modify a
0117 domain that is not one of the usage domains, but the modified domain
0118 must be one of the control domains.
0119
0120 AP and SIE
0121 ==========
0122 Let's now take a look at how AP instructions executed on a guest are interpreted
0123 by the hardware.
0124
0125 A satellite control block called the Crypto Control Block (CRYCB) is attached to
0126 our main hardware virtualization control block. The CRYCB contains an AP Control
0127 Block (APCB) that has three fields to identify the adapters, usage domains and
0128 control domains assigned to the KVM guest:
0129
0130 * The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
0131 to the KVM guest. Each bit in the mask, from left to right, corresponds to
0132 an APID from 0-255. If a bit is set, the corresponding adapter is valid for
0133 use by the KVM guest.
0134
0135 * The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
0136 assigned to the KVM guest. Each bit in the mask, from left to right,
0137 corresponds to an AP queue index (APQI) from 0-255. If a bit is set, the
0138 corresponding queue is valid for use by the KVM guest.
0139
* The AP Domain Mask (ADM) field is a bit mask that identifies the AP control
domains assigned to the KVM guest. The ADM bit mask controls which domains can be
0142 changed by an AP command-request message sent to a usage domain from the
0143 guest. Each bit in the mask, from left to right, corresponds to a domain from
0144 0-255. If a bit is set, the corresponding domain can be modified by an AP
0145 command-request message sent to a usage domain.
0146
0147 If you recall from the description of an AP Queue, AP instructions include
0148 an APQN to identify the AP queue to which an AP command-request message is to be
0149 sent (NQAP and PQAP instructions), or from which a command-reply message is to
0150 be received (DQAP instruction). The validity of an APQN is defined by the matrix
0151 calculated from the APM and AQM; it is the Cartesian product of all assigned
0152 adapter numbers (APM) with all assigned queue indexes (AQM). For example, if
0153 adapters 1 and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs
0154 (1,5), (1,6), (2,5) and (2,6) will be valid for the guest.
0155
0156 The APQNs can provide secure key functionality - i.e., a private key is stored
0157 on the adapter card for each of its domains - so each APQN must be assigned to
0158 at most one guest or to the linux host::
0159
0160 Example 1: Valid configuration:
0161 ------------------------------
0162 Guest1: adapters 1,2 domains 5,6
Guest2: adapters 1,2 domain 7
0164
0165 This is valid because both guests have a unique set of APQNs:
0166 Guest1 has APQNs (1,5), (1,6), (2,5), (2,6);
0167 Guest2 has APQNs (1,7), (2,7)
0168
0169 Example 2: Valid configuration:
0170 ------------------------------
0171 Guest1: adapters 1,2 domains 5,6
0172 Guest2: adapters 3,4 domains 5,6
0173
0174 This is also valid because both guests have a unique set of APQNs:
0175 Guest1 has APQNs (1,5), (1,6), (2,5), (2,6);
0176 Guest2 has APQNs (3,5), (3,6), (4,5), (4,6)
0177
0178 Example 3: Invalid configuration:
0179 --------------------------------
0180 Guest1: adapters 1,2 domains 5,6
0181 Guest2: adapter 1 domains 6,7
0182
0183 This is an invalid configuration because both guests have access to
0184 APQN (1,6).
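
The three examples can also be checked mechanically. The following Python
sketch, with helper names of our own choosing, computes each guest's set of
APQNs as a Cartesian product and reports the APQNs two guests would share; an
empty result corresponds to a valid configuration.

```python
from itertools import product


def apqns(adapters, domains):
    """Return the set of (APID, APQI) pairs a guest would be given."""
    return set(product(adapters, domains))


def shared_apqns(guest_a, guest_b):
    """Return the APQNs that both guests would have access to."""
    return apqns(*guest_a) & apqns(*guest_b)


# Each guest is written as (adapters, domains), following Examples 1 to 3.
example1 = shared_apqns(([1, 2], [5, 6]), ([1, 2], [7]))
example2 = shared_apqns(([1, 2], [5, 6]), ([3, 4], [5, 6]))
example3 = shared_apqns(([1, 2], [5, 6]), ([1], [6, 7]))

for label, shared in (("1", example1), ("2", example2), ("3", example3)):
    print(f"Example {label}:", "valid" if not shared else f"invalid, shared {shared}")
```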
0185
0186 The Design
0187 ==========
0188 The design introduces three new objects:
0189
0190 1. AP matrix device
0191 2. VFIO AP device driver (vfio_ap.ko)
0192 3. VFIO AP mediated pass-through device
0193
0194 The VFIO AP device driver
0195 -------------------------
0196 The VFIO AP (vfio_ap) device driver serves the following purposes:
0197
0198 1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.
0199
0200 2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated
0201 device and creates the sysfs interfaces for assigning adapters, usage
0202 domains, and control domains comprising the matrix for a KVM guest.
0203
0204 3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB referenced
0205 by a KVM guest's SIE state description to grant the guest access to a matrix
of AP devices.
0207
0208 Reserve APQNs for exclusive use of KVM guests
0209 ---------------------------------------------
0210 The following block diagram illustrates the mechanism by which APQNs are
0211 reserved::
0212
0213 +------------------+
0214 7 remove | |
0215 +--------------------> cex4queue driver |
0216 | | |
0217 | +------------------+
0218 |
0219 |
0220 | +------------------+ +----------------+
0221 | 5 register driver | | 3 create | |
0222 | +----------------> Device core +----------> matrix device |
0223 | | | | | |
0224 | | +--------^---------+ +----------------+
0225 | | |
0226 | | +-------------------+
0227 | | +-----------------------------------+ |
0228 | | | 4 register AP driver | | 2 register device
0229 | | | | |
0230 +--------+---+-v---+ +--------+-------+-+
0231 | | | |
0232 | ap_bus +--------------------- > vfio_ap driver |
0233 | | 8 probe | |
0234 +--------^---------+ +--^--^------------+
0235 6 edit | | |
0236 apmask | +-----------------------------+ | 11 mdev create
0237 aqmask | | 1 modprobe |
0238 +--------+-----+---+ +----------------+-+ +----------------+
0239 | | | |10 create| mediated |
0240 | admin | | VFIO device core |---------> matrix |
0241 | + | | | device |
0242 +------+-+---------+ +--------^---------+ +--------^-------+
0243 | | | |
0244 | | 9 create vfio_ap-passthrough | |
0245 | +------------------------------+ |
0246 +-------------------------------------------------------------+
0247 12 assign adapter/domain/control domain
0248
0249 The process for reserving an AP queue for use by a KVM guest is:
0250
0251 1. The administrator loads the vfio_ap device driver
2. The vfio_ap driver during its initialization will register a single 'matrix'
0253 device with the device core. This will serve as the parent device for
0254 all vfio_ap mediated devices used to configure an AP matrix for a guest.
0255 3. The /sys/devices/vfio_ap/matrix device is created by the device core
4. The vfio_ap device driver will register with the AP bus for AP queue devices
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
driver's probe and remove callback interfaces. Queue devices older than CEX4
are not supported: supporting them would complicate the design for the sake
of hardware that will go out of service in the relatively near future and
for which few systems remain on which to test.
0263 5. The AP bus registers the vfio_ap device driver with the device core
0264 6. The administrator edits the AP adapter and queue masks to reserve AP queues
0265 for use by the vfio_ap device driver.
0266 7. The AP bus removes the AP queues reserved for the vfio_ap driver from the
0267 default zcrypt cex4queue driver.
0268 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
0269 it.
0270 9. The administrator creates a passthrough type vfio_ap mediated device to be
0271 used by a guest
0272 10. The administrator assigns the adapters, usage domains and control domains
0273 to be exclusively used by a guest.
0274
0275 Set up the VFIO mediated device interfaces
0276 ------------------------------------------
0277 The VFIO AP device driver utilizes the common interfaces of the VFIO mediated
0278 device core driver to:
0279
0280 * Register an AP mediated bus driver to add a vfio_ap mediated device to and
0281 remove it from a VFIO group.
0282 * Create and destroy a vfio_ap mediated device
0283 * Add a vfio_ap mediated device to and remove it from the AP mediated bus driver
0284 * Add a vfio_ap mediated device to and remove it from an IOMMU group
0285
0286 The following high-level block diagram shows the main components and interfaces
0287 of the VFIO AP mediated device driver::
0288
0289 +-------------+
0290 | |
0291 | +---------+ | mdev_register_driver() +--------------+
0292 | | Mdev | +<-----------------------+ |
0293 | | bus | | | vfio_mdev.ko |
0294 | | driver | +----------------------->+ |<-> VFIO user
0295 | +---------+ | probe()/remove() +--------------+ APIs
0296 | |
0297 | MDEV CORE |
0298 | MODULE |
0299 | mdev.ko |
0300 | +---------+ | mdev_register_device() +--------------+
0301 | |Physical | +<-----------------------+ |
0302 | | device | | | vfio_ap.ko |<-> matrix
0303 | |interface| +----------------------->+ | device
0304 | +---------+ | callback +--------------+
0305 +-------------+
0306
0307 During initialization of the vfio_ap module, the matrix device is registered
0308 with an 'mdev_parent_ops' structure that provides the sysfs attribute
0309 structures, mdev functions and callback interfaces for managing the mediated
0310 matrix device.
0311
0312 * sysfs attribute structures:
0313
0314 supported_type_groups
0315 The VFIO mediated device framework supports creation of user-defined
0316 mediated device types. These mediated device types are specified
0317 via the 'supported_type_groups' structure when a device is registered
0318 with the mediated device framework. The registration process creates the
0319 sysfs structures for each mediated device type specified in the
0320 'mdev_supported_types' sub-directory of the device being registered. Along
0321 with the device type, the sysfs attributes of the mediated device type are
0322 provided.
0323
0324 The VFIO AP device driver will register one mediated device type for
0325 passthrough devices:
0326
0327 /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough
0328
0329 Only the read-only attributes required by the VFIO mdev framework will
0330 be provided::
0331
   ... name
   ... device_api
   ... available_instances

Where:

* name:
  specifies the name of the mediated device type
* device_api:
  specifies the VFIO API of the mediated device type
* available_instances:
  the number of vfio_ap mediated passthrough devices
  that can be created

mdev_attr_groups
0349 This attribute group identifies the user-defined sysfs attributes of the
0350 mediated device. When a device is registered with the VFIO mediated device
0351 framework, the sysfs attribute files identified in the 'mdev_attr_groups'
0352 structure will be created in the vfio_ap mediated device's directory. The
0353 sysfs attributes for a vfio_ap mediated device are:
0354
0355 assign_adapter / unassign_adapter:
0356 Write-only attributes for assigning/unassigning an AP adapter to/from the
0357 vfio_ap mediated device. To assign/unassign an adapter, the APID of the
0358 adapter is echoed into the respective attribute file.
0359 assign_domain / unassign_domain:
0360 Write-only attributes for assigning/unassigning an AP usage domain to/from
0361 the vfio_ap mediated device. To assign/unassign a domain, the domain
0362 number of the usage domain is echoed into the respective attribute
0363 file.
0364 matrix:
0365 A read-only file for displaying the APQNs derived from the Cartesian
0366 product of the adapter and domain numbers assigned to the vfio_ap mediated
0367 device.
0368 guest_matrix:
0369 A read-only file for displaying the APQNs derived from the Cartesian
0370 product of the adapter and domain numbers assigned to the APM and AQM
fields respectively of the KVM guest's CRYCB. This may differ from the
APQNs assigned to the vfio_ap mediated device if any APQN does not
0373 reference a queue device bound to the vfio_ap device driver (i.e., the
0374 queue is not in the host's AP configuration).
0375 assign_control_domain / unassign_control_domain:
0376 Write-only attributes for assigning/unassigning an AP control domain
0377 to/from the vfio_ap mediated device. To assign/unassign a control domain,
0378 the ID of the domain to be assigned/unassigned is echoed into the
0379 respective attribute file.
0380 control_domains:
0381 A read-only file for displaying the control domain numbers assigned to the
0382 vfio_ap mediated device.
0383
0384 * functions:
0385
0386 create:
0387 allocates the ap_matrix_mdev structure used by the vfio_ap driver to:
0388
0389 * Store the reference to the KVM structure for the guest using the mdev
0390 * Store the AP matrix configuration for the adapters, domains, and control
0391 domains assigned via the corresponding sysfs attributes files
0392 * Store the AP matrix configuration for the adapters, domains and control
0393 domains available to a guest. A guest may not be provided access to APQNs
0394 referencing queue devices that do not exist, or are not bound to the
0395 vfio_ap device driver.
0396
0397 remove:
0398 deallocates the vfio_ap mediated device's ap_matrix_mdev structure.
0399 This will be allowed only if a running guest is not using the mdev.
0400
0401 * callback interfaces
0402
0403 open_device:
0404 The vfio_ap driver uses this callback to register a
0405 VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the matrix mdev
0406 devices. The open_device callback is invoked by userspace to connect the
0407 VFIO iommu group for the matrix mdev device to the MDEV bus. Access to the
0408 KVM structure used to configure the KVM guest is provided via this callback.
The KVM structure is used to configure the guest's access to the AP matrix
0410 defined via the vfio_ap mediated device's sysfs attribute files.
0411
0412 close_device:
0413 unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
0414 matrix mdev device and deconfigures the guest's AP matrix.
0415
0416 ioctl:
0417 this callback handles the VFIO_DEVICE_GET_INFO and VFIO_DEVICE_RESET ioctls
0418 defined by the vfio framework.
0419
0420 Configure the guest's AP resources
0421 ----------------------------------
0422 Configuring the AP resources for a KVM guest will be performed when the
0423 VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
0424 function is called when userspace connects to KVM. The guest's AP resources are
configured via its APCB by:
0426
0427 * Setting the bits in the APM corresponding to the APIDs assigned to the
0428 vfio_ap mediated device via its 'assign_adapter' interface.
0429 * Setting the bits in the AQM corresponding to the domains assigned to the
0430 vfio_ap mediated device via its 'assign_domain' interface.
* Setting the bits in the ADM corresponding to the domain IDs assigned to the
vfio_ap mediated device via its 'assign_control_domain' interface.
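
The assign interfaces named above are plain write-only sysfs attribute files,
so they can be driven from any language that can write a file. The following
Python sketch shows one possible way to do so. The function name and the
'mdev_dir' parameter (the sysfs directory of the vfio_ap mediated device) are
hypothetical, and writing the IDs in 0x-prefixed hex is an assumption on our
part.

```python
from pathlib import Path


def assign_matrix(mdev_dir, adapters=(), domains=(), control_domains=()):
    """Write each ID to the matching assign attribute of a vfio_ap mdev."""
    mdev = Path(mdev_dir)
    for attr, ids in (("assign_adapter", adapters),
                      ("assign_domain", domains),
                      ("assign_control_domain", control_domains)):
        for ident in ids:
            # One write per ID, equivalent to echoing the ID into the file.
            (mdev / attr).write_text(f"{ident:#x}\n")
```

A call such as assign_matrix(mdev_dir, adapters=[5, 6], domains=[4, 0xab])
would then request adapters 5 and 6 and usage domains 4 and 0xab for the
mediated device found at 'mdev_dir'.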
0433
0434 The linux device model precludes passing a device through to a KVM guest that
0435 is not bound to the device driver facilitating its pass-through. Consequently,
0436 an APQN that does not reference a queue device bound to the vfio_ap device
0437 driver will not be assigned to a KVM guest's matrix. The AP architecture,
0438 however, does not provide a means to filter individual APQNs from the guest's
matrix, so the adapters, domains and control domains assigned to the vfio_ap
mediated device via its sysfs 'assign_adapter', 'assign_domain' and
0441 'assign_control_domain' interfaces will be filtered before providing the AP
0442 configuration to a guest:
0443
0444 * The APIDs of the adapters, the APQIs of the domains and the domain numbers of
0445 the control domains assigned to the matrix mdev that are not also assigned to
0446 the host's AP configuration will be filtered.
0447
0448 * Each APQN derived from the Cartesian product of the APIDs and APQIs assigned
0449 to the vfio_ap mdev is examined and if any one of them does not reference a
0450 queue device bound to the vfio_ap device driver, the adapter will not be
0451 plugged into the guest (i.e., the bit corresponding to its APID will not be
0452 set in the APM of the guest's APCB).
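
One way to picture the two filtering rules is the following Python sketch. It
is an approximation under stated assumptions, not the driver's code: the
function and parameter names are hypothetical, 'bound_queues' stands for the
set of APQNs whose queue devices are bound to the vfio_ap device driver, and we
assume the second rule is applied to the adapters and domains that survive the
first. Control domains would be filtered like the domains in the first step and
are left out for brevity.

```python
def filter_guest_matrix(mdev_apids, mdev_apqis, host_apids, host_apqis,
                        bound_queues):
    """Return the (APIDs, APQIs) that would be set in the guest's APM and AQM."""
    # Rule 1: drop adapters and domains missing from the host's AP configuration.
    apids = [apid for apid in mdev_apids if apid in host_apids]
    apqis = [apqi for apqi in mdev_apqis if apqi in host_apqis]
    # Rule 2: do not plug an adapter if any of its APQNs is not bound to vfio_ap.
    plugged = [apid for apid in apids
               if all((apid, apqi) in bound_queues for apqi in apqis)]
    return plugged, apqis


# Hypothetical state: queue 06.00ab is not bound to the vfio_ap driver.
bound = {(5, 0x04), (5, 0xab), (6, 0x04)}
guest_apids, guest_apqis = filter_guest_matrix(
    [5, 6], [0x04, 0xab], host_apids={5, 6}, host_apqis={0x04, 0xab},
    bound_queues=bound)
print(guest_apids, guest_apqis)
```

In this hypothetical state adapter 6 is left out of the guest's APM because
one of its APQNs is not bound to the vfio_ap driver, while both domains remain
in the AQM.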
0453
0454 The CPU model features for AP
0455 -----------------------------
0456 The AP stack relies on the presence of the AP instructions as well as three
0457 facilities: The AP Facilities Test (APFT) facility; the AP Query
0458 Configuration Information (QCI) facility; and the AP Queue Interruption Control
0459 facility. These features/facilities are made available to a KVM guest via the
0460 following CPU model features:
0461
0462 1. ap: Indicates whether the AP instructions are installed on the guest. This
0463 feature will be enabled by KVM only if the AP instructions are installed
0464 on the host.
0465
0466 2. apft: Indicates the APFT facility is available on the guest. This facility
0467 can be made available to the guest only if it is available on the host (i.e.,
0468 facility bit 15 is set).
0469
0470 3. apqci: Indicates the AP QCI facility is available on the guest. This facility
0471 can be made available to the guest only if it is available on the host (i.e.,
0472 facility bit 12 is set).
0473
4. apqi: Indicates the AP Queue Interruption Control facility is available on the
guest. This facility can be made available to the guest only if it is
available on the host (i.e., facility bit 65 is set).
0477
0478 Note: If the user chooses to specify a CPU model different than the 'host'
0479 model to QEMU, the CPU model features and facilities need to be turned on
0480 explicitly; for example::
0481
0482 /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on,apqi=on
0483
0484 A guest can be precluded from using AP features/facilities by turning them off
0485 explicitly; for example::
0486
0487 /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
0488
0489 Note: If the APFT facility is turned off (apft=off) for the guest, the guest
0490 will not see any AP devices. The zcrypt device drivers on the guest that
0491 register for type 10 and newer AP devices - i.e., the cex4card and cex4queue
0492 device drivers - need the APFT facility to ascertain the facilities installed on
0493 a given AP device. If the APFT facility is not installed on the guest, then no
0494 adapter or domain devices will get created by the AP bus running on the
0495 guest because only type 10 and newer devices can be configured for guest use.
0496
0497 Example
0498 =======
0499 Let's now provide an example to illustrate how KVM guests may be given
0500 access to AP facilities. For this example, we will show how to configure
0501 three guests such that executing the lszcrypt command on the guests would
0502 look like this:
0503
0504 Guest1
0505 ------
0506 =========== ===== ============
0507 CARD.DOMAIN TYPE MODE
0508 =========== ===== ============
0509 05 CEX5C CCA-Coproc
0510 05.0004 CEX5C CCA-Coproc
0511 05.00ab CEX5C CCA-Coproc
0512 06 CEX5A Accelerator
0513 06.0004 CEX5A Accelerator
0514 06.00ab CEX5A Accelerator
0515 =========== ===== ============
0516
0517 Guest2
0518 ------
0519 =========== ===== ============
0520 CARD.DOMAIN TYPE MODE
0521 =========== ===== ============
0522 05 CEX5C CCA-Coproc
0523 05.0047 CEX5C CCA-Coproc
0524 05.00ff CEX5C CCA-Coproc
0525 =========== ===== ============
0526
0527 Guest3
0528 ------
0529 =========== ===== ============
0530 CARD.DOMAIN TYPE MODE
0531 =========== ===== ============
0532 06 CEX5A Accelerator
0533 06.0047 CEX5A Accelerator
0534 06.00ff CEX5A Accelerator
0535 =========== ===== ============
0536
0537 These are the steps:
0538
0539 1. Install the vfio_ap module on the linux host. The dependency chain for the
0540 vfio_ap module is:
0541 * iommu
0542 * s390
0543 * zcrypt
0544 * vfio
0545 * vfio_mdev
0546 * vfio_mdev_device
0547 * KVM
0548
0549 To build the vfio_ap module, the kernel build must be configured with the
0550 following Kconfig elements selected:
0551 * IOMMU_SUPPORT
0552 * S390
0553 * ZCRYPT
0554 * S390_AP_IOMMU
0555 * VFIO
0556 * VFIO_MDEV
0557 * KVM
0558
If using make menuconfig, select the following to build the vfio_ap module::
0560
   -> Device Drivers
      -> IOMMU Hardware Support
         select S390 AP IOMMU Support
      -> VFIO Non-Privileged userspace driver framework
         -> Mediated device driver framework
            -> VFIO driver for Mediated devices
   -> I/O subsystem
      -> VFIO support for AP devices
0569
2. Secure the AP queues to be used by the three guests so that the host cannot
0571 access them. To secure them, there are two sysfs files that specify
0572 bitmasks marking a subset of the APQN range as usable only by the default AP
0573 queue device drivers. All remaining APQNs are available for use by
0574 any other device driver. The vfio_ap device driver is currently the only
non-default device driver. The locations of the sysfs files containing the
masks are::
0577
0578 /sys/bus/ap/apmask
0579 /sys/bus/ap/aqmask
0580
0581 The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
0582 (APID). Each bit in the mask, from left to right, corresponds to an APID from
0583 0-255. If a bit is set, the APID belongs to the subset of APQNs marked as
0584 available only to the default AP queue device drivers.
0585
0586 The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
0587 (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
0588 0-255. If a bit is set, the APQI belongs to the subset of APQNs marked as
0589 available only to the default AP queue device drivers.
0590
0591 The Cartesian product of the APIDs corresponding to the bits set in the
0592 apmask and the APQIs corresponding to the bits set in the aqmask comprise
0593 the subset of APQNs that can be used only by the host default device drivers.
0594 All other APQNs are available to the non-default device drivers such as the
0595 vfio_ap driver.
0596
0597 Take, for example, the following masks::
0598
0599 apmask:
0600 0x7d00000000000000000000000000000000000000000000000000000000000000
0601
0602 aqmask:
0603 0x8000000000000000000000000000000000000000000000000000000000000000
0604
0605 The masks indicate:
0606
0607 * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
0608 device drivers.
0609
0610 * Domain 0 is available for use by the host default device drivers
0611
0612 * The subset of APQNs available for use only by the default host device
0613 drivers are:
0614
(1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
0616
0617 * All other APQNs are available for use by the non-default device drivers.
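
The reading of the two example masks can be reproduced with a short Python
sketch. The helper 'mask_bits' is our own name; it only encodes the rule stated
above that bits are numbered from left to right, starting at 0.

```python
from itertools import product

MASK_BITS = 256


def mask_bits(hex_mask):
    """Return the bit numbers set in a mask, with bit 0 as the leftmost bit."""
    digits = hex_mask[2:] if hex_mask[:2].lower() == "0x" else hex_mask
    if len(digits) != MASK_BITS // 4:
        raise ValueError("expected a full 256-bit mask")
    value = int(digits, 16)
    return [bit for bit in range(MASK_BITS)
            if (value >> (MASK_BITS - 1 - bit)) & 1]


apmask = "0x7d" + "00" * 31
aqmask = "0x80" + "00" * 31

apids = mask_bits(apmask)   # adapters reserved for the default drivers
apqis = mask_bits(aqmask)   # domains reserved for the default drivers
default_apqns = list(product(apids, apqis))
print(default_apqns)
```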
0618
0619 The APQN of each AP queue device assigned to the linux host is checked by the
0620 AP bus against the set of APQNs derived from the Cartesian product of APIDs
0621 and APQIs marked as available to the default AP queue device drivers. If a
0622 match is detected, only the default AP queue device drivers will be probed;
0623 otherwise, the vfio_ap device driver will be probed.
0624
0625 By default, the two masks are set to reserve all APQNs for use by the default
0626 AP queue device drivers. There are two ways the default masks can be changed:
0627
0628 1. The sysfs mask files can be edited by echoing a string into the
0629 respective sysfs mask file in one of two formats:
0630
0631 * An absolute hex string starting with 0x - like "0x12345678" - sets
0632 the mask. If the given string is shorter than the mask, it is padded
0633 with 0s on the right; for example, specifying a mask value of 0x41 is
0634 the same as specifying::
0635
0636 0x4100000000000000000000000000000000000000000000000000000000000000
0637
0638 Keep in mind that the mask reads from left to right, so the mask
0639 above identifies device numbers 1 and 7 (01000001).
0640
0641 If the string is longer than the mask, the operation is terminated with
0642 an error (EINVAL).
0643
0644 * Individual bits in the mask can be switched on and off by specifying
each bit number to be switched in a comma separated list. Each bit
number string must be prefixed with a plus ('+') or minus ('-') to indicate
the corresponding bit is to be switched on ('+') or off ('-'). Some
0648 valid values are:
0649
0650 - "+0" switches bit 0 on
0651 - "-13" switches bit 13 off
0652 - "+0x41" switches bit 65 on
0653 - "-0xff" switches bit 255 off
0654
0655 The following example:
0656
0657 +0,-6,+0x47,-0xf0
0658
0659 Switches bits 0 and 71 (0x47) on
0660
0661 Switches bits 6 and 240 (0xf0) off
0662
0663 Note that the bits not specified in the list remain as they were before
0664 the operation.
0665
0666 2. The masks can also be changed at boot time via parameters on the kernel
0667 command line like this:
0668
0669 ap.apmask=0xffff ap.aqmask=0x40
0670
0671 This would create the following masks::
0672
0673 apmask:
0674 0xffff000000000000000000000000000000000000000000000000000000000000
0675
0676 aqmask:
0677 0x4000000000000000000000000000000000000000000000000000000000000000
0678
0679 Resulting in these two pools::
0680
0681 default drivers pool: adapter 0-15, domain 1
0682 alternate drivers pool: adapter 16-255, domains 0, 2-255
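
Both input formats can be modelled in a few lines of Python. The sketch below
is not the AP bus parser; it is a model of the rules described above, with a
function name of our own choosing, operating on a 256-bit integer in which mask
bit 0 is the most significant bit. A Python ValueError stands in for the EINVAL
case.

```python
WIDTH = 256
ALL_ONES = (1 << WIDTH) - 1


def apply_mask_update(mask, text):
    """Return the mask that results from writing 'text' to a mask file."""
    text = text.strip()
    if text[:2].lower() == "0x":
        # Absolute format: short strings are padded with 0s on the right.
        digits = text[2:]
        if len(digits) > WIDTH // 4:
            raise ValueError("mask string is longer than the mask")
        return int(digits.ljust(WIDTH // 4, "0"), 16)
    # Relative format: a comma separated list of +bit / -bit items.
    for item in text.split(","):
        item = item.strip()
        sign, number = item[0], int(item[1:], 0)
        if sign not in ("+", "-") or not 0 <= number < WIDTH:
            raise ValueError(f"bad bit specification: {item!r}")
        bit = 1 << (WIDTH - 1 - number)
        mask = (mask | bit) if sign == "+" else (mask & ~bit)
    return mask


updated = apply_mask_update(ALL_ONES, "+0,-6,+0x47,-0xf0")
boot_aqmask = apply_mask_update(0, "0x40")
print(f"{updated:#066x}")
print(f"{boot_aqmask:#066x}")
```

Bits that are not named in a relative update keep their previous value, which
is why the function takes the current mask as its first argument.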
0683
0684 **Note:**
0685 Changing a mask such that one or more APQNs will be taken from a vfio_ap
0686 mediated device (see below) will fail with an error (EBUSY). A message
0687 is logged to the kernel ring buffer which can be viewed with the 'dmesg'
0688 command. The output identifies each APQN flagged as 'in use' and identifies
0689 the vfio_ap mediated device to which it is assigned; for example:
0690
0691 Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
0692 Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
0693
0694 Securing the APQNs for our example
0695 ----------------------------------
0696 To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
0697 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
0698 APQNs can be removed from the default masks using either of the following
0699 commands::
0700
0701 echo -5,-6 > /sys/bus/ap/apmask
0702
0703 echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask
0704
0705 Or the masks can be set as follows::
0706
0707 echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \
0708 > apmask
0709
0710 echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \
0711 > aqmask
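
The absolute mask values can be cross-checked against the relative form with a
short calculation. The following Python sketch, using a helper name of our own,
starts from the default mask with every bit set and clears the listed bits,
counting bit 0 as the leftmost bit.

```python
def mask_without(bits, width=256):
    """Return the hex string of a mask with every bit set except 'bits'."""
    value = (1 << width) - 1
    for bit in bits:
        value &= ~(1 << (width - 1 - bit))
    return f"0x{value:0{width // 4}x}"


apmask = mask_without([5, 6])                     # same bits as -5,-6
aqmask = mask_without([0x04, 0x47, 0xab, 0xff])   # same bits as -4,-0x47,-0xab,-0xff
print(apmask)
print(aqmask)
```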
0712
0713 This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004,
0714 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The
0715 sysfs directory for the vfio_ap device driver will now contain symbolic links
0716 to the AP queue devices bound to it::
0717
0718 /sys/bus/ap
0719 ... [drivers]
0720 ...... [vfio_ap]
0721 ......... [05.0004]
0722 ......... [05.0047]
0723 ......... [05.00ab]
0724 ......... [05.00ff]
0725 ......... [06.0004]
0726 ......... [06.0047]
0727 ......... [06.00ab]
0728 ......... [06.00ff]
0729
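The explicit 256-bit mask values used earlier in this step can also be derived
rather than typed by hand. The following is a minimal sketch, assuming the same
bit ordering (ID 0 is the leftmost bit); the helper name ``mask_without`` is
hypothetical, and IDs may be given in decimal or as 0x-prefixed hex.

```shell
# Print a 256-bit mask with every bit set except the listed IDs.
# Hypothetical helper for illustration only.
# Example: mask_without 5 6 > /sys/bus/ap/apmask
mask_without() {
    mask=0x
    byte=0
    while [ "$byte" -lt 32 ]; do
        val=255
        for id in "$@"; do
            n=$(( $id ))
            if [ $(( $n / 8 )) -eq "$byte" ]; then
                # clear the bit for this ID, counting from the left of the byte
                val=$(( $val & ~(128 >> ($n % 8)) ))
            fi
        done
        mask=$mask$(printf '%02x' "$val")
        byte=$((byte + 1))
    done
    printf '%s\n' "$mask"
}
```
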
0730 Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later)
0731 can be bound to the vfio_ap device driver. This restriction keeps the
0732 implementation simple: older devices will go out of service in the
0733 relatively near future, and few older systems remain on which to test
0734 them.
0735
0736 The administrator, therefore, must take care to secure only AP queues that
0737 can be bound to the vfio_ap device driver. The device type for a given AP
0738 queue device can be read from the parent card's sysfs directory. For example,
0739 to see the hardware type of the queue 05.0004:
0740
0741 cat /sys/bus/ap/devices/card05/hwtype
0742
0743 The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the
0744 vfio_ap device driver.
0745
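To check every card at once instead of one hwtype file at a time, a loop along
the following lines could be used. This is a minimal sketch: the helper name
``unsupported_cards`` is hypothetical, and it assumes each hwtype file holds a
plain decimal integer.

```shell
# Print each card whose hwtype is below 10 (older than CEX4), followed by
# that hwtype. The directory argument is optional and exists so the helper
# can be pointed at a different location.
# Hypothetical helper for illustration only.
unsupported_cards() {
    devdir=${1:-/sys/bus/ap/devices}
    for card in "$devdir"/card*; do
        [ -r "$card/hwtype" ] || continue
        hwtype=$(cat "$card/hwtype")
        if [ "$hwtype" -lt 10 ]; then
            printf '%s %s\n' "${card##*/}" "$hwtype"
        fi
    done
}
```
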
0746 3. Create the mediated devices needed to configure the AP matrixes for the
0747 three guests and to provide an interface to the vfio_ap driver for
0748 use by the guests::
0749
0750 /sys/devices/vfio_ap/matrix/
0751 --- [mdev_supported_types]
0752 ------ [vfio_ap-passthrough] (passthrough vfio_ap mediated device type)
0753 --------- create
0754 --------- [devices]
0755
0756 To create the mediated devices for the three guests::
0757
0758 uuidgen > create
0759 uuidgen > create
0760 uuidgen > create
0761
0762 or
0763
0764 echo $uuid1 > create
0765 echo $uuid2 > create
0766 echo $uuid3 > create
0767
0768 This will create three mediated devices in the [devices] subdirectory named
0769 after the UUID written to the create attribute file. We call them $uuid1,
0770 $uuid2 and $uuid3 and this is the sysfs directory structure after creation::
0771
0772 /sys/devices/vfio_ap/matrix/
0773 --- [mdev_supported_types]
0774 ------ [vfio_ap-passthrough]
0775 --------- [devices]
0776 ------------ [$uuid1]
0777 --------------- assign_adapter
0778 --------------- assign_control_domain
0779 --------------- assign_domain
0780 --------------- matrix
0781 --------------- unassign_adapter
0782 --------------- unassign_control_domain
0783 --------------- unassign_domain
0784
0785 ------------ [$uuid2]
0786 --------------- assign_adapter
0787 --------------- assign_control_domain
0788 --------------- assign_domain
0789 --------------- matrix
0790 --------------- unassign_adapter
0791 --------------- unassign_control_domain
0792 --------------- unassign_domain
0793
0794 ------------ [$uuid3]
0795 --------------- assign_adapter
0796 --------------- assign_control_domain
0797 --------------- assign_domain
0798 --------------- matrix
0799 --------------- unassign_adapter
0800 --------------- unassign_control_domain
0801 --------------- unassign_domain
0802
0803 **Note:** The vfio_ap mdevs do not persist across reboots unless the
0804 mdevctl tool is used to create and persist them.
0805
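Because the UUID is needed again when configuring the matrix and when starting
the guest, it can help to capture it at creation time. The following is a
minimal sketch: the helper name ``create_mdev`` is hypothetical, and the
fallback to /proc/sys/kernel/random/uuid is an assumption for systems that do
not have uuidgen installed.

```shell
# Write a freshly generated UUID to the given 'create' attribute file and
# print the UUID so the caller can keep it in a variable.
# Hypothetical helper for illustration only.
# Example (path taken from the sysfs tree above):
#   uuid1=$(create_mdev \
#     /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/create)
create_mdev() {
    create_file=$1
    uuid=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid) || return 1
    printf '%s\n' "$uuid" > "$create_file" || return 1
    printf '%s\n' "$uuid"
}
```
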
0806 4. The administrator now needs to configure the matrixes for the mediated
0807 devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).
0808
0809 This is how the matrix is configured for Guest1::
0810
0811 echo 5 > assign_adapter
0812 echo 6 > assign_adapter
0813 echo 4 > assign_domain
0814 echo 0xab > assign_domain
0815
0816 Control domains can similarly be assigned using the assign_control_domain
0817 sysfs file.
0818
0819 If a mistake is made configuring an adapter, domain or control domain,
0820 you can use the unassign_xxx files to unassign the adapter, domain or
0821 control domain.
0822
0823 To display the matrix configuration for Guest1::
0824
0825 cat matrix
0826
0827 To display the matrix that is or will be assigned to Guest1::
0828
0829 cat guest_matrix
0830
0831 This is how the matrix is configured for Guest2::
0832
0833 echo 5 > assign_adapter
0834 echo 0x47 > assign_domain
0835 echo 0xff > assign_domain
0836
0837 This is how the matrix is configured for Guest3::
0838
0839 echo 6 > assign_adapter
0840 echo 0x47 > assign_domain
0841 echo 0xff > assign_domain
0842
0843 In order to successfully assign an adapter:
0844
0845 * The adapter number specified must represent a value from 0 up to the
0846 maximum adapter number configured for the system. If an adapter number
0847 higher than the maximum is specified, the operation will terminate with
0848 an error (ENODEV).
0849
0850 Note: The maximum adapter number can be obtained via the sysfs
0851 /sys/bus/ap/ap_max_adapter_id attribute file.
0852
0853 * Each APQN derived from the Cartesian product of the APID of the adapter
0854 being assigned and the APQIs of the domains previously assigned:
0855
0856 - Must only be available to the vfio_ap device driver as specified in the
0857 sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
0858 one APQN is reserved for use by the host device driver, the operation
0859 will terminate with an error (EADDRNOTAVAIL).
0860
0861 - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
0862 is assigned to another vfio_ap mediated device, the operation will
0863 terminate with an error (EBUSY).
0864
0865 - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and
0866 /sys/bus/ap/aqmask attribute files are being edited or the operation may
0867 terminate with an error (EBUSY).
0868
0869 In order to successfully assign a domain:
0870
0871 * The domain number specified must represent a value from 0 up to the
0872 maximum domain number configured for the system. If a domain number
0873 higher than the maximum is specified, the operation will terminate with
0874 an error (ENODEV).
0875
0876 Note: The maximum domain number can be obtained via the sysfs
0877 /sys/bus/ap/ap_max_domain_id attribute file.
0878
0879 * Each APQN derived from the Cartesian product of the APQI of the domain
0880 being assigned and the APIDs of the adapters previously assigned:
0881
0882 - Must only be available to the vfio_ap device driver as specified in the
0883 sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
0884 one APQN is reserved for use by the host device driver, the operation
0885 will terminate with an error (EADDRNOTAVAIL).
0886
0887 - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
0888 is assigned to another vfio_ap mediated device, the operation will
0889 terminate with an error (EBUSY).
0890
0891 - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and
0892 /sys/bus/ap/aqmask attribute files are being edited or the operation may
0893 terminate with an error (EBUSY).
0894
0895 In order to successfully assign a control domain:
0896
0897 * The domain number specified must represent a value from 0 up to the maximum
0898 domain number configured for the system. If a control domain number higher
0899 than the maximum is specified, the operation will terminate with an
0900 error (ENODEV).
0901
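The Cartesian product referred to in the conditions above can be enumerated
ahead of time, which makes it easier to spot an APQN that two mediated devices
would share. The following is a minimal sketch with hypothetical helper names;
adapter and domain numbers may be given in decimal or as 0x-prefixed hex.

```shell
# Print every APQN formed by combining each adapter with each domain,
# formatted like the queue device names (two hex digits, dot, four hex digits).
# Hypothetical helper for illustration only.
list_apqns() {
    for a in $1; do
        for d in $2; do
            printf '%02x.%04x\n' "$(( $a ))" "$(( $d ))"
        done
    done
}

# Print the APQNs present in both newline-separated lists.
# Prints nothing (and returns non-zero) when the lists are disjoint.
shared_apqns() {
    printf '%s\n' "$1" | grep -Fx -e "$2"
}

# Example, using the matrixes configured for Guest1 and Guest2 in step 4:
#   guest1=$(list_apqns "5 6" "4 0xab")
#   guest2=$(list_apqns "5" "0x47 0xff")
#   shared_apqns "$guest1" "$guest2"
```
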
0902 5. Start Guest1::
0903
0904 /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
0905 -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
0906
0907 6. Start Guest2::
0908
0909 /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
0910 -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
0911
0912 7. Start Guest3::
0913
0914 /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
0915 -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
0916
0917 When a guest is shut down, its vfio_ap mediated device may be removed.
0918
0919 Using our example again, to remove the vfio_ap mediated device $uuid1::
0920
0921 /sys/devices/vfio_ap/matrix/
0922 --- [mdev_supported_types]
0923 ------ [vfio_ap-passthrough]
0924 --------- [devices]
0925 ------------ [$uuid1]
0926 --------------- remove
0927
0928 ::
0929
0930 echo 1 > remove
0931
0932 This will remove all of the vfio_ap mdev's sysfs structures, including the
0933 mdev itself. To recreate and reconfigure the vfio_ap mdev, all of the steps
0934 starting with step 3 will have to be performed again. Note that the removal
0935 will fail if a guest using the vfio_ap mdev is still running.
0936
0937 It is not necessary to remove a vfio_ap mdev, but one may want to
0938 remove it if no guest will use it during the remaining lifetime of the linux
0939 host. If the vfio_ap mdev is removed, one may want to also reconfigure
0940 the pool of adapters and queues reserved for use by the default drivers.
0941
0942 Hot plug/unplug support:
0943 ========================
0944 An adapter, domain or control domain may be hot plugged into a running KVM
0945 guest by assigning it to the vfio_ap mediated device being used by the guest if
0946 the following conditions are met:
0947
0948 * The adapter, domain or control domain must also be assigned to the host's
0949 AP configuration.
0950
0951 * To hot plug an adapter, each APQN derived from the Cartesian product of the
0952 APID of the adapter being assigned and the APQIs of the domains assigned must
0953 reference a queue device bound to the vfio_ap device driver.
0954
0955 * To hot plug a domain, each APQN derived from the Cartesian product
0956 comprised of the APQI of the domain being assigned and the APIDs of the
0957 adapters assigned must reference a queue device bound to the vfio_ap device
0958 driver.
0959
0960 An adapter, domain or control domain may be hot unplugged from a running KVM
0961 guest by unassigning it from the vfio_ap mediated device being used by the
0962 guest.
0963
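The binding condition for hot plugging can be checked before writing to
assign_adapter or assign_domain. The following is a minimal sketch, assuming
that the vfio_ap driver directory shown earlier (/sys/bus/ap/drivers/vfio_ap)
holds one entry per bound queue; the helper name and its optional third
argument are hypothetical.

```shell
# Succeed only if every APQN formed from the given adapters and domains has
# an entry in the vfio_ap driver directory; report the first one that does not.
# Hypothetical helper for illustration only.
# Example: all_bound_to_vfio_ap "5 6" "4 0xab" && echo 5 > assign_adapter
all_bound_to_vfio_ap() {
    drvdir=${3:-/sys/bus/ap/drivers/vfio_ap}
    for a in $1; do
        for d in $2; do
            apqn=$(printf '%02x.%04x' "$(( $a ))" "$(( $d ))")
            if [ ! -e "$drvdir/$apqn" ]; then
                echo "queue $apqn is not bound to vfio_ap" >&2
                return 1
            fi
        done
    done
}
```
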
0964 Over-provisioning of AP queues for a KVM guest:
0965 ===============================================
0966 Over-provisioning is defined herein as the assignment of adapters or domains to
0967 a vfio_ap mediated device that do not reference AP devices in the host's AP
0968 configuration. The idea here is that when the adapter or domain becomes
0969 available, it will be automatically hot-plugged into the KVM guest using
0970 the vfio_ap mediated device to which it is assigned as long as each new APQN
0971 resulting from plugging it in references a queue device bound to the vfio_ap
0972 device driver.
0973
0974 Limitations
0975 ===========
0976 Live guest migration is not supported for guests using AP devices without
0977 intervention by a system administrator. Before a KVM guest can be migrated,
0978 the vfio_ap mediated device must be removed. Unfortunately, it cannot be
0979 removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
0980 the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
0981 its mdev can be hot unplugged from the guest in one of two ways:
0982
0983 1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
0984 the following commands:
0985
0986 virsh detach-device <guestname> <path-to-device-xml>
0987
0988 For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
0989 the guest named 'my-guest':
0990
0991 virsh detach-device my-guest ~/config/my-guest-hostdev.xml
0992
0993 The contents of my-guest-hostdev.xml:
0994
0995 .. code-block:: xml
0996
0997 <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
0998 <source>
0999 <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
1000 </source>
1001 </hostdev>
1002
1003
1004 virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
1005
1006 For example, to hot unplug the vfio_ap mediated device identified on the
1007 qemu command line with 'id=hostdev0' from the guest named 'my-guest':
1008
1009 .. code-block:: sh
1010
1011 virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
1012
1013 2. A vfio_ap mediated device can be hot unplugged by attaching the qemu monitor
1014 to the guest and using the following qemu monitor command:
1015
1016 (QEMU) device_del id=<device-id>
1017
1018 For example, to hot unplug the vfio_ap mediated device that was specified
1019 on the qemu command line with 'id=hostdev0' when the guest was started:
1020
1021 (QEMU) device_del id=hostdev0
1022
1023 After live migration of the KVM guest completes, an AP configuration can be
1024 restored to the KVM guest by hot plugging a vfio_ap mediated device on the target
1025 system into the guest in one of two ways:
1026
1027 1. If the KVM guest was started with libvirt, you can hot plug a vfio_ap mediated
1028 device into the guest via the following virsh commands:
1029
1030 virsh attach-device <guestname> <path-to-device-xml>
1031
1032 For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
1033 the guest named 'my-guest':
1034
1035 virsh attach-device my-guest ~/config/my-guest-hostdev.xml
1036
1037 The contents of my-guest-hostdev.xml:
1038
1039 .. code-block:: xml
1040
1041 <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
1042 <source>
1043 <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
1044 </source>
1045 </hostdev>
1046
1047
1048 virsh qemu-monitor-command <guest-name> --hmp \
1049 "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
1050
1051 For example, to hot plug the vfio_ap mediated device
1052 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
1053 device-id hostdev0:
1054
1055 virsh qemu-monitor-command my-guest --hmp \
1056 "device_add vfio-ap,\
1057 sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
1058 id=hostdev0"
1059
1060 2. A vfio_ap mediated device can be hot plugged by attaching the qemu monitor
1061 to the guest and using the following qemu monitor command:
1062
1063 (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
1064
1065 For example, to plug the vfio_ap mediated device
1066 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
1067 hostdev0:
1068
1069 (qemu) device_add "vfio-ap,\
1070 sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
1071 id=hostdev0"
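
The device XML passed to virsh detach-device and virsh attach-device differs
only in the UUID, so it can be generated instead of maintained by hand. The
following is a minimal sketch with a hypothetical helper name; the XML is the
one shown above.

```shell
# Print the libvirt hostdev definition for the vfio_ap mdev with the given UUID.
# Hypothetical helper for illustration only.
# Example:
#   hostdev_xml 62177883-f1bb-47f0-914d-32a22e3a8804 > ~/config/my-guest-hostdev.xml
#   virsh detach-device my-guest ~/config/my-guest-hostdev.xml
hostdev_xml() {
    cat <<EOF
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
  <source>
    <address uuid='$1'/>
  </source>
</hostdev>
EOF
}
```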