==================================
VFIO - "Virtual Function I/O" [1]_
==================================

Many modern systems now provide DMA and interrupt remapping facilities
to help ensure I/O devices behave within the boundaries they've been
allotted.  This includes x86 hardware with AMD-Vi and Intel VT-d,
POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
systems such as Freescale PAMU.  The VFIO driver is an IOMMU/device
agnostic framework for exposing direct device access to userspace, in
a secure, IOMMU protected environment.  In other words, this allows
safe [2]_, non-privileged, userspace drivers.

Why do we want that?  Virtual machines often make use of direct device
access ("device assignment") when configured for the highest possible
I/O performance.  From a device and host perspective, this simply
turns the VM into a userspace driver, with the benefits of
significantly reduced latency, higher bandwidth, and direct use of
bare-metal device drivers [3]_.

Some applications, particularly in the high performance computing
field, also benefit from low-overhead, direct device access from
userspace.  Examples include network adapters (often non-TCP/IP based)
and compute accelerators.  Prior to VFIO, these drivers had to either
go through the full development cycle to become a proper upstream
driver, be maintained out of tree, or make use of the UIO framework,
which has no notion of IOMMU protection, limited interrupt support,
and requires root privileges to access things like PCI configuration
space.

The VFIO driver framework intends to unify these, replacing the
KVM PCI specific device assignment code and providing a more
secure, more featureful userspace driver environment than UIO.

Groups, Devices, and IOMMUs
---------------------------

Devices are the main target of any I/O driver.  Devices typically
create a programming interface made up of I/O access, interrupts,
and DMA.  Without going into the details of each of these, DMA is
by far the most critical aspect for maintaining a secure environment
as allowing a device read-write access to system memory imposes the
greatest risk to the overall system integrity.

To help mitigate this risk, many modern IOMMUs now incorporate
isolation properties into what was, in many cases, an interface only
meant for translation (i.e. solving the addressing problems of devices
with limited address spaces).  With this, devices can now be isolated
from each other and from arbitrary memory access, thus allowing
things like secure direct assignment of devices into virtual machines.

This isolation is not always at the granularity of a single device
though.  Even when an IOMMU is capable of this, properties of devices,
interconnects, and IOMMU topologies can each reduce this isolation.
For instance, an individual device may be part of a larger
multi-function enclosure.  While the IOMMU may be able to distinguish
between devices within the enclosure, the enclosure may not require
transactions between devices to reach the IOMMU.  Examples of this
could be anything from a multi-function PCI device with backdoors
between functions to a non-PCI-ACS (Access Control Services) capable
bridge allowing redirection without reaching the IOMMU.  Topology
can also play a role in hiding devices.  A PCIe-to-PCI
bridge masks the devices behind it, making transactions appear as if
from the bridge itself.  IOMMU design obviously plays a major role
as well.

Therefore, while for the most part an IOMMU may have device level
granularity, any system is susceptible to reduced granularity.  The
IOMMU API therefore supports a notion of IOMMU groups.  A group is
a set of devices which is isolatable from all other devices in the
system.  Groups are therefore the unit of ownership used by VFIO.

While the group is the minimum granularity that must be used to
ensure secure user access, it's not necessarily the preferred
granularity.  In IOMMUs which make use of page tables, it may be
possible to share a set of page tables between different groups,
reducing the overhead both to the platform (reduced TLB thrashing,
reduced duplicate page tables), and to the user (programming only
a single set of translations).  For this reason, VFIO makes use of
a container class, which may hold one or more groups.  A container
is created by simply opening the /dev/vfio/vfio character device.

On its own, the container provides little functionality, with all
but a couple of version and extension query interfaces locked away.
The user needs to add a group into the container for the next level
of functionality.  To do this, the user first needs to identify the
group associated with the desired device.  This can be done using
the sysfs links described in the example below.  By unbinding the
device from the host driver and binding it to a VFIO driver, a new
VFIO group will appear for the group as /dev/vfio/$GROUP, where
$GROUP is the IOMMU group number of which the device is a member.
If the IOMMU group contains multiple devices, each will need to
be bound to a VFIO driver before operations on the VFIO group
are allowed (it's also sufficient to only unbind the device from
host drivers if a VFIO driver is unavailable; this will make the
group available, but not that particular device).  TBD - interface
for disabling driver probing/locking a device.

Once the group is ready, it may be added to the container by opening
the VFIO group character device (/dev/vfio/$GROUP) and using the
VFIO_GROUP_SET_CONTAINER ioctl, passing the file descriptor of the
previously opened container file.  If desired and if the IOMMU driver
supports sharing the IOMMU context between groups, multiple groups may
be set to the same container.  If a group cannot be set to a container
that already holds groups, a new empty container will need to be used
instead.
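
The fallback described above can be sketched as follows.  This is an
illustration under stated assumptions, not a recommended policy:
vfio_attach_group() is a hypothetical helper name, and the sketch assumes
that any failure of VFIO_GROUP_SET_CONTAINER on the existing container is
worth retrying with a fresh one.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Try to add group_fd to the container in *container_fd.  If that fails,
 * fall back to a fresh, empty container and store its descriptor in
 * *container_fd.  Returns 0 on success, -1 if the group could not be
 * attached to either container.
 *
 * The caller keeps ownership of the original container descriptor and
 * should save it before the call if it is still needed.
 */
int vfio_attach_group(int group_fd, int *container_fd)
{
        int fresh;

        if (ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, container_fd) == 0)
                return 0;

        fresh = open("/dev/vfio/vfio", O_RDWR);
        if (fresh < 0)
                return -1;

        if (ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &fresh) < 0) {
                close(fresh);
                return -1;
        }

        *container_fd = fresh;
        return 0;
}
```

When the helper falls back to a new container, that container has no IOMMU
model selected yet, so VFIO_SET_IOMMU still has to be issued on it before
any mappings can be created.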

With a group (or groups) attached to a container, the remaining
ioctls become available, enabling access to the VFIO IOMMU interfaces.
Additionally, it now becomes possible to get file descriptors for each
device within a group using an ioctl on the VFIO group file descriptor.

The VFIO device API includes ioctls for describing the device, the I/O
regions and their read/write/mmap offsets on the device descriptor, as
well as mechanisms for describing and registering interrupt
notifications.
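
To make the region offsets more concrete, the sketch below queries a region
with VFIO_DEVICE_GET_REGION_INFO and then reads from it with pread() on the
device descriptor.  The helper names vfio_region_mappable() and
vfio_region_read() are hypothetical, and the bounds checks are an
assumption about what a careful caller would want, not something VFIO
requires.

```c
#define _FILE_OFFSET_BITS 64    /* region offsets can exceed 32 bits */

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/vfio.h>

/* A region can be mmap()ed only if it advertises MMAP and has a size. */
int vfio_region_mappable(const struct vfio_region_info *reg)
{
        return (reg->flags & VFIO_REGION_INFO_FLAG_MMAP) && reg->size > 0;
}

/*
 * Read len bytes at offset off inside region index of a VFIO device.
 * Returns 0 on success, -1 on error or if the range is out of bounds.
 */
int vfio_region_read(int device, unsigned int index, uint64_t off,
                     void *buf, size_t len)
{
        struct vfio_region_info reg = { .argsz = sizeof(reg), .index = index };

        if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg) < 0)
                return -1;

        if (!(reg.flags & VFIO_REGION_INFO_FLAG_READ))
                return -1;
        if (off > reg.size || len > reg.size - off)
                return -1;

        /* Region-relative offsets are added to the region's file offset. */
        if (pread(device, buf, len, (off_t)(reg.offset + off)) != (ssize_t)len)
                return -1;

        return 0;
}
```

With vfio-pci, reading two bytes at offset 0 of the region with index
VFIO_PCI_CONFIG_REGION_INDEX should return the PCI vendor ID of the device
(in little-endian byte order).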

VFIO Usage Example
------------------

Assume the user wants to access PCI device 0000:06:0d.0::

        $ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
        ../../../../kernel/iommu_groups/26

This device is therefore in IOMMU group 26.  This device is on the
PCI bus, therefore the user will make use of vfio-pci to manage the
group::

        # modprobe vfio-pci

Binding this device to the vfio-pci driver creates the VFIO group
character devices for this group::

        $ lspci -n -s 0000:06:0d.0
        06:0d.0 0401: 1102:0002 (rev 08)
        # echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
        # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id

Now we need to look at what other devices are in the group to free
it for use by VFIO::

        $ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices
        total 0
        lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 ->
                ../../../../devices/pci0000:00/0000:00:1e.0
        lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 ->
                ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
        lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 ->
                ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1

This device is behind a PCIe-to-PCI bridge [4]_, therefore we also
need to add device 0000:06:0d.1 to the group following the same
procedure as above.  Device 0000:00:1e.0 is a bridge that does
not currently have a host driver, therefore it's not required to
bind this device to the vfio-pci driver (vfio-pci does not currently
support PCI bridges).

The final step is to provide the user with access to the group if
unprivileged operation is desired (note that /dev/vfio/vfio provides
no capabilities on its own and is therefore expected to be set to
mode 0666 by the system)::

        # chown user:user /dev/vfio/26

The user now has full access to all the devices and the IOMMU for this
group and can access them as follows::

        #include <err.h>
        #include <fcntl.h>
        #include <stdint.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <linux/vfio.h>

        int main(void)
        {
                int container, group, device;
                unsigned int i;
                void *buf;
                struct vfio_group_status group_status =
                                                { .argsz = sizeof(group_status) };
                struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };
                struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };
                struct vfio_device_info device_info = { .argsz = sizeof(device_info) };

                /* Create a new container */
                container = open("/dev/vfio/vfio", O_RDWR);
                if (container < 0)
                        err(1, "open /dev/vfio/vfio");

                if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
                        errx(1, "unknown VFIO API version");

                if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0)
                        errx(1, "Type1 IOMMU driver is not supported");

                /* Open the group */
                group = open("/dev/vfio/26", O_RDWR);
                if (group < 0)
                        err(1, "open /dev/vfio/26");

                /* Test the group is viable and available */
                if (ioctl(group, VFIO_GROUP_GET_STATUS, &group_status))
                        err(1, "VFIO_GROUP_GET_STATUS");

                if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE))
                        errx(1, "group is not viable (not all devices bound for vfio)");

                /* Add the group to the container */
                if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container))
                        err(1, "VFIO_GROUP_SET_CONTAINER");

                /* Enable the IOMMU model we want */
                if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU))
                        err(1, "VFIO_SET_IOMMU");

                /* Get additional IOMMU info */
                ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info);

                /* Allocate some space and set up a DMA mapping */
                buf = mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (buf == MAP_FAILED)
                        err(1, "mmap");

                dma_map.vaddr = (uintptr_t)buf;
                dma_map.size = 1024 * 1024;
                dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
                dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

                if (ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map))
                        err(1, "VFIO_IOMMU_MAP_DMA");

                /* Get a file descriptor for the device */
                device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
                if (device < 0)
                        err(1, "VFIO_GROUP_GET_DEVICE_FD");

                /* Test and set up the device */
                ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);

                for (i = 0; i < device_info.num_regions; i++) {
                        struct vfio_region_info reg = { .argsz = sizeof(reg) };

                        reg.index = i;

                        ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);

                        /* Set up mappings... read/write offsets, mmaps
                         * For PCI devices, config space is a region */
                }

                for (i = 0; i < device_info.num_irqs; i++) {
                        struct vfio_irq_info irq = { .argsz = sizeof(irq) };

                        irq.index = i;

                        ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq);

                        /* Set up IRQs... eventfds, VFIO_DEVICE_SET_IRQS */
                }

                /* Gratuitous device reset and go... */
                ioctl(device, VFIO_DEVICE_RESET);

                return 0;
        }
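
The example leaves the interrupt setup as a comment.  As a minimal sketch of
that step, the code below builds a VFIO_DEVICE_SET_IRQS argument that
attaches an eventfd as the trigger of one interrupt.  The helper names
vfio_irq_set_eventfd() and vfio_enable_irq() are hypothetical, and the
sketch assumes a single interrupt at sub-index 0 of the chosen IRQ index.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Build a VFIO_DEVICE_SET_IRQS argument that routes one interrupt
 * (sub-index 0 of IRQ index) to the eventfd efd.  The eventfd is carried
 * as a 32-bit integer in the variable-length data[] tail of the struct.
 * Returns a malloc()ed buffer the caller must free(), or NULL.
 */
struct vfio_irq_set *vfio_irq_set_eventfd(unsigned int index, int efd)
{
        size_t sz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
        struct vfio_irq_set *set = calloc(1, sz);
        int32_t fd = efd;

        if (!set)
                return NULL;

        set->argsz = (uint32_t)sz;
        set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        set->index = index;
        set->start = 0;
        set->count = 1;
        memcpy(set->data, &fd, sizeof(fd));

        return set;
}

/*
 * Create an eventfd and register it as the trigger for IRQ index.
 * Returns the eventfd on success, -1 on failure.
 */
int vfio_enable_irq(int device, unsigned int index)
{
        struct vfio_irq_set *set;
        int efd, ret;

        efd = eventfd(0, EFD_CLOEXEC);
        if (efd < 0)
                return -1;

        set = vfio_irq_set_eventfd(index, efd);
        ret = set ? ioctl(device, VFIO_DEVICE_SET_IRQS, set) : -1;
        free(set);

        if (ret < 0) {
                close(efd);
                return -1;
        }

        return efd;
}
```

The returned eventfd can then be waited on with read() or poll(); each
read() returns an 8-byte counter of the interrupts signalled since the
previous read.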

VFIO User API
-------------

Please see include/uapi/linux/vfio.h for complete API documentation.

VFIO bus driver API
-------------------

VFIO bus drivers, such as vfio-pci, make use of only a few interfaces
into VFIO core.  When devices are bound and unbound to the driver,
the driver should call vfio_register_group_dev() and
vfio_unregister_group_dev() respectively::

        void vfio_init_group_dev(struct vfio_device *device,
                                struct device *dev,
                                const struct vfio_device_ops *ops);
        void vfio_uninit_group_dev(struct vfio_device *device);
        int vfio_register_group_dev(struct vfio_device *device);
        void vfio_unregister_group_dev(struct vfio_device *device);

The driver should embed the vfio_device in its own structure and call
vfio_init_group_dev() to pre-configure it before going to registration
and call vfio_uninit_group_dev() after completing the un-registration.
vfio_register_group_dev() indicates to the core to begin tracking the
iommu_group of the specified dev and register the dev as owned by a VFIO bus
driver. Once vfio_register_group_dev() returns it is possible for userspace to
start accessing the driver, thus the driver should ensure it is completely
ready before calling it. The driver provides an ops structure for callbacks
similar to a file operations structure::

        struct vfio_device_ops {
                int     (*open)(struct vfio_device *vdev);
                void    (*release)(struct vfio_device *vdev);
                ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
                                size_t count, loff_t *ppos);
                ssize_t (*write)(struct vfio_device *vdev,
                                 const char __user *buf,
                                 size_t size, loff_t *ppos);
                long    (*ioctl)(struct vfio_device *vdev, unsigned int cmd,
                                 unsigned long arg);
                int     (*mmap)(struct vfio_device *vdev,
                                struct vm_area_struct *vma);
        };

Each function is passed the vdev that was originally registered
in the vfio_register_group_dev() call above.  This allows the bus driver
to obtain its private data using container_of().  The open/release
callbacks are issued when a new file descriptor is created for a
device (via VFIO_GROUP_GET_DEVICE_FD).  The ioctl interface provides
a direct pass through for VFIO_DEVICE_* ioctls.  The read/write/mmap
interfaces implement the device region access defined by the device's
own VFIO_DEVICE_GET_REGION_INFO ioctl.
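
The embedding and container_of() pattern can be illustrated outside the
kernel.  The following is a userspace mock-up only: struct vfio_device and
struct vfio_device_ops are reduced stand-ins defined locally so that the
sketch compiles on its own, container_of() is re-implemented with
offsetof(), and struct my_vfio_dev, my_open() and my_release() are
hypothetical driver names.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct vfio_device;

/* Reduced stand-in: only the two callbacks the sketch exercises. */
struct vfio_device_ops {
        int  (*open)(struct vfio_device *vdev);
        void (*release)(struct vfio_device *vdev);
};

/* Reduced stand-in for the core structure. */
struct vfio_device {
        const struct vfio_device_ops *ops;
};

/* Hypothetical bus driver state with the vfio_device embedded in it. */
struct my_vfio_dev {
        int open_count;
        struct vfio_device vdev;
};

static int my_open(struct vfio_device *vdev)
{
        /* Recover the driver's private structure from the embedded member. */
        struct my_vfio_dev *mdev = container_of(vdev, struct my_vfio_dev, vdev);

        mdev->open_count++;
        return 0;
}

static void my_release(struct vfio_device *vdev)
{
        struct my_vfio_dev *mdev = container_of(vdev, struct my_vfio_dev, vdev);

        mdev->open_count--;
}

static const struct vfio_device_ops my_ops = {
        .open    = my_open,
        .release = my_release,
};
```

Because the callbacks only ever receive the embedded vfio_device, no
separate private-data pointer is needed; the pointer arithmetic in
container_of() is enough to get back to the enclosing driver structure.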


PPC64 sPAPR implementation note
-------------------------------

This implementation has some specifics:

1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
   container is supported as an IOMMU table is allocated at boot time,
   one table per IOMMU group, which is a Partitionable Endpoint (PE)
   (a PE is often a PCI domain but not always).

   Newer systems (POWER8 with IODA2) have an improved hardware design which
   removes this limitation and allows multiple IOMMU groups per VFIO
   container.

2) The hardware supports so-called DMA windows: the PCI address range
   within which DMA transfer is allowed.  Any attempt to access address space
   outside the window leads to isolation of the whole PE.

3) PPC64 guests are paravirtualized but not fully emulated. There is an API
   to map/unmap pages for DMA, and it normally maps 1..32 pages per call and
   currently there is no way to reduce the number of calls. In order to make
   things faster, the map/unmap handling has been implemented in real mode,
   which provides excellent performance but has limitations such as the
   inability to do locked pages accounting in real time.

4) According to the sPAPR specification, a Partitionable Endpoint (PE) is an
   I/O subtree that can be treated as a unit for the purposes of partitioning
   and error recovery. A PE may be a single or multi-function IOA (IO Adapter),
   a function of a multi-function IOA, or multiple IOAs (possibly including
   switch and bridge structures above the multiple IOAs). PPC64 guests detect
   PCI errors and recover from them via EEH RTAS services, which work on the
   basis of additional ioctl commands.

   So 4 additional ioctls have been added:

        VFIO_IOMMU_SPAPR_TCE_GET_INFO
                returns the size and the start of the DMA window on the PCI bus.

        VFIO_IOMMU_ENABLE
                enables the container. The locked pages accounting
                is done at this point. This lets the user first learn what
                the DMA window is and adjust the rlimit before doing any
                real work.

        VFIO_IOMMU_DISABLE
                disables the container.

        VFIO_EEH_PE_OP
                provides an API for EEH setup, error detection and recovery.

   The code flow from the example above should be slightly changed::

        struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 };

        .....
        /* Add the group to the container */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

        /* Enable the IOMMU model we want */
        ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);

        /* Get additional sPAPR IOMMU info */
        struct vfio_iommu_spapr_tce_info spapr_iommu_info =
                                        { .argsz = sizeof(spapr_iommu_info) };
        ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &spapr_iommu_info);

        if (ioctl(container, VFIO_IOMMU_ENABLE))
                /* Cannot enable container, may be low rlimit */

        /* Allocate some space and set up a DMA mapping */
        dma_map.vaddr = (uintptr_t)mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE,
                                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        dma_map.size = 1024 * 1024;
        dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
        dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

        /* Check here if .iova/.size are within DMA window from spapr_iommu_info */
        ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

        /* Get a file descriptor for the device */
        device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");

        ....

        /* Gratuitous device reset and go... */
        ioctl(device, VFIO_DEVICE_RESET);

        /* Make sure EEH is supported */
        ioctl(container, VFIO_CHECK_EXTENSION, VFIO_EEH);

        /* Enable the EEH functionality on the device */
        pe_op.op = VFIO_EEH_PE_ENABLE;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);

        /* It is suggested to create an additional data struct to represent
         * the PE, and to put child devices belonging to the same IOMMU group
         * into the PE instance for later reference.
         */

        /* Check the PE's state and make sure it's in functional state */
        pe_op.op = VFIO_EEH_PE_GET_STATE;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);

        /* Save device state using pci_save_state().
         * EEH should be enabled on the specified device.
         */

        ....

        /* Inject an EEH error, which is expected to be caused by a 32-bit
         * config load.
         */
        pe_op.op = VFIO_EEH_PE_INJECT_ERR;
        pe_op.err.type = EEH_ERR_TYPE_32;
        pe_op.err.func = EEH_ERR_FUNC_LD_CFG_ADDR;
        pe_op.err.addr = 0ul;
        pe_op.err.mask = 0ul;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);

        ....

        /* When 0xFF's are returned from reading PCI config space or IO BARs
         * of the PCI device, check the PE's state to see if it has been
         * frozen.
         */
        pe_op.op = VFIO_EEH_PE_GET_STATE;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);

        /* Wait for pending PCI transactions to complete and don't
         * produce any more PCI traffic from/to the affected PE until
         * recovery is finished.
         */

        /* Enable IO for the affected PE and collect logs. Usually, the
         * standard part of PCI config space and the AER registers are
         * dumped as logs for further analysis.
         */
        pe_op.op = VFIO_EEH_PE_UNFREEZE_IO;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);

        /*
         * Issue PE reset: hot or fundamental reset. Usually, hot reset
         * is enough. However, the firmware of some PCI adapters would
         * require fundamental reset.
         */
        pe_op.op = VFIO_EEH_PE_RESET_HOT;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);
        pe_op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);

        /* Configure the PCI bridges for the affected PE */
        pe_op.op = VFIO_EEH_PE_CONFIGURE;
        ioctl(container, VFIO_EEH_PE_OP, &pe_op);

        /* Restore the state we saved at initialization time.
         * pci_restore_state() is good enough as an example.
         */

        /* Hopefully, the error is recovered successfully. Now, you can
         * resume PCI traffic to/from the affected PE.
         */

        ....

5) There is a v2 of the sPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
   VFIO_IOMMU_DISABLE and implements 2 new ioctls:
   VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
   (which are unsupported in v1 IOMMU).

   PPC64 paravirtualized guests generate a lot of map/unmap requests,
   and the handling of those includes pinning/unpinning pages and updating
   the mm::locked_vm counter to make sure we do not exceed the rlimit.
   The v2 IOMMU splits accounting and pinning into separate operations:

   - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
     receive a user space address and size of the block to be pinned.
     Bisecting is not supported and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY is
     expected to be called with the exact address and size used for
     registering the memory block. Userspace is not expected to call these
     often. The ranges are stored in a linked list in a VFIO container.

   - VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
     IOMMU table and do not do pinning; instead these check that the userspace
     address is from a pre-registered range.

   This separation helps in optimizing DMA for guests.

6) The sPAPR specification allows guests to have additional DMA window(s) on
   a PCI bus with a variable page size. Two ioctls have been added to support
   this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE.
   The platform has to support the functionality or an error will be returned
   to userspace. The existing hardware supports up to 2 DMA windows: one is
   2GB long, uses 4K pages and is called the "default 32bit window"; the other
   can be as big as the entire RAM and use a different page size. The second
   window is optional; guests create it at run-time if the guest driver
   supports 64bit DMA.

   VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and
   a number of TCE table levels (for when a TCE table is going to be big and
   the kernel may not be able to allocate enough physically contiguous
   memory). It creates a new window in the available slot and returns the bus
   address where the new window starts. Due to a hardware limitation,
   userspace cannot choose the location of DMA windows.

   VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window
   and removes it.
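
The sPAPR example code above only notes in a comment that .iova/.size should
be checked against the DMA window before mapping.  A minimal sketch of such
a check follows.  The helper names dma_range_in_window() and
spapr_mapping_fits() are hypothetical, and the sketch assumes that only the
default 32-bit window reported by VFIO_IOMMU_SPAPR_TCE_GET_INFO is in use.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Return 1 if [iova, iova + size) lies entirely inside the DMA window
 * [win_start, win_start + win_size), 0 otherwise.  The comparisons are
 * arranged so that none of the arithmetic can overflow.
 */
int dma_range_in_window(uint64_t win_start, uint64_t win_size,
                        uint64_t iova, uint64_t size)
{
        if (size == 0 || size > win_size)
                return 0;
        if (iova < win_start)
                return 0;

        return iova - win_start <= win_size - size;
}

/*
 * Query the default 32-bit window of an sPAPR TCE container and check a
 * planned mapping against it before calling VFIO_IOMMU_MAP_DMA.
 * Returns 1 if the mapping fits, 0 if it does not, -1 if the query failed.
 */
int spapr_mapping_fits(int container, uint64_t iova, uint64_t size)
{
        struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };

        if (ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info) < 0)
                return -1;

        return dma_range_in_window(info.dma32_window_start,
                                   info.dma32_window_size, iova, size);
}
```

Checking in userspace first is a convenience for clearer error reporting.
For windows created with VFIO_IOMMU_SPAPR_TCE_CREATE, the same range check
would apply to the returned start address and the requested window size.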

-------------------------------------------------------------------------------

.. [1] VFIO was originally an acronym for "Virtual Function I/O" in its
   initial implementation by Tom Lyon while at Cisco.  We've since
   outgrown the acronym, but it's catchy.

.. [2] "safe" also depends upon a device being "well behaved".  It's
   possible for multi-function devices to have backdoors between
   functions and even for single function devices to have alternative
   access to things like PCI config space through MMIO registers.  To
   guard against the former we can include additional precautions in the
   IOMMU driver to group multi-function PCI devices together
   (iommu=group_mf).  The latter we can't prevent, but the IOMMU should
   still provide isolation.  For PCI, SR-IOV Virtual Functions are the
   best indicator of "well behaved", as these are designed for
   virtualization usage models.

.. [3] As always there are trade-offs to virtual machine device
   assignment that are beyond the scope of VFIO.  It's expected that
   future IOMMU technologies will reduce some, but maybe not all, of
   these trade-offs.

.. [4] In this case the device is below a PCI bridge, so transactions
   from either function of the device are indistinguishable to the IOMMU::

        -[0000:00]-+-1e.0-[06]--+-0d.0
                                \-0d.1

        00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
0527
0528         00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)