.. SPDX-License-Identifier: GPL-2.0

============
NET_FAILOVER
============

Overview
========

The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev, and manages the primary and
standby slave netdevs that get registered via the generic failover
infrastructure.

The failover netdev acts as a master device and controls two slave devices. The
original paravirtual interface is registered as the 'standby' slave netdev, and
a passthru/VF device with the same MAC is registered as the 'primary' slave
netdev. Both the 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via the 'failover' netdev.
The 'failover' netdev chooses the 'primary' netdev as the default for transmits
when it is available with link up and running.

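The transmit preference described above can be approximated from user space.
The following is a minimal sketch, not the driver's implementation: it assumes
that reading the sysfs 'carrier' attribute is a reasonable stand-in for "link
up and running", and the function names ``xmit_ready`` and ``active_datapath``
as well as the optional sysfs directory argument are hypothetical helpers
introduced only for illustration.

```shell
#!/bin/sh
# Sketch: approximate the failover netdev's transmit choice using sysfs.
# The optional NETDIR argument (an assumption of this sketch) lets the
# logic be exercised against a fake directory tree.

# xmit_ready IFACE [NETDIR]: succeed if IFACE exists, is up and has carrier.
# The 'carrier' attribute can only be read while the interface is
# administratively up, so a failed read is treated as "not ready".
xmit_ready() {
    carrier=$(cat "${2:-/sys/class/net}/$1/carrier" 2>/dev/null) || return 1
    [ "$carrier" = "1" ]
}

# active_datapath PRIMARY STANDBY [NETDIR]: print the slave expected to
# carry transmits, preferring the primary; print "none" if neither is ready.
active_datapath() {
    if xmit_ready "$1" "$3"; then
        echo "$1"
    elif xmit_ready "$2" "$3"; then
        echo "$2"
    else
        echo "none"
    fi
}
```

Inside a guest, ``active_datapath <primary> <standby>`` could be called with
the names of the two slave interfaces to see which one is currently expected
to carry traffic.
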
This can be used by paravirtual drivers to enable an alternate low latency
datapath. It also enables hypervisor controlled live migration of a VM with
a direct attached VF by failing over to the paravirtual datapath when the VF
is unplugged.

virtio-net accelerated datapath: STANDBY mode
=============================================

net_failover enables a hypervisor controlled accelerated datapath to virtio-net
enabled VMs in a transparent manner with no or minimal guest userspace changes.

To support this, the hypervisor needs to enable the VIRTIO_NET_F_STANDBY
feature on the virtio-net interface and assign the same MAC address to both
the virtio-net and VF interfaces.

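From inside the guest, one way to check whether the STANDBY feature was
negotiated is to inspect the virtio device's sysfs 'features' attribute, which
holds one '0' or '1' character per feature bit, starting with bit 0.
VIRTIO_NET_F_STANDBY is bit 62. The sketch below is illustrative only; the
helper name ``has_standby_feature`` is hypothetical, and the interface name in
the usage comment is an assumption.

```shell
#!/bin/sh
# Sketch: test bit 62 (VIRTIO_NET_F_STANDBY) in a virtio feature string of
# the form exposed by /sys/class/net/<iface>/device/features.
VIRTIO_NET_F_STANDBY=62

# has_standby_feature FEATURES: succeed if the STANDBY bit is '1'.
has_standby_feature() {
    # cut counts characters from 1, so bit N is character N+1.
    bit=$(printf '%s' "$1" | cut -c "$((VIRTIO_NET_F_STANDBY + 1))")
    [ "$bit" = "1" ]
}

# Hypothetical usage inside the guest (interface name is an assumption):
#   has_standby_feature "$(cat /sys/class/net/ens10nsby/device/features)" \
#       && echo "STANDBY negotiated"
```
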
Here is an example libvirt XML snippet that shows such a configuration:
::

  <interface type='network'>
    <mac address='52:54:00:00:12:53'/>
    <source network='enp66s0f0_br'/>
    <target dev='tap01'/>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
    <link state='down'/>
    <teaming type='persistent'/>
    <alias name='ua-backup0'/>
  </interface>
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

In this configuration, the first device definition is for the virtio-net
interface, which acts as the 'persistent' device, indicating that this
interface will always be plugged in. This is specified by the 'teaming' tag with
the required attribute 'type' set to 'persistent'. The link state for the
virtio-net device is set to 'down' to ensure that the 'failover' netdev prefers
the VF passthrough device for normal communication. The virtio-net device will
be brought UP during live migration to allow uninterrupted communication.

The second device definition is for the VF passthrough interface. Here the
'teaming' tag is provided with type 'transient', indicating that this device may
periodically be unplugged. A second attribute, 'persistent', is provided and
points to the alias name declared for the virtio-net device.

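Because both interface definitions must carry the same MAC address, a quick
sanity check on the XML can catch a mismatch before the VM is started. The
following is a rough sketch under stated assumptions: it only recognises
``<mac address='...'/>`` elements written with single quotes on a single line,
and the helper names ``xml_macs`` and ``macs_match`` are hypothetical.

```shell
#!/bin/sh
# Sketch: verify that every <mac address='...'/> in an XML file is the same.

# xml_macs FILE: print each MAC found in a <mac address='...'/> element.
xml_macs() {
    sed -n "s/.*<mac address='\([0-9A-Fa-f:]*\)'.*/\1/p" "$1"
}

# macs_match FILE: succeed only if FILE holds at least one MAC and all of
# them are equal (compared case-insensitively).
macs_match() {
    distinct=$(xml_macs "$1" | tr 'A-F' 'a-f' | sort -u | wc -l)
    [ "$((distinct))" -eq 1 ]
}
```

For example, ``macs_match interfaces.xml || echo "MAC mismatch"`` could be run
on a file holding the two interface definitions (the file name is an
assumption).
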
Booting a VM with the above configuration will result in the following 3
interfaces being created in the VM:
::

  4: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
      inet 192.168.12.53/24 brd 192.168.12.255 scope global dynamic ens10
         valid_lft 42482sec preferred_lft 42482sec
      inet6 fe80::97d8:db2:8c10:b6d6/64 scope link
         valid_lft forever preferred_lft forever
  5: ens10nsby: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master ens10 state DOWN group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
  7: ens11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ens10 state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff

Here, ens10 is the 'failover' master interface, ens10nsby is the slave 'standby'
virtio-net interface, and ens11 is the slave 'primary' VF passthrough interface.

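The same master/slave relationship can be read from sysfs, where a master
netdev exposes a ``lower_<slave>`` link for each enslaved device. The sketch
below lists the slaves of a failover master and guesses their role from the
bound driver. This is only a heuristic for the virtio-net case: it assumes
that the slave bound to the ``virtio_net`` driver is the 'standby' and any
other slave is the 'primary'. The function name and the optional sysfs
directory argument are hypothetical.

```shell
#!/bin/sh
# Sketch: list the slaves of a failover master with a guessed role.

# list_failover_slaves MASTER [NETDIR]
# Prints one "<slave> <role> <driver>" line per lower device of MASTER.
list_failover_slaves() {
    master=$1
    netdir=${2:-/sys/class/net}
    for link in "$netdir/$master"/lower_*; do
        # Skip the unexpanded pattern when MASTER has no lower devices.
        [ -L "$link" ] || continue
        slave=${link##*/lower_}
        drvlink=$(readlink "$netdir/$slave/device/driver" 2>/dev/null) \
            || drvlink=unknown
        driver=${drvlink##*/}
        # Heuristic (assumption): the virtio-net slave is the standby.
        if [ "$driver" = "virtio_net" ]; then
            role=standby
        else
            role=primary
        fi
        printf '%s %s %s\n' "$slave" "$role" "$driver"
    done
}
```

In the example above, ``list_failover_slaves ens10`` would be expected to
report ens10nsby as the standby and ens11 as the primary, together with
whatever driver names are bound in that particular VM.
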
One point to note here is that some user space network configuration daemons,
such as systemd-networkd and ifupdown, do not understand the 'net_failover'
device. On the first boot, the VM might end up with both the 'failover' device
and the VF acquiring IP addresses (either the same or different) from the DHCP
server, which results in a lack of connectivity to the VM. Some tweaks might
therefore be needed to these network configuration daemons to make sure that an
IP address is received only on the 'failover' device.

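One way to express "configure only the 'failover' device" in a hook script is
to skip any interface that is enslaved to a master, since such interfaces
expose a 'master' entry under their sysfs directory. The helper below is a
minimal sketch; the name ``is_enslaved`` and the optional sysfs directory
argument are hypothetical. Note that the test also matches bridge and bond
slaves, which normally should not be configured directly either.

```shell
#!/bin/sh
# Sketch: decide whether an interface should be left unconfigured because
# it is a slave of another netdev.

# is_enslaved IFACE [NETDIR]: succeed if IFACE has a master device.
is_enslaved() {
    [ -d "${2:-/sys/class/net}/$1/master" ]
}

# Hypothetical usage in a per-interface hook, where INTERFACE is assumed to
# be supplied by the calling daemon:
#   if is_enslaved "$INTERFACE"; then exit 0; fi
```
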
Below is the patch snippet used with the 'cloud-ifupdown-helper' script found
on Debian cloud images:

::

  @@ -27,6 +27,8 @@ do_setup() {
       local working="$cfgdir/.$INTERFACE"
       local final="$cfgdir/$INTERFACE"

  +    if [ -d "/sys/class/net/${INTERFACE}/master" ]; then exit 0; fi
  +
       if ifup --no-act "$INTERFACE" > /dev/null 2>&1; then
           # interface is already known to ifupdown, no need to generate cfg
           log "Skipping configuration generation for $INTERFACE"


Live Migration of a VM with SR-IOV VF & virtio-net in STANDBY mode
==================================================================

net_failover also enables hypervisor controlled live migration to be supported
with VMs that have direct attached SR-IOV VF devices by automatic failover to
the paravirtual datapath when the VF is unplugged.

Here is a sample script that shows the steps to initiate live migration from
the source hypervisor. Note: It is assumed that the VM is connected to a
software bridge 'br0' which has a single VF attached to it along with the vnet
device to the VM. This is not the VF that was passed through to the VM (seen in
the vf.xml file).
::

  # cat vf.xml
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

  # Source Hypervisor migrate.sh
  #!/bin/bash

  DOMAIN=vm-01
  PF=ens6np0
  VF=ens6v1             # VF attached to the bridge.
  VF_NUM=1
  TAP_IF=vmtap01        # virtio-net interface in the VM.
  VF_XML=vf.xml

  MAC=52:54:00:00:12:53
  ZERO_MAC=00:00:00:00:00:00

  # Set the virtio-net interface up.
  virsh domif-setlink $DOMAIN $TAP_IF up

  # Remove the VF that was passed through to the VM.
  virsh detach-device --live --config $DOMAIN $VF_XML

  ip link set $PF vf $VF_NUM mac $ZERO_MAC

  # Add FDB entry for traffic to continue going to the VM via
  # the VF -> br0 -> vnet interface path.
  bridge fdb add $MAC dev $VF
  bridge fdb add $MAC dev $TAP_IF master

  # Migrate the VM. REMOTE_HOST is not set by this script and must be
  # provided by the caller.
  virsh migrate --live --persistent $DOMAIN qemu+ssh://$REMOTE_HOST/system

  # Clean up FDB entries after migration completes.
  bridge fdb del $MAC dev $VF
  bridge fdb del $MAC dev $TAP_IF master

On the destination hypervisor, a shared bridge 'br0' is created before migration
starts, and a VF from the destination PF is added to the bridge. As on the
source, an appropriate FDB entry is added.

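The destination preparation is only described in prose above. The following
is a hedged sketch of what it could look like, not a tested procedure: it
prints the commands instead of running them so that they can be reviewed
first. The helper name ``dest_prep_cmds`` is hypothetical, and the exact set
of commands (in particular bringing the bridge up, and which FDB entries are
needed) is an assumption that depends on the host setup.

```shell
#!/bin/sh
# Sketch: print (do not run) commands that create a bridge, attach a VF to
# it and add an FDB entry for the VM's MAC on that VF.

# dest_prep_cmds BRIDGE VF MAC
dest_prep_cmds() {
    bridge_if=$1
    vf=$2
    mac=$3
    echo "ip link add name $bridge_if type bridge"
    echo "ip link set dev $vf master $bridge_if"
    echo "ip link set dev $bridge_if up"
    echo "bridge fdb add $mac dev $vf"
}

# Hypothetical usage with the names that appear in this document:
#   dest_prep_cmds br0 ens36v0 52:54:00:00:12:53
```
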
The following script is executed on the destination hypervisor once migration
completes. It reattaches the VF to the VM and brings down the virtio-net
interface.

::

  # reattach-vf.sh
  #!/bin/bash

  bridge fdb del 52:54:00:00:12:53 dev ens36v0
  bridge fdb del 52:54:00:00:12:53 dev vmtap01 master
  virsh attach-device --config --live vm-01 vf.xml
  virsh domif-setlink vm-01 vmtap01 down