0001 .. SPDX-License-Identifier: GPL-2.0
0002
0003 #########
0004 UML HowTo
0005 #########
0006
0007 .. contents:: :local:
0008
0009 ************
0010 Introduction
0011 ************
0012
0013 Welcome to User Mode Linux
0014
User Mode Linux is the first Open Source virtualization platform (first
release date 1999) and the second virtualization platform for an x86 PC.
0017
0018 How is UML Different from a VM using Virtualization package X?
0019 ==============================================================
0020
0021 We have come to assume that virtualization also means some level of
0022 hardware emulation. In fact, it does not. As long as a virtualization
0023 package provides the OS with devices which the OS can recognize and
0024 has a driver for, the devices do not need to emulate real hardware.
0025 Most OSes today have built-in support for a number of "fake"
0026 devices used only under virtualization.
0027 User Mode Linux takes this concept to the ultimate extreme - there
0028 is not a single real device in sight. It is 100% artificial or if
0029 we use the correct term 100% paravirtual. All UML devices are abstract
0030 concepts which map onto something provided by the host - files, sockets,
0031 pipes, etc.
0032
0033 The other major difference between UML and various virtualization
0034 packages is that there is a distinct difference between the way the UML
0035 kernel and the UML programs operate.
0036 The UML kernel is just a process running on Linux - same as any other
0037 program. It can be run by an unprivileged user and it does not require
0038 anything in terms of special CPU features.
0039 The UML userspace, however, is a bit different. The Linux kernel on the
0040 host machine assists UML in intercepting everything the program running
0041 on a UML instance is trying to do and making the UML kernel handle all
0042 of its requests.
This is different from other virtualization packages, which do not
distinguish between the guest kernel and guest programs. This difference
gives UML a number of advantages and disadvantages compared to, say,
QEMU, which we will cover later in this document.
0047
0048
0049 Why Would I Want User Mode Linux?
0050 =================================
0051
0052
* If the User Mode Linux kernel crashes, your host kernel is still fine. It
  is not accelerated in any way (vhost, kvm, etc) and it is not trying to
  access any devices directly. It is, in fact, a process like any other.
0056
0057 * You can run a usermode kernel as a non-root user (you may need to
0058 arrange appropriate permissions for some devices).
0059
0060 * You can run a very small VM with a minimal footprint for a specific
0061 task (for example 32M or less).
0062
0063 * You can get extremely high performance for anything which is a "kernel
0064 specific task" such as forwarding, firewalling, etc while still being
0065 isolated from the host kernel.
0066
0067 * You can play with kernel concepts without breaking things.
0068
0069 * You are not bound by "emulating" hardware, so you can try weird and
0070 wonderful concepts which are very difficult to support when emulating
0071 real hardware such as time travel and making your system clock
0072 dependent on what UML does (very useful for things like tests).
0073
0074 * It's fun.
0075
0076 Why not to run UML
0077 ==================
0078
0079 * The syscall interception technique used by UML makes it inherently
0080 slower for any userspace applications. While it can do kernel tasks
0081 on par with most other virtualization packages, its userspace is
0082 **slow**. The root cause is that UML has a very high cost of creating
0083 new processes and threads (something most Unix/Linux applications
0084 take for granted).
0085
0086 * UML is strictly uniprocessor at present. If you want to run an
0087 application which needs many CPUs to function, it is clearly the
0088 wrong choice.
0089
0090 ***********************
0091 Building a UML instance
0092 ***********************
0093
0094 There is no UML installer in any distribution. While you can use off
0095 the shelf install media to install into a blank VM using a virtualization
0096 package, there is no UML equivalent. You have to use appropriate tools on
0097 your host to build a viable filesystem image.
0098
0099 This is extremely easy on Debian - you can do it using debootstrap. It is
0100 also easy on OpenWRT - the build process can build UML images. All other
0101 distros - YMMV.
0102
0103 Creating an image
0104 =================
0105
0106 Create a sparse raw disk image::
0107
0108 # dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G
0109
0110 This will create a 16G disk image. The OS will initially allocate only one
0111 block and will allocate more as they are written by UML. As of kernel
0112 version 4.19 UML fully supports TRIM (as usually used by flash drives).
0113 Using TRIM inside the UML image by specifying discard as a mount option
0114 or by running ``tune2fs -o discard /dev/ubdXX`` will request UML to
0115 return any unused blocks to the OS.
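
The sparse allocation can be observed from the host. The following is a
minimal sketch, assuming GNU coreutils (``dd`` size suffixes, ``stat -c``);
``make_sparse_image`` is a hypothetical helper name, and the example uses a
scaled-down 16M image so that it is quick to try::

   #!/bin/sh
   # make_sparse_image PATH SIZE - hypothetical helper: write a single
   # byte at offset SIZE so that everything before it is a hole.
   make_sparse_image() {
       dd if=/dev/zero of="$1" bs=1 count=1 seek="$2" 2>/dev/null
   }

   img=$(mktemp)
   make_sparse_image "$img" 16M
   # apparent size is the seek offset plus the one byte written
   echo "apparent size: $(stat -c %s "$img") bytes"
   # allocated size depends on the host filesystem
   echo "allocated:     $(du -k "$img" | cut -f1) KiB"
   rm -f "$img"

On a filesystem which supports sparse files the allocated size should be
far smaller than the apparent size.
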
0116
0117 Create a filesystem on the disk image and mount it::
0118
0119 # mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt
0120
This example uses ext4; any other filesystem such as ext3, btrfs, xfs,
jfs, etc will work too.
0123
0124 Create a minimal OS installation on the mounted filesystem::
0125
0126 # debootstrap buster /mnt http://deb.debian.org/debian
0127
0128 debootstrap does not set up the root password, fstab, hostname or
0129 anything related to networking. It is up to the user to do that.
0130
0131 Set the root password - the easiest way to do that is to chroot into the
0132 mounted image::
0133
   # chroot /mnt
   # passwd
   # exit
0137
0138 Edit key system files
0139 =====================
0140
0141 UML block devices are called ubds. The fstab created by debootstrap
0142 will be empty and it needs an entry for the root file system::
0143
   /dev/ubd0   /   ext4    discard,errors=remount-ro  0       1
0145
0146 The image hostname will be set to the same as the host on which you
0147 are creating its image. It is a good idea to change that to avoid
0148 "Oh, bummer, I rebooted the wrong machine".
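
These edits can be scripted against the mounted image. This is a minimal
sketch, assuming the image is mounted at the directory passed as the first
argument; ``write_image_basics`` and the ``uml-test`` hostname are
hypothetical names used only for illustration::

   #!/bin/sh
   # write_image_basics ROOT HOSTNAME - hypothetical helper: write a root
   # filesystem entry to ROOT/etc/fstab and set ROOT/etc/hostname.
   write_image_basics() {
       root=$1
       name=$2
       mkdir -p "$root/etc"
       # fstab fields: device, mount point, type, options, dump, pass
       printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
           /dev/ubd0 / ext4 discard,errors=remount-ro 0 1 > "$root/etc/fstab"
       printf '%s\n' "$name" > "$root/etc/hostname"
   }

   # Example (as root, with the image mounted on /mnt):
   #   write_image_basics /mnt uml-test

Note that the helper overwrites both files, which is fine for a freshly
debootstrapped image but not for one which has already been customised.
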
0149
UML supports two classes of network devices. The older uml_net devices,
which are scheduled for obsoletion, are called ethX. The newer vector IO
devices are significantly faster and support some standard virtual
network encapsulations like Ethernet over GRE and Ethernet over L2TPv3.
These are called vecX.
0155
0156 Depending on which one is in use, ``/etc/network/interfaces`` will
0157 need entries like::
0158
0159 # legacy UML network devices
0160 auto eth0
0161 iface eth0 inet dhcp
0162
0163 # vector UML network devices
0164 auto vec0
0165 iface vec0 inet dhcp
0166
We now have a UML image which is nearly ready to run. All we need is a
UML kernel and modules for it.
0169
0170 Most distributions have a UML package. Even if you intend to use your own
0171 kernel, testing the image with a stock one is always a good start. These
0172 packages come with a set of modules which should be copied to the target
0173 filesystem. The location is distribution dependent. For Debian these
0174 reside under /usr/lib/uml/modules. Copy recursively the content of this
0175 directory to the mounted UML filesystem::
0176
0177 # cp -rax /usr/lib/uml/modules /mnt/lib/modules
0178
0179 If you have compiled your own kernel, you need to use the usual "install
0180 modules to a location" procedure by running::
0181
   # make INSTALL_MOD_PATH=/mnt modules_install

This will install modules into /mnt/lib/modules/$(KERNELRELEASE).
To specify the full module installation path, use::

   # make 'MODLIB=/mnt/lib/modules/$(KERNELRELEASE)' modules_install
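
The relationship between the two variables can be sketched in shell. This
assumes the kernel Makefile's usual definition of ``MODLIB`` as
``$(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)``; ``modlib_path`` is a
hypothetical helper and ``5.10.0`` is only a placeholder release string::

   #!/bin/sh
   # modlib_path INSTALL_MOD_PATH KERNELRELEASE - hypothetical helper that
   # prints the directory which modules_install would populate.
   modlib_path() {
       # strip one trailing slash so "/mnt/" and "/mnt" give the same result
       printf '%s/lib/modules/%s\n' "${1%/}" "$2"
   }

   modlib_path /mnt 5.10.0

In other words, ``INSTALL_MOD_PATH`` names the root of the target
filesystem, while ``MODLIB`` names the final per-release directory.
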
0188
0189 At this point the image is ready to be brought up.
0190
0191 *************************
0192 Setting Up UML Networking
0193 *************************
0194
0195 UML networking is designed to emulate an Ethernet connection. This
0196 connection may be either point-to-point (similar to a connection
0197 between machines using a back-to-back cable) or a connection to a
0198 switch. UML supports a wide variety of means to build these
0199 connections to all of: local machine, remote machine(s), local and
0200 remote UML and other VM instances.
0201
0202
+-----------+--------+------------------------------------+------------+
| Transport | Type   | Capabilities                       | Throughput |
+===========+========+====================================+============+
| tap       | vector | checksum, tso                      | > 8Gbit    |
+-----------+--------+------------------------------------+------------+
| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| EoL2TPv3  | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| fd        | vector | dependent on fd type               | varies     |
+-----------+--------+------------------------------------+------------+
| tuntap    | legacy | none                               | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| daemon    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| socket    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| pcap      | legacy | rx only                            | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| ethertap  | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| vde       | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
0232
* All transports which have tso and checksum offloads can deliver speeds
  approaching 10Gbit on TCP streams.

* All transports which have multi-packet rx and/or tx can deliver
  packet rates of 1Mpps or more.

* All legacy transports are generally limited to ~600-700Mbit and 0.05Mpps.
0240
0241 * GRE and L2TPv3 allow connections to all of: local machine, remote
0242 machines, remote network devices and remote UML instances.
0243
0244 * Socket allows connections only between UML instances.
0245
0246 * Daemon and bess require running a local switch. This switch may be
0247 connected to the host as well.
0248
0249
0250 Network configuration privileges
0251 ================================
0252
0253 The majority of the supported networking modes need ``root`` privileges.
0254 For example, in the legacy tuntap networking mode, users were required
0255 to be part of the group associated with the tunnel device.
0256
For newer network drivers like the vector transports, ``root`` privilege
is required to issue the ioctl which sets up the tun interface and/or to
use raw sockets where needed.

This can be achieved by granting the user a particular capability instead
of running UML as root. In the case of the vector transports, a user can
add the capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` to the uml binary.
From then on, UML can be run with normal user privileges and still have
full networking.
0266
0267 For example::
0268
0269 # sudo setcap cap_net_raw,cap_net_admin+ep linux
0270
0271 Configuring vector transports
0272 ===============================
0273
0274 All vector transports support a similar syntax:
0275
0276 If X is the interface number as in vec0, vec1, vec2, etc, the general
0277 syntax for options is::
0278
0279 vecX:transport="Transport Name",option=value,option=value,...,option=value
0280
0281 Common options
0282 --------------
0283
0284 These options are common for all transports:
0285
0286 * ``depth=int`` - sets the queue depth for vector IO. This is the
0287 amount of packets UML will attempt to read or write in a single
0288 system call. The default number is 64 and is generally sufficient
0289 for most applications that need throughput in the 2-4 Gbit range.
0290 Higher speeds may require larger values.
0291
* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.
0293
0294 * ``gro=[0,1]`` - sets GRO off or on. Enables receive/transmit offloads.
0295 The effect of this option depends on the host side support in the transport
0296 which is being configured. In most cases it will enable TCP segmentation and
0297 RX/TX checksumming offloads. The setting must be identical on the host side
0298 and the UML side. The UML kernel will produce warnings if it is not.
0299 For example, GRO is enabled by default on local machine interfaces
0300 (e.g. veth pairs, bridge, etc), so it should be enabled in UML in the
0301 corresponding UML transports (raw, tap, hybrid) in order for networking to
0302 operate correctly.
0303
0304 * ``mtu=int`` - sets the interface MTU
0305
* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
  in case a packet needs to be re-encapsulated into, for instance, VXLAN.
0308
0309 * ``vec=0`` - disable multipacket IO and fall back to packet at a
0310 time mode
0311
0312 Shared Options
0313 --------------
0314
0315 * ``ifname=str`` Transports which bind to a local network interface
0316 have a shared option - the name of the interface to bind to.
0317
* ``src, dst, srcport, dstport`` - all transports which use sockets
  which have the notion of source and destination and/or source port
  and destination port use these to specify them.
0321
* ``v6=[0,1]`` - specifies whether a v6 connection is desired, for all
  transports which operate over IP. Additionally, for transports that
  have some differences in the way they operate over v4 and v6 (for example
  EoL2TPv3), it sets the correct mode of operation. In the absence of this
  option, the socket type is determined based on what the src and dst
  arguments resolve/parse to.
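
Because every vector transport shares this ``option=value`` syntax, the
device argument can be assembled mechanically. A minimal sketch follows;
``vec_spec`` is a hypothetical helper, not part of UML, it does not validate
option names, and the MAC address is a placeholder::

   #!/bin/sh
   # vec_spec INDEX TRANSPORT [option=value ...] - hypothetical helper
   # that joins its arguments into a vecX device specification.
   vec_spec() {
       spec="vec$1:transport=$2"
       shift 2
       for opt in "$@"; do
           spec="$spec,$opt"
       done
       printf '%s\n' "$spec"
   }

   vec_spec 0 tap ifname=tap0 depth=128 gro=1
   vec_spec 1 raw ifname=p-veth0 mac=02:00:00:00:00:01 mtu=1500

The output of each call can be passed as-is on the UML command line.
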
0328
0329 tap transport
0330 -------------
0331
0332 Example::
0333
0334 vecX:transport=tap,ifname=tap0,depth=128,gro=1
0335
This will connect vecX to tap0 on the host. tap0 must already exist (for
example, created using tunctl) and be UP.
0338
0339 tap0 can be configured as a point-to-point interface and given an IP
0340 address so that UML can talk to the host. Alternatively, it is possible
0341 to connect UML to a tap interface which is connected to a bridge.
0342
0343 While tap relies on the vector infrastructure, it is not a true vector
0344 transport at this point, because Linux does not support multi-packet
0345 IO on tap file descriptors for normal userspace apps like UML. This
0346 is a privilege which is offered only to something which can hook up
0347 to it at kernel level via specialized interfaces like vhost-net. A
0348 vhost-net like helper for UML is planned at some point in the future.
0349
0350 Privileges required: tap transport requires either:
0351
0352 * tap interface to exist and be created persistent and owned by the
0353 UML user using tunctl. Example ``tunctl -u uml-user -t tap0``
0354
0355 * binary to have ``CAP_NET_ADMIN`` privilege
0356
0357 hybrid transport
0358 ----------------
0359
0360 Example::
0361
0362 vecX:transport=hybrid,ifname=tap0,depth=128,gro=1
0363
0364 This is an experimental/demo transport which couples tap for transmit
0365 and a raw socket for receive. The raw socket allows multi-packet
0366 receive resulting in significantly higher packet rates than normal tap.
0367
0368 Privileges required: hybrid requires ``CAP_NET_RAW`` capability by
0369 the UML user as well as the requirements for the tap transport.
0370
0371 raw socket transport
0372 --------------------
0373
0374 Example::
0375
0376 vecX:transport=raw,ifname=p-veth0,depth=128,gro=1
0377
0378
This transport uses vector IO on raw sockets. While you can bind to any
interface including a physical one, the most common use is to bind to
the "peer" side of a veth pair with the other side configured on the
host.
0383
0384 Example host configuration for Debian:
0385
0386 **/etc/network/interfaces**::
0387
   auto veth0
   iface veth0 inet static
       address 192.168.4.1
       netmask 255.255.255.252
       broadcast 192.168.4.3
       pre-up ip link add veth0 type veth peer name p-veth0 && \
         ifconfig p-veth0 up
0395
0396 UML can now bind to p-veth0 like this::
0397
0398 vec0:transport=raw,ifname=p-veth0,depth=128,gro=1
0399
0400
If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.252
it can talk to the host on 192.168.4.1.
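
Whether two addresses can reach each other directly depends on both
falling inside the same network once the netmask is applied. The check can
be sketched in POSIX shell arithmetic; ``ip_to_int`` and ``same_subnet``
are hypothetical helpers written for this illustration::

   #!/bin/sh
   # ip_to_int A.B.C.D - print a dotted-quad IPv4 address as one integer.
   ip_to_int() {
       old_ifs=$IFS
       IFS=.
       set -- $1
       IFS=$old_ifs
       echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
   }

   # same_subnet ADDR1 ADDR2 NETMASK - succeed if both addresses are in
   # the same network under NETMASK.
   same_subnet() {
       a=$(ip_to_int "$1")
       b=$(ip_to_int "$2")
       m=$(ip_to_int "$3")
       [ $(( a & m )) -eq $(( b & m )) ]
   }

   if same_subnet 192.168.4.1 192.168.4.2 255.255.255.252; then
       echo "host and guest share 192.168.4.0/30"
   fi

With the /30 netmask from the host configuration, only 192.168.4.1 and
192.168.4.2 are usable, one for each end of the veth pair.
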
0403
0404 The raw transport also provides some support for offloading some of the
0405 filtering to the host. The two options to control it are:
0406
0407 * ``bpffile=str`` filename of raw bpf code to be loaded as a socket filter
0408
0409 * ``bpfflash=int`` 0/1 allow loading of bpf from inside User Mode Linux.
0410 This option allows the use of the ethtool load firmware command to
0411 load bpf code.
0412
0413 In either case the bpf code is loaded into the host kernel. While this is
0414 presently limited to legacy bpf syntax (not ebpf), it is still a security
0415 risk. It is not recommended to allow this unless the User Mode Linux
0416 instance is considered trusted.
0417
Privileges required: raw socket transport requires ``CAP_NET_RAW``
capability.
0420
0421 GRE socket transport
0422 --------------------
0423
0424 Example::
0425
0426 vecX:transport=gre,src=$src_host,dst=$dst_host
0427
0428
0429 This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
0430 ``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
0431 endpoint at host dst_host. ``GRE`` supports the following additional
0432 options:
0433
* ``rx_key=int`` - GRE 32-bit integer key for rx packets, if set,
  ``tx_key`` must be set too
0436
0437 * ``tx_key=int`` - GRE 32-bit integer key for tx packets, if set
0438 ``rx_key`` must be set too
0439
0440 * ``sequence=[0,1]`` - enable GRE sequence
0441
0442 * ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
0443 on each packet (needed to interoperate with some really broken
0444 implementations)
0445
0446 * ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively
0447
0448 * GRE checksum is not presently supported
0449
0450 GRE has a number of caveats:
0451
0452 * You can use only one GRE connection per IP address. There is no way to
0453 multiplex connections as each GRE tunnel is terminated directly on
0454 the UML instance.
0455
0456 * The key is not really a security feature. While it was intended as such
0457 its "security" is laughable. It is, however, a useful feature to
0458 ensure that the tunnel is not misconfigured.
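
The rule that ``rx_key`` and ``tx_key`` must be set together can be
enforced when the argument is generated. This is only a sketch;
``gre_spec`` is a hypothetical helper and the key value 42 is arbitrary::

   #!/bin/sh
   # gre_spec INDEX SRC DST [RX_KEY TX_KEY] - hypothetical helper that
   # prints a GRE device specification and rejects a lone key.
   gre_spec() {
       case $# in
           3) printf 'vec%s:transport=gre,src=%s,dst=%s\n' "$1" "$2" "$3" ;;
           5) printf 'vec%s:transport=gre,src=%s,dst=%s,rx_key=%s,tx_key=%s\n' \
                  "$1" "$2" "$3" "$4" "$5" ;;
           *) echo "usage: gre_spec INDEX SRC DST [RX_KEY TX_KEY]" >&2
              return 1 ;;
       esac
   }

   gre_spec 0 192.168.129.1 192.168.128.1
   gre_spec 0 192.168.129.1 192.168.128.1 42 42

Passing only one key makes the helper fail instead of producing an
argument which sets ``rx_key`` without ``tx_key``.
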
0459
An example configuration for a Linux host with a local address of
192.168.128.1 to connect to a UML instance at 192.168.129.1:
0462
0463 **/etc/network/interfaces**::
0464
   auto gt0
   iface gt0 inet static
       address 10.0.0.1
       netmask 255.255.255.0
       broadcast 10.0.0.255
       mtu 1500
       pre-up ip link add gt0 type gretap local 192.168.128.1 \
              remote 192.168.129.1 || true
       down ip link del gt0 || true
0474
Additionally, GRE has been tested against a variety of network equipment.
0476
0477 Privileges required: GRE requires ``CAP_NET_RAW``
0478
0479 l2tpv3 socket transport
0480 -----------------------
0481
**Warning**. L2TPv3 has a "bug". It is the "bug" known as "has more
options than GNU ls". While it has some advantages, there are usually
easier (and less verbose) ways to connect a UML instance to something.
For example, most devices which support L2TPv3 also support GRE.
0486
0487 Example::
0488
0489 vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff
0490
0491 This will configure an Ethernet over L2TPv3 fixed tunnel which will
0492 connect the UML instance to a L2TPv3 endpoint at host $dst_host using
0493 the L2TPv3 UDP flavour and UDP destination port $dst_port.
0494
0495 L2TPv3 always requires the following additional options:
0496
0497 * ``rx_session=int`` - l2tpv3 32-bit integer session for rx packets
0498
0499 * ``tx_session=int`` - l2tpv3 32-bit integer session for tx packets
0500
0501 As the tunnel is fixed these are not negotiated and they are
0502 preconfigured on both ends.
0503
0504 Additionally, L2TPv3 supports the following optional parameters.
0505
0506 * ``rx_cookie=int`` - l2tpv3 32-bit integer cookie for rx packets - same
0507 functionality as GRE key, more to prevent misconfiguration than provide
0508 actual security
0509
0510 * ``tx_cookie=int`` - l2tpv3 32-bit integer cookie for tx packets
0511
0512 * ``cookie64=[0,1]`` - use 64-bit cookies instead of 32-bit.
0513
0514 * ``counter=[0,1]`` - enable l2tpv3 counter
0515
0516 * ``pin_counter=[0,1]`` - pretend that the counter is always reset on
0517 each packet (needed to interoperate with some really broken
0518 implementations)
0519
0520 * ``v6=[0,1]`` - force v6 sockets
0521
0522 * ``udp=[0,1]`` - use raw sockets (0) or UDP (1) version of the protocol
0523
0524 L2TPv3 has a number of caveats:
0525
* You can use only one connection per IP address in raw mode. There is
  no way to multiplex connections as each L2TPv3 tunnel is terminated
  directly on the UML instance. UDP mode can use different ports for
  this purpose.
0530
0531 Here is an example of how to configure a Linux host to connect to UML
0532 via L2TPv3:
0533
0534 **/etc/network/interfaces**::
0535
   auto l2tp1
   iface l2tp1 inet static
       address 192.168.126.1
       netmask 255.255.255.0
       broadcast 192.168.126.255
       mtu 1500
       pre-up ip l2tp add tunnel remote 127.0.0.1 \
              local 127.0.0.1 encap udp tunnel_id 2 \
              peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
              ip l2tp add session name l2tp1 tunnel_id 2 \
              session_id 0xffffffff peer_session_id 0xffffffff
       down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
              ip l2tp del tunnel tunnel_id 2
0549
0550
0551 Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
0552 no special privileges for the UDP mode.
0553
0554 BESS socket transport
0555 ---------------------
0556
0557 BESS is a high performance modular network switch.
0558
0559 https://github.com/NetSys/bess
0560
It has support for a simple sequential packet socket mode which, in
more recent versions, uses vector IO for high performance.
0563
0564 Example::
0565
0566 vecX:transport=bess,src=$unix_src,dst=$unix_dst
0567
0568 This will configure a BESS transport using the unix_src Unix domain
0569 socket address as source and unix_dst socket address as destination.
0570
0571 For BESS configuration and how to allocate a BESS Unix domain socket port
0572 please see the BESS documentation.
0573
0574 https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports
0575
0576 BESS transport does not require any special privileges.
0577
0578 Configuring Legacy transports
0579 =============================
0580
0581 Legacy transports are now considered obsolete. Please use the vector
0582 versions.
0583
0584 ***********
0585 Running UML
0586 ***********
0587
0588 This section assumes that either the user-mode-linux package from the
0589 distribution or a custom built kernel has been installed on the host.
0590
These add an executable called linux to the system. This is the UML
kernel. It can be run just like any other executable.
It will take most normal Linux kernel arguments as command line
arguments. Additionally, it will need some UML-specific arguments
in order to do something useful.
0596
0597 Arguments
0598 =========
0599
0600 Mandatory Arguments:
0601 --------------------
0602
0603 * ``mem=int[K,M,G]`` - amount of memory. By default in bytes. It will
0604 also accept K, M or G qualifiers.
0605
0606 * ``ubdX[s,d,c,t]=`` virtual disk specification. This is not really
0607 mandatory, but it is likely to be needed in nearly all cases so we can
0608 specify a root file system.
0609 The simplest possible image specification is the name of the image
0610 file for the filesystem (created using one of the methods described
0611 in `Creating an image`_).
0612
0613 * UBD devices support copy on write (COW). The changes are kept in
0614 a separate file which can be discarded allowing a rollback to the
0615 original pristine image. If COW is desired, the UBD image is
0616 specified as: ``cow_file,master_image``.
    Example: ``ubd0=Filesystem.cow,Filesystem.img``
0618
0619 * UBD devices can be set to use synchronous IO. Any writes are
0620 immediately flushed to disk. This is done by adding ``s`` after
0621 the ``ubdX`` specification.
0622
0623 * UBD performs some heuristics on devices specified as a single
0624 filename to make sure that a COW file has not been specified as
0625 the image. To turn them off, use the ``d`` flag after ``ubdX``.
0626
0627 * UBD supports TRIM - asking the Host OS to reclaim any unused
0628 blocks in the image. To turn it off, specify the ``t`` flag after
0629 ``ubdX``.
0630
0631 * ``root=`` root device - most likely ``/dev/ubd0`` (this is a Linux
0632 filesystem image)
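
The flag letters and the optional COW file combine into a single
argument. As a sketch of the syntax described above (``ubd_arg`` is a
hypothetical helper and ``Data.img`` is a placeholder file name)::

   #!/bin/sh
   # ubd_arg INDEX FLAGS IMAGE [COW_FILE] - hypothetical helper that
   # prints a ubdX argument; FLAGS is a possibly empty string of flag
   # letters such as "s".
   ubd_arg() {
       if [ -n "${4:-}" ]; then
           # COW: the COW file comes first, then the master image
           printf 'ubd%s%s=%s,%s\n' "$1" "$2" "$4" "$3"
       else
           printf 'ubd%s%s=%s\n' "$1" "$2" "$3"
       fi
   }

   ubd_arg 0 "" Filesystem.img                   # plain image
   ubd_arg 0 "" Filesystem.img Filesystem.cow    # copy on write
   ubd_arg 1 s  Data.img                         # synchronous IO

The helper takes the master image before the COW file because the COW
file is optional, but it prints them in the order which UML expects.
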
0633
0634 Important Optional Arguments
0635 ----------------------------
0636
If UML is run as "linux" with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
Linux distributions). This makes it nice and easy to use UML on a host
with a GUI. It is, however, the wrong approach if UML is to be used as a
testing harness or run in a text-only environment.
0643
0644 In order to change this behaviour we need to specify an alternative console
0645 and wire it to one of the supported "line" channels. For this we need to map a
0646 console to use something different from the default xterm.
0647
0648 Example which will divert console number 1 to stdin/stdout::
0649
0650 con1=fd:0,fd:1
0651
UML supports a wide variety of serial line channels which are specified using
the following syntax::

   conX=channel_type:options[,channel_type:options]
0656
0657
0658 If the channel specification contains two parts separated by comma, the first
0659 one is input, the second one output.
0660
0661 * The null channel - Discard all input or output. Example ``con=null`` will set
0662 all consoles to null by default.
0663
* The fd channel - use file descriptor numbers for input/output. Example:
  ``con1=fd:0,fd:1``.
0666
0667 * The port channel - start a telnet server on TCP port number. Example:
0668 ``con1=port:4321``. The host must have /usr/sbin/in.telnetd (usually part of
0669 a telnetd package) and the port-helper from the UML utilities (see the
0670 information for the xterm channel below). UML will not boot until a client
0671 connects.
0672
0673 * The pty and pts channels - use system pty/pts.
0674
* The tty channel - bind to an existing system tty. Example:
  ``con1=tty:/dev/tty8`` will make UML use the host 8th console (usually
  unused).
0677
* The xterm channel - this is the default - bring up an xterm on this channel
  and direct IO to it. Note that in order for xterm to work, the host must
  have the UML distribution package installed. This usually contains the
  port-helper and other utilities needed for UML to communicate with the xterm.
  Alternatively, these need to be compiled and installed from source.

All options applicable to consoles also apply to UML serial lines which are
presented as ttyS inside UML.
0685
0686 Starting UML
0687 ============
0688
0689 We can now run UML.
0690 ::
0691
   # linux mem=2048M umid=TEST \
    ubd0=Filesystem.img \
    vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
    root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1
0696
0697 This will run an instance with ``2048M RAM`` and try to use the image file
0698 called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con1`` will be disabled and console 1 will use
standard input/output, making it appear in the same terminal in which UML
was started.
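
UML will not get far if the image or the tap interface is missing, so a
quick check before launch can save a confusing boot failure. A minimal
sketch, assuming interfaces are visible under ``/sys/class/net`` on the
host; ``preflight`` is a hypothetical helper::

   #!/bin/sh
   # preflight IMAGE IFNAME [NETDIR] - hypothetical helper: check that
   # the image is readable and the interface exists before starting UML.
   # NETDIR defaults to /sys/class/net and can be overridden for testing.
   preflight() {
       netdir=${3:-/sys/class/net}
       rc=0
       if [ ! -r "$1" ]; then
           echo "image $1 is missing or unreadable" >&2
           rc=1
       fi
       if [ ! -e "$netdir/$2" ]; then
           echo "interface $2 does not exist" >&2
           rc=1
       fi
       return $rc
   }

   # Example:
   #   preflight Filesystem.img tap0 && \
   #       linux mem=2048M umid=TEST ubd0=Filesystem.img \
   #           vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
   #           root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1

The check does not verify that the interface is UP or that the user has
the privileges needed to open it.
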
0701
0702 Logging in
0703 ============
0704
If you have not set up a password when generating the image, you will have to
shut down the UML instance, mount the image, chroot into it and set it - as
described in `Creating an image`_. If the password is already set,
you can just log in.
0709
0710 The UML Management Console
0711 ============================
0712
0713 In addition to managing the image from "the inside" using normal sysadmin tools,
0714 it is possible to perform a number of low-level operations using the UML
0715 management console. The UML management console is a low-level interface to the
0716 kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
0717 there is a full-blown operating system under UML, there is much greater
0718 flexibility possible than with the SysRq mechanism.
0719
0720 There are a number of things you can do with the mconsole interface:
0721
* Get the kernel version
* Add and remove devices
* Halt or reboot the machine
* Send SysRq commands
* Pause and resume the UML
* Inspect processes running inside UML
* Inspect UML internal /proc state
0729
You need the mconsole client (uml_mconsole) which is a part of the UML
tools package available in most Linux distributions.
0732
0733 You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
0734 kernel. When you boot UML, you'll see a line like::
0735
0736 mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
0737
0738 If you specify a unique machine id on the UML command line, i.e.
0739 ``umid=debian``, you'll see this::
0740
0741 mconsole initialized on /home/jdike/.uml/debian/mconsole
0742
0743
0744 That file is the socket that uml_mconsole will use to communicate with
0745 UML. Run it with either the umid or the full path as its argument::
0746
0747 # uml_mconsole debian
0748
or::
0750
0751 # uml_mconsole /home/jdike/.uml/debian/mconsole
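
For scripted use it helps to resolve the socket path up front. This is a
minimal sketch, assuming the default ``~/.uml/<umid>/mconsole`` location
shown in the boot messages above; ``mconsole_socket`` is a hypothetical
helper::

   #!/bin/sh
   # mconsole_socket UMID - print the default mconsole socket path for
   # UMID under the current user's home directory.
   mconsole_socket() {
       printf '%s/.uml/%s/mconsole\n' "$HOME" "$1"
   }

   sock=$(mconsole_socket debian)
   if [ -S "$sock" ]; then
       echo "mconsole socket found at $sock"
   else
       echo "no UML instance with umid 'debian' appears to be running" >&2
   fi

The ``-S`` test only confirms that a socket exists at that path, not that
the UML instance behind it is responsive.
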
0752
0753
0754 You'll get a prompt, at which you can run one of these commands:
0755
0756 * version
0757 * help
0758 * halt
0759 * reboot
0760 * config
0761 * remove
0762 * sysrq
0764 * cad
0765 * stop
0766 * go
0767 * proc
0768 * stack
0769
0770 version
0771 -------
0772
0773 This command takes no arguments. It prints the UML version::
0774
0775 (mconsole) version
0776 OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64
0777
0778
There are a couple of actual uses for this. It's a simple no-op which
can be used to check that a UML is running. It's also a way of
sending a device interrupt to the UML. UML mconsole is treated internally as
a UML device.
0783
0784 help
0785 ----
0786
0787 This command takes no arguments. It prints a short help screen with the
0788 supported mconsole commands.
0789
0790
0791 halt and reboot
0792 ---------------
0793
0794 These commands take no arguments. They shut the machine down immediately, with
0795 no syncing of disks and no clean shutdown of userspace. So, they are
0796 pretty close to crashing the machine::
0797
0798 (mconsole) halt
0799 OK
0800
0801 config
0802 ------
0803
0804 "config" adds a new device to the virtual machine. This is supported
0805 by most UML device drivers. It takes one argument, which is the
0806 device to add, with the same syntax as the kernel command line::
0807
0808 (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22
0809
0810 remove
0811 ------
0812
0813 "remove" deletes a device from the system. Its argument is just the
0814 name of the device to be removed. The device must be idle in whatever
0815 sense the driver considers necessary. In the case of the ubd driver,
0816 the removed block device must not be mounted, swapped on, or otherwise
0817 open, and in the case of the network driver, the device must be down::
0818
0819 (mconsole) remove ubd3
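
For scripted hot-plugging, ``config`` and ``remove`` can be wrapped in small
shell functions. This is a sketch under the assumption that ``uml_mconsole``
accepts the command and its argument after the umid. The function names are
hypothetical.

```shell
# Hypothetical helpers around the mconsole "config" and "remove" commands.
# Assumption: uml_mconsole runs the command given after the umid.

# uml_attach UMID DEVICE SPEC, e.g. uml_attach debian ubd3 /path/to/image
uml_attach() {
    uml_mconsole "$1" config "$2=$3"
}

# uml_detach UMID DEVICE; the device must be idle or the driver will refuse.
uml_detach() {
    uml_mconsole "$1" remove "$2"
}
```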
0820
0821 sysrq
0822 -----
0823
0824 This command takes one argument, which is a single letter. It calls the
0825 generic kernel's SysRq driver, which does whatever is called for by
0826 that argument. See the SysRq documentation in
0827 Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
0828 see what letters are valid and what they do.
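
Since ``halt`` and ``reboot`` do not sync disks, one possible use of
``sysrq`` is to flush filesystems before halting. The sketch below assumes
the standard SysRq letters ``s`` (sync) and ``u`` (remount read-only) and
that ``uml_mconsole`` runs a command given after the umid. The delay is an
arbitrary guess, because the SysRq sync is asynchronous. This does not shut
userspace down cleanly; it only reduces the chance of losing written data.

```shell
# uml_sync_halt UMID [DELAY_SECONDS]
# Hypothetical helper: sync, remount read-only, then halt.
uml_sync_halt() {
    umid=$1
    delay=${2:-2}                              # assumed to be long enough
    uml_mconsole "$umid" sysrq s || return 1   # emergency sync
    sleep "$delay"
    uml_mconsole "$umid" sysrq u || return 1   # remount filesystems read-only
    sleep "$delay"
    uml_mconsole "$umid" halt
}
```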
0829
0830 cad
0831 ---
0832
This invokes the ``Ctrl-Alt-Del`` action in the running image. What exactly
this ends up doing is up to init, systemd, etc. Normally, it reboots the
machine.
0836
0837 stop
0838 ----
0839
0840 This puts the UML in a loop reading mconsole requests until a 'go'
0841 mconsole command is received. This is very useful as a
0842 debugging/snapshotting tool.
0843
0844 go
0845 --
0846
This resumes a UML that has been paused by a 'stop' command. Note that
by the time the UML resumes, TCP connections may have timed out, and if
the UML was paused for a long period of time, crond might go a little
crazy, running all the jobs it didn't do earlier.
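
As an illustration of the snapshotting use, ``stop`` and ``go`` can bracket
a copy of the disk image on the host. This is only a sketch: it assumes that
``uml_mconsole`` runs a command given after the umid, and ``uml_snapshot``
is a hypothetical name. Data that the guest has not yet written to its disk
is not in the image, so the copy is at best as consistent as a disk after a
crash.

```shell
# uml_snapshot UMID IMAGE DEST: copy IMAGE to DEST while the guest is paused.
uml_snapshot() {
    umid=$1 image=$2 dest=$3
    uml_mconsole "$umid" stop || return 1
    cp "$image" "$dest"
    status=$?
    # Always resume the guest, even if the copy failed.
    uml_mconsole "$umid" go || return 1
    return "$status"
}
```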
0851
0852 proc
0853 ----
0854
This takes one argument - the name of a file in ``/proc``, which is printed
to the mconsole standard output.
0857
0858 stack
0859 -----
0860
This takes one argument - the pid of a process. Its stack is
printed to standard output.
0863
0864 *******************
0865 Advanced UML Topics
0866 *******************
0867
0868 Sharing Filesystems between Virtual Machines
0869 ============================================
0870
0871 Don't attempt to share filesystems simply by booting two UMLs from the
0872 same file. That's the same thing as booting two physical machines
0873 from a shared disk. It will result in filesystem corruption.
0874
0875 Using layered block devices
0876 ---------------------------
0877
0878 The way to share a filesystem between two virtual machines is to use
0879 the copy-on-write (COW) layering capability of the ubd block driver.
0880 Any changed blocks are stored in the private COW file, while reads come
0881 from either device - the private one if the requested block is valid in
0882 it, the shared one if not. Using this scheme, the majority of data
0883 which is unchanged is shared between an arbitrary number of virtual
0884 machines, each of which has a much smaller file containing the changes
0885 that it has made. With a large number of UMLs booting from a large root
0886 filesystem, this leads to a huge disk space saving.
0887
0888 Sharing file system data will also help performance, since the host will
0889 be able to cache the shared data using a much smaller amount of memory,
0890 so UML disk requests will be served from the host's memory rather than
its disks. There is a major caveat in doing this on multisocket NUMA
machines. On such hardware, running many UML instances with a shared
master image and COW changes may cause issues like NMIs from excessive
inter-socket traffic.
0895
0896 If you are running UML on high-end hardware like this, make sure to
0897 bind UML to a set of logical CPUs residing on the same socket using the
0898 ``taskset`` command or have a look at the "tuning" section.
0899
0900 To add a copy-on-write layer to an existing block device file, simply
0901 add the name of the COW file to the appropriate ubd switch::
0902
0903 ubd0=root_fs_cow,root_fs_debian_22
0904
0905 where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
0906 the existing shared filesystem. The COW file need not exist. If it
0907 doesn't, the driver will create and initialize it.
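
When several instances share one backing file, each needs its own COW file.
A minimal sketch of generating the switch per instance follows. The
``NAME.cow`` naming convention and the helper name are assumptions made for
this example only.

```shell
# ubd_cow_switch NAME BACKING: print a ubd0= switch that gives instance NAME
# a private COW file (NAME.cow, a convention assumed here) on top of the
# shared BACKING file.
ubd_cow_switch() {
    printf 'ubd0=%s.cow,%s\n' "$1" "$2"
}
```

It could be used as
``linux umid=vm1 $(ubd_cow_switch vm1 root_fs_debian_22)``, which expands
the switch to ``ubd0=vm1.cow,root_fs_debian_22``.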
0908
0909 Disk Usage
0910 ----------
0911
UML has TRIM support which will release any unused space in its disk
image files to the underlying OS. Because the image files can therefore
be sparse, use either ``ls -ls`` or ``du`` to verify the space they
actually occupy.
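
One way to compare the two sizes on the host is sketched below.
``image_usage`` is a hypothetical helper; it relies only on ``wc -c`` for
the apparent size in bytes and on ``du -k`` for the space actually
allocated, in KiB.

```shell
# image_usage FILE: print the apparent size and the allocated size of FILE.
image_usage() {
    apparent=$(wc -c < "$1") || return 1
    allocated_kib=$(du -k "$1" | cut -f1)
    # $((...)) strips any padding some wc implementations add.
    printf 'apparent_bytes=%s allocated_kib=%s\n' "$((apparent))" "$allocated_kib"
}
```

For a sparse image the allocated figure is noticeably smaller than the
apparent size.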
0915
COW validity
------------
0918
0919 Any changes to the master image will invalidate all COW files. If this
0920 happens, UML will *NOT* automatically delete any of the COW files and
0921 will refuse to boot. In this case the only solution is to either
0922 restore the old image (including its last modified timestamp) or remove
0923 all COW files which will result in their recreation. Any changes in
0924 the COW files will be lost.
0925
0926 Cows can moo - uml_moo : Merging a COW file with its backing file
0927 -----------------------------------------------------------------
0928
0929 Depending on how you use UML and COW devices, it may be advisable to
0930 merge the changes in the COW file into the backing file every once in
0931 a while.
0932
0933 The utility that does this is uml_moo. Its usage is::
0934
0935 uml_moo COW_file new_backing_file
0936
0937
0938 There's no need to specify the backing file since that information is
0939 already in the COW file header. If you're paranoid, boot the new
0940 merged file, and if you're happy with it, move it over the old backing
0941 file.
0942
``uml_moo`` creates a new backing file by default as a safety measure.
It also has a destructive merge option (``-d``) which merges the COW file
directly into its current backing file. This is really only usable
when the backing file has a single COW file associated with it. If
there are multiple COWs associated with a backing file, a ``-d`` merge of
one of them will invalidate all of the others. However, it is
convenient if you're short of disk space, and it should also be
noticeably faster than a non-destructive merge.
0951
0952 ``uml_moo`` is installed with the UML distribution packages and is
0953 available as a part of UML utilities.
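
The non-destructive merge can be wrapped so that it never overwrites an
existing file. This is a sketch only; ``merge_cow`` is a hypothetical name,
and it is assumed here that the instance using the COW file has been shut
down first.

```shell
# merge_cow COW NEW_BACKING: merge COW and its backing file into NEW_BACKING,
# refusing to clobber an existing file.
merge_cow() {
    cow=$1 new=$2
    if [ -e "$new" ]; then
        echo "refusing to overwrite $new" >&2
        return 1
    fi
    uml_moo "$cow" "$new"
}
```

Keep in mind that moving the merged file over the old backing file changes
the master image and therefore invalidates any other COW files based on it.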
0954
0955 Host file access
0956 ==================
0957
0958 If you want to access files on the host machine from inside UML, you
0959 can treat it as a separate machine and either nfs mount directories
0960 from the host or copy files into the virtual machine with scp.
0961 However, since UML is running on the host, it can access those
0962 files just like any other process and make them available inside the
0963 virtual machine without the need to use the network.
0964 This is possible with the hostfs virtual filesystem. With it, you
0965 can mount a host directory into the UML filesystem and access the
0966 files contained in it just as you would on the host.
0967
0968 *SECURITY WARNING*
0969
0970 Hostfs without any parameters to the UML Image will allow the image
0971 to mount any part of the host filesystem and write to it. Always
0972 confine hostfs to a specific "harmless" directory (for example ``/var/tmp``)
0973 if running UML. This is especially important if UML is being run as root.
0974
0975 Using hostfs
0976 ------------
0977
0978 To begin with, make sure that hostfs is available inside the virtual
0979 machine with::
0980
0981 # cat /proc/filesystems
0982
0983 ``hostfs`` should be listed. If it's not, either rebuild the kernel
0984 with hostfs configured into it or make sure that hostfs is built as a
0985 module and available inside the virtual machine, and insmod it.
0986
0987
0988 Now all you need to do is run mount::
0989
0990 # mount none /mnt/host -t hostfs
0991
This mounts the host's ``/`` on the virtual machine's ``/mnt/host``.
If you don't want to mount the host root directory, you can
specify a subdirectory to mount with the ``-o`` switch to mount::
0995
0996 # mount none /mnt/home -t hostfs -o /home
0997
This mounts the host's ``/home`` on the virtual machine's ``/mnt/home``.
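
The availability check and the mount can be combined in one helper run
inside the virtual machine. This is a sketch; ``mount_hostfs`` and the
``FILESYSTEMS`` override (useful only for testing) are assumptions of this
example.

```shell
# mount_hostfs HOST_DIR MOUNT_POINT (run as root inside the UML instance)
# FILESYSTEMS may be overridden for testing; it defaults to /proc/filesystems.
mount_hostfs() {
    host_dir=$1 mount_point=$2
    if ! grep -q hostfs "${FILESYSTEMS:-/proc/filesystems}"; then
        echo "hostfs is not available in this kernel" >&2
        return 1
    fi
    mkdir -p "$mount_point" || return 1
    mount none "$mount_point" -t hostfs -o "$host_dir"
}
```

For example, ``mount_hostfs /home /mnt/home`` performs the same mount as the
command above after checking that hostfs is present.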
0999
1000 hostfs as the root filesystem
1001 -----------------------------
1002
1003 It's possible to boot from a directory hierarchy on the host using
1004 hostfs rather than using the standard filesystem in a file.
1005 To start, you need that hierarchy. The easiest way is to loop mount
1006 an existing root_fs file::
1007
1008 # mount root_fs uml_root_dir -o loop
1009
1010
1011 You need to change the filesystem type of ``/`` in ``etc/fstab`` to be
1012 'hostfs', so that line looks like this::
1013
1014 /dev/ubd/0 / hostfs defaults 1 1
1015
1016 Then you need to chown to yourself all the files in that directory
1017 that are owned by root. This worked for me::
1018
1019 # find . -uid 0 -exec chown jdike {} \;
1020
1021 Next, make sure that your UML kernel has hostfs compiled in, not as a
1022 module. Then run UML with the boot device pointing at that directory::
1023
1024 ubd0=/path/to/uml/root/directory
1025
1026 UML should then boot as it does normally.
1027
1028 Hostfs Caveats
1029 --------------
1030
Hostfs does not track changes made to the host filesystem from outside
UML. As a result, if a file is changed on the host without UML's
knowledge, UML will not know about it and its own in-memory cache of
the file may be corrupt. While it is possible to fix this, it is not
something which is being worked on at present.
1036
1037 Tuning UML
1038 ============
1039
UML at present is strictly uniprocessor. It will, however, spin up a
number of threads to handle various functions.
1042
The UBD driver, SIGIO and the MMU emulation do that. If the system is
idle, these threads will be migrated to other processors on an SMP host.
1045 This, unfortunately, will usually result in LOWER performance because of
1046 all of the cache/memory synchronization traffic between cores. As a
1047 result, UML will usually benefit from being pinned on a single CPU,
1048 especially on a large system. This can result in performance differences
1049 of 5 times or higher on some benchmarks.
1050
1051 Similarly, on large multi-node NUMA systems UML will benefit if all of
1052 its memory is allocated from the same NUMA node it will run on. The
OS will *NOT* do that by default. In order to do that, the sysadmin
needs to create a suitable tmpfs ramdisk bound to a particular node
and use that as the source for UML RAM allocation by specifying it
in one of the ``TMPDIR``, ``TMP`` or ``TEMP`` environment variables,
which UML consults for that purpose. If that fails, it will
look for shmfs mounted under ``/dev/shm``. If everything else fails, it
will use ``/tmp/`` regardless of the filesystem type used for it::
1060
1061 mount -t tmpfs -ompol=bind:X none /mnt/tmpfs-nodeX
1062 TEMP=/mnt/tmpfs-nodeX taskset -cX linux options options options..
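
The CPU list for a node does not have to be worked out by hand. The sketch
below reads it from sysfs and starts UML pinned to that node. It assumes the
node-bound tmpfs is already mounted, that the UML binary is called ``linux``
and is in ``PATH``, and that the host exposes
``/sys/devices/system/node/nodeN/cpulist``. ``run_uml_on_node`` and the
``SYSFS_NODES`` override are hypothetical names used for this example.

```shell
# run_uml_on_node NODE TMPFS_DIR [uml options...]
# SYSFS_NODES may be overridden for testing.
run_uml_on_node() {
    node=$1 tmpfs_dir=$2
    shift 2
    cpulist_file="${SYSFS_NODES:-/sys/devices/system/node}/node$node/cpulist"
    cpus=$(cat "$cpulist_file") || return 1
    # TMPDIR points UML at the node-bound tmpfs for its RAM allocation.
    TMPDIR="$tmpfs_dir" taskset -c "$cpus" linux "$@"
}
```

A possible invocation is
``run_uml_on_node 1 /mnt/tmpfs-node1 umid=debian ubd0=root_fs``.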
1063
1064 *******************************************
1065 Contributing to UML and Developing with UML
1066 *******************************************
1067
1068 UML is an excellent platform to develop new Linux kernel concepts -
1069 filesystems, devices, virtualization, etc. It provides unrivalled
1070 opportunities to create and test them without being constrained to
1071 emulating specific hardware.
1072
1073 Example - want to try how Linux will work with 4096 "proper" network
1074 devices?
1075
1076 Not an issue with UML. At the same time, this is something which
1077 is difficult with other virtualization packages - they are
1078 constrained by the number of devices allowed on the hardware bus
1079 they are trying to emulate (for example 16 on a PCI bus in qemu).
1080
If you have something to contribute, such as a patch, a bugfix or a
new feature, please send it to ``linux-um@lists.infradead.org``.
1083
Please follow all standard Linux patch guidelines, such as cc-ing
relevant maintainers and running ``./scripts/checkpatch.pl`` on your patch.
For more details see ``Documentation/process/submitting-patches.rst``.
1087
Note - the list does not accept HTML or attachments; all emails must
be formatted as plain text.
1090
Developing always goes hand in hand with debugging. First of all,
you can always run UML under gdb; a later section describes how to
do that. That, however, is not the only way to
1094 debug a Linux kernel. Quite often adding tracing statements and/or
1095 using UML specific approaches such as ptracing the UML kernel process
1096 are significantly more informative.
1097
1098 Tracing UML
1099 =============
1100
1101 When running, UML consists of a main kernel thread and a number of
1102 helper threads. The ones of interest for tracing are NOT the ones
1103 that are already ptraced by UML as a part of its MMU emulation.
1104
The threads of interest are usually the first three threads visible in
a ps display. The one with the lowest PID number and using the most CPU
is usually the kernel thread. The other two are the disk (ubd) device
helper thread and the SIGIO helper thread.
Running strace on the kernel thread usually results in the following picture::
1110
1111 host$ strace -p 16566
1112 --- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
1113 epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
1114 epoll_wait(4, [], 64, 0) = 0
1115 rt_sigreturn({mask=[PIPE]}) = 16967
1116 ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
1117 ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
1118 ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
1119 ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
1120 ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
1121 ptrace(PTRACE_SYSEMU, 16967, NULL, 0) = 0
1122 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
1123 wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
1124 ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
1125 ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
1126 ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
1127 timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
1128 getpid() = 16566
1129 clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
1130 --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
1131 rt_sigreturn({mask=[PIPE]}) = -1 EINTR (Interrupted system call)
1132
1133 This is a typical picture from a mostly idle UML instance.
1134
1135 * UML interrupt controller uses epoll - this is UML waiting for IO
1136 interrupts:
1137
1138 epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
1139
1140 * The sequence of ptrace calls is part of MMU emulation and running the
1141 UML userspace.
1142 * ``timer_settime`` is part of the UML high res timer subsystem mapping
1143 timer requests from inside UML onto the host high resolution timers.
1144 * ``clock_nanosleep`` is UML going into idle (similar to the way a PC
1145 will execute an ACPI idle).
1146
As you can see, UML will generate quite a bit of output even when idle. The
output can be very informative when observing IO. It shows the actual IO
calls, their arguments and return values.
1150
1151 Kernel debugging
1152 ================
1153
1154 You can run UML under gdb now, though it will not necessarily agree to
1155 be started under it. If you are trying to track a runtime bug, it is
1156 much better to attach gdb to a running UML instance and let UML run.
1157
1158 Assuming the same PID number as in the previous example, this would be::
1159
1160 # gdb -p 16566
1161
This will STOP the UML instance, so you must enter ``cont`` at the GDB
command line to request it to continue. It may be a good idea to make
this into a gdb script and pass it to gdb as an argument.
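
A minimal sketch of such a script follows. The ``handle`` line is an
assumption: it presumes that UML handles ``SIGSEGV`` itself during normal
operation, so gdb should pass the signal through rather than stop on it.
Adjust or drop it if that does not match your setup.
``write_uml_gdb_script`` is a hypothetical helper name.

```shell
# write_uml_gdb_script FILE: write a small gdb command file for attaching
# to a running UML instance.
write_uml_gdb_script() {
    cat > "$1" <<'EOF'
# Assumption: UML handles SIGSEGV itself as part of normal operation,
# so pass it through without stopping.
handle SIGSEGV pass nostop noprint
# Resume the instance that was stopped by attaching.
continue
EOF
}
```

The script can then be passed to gdb with the ``-x`` option, for example
``write_uml_gdb_script uml.gdb && gdb -p 16566 -x uml.gdb``.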
1165
1166 Developing Device Drivers
1167 =========================
1168
1169 Nearly all UML drivers are monolithic. While it is possible to build a
1170 UML driver as a kernel module, that limits the possible functionality
1171 to in-kernel only and non-UML specific. The reason for this is that
1172 in order to really leverage UML, one needs to write a piece of
1173 userspace code which maps driver concepts onto actual userspace host
1174 calls.
1175
1176 This forms the so-called "user" portion of the driver. While it can
1177 reuse a lot of kernel concepts, it is generally just another piece of
1178 userspace code. This portion needs some matching "kernel" code which
1179 resides inside the UML image and which implements the Linux kernel part.
1180
1181 *Note: There are very few limitations in the way "kernel" and "user" interact*.
1182
1183 UML does not have a strictly defined kernel-to-host API. It does not
1184 try to emulate a specific architecture or bus. UML's "kernel" and
1185 "user" can share memory, code and interact as needed to implement
1186 whatever design the software developer has in mind. The only
1187 limitations are purely technical. Due to a lot of functions and
1188 variables having the same names, the developer should be careful
1189 which includes and libraries they are trying to refer to.
1190
As a result a lot of userspace code consists of simple wrappers.
E.g. ``os_close_file()`` is just a wrapper around ``close()``
which ensures that the userspace function ``close()`` does not clash
with similarly named function(s) in the kernel part.
1195
1196 Using UML as a Test Platform
1197 ============================
1198
1199 UML is an excellent test platform for device driver development. As
1200 with most things UML, "some user assembly may be required". It is
1201 up to the user to build their emulation environment. UML at present
1202 provides only the kernel infrastructure.
1203
1204 Part of this infrastructure is the ability to load and parse fdt
1205 device tree blobs as used in Arm or Open Firmware platforms. These
1206 are supplied as an optional extra argument to the kernel command
1207 line::
1208
1209 dtb=filename
1210
The device tree is loaded and parsed at boot time and is accessible by
drivers which query it. At this moment in time this facility is
intended solely for development purposes. UML's own devices do not
query the device tree.
1215
1216 Security Considerations
1217 -----------------------
1218
Drivers or any new functionality should default to not
accepting arbitrary filenames, BPF code or other parameters
which can affect the host from inside the UML instance.
1222 For example, specifying the socket used for IPC communication
1223 between a driver and the host at the UML command line is OK
1224 security-wise. Allowing it as a loadable module parameter
1225 isn't.
1226
If such functionality is desirable for a particular application
1228 (e.g. loading BPF "firmware" for raw socket network transports),
1229 it should be off by default and should be explicitly turned on
1230 as a command line parameter at startup.
1231
1232 Even with this in mind, the level of isolation between UML
1233 and the host is relatively weak. If the UML userspace is
1234 allowed to load arbitrary kernel drivers, an attacker can
1235 use this to break out of UML. Thus, if UML is used in
1236 a production application, it is recommended that all modules
1237 are loaded at boot and kernel module loading is disabled
1238 afterwards.
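
One way to follow this recommendation from a boot script inside the UML
instance is sketched below. It relies on the ``kernel.modules_disabled``
sysctl, which cannot be switched back off until the next boot.
``load_and_lock`` is a hypothetical helper name, and the module names passed
to it are up to the application.

```shell
# load_and_lock MODULE...: load the given modules, then disable any further
# module loading. Run as root inside the UML instance.
load_and_lock() {
    for module in "$@"; do
        modprobe "$module" || return 1
    done
    # One-way switch: stays set until the instance is rebooted.
    sysctl -w kernel.modules_disabled=1
}
```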