0001 .. SPDX-License-Identifier: GPL-2.0
0002 
0003 ===============
0004 Shared Subtrees
0005 ===============
0006 
0007 .. Contents:
0008         1) Overview
0009         2) Features
0010         3) Setting mount states
0011         4) Use cases
0012         5) Detailed semantics
0013         6) Quiz
0014         7) FAQ
0015         8) Implementation
0016 
0017 
0018 1) Overview
0019 -----------
0020 
0021 Consider the following situation:
0022 
0023 A process wants to clone its own namespace, but still wants to access the CD
0024 that got mounted recently.  Shared subtree semantics provide the necessary
0025 mechanism to accomplish the above.
0026 
0027 It provides the necessary building blocks for features like per-user namespaces
0028 and versioned filesystems.
0029 
0030 2) Features
0031 -----------
0032 
0033 Shared subtree provides four different flavors of mounts (struct vfsmount, to be
0034 precise):
0035 
0036         a. shared mount
0037         b. slave mount
0038         c. private mount
0039         d. unbindable mount
0040 
0041 
0042 2a) A shared mount can be replicated to any number of mountpoints, and all the
0043 replicas remain exactly the same.
0044 
0045         Here is an example:
0046 
0047         Let's say /mnt has a mount that is shared::
0048 
0049             mount --make-shared /mnt
0050 
0051         Note: mount(8) command now supports the --make-shared flag,
0052         so the sample 'smount' program is no longer needed and has been
0053         removed.
0054 
0055         ::
0056 
0057             # mount --bind /mnt /tmp
0058 
0059         The above command replicates the mount at /mnt to the mountpoint /tmp
0060         and the contents of both the mounts remain identical.
0061 
0062         ::
0063 
0064             # ls /mnt
0065             a b c
0066 
0067             # ls /tmp
0068             a b c
0069 
0070         Now let's say we mount a device at /tmp/a::
0071 
0072             # mount /dev/sd0 /tmp/a
0073 
0074             # ls /tmp/a
0075             t1 t2 t3
0076 
0077             # ls /mnt/a
0078             t1 t2 t3
0079 
0080         Note that the mount has propagated to the mount at /mnt as well.
0081 
0082         And the same is true even when /dev/sd0 is mounted on /mnt/a. The
0083         contents will be visible under /tmp/a too.
0084 
0085 
0086 2b) A slave mount is like a shared mount except that mount and umount events
0087         only propagate towards it.
0088 
0089         All slave mounts have a master mount which is shared.
0090 
0091         Here is an example:
0092 
0093         Let's say /mnt has a mount which is shared.
0094         # mount --make-shared /mnt
0095 
0096         Let's bind mount /mnt to /tmp
0097         # mount --bind /mnt /tmp
0098 
0099         The new mount at /tmp becomes a shared mount and it is a replica of
0100         the mount at /mnt.
0101 
0102         Now let's make the mount at /tmp a slave of /mnt
0103         # mount --make-slave /tmp
0104 
0105         Let's mount /dev/sd0 on /mnt/a
0106         # mount /dev/sd0 /mnt/a
0107 
0108         # ls /mnt/a
0109         t1 t2 t3
0110 
0111         # ls /tmp/a
0112         t1 t2 t3
0113 
0114         Note that the mount event has propagated to the mount at /tmp.
0115 
0116         However, let's see what happens if we mount something on the mount at /tmp
0117 
0118         # mount /dev/sd1 /tmp/b
0119 
0120         # ls /tmp/b
0121         s1 s2 s3
0122 
0123         # ls /mnt/b
0124 
0125         Note how the mount event has not propagated to the mount at
0126         /mnt.
0127 
0128 
0129 2c) A private mount does not forward or receive propagation.
0130 
0131         This is the mount we are familiar with. It's the default type.
0132 
0133 
0134 2d) An unbindable mount is a private mount that cannot be bind mounted.
0135 
0136         Let's say we have a mount at /mnt and we make it unbindable::
0137 
0138             # mount --make-unbindable /mnt
0139 
0140         Let's try to bind mount this mount somewhere else::
0141 
0142             # mount --bind /mnt /tmp
0143             mount: wrong fs type, bad option, bad superblock on /mnt,
0144                     or too many mounted file systems
0145 
0146         Binding an unbindable mount is an invalid operation.
0147 
0148 
0149 3) Setting mount states
0150 -----------------------
0151         The mount command (util-linux package) can be used to set mount
0152         states::
0153 
0154             mount --make-shared mountpoint
0155             mount --make-slave mountpoint
0156             mount --make-private mountpoint
0157             mount --make-unbindable mountpoint
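
To check which state a mount ended up in after one of these commands, the optional fields of /proc/self/mountinfo can be inspected; util-linux also offers `findmnt -o TARGET,PROPAGATION` for the same purpose. The sketch below is only an illustration and not part of the original text: the helper name `propagation_state` is hypothetical, and it assumes the field layout described in proc(5), where `shared:N`, `master:N` and `unbindable` tags sit between the sixth field and the `-` separator.

```shell
# Hypothetical helper: classify one /proc/self/mountinfo line by the
# propagation tags found in its optional fields.
propagation_state() {
    set -f              # no pathname expansion while splitting the line
    set -- $1           # split the line into positional parameters
    set +f
    shift 6             # skip the six fixed leading fields
    shared=no; slave=no; unbindable=no
    # Optional fields run until the single "-" separator.
    while [ $# -gt 0 ] && [ "$1" != "-" ]; do
        case $1 in
            shared:*)   shared=yes ;;
            master:*)   slave=yes ;;
            unbindable) unbindable=yes ;;
        esac
        shift
    done
    if [ "$unbindable" = yes ]; then
        echo "unbindable"
    elif [ "$shared" = yes ] && [ "$slave" = yes ]; then
        echo "shared and slave"
    elif [ "$shared" = yes ]; then
        echo "shared"
    elif [ "$slave" = yes ]; then
        echo "slave"
    else
        echo "private"
    fi
}
```

Feeding it every line of /proc/self/mountinfo in a `while IFS= read -r line` loop gives a per-mount summary. A mount with no tag at all is reported as private.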
0158 
0159 
0160 4) Use cases
0161 ------------
0162 
0163         A) A process wants to clone its own namespace, but still wants to
0164            access the CD that got mounted recently.
0165 
0166            Solution:
0167 
0168                 The system administrator can make the mount at /cdrom shared::
0169 
0170                     mount --bind /cdrom /cdrom
0171                     mount --make-shared /cdrom
0172 
0173                 Now any process that clones off a new namespace will have a
0174                 mount at /cdrom which is a replica of the same mount in the
0175                 parent namespace.
0176 
0177                 So when a CD is inserted and mounted at /cdrom, that mount gets
0178                 propagated to the other mounts at /cdrom in all the other cloned
0179                 namespaces.
0180 
0181         B) A process wants its mounts to be invisible to any other process, but
0182            still wants to be able to see the other system mounts.
0183 
0184            Solution:
0185 
0186                 To begin with, the administrator can mark the entire mount tree
0187                 as shared::
0188 
0189                     mount --make-rshared /
0190 
0191                 A new process can then clone off a new namespace and mark some
0192                 part of its namespace as slave::
0193 
0194                     mount --make-rslave /myprivatetree
0195 
0196                 Henceforth any mounts done by the process within /myprivatetree
0197                 will not show up in any other namespace. However, mounts done in
0198                 the parent namespace under /myprivatetree still show up in the
0199                 process's namespace.
0200 
0201 
0202         Apart from the above semantics this feature provides the
0203         building blocks to solve the following problems:
0204 
0205         C)  Per-user namespace
0206 
0207                 The above semantics allow mounts to be shared across
0208                 namespaces.  But namespaces are associated with processes. If
0209                 namespaces are made first-class objects with a user API to
0210                 associate/disassociate a namespace with a userid, then each user
0211                 could have his/her own namespace and tailor it to his/her
0212                 requirements. This needs to be supported in PAM.
0213 
0214         D)  Versioned files
0215 
0216                 If the entire mount tree is visible at multiple locations, then
0217                 an underlying versioning file system can return different
0218                 versions of the file depending on the path used to access that
0219                 file.
0220 
0221                 An example is::
0222 
0223                     mount --make-shared /
0224                     mount --rbind / /view/v1
0225                     mount --rbind / /view/v2
0226                     mount --rbind / /view/v3
0227                     mount --rbind / /view/v4
0228 
0229                 and if /usr has a versioning filesystem mounted, then that
0230                 mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
0231                 /view/v4/usr too.
0232 
0233                 A user can request the v3 version of the file /usr/fs/namespace.c
0234                 by accessing /view/v3/usr/fs/namespace.c. The underlying
0235                 versioning filesystem can then determine that the v3 version of
0236                 the filesystem is being requested and return the corresponding
0237                 inode.
0238 
0239 5) Detailed semantics
0240 ---------------------
0241         The section below explains the detailed semantics of
0242         bind, rbind, move, mount, umount and clone-namespace operations.
0243 
0244         Note: the word 'vfsmount' and the noun 'mount' have been used
0245         to mean the same thing, throughout this document.
0246 
0247 5a) Mount states
0248 
0249         A given mount can be in one of the following states:
0250 
0251         1) shared
0252         2) slave
0253         3) shared and slave
0254         4) private
0255         5) unbindable
0256 
0257         A 'propagation event' is defined as an event generated on a vfsmount
0258         that leads to mount or unmount actions in other vfsmounts.
0259 
0260         A 'peer group' is defined as a group of vfsmounts that propagate
0261         events to each other.
0262 
0263         (1) Shared mounts
0264 
0265                 A 'shared mount' is defined as a vfsmount that belongs to a
0266                 'peer group'.
0267 
0268                 For example::
0269 
0270                         mount --make-shared /mnt
0271                         mount --bind /mnt /tmp
0272 
0273                 The mount at /mnt and that at /tmp are both shared and belong
0274                 to the same peer group. Anything mounted or unmounted under
0275                 /mnt or /tmp is reflected in all the other mounts of the peer
0276                 group.
0277 
0278 
0279         (2) Slave mounts
0280 
0281                 A 'slave mount' is defined as a vfsmount that receives
0282                 propagation events and does not forward propagation events.
0283 
0284                 A slave mount, as the name implies, has a master mount from which
0285                 mount/unmount events are received. Events do not propagate from
0286                 the slave mount to the master.  Only a shared mount can be made
0287                 a slave by executing the following command::
0288 
0289                         mount --make-slave mount
0290 
0291                 A shared mount that is made a slave is no longer shared unless
0292                 it is explicitly made shared again.
0293 
0294         (3) Shared and Slave
0295 
0296                 A vfsmount can be both shared as well as slave.  This state
0297                 indicates that the mount is a slave of some vfsmount, and
0298                 has its own peer group too.  This vfsmount receives propagation
0299                 events from its master vfsmount, and also forwards propagation
0300                 events to its 'peer group' and to its slave vfsmounts.
0301 
0302                 Strictly speaking, the vfsmount is shared having its own
0303                 peer group, and this peer-group is a slave of some other
0304                 peer group.
0305 
0306                 Only a slave vfsmount can be made 'shared and slave', either
0307                 by executing the following command::
0308 
0309                         mount --make-shared mount
0310 
0311                 or by moving the slave vfsmount under a shared vfsmount.
0312 
0313         (4) Private mount
0314 
0315                 A 'private mount' is defined as a vfsmount that does not
0316                 receive or forward any propagation events.
0317 
0318         (5) Unbindable mount
0319 
0320                 An 'unbindable mount' is defined as a vfsmount that does not
0321                 receive or forward any propagation events and cannot
0322                 be bind mounted.
0323 
0324 
0325         State diagram:
0326 
0327         The state diagram below explains the state transition of a mount,
0328         in response to various commands::
0329 
0330             ----------------------------------------------------------------------------
0331             |             |make-shared |  make-slave  | make-private | make-unbindable |
0332             |-------------|------------|--------------|--------------|-----------------|
0333             |shared       |shared      |*slave/private|   private    |   unbindable    |
0334             |             |            |              |              |                 |
0335             |-------------|------------|--------------|--------------|-----------------|
0336             |slave        |shared      | **slave      |   private    |   unbindable    |
0337             |             |and slave   |              |              |                 |
0338             |-------------|------------|--------------|--------------|-----------------|
0339             |shared       |shared      | slave        |   private    |   unbindable    |
0340             |and slave    |and slave   |              |              |                 |
0341             |-------------|------------|--------------|--------------|-----------------|
0342             |private      |shared      | **private    |   private    |   unbindable    |
0343             |-------------|------------|--------------|--------------|-----------------|
0344             |unbindable   |shared      |**unbindable  |   private    |   unbindable    |
0345             ----------------------------------------------------------------------------
0346 
0347             * If the shared mount is the only mount in its peer group, making it
0348             a slave makes it private automatically, because there is no master
0349             to which it can be slaved.
0350 
0351             ** Slaving a non-shared mount has no effect on the mount.
0352 
0353         Apart from the commands listed above, the 'move' operation also changes
0354         the state of a mount depending on the type of the destination mount. It
0355         is explained in section 5d.
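
The state diagram above can also be read as a small lookup function. The following is a minimal sketch of that table only, not of the kernel implementation; the function name `next_state` and its optional third argument (whether the shared mount has other peers, which decides the `*` footnote case) are assumptions made for illustration.

```shell
# Hypothetical model of the state diagram: next_state STATE COMMAND [HAS_PEERS]
# STATE is one of: shared, slave, "shared and slave", private, unbindable.
# HAS_PEERS (yes/no, default yes) only matters for "shared" + make-slave.
next_state() {
    state=$1
    cmd=$2
    has_peers=${3:-yes}
    case $cmd in
        make-private)    echo "private" ;;
        make-unbindable) echo "unbindable" ;;
        make-shared)
            case $state in
                slave|"shared and slave") echo "shared and slave" ;;
                *)                        echo "shared" ;;
            esac ;;
        make-slave)
            case $state in
                shared)
                    # (*) a lone shared mount has no master to be slaved to
                    if [ "$has_peers" = yes ]; then
                        echo "slave"
                    else
                        echo "private"
                    fi ;;
                "shared and slave") echo "slave" ;;
                *)                  echo "$state" ;;   # (**) no effect
            esac ;;
        *)
            echo "unknown command: $cmd" >&2
            return 1 ;;
    esac
}
```

For example, `next_state slave make-shared` prints `shared and slave`, and `next_state shared make-slave no` prints `private`.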
0356 
0357 5b) Bind semantics
0358 
0359         Consider the following command::
0360 
0361             mount --bind A/a  B/b
0362 
0363         where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
0364         is the destination mount and 'b' is the dentry in the destination mount.
0365 
0366         The outcome depends on the type of mount of 'A' and 'B'. The table
0367         below is a quick reference::
0368 
0369             --------------------------------------------------------------------------
0370             |         BIND MOUNT OPERATION                                           |
0371             |************************************************************************|
0372             |source(A)->| shared      |     private    |      slave     | unbindable |
0373             | dest(B)   |             |                |                |            |
0374             |   |       |             |                |                |            |
0375             |   v       |             |                |                |            |
0376             |************************************************************************|
0377             |  shared   | shared      |     shared     | shared & slave |  invalid   |
0378             |           |             |                |                |            |
0379             |non-shared | shared      |     private    |      slave     |  invalid   |
0380             **************************************************************************
0381 
0382         Details:
0383 
0384     1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C',
0385         which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
0386         mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
0387         are created and mounted at the dentry 'b' on all mounts to which 'B'
0388         propagates. A new propagation tree containing 'C1',..,'Cn' is
0389         created. This propagation tree is identical to the propagation tree of
0390         'B'.  And finally the peer group of 'C' is merged with the peer group
0391         of 'A'.
0392 
0393     2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C',
0394         which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
0395         mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
0396         are created and mounted at the dentry 'b' on all mounts to which 'B'
0397         propagates. A new propagation tree is set up containing all new mounts
0398         'C', 'C1', .., 'Cn' with exactly the same configuration as the
0399         propagation tree for 'B'.
0400 
0401     3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
0402         mount 'C', which is a clone of 'A', is created. Its root dentry is 'a'.
0403         'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
0404         'C3' ... are created and mounted at the dentry 'b' on all mounts to
0405         which 'B' propagates. A new propagation tree containing the new mounts
0406         'C','C1',..  'Cn' is created. This propagation tree is identical to the
0407         propagation tree for 'B'. And finally the mount 'C' and its peer group
0408         are made slaves of mount 'Z'.  In other words, mount 'C' is in the
0409         state 'shared and slave'.
0410 
0411     4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
0412         invalid operation.
0413 
0414     5. 'A' is a private mount and 'B' is a non-shared (private or slave or
0415         unbindable) mount. A new mount 'C', which is a clone of 'A', is created.
0416         Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.
0417 
0418     6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
0419         which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
0420         mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
0421         peer-group of 'A'.
0422 
0423     7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
0424         new mount 'C' which is a clone of 'A' is created. Its root dentry is
0425         'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
0426         slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
0427         'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
0428         mount/unmount on 'A' do not propagate anywhere else. Similarly
0429         mount/unmount on 'C' do not propagate anywhere else.
0430 
0431     8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
0432         invalid operation. An unbindable mount cannot be bind mounted.
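
The eight cases above collapse into a short rule: an unbindable source is always rejected, a shared destination makes the new mount shared (keeping any slave relationship of the source), and a non-shared destination leaves the clone with the source's type. A minimal sketch of that quick-reference table follows; the function name `bind_result` is hypothetical and the sketch models only the resulting state of 'C', not the propagation trees.

```shell
# Hypothetical model of the BIND MOUNT OPERATION table.
# usage: bind_result SOURCE_TYPE DEST_TYPE
#   SOURCE_TYPE: shared | private | slave | unbindable
#   DEST_TYPE:   shared | non-shared
bind_result() {
    src=$1
    dst=$2
    if [ "$src" = unbindable ]; then
        echo "invalid"              # cases 4 and 8
        return 0
    fi
    if [ "$dst" = shared ]; then
        case $src in
            slave) echo "shared and slave" ;;   # case 3
            *)     echo "shared" ;;             # cases 1 and 2
        esac
    else
        echo "$src"                 # cases 5, 6 and 7: type is inherited
    fi
}
```

For instance, `bind_result slave shared` prints `shared and slave`, matching case 3.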
0433 
0434 5c) Rbind semantics
0435 
0436         rbind is the same as bind, except for its scope. Bind replicates only
0437         the specified mount.  Rbind replicates all the mounts in the tree
0438         belonging to the specified mount; it is bind applied to all of them.
0439 
0440         If the source tree that is rbind mounted has some unbindable mounts,
0441         then the subtree under the unbindable mount is pruned in the new
0442         location.
0443 
0444         For example:
0445 
0446           Let's say we have the following mount tree::
0447 
0448                 A
0449               /   \
0450               B   C
0451              / \ / \
0452              D E F G
0453 
0454           Let's say all the mounts in the tree except the mount C are
0455           of a type other than unbindable.
0456 
0457           If this tree is rbind mounted to, say, Z,
0458 
0459           we will have the following tree at the new location::
0460 
0461                 Z
0462                 |
0463                 A'
0464                /
0465               B'                Note how the tree under C is pruned
0466              / \                in the new location.
0467             D' E'
0468 
0469 
0470 
0471 5d) Move semantics
0472 
0473         Consider the following command::
0474 
0475             mount --move A  B/b
0476 
0477         where 'A' is the source mount, 'B' is the destination mount and 'b' is
0478         the dentry in the destination mount.
0479 
0480         The outcome depends on the type of the mount of 'A' and 'B'. The table
0481         below is a quick reference::
0482 
0483             --------------------------------------------------------------------------
0484             |         MOVE MOUNT OPERATION                                           |
0485             |************************************************************************|
0486             |source(A)->| shared      |     private    |      slave     | unbindable |
0487             | dest(B)   |             |                |                |            |
0488             |   |       |             |                |                |            |
0489             |   v       |             |                |                |            |
0490             |************************************************************************|
0491             |  shared   | shared      |     shared     |shared and slave|  invalid   |
0492             |           |             |                |                |            |
0493             |non-shared | shared      |     private    |      slave     | unbindable |
0494             **************************************************************************
0495 
0496         .. Note:: moving a mount residing under a shared mount is invalid.
0497 
0498       Details follow:
0499 
0500     1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
0501         mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
0502         are created and mounted at dentry 'b' on all mounts that receive
0503         propagation from mount 'B'. A new propagation tree is created in the
0504         exact same configuration as that of 'B'. This new propagation tree
0505         contains all the new mounts 'A1', 'A2'...  'An'.  And this new
0506         propagation tree is appended to the already existing propagation tree
0507         of 'A'.
0508 
0509     2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
0510         mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
0511         are created and mounted at dentry 'b' on all mounts that receive
0512         propagation from mount 'B'. The mount 'A' becomes a shared mount and a
0513         propagation tree is created which is identical to that of
0514         'B'. This new propagation tree contains all the new mounts 'A1',
0515         'A2'...  'An'.
0516 
0517     3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
0518         mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
0519         'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
0520         receive propagation from mount 'B'. A new propagation tree is created
0521         in the exact same configuration as that of 'B'. This new propagation
0522         tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
0523         propagation tree is appended to the already existing propagation tree of
0524         'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
0525         becomes 'shared'.
0526 
0527     4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
0528         is invalid, because mounting anything on the shared mount 'B' can
0529         create new mounts that get mounted on the mounts that receive
0530         propagation from 'B'.  And since the mount 'A' is unbindable, cloning
0531         it to mount at other mountpoints is not possible.
0532 
0533     5. 'A' is a private mount and 'B' is a non-shared (private or slave or
0534         unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.
0535 
0536     6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
0537         is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
0538         shared mount.
0539 
0540     7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
0541         The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
0542         continues to be a slave mount of mount 'Z'.
0543 
0544     8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
0545         'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
0546         unbindable mount.
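
The move table differs from the bind table in one cell only: moving an unbindable mount onto a non-shared destination is allowed, because no clone has to be made. The sketch below models the resulting state of 'A' under that reading; the name `move_result` is hypothetical, and the restriction on moving a mount that resides under a shared mount is deliberately not modelled.

```shell
# Hypothetical model of the MOVE MOUNT OPERATION table.
# usage: move_result SOURCE_TYPE DEST_TYPE
#   SOURCE_TYPE: shared | private | slave | unbindable
#   DEST_TYPE:   shared | non-shared
move_result() {
    src=$1
    dst=$2
    if [ "$dst" = shared ]; then
        case $src in
            unbindable) echo "invalid" ;;            # case 4: would need clones
            slave)      echo "shared and slave" ;;   # case 3
            *)          echo "shared" ;;             # cases 1 and 2
        esac
    else
        echo "$src"     # cases 5 to 8: the mount keeps its type
    fi
}
```

Here `move_result unbindable non-shared` prints `unbindable`, while `move_result unbindable shared` prints `invalid`.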
0547 
0548 5e) Mount semantics
0549 
0550         Consider the following command::
0551 
0552             mount device  B/b
0553 
0554         'B' is the destination mount and 'b' is the dentry in the destination
0555         mount.
0556 
0557         The above operation is the same as the bind operation, with the exception
0558         that the source mount is always a private mount.
0559 
0560 
0561 5f) Unmount semantics
0562 
0563         Consider the following command::
0564 
0565             umount A
0566 
0567         where 'A' is a mount mounted on mount 'B' at dentry 'b'.
0568 
0569         If mount 'B' is shared, then all most-recently-mounted mounts at dentry
0570         'b' on mounts that receive propagation from mount 'B' and do not have
0571         sub-mounts within them are unmounted.
0572 
0573         Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
0574         each other.
0575 
0576         Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
0577         'B1', 'B2' and 'B3' respectively.
0578 
0579         Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
0580         mount 'B1', 'B2' and 'B3' respectively.
0581 
0582         If 'C1' is unmounted, all the mounts that are most-recently-mounted on
0583         'B1' and on the mounts that 'B1' propagates to are unmounted.
0584 
0585         'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
0586         on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.
0587 
0588         So all of 'C1', 'C2' and 'C3' should be unmounted.
0589 
0590         If either 'C2' or 'C3' has some child mounts, then that mount is not
0591         unmounted, but all other mounts are unmounted. However, if 'C1' is told
0592         to be unmounted and 'C1' has some sub-mounts, the umount operation
0593         fails entirely.
0594 
0595 5g) Clone Namespace
0596 
0597         A cloned namespace contains the same mounts as the parent
0598         namespace.
0599 
0600         Let's say 'A' and 'B' are the corresponding mounts in the parent and the
0601         child namespace.
0602 
0603         If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
0604         each other.
0605 
0606         If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
0607         'Z'.
0608 
0609         If 'A' is a private mount, then 'B' is a private mount too.
0610 
0611         If 'A' is an unbindable mount, then 'B' is an unbindable mount too.
0612 
0613 
0614 6) Quiz
0615 -------
0616         A. What is the result of the following command sequence?
0617 
0618                 ::
0619 
0620                     mount --bind /mnt /mnt
0621                     mount --make-shared /mnt
0622                     mount --bind /mnt /tmp
0623                     mount --move /tmp /mnt/1
0624 
0625                 What should be the contents of /mnt, /mnt/1 and /mnt/1/1?
0626                 Should they all be identical? Or should only /mnt and /mnt/1
0627                 be identical?
0628 
0629 
0630         B. What is the result of the following command sequence?
0631 
0632                 ::
0633 
0634                     mount --make-rshared /
0635                     mkdir -p /v/1
0636                     mount --rbind / /v/1
0637 
0638                 What should be the contents of /v/1/v/1?
0639 
0640 
0641         C. What is the result of the following command sequence?
0642 
0643                 ::
0644 
0645                     mount --bind /mnt /mnt
0646                     mount --make-shared /mnt
0647                     mkdir -p /mnt/1/2/3 /mnt/1/test
0648                     mount --bind /mnt/1 /tmp
0649                     mount --make-slave /mnt
0650                     mount --make-shared /mnt
0651                     mount --bind /mnt/1/2 /tmp1
0652                     mount --make-slave /mnt
0653 
0654                 At this point we have the first mount at /tmp and
0655                 its root dentry is 1. Let's call this mount 'A'.
0656                 And then we have a second mount at /tmp1 with root
0657                 dentry 2. Let's call this mount 'B'.
0658                 Next we have a third mount at /mnt with root dentry
0659                 mnt. Let's call this mount 'C'.
0660 
0661                 'B' is the slave of 'A' and 'C' is a slave of 'B'
0662                 A -> B -> C
0663 
0664                 At this point, if we execute the following command
0665 
0666                 mount --bind /bin /tmp/test
0667 
0668                 the mount is attempted on 'A'.
0669 
0670                 Will the mount propagate to 'B' and 'C'?
0671 
0672                 What would be the contents of
0673                 /mnt/1/test?
0674 
0675 7) FAQ
0676 ------
0677         Q1. Why is bind mount needed? How is it different from symbolic links?
0678                 Symbolic links can get stale if the destination mount gets
0679                 unmounted or moved. Bind mounts continue to exist even if the
0680                 other mount is unmounted or moved.
0681 
0682         Q2. Why can't the shared subtree be implemented using exportfs?
0683 
0684                 exportfs is a heavyweight way of accomplishing part of what
0685                 shared subtree can do. I cannot imagine a way to implement the
0686                 semantics of slave mount using exportfs.
0687 
0688         Q3. Why is unbindable mount needed?
0689 
0690                 Let's say we want to replicate the mount tree at multiple
0691                 locations within the same subtree.
0692 
0693                 If one rbind mounts a tree within the same subtree 'n' times,
0694                 the number of mounts created grows at least exponentially with 'n'.
0695                 Having unbindable mount can help prune the unneeded bind
0696                 mounts. Here is an example.
0697 
0698                 step 1:
0699                    let's say the root tree has just two directories with
0700                    one vfsmount::
0701 
0702                                     root
0703                                    /    \
0704                                   tmp    usr
0705 
0706                     And we want to replicate the tree at multiple
0707                     mountpoints under /root/tmp
0708 
0709                 step 2:
0710                       ::
0711 
0712 
0713                         mount --make-shared /root
0714 
0715                         mkdir -p /tmp/m1
0716 
0717                         mount --rbind /root /tmp/m1
0718 
0719                       the new tree now looks like this::
0720 
0721                                     root
0722                                    /    \
0723                                  tmp    usr
0724                                 /
0725                                m1
0726                               /  \
0727                              tmp  usr
0728                              /
0729                             m1
0730 
0731                           it has two vfsmounts
0732 
0733                 step 3:
0734                     ::
0735 
0736                             mkdir -p /tmp/m2
0737                             mount --rbind /root /tmp/m2
0738 
0739                         the new tree now looks like this::
0740 
0741                                       root
0742                                      /    \
0743                                    tmp     usr
0744                                   /    \
0745                                 m1       m2
0746                                / \       /  \
0747                              tmp  usr   tmp  usr
0748                              / \          /
0749                             m1  m2      m1
0750                                 / \     /  \
0751                               tmp usr  tmp   usr
0752                               /        / \
0753                              m1       m1  m2
0754                             /  \
0755                           tmp   usr
0756                           /  \
0757                          m1   m2
0758 
0759                        it has 6 vfsmounts
0760 
0761                 step 4:
                      ::

                          mkdir -p /tmp/m3
                          mount --rbind /root /tmp/m3

                      The tree is too large to draw here, but it has 24
                      vfsmounts.
0767 
0768 
                At step i the number of vfsmounts is V[i] = i*V[i-1], with
                V[1] = 1; that is, V[i] = i!.  This grows even faster than
                an exponential function, and the tree has far more
                mounts than we really needed in the first place.
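
The recurrence above can be checked with a few lines of Python.  This is
only an illustrative sketch: the function name ``vfsmount_counts`` is made
up for this example, and it assumes V[1] = 1 as in step 1.

```python
def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for V[1] = 1 and V[i] = i * V[i-1].

    `steps` is assumed to be at least 1.
    """
    counts = [1]
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])
    return counts


print(vfsmount_counts(4))  # [1, 2, 6, 24]
```

The values match the counts quoted for steps 1 to 4 (1, 2, 6 and 24
vfsmounts).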
0772 
                One could use a series of umounts at each step to prune
                out the unneeded mounts, but there is a better solution:
                unbindable (unclonable) mounts come in handy here.
0776 
0777                 step 1:
0778                    let's say the root tree has just two directories with
0779                    one vfsmount::
0780 
0781                                     root
0782                                    /    \
0783                                   tmp    usr
0784 
                    How do we set up the same tree at multiple locations under
                    /root/tmp?
0787 
0788                 step 2:
0789                       ::
0790 
0791 
0792                         mount --bind /root/tmp /root/tmp
0793 
0794                         mount --make-rshared /root
0795                         mount --make-unbindable /root/tmp
0796 
0797                         mkdir -p /tmp/m1
0798 
0799                         mount --rbind /root /tmp/m1
0800 
0801                       the new tree now looks like this::
0802 
0803                                     root
0804                                    /    \
0805                                  tmp    usr
0806                                 /
0807                                m1
0808                               /  \
0809                              tmp  usr
0810 
0811                 step 3:
0812                       ::
0813 
0814                             mkdir -p /tmp/m2
0815                             mount --rbind /root /tmp/m2
0816 
0817                       the new tree now looks like this::
0818 
0819                                     root
0820                                    /    \
0821                                  tmp    usr
0822                                 /   \
0823                                m1     m2
0824                               /  \     / \
0825                              tmp  usr tmp usr
0826 
0827                 step 4:
0828                       ::
0829 
0830                             mkdir -p /tmp/m3
0831                             mount --rbind /root /tmp/m3
0832 
0833                       the new tree now looks like this::
0834 
0835                                           root
0836                                       /           \
0837                                      tmp           usr
0838                                  /    \    \
0839                                m1     m2     m3
0840                               /  \     / \    /  \
0841                              tmp  usr tmp usr tmp usr
0842 
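
Why the unbindable mount keeps the growth linear can be sketched with a toy
model.  The code below is not kernel code: the ``Mount`` class and the
``clone_tree()`` and ``simulate()`` helpers are hypothetical names, and the
model deliberately ignores propagation.  It only illustrates that a
recursive clone which skips unbindable mounts copies a single mount per
step.

```python
class Mount:
    """Toy stand-in for a vfsmount: a name, a flag, and child mounts."""

    def __init__(self, name, unbindable=False):
        self.name = name
        self.unbindable = unbindable
        self.children = []


def clone_tree(mount):
    """Recursively copy a mount tree, pruning unbindable mounts."""
    copy = Mount(mount.name)
    for child in mount.children:
        if not child.unbindable:
            copy.children.append(clone_tree(child))
    return copy


def count_mounts(mount):
    """Count the mounts in the tree rooted at `mount`."""
    return 1 + sum(count_mounts(c) for c in mount.children)


def simulate(steps, unbindable=True):
    """Return the mount count after each rbind of root under tmp."""
    root = Mount("root")
    tmp = Mount("tmp", unbindable=unbindable)  # the /root/tmp bind mount
    root.children.append(tmp)
    counts = []
    for _ in range(steps):
        # Models: mount --rbind /root /tmp/mN
        tmp.children.append(clone_tree(root))
        counts.append(count_mounts(root))
    return counts


print(simulate(3))                    # [3, 4, 5]
print(simulate(3, unbindable=False))  # [4, 8, 16]
```

With the unbindable flag set, each step adds exactly one mount.  The
doubling seen with ``unbindable=False`` is a property of this simplified
model; the counts in the first example differ because they also involve
propagation among shared peers.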
8) Implementation
-----------------

8A) Data structure

        Four new fields are introduced to struct vfsmount:
0848 
0849         *   ->mnt_share
0850         *   ->mnt_slave_list
0851         *   ->mnt_slave
0852         *   ->mnt_master
0853 
        ->mnt_share
                links together all the mounts to/from which this vfsmount
                sends/receives propagation events.

        ->mnt_slave_list
                links all the mounts to which this vfsmount propagates.
0861 
0862         ->mnt_slave
0863                 links together all the slaves that its master vfsmount
0864                 propagates to.
0865 
0866         ->mnt_master
0867                 points to the master vfsmount from which this vfsmount
0868                 receives propagation.
0869 
        ->mnt_flags
                takes two more flags to indicate the propagation status of
                the vfsmount.  MNT_SHARED indicates that the vfsmount is a
                shared vfsmount.  MNT_UNBINDABLE indicates that the vfsmount
                cannot be replicated.
0875 
0876         All the shared vfsmounts in a peer group form a cyclic list through
0877         ->mnt_share.
0878 
        All vfsmounts with the same ->mnt_master form a cyclic list anchored
        in ->mnt_master->mnt_slave_list and going through ->mnt_slave.
0881 
0882          ->mnt_master can point to arbitrary (and possibly different) members
0883          of master peer group.  To find all immediate slaves of a peer group
0884          you need to go through _all_ ->mnt_slave_list of its members.
0885          Conceptually it's just a single set - distribution among the
0886          individual lists does not affect propagation or the way propagation
0887          tree is modified by operations.
0888 
0889         All vfsmounts in a peer group have the same ->mnt_master.  If it is
0890         non-NULL, they form a contiguous (ordered) segment of slave list.
0891 
        An example propagation tree looks as shown in the figure below.
        [ NOTE: Though it looks like a forest, if we consider all the shared
        mounts as a conceptual entity called 'pnode', it becomes a tree]::
0895 
0896 
0897                         A <--> B <--> C <---> D
0898                        /|\            /|      |\
0899                       / F G          J K      H I
0900                      /
0901                     E<-->K
0902                         /|\
0903                        M L N
0904 
        In the above figure A, B, C and D are all shared and propagate to each
        other.  'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
        mounts 'J' and 'K'; and 'D' has 2 slave mounts 'H' and 'I'.
        'E' is also shared with 'K' and they propagate to each other, and
        'K' has 3 slaves 'M', 'L' and 'N'.  (The figure uses the label 'K'
        twice: the slave of 'C' and the peer of 'E' are distinct mounts,
        since a mount has only one ->mnt_master.)
0910 
0911         A's ->mnt_share links with the ->mnt_share of 'B' 'C' and 'D'
0912 
0913         A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
0914 
0915         E's ->mnt_share links with ->mnt_share of K
0916 
0917         'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'
0918 
0919         'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
0920 
0921         K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
0922 
0923         C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
0924 
0925         J and K's ->mnt_master points to struct vfsmount of C
0926 
0927         and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
0928 
0929         'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
0930 
0931 
0932         NOTE: The propagation tree is orthogonal to the mount tree.
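
The linkage described above can be mimicked in a small Python model.  This
is a sketch under stated assumptions, not the kernel's layout: the ``Mount``
class and the helpers ``add_slave()``, ``make_peer()``, ``peer_group()`` and
``immediate_slaves()`` are hypothetical, and a plain Python list stands in
for the ->mnt_slave_list anchor together with the ->mnt_slave linkage.  The
second 'K' of the figure (the slave of 'C') is left out so that names stay
unique.

```python
class Mount:
    """Toy model of the propagation fields; not the kernel's struct."""

    def __init__(self, name):
        self.name = name
        self.mnt_share = self       # next peer in the cyclic peer list
        self.mnt_slave_list = []    # slaves anchored at this mount
        self.mnt_master = None      # mount this one receives propagation from


def add_slave(master, slave):
    """Make `slave` receive propagation from `master`."""
    slave.mnt_master = master
    master.mnt_slave_list.append(slave)


def make_peer(member, new):
    """Splice `new` into `member`'s cyclic peer list, right after `member`."""
    new.mnt_share = member.mnt_share
    member.mnt_share = new
    new.mnt_master = member.mnt_master
    if member.mnt_master is not None:
        # Peers stay contiguous in their master's slave list.
        slaves = member.mnt_master.mnt_slave_list
        slaves.insert(slaves.index(member) + 1, new)


def peer_group(mount):
    """Walk the cyclic ->mnt_share list once, starting at `mount`."""
    group, peer = [mount], mount.mnt_share
    while peer is not mount:
        group.append(peer)
        peer = peer.mnt_share
    return group


def immediate_slaves(mount):
    """Collect slaves from the slave lists of all members of the peer group."""
    return [s for peer in peer_group(mount) for s in peer.mnt_slave_list]


m = {name: Mount(name) for name in "ABCDEFGHIJKLMN"}
for member, new in [("A", "B"), ("B", "C"), ("C", "D")]:
    make_peer(m[member], m[new])
for master, slave in [("A", "E"), ("A", "F"), ("A", "G"), ("C", "J"),
                      ("D", "H"), ("D", "I")]:
    add_slave(m[master], m[slave])
make_peer(m["E"], m["K"])           # K joins E's peer group; its master is A
for slave in "MLN":
    add_slave(m["K"], m[slave])

print([s.name for s in m["A"].mnt_slave_list])      # ['E', 'K', 'F', 'G']
print([s.name for s in immediate_slaves(m["E"])])   # ['M', 'L', 'N']
```

Note that ``immediate_slaves()`` has to visit every member of the peer
group, which mirrors the remark that all ->mnt_slave_list of the members
must be traversed to find the immediate slaves of a peer group.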
0933 
8B) Locking
0935 
0936         ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
0937         by namespace_sem (exclusive for modifications, shared for reading).
0938 
0939         Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
0940         There are two exceptions: do_add_mount() and clone_mnt().
0941         The former modifies a vfsmount that has not been visible in any shared
0942         data structures yet.
0943         The latter holds namespace_sem and the only references to vfsmount
0944         are in lists that can't be traversed without namespace_sem.
0945 
8C) Algorithm
0947 
        The crux of the implementation resides in the rbind/move operations.

        The overall algorithm breaks the operation into 3 phases (look at
        attach_recursive_mnt() and propagate_mnt()):

        1. prepare phase
        2. commit phase
        3. abort phase
0956 
        Prepare phase:

        for each mount in the source tree:

                   a) Create the necessary number of mount trees to
                      be attached to each of the mounts that receive
                      propagation from the destination mount.
                   b) Do not attach any of the trees to its destination.
                      However, note down its ->mnt_parent and ->mnt_mountpoint.
                   c) Link all the new mounts to form a propagation tree that
                      is identical to the propagation tree of the destination
                      mount.

                   If this phase is successful, there should be 'n' new
                   propagation trees, where 'n' is the number of mounts in the
                   source tree, and 'm' new mount trees, where 'm' is
                   the number of mounts to which the destination mount
                   propagates.  Go to the commit phase.

                   If any memory allocation fails, go to the abort phase.
0979 
0980         Commit phase
0981                 attach each of the mount trees to their corresponding
0982                 destination mounts.
0983 
0984         Abort phase
0985                 delete all the newly created trees.
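
The three phases can be illustrated with a simplified Python sketch.  It is
not the code of attach_recursive_mnt() or propagate_mnt(): the function
``attach_with_propagation()``, the dictionary representation of mount trees,
and the ``budget`` parameter (which stands in for memory that may run out)
are all assumptions made for this example.  ``receivers`` is assumed to list
the destination mount together with every mount that receives propagation
from it.

```python
def copy_tree(tree):
    """Deep-copy a mount tree given as {"name": str, "children": [...]}."""
    return {"name": tree["name"],
            "children": [copy_tree(c) for c in tree["children"]]}


def count_mounts(tree):
    """Count the mounts in a tree."""
    return 1 + sum(count_mounts(c) for c in tree["children"])


def attach_with_propagation(source, receivers, budget):
    """Attach a copy of `source` to every receiver, or to none of them."""
    pending = []  # (parent, new_tree) pairs noted down but not yet attached
    try:
        # Prepare phase: create all trees without attaching any of them.
        for parent in receivers:
            need = count_mounts(source)
            if need > budget:
                raise MemoryError("simulated allocation failure")
            budget -= need
            pending.append((parent, copy_tree(source)))
    except MemoryError:
        # Abort phase: drop the newly created trees; receivers are untouched.
        pending.clear()
        return False
    # Commit phase: attach each tree to its corresponding destination.
    for parent, tree in pending:
        parent["children"].append(tree)
    return True


source = {"name": "src", "children": [{"name": "sub", "children": []}]}
d1 = {"name": "d1", "children": []}
d2 = {"name": "d2", "children": []}
print(attach_with_propagation(source, [d1, d2], budget=4))  # True
print(attach_with_propagation(source, [d1, d2], budget=3))  # False
```

Because nothing is attached during the prepare phase, a failure part-way
through leaves every destination exactly as it was.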
0986 
0987         .. Note::
0988            all the propagation related functionality resides in the file pnode.c
0989 
0990 
0991 ------------------------------------------------------------------------
0992 
0993 version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
0994 
0995 version 0.2  (Incorporated comments from Al Viro)