<head>
<style> p { max-width:50em} ol, ul {max-width: 40em}</style>
</head>

autofs - how it works
=====================

Purpose
-------

The goal of autofs is to provide on-demand mounting and race-free
automatic unmounting of various other filesystems. This provides two
key advantages:

1. There is no need to delay boot until all filesystems that
   might be needed are mounted. Processes that try to access those
   slow filesystems might be delayed but other processes can
   continue freely. This is particularly important for
   network filesystems (e.g. NFS) or filesystems stored on
   media with a media-changing robot.

2. The names and locations of filesystems can be stored in
   a remote database and can change at any time. The content
   in that database at the time of access will be used to provide
   a target for the access. The interpretation of names in the
   filesystem can even be programmatic rather than database-backed,
   allowing wildcards for example, and can vary based on the user who
   first accessed a name.

Context
-------

The "autofs4" filesystem module is only one part of an autofs system.
There also needs to be a user-space program which looks up names
and mounts filesystems. This will often be the "automount" program,
though other tools including "systemd" can make use of "autofs4".
This document describes only the kernel module and the interactions
required with any user-space program. Subsequent text refers to this
as the "automount daemon" or simply "the daemon".

"autofs4" is a Linux kernel module which provides the "autofs"
filesystem type. Several "autofs" filesystems can be mounted and they
can each be managed separately, or all managed by the same daemon.

Content
-------

An autofs filesystem can contain three sorts of objects: directories,
symbolic links and mount traps. Mount traps are directories with
extra properties as described in the next section.

Objects can only be created by the automount daemon: symlinks are
created with a regular `symlink` system call, while directories and
mount traps are created with `mkdir`. The determination of whether a
directory should be a mount trap or not is quite _ad hoc_, largely for
historical reasons, and is determined in part by the
*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.

If neither the *direct* nor the *offset* mount option is given (so the
mount is considered to be *indirect*), then the root directory is
always a regular directory; otherwise it is a mount trap when it is
empty and a regular directory when not empty. Note that *direct* and
*offset* are treated identically so a concise summary is that the root
directory is a mount trap only if the filesystem is mounted *direct*
and the root is empty.

Directories created in the root directory are mount traps only if the
filesystem is mounted *indirect* and they are empty.

Directories further down the tree depend on the *maxproto* mount
option and particularly whether it is less than five or not.
When *maxproto* is five, no directories further down the
tree are ever mount traps; they are always regular directories. When
*maxproto* is four (or three), these directories are mount traps
precisely when they are empty.

So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
directories are sometimes mount traps, and sometimes not, depending on
where in the tree they are (root, top level, or lower), the *maxproto*,
and whether the mount was *indirect* or not.
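
The rules above can be condensed into a single predicate. The following is only a user-space sketch for illustration: the names `trap_level` and `is_mount_trap` are invented here and do not appear in the kernel, and `direct` stands for either the *direct* or the *offset* option.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a directory sits in the autofs tree (illustrative names). */
enum trap_level { LEVEL_ROOT, LEVEL_TOP, LEVEL_LOWER };

/*
 * Model of the rules described in the text: returns true if the
 * directory would be a mount trap.  `direct` is true for both the
 * direct and the offset mount options.
 */
bool is_mount_trap(enum trap_level level, bool empty,
                   bool direct, int maxproto)
{
        assert(maxproto >= 3 && maxproto <= 5);

        if (!empty)
                return false;           /* non-empty directories are never traps */

        switch (level) {
        case LEVEL_ROOT:
                return direct;          /* empty root: trap only when direct */
        case LEVEL_TOP:
                return !direct;         /* empty top-level dir: trap only when indirect */
        case LEVEL_LOWER:
                return maxproto < 5;    /* deeper: traps only with protocol 3 or 4 */
        }
        return false;
}
```

For example, `is_mount_trap(LEVEL_LOWER, true, false, 5)` is false, which is the version-5 behaviour that makes lower-level directories plain directories.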

Mount Traps
-----------

A core element of the implementation of autofs is the mount trap,
a facility provided by the Linux VFS. Any directory provided by a
filesystem can be designated as a trap. This involves two separate
features that work together to allow autofs to do its job.

**DCACHE_NEED_AUTOMOUNT**

If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
the inode has S_AUTOMOUNT set, or can be set directly) then it is
(potentially) a mount trap. Any access to this directory beyond a
"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation
to be called. The task of this method is to find the filesystem that
should be mounted on the directory and to return it. The VFS is
responsible for actually mounting the root of this filesystem on the
directory.

autofs doesn't find the filesystem itself but sends a message to the
automount daemon asking it to find and mount the filesystem. The
autofs `d_automount` method then waits for the daemon to report that
everything is ready. It will then return "`NULL`" indicating that the
mount has already happened. The VFS doesn't try to mount anything but
follows down the mount that is already there.

This functionality is sufficient for some users of mount traps such
as NFS, which creates traps so that mountpoints on the server can be
reflected on the client. However, it is not sufficient for autofs. As
mounting onto a directory is considered to be "beyond a `stat`", the
automount daemon would not be able to mount a filesystem on the 'trap'
directory without some way to avoid getting caught in the trap. For
that purpose there is another flag.

**DCACHE_MANAGE_TRANSIT**

If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
related behaviors are invoked, both using the `d_op->d_manage()`
dentry operation.

Firstly, before checking to see if any filesystem is mounted on the
directory, `d_manage()` will be called with the `rcu_walk` parameter set
to `false`. It may return one of three things:

- A return value of zero indicates that there is nothing special
  about this dentry and normal checks for mounts and automounts
  should proceed.

  autofs normally returns zero, but first waits for any
  expiry (automatic unmounting of the mounted filesystem) to
  complete. This avoids races.

- A return value of `-EISDIR` tells the VFS to ignore any mounts
  on the directory and to not consider calling `->d_automount()`.
  This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag
  causing the directory not to be a mount trap after all.

  autofs returns this if it detects that the process performing the
  lookup is the automount daemon and that the mount has been
  requested but has not yet completed. How it determines this is
  discussed later. This allows the automount daemon not to get
  caught in the mount trap.

  There is a subtlety here. It is possible that a second autofs
  filesystem can be mounted below the first and for both of them to
  be managed by the same daemon. For the daemon to be able to mount
  something on the second it must be able to "walk" down past the
  first. This means that `d_manage` cannot *always* return `-EISDIR` for
  the automount daemon. It must only return it when a mount has
  been requested, but has not yet completed.

  `d_manage` also returns `-EISDIR` if the dentry shouldn't be a
  mount trap, either because it is a symbolic link or because it is
  not empty.

- Any other negative value is treated as an error and returned
  to the caller.

  autofs can return

  - `-ENOENT` if the automount daemon failed to mount anything,
  - `-ENOMEM` if it ran out of memory,
  - `-EINTR` if a signal arrived while waiting for expiry to
    complete
  - or any other error sent down by the automount daemon.
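
The three kinds of return value can be summarised with a small user-space model. This is not the kernel's `d_manage` implementation: `struct trap_state`, its fields and `model_d_manage_refwalk` are hypothetical names, and the order in which the conditions are tested is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what autofs knows about one dentry. */
struct trap_state {
        bool is_symlink;
        bool is_empty;
        bool caller_is_daemon;
        bool mount_pending;     /* mount requested but not yet completed */
        int  expire_result;     /* 0, or a negative errno from waiting for expiry */
};

/* Model of the REF-walk (rcu_walk == false) return value. */
int model_d_manage_refwalk(const struct trap_state *s)
{
        assert(s != NULL);
        assert(s->expire_result <= 0);

        /* The daemon must be let through while it services the request. */
        if (s->caller_is_daemon && s->mount_pending)
                return -EISDIR;

        /* Any expiry in progress is waited for; errors such as -EINTR
         * are passed back to the caller. */
        if (s->expire_result < 0)
                return s->expire_result;

        /* Symbolic links and non-empty directories are not mount traps. */
        if (s->is_symlink || !s->is_empty)
                return -EISDIR;

        return 0;       /* carry on with normal mount and automount checks */
}
```

Note that the model returns `-EISDIR` for the daemon only while a mount is pending, which is what lets the same daemon walk past one autofs mount to reach another below it.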

The second use case only occurs during an "RCU-walk" and so `rcu_walk`
will be set.

An RCU-walk is a fast and lightweight process for walking down a
filename path (i.e. it is like running on tip-toes). RCU-walk cannot
cope with all situations so when it finds a difficulty it falls back
to "REF-walk", which is slower but more robust.

RCU-walk will never call `->d_automount`; the filesystems must already
be mounted or RCU-walk cannot handle the path.
To determine if a mount-trap is safe for RCU-walk mode it calls
`->d_manage()` with `rcu_walk` set to `true`.

In this case `d_manage()` must avoid blocking and should avoid taking
spinlocks if at all possible. Its sole purpose is to determine if it
would be safe to follow down into any mounted directory and the only
reason that it might not be is if an expiry of the mount is
underway.

In the `rcu_walk` case, `d_manage()` cannot return `-EISDIR` to tell the
VFS that this is a directory that doesn't require `d_automount`. If
RCU-walk sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing
mounted, it *will* fall back to REF-walk. `d_manage()` cannot make the
VFS remain in RCU-walk mode, but can only tell it to get out of
RCU-walk mode by returning `-ECHILD`.

So `d_manage()`, when called with `rcu_walk` set, should return
`-ECHILD` if there is any reason to believe it is unsafe to enter the
mounted filesystem, and should otherwise return 0.

autofs will return `-ECHILD` if an expiry of the filesystem has been
initiated or is being considered, otherwise it returns 0.
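
The RCU-walk contract is simple enough to model in a few lines. The sketch below is illustrative only; `model_d_manage_rcuwalk`, `next_walk_mode` and the `walk_mode` enum are names made up for this example, and the second function merely restates how the text says the VFS reacts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum walk_mode { WALK_RCU, WALK_REF };

/* What autofs answers when d_manage() is called with rcu_walk set. */
int model_d_manage_rcuwalk(bool expiry_initiated, bool expiry_considered)
{
        if (expiry_initiated || expiry_considered)
                return -ECHILD; /* not safe to enter: leave RCU-walk */
        return 0;
}

/*
 * How the path walk continues.  d_manage() can force a fallback to
 * REF-walk, but it cannot keep the VFS in RCU-walk when nothing is
 * mounted on the trap.
 */
enum walk_mode next_walk_mode(int d_manage_ret, bool something_mounted)
{
        assert(d_manage_ret == 0 || d_manage_ret == -ECHILD);

        if (d_manage_ret == -ECHILD || !something_mounted)
                return WALK_REF;
        return WALK_RCU;
}
```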

Mountpoint expiry
-----------------

The VFS has a mechanism for automatically expiring unused mounts,
much as it can expire any unused dentry information from the dcache.
This is guided by the MNT_SHRINKABLE flag. This only applies to
mounts that were created by `d_automount()` returning a filesystem to be
mounted. As autofs doesn't return such a filesystem but leaves the
mounting to the automount daemon, it must involve the automount daemon
in unmounting as well. This also means that autofs has more control
of expiry.

The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
the `umount` system call. Unmounting with MNT_EXPIRE will fail unless
a previous attempt had been made, and the filesystem has been inactive
and untouched since that previous attempt. autofs4 does not depend on
this but has its own internal tracking of whether filesystems were
recently used. This allows individual names in the autofs directory
to expire separately.
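
For comparison, the generic MNT_EXPIRE behaviour can be exercised from user space with `umount2(2)`. The helper below is a sketch, not something autofs itself relies on; the function name and its return convention are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Make one MNT_EXPIRE attempt on `target`.
 * Returns 0 if the filesystem was unmounted, 1 if it was only marked
 * as expired (errno was EAGAIN), or a negative errno on any other
 * failure, e.g. -EBUSY when the mount is in use.
 */
int try_expire_umount(const char *target)
{
        assert(target != NULL);

        if (umount2(target, MNT_EXPIRE) == 0)
                return 0;
        if (errno == EAGAIN)
                return 1;       /* marked; a later attempt may succeed */
        return -errno;
}
```

A caller would typically invoke this twice with some delay in between: the first call returns 1, and the second returns 0 only if nothing touched the mount in the meantime.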

With version 4 of the protocol, the automount daemon can try to
unmount any filesystems mounted on the autofs filesystem or remove any
symbolic links or empty directories any time it likes. If the unmount
or removal is successful the filesystem will be returned to the state
it was in before the mount or creation, so that any access of the name
will trigger normal auto-mount processing. In particular, `rmdir` and
`unlink` do not leave negative entries in the dcache as a normal
filesystem would, so an attempt to access a recently-removed object is
passed to autofs for handling.

With version 5, this is not safe except for unmounting from top-level
directories. As lower-level directories are never mount traps, other
processes will see an empty directory as soon as the filesystem is
unmounted. So it is generally safest to use the autofs expiry
protocol described below.

Normally the daemon only wants to remove entries which haven't been
used for a while. For this purpose autofs maintains a "`last_used`"
time stamp on each directory or symlink. For symlinks it genuinely
does record the last time the symlink was "used" or followed to find
out where it points to. For directories the field is a slight
misnomer. It actually records the last time that autofs checked if
the directory or one of its descendants was busy and found that it
was. This is just as useful and doesn't require updating the field so
often.

The daemon is able to ask autofs if anything is due to be expired,
using an `ioctl` as discussed later. For a *direct* mount, autofs
considers if the entire mount-tree can be unmounted or not. For an
*indirect* mount, autofs considers each of the names in the top level
directory to determine if any of those can be unmounted and cleaned
up.

There is an option with indirect mounts to consider each of the leaves
that has been mounted on instead of considering the top-level names.
This is intended for compatibility with version 4 of autofs and should
be considered as deprecated.

When autofs considers a directory it checks the `last_used` time and
compares it with the "timeout" value set when the filesystem was
mounted, though this check is ignored in some cases. It also checks if
the directory or anything below it is in use. For symbolic links,
only the `last_used` time is ever considered.

If both appear to support expiring the directory or symlink, an action
is taken.
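
The two checks can be pictured as a small predicate. This is a simplified model under stated assumptions, not the kernel's code: `struct expire_candidate` and `may_expire` are hypothetical names, times are plain seconds, the exact comparison against the timeout is a guess, and `immediate` stands for the case in which the `last_used` check is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one directory or symlink being considered. */
struct expire_candidate {
        bool is_symlink;
        bool busy;                      /* it, or something below it, is in use */
        unsigned long last_used;        /* seconds */
};

bool may_expire(const struct expire_candidate *c, unsigned long now,
                unsigned long timeout, bool immediate)
{
        assert(c != NULL);
        assert(now >= c->last_used);

        /* The time check, unless the caller asked for it to be ignored. */
        bool idle_long_enough = immediate || (now - c->last_used >= timeout);

        /* For symlinks only the time check is considered. */
        if (c->is_symlink)
                return idle_long_enough;

        /* Directories must also be unused, including everything below. */
        return idle_long_enough && !c->busy;
}
```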

There are two ways to ask autofs to consider expiry. The first is to
use the **AUTOFS_IOC_EXPIRE** ioctl. This only works for indirect
mounts. If it finds something in the root directory to expire it will
return the name of that thing. Once a name has been returned the
automount daemon needs to unmount any filesystems mounted below the
name normally. As described above, this is unsafe for non-toplevel
mounts in a version-5 autofs. For this reason the current `automount`
daemon does not use this ioctl.

The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
the **AUTOFS_IOC_EXPIRE_MULTI** ioctl. This will work for both direct and
indirect mounts. If it selects an object to expire, it will notify
the daemon using the notification mechanism described below. This
will block until the daemon acknowledges the expiry notification.
This implies that the "`EXPIRE`" ioctl must be sent from a different
thread than the one which handles notification.

While the ioctl is blocking, the entry is marked as "expiring" and
`d_manage` will block until the daemon affirms that the unmount has
completed (together with removing any directories that might have been
necessary), or has been aborted.

Communicating with autofs: detecting the daemon
-----------------------------------------------

There are several forms of communication between the automount daemon
and the filesystem. As we have already seen, the daemon can create and
remove directories and symlinks using normal filesystem operations.
autofs knows whether a process requesting some operation is the daemon
or not based on its process-group id number (see getpgid(2)).

When an autofs filesystem is mounted the pgid of the mounting
process is recorded unless the "pgrp=" option is given, in which
case that number is recorded instead. Any request arriving from a
process in that process group is considered to come from the daemon.
If the daemon ever has to be stopped and restarted a new pgid can be
provided through an ioctl as will be described below.

Communicating with autofs: the event pipe
-----------------------------------------

When an autofs filesystem is mounted, the 'write' end of a pipe must
be passed using the 'fd=' mount option. autofs will write
notification messages to this pipe for the daemon to respond to.
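
A daemon therefore has to build an option string naming the pipe before it mounts the filesystem. The helper below is a minimal sketch that uses only the 'fd=' and 'pgrp=' options mentioned in this document; the function name is hypothetical, and a real daemon would normally add further options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format "fd=<write end>,pgrp=<process group>" into buf.
 * Returns the string length, or -1 if it did not fit.
 */
int format_autofs_options(char *buf, size_t len, int pipe_write_fd, pid_t pgrp)
{
        assert(buf != NULL && len > 0);

        int n = snprintf(buf, len, "fd=%d,pgrp=%ld", pipe_write_fd, (long)pgrp);

        return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string would be passed as the data argument of `mount(2)` with the filesystem type "autofs".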

For version 5, the format of the message is:

        struct autofs_v5_packet {
                int proto_version;              /* Protocol version */
                int type;                       /* Type of packet */
                autofs_wqt_t wait_queue_token;
                __u32 dev;
                __u64 ino;
                __u32 uid;
                __u32 gid;
                __u32 pid;
                __u32 tgid;
                __u32 len;
                char name[NAME_MAX+1];
        };

where the type is one of

        autofs_ptype_missing_indirect
        autofs_ptype_expire_indirect
        autofs_ptype_missing_direct
        autofs_ptype_expire_direct

so messages can indicate that a name is missing (something tried to
access it but it isn't there) or that it has been selected for expiry.

The pipe will be set to "packet mode" (equivalent to passing
`O_DIRECT` to _pipe2(2)_) so that a read from the pipe will return at
most one packet, and any unread portion of a packet will be discarded.

The `wait_queue_token` is a unique number which can identify a
particular request to be acknowledged. When a message is sent over
the pipe the affected dentry is marked as either "active" or
"expiring" and other accesses to it block until the message is
acknowledged using one of the ioctls below and the relevant
`wait_queue_token`.
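
Packet mode can be observed without autofs at all. The sketch below creates an ordinary `O_DIRECT` pipe, writes two packets and shows that one `read` returns only the first. `struct demo_v5_packet` is a local stand-in for the structure above (the token type in particular is an assumption), and the function name is invented for the example.

```c
#define _GNU_SOURCE             /* for pipe2() and O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Local stand-in for the version-5 packet; not the kernel header. */
struct demo_v5_packet {
        int proto_version;
        int type;
        unsigned long wait_queue_token;
        uint32_t dev;
        uint64_t ino;
        uint32_t uid, gid, pid, tgid, len;
        char name[NAME_MAX + 1];
};

/*
 * Write two packets into a packet-mode pipe, then read once with a
 * buffer big enough for both.  Returns the size of that single read,
 * or -1 on error.
 */
ssize_t demo_packet_mode_read(void)
{
        int fds[2];
        struct demo_v5_packet out[2], in[2];
        ssize_t n = -1;

        if (pipe2(fds, O_DIRECT) != 0)
                return -1;

        memset(out, 0, sizeof(out));
        for (int i = 0; i < 2; i++) {
                out[i].proto_version = 5;
                out[i].wait_queue_token = 100 + i;
                strcpy(out[i].name, i ? "second" : "first");
                out[i].len = (uint32_t)strlen(out[i].name);
                if (write(fds[1], &out[i], sizeof(out[i])) != (ssize_t)sizeof(out[i]))
                        goto done;
        }

        /* Only the first packet comes back, despite the larger buffer. */
        n = read(fds[0], in, sizeof(in));
        assert(n < 0 || in[0].wait_queue_token == 100);
done:
        close(fds[0]);
        close(fds[1]);
        return n;
}
```

A daemon's event loop can rely on the same property: each `read` with a buffer of at least one packet yields exactly one notification.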

Communicating with autofs: root directory ioctls
------------------------------------------------

The root directory of an autofs filesystem will respond to a number of
ioctls. The process issuing the ioctl must have the CAP_SYS_ADMIN
capability, or must be the automount daemon.

The available ioctl commands are:

- **AUTOFS_IOC_READY**: a notification has been handled. The argument
  to the ioctl command is the "wait_queue_token" number
  corresponding to the notification being acknowledged.
- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with
  the error code `ENOENT`.
- **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic"
  mode meaning that it stops sending notifications to the daemon.
  This mode is also entered if a write to the pipe fails.
- **AUTOFS_IOC_PROTOVER**: This returns the protocol version in use.
- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which
  is really a version number for the implementation. It is
  currently 2.
- **AUTOFS_IOC_SETTIMEOUT**: This passes a pointer to an unsigned
  long. The value is used to set the timeout for expiry, and
  the current timeout value is stored back through the pointer.
- **AUTOFS_IOC_ASKUMOUNT**: Returns, in the pointed-to `int`, 1 if
  the filesystem could be unmounted. This is only a hint as
  the situation could change at any instant. This call can be
  used to avoid a more expensive full unmount attempt.
- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
  anything suitable to expire. A pointer to a packet:

        struct autofs_packet_expire_multi {
                int proto_version;              /* Protocol version */
                int type;                       /* Type of packet */
                autofs_wqt_t wait_queue_token;
                int len;
                char name[NAME_MAX+1];
        };

  is required. This is filled in with the name of something
  that can be unmounted or removed. If nothing can be expired,
  `errno` is set to `EAGAIN`. Even though a `wait_queue_token`
  is present in the structure, no "wait queue" is established
  and no acknowledgment is needed.
- **AUTOFS_IOC_EXPIRE_MULTI**: This is similar to
  **AUTOFS_IOC_EXPIRE** except that it causes notification to be
  sent to the daemon, and it blocks until the daemon acknowledges.
  The argument is an integer which can contain two different flags.

  **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
  and objects are expired if they are not in use.

  **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
  name to expire. This is only safe when *maxproto* is 4.
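
As an illustration of the acknowledgement step, the sketch below wraps **AUTOFS_IOC_READY** and **AUTOFS_IOC_FAIL** in one helper. It assumes the `<linux/auto_fs.h>` user-space header is installed; the helper name and its `success` parameter are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/auto_fs.h>

/*
 * Acknowledge the notification identified by `token` on the autofs
 * root directory open as `root_fd`.  Returns the ioctl() result:
 * 0 on success, -1 with errno set on failure.
 */
int acknowledge_request(int root_fd, autofs_wqt_t token, bool success)
{
        unsigned long cmd = success ? AUTOFS_IOC_READY : AUTOFS_IOC_FAIL;

        /* The token travels by value as the ioctl argument word. */
        assert(sizeof(token) <= sizeof(unsigned long));

        return ioctl(root_fd, cmd, (unsigned long)token);
}
```

A daemon would call this with the `wait_queue_token` taken from the packet it read from the event pipe, after the mount or unmount it was asked for has succeeded or failed.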

Communicating with autofs: char-device ioctls
---------------------------------------------

It is not always possible to open the root of an autofs filesystem,
particularly a *direct* mounted filesystem. If the automount daemon
is restarted there is no way for it to regain control of existing
mounts using any of the above communication channels. To address this
need there is a "miscellaneous" character device (major 10, minor 235)
which can be used to communicate directly with the autofs filesystem.
It requires CAP_SYS_ADMIN for access.

The `ioctl`s that can be used on this device are described in a separate
document `autofs4-mount-control.txt`, and are summarized briefly here.
Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:

        struct autofs_dev_ioctl {
                __u32 ver_major;
                __u32 ver_minor;
                __u32 size;             /* total size of data passed in
                                         * including this struct */
                __s32 ioctlfd;          /* automount command fd */

                __u32 arg1;             /* Command parameters */
                __u32 arg2;

                char path[0];
        };

For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target
filesystem is identified by the `path`. All other commands identify
the filesystem by the `ioctlfd` which is a file descriptor open on the
root, and which can be returned by **OPEN_MOUNT**.

The `ver_major` and `ver_minor` are in/out parameters which check that
the requested version is supported, and report the maximum version
that the kernel module can support.

Commands are:

- **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and
  set version numbers.
- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor
  on the root of an autofs filesystem. The filesystem is identified
  by name and device number, which is stored in `arg1`. Device
  numbers for existing filesystems can be found in
  `/proc/self/mountinfo`.
- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`.
- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the filesystem is in
  catatonic mode, this can provide the write end of a new pipe
  in `arg1` to re-establish communication with a daemon. The
  process group of the calling process is used to identify the
  daemon.
- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a
  name within the filesystem that has been auto-mounted on.
  `arg1` is the dev number of the underlying autofs. On successful
  return, `arg1` and `arg2` will be the UID and GID of the process
  which triggered that mount.
- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a
  mountpoint of a particular type - see separate documentation for
  details.
- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**:
- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**:
- **AUTOFS_DEV_IOCTL_READY_CMD**:
- **AUTOFS_DEV_IOCTL_FAIL_CMD**:
- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**:
- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**:
- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**:
- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**: These all have the same
  function as the similarly named **AUTOFS_IOC** ioctls, except
  that **FAIL** can be given an explicit error number in `arg1`
  instead of assuming `ENOENT`, and this **EXPIRE** command
  corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.
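
Because the structure ends in a variable-length `path`, a caller has to allocate it with room for the string and record the total in `size`. The sketch below shows one way to do that using a local mirror of the structure as printed in this document, not the kernel header; `struct demo_dev_ioctl` and `alloc_dev_ioctl` are hypothetical names, the version numbers are supplied by the caller, and starting with `ioctlfd` set to -1 is an assumed convention for "no descriptor yet".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the structure described in the text. */
struct demo_dev_ioctl {
        uint32_t ver_major;
        uint32_t ver_minor;
        uint32_t size;          /* header plus path, including the NUL */
        int32_t  ioctlfd;
        uint32_t arg1;
        uint32_t arg2;
        char     path[];
};

/*
 * Allocate and fill a request for `path` (which may be NULL for
 * commands that identify the filesystem by ioctlfd instead).
 * The caller frees the result with free().
 */
struct demo_dev_ioctl *alloc_dev_ioctl(const char *path,
                                       uint32_t ver_major, uint32_t ver_minor)
{
        size_t path_len = path ? strlen(path) + 1 : 0;
        size_t total = sizeof(struct demo_dev_ioctl) + path_len;
        struct demo_dev_ioctl *p = calloc(1, total);

        if (p == NULL)
                return NULL;

        assert(total <= UINT32_MAX);
        p->ver_major = ver_major;
        p->ver_minor = ver_minor;
        p->size = (uint32_t)total;
        p->ioctlfd = -1;
        if (path != NULL)
                memcpy(p->path, path, path_len);
        return p;
}
```

The filled structure would then be passed to `ioctl` on the character device, with `arg1` set as the chosen command requires.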

Catatonic mode
--------------

As mentioned, an autofs mount can enter "catatonic" mode. This
happens if a write to the notification pipe fails, or if it is
explicitly requested by an `ioctl`.

When entering catatonic mode, the pipe is closed and any pending
notifications are acknowledged with the error `ENOENT`.

Once in catatonic mode attempts to access non-existing names will
result in `ENOENT` while attempts to access existing directories will
be treated in the same way as if they came from the daemon, so mount
traps will not fire.

When the filesystem is mounted a _uid_ and _gid_ can be given which
set the ownership of directories and symbolic links. When the
filesystem is in catatonic mode, any process with a matching UID can
create directories or symlinks in the root directory, but not in other
directories.

Catatonic mode can only be left via the
**AUTOFS_DEV_IOCTL_SETPIPEFD_CMD** ioctl on `/dev/autofs`, issued with
a descriptor obtained from **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**.

autofs, name spaces, and shared mounts
--------------------------------------

With bind mounts and name spaces it is possible for an autofs
filesystem to appear at multiple places in one or more filesystem
name spaces. For this to work sensibly, the autofs filesystem should
always be mounted "shared". e.g.

> `mount --make-shared /autofs/mount/point`

The automount daemon is only able to manage a single mount location for
an autofs filesystem and if mounts on that are not 'shared', other
locations will not behave as expected. In particular access to those
other locations will likely result in the `ELOOP` error

> Too many levels of symbolic links
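
A daemon written in C can request the same propagation change as the `mount --make-shared` command shown earlier by calling `mount(2)` with the `MS_SHARED` flag. The helper below is a small sketch; its name and return convention are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Mark an existing mount point as shared.
 * Returns 0 on success or a negative errno on failure.
 */
int make_mount_shared(const char *mount_point)
{
        assert(mount_point != NULL);

        /* Source, filesystem type and data are ignored for this call. */
        if (mount("none", mount_point, NULL, MS_SHARED, NULL) != 0)
                return -errno;
        return 0;
}
```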