How Is The Root File System Found?

One of the important kernel boot parameters is "root=", which tells the kernel where to find the root filesystem. For instance,

root=/dev/hda1

This is commonly specified as what looks like a standard Unix pathname (as above). But standard Unix pathnames are interpreted according to currently-mounted filesystems. So how do you interpret the above root pathname, before you've even mounted any filesystems?

It took me a few hours to decipher the answer to this (the following applies at least as of the 2.6.11 kernel sources). First of all, at kernel initialization time, an absolutely minimal filesystem called "rootfs" is registered. The code that implements it can be found in fs/ramfs/inode.c, which also happens to contain the code for the "ramfs" filesystem. rootfs is basically identical to ramfs, except that it specifies the MS_NOUSER flag; this flag is interpreted by the routine graft_tree in fs/namespace.c, and I think it prevents userland processes from doing their own mounts of rootfs.

The routine init_mount_tree (found in fs/namespace.c) is called at system startup time to mount an instance of rootfs, and make it the root namespace of the current process (remember that, under Linux, different processes can have different filesystem namespaces). This routine is called at the end of mnt_init (also in fs/namespace.c), as part of the following sequence:

sysfs_init(); /* causes sysfs to register itself--this is needed later for actually finding the root device */
init_rootfs(); /* causes rootfs to register itself */
init_mount_tree(); /* actually creates the initial filesystem namespace, with rootfs mounted at "/" */

mnt_init is called from vfs_caches_init in fs/dcache.c, which in turn is called from start_kernel in init/main.c.

The actual interpretation of the root=path parameter is done in a routine called name_to_dev_t, found in init/do_mounts.c. This tries each of the supported syntaxes in turn. One of them is the form "/dev/name", where name is resolved by temporarily mounting the sysfs filesystem (at its usual place, /sys) and then looking for an entry under /sys/block/name (done in the subsidiary routine try_name in the same source file). name_to_dev_t is called from prepare_namespace, which in turn is called from init in init/main.c. init is spawned as the first process on the system (pid 1) by a call to kernel_thread in rest_init, which comes at the end of the above-mentioned start_kernel.
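To make the "/dev/name" lookup a little more concrete, here is a rough userspace sketch in C of the same idea. It is not the kernel's code: the helper names split_disk_and_partition, parse_major_minor and lookup_disk are made up for illustration, and it assumes sysfs is already mounted at /sys and that /sys/block/name/dev holds text of the form major:minor. The trailing-digit handling for partition names is my simplified reading of what name_to_dev_t does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split a name such as "hda1" into the disk name
 * "hda" and the partition number 1. Returns 0 on success, or -1 if the
 * name has no usable trailing partition number. */
int split_disk_and_partition(const char *name, char *disk, size_t disk_len,
                             unsigned *part)
{
    size_t len, i;

    assert(name != NULL && disk != NULL && part != NULL);
    len = strlen(name);
    i = len;
    /* Walk backwards over the trailing decimal digits. */
    while (i > 0 && isdigit((unsigned char)name[i - 1]))
        i--;
    /* Reject: all digits, no digits, leading zero, or disk buffer too small. */
    if (i == 0 || i == len || name[i] == '0' || i >= disk_len)
        return -1;
    memcpy(disk, name, i);
    disk[i] = '\0';
    return sscanf(name + i, "%u", part) == 1 ? 0 : -1;
}

/* Hypothetical helper: parse the "major:minor" text found in a sysfs
 * "dev" attribute. Returns 0 on success, -1 on malformed input. */
int parse_major_minor(const char *text, unsigned *major, unsigned *minor)
{
    return sscanf(text, "%u:%u", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical helper: read /sys/block/<disk>/dev and parse it.
 * Assumes sysfs is mounted at /sys. Returns 0 on success, -1 on failure. */
int lookup_disk(const char *disk, unsigned *major, unsigned *minor)
{
    char path[128];
    char buf[32];
    FILE *f;
    int n;

    n = snprintf(path, sizeof path, "/sys/block/%s/dev", disk);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_major_minor(buf, major, minor);
}
```

For root=/dev/hda1, the idea would be to strip the "/dev/" prefix, split "hda1" into "hda" and 1, and call lookup_disk("hda", ...) to get the disk's numbers; as far as I can tell, the kernel then derives the partition's device number from the disk's device number plus the partition number.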

start_kernel is the last routine called by the early boot code that runs after the kernel gets control from the bootloader (this code is in arch/i386/kernel/head.S for the i386 architecture). It never returns, because the very last thing it does after all the initialization is call rest_init, which ends by calling cpu_idle. cpu_idle runs an endless loop that soaks up CPU time for as long as the CPU has nothing else to do (like run a process or service an interrupt).
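As a summary of the call chain described in this note, here is a small userspace mock in C that just records the order and nesting of the routines mentioned. None of it is kernel code: the mock_ functions, enter, leave and boot_trace are made up for illustration, the stubs do nothing except log their names, and init is called directly here even though the real kernel spawns it as a separate thread with kernel_thread. Placing cpu_idle inside rest_init reflects my reading of the 2.6.11 sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char trace[1024];   /* accumulated call trace, one name per line */
static int depth;          /* current nesting level, used for indentation */

/* Append a routine name to the trace, indented by nesting depth. */
static void enter(const char *name)
{
    size_t used = strlen(trace);

    snprintf(trace + used, sizeof trace - used, "%*s%s\n",
             depth * 2, "", name);
    depth++;
}

static void leave(void)
{
    assert(depth > 0);
    depth--;
}

static void mock_mnt_init(void)
{
    enter("mnt_init");
    enter("sysfs_init");      leave();  /* sysfs registers itself */
    enter("init_rootfs");     leave();  /* rootfs registers itself */
    enter("init_mount_tree"); leave();  /* rootfs mounted at "/" */
    leave();
}

static void mock_init(void)
{
    enter("init");
    enter("prepare_namespace");
    enter("name_to_dev_t");   leave();  /* interprets root=... */
    leave();
    leave();
}

static void mock_rest_init(void)
{
    enter("rest_init");
    mock_init();              /* really spawned via kernel_thread (pid 1) */
    enter("cpu_idle");        leave();  /* really an endless loop */
    leave();
}

static void mock_start_kernel(void)
{
    enter("start_kernel");
    enter("vfs_caches_init");
    mock_mnt_init();
    leave();
    mock_rest_init();
    leave();
}

/* Run the mock boot sequence and return the recorded trace. */
const char *boot_trace(void)
{
    trace[0] = '\0';
    depth = 0;
    mock_start_kernel();
    return trace;
}
```

Printing the string returned by boot_trace() gives an indented outline of the sequence, which makes it easy to see that rootfs is mounted well before name_to_dev_t ever looks at the root= parameter.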