= How Is The Root File System Found? =

One of the important kernel boot parameters is "root=", which tells the kernel where to find the root filesystem. For instance,

{{{
root=/dev/hda1}}}

This is commonly specified as what ''looks'' like a standard Unix pathname (as above). But standard Unix pathnames are interpreted according to currently-mounted filesystems. So how do you interpret the above root pathname, before you've even mounted any filesystems?

It took me a few hours to decipher the answer to this (the following applies at least as of the 2.6.11 kernel sources). First of all, at kernel initialization time, there is an absolutely minimal filesystem registered, called "rootfs". The code that implements this filesystem can be found in {{{fs/ramfs/inode.c}}}, which also happens to contain the code for the "ramfs" filesystem. rootfs is basically identical to ramfs, except for the specification of the {{{MS_NOUSER}}} flag; this is interpreted by the routine {{{graft_tree}}} in {{{fs/namespace.c}}}, and I think it prevents userland processes doing their own mounts of rootfs.

The routine {{{init_mount_tree}}} (found in {{{fs/namespace.c}}}) is called at system startup time to mount an instance of rootfs, and make it the root namespace of the current process (remember that, under Linux, different processes can have different filesystem namespaces). This routine is called at the end of {{{mnt_init}}} (also in {{{fs/namespace.c}}}), as part of the following sequence:

{{{
sysfs_init(); /* causes sysfs to register itself--this is needed later for actually finding the root device */
init_rootfs(); /* causes rootfs to register itself */
init_mount_tree(); /* actually creates the initial filesystem namespace, with rootfs mounted at "/" */}}}
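
The order of those calls matters: a filesystem type has to be registered under its name before an instance of it can be mounted. The following is a toy userland model of that dependency, not kernel code. Every name starting with {{{toy_}}} is made up for illustration, and the fixed-size table is an assumption of the sketch rather than how the kernel stores its filesystem types.

{{{
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FS 8

/* Toy stand-in for a registered filesystem type: just a name. */
struct toy_fs_type {
    const char *name;
};

static struct toy_fs_type toy_registry[TOY_MAX_FS];
static size_t toy_count;

/* Look a filesystem type up by name; NULL if it was never registered. */
const struct toy_fs_type *toy_get_fs_type(const char *name)
{
    size_t i;

    for (i = 0; i < toy_count; i++)
        if (strcmp(toy_registry[i].name, name) == 0)
            return &toy_registry[i];
    return NULL;
}

/* Register a type by name. Returns 0 on success, -1 if the name is
 * already taken or the toy table is full. */
int toy_register_filesystem(const char *name)
{
    if (toy_get_fs_type(name) != NULL || toy_count == TOY_MAX_FS)
        return -1;
    toy_registry[toy_count++].name = name;
    return 0;
}

/* "Mount" a type as the root of the namespace. This can only succeed
 * once the type is known, which is why registration has to come first.
 * On success, *root points at the type now mounted at "/". */
int toy_mount_root(const char *fstype, const struct toy_fs_type **root)
{
    const struct toy_fs_type *type = toy_get_fs_type(fstype);

    if (type == NULL)
        return -1;
    *root = type;
    return 0;
}
}}}

In this model, calling {{{toy_mount_root("rootfs", &root)}}} before {{{toy_register_filesystem("rootfs")}}} fails, and succeeds afterwards, which mirrors {{{init_rootfs}}} having to run before {{{init_mount_tree}}}.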

{{{mnt_init}}} is called from {{{vfs_caches_init}}} in {{{fs/dcache.c}}}, which in turn is called from {{{start_kernel}}} in {{{init/main.c}}}.

The actual interpretation of the {{{root=}}}''path'' parameter is done in a routine called {{{name_to_dev_t}}}, found in {{{init/do_mounts.c}}}. This tries each of the supported syntaxes in turn. One of them is the form "{{{/dev/}}}''name''", where ''name'' is interpreted by doing a temporary mount of the sysfs filesystem (at its usual place, {{{/sys}}}), and then looking for an entry under {{{/sys/block/}}}''name'' (done in the subsidiary routine {{{try_name}}} in the same source file). {{{name_to_dev_t}}} is called from {{{prepare_namespace}}}, which in turn is called from {{{init}}} in {{{init/main.c}}}. The {{{init}}} routine is spawned as the first process on the system (pid 1) by a call to {{{kernel_thread}}} in {{{rest_init}}}, which comes at the end of the abovementioned {{{start_kernel}}}.
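
The "{{{/dev/}}}''name''" lookup described above can be approximated in userland. What follows is a minimal sketch, not the kernel's code. It assumes that the sysfs entry for a disk holds a {{{dev}}} file containing the text "major:minor", and that a partition's minor number is the disk's minor number plus the partition number. All of the {{{toy_}}} function names are hypothetical, and the sketch ignores the other syntaxes that {{{name_to_dev_t}}} accepts.

{{{
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split a name such as "hda1" into the disk "hda" and partition 1.
 * A name with no trailing digits yields partition 0 (the whole disk).
 * Returns 0 on success, -1 if there is no disk part or no room for it. */
int toy_split_partition(const char *name, char *disk, size_t len,
                        unsigned *part)
{
    size_t n = strlen(name);
    size_t i = n;

    while (i > 0 && isdigit((unsigned char)name[i - 1]))
        i--;
    if (i == 0 || i >= len)
        return -1;
    memcpy(disk, name, i);
    disk[i] = '\0';
    *part = (i < n) ? (unsigned)strtoul(name + i, NULL, 10) : 0;
    return 0;
}

/* Build the sysfs path that would be read for a given disk name. */
int toy_sysfs_dev_path(const char *disk, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/block/%s/dev", disk);

    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Parse text of the form "major:minor", optionally followed by a
 * newline. Returns 0 on success, -1 on malformed input. */
int toy_parse_major_minor(const char *text, unsigned *major,
                          unsigned *minor)
{
    char *end;
    unsigned long maj, min;

    if (!isdigit((unsigned char)text[0]))
        return -1;
    maj = strtoul(text, &end, 10);
    if (*end != ':' || !isdigit((unsigned char)end[1]))
        return -1;
    min = strtoul(end + 1, &end, 10);
    if (*end != '\n' && *end != '\0')
        return -1;
    *major = (unsigned)maj;
    *minor = (unsigned)min;
    return 0;
}

/* Combine the steps: given a name and the text read from the disk's
 * "dev" file, work out the device numbers of the disk or partition. */
int toy_resolve(const char *name, const char *dev_text,
                unsigned *major, unsigned *minor)
{
    char disk[32];
    unsigned part;

    if (toy_split_partition(name, disk, sizeof disk, &part) != 0)
        return -1;
    if (toy_parse_major_minor(dev_text, major, minor) != 0)
        return -1;
    *minor += part;   /* assumption: partition minors follow the disk's */
    return 0;
}
}}}

The text of the {{{dev}}} file is passed in as a string so that the sketch can be tried without a mounted {{{/sys}}}. Under the assumptions above, resolving {{{hda1}}} against the text "3:0" gives major 3 and minor 1.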

{{{start_kernel}}} is the very last routine called in the boot sequence after the kernel gets control from the bootloader (in {{{arch/i386/kernel/head.S}}} for the i386 architecture). It never returns, because the very last thing it does after all the initialization, by way of {{{rest_init}}}, is call {{{cpu_idle}}}, which runs an endless loop for soaking up CPU time as long as the CPU doesn't have anything else to do (like run a process or service an interrupt).