Armed with all of the above information, we're now ready to understand how the Linux kernel's initcall mechanism works. In fact, if you've understood most of what has been said up to this point, you already understand how it works; you might want to stop reading now and explore it on your own!
When you write a Linux kernel device driver there is a simple template that you follow. Following this template, together with some entries in the build system, lets a user compile your driver either into the kernel or as a loadable module. All drivers, when loaded, have an opportunity to run a one-time initialization function. After this function is called it will never be called again for as long as your driver stays loaded. If your driver is built as a module, this function is called when the module is loaded. If your driver is compiled into the kernel, it is called as the system boots.
A kernel that keeps a fair amount of memory tied up in functions that are called once during boot and never again is wasting that memory. Therefore the kernel developers have arranged for all this code to be put into its own ELF section (loosely called a segment in the rest of this text), which is then tossed away once the machine is up and running (and has passed the initialization phase).
Dumping a whole bunch of code into a separate segment at compile time is a nice idea, but how do you then call all those functions at run time? The functions aren't all the same length, and it wouldn't be very productive to force them all to be! So it isn't possible to step through the code segment, calling functions as you go along. Although the function definitions themselves aren't the same length, pointers to functions luckily are (on any given system), so we can build a table of pointers to all the initialization functions and step through this table, calling each one in turn. Since this table is also only needed at initialization time, it makes sense to put the table of function pointers into its own segment too, so that it can also be reclaimed after the initialization phase is complete.
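To make the table idea concrete before any linker tricks come into play, here is a minimal sketch in plain, portable C. The names init_a, init_b, init_table and run_init_table are made up for this illustration and do not come from the kernel. The two functions have bodies of different lengths, yet every entry in the table is one pointer wide, so a simple loop can walk it.

```c
#include <stddef.h>
#include <stdio.h>

typedef int (*initcall_t)(void);

/* Two init functions whose machine code will differ in length. */
static int init_a(void)
{
    puts("init_a");
    return 0;
}

static int init_b(void)
{
    puts("init_b: step 1");
    puts("init_b: step 2");
    return 0;
}

/* Every entry has the same size, whatever the function it points to. */
static initcall_t init_table[] = { init_a, init_b };

/* Step through the table, calling each entry; return the number of calls. */
int run_init_table(void)
{
    size_t n = sizeof init_table / sizeof init_table[0];
    int calls = 0;

    for (size_t i = 0; i < n; ++i) {
        init_table[i]();
        ++calls;
    }
    return calls;
}
```

Calling run_init_table() from a main() runs init_a and then init_b. The difference from the kernel's approach is that this table is written out by hand in one place, whereas the kernel lets each driver contribute its own entry from its own source file.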
Notice that the above trick of putting the initialization code into one segment and the initialization function pointer call table into another segment (both of which can be released once the machine is up and running) is only used when a device driver is compiled into the kernel. If the device driver is compiled as a module then the initialization code is handled differently.
The decision as to whether to compile something into the kernel or as a module is made not at code-writing time by the device driver writer, but at kernel configuration and build time, sometimes by someone other than the device driver writer. It is important to try to use the same code for both situations, and it makes a lot of sense to make these things very easy to handle and code for the person writing the device driver. So how are these two situations handled? By writing a bunch of macros and getting the programmers to follow a template.
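As a rough sketch of how one macro can serve both situations: the same driver source can be compiled two ways depending on a preprocessor symbol. BUILD_AS_MODULE and run_driver_init are hypothetical names invented for this illustration, and the two macro bodies are simplifications rather than copies of the kernel's headers. The module branch relies on GCC's alias attribute.

```c
#include <stdio.h>

typedef int (*initcall_t)(void);

#ifdef BUILD_AS_MODULE  /* hypothetical switch, set by the build system */
/* Module build: expose the init function under one fixed name that a
 * loader could look up and call when the module is inserted. */
#define module_init(fn) \
    int init_module(void) __attribute__((alias(#fn)))
/* Illustration-only helper so both builds can be exercised the same way. */
#define run_driver_init() init_module()
#else
/* Built-in build: store a pointer to the function in a dedicated
 * section so that start-up code can find and call it later. */
#define module_init(fn) \
    static initcall_t __initcall_##fn \
    __attribute__((used, section("function_ptrs"))) = fn
#define run_driver_init() __initcall_my_driver_init()
#endif

/* The driver author writes this part once; it is identical in both builds. */
static int my_driver_init(void)
{
    puts("my_driver_init called");
    return 0;
}
module_init(my_driver_init);
```

Compiling with -DBUILD_AS_MODULE selects the first branch; compiling without it selects the second. Either way the driver author only ever writes module_init(my_driver_init).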
I have distilled the Linux device driver writing template for a very simple driver into the following code. I have found and expanded the macros for the situation where we want to create a driver that is built into the Linux kernel. Note that if you are just learning to write your own device drivers, this is not what your code would look like at all, since device drivers do not contain a main()! I wrote this code so that it uses the same ideas and roughly the same code as the kernel, but can be played with by a regular user as an ordinary program rather than as a device driver.
    /*
     * Copyright (C) 2006 Trevor Woerner
     */

    #include <stdio.h>

    typedef int (*initcall_t)(void);

    /* Defined by the linker script, not by any C source file. */
    extern initcall_t __initcall_start, __initcall_end;

    #define __initcall(fn) \
        static initcall_t __initcall_##fn __init_call = fn
    #define __init_call __attribute__ ((unused,__section__ ("function_ptrs")))
    #define module_init(x) __initcall(x);

    #define __init __attribute__ ((__section__ ("code_segment")))

    static int __init
    my_init1 (void)
    {
        printf ("my_init () #1\n");
        return 0;
    }

    static int __init
    my_init2 (void)
    {
        printf ("my_init () #2\n");
        return 0;
    }

    module_init (my_init1);
    module_init (my_init2);

    /* Walk the table of function pointers, calling each entry in turn. */
    void
    do_initcalls (void)
    {
        initcall_t *call_p;

        call_p = &__initcall_start;
        do {
            /* %p expects a void pointer, hence the cast. */
            fprintf (stderr, "call_p: %p\n", (void *) call_p);
            (*call_p)();
            ++call_p;
        } while (call_p < &__initcall_end);
    }

    int
    main (void)
    {
        fprintf (stderr, "in main()\n");
        do_initcalls ();
        return 0;
    }
Let's examine these #defines closely:
1. module_init(x) (which expands to __initcall(fn))

    #define __initcall(fn) \
        static initcall_t __initcall_##fn __init_call = fn
    #define __init_call __attribute__ ((unused,__section__ ("function_ptrs")))
    #define module_init(x) __initcall(x);

is a macro that:
- takes a function name
- defines a variable whose name is the concatenation of the string "__initcall_" and the function's name
- gives that variable the type initcall_t (i.e. a function pointer)
- applies the attributes from the expansion of the __init_call macro, which basically say to put this object (a function pointer) into its own segment called function_ptrs
- assigns the variable the value of the function's address
This macro could be shortened to:
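    #define module_init(fn) \
        static initcall_t __initcall_##fn __attribute__ ((section ("function_ptrs"))) = fn

with no loss of generality (that I am aware of). The only thing dropped is the unused attribute, which merely silences the compiler's warning about a variable that is never referenced.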
2. __init

is a macro that:
- tells the compiler to put every function marked with it into its own segment called code_segment
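To see what the two macros described above boil down to, here is my_init1 from the listing with both macros expanded by hand. This is a sketch of what the preprocessor should produce. The helper call_via_pointer is a made-up name added only so that the pointer can be exercised.

```c
#include <stdio.h>

typedef int (*initcall_t)(void);

/* "static int __init my_init1 (void)" after __init is expanded:
 * the function body is placed in the section named code_segment. */
static int __attribute__ ((__section__ ("code_segment")))
my_init1 (void)
{
    printf ("my_init () #1\n");
    return 0;
}

/* "module_init (my_init1);" after expansion: a function pointer named
 * __initcall_my_init1, placed in the section named function_ptrs and
 * initialized with the address of my_init1. */
static initcall_t __initcall_my_init1
    __attribute__ ((unused, __section__ ("function_ptrs"))) = my_init1;

/* Hypothetical helper: call the init function through its table entry. */
int
call_via_pointer (void)
{
    return __initcall_my_init1 ();
}
```

Running the original listing through gcc -E is a way to compare this against what the preprocessor really generates.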
Compiling this code we get... an error:
    [trevor@trevor code]$ make initcalls
    cc initcalls.c -o initcalls
    /tmp/ccG4XFSM.o(.text+0x9): In function `do_initcalls':
    initcalls.c: undefined reference to `__initcall_start'
    /tmp/ccG4XFSM.o(.text+0x36):initcalls.c: undefined reference to `__initcall_end'
    collect2: ld returned 1 exit status
    make: *** [initcalls] Error 1
Oh yeah, that's right: those are the symbols that don't appear anywhere in the code, just in the linker script. That's what got me started on all this in the first place! A linker script is used to make this all work. To be honest I'm not sure why they don't take advantage of the fact that the GNU linker will give you start and end symbols for free (__start_ and __stop_ followed by the section name, provided that name is a valid C identifier), but there's probably a good reason (or maybe not).
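For comparison, here is a sketch of that "for free" route. It assumes GCC and GNU ld on an ELF system, and it is not how the kernel does it. Because function_ptrs is a valid C identifier, the linker itself defines __start_function_ptrs and __stop_function_ptrs, so no custom linker script is needed. The used attribute is my addition, to keep the compiler from discarding the otherwise unreferenced pointers when optimizing. Unlike the original, do_initcalls here returns a count.

```c
#include <stdio.h>

typedef int (*initcall_t)(void);

/* Provided automatically by GNU ld for the section "function_ptrs". */
extern initcall_t __start_function_ptrs[];
extern initcall_t __stop_function_ptrs[];

#define module_init(fn) \
    static initcall_t __initcall_##fn \
    __attribute__ ((used, section ("function_ptrs"))) = fn

static int my_init1 (void) { printf ("my_init () #1\n"); return 0; }
static int my_init2 (void) { printf ("my_init () #2\n"); return 0; }

module_init (my_init1);
module_init (my_init2);

/* Walk the table between the two linker-provided symbols and return
 * the number of initcalls that were run. */
int
do_initcalls (void)
{
    int count = 0;

    for (initcall_t *call_p = __start_function_ptrs;
         call_p < __stop_function_ptrs; ++call_p) {
        (*call_p) ();
        ++count;
    }
    return count;
}
```

A main() that calls do_initcalls() can then be built with a plain gcc command line, without the -T option.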
Trying to create a valid linker script by hand from scratch would be a nice exercise, but not something I have the time to investigate. So instead I'll get the linker to tell me what its default linker script is (ld --verbose prints it) and modify that to generate my required linker script. Following the lead of the kernel's linker scripts I have added the following lines to the linker script:
    __initcall_start = .;
    function_ptrs : { *(function_ptrs) }
    __initcall_end = .;
    code_segment : { *(code_segment) }
Which results in:
    [trevor@trevor code]$ gcc -Tlinker.lds -o initcalls initcalls.c
    [trevor@trevor code]$ ./initcalls
    in main()
    call_p: 0x804850c
    my_init () #1
    call_p: 0x8048510
    my_init () #2
It works!
The relevant objdump -t looks like:
    0804850c g       *ABS*          00000000 __initcall_start
    0804850c l     O function_ptrs  00000004 __initcall_my_init1
    0804850c l    d  function_ptrs  00000000 function_ptrs
    08048510 l     O function_ptrs  00000004 __initcall_my_init2
    08048514 g       *ABS*          00000000 __initcall_end
    08048514 l     F code_segment   0000001d my_init1
    08048514 l    d  code_segment   00000000 code_segment
    08048531 l     F code_segment   0000001d my_init2
Notice what happens if we rearrange the following lines in the source:
    module_init (my_init2);
    module_init (my_init1);
The output becomes:
    [trevor@trevor code]$ gcc -Tlinker.lds -o initcalls initcalls.c
    [trevor@trevor code]$ ./initcalls
    in main()
    call_p: 0x804850c
    my_init () #2
    call_p: 0x8048510
    my_init () #1
and

    0804850c g       *ABS*          00000000 __initcall_start
    0804850c l     O function_ptrs  00000004 __initcall_my_init2
    0804850c l    d  function_ptrs  00000000 function_ptrs
    08048510 l     O function_ptrs  00000004 __initcall_my_init1
    08048514 g       *ABS*          00000000 __initcall_end
    08048514 l     F code_segment   0000001d my_init1
    08048514 l    d  code_segment   00000000 code_segment
    08048531 l     F code_segment   0000001d my_init2