How do I intercept system calls?
Intercepting system calls is one of the most common "first exercises" when learning about the Linux kernel. When explaining how this can be done, it is important to differentiate between the "old days" (2.4 kernel series) and recent kernel versions (2.6 series).
The old days (2.4 kernel series)
Intercepting a system call means that you want a function of your own to be called, instead of the kernel function implementing a given system call, every time the latter is invoked. Generally, this is achieved by writing a Loadable Kernel Module (LKM) which contains that function (e.g. myfork) along with code to hijack the targeted system call when the LKM is loaded into the kernel and to restore the original setting when it is unloaded.
The key to intercepting system calls is to modify the sys_call_table kernel data structure. This table is an array which contains as many entries as there are system calls. Each system call is identified inside the kernel by a constant (think #define) representing an integer number. These numbers are used as indexes into sys_call_table to retrieve the address of the kernel function which implements that particular system call.
When a user space program needs to invoke a given system call, the interrupt 0x80 is raised after loading the EAX register with the system call number. The interrupt handler, in kernel space, then uses the contents of EAX as an index into the sys_call_table array and calls the function whose address is stored there. Please note that modern processors offer more efficient ways to implement system calls than raising the 0x80 interrupt, but we will focus on this simple approach for the purpose of this explanation since it relies on the same kernel data structures as the more advanced ones.
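From user space, the role of the system call number can be observed with the C library's syscall() function, which takes that number as its first argument. The following is only a minimal sketch, assuming Linux with glibc; the helper name raw_getpid is hypothetical, and the library is free to enter the kernel through a faster mechanism than interrupt 0x80.

```c
#define _GNU_SOURCE        /* makes glibc declare syscall() */
#include <sys/syscall.h>   /* SYS_getpid: the system call number constant */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: request the process ID by system call number
 * instead of going through the getpid() library wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(), since both end up in the kernel function selected by the same system call number.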
If we change the address contained in sys_call_table at the index corresponding to the fork system call number, we can replace it with the address of a function written inside our LKM. The next time a fork system call is processed, our function will be invoked instead of the default one. If we are careful to save the address of the original system call function, we can even invoke it from within our own "interceptor code". This would allow us, for instance, to display a message every time a fork system call is invoked, call the original code so that the kernel properly handles the system call, and finally return as if nothing happened.
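Before turning to real kernel code, the save, replace and restore idea can be tried safely in user space. The sketch below is not kernel code: the table, the DEMO_NR_* numbers and every function name are hypothetical stand-ins for sys_call_table, the __NR constants and the kernel's handlers.

```c
/* Hypothetical stand-ins for system call numbers (think __NR_fork). */
enum { DEMO_NR_FORK = 0, DEMO_NR_EXIT = 1, DEMO_NR_CALLS = 2 };

typedef int (*demo_call_t)(int arg);

/* Default "kernel" implementations of the two demo calls. */
static int demo_fork(int arg) { return arg + 1; }
static int demo_exit(int arg) { return -arg; }

/* Stand-in for sys_call_table: one function address per call number. */
static demo_call_t demo_call_table[DEMO_NR_CALLS] = { demo_fork, demo_exit };

static demo_call_t original_fork;  /* saved address of the real handler */
static int intercepted_count;      /* number of calls seen by the hook */

/* Interceptor: do some extra work, then defer to the original handler. */
static int hooked_fork(int arg)
{
    intercepted_count++;
    return original_fork(arg);
}

/* What the dispatcher does: index the table by call number, then call. */
int demo_dispatch(int nr, int arg)
{
    return demo_call_table[nr](arg);
}

/* "Module load": save the original entry, then overwrite it. */
void hook_install(void)
{
    original_fork = demo_call_table[DEMO_NR_FORK];
    demo_call_table[DEMO_NR_FORK] = hooked_fork;
}

/* "Module unload": put the original entry back (call after hook_install). */
void hook_remove(void)
{
    demo_call_table[DEMO_NR_FORK] = original_fork;
}

/* Accessor so callers can check how many calls were intercepted. */
int demo_intercepted(void)
{
    return intercepted_count;
}
```

After hook_install(), demo_dispatch(DEMO_NR_FORK, ...) still returns what the original handler returns, but the counter increases on each call; other entries in the table are unaffected, and hook_remove() puts the table back in its initial state.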
Let's work on some code to illustrate all this. Our LKM will have to:
- Modify the sys_call_table when it is loaded so that a given system call is now implemented by a function of our own
- Provide the above-mentioned function in the module itself so that it is loaded as part of the kernel and is therefore addressable by it
- Modify the sys_call_table again when unloaded to restore the original system call
If you are browsing the kernel source code while reading this (cf. /IDEs), the following files might be of interest to you when dealing with system call interception:
- /include/linux/syscalls.h: Signatures of system calls
- /include/asm-um/unistd.h: Header for definition of system calls
- /include/asm-um/arch/unistd.h: Definition of the __NR constants corresponding to system calls' internal numbers
A system call interception LKM for 2.4 kernel series
This section shows the code for a 2.4 kernel series LKM able to intercept the fork system call. Be warned: this is a horrible hack based on modifying entries in the system call table, and it works only on 2.4 kernels. It is strongly discouraged: it is not safe against module unloading, it is not architecture independent, and it is just ugly anyway. You can read more about it in chapter 8 of the Linux Kernel Module Programmer's Guide (Salzman, Burian, Pomerantz, 2005/05/26 version 2.6.1, http://www.tldp.org/LDP/lkmpg/2.6/lkmpg.pdf).
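// base module inclusions
#include <linux/init.h>
#include <linux/module.h>
// for printk()
#include <linux/kernel.h>
// for current and struct task_struct
#include <linux/sched.h>
// for the __NR_* system call numbers
#include <linux/unistd.h>
#include <asm/arch/unistd.h>
MODULE_LICENSE("GPL");
extern void* sys_call_table[];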
The sys_call_table symbol is exported by the kernel but we still need to have our code indicate that we are referencing an array of void pointers defined somewhere else.
int (*original_fork)(struct pt_regs);
// we define original_fork as a pointer to a function with the same prototype
// as the fork system call. It is meant to store the address of the original
// fork function in the kernel while we replace it with our own
int edu_fork(struct pt_regs regs)
{
    pid_t pid;
    // logging the syscall
    printk(KERN_ALERT "[edu] fork syscall intercepted from %d\n", current->pid);
Why not use getpid()? Get used to it: we are in kernel code now, and the APIs that you used in user space are no longer available. current is a global kernel pointer referring to the currently executing process' struct task_struct (its process control block). Here we use the pid field of this structure which, as you can guess, contains the process' pid.
    // making the call to the original fork syscall
    pid = (*original_fork)(regs);
    return pid;
}
The above lines are there to maintain the functionality of the system. Remove them and you will have broken your system: to start with, the shell would have no way to fork a child to execute a command! Our function acts as a wrapper around fork; we make the call to the original system call BUT we can add some code of our own before and/or after it (logging information, for instance).
Keep in mind that, while this is practical, it also slows down one of the most critical execution paths of the kernel: system calls.
static int edu_init(void)
{
    printk(KERN_ALERT "[edu] Module successfully loaded\n");
    printk(KERN_ALERT "[edu] Intercepting fork() syscall... ");
    original_fork = sys_call_table[__NR_fork];
    sys_call_table[__NR_fork] = edu_fork;
This is where the theft takes place. Note the use of the constant __NR_fork instead of hardcoding the syscall number.
    printk(KERN_ALERT "done\n");
    printk(KERN_ALERT "[edu] Starting logging system calls\n");
    return 0;
}
static void edu_exit(void)
{
    sys_call_table[__NR_fork] = original_fork;
    printk(KERN_ALERT "[edu] Stopping logging system calls\n");
}
Not much to add here: once you understand the initialization of the module, you pretty much understand its shutdown function too.
module_init(edu_init);
module_exit(edu_exit);
These macros are there to "register" the functions edu_init and edu_exit as the functions to be called, respectively, when the LKM is loaded into and unloaded from the kernel.
Limits of this approach: sys_execve
This fails horribly for the execve system call, and there's a very good reason for this. Let's look at the prototype of sys_execve():
asmlinkage int sys_execve(struct pt_regs regs)
Note the argument: it is not a pointer! Your attempt to intercept sys_execve in the same way is not going to work. This argument indicates that the process's registers have been saved on the stack. Code inside sys_execve actually modifies these stack locations to set the saved PC register to the start of the new executable, so you must let the code access the original location on the stack! For example code that modifies the registers, see start_thread(), called from load_elf_binary().
The simplest way to get around this problem is to call do_execve() instead of the saved old sys_execve pointer value, duplicating the kernel's sys_execve() code. Ugly, huh? Please don't ever do this in real code. If you want to provide some code in a module that kernel code needs to call, provide a hook in the kernel code as a patch, then a module on top of that (an example of this is sys_nfsservctl()).
What about now? (2.6 kernel series)
Up to and including the 2.4 kernel series, the sys_call_table data structure was an exported kernel symbol. This meant that any LKM could reference this variable from within its code and, during the loading of the module, it would be appropriately linked. Starting with the 2.6 kernel series, Linus Torvalds decided to no longer export this symbol. This decision is easily justified: making it easy for LKMs to intercept system calls isn't beneficial to overall kernel security. Indeed, many rootkits have exploited this technique to let an intruder maintain a presence (and a backdoor) on your system after leaving, while making sure these new "features" take great care to hide themselves from any curious eye. By making the sys_call_table symbol less accessible, the simplest system call interception methods are made unusable. Of course, new ones have arisen since then, but their level of complexity prevents most beginners from writing their own rootkit in less than 10 minutes.
If you are using a 2.6 kernel, the above LKM won't work. However, alternative possibilities are available.
The first approach involves modifying the sources of the kernel we want to use, recompiling it, and then writing an LKM destined to be loaded in this particular kernel only. While this looks like a way to implement a poorly portable LKM, it lets us reverse Linus' decision by hand in our own kernel and therefore conduct the manipulation without further burden. This is not a portable way to play with kernel coding; in fact, it's a bad idea overall unless you do it only once, just to try out the above LKM.
The second approach is hot patching of the running kernel. Historically, it was introduced to make it possible to patch the Linux kernel without having to restart it, even at a time when LKMs had access to symbols such as sys_call_table. The objective was to circumvent security measures such as forbidding the loading of LKMs on a given system and/or using a hardened kernel version, such as a 2.6 series kernel, which doesn't let LKMs intercept system calls easily.
You can find more information about these hot patching techniques at:
- In a nutshell: http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html
- From Phrack, sd & devik, issue #58: http://www.phrack.org/show.php?p=58&a=7
There are several recent projects that are meant to hijack system calls (among other things) in order to monitor the kernel's activity:
- SysCallTrack project: http://syscalltrack.sourceforge.net/index.html
- Linux Trace Toolkit: http://www.opersys.com/LTT/index.html