

Why is writing files from the kernel bad?


The question "how do I open/read/write files from the kernel?" is often asked on the kernelnewbies mailing list. However, the question cannot really be answered: opening, reading and writing files from within the kernel is usually a bad idea. More generally, trying to use any of the sys_*() functions from the kernel itself is a bad idea.

There are several reasons:

  • Selecting where and in what format to read/write data is a policy decision, and policy does not belong in the kernel. A userland daemon is much easier to replace with one that receives or sends the data over a network, or generates or converts it from/to a different format, etc.

  • Filesystem operations need a user context (i.e. current != NULL). You can't be sure you're in user context, so you can't write something from, for example, an interrupt handler.

  • The kernel allows multiple filesystem namespaces for user processes. Which one should it use? How do you make sure it indeed uses the one you want?

  • The kernel should not depend on a particular layout of a filesystem, nor on the availability of a writable filesystem. The location of the file is a policy decision, and policy decisions should be made in userspace. Maybe you want to dump the kernel output to a remote MySQL server tomorrow; that kind of policy is so much easier in userland.

  • Kernel code should be kept simple and stupid, because any bug in it is likely to have serious consequences. Working with files requires being aware of various locking issues and would add unnecessary complexity.
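
To illustrate the first point in the list above, here is a minimal sketch of the userland side: a helper that copies whatever the kernel exports on one file descriptor to a destination chosen entirely in userspace. The names forward_data() and write_all() are made up for this example, and the source descriptor is assumed to be something readable that your driver provides (for instance a character device or a debugfs file).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write()). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything readable from in_fd to out_fd until end of file.
 * in_fd:  whatever the kernel side exports (hypothetical here).
 * out_fd: destination picked by userspace policy: a file, a socket, a pipe.
 * Returns the total number of bytes copied, or -1 on error. */
long forward_data(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return total;
}
```

Switching the destination from a local file to a network socket only changes how out_fd is opened; nothing on the kernel side has to know about it.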

The good ways to exchange data with user space

There are several ways to exchange information between userspace and the kernel, and which one to use depends on what you want to do:

  • kernel module parameters are useful to set general configuration options for your modules

  • device firmware should be loaded through the request_firmware() API

  • sysfs is useful to get/set attributes of devices

  • debugfs

  • relayfs

  • netlink sockets
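
Several of the mechanisms above (module parameters, sysfs attributes, debugfs files) appear to userspace as small text files, so the userland side can stay very simple. The following is a minimal sketch, not a definitive implementation; read_attr() and write_attr() are names invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Read a one-line text attribute into buf (of size len) and strip the
 * trailing newline. Returns 0 on success, -1 on failure. */
int read_attr(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Write a text value to an attribute. Returns 0 on success, -1 on failure. */
int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fputs(value, f) >= 0);
    /* fclose() flushes the buffer, so a failed write is reported here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For instance, a module parameter declared with non-zero permissions shows up under /sys/module/<module>/parameters/<name>, so a call such as read_attr("/sys/module/mymod/parameters/debug", buf, sizeof(buf)) would fetch it; mymod and debug are hypothetical names.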

Using /proc is no longer a good idea these days, unless you want to export information related to processes.

The good way to create device nodes

The good way is to have your device exported in sysfs and to let udev create the device node in /dev. Do not try to call sys_mknod() from the kernel.
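
To make the division of labour more concrete: when a device is registered, the kernel emits a uevent, and udev reads it from a netlink socket and creates the node. A uevent message is a sequence of NUL-terminated strings, the first being "action@devpath" and the rest "KEY=value" pairs such as MAJOR, MINOR and DEVNAME. The helper below is a minimal sketch of how a userspace listener might look up one key in such a message; uevent_get() is a name invented for this example and is not part of udev.

```c
#include <stddef.h>
#include <string.h>

/* Look up key in a uevent message: a buffer of len bytes holding
 * NUL-terminated strings. Returns a pointer to the value that follows
 * "key=", or NULL if the key is not present. */
const char *uevent_get(const char *msg, size_t len, const char *key)
{
    size_t klen = strlen(key);
    const char *p = msg;
    const char *end = msg + len;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        size_t slen;

        if (nul == NULL)
            break;              /* unterminated trailing string: stop */
        slen = (size_t)(nul - p);
        if (slen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=')
            return p + klen + 1;
        p = nul + 1;            /* advance to the next string */
    }
    return NULL;
}
```

With the MAJOR, MINOR and DEVNAME values in hand, the decision of where and how to create the node stays in userspace, which is exactly the policy split argued for above.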


All that being said, there may be cases where the ways to exchange data with userspace described above are not viable. For example, quota reads the limits from special files, and the Coda and InterMezzo filesystems serve content from files in a cache. In such cases it is important to note that you don't have a file descriptor table (syscalls have the one of the calling process, and kernel threads usually drop it as part of reparenting to init), so you cannot use the sys_*() functions. Instead you have to use the functions that operate on struct file directly. Be careful to check the locking requirements of these functions, especially of the methods in the file_operations table.


last edited 2006-11-08 15:35:49 by ErikMouw