KernelNewbies:

The big kernel lock (BKL) is an old serialization method that we are trying to get rid of, replacing it with more fine-grained locking, in particular mutexes, spinlocks and RCU, where appropriate.

The BKL is a recursive lock, meaning that a thread that already holds it can take it again. This may sound convenient, but it easily introduces all sorts of bugs. Another problem is that the BKL is automatically released when a thread sleeps and reacquired when it wakes up. This avoids lock-ordering problems with mutexes in some circumstances, but it also creates other problems, because it makes it really hard to track which code actually runs under the lock.
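To make these two properties concrete, here is a small userspace model; it is not the kernel's code. It assumes one pthread mutex for the lock and a per-thread depth counter (-1 meaning "not held"), loosely following the per-task lock_depth counter the kernel used for the BKL. All model_* names are hypothetical. Nested calls only change the counter, and the sleep helper shows the lock silently disappearing for the duration of a blocking operation.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of the BKL: one global mutex plus a
 * per-thread depth counter, where -1 means "not held". */
static pthread_mutex_t bkl_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int bkl_depth = -1;

void model_lock_kernel(void)
{
	/* Only the outermost call really takes the mutex; nested calls
	 * just bump the counter. This is what makes the lock recursive. */
	if (++bkl_depth == 0)
		pthread_mutex_lock(&bkl_mutex);
}

void model_unlock_kernel(void)
{
	assert(bkl_depth >= 0);	/* an unbalanced unlock is a bug */
	if (--bkl_depth < 0)
		pthread_mutex_unlock(&bkl_mutex);
}

/* Models what happened when a BKL holder slept: the mutex is dropped
 * completely, whatever the depth, the blocking operation runs, and the
 * mutex is taken back. Other threads may run "under the lock" meanwhile. */
void model_sleep_holding_bkl(void (*blocking_op)(void))
{
	int held = bkl_depth >= 0;

	if (held)
		pthread_mutex_unlock(&bkl_mutex);
	blocking_op();
	if (held)
		pthread_mutex_lock(&bkl_mutex);
}

int model_bkl_depth(void)
{
	return bkl_depth;
}
```

Anything the caller believed was protected before calling model_sleep_holding_bkl() may have been changed by another thread by the time it returns, which is why code running under the BKL is so hard to audit.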

A number of areas need to be taken care of independently:

llseek

The problem lies in the llseek callback of struct file_operations, which drivers and filesystems implement to move the file pointer.

The problem arises when a driver or filesystem does not implement it. In that case, the VFS layer falls back to a default implementation, default_llseek(), which just changes the file pointer and does nothing else. However, to protect against concurrent llseek calls on the same file, default_llseek() makes this change under the BKL.

This is rather evil: not only are there many existing drivers that don't implement llseek, but every newly merged driver or filesystem that doesn't implement it also falls back to default_llseek().

This means two things. First, the bleeding can't stop: the fallback keeps gaining more users. Second, we can't remove the BKL (or at least make it modular) until we get rid of default_llseek() (or at least make it modular too :-) ).

So the strategy is to give these drivers a sane llseek implementation. Once no llseek stub is left, we can start punching default_llseek() (i.e. making it modular: building it only if some driver depends on the BKL, or perhaps using an even more granular dependency).
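As a rough illustration of the difference, the sketch below models both cases in userspace. The struct, the global model_bkl mutex and every model_* name are hypothetical stand-ins (the real functions operate on struct file and loff_t), and the offset checks are simplified. The fallback serializes every seek in the system on one global lock, while a driver-provided implementation can protect the file position with a lock of its own.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for struct file: position, size, per-file lock. */
struct model_file {
	long long f_pos;
	long long size;
	pthread_mutex_t pos_lock;
};

/* One global lock shared by every file, standing in for the BKL. */
static pthread_mutex_t model_bkl = PTHREAD_MUTEX_INITIALIZER;

/* Compute and store the new position; the caller holds some lock. */
static long long seek_locked(struct model_file *f, long long offset, int whence)
{
	switch (whence) {
	case SEEK_SET:
		break;
	case SEEK_CUR:
		offset += f->f_pos;
		break;
	case SEEK_END:
		offset += f->size;
		break;
	default:
		return -EINVAL;
	}
	if (offset < 0)
		return -EINVAL;	/* position is left unchanged */
	f->f_pos = offset;
	return offset;
}

/* Shape of the default_llseek() fallback: correct, but every seek on
 * every file contends for the same global lock. */
long long model_default_llseek(struct model_file *f, long long offset, int whence)
{
	long long ret;

	pthread_mutex_lock(&model_bkl);
	ret = seek_locked(f, offset, whence);
	pthread_mutex_unlock(&model_bkl);
	return ret;
}

/* A driver-provided llseek can use a lock of its own choosing, here one
 * per file, so seeks on unrelated files never contend. */
long long model_driver_llseek(struct model_file *f, long long offset, int whence)
{
	long long ret;

	pthread_mutex_lock(&f->pos_lock);
	ret = seek_locked(f, offset, whence);
	pthread_mutex_unlock(&f->pos_lock);
	return ret;
}
```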

The sane existing implementations are the following:

...and about this protection

ioctl

Like llseek, struct file_operations can contain an .ioctl callback, which is always called with the BKL held. In order to remove the BKL from the core VFS code, all file_operations should be converted to use the .unlocked_ioctl callback instead. This can be done in one of two ways:

A patch to do the second approach for all remaining drivers is on its way.
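One conversion that keeps behaviour unchanged is to push the BKL down into the driver, as described for procfs further down. The sketch below models that in userspace under stated assumptions: in the real struct file_operations of this era, .ioctl also received the struct inode pointer and returned int, while .unlocked_ioctl returns long. Here the inode is dropped, a pthread mutex stands in for the BKL, and the driver, its command number and all model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t model_bkl = PTHREAD_MUTEX_INITIALIZER;

struct model_file;	/* opaque in this sketch */

/* Hypothetical cut-down file_operations with the two ioctl hooks. */
struct model_fops {
	int  (*ioctl)(struct model_file *f, unsigned int cmd, unsigned long arg);
	long (*unlocked_ioctl)(struct model_file *f, unsigned int cmd, unsigned long arg);
};

/* Simplified dispatcher: the legacy hook is always wrapped in the
 * BKL, the unlocked one is called as-is. */
long model_vfs_ioctl(const struct model_fops *fops, struct model_file *f,
		     unsigned int cmd, unsigned long arg)
{
	long ret;

	if (fops->unlocked_ioctl)
		return fops->unlocked_ioctl(f, cmd, arg);
	if (!fops->ioctl)
		return -ENOTTY;
	pthread_mutex_lock(&model_bkl);
	ret = fops->ioctl(f, cmd, arg);
	pthread_mutex_unlock(&model_bkl);
	return ret;
}

/* Hypothetical driver state and command number. */
static unsigned long model_dev_value;
#define MODEL_CMD_SET 1u

/* Before: legacy .ioctl, silently relying on the caller's BKL. */
static int model_legacy_ioctl(struct model_file *f, unsigned int cmd, unsigned long arg)
{
	(void)f;
	if (cmd != MODEL_CMD_SET)
		return -ENOTTY;
	model_dev_value = arg;
	return 0;
}

/* After: .unlocked_ioctl with the BKL pushed down into the driver.
 * Behaviour is the same, but the lock is now local and visible. */
static long model_pushed_down_ioctl(struct model_file *f, unsigned int cmd, unsigned long arg)
{
	long ret;

	pthread_mutex_lock(&model_bkl);
	ret = model_legacy_ioctl(f, cmd, arg);
	pthread_mutex_unlock(&model_bkl);
	return ret;
}

const struct model_fops model_old_fops = { .ioctl = model_legacy_ioctl };
const struct model_fops model_new_fops = { .unlocked_ioctl = model_pushed_down_ioctl };
```

Once the lock lives in the driver, its maintainer can later replace it with a driver-local mutex, or drop it entirely, without touching the VFS.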

More info: [http://kerneltrap.org/mailarchive/linux-kernel/2010/4/16/4559539 http://kerneltrap.org/mailarchive/linux-kernel/2010/4/16/4559539].

TTY layer

Probably the most central part of the BKL removal is the TTY layer, which has traditionally relied heavily on the BKL. Alan Cox has been working for years on introducing sane locking to the TTY code that will eventually let us remove the BKL there, but getting there may still take some time.

In the meantime, a patch series from Arnd Bergmann takes a shortcut by separating the BKL usage in the TTY layer from the usage outside of it.

This series introduces a new Big TTY Mutex that is based on the earlier implementation of the Big Kernel Semaphore, but comes with a number of changes:

The first eight patches convert all the code using the BKL in the TTY layer and related drivers to the new interface, while the final patch adds the real mutex implementation as an experimental configuration option.

When that option is disabled, the behaviour should be basically unchanged regarding serialization against other subsystems using the BKL.
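Here is a minimal sketch of why the disabled option preserves the old behaviour. It assumes the new interface is a lock/unlock pair that converted TTY code calls instead of lock_kernel()/unlock_kernel(). The MODEL_TTY_MUTEX macro stands in for the experimental configuration option and, like the other model_* names, is hypothetical; the actual option and function names in the patch series may differ.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the BKL, shared with every other legacy subsystem. */
pthread_mutex_t model_bkl = PTHREAD_MUTEX_INITIALIZER;

#ifdef MODEL_TTY_MUTEX	/* hypothetical switch for the experimental option */
/* Option enabled: the TTY layer gets a lock of its own. */
static pthread_mutex_t model_big_tty_mutex = PTHREAD_MUTEX_INITIALIZER;
void model_tty_lock(void)   { pthread_mutex_lock(&model_big_tty_mutex); }
void model_tty_unlock(void) { pthread_mutex_unlock(&model_big_tty_mutex); }
#else
/* Option disabled: the new interface is just a new name for the BKL,
 * so TTY code still serializes against every other BKL user. */
void model_tty_lock(void)   { pthread_mutex_lock(&model_bkl); }
void model_tty_unlock(void) { pthread_mutex_unlock(&model_bkl); }
#endif

/* Example TTY-layer code converted to the new interface. */
static int model_open_count;

void model_tty_open(void)
{
	model_tty_lock();
	model_open_count++;
	model_tty_unlock();
}

/* Returns 1 if the BKL stand-in is currently free, 0 otherwise. */
int model_bkl_is_free(void)
{
	if (pthread_mutex_trylock(&model_bkl) != 0)
		return 0;
	pthread_mutex_unlock(&model_bkl);
	return 1;
}
```

With the macro undefined, holding the TTY lock keeps the BKL stand-in busy for everyone else, just as before; defining it separates the two.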

More info: [http://kerneltrap.org/mailarchive/linux-kernel/2010/3/30/4553357 http://kerneltrap.org/mailarchive/linux-kernel/2010/3/30/4553357].

Block layer

There is a patch to remove the BKL from the block layer.

In the block layer, the BKL is mainly used to serialize the blkdev_get and blkdev_put functions, some ioctl implementations, some less common open functions of related character devices (the result of an earlier pushdown), and parts of the blktrace code.

The only nasty one seems to be blkdev_get, which is actually recursive and may try to take the BKL twice.

All users except the one in blkdev_get seem to be outermost locks, meaning we don't rely on the release-on-sleep semantics to avoid deadlocks.

The ctl_mutex (pktcdvd.ko), raw_mutex (raw.ko), state_mutex (dasd.ko), reconfig_mutex (md.ko), and jfs_log_mutex (jfs.ko) may be held when blkdev_get is called, but as far as I can tell, these mutexes are never acquired from any of the functions that get converted in this patch.

In order to get rid of the BKL, this patch introduces a new global mutex called blkdev_mutex, which replaces the BKL in all drivers that directly interact with the block layer. In the case of blkdev_get, the mutex is moved outside of the function itself in order to avoid taking blkdev_mutex recursively.
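The pattern of taking a non-recursive lock once, outside the recursive part, can be sketched in userspace as below. The recursion is modelled on the assumption that opening a partition also opens the whole-disk device, which re-enters the same code. The struct and all model_* names are hypothetical and leave out almost everything the real blkdev_get does.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the single global blkdev_mutex. */
static pthread_mutex_t model_blkdev_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, heavily reduced block device: a partition points at
 * its whole-disk device; a whole disk has whole == 0. */
struct model_bdev {
	int openers;
	struct model_bdev *whole;
};

/* Inner helper: must be called with model_blkdev_mutex held. Opening a
 * partition first opens the whole disk by re-entering this function;
 * since the lock is not taken here, the recursion is harmless. */
static int model_blkdev_get_locked(struct model_bdev *bdev)
{
	if (bdev->whole) {
		int ret = model_blkdev_get_locked(bdev->whole);
		if (ret)
			return ret;
	}
	bdev->openers++;
	return 0;
}

/* Public entry point: takes the non-recursive mutex exactly once,
 * outside the recursive part. Taking it inside the helper instead
 * would deadlock on the partition -> whole disk path. */
int model_blkdev_get(struct model_bdev *bdev)
{
	int ret;

	pthread_mutex_lock(&model_blkdev_mutex);
	ret = model_blkdev_get_locked(bdev);
	pthread_mutex_unlock(&model_blkdev_mutex);
	return ret;
}
```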

Testing so far has shown no problems whatsoever from this patch, but the usage in blkdev_get may introduce extra latencies. I may also have missed corner cases where a block device ioctl function sleeps for a significant amount of time while holding the mutex; unlike the BKL, a mutex is not released when its holder sleeps, so this could hurt the performance of other threads.

More info: [http://kerneltrap.org/mailarchive/linux-kernel/2010/4/14/4558922 http://kerneltrap.org/mailarchive/linux-kernel/2010/4/14/4558922].

File locking (fs/locks.c)

One of the oldest BKL removal patches that still has not been merged is the one for file locking. The patch itself should be fairly stable at this point, but it still interacts with how the BKL is used in the NFS file system, in particular lockd, which runs with the BKL held for its entire lifetime.

The hard part here is to find out what data structures in NFS actually need to be protected by lock_flock instead of lock_kernel.

More info: [http://kerneltrap.org/mailarchive/linux-kernel/2010/4/14/4558923 http://kerneltrap.org/mailarchive/linux-kernel/2010/4/14/4558923].

Super block operations

A patch series from Jan Blunck removes the BKL from the generic file system mount code by pushing it into those file systems that still need it.

The patches appear to be stable, but have not made it upstream yet.

More info: [http://kerneltrap.com/mailarchive/linux-fsdevel/2009/11/18/6582233 http://kerneltrap.com/mailarchive/linux-fsdevel/2009/11/18/6582233].

USB layer

Andi Kleen has posted a patch series removing the BKL from all central parts of the USB device driver layer.

More info: [http://kerneltrap.org/mailarchive/linux-kernel/2010/3/29/4552603 http://kerneltrap.org/mailarchive/linux-kernel/2010/3/29/4552603].

Direct rendering manager

The DRM code still uses the BKL in ugly ways; a solution is needed.

More info: [http://kerneltrap.org/mailarchive/linux-kernel/2010/4/23/4562233 http://kerneltrap.org/mailarchive/linux-kernel/2010/4/23/4562233].

init/main.c

The initial kernel thread at boot time runs with the BKL held. There does not seem to be any reason for this, at least not once all the other users have been removed. It will be trivial to remove this instance of the BKL.

procfs core (ready)

The procfs core no longer uses the BKL. The llseek stubs have been replaced with saner default llseek implementations, and every procfs user that implemented an .ioctl callback has had the BKL pushed down into its ioctl implementation, so that it now uses the .unlocked_ioctl callback.

Now that there are no more users of .ioctl in procfs, the ioctl callback is called without the BKL, and procfs warns its users that .ioctl is deprecated.

This patch set is waiting for the [:Linux_2_6_35:2.6.35] merge window to be proposed upstream. Users of procfs that don't implement llseek still need to be fixed, though.

Tree: [http://git.kernel.org/?p=linux/kernel/git/frederic/random-tracing.git;a=shortlog;h=refs/heads/bkl/procfs http://git.kernel.org/?p=linux/kernel/git/frederic/random-tracing.git;a=shortlog;h=refs/heads/bkl/procfs].

ptrace (ready)

ptrace used the BKL to protect the ptrace syscall, but this protection has been shown to be unnecessary. A patch to remove it is ready and is waiting for the [:Linux_2_6_35:2.6.35] merge window to be proposed upstream.

Tree: [http://git.kernel.org/?p=linux/kernel/git/frederic/random-tracing.git;a=shortlog;h=refs/heads/bkl/core http://git.kernel.org/?p=linux/kernel/git/frederic/random-tracing.git;a=shortlog;h=refs/heads/bkl/core].

Remaining drivers

Once all the above changes are done, the base kernel no longer needs the BKL, but a number of modules still do. We need to either mark these as 'depends on BKL' in Kconfig or remove the BKL from them, one at a time:

 * sound/soundcore.ko
 * sound/soc/snd-soc-core.ko
 * sound/oss/sound.ko
 * sound/oss/msnd_pinnacle.ko
 * sound/oss/msnd_classic.ko
 * sound/core/snd.ko
 * sound/core/snd-pcm.ko
 * sound/core/seq/snd-seq.ko
 * sound/core/oss/snd-pcm-oss.ko
 * net/x25/x25.ko
 * net/wanrouter/wanrouter.ko
 * net/sunrpc/sunrpc.ko
 * net/irda/irnet/irnet.ko
 * net/irda/irda.ko
 * net/ipx/ipx.ko
 * net/appletalk/appletalk.ko
 * fs/ufs/ufs.ko
 * fs/udf/udf.ko
 * fs/squashfs/squashfs.ko
 * fs/smbfs/smbfs.ko
 * fs/reiserfs/reiserfs.ko
 * fs/qnx4/qnx4.ko
 * fs/ocfs2/ocfs2_stack_user.ko
 * fs/ocfs2/ocfs2.ko
 * fs/nfsd/nfsd.ko
 * fs/nfs/nfs.ko
 * fs/ncpfs/ncpfs.ko
 * fs/lockd/lockd.ko
 * fs/jffs2/jffs2.ko
 * fs/isofs/isofs.ko
 * fs/hpfs/hpfs.ko
 * fs/hfsplus/hfsplus.ko
 * fs/freevxfs/freevxfs.ko
 * fs/fat/vfat.ko
 * fs/fat/msdos.ko
 * fs/fat/fat.ko
 * fs/ecryptfs/ecryptfs.ko
 * fs/coda/coda.ko
 * fs/autofs4/autofs4.ko
 * fs/autofs/autofs.ko
 * fs/afs/kafs.ko
 * fs/adfs/adfs.ko
 * drivers/usb/misc/usblcd.ko
 * drivers/usb/misc/sisusbvga/sisusbvga.ko
 * drivers/usb/misc/rio500.ko
 * drivers/usb/misc/iowarrior.ko
 * drivers/usb/misc/idmouse.ko
 * drivers/usb/gadget/gadgetfs.ko
 * drivers/usb/gadget/g_printer.ko
 * drivers/usb/class/usblp.ko
 * drivers/telephony/ixj.ko
 * drivers/scsi/st.ko
 * drivers/scsi/scsi_tgt.ko
 * drivers/scsi/pmcraid.ko
 * drivers/scsi/osst.ko
 * drivers/scsi/osd/osd.ko
 * drivers/scsi/mpt2sas/mpt2sas.ko
 * drivers/scsi/megaraid/megaraid_sas.ko
 * drivers/scsi/megaraid/megaraid_mm.ko
 * drivers/scsi/megaraid.ko
 * drivers/scsi/gdth.ko
 * drivers/scsi/dpt_i2o.ko
 * drivers/scsi/ch.ko
 * drivers/scsi/aacraid/aacraid.ko
 * drivers/scsi/3w-xxxx.ko
 * drivers/scsi/3w-sas.ko
 * drivers/scsi/3w-9xxx.ko
 * drivers/rtc/rtc-m41t80.ko
 * drivers/pci/hotplug/cpqphp.ko
 * drivers/net/wireless/ray_cs.ko
 * drivers/net/wireless/airo.ko
 * drivers/net/wan/cosa.ko
 * drivers/net/ppp_generic.ko
 * drivers/mtd/ubi/ubi.ko
 * drivers/mtd/mtdchar.ko
 * drivers/misc/phantom.ko
 * drivers/message/i2o/i2o_config.ko
 * drivers/message/fusion/mptctl.ko
 * drivers/media/video/zoran/zr36067.ko
 * drivers/media/video/videodev.ko
 * drivers/media/video/usbvision/usbvision.ko
 * drivers/media/video/usbvideo/vicam.ko
 * drivers/media/video/tlg2300/poseidon.ko
 * drivers/media/video/stv680.ko
 * drivers/media/video/stradis.ko
 * drivers/media/video/stkwebcam.ko
 * drivers/media/video/se401.ko
 * drivers/media/video/s2255drv.ko
 * drivers/media/video/pwc/pwc.ko
 * drivers/media/video/dabusb.ko
 * drivers/media/video/cx88/cx8800.ko
 * drivers/media/video/cx88/cx88-blackbird.ko
 * drivers/media/video/cx23885/cx23885.ko
 * drivers/media/video/cpia.ko
 * drivers/media/video/bt8xx/bttv.ko
 * drivers/media/radio/si470x/radio-usb-si470x.ko
 * drivers/media/dvb/ttpci/dvb-ttpci.ko
 * drivers/media/dvb/firewire/firedtv.ko
 * drivers/media/dvb/dvb-core/dvb-core.ko
 * drivers/media/dvb/bt8xx/dst_ca.ko
 * drivers/isdn/mISDN/mISDN_core.ko
 * drivers/isdn/i4l/isdn.ko
 * drivers/isdn/hysdn/hysdn.ko
 * drivers/isdn/hardware/eicon/divas.ko
 * drivers/isdn/hardware/eicon/diva_mnt.ko
 * drivers/isdn/hardware/eicon/diva_idi.ko
 * drivers/isdn/divert/dss1_divert.ko
 * drivers/isdn/capi/capifs.ko
 * drivers/isdn/capi/capi.ko
 * drivers/input/serio/serio_raw.ko
 * drivers/input/misc/uinput.ko
 * drivers/infiniband/core/rdma_ucm.ko
 * drivers/infiniband/core/ib_uverbs.ko
 * drivers/infiniband/core/ib_umad.ko
 * drivers/infiniband/core/ib_ucm.ko
 * drivers/ide/ide-tape.ko
 * drivers/hwmon/fschmd.ko
 * drivers/hid/usbhid/usbhid.ko
 * drivers/hid/hid.ko
 * drivers/gpu/drm/i830/i830.ko
 * drivers/gpu/drm/i810/i810.ko
 * drivers/gpu/drm/drm.ko
 * drivers/char/toshiba.ko
 * drivers/char/tlclk.ko
 * drivers/char/stallion.ko
 * drivers/char/raw.ko
 * drivers/char/ppdev.ko
 * drivers/char/pcmcia/cm4040_cs.ko
 * drivers/char/pcmcia/cm4000_cs.ko
 * drivers/char/mwave/mwave.ko
 * drivers/char/lp.ko
 * drivers/char/istallion.ko
 * drivers/char/ipmi/ipmi_watchdog.ko
 * drivers/char/ipmi/ipmi_devintf.ko
 * drivers/char/ip2/ip2.ko
 * drivers/char/i8k.ko
 * drivers/char/dtlk.ko
 * drivers/char/applicom.ko
 * drivers/block/pktcdvd.ko
 * drivers/block/paride/pt.ko
 * drivers/block/paride/pg.ko
 * drivers/block/DAC960.ko


KernelNewbies: BigKernelLock (last edited 2010-05-18 20:00:50 by StefanR)