• Immutable Page
  • Info
  • Attachments


Kernel Janitor's List Maintained by Arnaldo Carvalho de Melo <acme at conectiva dot com dot br>

  • Dave Jones <davej at suse dot de> and Jeff Garzik <jgarzik at mandrakesoft dot com>

Please send additions/fixes to kernel-janitor-discuss@lists.sourceforge.net Last Updated in: Wed May 30 18:04:19 EST 2001

This TODO list is maintained in CVS, you can get the latest version in http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/kernel-janitor/kernel-janitor

You should look also at http://linux24.sourceforge.net for more things to do.

E-mail addresses are masked to avoid spammers.

- get rid of check_region, use just request_region checking its return (2.2

  • request_region returned void) and now the driver init sequence is not to be serialized anymore, so races are possible (look at cardbus/pcihotplug code)

    • - check isapnp.c: doesn't release regions on failure

- uninitialize static variables initialized to 0, to make it go to the .bss,

  • instead of .data initdata has to be initialized, even if to 0 tough

- request_irq needs to be checked

- check all init_etherdev return

  • - drivers allocating net_device with init_etherdev doesn't need zeroing it

    • (init_etherdev does this for us)

- check all kmalloc, vmaloc, skb_alloc, etcalloc

  • - check drivers/char/ip2main.c

- register_netdev has to be checked as well

- release previously successful allocations on failure

  • - use forward gotos to release previously successfull allocations

- convert drivers to new PCI API

- remove uneeded historic code

  • - checking for NULL on probe routines for net drivers

- proc_register() is dead. Use create_proc_read_entry() instead. (from Al Viro on lkml)

- check proc_*_create result, again, it can fail, and is common (bad) practice

  • in most of the kernel sources

- check freeing skbs with kfree instead of kfree_skb

- check drivers/ide/ide-probe.c, full of unchecked kmallocs

- get rid of panic function in drivers (watchdogs need to use machine_restart

  • instead of panic 8) ) - several char drivers do this happily :(

- get rid of isa_read/write[bwl], use ioremap instead

  • - unsigned long addr = 0xC0000; - isa_writeb (0xC0, addr); + void *cookie = ioremap (0xC0000, region_size); + if (!cookie) { fail... } + writeb (0xC0, cookie); + iounmap (cookie);

Note there is a ioremap step at driver init time, and an iounmap step at driver shutdown time. isa_xxx simply doesn't have the remap/unmap stuff.

Read Documentation/IO-mapping.txt for more details.

- sed s/return EWHATEVER/return -EWHATEVER/

- check misc_register return (yes, it can fail, murphy's law applies here as

  • well)

- converting cli to spinlocks (look at net/netrom/*.c, etc)

- check that net_device interrupt functions use dev_kfree_skb_irq and not

  • just dev_kfree_skb

- get rid of verify_area with copy_*_user get/put_user, only needed if

  • using copy_*_user et al

    • - to make sure that copy_to_user et all are checked

      • - look at drivers/char/generic_serial.c - look at drivers/char/n_tty.c

      - use "return copy_to_user(...) ? -EFAULT : ok_value;"

      • - check drivers/char/dz.c wrt return copy*user(...);

        • has to be ... ? -EFAULT : 0;

Date: Wed, 14 Feb 2001 12:51:53 +0100 From: Manfred Spraul <manfred at colorfullife dot com>

bug in network drivers:

* dev->mem_start: NULL means "not command line configuration" 0xffffffff means "default". several drivers only check for NULL, not for 0xffffffff.

And then: Date: Wed, 14 Feb 2001 05:54:34 -0600 (CST) From: Jeff Garzik <jgarzik at mandrakesoft dot mandrakesoft dot com>

netdev->mem_start is unsigned long... Should the test be for ~0 instead? The value 0xFFFFFFFF seems wrong for 64-bit machines.

  • - drivers that try to find multiple boards, possibly successfully allocating

    • for the first ones, then failing for, lets say, the third board, then it just returns failure for init_module, the module is unloaded but the resources remain allocated... in these cases we need to rollback the allocations, freeing it before returning from init_module or equivalent.

- prumpf suggested:

  • - make sure drivers never read loops_per_sec - it might change under

    • them (prumpf did this in 2.2.18pre series, need forward port to 2.4)

    - fix watchdog drivers to use link order rather than explicit

    • initialization calls (i810 is particularly broken)

    - get rid of init_module / cleanup_module (softdog in particular)

    - make sure BUG() is used correctly (i.e. if(function()) BUG(); is

    • evil) i.e. even when no opping BUG we still have an if

- prumpf suggested, rmk agreed

  • - get rid of save_flags_cli, use local_irq_save instead

Maybe: - check drivers/scsi/ips.c for resource leaks (ips_release doesn't seems

  • to release all the kmalloc memory it got in ips_detect. (quick look)

From: Hans Grobler <grobh at sun dot ac dot za>

- audit an ioctl function to make sure there is no way a user can crash

  • the machine or do sensitive stuff. for example, check for the necessary capabilities. check that no userspace pointers are accessed directly. be carefull with this... it takes a while to figure out which ioctls are covered by the parent ioctl function (these can be nested to great depths).

- check for balanced *_lock_* and *_unlock_* along all paths in a function

  • (can cause dead-lock if not).

- check for dev_close calls without rntl_lock held (cases assertion

  • failures).

- check that a freed pointer (*kfree_*) is not used again.

- ioremap must be balanced by an iounmap (this causes a memory leak).

From: Rasmus Andersen <rasmus at jaquet dot dk>

Stuff I observed in drivers/scsi:

o no check of scsi_register's return value o get_free_pages should be added to the list of *allocs to check o check for matching pairs of pci_alloc_consistent and pci_free_consistent

From: Jeff Garzik <jgarzik at mandrakesoft dot com>

* ALL PCI drivers should call pci_enable_device --before-- reading pdev->irq or pdev->resource[]. irq and resource[] may not have correct values until after PCI hotplug setup occurs at pci_enable_device() time. Many PCI drivers need to be evaluated and checked for this. Many of these fixed already at: http://www.lib.uaa.alaska.edu/linux-kernel/archive/2001-Week-11/0150.html

* Net drivers should call SET_MODULE_OWNER instead of MOD_{INC,DEC}_USE_COUNT

* Net drivers should set dev->last_rx immediately after netif_rx

From: David Woodhouse <dwmw2 at infradead dot org>

- Anything which uses sleep_on() has a 90% chance of being broken. Fix

  • them all, because we want to remove sleep_on() and friends in 2.5.

Explanation by Andi Kleen <ak at suse dot de> lkml thread at: http://boudicca.tux.org/hypermail/linux-kernel/2001week05/0305.html

> don't even know WHY it's bad. Not only that, but what am I supposed to use > instead?

You can miss wakeups. The standard pattern is:

  • get locks

    add_wait_queue(&waitqueue, &wait); for (;;) {

    • if (condition you're waiting for is true)

      • break;

      unlock any non sleeping locks you need for condition set_task_state(current, TASK_UNINTERRUPTIBLE); schedule(); set_task_state(current, TASK_RUNNING); reaquire locks

    } remove_wait_queue(&waitqueue, &wait);

When you want to handle signals you can check for them before or after the condition check. Also use TASK_INTERRUPTIBLE in this case.

When you need a timeout use schedule_timeout().

For some cases you can also use the wait_event_* macros which encapsulate that for you, assuming condition can be tested/used lockless.

An alternative is to use a semaphore, although that behaves a bit differently under load.

From: David Woodhouse <dwmw2 at infradead dot org>

> Ok, so how should this code have been written?

DECLARE_WAIT_QUEUE(wait, current);

while(1) {

  • do_something_important()

    set_current_state(TASK_INTERRUPTIBLE); add_wait_queue(&q, &wait);

    /* Now if something arrives, we'll be 'woken' immediately -

    • - that is; our state will be set to TASK_RUNNING */

    if (!stuff_to_do()) {

    • /* If the 'stuff' arrives here, we get woken anyway,

      • so the schedule() returns immediately. You can use schedule_timeout() here if you need a timeout, obviously */



    set_current_state(TASK_RUNNING); remove_wait_queue(&q, &wait);

    if (signal_pending(current)) {

    • /* You've been signalled. Deal with it. If you don't

      • want signals to wake you, use TASK_UNINTERRUPTIBLE above instead of TASK_INTERRUPTIBLE. Be aware that you'll add one to the load average all the time your task is sleeping then. */

      return -EINTR;



Alternatively, you could up() a semaphore for each task that's do be done, and down() it again each time you remove one from the queue.

From: David Woodhouse <dwmw2 at infradead dot org>

- Oh, and there are still kernel threads which don't use up_and_exit() for

  • dying.

From: Andrew Morton <andrewm at uow dot edu dot au>

Here - have about 300 bugs:

A lot of the timer deletion races are hard to fix because of the deadlock problem.

Date: Mon, 26 Feb 2001 16:53:50 -0800 (PST) From: "David S. Miller" <davem at redhat dot com>

Andrew Morton writes:

  • > Could this be a driver problem?

    > netif_rx(skb); ... > new_skb = skb;

This is illegal and broken and will never work.

Once you give an skb to netif_rx() it is not yours to reference any longer.

Donald Becker added:

Easier fix: - np->stats.rx_bytes += skb->len; + np->stats.rx_bytes += pkt_len;

Grouping the writes to np->stats results in better cache usage.

Jeff Garzik added:

It makes use of the existing local 'pkt_len', and it checks off another item that should probably be on the janitor's todo list: Set 'dev->last_rx=jiffies' immediately after netif_rx.

  • Jeff

From: Jeff Garzik <jgarzik at mandrakesoft dot com>

> It is highly recommended to always compile with CONFIG_ISAPNP=y due > to these differences. If you grep around for CONFIG_ISAPNP versus > CONFIG_ISAPNP_MODULE, you'll see that many drivers are woefully > unprepared for isapnp support compiled as a module.

Yep.. grep for CONFIG_ISAPNP, look at the code, and evaluate it to make sure that isapnp works for that drivers regardless of whether CONFIG_ISAPNP -or- CONFIG_ISAPNP_MODULE is defined.

From: Jeff Garzik <jgarzik at mandrakesoft dot com>

Audit all memcpy and memmove calls.

If calling memmove, see if the areas ever overlap. If not, use memcpy instead.

If calling memcpy, see if the areas ever overlap. If so, use memmmove instead.

From: Andrey Panin <pazke at orbita dot don dot sitek dot net>

- check printk() calls (should include appropriate KERN_* constant).

From Dave Jones <davej at suse dot de>

- Lots of drivers are doing evil looking things writing to PCI_CACHE_LINE_SIZE

  • This should be done during PCI initialisation.

- Checking LINUX_VERSION_CODE & KERNEL_VERSION for specific versions of

  • 2.3 should be checked.

- Lots of unnecessary casts in drivers can be removed.

  • Example: - struct netdev_private *np = (struct netdev_private *)dev->priv; + struct netdev_private *np = dev->priv; Patch for a lot of these is available from ftp.suse.com/pub/people/davej/kernel/2.4/2.4.2ac20/unneeded-net-casts.diff

- sound/maestro3.c doesn't pci_free_consistent any buffers if

  • one allocation fails, but others succeeded.

- drivers/net/aironet*.c could use some functions being made statics.

- de4x5.c doesn't pci_enable_device()

- Many drivers have alignments such as..

  • dev->priv = (void *)(((unsigned long)dev->priv + 7) & ~7); jgarzik:

    • What they are doing here is DMA'ing into their private board (dev->priv), which is wrong -- they should be using PCI DMA instead. They can't use init_etherdev because it doesn't pass GFP_DMA to kmalloc for dev->priv, and since they are DMA'ing into dev->priv, they have to do things manually.

- grep POLYNOMIAL drivers/net/*

  • Maybe putting this common #define in a .h would be a good idea ?

- drivers/net/rcpci45.c

  • Quiten debug printk's in RCioctl()

- Replace uses of suser() and fsuser() with capability checks.

- Check kmallocs for things like GFP_DMA without a memtype.

- Strtok() doesn't seem SMP safe. Replace with strsep() where possible.

  • Read lib/string.c:14

- Signedness issues. We have _lots_ of these.

  • Adding a -W to the kernel build line shows them. We have lots of code paths that are not executed because comparisons <0 on an unsigned int always evaluates to true.

    We also need to check for things like... char foo[4]={1,2,3,133}; /* 133 in signed char!*/

- Audit return codes for..

  • misc_register() request_region() register_reboot_notifier()

  • and handle failure properly.

From: Jeff Garzik <jgarzik at mandrakesoft dot com>

pci_map_* might return 0 for a valid mapping. Some code tests mapping for a non-zero value, which is incorrect.

From: Jeff Garzik <jgarzik at mandrakesoft dot com>

read_eeprom can be marked init in other net drivers also.

akpm: well, not blindly, of course. Need to follow the call graph,

  • make sure it's safe.

From: Jeff Garzik <jgarzik at mandrakesoft dot com>

The string form

  • [const] char *foo = "blah";

creates two variables in the final assembly output, a static string, and a char pointer to the static string. The alternate string form

  • [const] char foo[] = "blah";

is better because it declares a single variable.

For variables marked initdata, the "*foo" form causes only the pointer, not the string itself, to be dropped from the kernel image, which is a bug. Using the "foo[]" form with regular 'ole local variables also makes the assembly shorter.

Look at the diff between include/linux/etherdevice.h in pre7 versus pre8. The assembly for the pre8 version is ~10 lines shorter, with 6 of those removed lines being the string constant stored separately. Maybe this should go into CodingStyle?

From: Jeff Garzik <jgarzik at mandrakesoft dot com>

1) "unsigned int" is preferred to "int", it generates better asm code on all platforms except sh5. This replacement needs to be done manually, because often 'int' is required due to negative values -Exxx commonly passed as error values.

2) someone who knows DocBook, or is willing to learn, should go through and clean up Documentation/DocBook to kill all the warnings that occur during "make pdfdocs" and generally make the documents look nicer, and render smaller PDFs. See, not everyone needs to be a kernel hacker to really help out :)

Pointed out by dwmw2: Audit drivers usage of the return value of ioremap. Some are aparently using this as a pointer, which is wrong.

Tell others about this page:

last edited 2008-10-19 00:30:44 by MikeSteiner