FAQ/TestWpBit - Linux Kernel Newbies

test_wp_bit, or how exception fixups work

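During early boot on the i386 architecture, the kernel checks whether the CPU honours the write protect (WP) bit in the CR0 register while running in ring 0 (supervisor mode). The original 80386 ignores page write protection in supervisor mode; the 486 and later honour it when WP is set. The result of the check is printed during boot and looks like this: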

Checking if this processor honours the WP bit even in supervisor mode... Ok.

Linux tests this by trying to write to a read-only page and checking whether the write faults.

Not very spectacular, huh? Think again...

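A write to a read-only page raises a page fault exception. The page fault handler cannot resolve this fault by mapping something in, because the page is a kernel page and already present. If the handler simply returned, the CPU would restart the write, which would fault again, and your kernel would be stuck in an infinite loop. (In practice, a kernel-mode fault that the handler cannot explain ends in an oops.) Either way, the boot would not get past the check.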

Exception table

To finish booting, the kernel has to resume execution at a different instruction instead of restarting the one that faulted.

This is done by adding an exception table entry. An entry says: if an exception occurs at this instruction address, resume execution at this fixup address once the exception handler returns.

In the do_test_wp_bit function below, the entry is emitted into the __ex_table section by the .section ... .previous lines of the inline assembly. If the instruction at label 1: faults, the kernel resumes at label 2:, which here is the end of the assembly block. The flag variable starts out as 1 (the "2" (1) input constraint), and the xorl that clears it runs only if the write at 1: succeeds. So do_test_wp_bit returns 1 when the CPU honours WP in supervisor mode and 0 when it does not.

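static int noinline do_test_wp_bit(void)
{
        char tmp_reg;
        int flag;

        __asm__ __volatile__(
                "       movb %0,%1      \n"     /* load a byte from the read-only test page */
                "1:     movb %1,%0      \n"     /* store it back: faults if WP is honoured */
                "       xorl %2,%2      \n"     /* flag = 0; reached only if the store succeeded */
                "2:                     \n"     /* fixup target: resume here after a fault */
                ".section __ex_table,\"a\"\n"
                "       .align 4        \n"
                "       .long 1b,2b     \n"     /* table entry: faulting insn 1b, fixup 2b */
                ".previous              \n"
                :"=m" (*(char *)fix_to_virt(FIX_WP_TEST)),  /* %0: byte in the read-only FIX_WP_TEST page */
                 "=q" (tmp_reg),                            /* %1: scratch byte register */
                 "=r" (flag)                                /* %2: result */
                :"2" (1)                                    /* flag starts out as 1 */
                :"memory");

        return flag;    /* 1: WP honoured in supervisor mode, 0: not honoured */
}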

How it works

arch/i386/kernel/entry.S contains a number of exception entry points, each invoked when the CPU raises a particular kind of exception. An attempted write to a read-only page raises a page fault, which enters the kernel here:

KPROBE_ENTRY(page_fault)
        RING0_EC_FRAME
        pushl $do_page_fault
        CFI_ADJUST_CFA_OFFSET 4
        jmp error_code
        CFI_ENDPROC
        .previous .text

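The page_fault stub pushes the address of do_page_fault and jumps to the common error_code path, which saves the registers and calls it. When do_page_fault cannot resolve a fault that happened in kernel mode, it ends up at the no_context label and asks fixup_exception whether the fault was expected: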

no_context:
        /* Are we prepared to handle this kernel fault?  */
        if (fixup_exception(regs))
                return;

As you can imagine, the real magic happens in fixup_exception. It looks up the address of the faulting instruction (the saved eip) in the exception tables. If it finds an entry, it overwrites the saved eip with that entry's fixup address, so that returning from the page fault handler resumes execution at the fixup instead of at the faulting instruction.

int fixup_exception(struct pt_regs *regs)
{
        const struct exception_table_entry *fixup;

        fixup = search_exception_tables(regs->eip);
        if (fixup) {
                regs->eip = fixup->fixup;
                return 1;
        }

        return 0;
}

As you can guess by now, the exception table is just an array of address pairs: the address of an instruction that is expected to fault, and the fixup address to resume at.

struct exception_table_entry
{
        unsigned long insn, fixup;
};

I guess this magic isn't so magic after all...
