KernelNewbies:

Intro

Some bugs are simple to spot, like a kernel crash. Some bugs are harder to reproduce.

What is an oops?

How to capture dmesg

Sometimes you can just copy lines from /var/log/kern.log or /var/log/messages. Otherwise, reproduce the error and then run

dmesg | tee log-file.txt

to save the contents of the kernel log buffer to a file.

This only works if your machine doesn't crash completely when you hit the bug. If the system freezes, the kernel messages may never be written to disk. In that case, your only alternative is to use either a serial console or netconsole to capture the dmesg output as the crash happens.

Kernel options to turn on

Serial console

Netconsole

Netconsole is a powerful Linux kernel debugging tool. The dmesg output from a machine under test is transferred over an ethernet link (via UDP packets) to another machine. That means that you can see the debugging messages from the test machine on the screen of another machine. Netconsole isn't good for debugging early kernel panics, but it is very useful if your new kernel driver hangs your system.

Prework

First, you need to have some tools installed. You'll need netcat, ping, and (optionally) wireshark. You'll also need to have netconsole compiled as a module on the source box. Netconsole has to be a module so you can load it after you get the system set up.
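The prerequisites above can be checked with a short script. This is only a sketch: the helper names missing_tools and netconsole_config_state are made up for illustration, and it assumes that netcat is installed as nc and that your distribution keeps the kernel config in /boot/config-<version>. If the config file lives elsewhere, pass that path to the helper instead.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: print "m" (module), "y" (built in), or "unset"
# for CONFIG_NETCONSOLE in the kernel config file given as $1.
netconsole_config_state() {
    state=$(sed -n 's/^CONFIG_NETCONSOLE=\([ym]\)$/\1/p' "$1")
    printf '%s\n' "${state:-unset}"
}

# Assumption: netcat is installed under the name "nc".
for name in $(missing_tools nc ping); do
    echo "missing tool: $name"
done

# Assumption: the distribution installs the kernel config in /boot.
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    echo "CONFIG_NETCONSOLE: $(netconsole_config_state "$config")"
fi
```

For the setup described here, the source box should report "m".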

Make sure the ethernet driver for both machines supports netpoll. Also make sure that the machines are both plugged into the same router or subnet.

Configuring Netconsole

In this section, I'll refer to the computer under test that is generating the dmesg output as the source machine. The computer that receives the debugging messages is called the target machine.

First, on the source machine, make sure the kernel's console log level is set so that messages of every priority are sent to the console, and therefore to netconsole. You can do this by running

sudo dmesg -n 8

or

sudo sysctl -w kernel.printk="8 8 8 8"
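To confirm that the setting took effect, you can read the value back. The first of the four kernel.printk numbers is the console log level, and only messages with a level numerically lower than it are printed to the console. The sketch below reads /proc/sys/kernel/printk; the helper name console_loglevel is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read the kernel.printk value on stdin and print its
# first field, which is the console log level.
console_loglevel() {
    awk '{ print $1; exit }'
}

# /proc/sys/kernel/printk holds the same four values that sysctl reports.
if [ -r /proc/sys/kernel/printk ]; then
    level=$(console_loglevel < /proc/sys/kernel/printk)
    if [ "$level" -ge 8 ]; then
        echo "console log level is $level: all messages reach the console"
    else
        echo "console log level is $level: messages at level $level and above are not printed"
    fi
fi
```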

Next, on the source machine, figure out which ethernet port is connected. It may be referred to as ethN (e.g. eth0 or eth1). Run ifconfig to find out:

sarah@xanatos:~$ sudo ifconfig 
eth1      Link encap:Ethernet  HWaddr 12:34:56:78:90:12  
          inet addr:10.7.201.12  Bcast:10.7.201.255  Mask:255.255.255.0
          inet6 addr: fe80::3e97:eff:fe39:d710/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:157762 errors:0 dropped:0 overruns:0 frame:0
          TX packets:39377 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:64231841 (64.2 MB)  TX bytes:6863064 (6.8 MB)
          Interrupt:20 Memory:f2500000-f2520000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:4672 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4672 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:394237 (394.2 KB)  TX bytes:394237 (394.2 KB)

Next, on the target machine, use the same command to find its IPv4 address and MAC address. (Netconsole does not currently support IPv6 addresses.) Those are the inet addr and HWaddr values. Let's assume the IP address is 10.7.201.20 and the MAC address is 12:34:56:78:90:20.
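If you would rather not pick the addresses out by eye, the two values can be extracted with awk. This sketch assumes the older ifconfig output format shown above (with "HWaddr" and "inet addr:" fields); newer ifconfig versions print a different layout and would need a different pattern. The helper name parse_ifconfig is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read old-style ifconfig output on stdin and print
# "<interface> <IPv4 address> <MAC address>" for each Ethernet interface.
parse_ifconfig() {
    awk '
        # A line starting in column one begins a new interface stanza.
        /^[^ \t]/ { iface = $1; mac = "" }
        # The MAC address is the field that follows "HWaddr".
        /HWaddr/ { for (i = 1; i < NF; i++) if ($i == "HWaddr") mac = $(i + 1) }
        # The second field looks like "addr:10.7.201.12".
        /inet addr:/ && mac != "" {
            split($2, parts, ":")
            print iface, parts[2], mac
        }
    '
}

# Interfaces without a HWaddr field, such as lo, are skipped.
if command -v ifconfig >/dev/null 2>&1; then
    ifconfig | parse_ifconfig
fi
```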

Now, make sure both boxes can ping each other. This ensures both boxes are on the same subnet and can send UDP packets. In this example, on the target machine, we would run

ping 10.7.201.12

and on the source machine, we would run

ping 10.7.201.20

If you aren't getting packet transmissions from the pings, go debug your network. Wireshark is a useful tool in this case.

Starting netconsole

On the target machine, start netcat listening for UDP packets on port 6666 to capture the dmesg output:

nc -l -u 6666 | tee ~/dmesg-`date +%Y-%m-%d-%H-%M`.txt

This will create a new file in your home directory with the dmesg output, while allowing you to see what's going into the file. Some versions of netcat require the port to be given with -p (nc -l -u -p 6666).

Next, on the source machine, load the netconsole module with a 'netconsole' module parameter that tells it where to send the dmesg packets. The [http://lxr.linux.no/#linux/Documentation/networking/netconsole.txt netconsole documentation] describes the parameter, whose format is currently netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]. In our example, we're leaving off the source port, source IP, and target port, since the defaults are fine. The source device (dev) is eth1, from our ifconfig example above. The target IP address is 10.7.201.20 and the target MAC address is 12:34:56:78:90:20, both of which we found by running ifconfig on the target machine. Thus, the modprobe command would be:

sudo modprobe netconsole netconsole=@/eth1,@10.7.201.20/12:34:56:78:90:20
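Since the parameter string is easy to mistype, it can help to assemble it from its parts. The sketch below follows the parameter format quoted above; the helper name netconsole_param and its argument order are made up for illustration. It only prints the modprobe command, so you can check it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper: assemble the netconsole= module parameter.
# Usage: netconsole_param <dev> <tgt-ip> [tgt-mac] [tgt-port] [src-port] [src-ip]
# Omitted optional fields are left empty so netconsole uses its defaults.
netconsole_param() {
    dev=$1
    tgt_ip=$2
    tgt_mac=${3:-}
    tgt_port=${4:-}
    src_port=${5:-}
    src_ip=${6:-}
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Values from the example: source device eth1, target 10.7.201.20.
param=$(netconsole_param eth1 10.7.201.20 12:34:56:78:90:20)
echo "sudo modprobe netconsole $param"
```

With the example values, the printed command matches the modprobe line above.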

If you've followed the directions correctly, the terminal running netcat on your target machine should print some dmesg output:

[ 2009.373932] netpoll: netconsole: local port 6665
[ 2009.373941] netpoll: netconsole: local IP 0.0.0.0
[ 2009.373945] netpoll: netconsole: interface 'eth1'
[ 2009.373949] netpoll: netconsole: remote port 6666
[ 2009.373952] netpoll: netconsole: remote IP 10.7.201.20
[ 2009.373956] netpoll: netconsole: remote ethernet address 12:34:56:78:90:20
[ 2009.373962] netpoll: netconsole: local IP 10.7.201.12
[ 2009.375261] console [netcon0] enabled
[ 2009.375307] netconsole: network logging started
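Before triggering the crash, it is worth confirming that these lines also reached the capture file on the target machine. A minimal sketch, assuming the start-up message reads exactly as in the output above; the helper name netconsole_started and the script name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the capture file given as $1 contains the
# line netconsole logs once it has started sending messages.
netconsole_started() {
    grep -q 'netconsole: network logging started' "$1"
}

# Usage (hypothetical script name): sh check-capture.sh <capture-file>
if [ "$#" -ge 1 ]; then
    if netconsole_started "$1"; then
        echo "capture looks good: $1"
    else
        echo "no netconsole start-up message in $1"
    fi
fi
```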

Now you can go trigger the kernel crash and capture the dmesg from the crash on your target machine.

KernelNewbies: KernelDebug (last edited 2013-01-15 17:59:24 by SarahSharp)