Intro
Some bugs are simple to spot, like a kernel crash. Some bugs are harder to reproduce.
How to capture dmesg
Sometimes you can just copy lines from /var/log/kern.log or /var/log/messages. Otherwise, reproduce the error and then run
dmesg | tee log-file.txt
to save the contents of the kernel log buffer to a file while still seeing them on screen.
This only works if your machine doesn't totally crash when you hit the bug. If the system freezes, the dmesg may not be written to disk. In that case, your only alternative is to use either serial console or netconsole to capture the dmesg when the crash happens.
Kernel options to turn on
You may need to turn on several different kernel configuration options in order to capture all of the dmesg necessary to debug your kernel issue. Here are some examples:
- CONFIG_PRINTK_TIME - add time stamps to dmesg
- CONFIG_DEBUG_KERNEL - turn on kernel debugging
- CONFIG_DETECT_HUNG_TASK - good for figuring out what's causing a kernel freeze
- CONFIG_DEBUG_INFO - ensures you can decode kernel oops symbols
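- CONFIG_EARLY_PRINTK - prints kernel messages early in boot, before the regular console is set up
- CONFIG_LOG_BUF_SHIFT=21 - sets the kernel log buffer size to 2^21 bytes (2 MB), which is a large buffer (the maximum allowed value depends on your kernel version)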
- CONFIG_NETCONSOLE=m - compiles netconsole as a module, see tutorial below.
Other kernel subsystems often have debug config options you can turn on for more verbose debug.
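Before rebuilding, it can help to check which of these options your current kernel already has. Here is a minimal sketch, assuming your distribution installs the kernel config as a plain text file (often /boot/config-$(uname -r), but that path is an assumption and is not guaranteed everywhere). The check_kconfig helper is a hypothetical name used only for this example.

```shell
#!/bin/sh
# Hypothetical helper: report whether each listed option is set in a
# kernel config file. Prints the matching line, or "<option> is not set".
check_kconfig() {
    config_file="$1"
    shift
    for opt in "$@"; do
        # Matches CONFIG_FOO=y, CONFIG_FOO=m, or CONFIG_FOO=<value>.
        # "|| true" keeps a non-match from aborting scripts run with set -e.
        line=$(grep -E "^${opt}=" "$config_file" || true)
        if [ -n "$line" ]; then
            echo "$line"
        else
            echo "$opt is not set"
        fi
    done
}

# Example usage (the config path is an assumption, adjust for your system):
# check_kconfig "/boot/config-$(uname -r)" \
#     CONFIG_PRINTK_TIME CONFIG_DETECT_HUNG_TASK CONFIG_NETCONSOLE
```

Any option reported as not set needs to be enabled in your custom kernel's configuration before you build it.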
Netconsole
Netconsole is a powerful Linux kernel debugging tool. The dmesg output from a machine under test is transferred over an ethernet link (via UDP packets) to another machine. That means that you can see the debugging messages from the test machine on the screen of another machine. Netconsole isn't good for debugging early kernel panics, but it is very useful if your new kernel driver hangs your system.
Netconsole is a kernel module, so you will need to compile a custom kernel with CONFIG_NETCONSOLE=m. If you need help compiling a custom kernel, follow the directions on KernelBuild. Which kernel you choose to compile depends on which kernel you want to reproduce the bug on. You may want to download the source of your distribution kernel, or attempt to reproduce the bug on the latest stable kernel.
Prework
First, you need to have some tools installed. You'll need netcat, ping, and (optionally) wireshark. You'll also need to have netconsole compiled as a module on the source box. Netconsole has to be a module so you can load it after you get the system set up.
Make sure the ethernet driver for both machines supports netpoll. Also make sure that the machines are both plugged into the same router or subnet.
Configuring Netconsole
In this section, I'll refer to the computer under test that is generating the dmesg output as the source machine. The computer that receives the debugging messages is called the target machine.
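First, on the source machine, make sure the daemon that routes kernel messages (syslog) is set up so that messages of all priority types end up in /var/log/messages. Also raise the console log level so that messages of all priorities are sent to the console, and therefore to netconsole. You can raise the console log level by running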
sudo dmesg -n 8
or
sudo sysctl -w kernel.printk="8 8 8 8"
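To confirm the setting took effect, you can read the values back from /proc/sys/kernel/printk, which holds the same four numbers as the kernel.printk sysctl. Here is a small sketch that labels them. The show_printk_levels name is made up for this example, and the optional file argument exists only so the function can be tried on a sample file.

```shell
#!/bin/sh
# Hypothetical helper: print the four kernel.printk values with labels.
# Reads /proc/sys/kernel/printk unless another file is given.
show_printk_levels() {
    printk_file="${1:-/proc/sys/kernel/printk}"
    # The file holds four whitespace-separated integers on one line.
    read -r lvl_console lvl_default_msg lvl_min_console lvl_default_console \
        < "$printk_file"
    echo "console_loglevel=$lvl_console"
    echo "default_message_loglevel=$lvl_default_msg"
    echo "minimum_console_loglevel=$lvl_min_console"
    echo "default_console_loglevel=$lvl_default_console"
}

# Example usage on the source machine:
# show_printk_levels
```

The first value is the one that matters here: only messages with a priority number lower than console_loglevel are sent to consoles such as netconsole, so 8 lets everything through.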
Next, on the source machine, figure out which ethernet port is connected. It may be referred to as ethN (e.g. eth0 or eth1). Run ifconfig to find out:
sarah@xanatos:~$ sudo ifconfig
eth1      Link encap:Ethernet  HWaddr 12:34:56:78:90:12
          inet addr:10.7.201.12  Bcast:10.7.201.255  Mask:255.255.255.0
          inet6 addr: fe80::3e97:eff:fe39:d710/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:157762 errors:0 dropped:0 overruns:0 frame:0
          TX packets:39377 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:64231841 (64.2 MB)  TX bytes:6863064 (6.8 MB)
          Interrupt:20 Memory:f2500000-f2520000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:4672 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4672 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:394237 (394.2 KB)  TX bytes:394237 (394.2 KB)
Next, on the target machine, use the same command to find its IPv4 address and MAC address. Those are the inet addr and HWaddr values. (This tutorial uses IPv4 addresses; netconsole in older kernels does not support IPv6.) Let's assume the IP address is 10.7.201.20 and the MAC address is 12:34:56:78:90:20.
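If ifconfig isn't installed on one of your machines, the MAC address of each interface can also be read from sysfs. Here is a rough sketch that lists each interface with its MAC address and link state. The list_net_macs name is invented for this example, and the optional directory argument is only there so the function can be pointed at a test directory instead of /sys/class/net.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface> <mac> <state>" for every network
# interface found under /sys/class/net (or under the directory given).
list_net_macs() {
    sysfs_net="${1:-/sys/class/net}"
    for dev_dir in "$sysfs_net"/*; do
        # Skip entries without a readable address file.
        [ -r "$dev_dir/address" ] || continue
        dev=$(basename "$dev_dir")
        mac=$(cat "$dev_dir/address")
        # operstate reads "up" when the link is connected.
        state=$(cat "$dev_dir/operstate" 2>/dev/null || echo unknown)
        echo "$dev $mac $state"
    done
}

# Example usage:
# list_net_macs
```

The interface whose state is up is usually the one you want to hand to netconsole.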
Now, make sure both boxes can ping each other. This confirms that the boxes can reach each other over IP, which is the same path netconsole's UDP packets will take. In this example, on the target machine, we would run
ping 10.7.201.12
and on the source machine, we would run
ping 10.7.201.20
If you aren't getting packet transmissions from the pings, go debug your network. Wireshark is a useful tool in this case.
Starting netconsole
On the target machine, start netcat listening on UDP port 6666 to capture the dmesg output:
nc -l -u 6666 | tee ~/dmesg-`date +%Y-%m-%d-%H-%M`.txt
This will create a new file in your home directory with the dmesg output, while allowing you to see what's going into the file.
Next, on the source machine, load the netconsole module. You'll need to load the module with an extra 'netconsole' module parameter. The netconsole documentation describes how you use the module parameters to tell netconsole how to send dmesg packets to your target machine. The parameter format is currently netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]. In our example, we're leaving off the source port, source IP, and target port, since the defaults are fine. The default target port is 6666, which matches the port netcat is listening on. The source device (dev) is eth1, from our ifconfig example above. The target IP address is 10.7.201.20, and the target MAC address is 12:34:56:78:90:20, which we found from running ifconfig on the target machine. Thus, the modprobe command would be:
sudo modprobe netconsole netconsole=@/eth1,@10.7.201.20/12:34:56:78:90:20
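The parameter string is easy to get wrong by hand, so here is a small sketch that assembles it from its parts, following the format described above. The netconsole_param function is a made-up helper for illustration; it only prints the string and does not load the module.

```shell
#!/bin/sh
# Hypothetical helper: build the netconsole module parameter.
# Usage: netconsole_param <dev> <tgt-ip> <tgt-mac> [src-port] [src-ip] [tgt-port]
# Optional fields left empty fall back to netconsole's defaults.
netconsole_param() {
    dev="$1"
    tgt_ip="$2"
    tgt_mac="$3"
    src_port="${4:-}"
    src_ip="${5:-}"
    tgt_port="${6:-}"
    # Format: [src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Example usage with the values from this tutorial:
# sudo modprobe netconsole "$(netconsole_param eth1 10.7.201.20 12:34:56:78:90:20)"
```

With the tutorial's values, the helper prints the same parameter used in the modprobe command above.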
If you've followed the directions correctly, the terminal running netcat on your target machine should print some dmesg:
[ 2009.373932] netpoll: netconsole: local port 6665
[ 2009.373941] netpoll: netconsole: local IP 0.0.0.0
[ 2009.373945] netpoll: netconsole: interface 'eth1'
[ 2009.373949] netpoll: netconsole: remote port 6666
[ 2009.373952] netpoll: netconsole: remote IP 10.7.201.20
[ 2009.373956] netpoll: netconsole: remote ethernet address 12:34:56:78:90:20
[ 2009.373962] netpoll: netconsole: local IP 10.7.201.12
[ 2009.375261] console [netcon0] enabled
[ 2009.375307] netconsole: network logging started
Now you can go trigger the kernel crash and capture the dmesg from the crash on your target machine.