#format wiki
#language en
== Wade Mealing ==
Email: [[MailTo(wmealing AT SPAMFREE gmail DOT com)]]

I'm going to document some of the things I work on, specifically how I use crash on RHEL-style kernels. Perhaps this will be useful to someone else who also needs to make sense of crash's output and what some of the craziness means.

----
== x86-32 Crash ==
So, I've received a vmcore from a 32-bit system running kernel 2.6.9-34.0.2. The core was captured via a netdump server. After installing the matching kernel debuginfo package and starting crash, I was greeted with the usual 'omgpanic' summary from crash, shown below:

{{{
[wmealing@core-i386 work]$ ./crash

crash 4.0-5.0.3
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

      KERNEL: /cores/vmlinux
    DUMPFILE: /cores/251703.vmcore
        CPUS: 2
        DATE: Thu Dec 18 00:23:40 2008
      UPTIME: 459 days, 21:16:32
LOAD AVERAGE: 1.05, 0.77, 0.68
       TASKS: 280
    NODENAME: paranioa
     RELEASE: 2.6.9-34.0.2.ELsmp
     VERSION: #1 SMP Fri Jun 30 10:33:58 EDT 2006
     MACHINE: i686 (3801 Mhz)
      MEMORY: 4.4 GB
       PANIC: "Oops: 0002 [#1]" (check log for details)
         PID: 1629
     COMMAND: "kjournald"
        TASK: c37fedb0 [THREAD_INFO: c37a4000]
         CPU: 0
       STATE: TASK_UNINTERRUPTIBLE (PANIC)
}}}

The sharper punters among you may have noticed that this doesn't tell us much, other than that the real panic information is in the log. Sometimes the BUG() or other problem shows up on this first screen, but we are not that lucky today.

The log command in crash dumps the kernel message buffer: essentially what dmesg would have shown had the system kept running instead of falling over. Because it can be very long, I'm only showing the last 75 lines here, for brevity and your sanity. In practice you should read through most of it, because earlier oopses or other problems often appear further up.

{{{
crash> log | tail -n 75
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
cdrom: open failed.
Unable to handle kernel NULL pointer dereference at virtual address 0000023c
 printing eip:
f8855437
*pde = 30619001
Oops: 0002 [#1]
SMP
Modules linked in: iptable_filter ip_tables parport_pc parport st seos(U) eAC_mini(U) sg cpqci(U) netconsole netdump dm_mirror dm_mod uhci_hcd ehci_hcd hw_random e1000(U) tg3 bond1(U) bonding(U) floppy ext3 jbd cciss sd_mod scsi_mod
CPU:    0
EIP:    0060:[]    Tainted: P VLI
EFLAGS: 00010087   (2.6.9-34.0.2.ELsmp)
EIP is at do_cciss_intr+0xdc/0x4b4 [cciss]
eax: 00000000   ebx: 00000004   ecx: 00000004   edx: 00000000
esi: f7400000   edi: 00000000   ebp: c3765800   esp: c03eafbc
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1629, threadinfo=c03ea000 task=c37fedb0)
Stack: 00000000 00000001 00000001 00000082 f7dd4800 00000001 00000000 c37a4ab8
       c0107472 c37a4a9c c03ea000 c0387900 c37a4000 c01079d2 00000032 c37a4ab8
       f7dd4800
Call Trace:
 [] handle_IRQ_event+0x25/0x4f
 [] do_IRQ+0x11c/0x1ae
 =======================
 [] common_interrupt+0x18/0x20
 [] do_cciss_request+0x9e/0x2eb [cciss]
 [] mempool_alloc+0x7b/0x135
 [] autoremove_wake_function+0x0/0x2d
 [] mempool_alloc+0x7b/0x135
 [] autoremove_wake_function+0x0/0x2d
 [] __cfq_get_queue+0x91/0xf6
 [] autoremove_wake_function+0x0/0x2d
 [] cfq_get_queue+0x30/0x37
 [] cfq_set_request+0x33/0x6b
 [] cfq_set_request+0x0/0x6b
 [] get_request+0x1de/0x1e8
 [] finish_wait+0x2c/0x50
 [] ll_back_merge_fn+0x175/0x1de
 [] elv_merged_request+0x9/0xa
 [] __make_request+0x452/0x46c
 [] mempool_free+0x60/0x64
 [] cfq_dispatch_requests+0x55/0x80
 [] cfq_next_request+0x21/0x35
 [] __generic_unplug_device+0x2b/0x2d
 [] generic_unplug_device+0x15/0x21
 [] blk_backing_dev_unplug+0xf/0x10
 [] sync_buffer+0x2c/0x2d
 [] __wait_on_buffer+0x67/0x83
 [] bh_wake_function+0x0/0x29
 [] submit_bh+0x15a/0x166
 [] bh_wake_function+0x0/0x29
 [] journal_commit_transaction+0x8a7/0xfc1 [jbd]
 [] autoremove_wake_function+0x0/0x2d
 [] autoremove_wake_function+0x0/0x2d
 [] find_busiest_group+0xdd/0x2ba
 [] load_balance_newidle+0x56/0x82
 [] schedule+0x83d/0x8d3
 [] schedule+0x86d/0x8d3
 [] del_timer_sync+0x7a/0x9c
 [] kjournald+0xc7/0x219 [jbd]
 [] autoremove_wake_function+0x0/0x2d
 [] autoremove_wake_function+0x0/0x2d
 [] schedule_tail+0x31/0xa7
 [] commit_timeout+0x0/0x5 [jbd]
 [] kjournald+0x0/0x219 [jbd]
 [] kernel_thread_helper+0x5/0xb
Code: 95 30 03 00 00 74 38 8b 86 3c 02 00 00 39 f0 74 2e 39 b5 30 03 00 00 75 06 89 85 30 03 00 00 8b 86 38 02 00 00 8b 96 3c 02 00 00 <89> 90 3c 02 00 00 8b 96 3c 02 00 00 89 82 38 02 00 00 eb 06 c7
}}}

What we are looking at is what people call a "panic message" (the kernel itself calls it an oops). I'll try to dissect it below so that you have some understanding of it, and a common vocabulary when discussing problems with fellow hackers.

{{{
Unable to handle kernel NULL pointer dereference at virtual address 0000023c
 printing eip:
f8855437
*pde = 30619001
Oops: 0002 [#1]
}}}

The problem, as understood by the kernel, is: Unable to handle kernel NULL pointer dereference at virtual address 0000023c.

The EIP[1], or Extended Instruction Pointer, is the address of the instruction the CPU was executing when the fault happened: f8855437 here. Further down the message, the kernel resolves this into a function name and offset (EIP is at do_cciss_intr+0xdc/0x4b4 [cciss]). I guess the raw address is printed this early in case that symbol lookup fails.

*pde = 30619001 is the page directory entry that the kernel looked up for the faulting address while handling the fault.

The Oops: 0002 [#1] line has two parts. 0002 is the page-fault error code reported by the CPU; here it means a write, from kernel mode, to a page that was not present. [#1] is the kernel's running count of oopses since boot, so this is the first oops, not a sign that there were several.

Next comes the module information, the list of modules that were loaded when the oops happened:

{{{
Modules linked in: iptable_filter ip_tables parport_pc parport st seos(U) eAC_mini(U) sg cpqci(U) netconsole netdump dm_mirror dm_mod uhci_hcd ehci_hcd hw_random e1000(U) tg3 bond1(U) bonding(U) floppy ext3 jbd cciss sd_mod scsi_mod
}}}
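Since earlier oopses can be buried well above the last 75 lines, a quick way to check is to save the whole log output to a text file and scan it for the lines that usually start an oops report. This is a minimal Python sketch of that idea; the function name find_oops_lines and the marker strings (the two seen in this log, plus "kernel BUG at") are my assumptions, not a complete list of everything the kernel can print.

```python
import re

# Lines that typically begin an oops or BUG report. This list is an
# assumption for illustration and is not exhaustive.
OOPS_MARKERS = re.compile(r"^(Unable to handle kernel|Oops: [0-9a-f]{4}|kernel BUG at)")


def find_oops_lines(log_text):
    """Return (line_number, line) pairs for lines that look like the start of an oops."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if OOPS_MARKERS.match(stripped):
            hits.append((number, stripped))
    return hits
```

For example, find_oops_lines(open("log.txt").read()), where log.txt is whatever file you saved the log output to, returns each suspicious line with its line number so you can jump straight to it.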
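Why does the kernel call 0000023c a NULL pointer dereference? On i386 kernels of this vintage, the page-fault handler uses that wording whenever the faulting address is below PAGE_SIZE (4096 bytes on x86), and says "paging request" otherwise. An address that small is usually a field offset added to a NULL structure pointer, which fits the register dump in this oops, where eax is 00000000. A small Python sketch of that rule, assuming 4 KiB pages; classify_fault is a hypothetical helper, not a kernel or crash function.

```python
# Assumption: standard 4 KiB pages, as on i386 without huge pages.
PAGE_SIZE = 4096


def classify_fault(address):
    """Return (wording, likely_field_offset) for a faulting kernel address.

    Mirrors the i386 rule: addresses below PAGE_SIZE are reported as a
    NULL pointer dereference, anything else as a paging request.
    """
    if address < PAGE_SIZE:
        # A NULL base pointer plus a field offset lands here, so the
        # address itself is the likely offset of the field accessed.
        return ("NULL pointer dereference", address)
    return ("paging request", None)
```

For this oops, classify_fault(0x23c) points at a field 0x23c bytes into whatever structure the code expected to find behind the NULL pointer.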
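The symbol lines in an oops, both EIP is at do_cciss_intr+0xdc/0x4b4 [cciss] and every Call Trace entry, share one format: function+offset/size [module]. The offset is how far into the function the address falls, the size is the function's total length, and the bracketed module name is missing for core kernel code. A Python sketch for pulling those fields apart; parse_symbol is a hypothetical helper, and the regex is an assumption based only on the lines in this oops.

```python
import re

# function+0xOFFSET/0xSIZE, optionally followed by " [module]"
SYMBOL_RE = re.compile(
    r"(?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_symbol(text):
    """Return (function, offset, size, module_or_None), or None if nothing matches."""
    m = SYMBOL_RE.search(text)
    if m is None:
        return None
    return (
        m.group("func"),
        int(m.group("off"), 16),
        int(m.group("size"), 16),
        m.group("module"),  # None for functions in the core kernel
    )
```

For this oops it yields do_cciss_intr, 0xdc bytes into a 0x4b4-byte function in the cciss module, which tells you which driver, and roughly where in it, to start reading.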
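The x86 page-fault error code is a small bit field, and only its low three bits are needed to read a code like the 0002 in this oops: bit 0 says whether the page was present, bit 1 whether the access was a write, and bit 2 whether the CPU was in user mode. A hedged Python decoder for those bits; decode_fault_error is a hypothetical helper name, and newer CPUs define further bits that this ignores.

```python
def decode_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation on a present page
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = CPU was in kernel (supervisor) mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
    }
```

The write access agrees with the Code: line, where the instruction starting at the bracketed byte, 89 90 3c 02 00 00, decodes to mov %edx,0x23c(%eax): a store through eax, which holds 00000000.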
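Several entries in the module list carry a (U) suffix, and the EIP line reports Tainted: P, meaning a module with a proprietary (non-GPL-compatible) licence was loaded. On RHEL kernels (U) is generally read as marking a module Red Hat does not support, typically a third-party one; treat that reading as an assumption and check it for your own kernel. When triaging, it helps to pull the flagged modules out of the list. A small Python sketch; flagged_modules is a hypothetical helper.

```python
import re


def flagged_modules(modules_line):
    """Return {module: flags} for entries in a 'Modules linked in:' line
    that carry a parenthesised flag such as (U)."""
    _, _, names = modules_line.partition("Modules linked in:")
    flagged = {}
    for entry in names.split():
        m = re.fullmatch(r"(\w+)\((\w+)\)", entry)
        if m:
            flagged[m.group(1)] = m.group(2)
    return flagged
```

Run against this oops, it returns seos, eAC_mini, cpqci, e1000, bond1 and bonding. The module the EIP points into, cciss, is not one of them.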