Re: oops in 2.6.14-rc3 — Linux Kernel

On Sun, 23 Oct 2005 10:22:49 -0400 Adam Kropelin wrote:

> Sasa Ostrouska <sasa.ostrouska@volja.net> wrote:
> > Oct 20 03:01:50 rc-vaio kernel: Unable to handle kernel paging request at virtual address f8e43706
> > Oct 20 03:01:50 rc-vaio kernel:  printing eip:
> > Oct 20 03:01:50 rc-vaio kernel: c01eaf49
> > Oct 20 03:01:50 rc-vaio kernel: *pde = 01bae067
> > Oct 20 03:01:50 rc-vaio kernel: Oops: 0000 [#1]
> > Oct 20 03:01:50 rc-vaio kernel: PREEMPT
> > Oct 20 03:01:50 rc-vaio kernel: Modules linked in: snd_pcm_oss
> > snd_mixer_oss lp ipv6 uhci_hcd joydev parport_pc parport psmouse pcspkr
> > rtc sis_agp shpchp pci_hotplug i2c_sis96x i2c_core usb_storage
> > snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd
> > snd_page_alloc ohci_hcd ehci_hcd usbcore sis900 ohci1394 ieee1394 tsdev
> > pcmcia firmware_class yenta_socket rsrc_nonstatic pcmcia_core ide_scsi
> > agpgart
> > Oct 20 03:01:50 rc-vaio kernel: CPU:    0
> > Oct 20 03:01:50 rc-vaio kernel: EIP:    0060:[<c01eaf49>]    Not tainted VLI
> > Oct 20 03:01:50 rc-vaio kernel: EFLAGS: 00010297   (2.6.14-rc4)
> > Oct 20 03:01:50 rc-vaio kernel: EIP is at vsnprintf+0x369/0x500
> > Oct 20 03:01:50 rc-vaio kernel: eax: f8e43706   ebx: 0000000a   ecx: f8e43706   edx: fffffffe
> > Oct 20 03:01:50 rc-vaio kernel: esi: f596e11f   edi: 00000000   ebp: f596efff   esp: f398ded0
> > Oct 20 03:01:50 rc-vaio kernel: ds: 007b   es: 007b   ss: 0068
> > Oct 20 03:01:50 rc-vaio kernel: Process grep (pid: 7529, threadinfo=f398c000 task=f6122030)
> > Oct 20 03:01:50 rc-vaio kernel: Stack: 000003e1 00000000 00000010 00000004 00000002 00000001 ffffffff ffffffff
> > Oct 20 03:01:50 rc-vaio kernel:        00000eed f596e113 c0331532 f596e113 f665c380 f665c380 00000113 c017c52f
> > Oct 20 03:01:50 rc-vaio kernel:        f398df44 c0330829 f7fe0ca0 c011fcb4 f665c380 c0331520 00000000 c0330829
> > Oct 20 03:01:50 rc-vaio kernel: Call Trace:
> > Oct 20 03:01:50 rc-vaio kernel:  [<c017c52f>] seq_printf+0x2f/0x60
> > Oct 20 03:01:50 rc-vaio kernel:  [<c011fcb4>] r_show+0x84/0x90
> > Oct 20 03:01:50 rc-vaio kernel:  [<c017c0f1>] seq_read+0x221/0x290
> > Oct 20 03:01:50 rc-vaio kernel:  [<c015bae7>] vfs_read+0xc7/0x180
> > Oct 20 03:01:50 rc-vaio kernel:  [<c015be77>] sys_read+0x47/0x80
> > Oct 20 03:01:50 rc-vaio kernel:  [<c0103005>] syscall_call+0x7/0xb
> > Oct 20 03:01:50 rc-vaio kernel: Code: 00 83 cf 01 89 44 24 1c eb bc 8b
> > 44 24 40 8b 54 24 18 83 44 24 40 04 8b 08 b8 fe 14 34 c0 81 f9 ff 0f 00
> > 00 0f 46 c8 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 83
> > e7 10 89 c3 75 20
> > Oct 20 03:01:50 rc-vaio kernel:  <6>note: grep[7529] exited with preempt_count 1
> 
> If I had to guess (and I do) I'd say one of your shutdown scripts tried
> to grep thru something in /proc and the module that once supplied the
> data for that something is gone, without having removed its /proc
> entries. Lacking any particular insight on what module to blame, I'd
> start by disabling various modules and booting cleanly so they never
> load. Binary search your way thru them until you find the culprit.
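Adam's bisection idea can be sketched in C. This assumes that exactly one module is at fault. The reproduces() callback is hypothetical: it stands in for "boot with only these modules allowed to load, shut down, and check the log for the oops".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical test step: returns nonzero if the oops still happens when
 * only modules with index in [lo, hi) are allowed to load.  In real life
 * this means editing the boot config, rebooting and shutting down again.
 */
typedef int (*reproduces_fn)(size_t lo, size_t hi, void *ctx);

/* Find the single culprit among modules [0, n) in about log2(n) tests. */
static size_t bisect_culprit(size_t n, reproduces_fn reproduces, void *ctx)
{
	size_t lo = 0, hi = n;

	assert(n > 0);
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (reproduces(lo, mid, ctx))
			hi = mid;	/* oops still happens: culprit in [lo, mid) */
		else
			lo = mid;	/* oops gone: culprit in [mid, hi) */
	}
	return lo;
}

/* Stand-in predicate for trying the search: ctx points at the known culprit. */
static int culprit_in_range(size_t lo, size_t hi, void *ctx)
{
	size_t culprit = *(const size_t *)ctx;

	return culprit >= lo && culprit < hi;
}
```

Module dependencies (snd_pcm cannot load without snd, for example) mean the halves cannot always be chosen freely, so in practice this is only a rough guide.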

$ git grep -w r_show    
fs/reiserfs/procfs.c:static int r_show(struct seq_file *m, void *v)
fs/reiserfs/procfs.c:   .show = r_show,
kernel/resource.c:static int r_show(struct seq_file *m, void *v)
kernel/resource.c:      .show   = r_show,

This does not look like r_show() from reiserfs, because that function
does not call seq_printf() directly, so it must be r_show() from
kernel/resource.c:

static int r_show(struct seq_file *m, void *v)
{
	struct resource *root = m->private;
	struct resource *r = v, *p;
	int width = root->end < 0x10000 ? 4 : 8;
	int depth;

	for (depth = 0, p = r; depth < MAX_IORES_LEVEL; depth++, p = p->parent)
		if (p->parent == root)
			break;
	seq_printf(m, "%*s%0*lx-%0*lx : %s\n",
			depth * 2, "",
			width, r->start,
			width, r->end,
			r->name ? r->name : "<BAD>");
	return 0;
}

This function generates the contents of /proc/ioports and /proc/iomem.
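To see what r_show() emits, here is a user-space re-creation of its formatting. The format string and the depth loop are copied from the function above. The struct is a hypothetical stand-in that models only the fields r_show() reads, and MAX_IORES_LEVEL = 5 is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct resource (only what r_show() reads). */
struct resource {
	const char *name;
	unsigned long start, end;
	struct resource *parent;
};

#define MAX_IORES_LEVEL 5	/* assumed value for this sketch */

/* Format one line the way r_show() does, into buf of size len. */
static int format_resource(char *buf, size_t len,
			   const struct resource *root,
			   const struct resource *r)
{
	int width = root->end < 0x10000 ? 4 : 8;	/* 4 hex digits for ioports */
	int depth;
	const struct resource *p;

	assert(root != NULL && r != NULL);
	/* Count how many levels r sits below root (0 = top level). */
	for (depth = 0, p = r; depth < MAX_IORES_LEVEL; depth++, p = p->parent)
		if (p->parent == root)
			break;

	return snprintf(buf, len, "%*s%0*lx-%0*lx : %s\n",
			depth * 2, "",
			width, r->start,
			width, r->end,
			r->name ? r->name : "<BAD>");
}
```

A top-level range under a root ending at 0xffff comes out as "0000-001f : name", and each extra nesting level adds two spaces of indentation. Note that depth * 2 is the third argument of seq_printf().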

The first parameters of seq_printf() on the stack were:

	f665c380	m
	c0331520	"%*s%0*lx-%0*lx : %s\n"
	00000000	depth * 2
	c0330829	""

(Unfortunately, the stack dump ends there, so the remaining arguments, including r->name, are not visible.)
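The arguments were found by locating the return address into r_show() (c011fcb4, r_show+0x84 in the Call Trace) in the stack words. seq_printf() is variadic, so on i386 all of its arguments are passed on the stack, pushed right to left, and they sit just above that return address with the first argument lowest. A small sketch of that lookup over the three stack rows from the oops follows. It assumes that no other word in the dump happens to equal the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three "Stack:" rows from the oops above, in address order. */
static const uint32_t oops_stack[] = {
	0x000003e1, 0x00000000, 0x00000010, 0x00000004,
	0x00000002, 0x00000001, 0xffffffff, 0xffffffff,
	0x00000eed, 0xf596e113, 0xc0331532, 0xf596e113,
	0xf665c380, 0xf665c380, 0x00000113, 0xc017c52f,
	0xf398df44, 0xc0330829, 0xf7fe0ca0, 0xc011fcb4,
	0xf665c380, 0xc0331520, 0x00000000, 0xc0330829,
};

/*
 * Return the index of the first argument word of the call that returns
 * to ret_addr (the word just above the return address), or -1 if
 * ret_addr does not appear in the dump.  A result equal to n means the
 * dump ends right after the return address.
 */
static long first_arg_index(const uint32_t *stack, size_t n, uint32_t ret_addr)
{
	for (size_t i = 0; i < n; i++)
		if (stack[i] == ret_addr)
			return (long)(i + 1);
	return -1;
}
```

For this dump, the lookup leaves exactly four visible words after the return address: m, the format string, depth * 2 and "". That is why the later arguments cannot be recovered.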

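None of these matches the bad pointer (f8e43706). The faulting
instruction (<80> 38 00, i.e. cmpb $0x0,(%eax) with eax = f8e43706) is
in the inlined strnlen() loop that vsnprintf() runs for a %s argument,
and the only %s arguments in this format are "" (c0330829, which is
fine) and r->name. So the most likely culprit is r->name: probably some
module set the resource name to a string constant stored in the module
itself, and was then unloaded without performing the proper cleanup
(i.e. without releasing the resource). The bad address also lies in
the vmalloc area, where i386 kernels load modules. And depth == 0 means
that the problematic resource most likely did not belong to a PCI
device; maybe it was some legacy resource.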
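Neither NULL check on this path can catch a stale pointer. Below is a user-space sketch of both guards. The vsnprintf() guard is modelled on the %s handling visible in the Code: bytes (cmp $0xfff followed by cmovbe, i.e. any pointer inside the first 4 KB page is replaced, presumably by the "<NULL>" string). The function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* i386 page size */

/* Like vsnprintf()'s %s guard: only "pointers" inside page 0 count as NULL. */
static const char *vsnprintf_str_guard(const char *s)
{
	if ((uintptr_t)s < SKETCH_PAGE_SIZE)
		return "<NULL>";
	return s;	/* anything else is handed to strnlen() and dereferenced */
}

/* Like r_show()'s own guard on r->name. */
static const char *r_show_name_guard(const char *name)
{
	return name ? name : "<BAD>";
}
```

A dangling pointer such as f8e43706 passes both checks unchanged. strnlen() then reads unmapped module memory, which is exactly the fault in the oops.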

You have the list of modules which were loaded at oops time (see
"Modules linked in:" above); please also send the lsmod output from a
normally running system, so that we can find which modules had been
unloaded before the oops and investigate those more closely.
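Comparing the two lists is a plain set difference: any module present on the working system but missing from "Modules linked in:" had been unloaded before the oops. Below is a minimal sketch that works on arrays of module names. It assumes the lsmod output has already been split into names, and it does not parse that output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if name appears in list[0..n). */
static int in_list(const char *name, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(name, list[i]) == 0)
			return 1;
	return 0;
}

/*
 * Store in out[] every module present in working[] but absent from
 * at_oops[]; return how many were found (at most out_max are stored).
 */
static size_t unloaded_modules(const char *const *working, size_t nw,
			       const char *const *at_oops, size_t no,
			       const char **out, size_t out_max)
{
	size_t found = 0;

	assert(out != NULL || out_max == 0);
	for (size_t i = 0; i < nw; i++) {
		if (in_list(working[i], at_oops, no))
			continue;
		if (found < out_max)
			out[found] = working[i];
		found++;
	}
	return found;
}
```

The quadratic scan is fine for a few dozen modules. Sorting both lists first would only matter for much longer lists.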


References:
- Re: oops in 2.6.14-rc3
  - From: Sasa Ostrouska <sasa.ostrouska@volja.net>
- Re: oops in 2.6.14-rc3
  - From: Adam Kropelin <akropel1@rochester.rr.com>
