Re: 2.6.18-rc3-mm2 (+ hotfixes): GPF related to skge on suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Saturday 12 August 2006 14:28, Andrew Morton wrote:
> On Sat, 12 Aug 2006 12:07:42 +0200
> "Rafael J. Wysocki" <[email protected]> wrote:
> 
> > Hi,
> > 
> > On 2.6.18-rc3-mm2 with hotfixes I get things like the appended one on attempts
> > to suspend to disk.  It occurs while devices are being suspended and is fairly
> > reproducible.
> > 
> > Greetings,
> > Rafael
> > 
> > 
> > Suspending device 0000:01:00.0
> > Suspending device 0000:02:02.0
> > Suspending device 0000:02:01.4
> > Suspending device 0000:02:01.3
> > Suspending device 0000:02:01.2
> > Suspending device 0000:02:01.1
> > Suspending device 0000:02:01.0
> > Suspending device 0000:02:00.0
> > skge Ram read data parity error
> > skge Ram write data parity error
> > skge eth0: receive queue parity error
> > skge <NULL>: receive queue parity error

This stuff comes from the interrupt handler which apparently races with
something.

> > skge 0000:02:00.0: PCI error cmd=0x110 status=0x2b0
> > general protection fault: 0000 [1] PREEMPT
> > last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:02.0/subsystem_device
> > CPU 0
> > Modules linked in: ide_cd cdrom usbserial asus_acpi thermal ipv6 processor fan button battery ac af_packet snd_pcm_oss snd_mixer_oss snd_seq
> > snd_seq_device bcm43xx ieee80211softmac ieee80211 ieee80211_crypt pcmcia firmware_class ohci1394 ieee1394 skge yenta_socket rsrc_nonstatic pc
> > mcia_core usbhid ff_memless snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc ehci_hcd ohci_hcd i2c_nfo
> > rce2 i2c_core parport_pc lp parport dm_mod
> > Pid: 4, comm: events/0 Not tainted 2.6.18-rc3-mm2 #17
> > RIP: 0010:[<ffffffff88107287>]  [<ffffffff88107287>] :skge:skge_poll+0x547/0x570
> > RSP: 0018:ffffffff80621e70  EFLAGS: 00010202
> > RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: 0000000000000040
> 
> RAX doesn't look good.

Yup.

> > RDX: ffff81005addf128 RSI: ffffffff80621eec RDI: ffff81005addeb60
> > RBP: ffffffff80621ed0 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000040 R11: 0000000000000000 R12: ffff81005addf0a0
> > R13: 0000000000000000 R14: ffff810057fe9180 R15: 00000000ffffffff
> > FS:  00002b4b98df4b00(0000) GS:ffffffff808c2000(0000) knlGS:00000000558b4d00
> > CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > CR2: 00002adeb0d7d0b0 CR3: 0000000025147000 CR4: 00000000000006e0
> > Process events/0 (pid: 4, threadinfo ffff810037f44000, task ffff810037fef100)
> > Stack:  ffffffff80621eb0 ffffffff80621eec ffff81005addeb60 ffff81005ad61488
> >  ffff81005addf128 000000400000000a 00000001008e6a25 0000000000000000
> >  ffff81005addeb60 0000000000000000 00000001008e6a25 00000000ffffffff
> > Call Trace:
> >  [<ffffffff8040b1ba>] net_rx_action+0xba/0x1f0
> >  [<ffffffff80233640>] __do_softirq+0x70/0xf0
> >  [<ffffffff8020aa7c>] call_softirq+0x1c/0x30
> > DWARF2 unwinder stuck at call_softirq+0x1c/0x30
> > Leftover inexact backtrace:
> >  <IRQ> [<ffffffff8020ca4d>] do_softirq+0x3d/0xb0
> >  [<ffffffff8023349e>] irq_exit+0x4e/0x60
> >  [<ffffffff8020cbf5>] do_IRQ+0x135/0x140
> >  [<ffffffff80427b9e>] rt_run_flush+0x8e/0xd0
> >  [<ffffffff8020a266>] ret_from_intr+0x0/0xf
> >  <EOI> [<ffffffff80233367>] local_bh_enable_ip+0xe7/0x110
> >  [<ffffffff804718b9>] _spin_unlock_bh+0x39/0x40
> >  [<ffffffff80427b9e>] rt_run_flush+0x8e/0xd0
> >  [<ffffffff80427c8b>] rt_cache_flush+0xab/0x100
> >  [<ffffffff8045a1c9>] fib_netdev_event+0xa9/0xc0
> >  [<ffffffff8023c2af>] notifier_call_chain+0x2f/0x50
> >  [<ffffffff8023c4b9>] raw_notifier_call_chain+0x9/0x10
> >  [<ffffffff80409789>] netdev_state_change+0x29/0x40
> >  [<ffffffff80415122>] linkwatch_run_queue+0x162/0x190
> >  [<ffffffff8041517a>] linkwatch_event+0x2a/0x40
> >  [<ffffffff8023fd72>] run_workqueue+0xc2/0x120
> >  [<ffffffff80415150>] linkwatch_event+0x0/0x40
> >  [<ffffffff8023fff1>] worker_thread+0x121/0x160
> >  [<ffffffff80229370>] default_wake_function+0x0/0x10
> >  [<ffffffff8023fed0>] worker_thread+0x0/0x160
> >  [<ffffffff802436f9>] kthread+0xd9/0x110
> >  [<ffffffff8024b1ad>] trace_hardirqs_on+0x11d/0x150
> >  [<ffffffff8020a706>] child_rip+0x8/0x12
> >  [<ffffffff80471e5b>] _spin_unlock_irq+0x2b/0x60
> >  [<ffffffff8020a2c0>] restore_args+0x0/0x30
> >  [<ffffffff80243620>] kthread+0x0/0x110
> >  [<ffffffff8020a6fe>] child_rip+0x0/0x12
> > Code: 44 8b 28 c7 45 d0 00 00 00 00 45 85 ed 0f 89 29 fb ff ff e9
> > RIP  [<ffffffff88107287>] :skge:skge_poll+0x547/0x570
> >  RSP <ffffffff80621e70>
> >  <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> 
> ksymoops says:
> 
> Code;  ffffffff88107287 <_end+7ac9287/7efc2000>
> 00000000 <_EIP>:
> Code;  ffffffff88107287 <_end+7ac9287/7efc2000>   <=====
>    0:   44                        inc    %esp   <=====
> Code;  ffffffff88107288 <_end+7ac9288/7efc2000>
>    1:   8b 28                     mov    (%eax),%ebp
> Code;  ffffffff8810728a <_end+7ac928a/7efc2000>
>    3:   c7 45 d0 00 00 00 00      movl   $0x0,0xffffffd0(%ebp)
> Code;  ffffffff88107291 <_end+7ac9291/7efc2000>
>    a:   45                        inc    %ebp
> Code;  ffffffff88107292 <_end+7ac9292/7efc2000>
>    b:   85 ed                     test   %ebp,%ebp
> Code;  ffffffff88107294 <_end+7ac9294/7efc2000>
>    d:   0f 89 29 fb ff ff         jns    fffffb3c <_EIP+0xfffffb3c>
> Code;  ffffffff8810729a <_end+7ac929a/7efc2000>
>   13:   e9 00 00 00 00            jmp    18 <_EIP+0x18>
> 
> So even if we didn't deref a kfree'd pointer, we're about to.

Hm, but the code should be 64-bit?

> It would be good if you could poke around in gdb, work out exactly which
> statement it's oopsing at, please.

(gdb) l *skge_poll+0x547
0x5287 is in skge_poll (skge.c:2719).
2714                    struct skge_rx_desc *rd = e->desc;
2715                    struct sk_buff *skb;
2716                    u32 control;
2717
2718                    rmb();
2719                    control = rd->control;
2720                    if (control & BMU_OWN)
2721                            break;
2722
2723                    skb = skge_rx_get(skge, e, control, rd->status, rd->csum2);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux