We are getting a recurring Oops on boot up on an ASUS P5ND2-SLI board. It happens only intermittently, but often enough to be a serious problem. The machines boot with root over NFS, but there is a SATA drive installed for some logs. It only happens during boot, never later.

Here is Oops number 1:

*********************
Unable to handle kernel NULL pointer dereferenceACPI: PCI Interrupt Link [APSJ] enabled at IRQ 22
 printing eip:
*pde = 00426001
Oops: 0000 [#1]
SMP
Modules linked in: sata_nv libata scsi_mod e1000 nfs lockd nfs_acl sunrpc
CPU: 0
EIP: 0060:[<e09528e6>] Not tainted VLI
EFLAGS: 00010286 (2.6.14-1.1644_FC4smp)
EIP is at scsi_run_queue+0x10/0xaf [scsi_mod]
eax: 00000000 ebx: dd98007c ecx: dffef880 edx: 00000001
esi: ddc99e00 edi: 00000246 ebp: dd9071fc esp: c042af14
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c042a000 task=dfc41ab0)
Stack: dd9071fc dd98007c ddc99e00 00000246 dd9071fc e0952a7d ddc99e00 00000000
       00000000 dd98007c e0952e88 00000001 e088260b dcd3a570 c15e3380 dcd3a570
       00000000 00000000 00040000 00000024 dd9071fc 00000000 00000000 00000292
Call Trace:
 [<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
 [<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
 [<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
 [<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
 [<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
 [<c02bdf4e>] net_rx_action+0xb7/0x1bb
 [<c01258c2>] __do_softirq+0x72/0xdc
 [<c0105c43>] do_softirq+0x4b/0x4f
 =======================
 [<c0125ec2>] ksoftirqd+0x9c/0xe8
 [<c0125e26>] ksoftirqd+0x0/0xe8
 [<c0133d89>] kthread+0x93/0x97
 [<c0133cf6>] kthread+0x0/0x97
 [<c0101d5d>] kernel_thread_helper+0x5/0xb
Code: c5 8f df 8b 14 24 8b 42 44 e8 37 b6 9c df 89 44 24 04 89 d8 e8 2e b6 ff ff eb b1 55 57 56 53 83 ec 04 89 04 24 8b 80 10 01 00 00 <8b> 38 80 b8 85 01 00 00 00 0f 88 86 00 00 00 8b 47 44 e8 03 b6
<0>Kernel panic - not syncing: Fatal exception in interrupt
ata3: no device found (phy stat 00000000)
scsi2 : sata_nv
 [<c0120358>] panic+0x45/0x1c4
 [<c0104caf>] die+0x17b/0x185
 [<c031ec40>] do_page_fault+0x0/0x700
 [<c031ee49>] do_page_fault+0x209/0x700
 [<c031ec40>] do_page_fault+0x0/0x700
 [<c010457f>] error_code+0x4f/0x54
 [<e09528e6>] scsi_run_queue+0x10/0xaf [scsi_mod]
 [<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
 [<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
 [<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
 [<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
 [<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
 [<c02bdf4e>] net_rx_action+0xb7/0x1bb
 [<c01258c2>] __do_softirq+0x72/0xdc
 [<c0105c43>] do_softirq+0x4b/0x4f
 =======================
 [<c0125ec2>] ksoftirqd+0x9c/0xe8
 [<c0125e26>] ksoftirqd+0x0/0xe8
 [<c0133d89>] kthread+0x93/0x97
 [<c0133cf6>] kthread+0x0/0x97
 [<c0101d5d>] kernel_thread_helper+0x5/0xb

And here is Oops number 2:

*********************
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
*pde = 00426001
Oops: 0000 [#1]
SMP
Modules linked in: sata_nv libata scsi_mod e1000 nfs lockd nfs_acl sunrpc
CPU: 0
EIP: 0060:[<e09528e6>] Not tainted VLI
EFLAGS: 00010286 (2.6.14-1.1644_FC4smp)
EIP is at scsi_run_queue+0x10/0xaf [scsi_mod]
eax: 00000000 ebx: dd9acb1c ecx: dffef880 edx: 00000001
esi: dd50fc80 edi: 00000246 ebp: dd49a3b8 esp: c042af14
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c042a000 task=dfc41ab0)
Stack: dd49a3b8 dd9acb1c dd50fc80 00000246 dd49a3b8 e0952a7d dd50fc80 00000000
       00000000 dd9acb1c e0952e88 00000001 e088260b c04ac408 ded6b380 c1407fe0
       00000000 00000000 00040000 00000024 dd49a3b8 00000000 00000000 00000292
Call Trace:
 [<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
 [<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
 [<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
 [<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
 [<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
 [<c02bdf4e>] net_rx_action+0xb7/0x1bb
 [<c01258c2>] __do_softirq+0x72/0xdc
 [<c0105c43>] do_softirq+0x4b/0x4f
 =======================
 [<c0125ec2>] ksoftirqd+0x9c/0xe8
 [<c0125e26>] ksoftirqd+0x0/0xe8
 [<c0133d89>] kthread+0x93/0x97
 [<c0133cf6>] kthread+0x0/0x97
 [<c0101d5d>] kernel_thread_helper+0x5/0xb
Code: c5 8f df 8b 14 24 8b 42 44 e8 37 b6 9c df 89 44 24 04 89 d8 e8 2e b6 ff ff eb b1 55 57 56 53 83 ec 04 89 04 24 8b 80 10 01 00 00 <8b> 38 80 b8 85 01 00 00 00 0f 88 86 00 00 00 8b 47 44 e8 03 b6
<0>Kernel panic - not syncing: Fatal exception in interrupt
 [<c0120358>] panic+0x45/0x1c4
 [<c0104caf>] die+0x17b/0x185
 [<c031ec40>] do_page_fault+0x0/0x700
 [<c031ee49>] do_page_fault+0x209/0x700
 [<c031ec40>] do_page_fault+0x0/0x700
 [<c010457f>] error_code+0x4f/0x54
 [<e09528e6>] scsi_run_queue+0x10/0xaf [scsi_mod]
 [<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
 [<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
 [<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
 [<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
 [<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
 [<c02bdf4e>] net_rx_action+0xb7/0x1bb
 [<c01258c2>] __do_softirq+0x72/0xdc
 [<c0105c43>] do_softirq+0x4b/0x4f
 =======================
 [<c0125ec2>] ksoftirqd+0x9c/0xe8
 [<c0125e26>] ksoftirqd+0x0/0xe8
 [<c0133d89>] kthread+0x93/0x97
 [<c0133cf6>] kthread+0x0/0x97
 [<c0101d5d>] kernel_thread_helper+0x5/0xb

Both Oopses are at the same spot with scsi_mod and e1000 being the culprits.

Hardware:
ASUS P5ND2-SLI
P4 630+
Double Intel e1000 on motherboard
SATA Disk (only for some logs, not booted from)
XFX NVidia GeForce 6800XT (driver not loaded so not tainted)

Kernel:
2.6.14-1.1644_FC4smp, the standard Fedora Core 4 kernel, with a custom initrd to be able to boot off the network.

It might be connected to sata_nv looking for drives, but I'm not sure.

Has anybody seen this before? Is there a patch? A workaround?

Please CC me as I'm not on the list.
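For anyone who wants to check mechanically that two Oopses really fault at the same spot, here is a minimal sketch in Python. It is not part of the original report: the function name summarize_oops and the OOPS_1 / OOPS_2 constants are made up for illustration, and the constants are abbreviated excerpts of the two traces above, not the full logs.

```python
import re

# Abbreviated excerpts of the two traces in this report (illustrative only).
OOPS_1 = """\
EIP is at scsi_run_queue+0x10/0xaf [scsi_mod]
eax: 00000000 ebx: dd98007c ecx: dffef880 edx: 00000001
Call Trace:
 [<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
 [<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
 [<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
"""

OOPS_2 = """\
EIP is at scsi_run_queue+0x10/0xaf [scsi_mod]
eax: 00000000 ebx: dd9acb1c ecx: dffef880 edx: 00000001
Call Trace:
 [<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
 [<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
 [<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
"""


def summarize_oops(text):
    """Pull the faulting symbol, eax, and call-trace symbols out of one Oops."""
    eip = re.search(r"EIP is at (\S+)", text)
    eax = re.search(r"\beax: ([0-9a-f]{8})", text)
    # Only look at the part between "Call Trace:" and "Code:" (if present),
    # so the "EIP: 0060:[<...>]" header line is not mistaken for a frame.
    after_trace = text.partition("Call Trace:")[2]
    trace_part = after_trace.partition("Code:")[0]
    trace = re.findall(r"\[<[0-9a-f]{8}>\]\s+(\S+)", trace_part)
    return {
        "eip": eip.group(1) if eip else None,
        "eax": eax.group(1) if eax else None,
        "trace": trace,
    }


if __name__ == "__main__":
    first, second = summarize_oops(OOPS_1), summarize_oops(OOPS_2)
    print("same faulting symbol:", first["eip"] == second["eip"])
    print("same call trace:", first["trace"] == second["trace"])
```

This only compares symbol names and offsets. It says nothing about which entries in the trace are real callers and which are stale values left on the stack.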
/Carl-Johan Kjellander