Re: A couple of oops.

On Tue, 2006-05-23 at 17:40 -0700, Andrew Morton wrote:
> Carlos Martín <[email protected]> wrote:
> >
> > Hi,
> > 
> > I've nailed this down to something that happened in 2.6.17-rc4. The
> > system locks up with either a NULL dereference or an unhandled paging
> > request. The stack trace shows this:
> > 
> > paging request            NULL dereference
> > 
> > _raw_spin_trylock+12    _raw_spin_trylock+20
> > __spin_lock+22
> > main_timer_handler+22
> > timer_interrupt+18
> > handle_IRQ_event+41
> > __do_IRQ+156
> > do_IRQ+51
> > default_idle+0
> > _spin_unlock_irq+43
> > thread_return+187
> > generic_unplug_device+0
> > default_idle+45
> > dev_idle+95 (I can't read the func clearly in this handwriting)
> > start_secondary+1129
> > 
> > I'm guessing this is the same problem, which shows up sometimes as one
> > and sometimes as the other. The problem is in the call to
> > write_seqlock(&xtime_lock) from main_timer_handler().
> > 
> > I've not been able to determine what patch has caused this to happen,
> > but it is between 2.6.17-rc3 and -rc4. I'm bisecting, but if anybody has
> > a good candidate, it'd probably be faster than doing a complete bisect.
> 
> hm, so an attempt to access xtime_lock.lock results in a null-pointer deref?

It seems to be the access to xtime_lock.sequence that faults:
main_timer_handler:
        pushq   %r13    #
        movq    %rdi, %r13      # regs, regs
        movq    $xtime_lock+8, %rdi     #,
        pushq   %r12    #
        pushq   %rbp    #
        pushq   %rbx    #
        pushq   %rbp    #
        call    _spin_lock      #
OOPS--> incl    xtime_lock(%rip)        # xtime_lock.sequence
        xorl    %r12d, %r12d    # offset
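
To relate the listing to the source: in this kernel, write_seqlock() takes
the spinlock embedded in the seqlock and then increments the sequence
counter, which matches the call to _spin_lock on xtime_lock+8 followed by
the incl on xtime_lock itself. Below is a minimal user-space sketch of that
ordering, not the kernel code. The names demo_seqlock, demo_write_seqlock
and demo_write_sequnlock are hypothetical, and a C11 atomic_flag stands in
for the kernel spinlock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's seqlock_t. */
struct demo_seqlock {
    unsigned sequence;  /* even: no writer active; odd: write in progress */
    atomic_flag lock;   /* stand-in for the spinlock that serializes writers */
};

/* Writer entry: take the lock first, then bump the sequence counter. */
static void demo_write_seqlock(struct demo_seqlock *sl)
{
    while (atomic_flag_test_and_set_explicit(&sl->lock, memory_order_acquire))
        ;                                   /* spin until the lock is free */
    ++sl->sequence;                         /* the step that corresponds to the incl */
    atomic_thread_fence(memory_order_release);
}

/* Writer exit: bump the counter back to an even value, then drop the lock. */
static void demo_write_sequnlock(struct demo_seqlock *sl)
{
    atomic_thread_fence(memory_order_release);
    ++sl->sequence;
    atomic_flag_clear_explicit(&sl->lock, memory_order_release);
}
```

In this sketch the counter is the first member of the structure, so a fault
on the increment would mean the address of the structure itself is bad, not
just the lock member.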

> 
> x86_64 does novel things, putting xtime_lock into a linker section all of
> its own.  At a guess I'd say that your compiler/assembler/linker toolchain
> got confused and generated the incorrect address for xtime_lock.  But if
> that was the case you'd get oopses 100% of the time - it wouldn't be
> intermittent, as your description seems to imply (although it's quite
> unclear?).

I've seen the NULL dereference once; the rest have been paging request
errors. Most of the time I'm in X, so I don't actually see the output, but
the ones I have seen were like that.

It starts somewhere between rc3 and rc4.

The bisect has now left me with a bunch of SCSI patches which don't seem
to be related to this and which touch code I don't use (except for
usb-storage).

> 
> You could do:
> 
> gdb vmlinux
> (gdb) p &xtime_lock
> (gdb) x/40i main_timer_handler
> 
-- 
Carlos Martín Nieto    |   http://www.cmartin.tk
Hobbyist programmer    |
