On Mon, Jul 23, 2007 at 03:27:12PM -0700, Andrew Morton wrote:
> On Tue, 24 Jul 2007 02:04:46 +0400
> Alexey Dobriyan <[email protected]> wrote:
>
> > On Mon, Jul 23, 2007 at 02:11:37PM -0700, Andrew Morton wrote:
> > > On Tue, 24 Jul 2007 01:01:53 +0400
> > > Alexey Dobriyan <[email protected]> wrote:
> > >
> > > > On Tue, Jul 24, 2007 at 12:40:45AM +0400, Alexey Dobriyan wrote:
> > > > > > I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> > > > > >
> > > > > > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> > > > > >
> > > > > > I haven't worked out where that kmap_atomic() call is coming from yet.
> > > > > > Both traces point up into the page allocator, but I _think_ that's stack
> > > > > > gunk.
> > > > >
> > > > > Ahh, you suspect networking.
> > > > >
> > > > > Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
> > > > > receiving ~20 junk packets per second, one gathering netconsole output
> > > > > and ssh to it, no conntracks and fancy stuff.
> > > > >
> > > > > [reboots with cables physically unplugged]
> > > >
> > > > OK, I ran a gdb recompile and cat(1) on every file in /usr/portage
> > > > (shitload of small files) with both cables unplugged. It all went fine
> > > > for ~5 minutes; after that it crashed in exactly the same way, 10 secs
> > > > after plugging one of them back in.
> > >
> > > It'd be nice to get a clean trace. Are you able to obtain the full
> > > trace with CONFIG_FRAME_POINTER=y?
> >
> > Sorry, no camera shot, finding camera requires waking up M. :)
> >
> > It took longer than usual, but here it is:
> >
> > kmap_atomic
> > get_page_from_freelist
> > __alloc_pages
> > cache_alloc_refill
> > __alloc_pages
> > cache_alloc_refill
> > kmem_cache_alloc
> > dst_alloc
> > ip_route_input
> > ip_rcv
> > netif_receive_skb
> > rtl8139_poll
> > net_rx_action
> > __do_softirq
> > do_softirq
> > irq_exit
> > do_IRQ
> > common_interrupt
> > handle_mm_fault
> > do_page_fault
> > error_code
> >
> > A much more loaded x86_64 box nearby, also running 2.6.23-rc1 with
> > debugging turned on and using the atl1 driver, doesn't experience any
> > crashes.
> >
> > And I found a 2.6.22-b91cba52e9b7b3f1c0037908a192d93a869ca9e5-x entry at
> > the top of my grub config, which means
> > b91cba52e9b7b3f1c0037908a192d93a869ca9e5 _without_ any debugging was OK.
>
> I worked out that the crash I saw was in
>
> BUG_ON(!pte_none(*(kmap_pte-idx)));
>
> in the read of kmap_pte[idx]. Which would be weird as the caller is using
> a literal KM_USER0.
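
To make the "that would be weird" point concrete, here is a rough user-space
sketch of how an atomic-kmap slot is picked. It is a model, not kernel code:
every model_* name and every numeric value below is made up for illustration,
and it only assumes the slot index is computed as type + KM_TYPE_NR * cpu and
then applied as kmap_pte - idx, as in the BUG_ON quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for illustration; the real ones live in kernel headers. */
enum {
	MODEL_KM_USER0   = 3,	/* assumed position of KM_USER0 in the type enum */
	MODEL_KM_TYPE_NR = 16,	/* assumed number of kmap types per CPU */
	MODEL_NR_CPUS    = 4	/* assumed CPU count */
};

#define MODEL_SLOTS (MODEL_KM_TYPE_NR * MODEL_NR_CPUS)

/* Stand-in for the PTEs that back the atomic kmap slots. */
static unsigned long model_pte[MODEL_SLOTS];

/* Like kmap_pte, this points at the highest entry; slots are found by subtracting idx. */
static unsigned long *const model_kmap_pte = &model_pte[MODEL_SLOTS - 1];

/* Slot index for a given kmap type on a given CPU. */
static int model_kmap_idx(int type, int cpu)
{
	return type + MODEL_KM_TYPE_NR * cpu;
}

/* True if the index stays inside the modelled PTE array. */
static int model_idx_in_bounds(int idx)
{
	return idx >= 0 && idx < MODEL_SLOTS;
}

/* Address of the PTE slot that "kmap_pte - idx" would read. */
static unsigned long *model_slot(int idx)
{
	return model_kmap_pte - idx;
}
```

With a constant type and a valid CPU number the index cannot leave the array
in this model, which is why a fault on the read itself, rather than the
BUG_ON firing, would be surprising.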
>
> So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
> unable to reproduce it now).
>
> If that BUG_ON _is_ triggering then it might indicate that someone is doing
> a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.
>
> If they're holding an atomic kmap then they'll be running in_atomic so it
> is unlikely that they accidentally added __GFP_WAIT because lots of people
> would be getting lots of might_sleep() warnings.
>
> Hence that first VM_BUG_ON in prep_zero_page() _should_ be triggering.
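
The chain of reasoning above can be sketched as a small user-space model. It
is only an illustration under stated assumptions: the flag bit values and all
model_* names are hypothetical, the single busy flag stands in for one CPU's
KM_USER0 slot, and the check in model_zero_check_fires() assumes the first
VM_BUG_ON in prep_zero_page() has the shape "__GFP_HIGHMEM set without
__GFP_WAIT".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real __GFP_* flags are defined in the kernel. */
#define MODEL_GFP_WAIT    0x1u
#define MODEL_GFP_HIGHMEM 0x2u
#define MODEL_GFP_ZERO    0x4u

/* One flag standing in for this CPU's KM_USER0 slot. */
static bool model_slot_busy;

/* Take the slot; returns false where the real code would hit the BUG_ON. */
static bool model_kmap_atomic(void)
{
	if (model_slot_busy)
		return false;
	model_slot_busy = true;
	return true;
}

/* Release the slot. */
static void model_kunmap_atomic(void)
{
	model_slot_busy = false;
}

/* Assumed shape of the first check: highmem requested by a caller that cannot sleep. */
static bool model_zero_check_fires(unsigned int flags)
{
	return (flags & (MODEL_GFP_WAIT | MODEL_GFP_HIGHMEM)) == MODEL_GFP_HIGHMEM;
}

/* True if zeroing a highmem page would collide with a slot that is already held. */
static bool model_zeroing_collides(unsigned int flags)
{
	if (!(flags & MODEL_GFP_ZERO) || !(flags & MODEL_GFP_HIGHMEM))
		return false;		/* no highmem zeroing, slot not needed */
	if (!model_kmap_atomic())
		return true;		/* slot already in use by the caller */
	model_kunmap_atomic();
	return false;
}
```

In this model a caller that already holds the slot and asks for
MODEL_GFP_HIGHMEM | MODEL_GFP_ZERO without MODEL_GFP_WAIT trips the flag
check before the slot collision is ever reached, which matches the
expectation that the earlier check should be the one that fires.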
>
> Do you have CONFIG_DEBUG_VM enabled?

Yes.

> Also, it might be useful to apply -mm's kmap_atomic-debugging.patch. It
> will detect lots of abuse.

I hit it only once with this patch applied, but there were no additional
warnings.