Re: [PATCH] netpoll can lock up on low memory.

On Fri, 2005-08-05 at 18:53 -0700, Matt Mackall wrote:
> On Fri, Aug 05, 2005 at 08:23:55PM -0400, Steven Rostedt wrote:
[...]
> > If you need to really get the data out, then the design should be
> > changed.  Have some return value showing the failure, check for
> > oops_in_progress or whatever, and try again after turning interrupts
> > back on, and getting to a point where the system can free up memory
> > (write to swap, etc).  Just a busy loop without ever getting a skb is
> > just bad.
> 
> Why, pray tell, do you think there will be a second chance after
> re-enabling interrupts? How does this work when we're panicking or
> oopsing where we most care? How does this work when the netpoll client
> is the kernel debugger and the machine is completely stopped because
> we're tracing it?

What I meant was: check for an oops, and only in that case keep looping
instead of breaking out. Otherwise, return and let the system try to
reclaim memory, since this is exactly where it locks up: alloc_skb() is
called with GFP_ATOMIC and fails.
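To make the proposal concrete, here is a minimal userspace sketch, not netpoll's actual code. It assumes a simulated fixed-size buffer pool. The names try_alloc(), poll_device(), get_buffer(), MAX_TRIES and struct fake_skb are all hypothetical, and the oops_in_progress parameter only stands in for the kernel flag of that name. The loop keeps spinning while an oops is in progress. Otherwise it gives up after a bounded number of polls and returns NULL, so the caller can report the failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not kernel symbols. */
#define POOL_SIZE 4   /* size of the simulated preallocated buffer pool */
#define MAX_TRIES 10  /* hypothetical retry bound used outside an oops */

struct fake_skb { int id; };

static struct fake_skb pool[POOL_SIZE];
static int pool_free;          /* buffers currently available in the pool */
static int polls_until_refill; /* polls left until a TX completion frees one */

/* Simulated atomic allocation: never sleeps, may fail. */
static struct fake_skb *try_alloc(void)
{
	assert(pool_free >= 0 && pool_free <= POOL_SIZE);
	if (pool_free == 0)
		return NULL;
	pool_free--;
	return &pool[pool_free];
}

/* Simulated device poll: eventually a completed transmit frees a buffer. */
static void poll_device(void)
{
	if (polls_until_refill > 0 && --polls_until_refill == 0 &&
	    pool_free < POOL_SIZE)
		pool_free++;
}

/*
 * Spin only while an oops is in progress; otherwise give up after
 * MAX_TRIES allocation attempts and return NULL so the caller can report
 * the failure and let the system reclaim memory before trying again.
 */
static struct fake_skb *get_buffer(bool oops_in_progress)
{
	for (int tries = 1;; tries++) {
		struct fake_skb *skb = try_alloc();

		if (skb)
			return skb;
		if (!oops_in_progress && tries >= MAX_TRIES)
			return NULL;
		poll_device();
	}
}
```

In oops mode the loop is still unbounded. If nothing ever frees a buffer, it spins forever, which is Matt's point about there being no second chance. The sketch only changes what happens when no oops is in progress.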

> 
> As for busy loops, let me direct you to the "poll" part of the name.
> It is in fact the whole point.

In the kernel, I would expect a poll to probe for an event and let the
system continue if the event hasn't arrived, not to block all activity
until it does.

> 
> > > > So even a long timeout would not do?  So you don't even get a message to
> > > > the console?
> > > 
> > > In general, there's no way to measure time here. And if we're
> > > using netconsole, what makes you think there's any other console?
> > 
> > Why assume that there isn't another console?  The screen may be used
> > along with netconsole; you just lose whatever has scrolled off.
> 
> Yes, there may be another console, but we should by no means depend on
> that being the case. We should in fact assume it's not.
> 
> > > > > > Also, as Andi told me, the printk here would probably not show up
> > > > > > anyway if this happens with netconsole.
> > > > > 
> > > > > That's fine. But in fact, it does show up occasionally - I've seen
> > > > > it.
> > > > 
> > > > Then maybe what Andi told me is not true ;-)
> > > > 
> > > > Oh, and did your machine crash when you saw it?  Have you seen it with
> > > > the e1000 driver?
> > > 
> > > No and no. Most of my own testing is done with tg3.
> > > 
> > 
> > If you saw the message and the system didn't crash, then that's proof
> > that if the driver is not working properly, you would have locked up the
> > system, and the system was _not_ in a state where it _had_ to get the
> > message out.
> 
> Let me be more precise. I've seen it in the middle of an oops dump,
> where it complained, then made further progress, and then died. In
> other words, the code works. And I've since upped the pool size.

OK, this is more clear than what you said previously.  When I asked if
the system crashed, I should have asked if the system was crashing.  I
thought that you meant that you saw this in normal activity with no
oops.

So, if anything, this discussion has pointed out that the e1000 has a
problem in its netpoll support.  I wrote an earlier patch, but since I
don't own an e1000, someone will need to test it, or at least check
whether it looks OK.

-- Steve
