Re: [PATCH] netpoll can lock up on low memory.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2005-08-05 at 14:28 -0700, Matt Mackall wrote:
> 
> Netpoll generally must assume it won't get a second chance, as it's
> being called by things like oops() and panic() and used by things like
> kgdb. If netpoll fails, the box is dead anyway.
> 

But it is also being called by every printk in the kernel. What happens
when the printk that causes this lock up is not a panic but just some
info print.  One would use netconsole when they turn on more verbose
printing, to keep the output fast, right?  So if the system gets a
little memory tight, but not to the point of failing, this will cause a
lock up and no one would know why. 

If you need to really get the data out, then the design should be
changed.  Have some return value showing the failure, check for
oops_in_progress or whatever, and try again after turning interrupts
back on, and getting to a point where the system can free up memory
(write to swap, etc).  Just a busy loop without ever getting a skb is
just bad.

> > > The netpoll philosophy is to assume that its traffic is an absolute
> > > priority - it is better to potentially hang trying to deliver a panic
> > > message than to give up and crash silently.
> > 
> > So even a long timeout would not do?  So you don't even get a message to
> > the console?
> 
> In general, there's no way to measure time here. And if we're
> using netconsole, what makes you think there's any other console?

Why assume that there isn't another console?  The screen may be used
with netconsole, you just lose whatever has been scrolled too far.

> 
> > > > Also, as Andi told me, the printk here would probably not show up
> > > > anyway if this happens with netconsole.
> > > 
> > > That's fine. But in fact, it does show up occassionally - I've seen
> > > it.
> > 
> > Then maybe what Andi told me is not true ;-)
> > 
> > Oh, and did your machine crash when you saw it?  Have you seen it with
> > the e1000 driver?
> 
> No and no. Most of my own testing is done with tg3.
> 

If you saw the message and the system didn't crash, then that's proof
that if the driver is not working properly, you would have lock up the
system, and the system was _not_ in a state that it _had_ to get the
message out.

-- Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux