Re: additional oom-killer tuneable worth submitting?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07/12/06, Chris Friesen <[email protected]> wrote:
Jesper Juhl wrote:
>> Jesper Juhl wrote:

>> > What happens in the case where the OOM killer really, really needs to
>> > kill one or more processes since there is not a single drop of memory
>> > available, but all processes are below their configured thresholds?

> I realize that if this case happens the system is misconfigured as far
> as oomthresh goes, but if this is a knob that we put in the mainline
> kernel then I believe there should be some sort of emergency handling
> code that takes this situation into account.  Perhaps throw some very
> nasty looking log messages and then fall back to the classic OOM
> killer behaviour..?

Yeah, I can see that the reboot might be a bit drastic for mainline.  I
think the fallback to classic behaviour might work okay.

Anyway, the chances of hitting that case are likely pretty slim.  The
way we've been using this is to only set the threshold for fairly
important long-lived daemons.  Much of the "standard" stuff (shell, cat,
cp, mv, etc.) is left unprotected.

Sure, that's sensible, to only protect the important stuff.
But even if the chances of hitting this are slim, we still need a way
out. For most people anything is better than a hung box.

Some examples;

For a desktop (where people may be experimenting with the feature) -
seeing your firefox process evaporate due to the OOM killer and then
finding a message explaining what happened in dmesg is a lot less
frustrating than a hang or sudden reboot.

For a server - If you mis-configure the new feature you may be in for
a long drive to reboot a box whereas falling back to the classic OOM
killer (+ nasty messages in dmesg) will likely save you the trip and
clue you in as to what you mis-configured.

For an embedded box - triggering a reboot would probably be better
than both a hang or classic OOM kill in many cases (better to have the
device reboot and come back working than to hang or start
malfunctioning due to a missing process).

So maybe what's needed is an additional knob for people to tweak - one
that selects what should happen in this rare case: 1) fallback to
classic OOM (default), 2) reboot, 3) hang.   In all cases messages
should be logged explaining what happened.
Or is that overkill?  If so I'd personally prefer just falling back to
classic OOM kill in this case.

A way out for the "OOM but all processes below threshold" case +
perhaps coupled with oomthresh applying to process groups instead of
just processes and I personally start to like this feature.

Let's see some code...

--
Jesper Juhl <[email protected]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux