Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

* Andi Kleen <[email protected]> wrote:

> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > no. (that's why i added the '(or a kill -9)' qualification above - if 
> > NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
> > should not have an interrupting effect.)
> 
> NFS is already interruptible with umount -f (I use that all the 
> time...), but softlockup won't know that and will throw the warning 
> anyway.

umount -f is a spectacularly unintelligent solution (it requires the 
user to know precisely which path to umount, etc.); TASK_KILLABLE is a 
lot more useful.
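
The difference between the three sleep states can be modeled in a few lines of C. This is a simplified userspace sketch, not kernel code: the enum and `signal_wakes()` are hypothetical names, and it assumes the caller has already decided whether a signal is fatal (for example SIGKILL from "kill -9"), which is the only kind of signal that wakes a TASK_KILLABLE sleeper.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's sleep states (hypothetical names). */
enum sleep_state {
    SLEEP_INTERRUPTIBLE,   /* models TASK_INTERRUPTIBLE */
    SLEEP_UNINTERRUPTIBLE, /* models TASK_UNINTERRUPTIBLE */
    SLEEP_KILLABLE         /* models TASK_KILLABLE */
};

/* Would a pending signal wake a sleeper in `state`?  `fatal` says whether
 * the signal is going to kill the task anyway (e.g. SIGKILL, "kill -9"). */
bool signal_wakes(enum sleep_state state, bool fatal)
{
    switch (state) {
    case SLEEP_INTERRUPTIBLE:
        return true;   /* any signal interrupts the sleep */
    case SLEEP_KILLABLE:
        return fatal;  /* only a signal that will kill the task */
    case SLEEP_UNINTERRUPTIBLE:
    default:
        return false;  /* no signal; only the awaited event wakes it */
    }
}
```

The killable state is the middle ground: a stuck wait can still be broken with a kill -9 aimed at the task itself, without the user having to know which mount to umount -f.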

> > your syslet snide comment aside (which is quite incomprehensible - a
> 
> For the record, I have no problem with syslets in principle; I just 
> consider them roughly equivalent in end result to an explicit 
> retry-based AIO implementation.

which suggests you have not really understood syslets. Syslets have no 
"retry" component; they just process straight through the workflow. 
Retry-based AIO has a retry component which, as its name already 
suggests, retries operations instead of processing through the workload 
intelligently. Depending on how "deep" the context of an operation is, 
the retries might or might not make a noticeable difference in 
performance, but it is certainly an inferior approach.
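
To make the retry-versus-straight-through distinction concrete, here is a toy cost model in C. It is an assumption-laden sketch, not a measurement of any real AIO or syslet code: it assumes an operation has `depth` stages, that each stage blocks exactly once, and that a retry restarts the operation from its first stage. `retry_cost()` and `straight_through_cost()` are hypothetical names.

```c
/* Toy model: count how many stage executions an operation of `depth`
 * stages needs when every stage blocks exactly once. */

/* Retry-based: each time a stage blocks, the whole operation is abandoned
 * and later restarted from stage 0, re-running stages that already ran. */
unsigned retry_cost(unsigned depth)
{
    unsigned executed = 0;
    /* Pass k runs stages 0..k and blocks on stage k. */
    for (unsigned blocked_at = 0; blocked_at < depth; blocked_at++)
        executed += blocked_at + 1;
    /* Final pass: all data is available, every stage runs to completion. */
    return executed + depth;
}

/* Straight-through: the blocked context simply sleeps and then continues
 * where it stopped, so each stage is executed once. */
unsigned straight_through_cost(unsigned depth)
{
    unsigned executed = 0;
    for (unsigned s = 0; s < depth; s++)
        executed++;   /* may sleep here, then carries on */
    return executed;
}
```

Under these assumptions the retry model's work grows quadratically with depth (a 3-stage operation costs 9 stage runs versus 3), while for a 1-stage operation the gap is only 2 versus 1, which matches the "depending on how deep" caveat.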

> > retry-based asynchronous IO model is clearly inferior even if it were 
> > implemented everywhere), i do think that most if not all of these 
> > supposedly "difficult to fix" codepaths are just on the backburner 
> > for lack of a clear blame vector.
> 
> Hmm. -ENOPARSE. Can you please clarify?

which bit was unclear to you? i've explained the retry bit above; let me 
know if anything else is unclear.

> > "audit thousands of callsites in 8 million lines of code first" is a 
> > nice euphemism for hiding from the blame forever. We had 10 years 
> > for it
> 
> OK, so your approach then is "let's warn about it and hope it will go 
> away"

s/hope//, but yes. Surprisingly, this works quite well :-) [as long as 
the warnings are not excessively bogus, of course]

and note that this is just a happy side effect: the primary motivation 
is to get warnings about tasks that stay uninterruptible forever, which 
is quite a common kernel bug pattern.
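
A minimal userspace sketch of such a detector, assuming a periodic scan that compares each task's context-switch count with the value seen at the previous scan (similar in spirit to the hung-task check that was eventually merged; the struct, field and function names here are hypothetical). A task in uninterruptible sleep whose count has not moved for a whole period is reported, and a warning budget caps how many reports are printed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task record; not the kernel's task_struct. */
struct task {
    bool uninterruptible;            /* currently in TASK_UNINTERRUPTIBLE */
    unsigned long switch_count;      /* total context switches so far */
    unsigned long last_switch_count; /* value seen at the previous scan */
};

/* Called once per timeout period.  A task that is in uninterruptible sleep
 * and has not been scheduled since the previous scan has been stuck for at
 * least one full period.  Returns how many tasks were reported this scan;
 * *warnings_left caps the total number of reports across scans. */
size_t scan_for_hung_tasks(struct task *tasks, size_t n, int *warnings_left)
{
    size_t reported = 0;

    for (size_t i = 0; i < n; i++) {
        struct task *t = &tasks[i];

        if (t->switch_count != t->last_switch_count) {
            /* It ran since the last scan: remember the count, not hung. */
            t->last_switch_count = t->switch_count;
            continue;
        }
        if (!t->uninterruptible || *warnings_left <= 0)
            continue;

        (*warnings_left)--;
        reported++;   /* a real detector would print a backtrace here */
    }
    return reported;
}
```

Keying on "has not been scheduled for a whole period" rather than on a single long sleep avoids flagging tasks that sleep uninterruptibly often but briefly, and the warning budget keeps a genuinely stuck system from flooding the log.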

> Anyway, I think I could live with a one-line warning (if it's 
> seriously rate limited etc.) and a sysctl to enable the backtraces, 
> off by default. Or, if you prefer, always record the backtrace in a 
> buffer and make it available somewhere in /proc, /sys or /debug. 
> Would that work for you?

you are over-designing it way too much: a backtrace is obviously very 
helpful and it must be printed by default. There's already enough 
configurability in it that you can turn it off if you want. 
(And you said SLES already has softlockup turned off, so it shouldn't 
affect you anyway.)

	Ingo