At 05:34 PM 1/7/2006 +1100, Peter Williams wrote:
Mike Galbraith wrote:
I'm trying to think of ways to quell the nasty side of sleep_avg without
destroying the good. One method I've tinkered with in the past with
encouraging results is to compute a weighted slice_avg, which is a
measure of how long it takes you to use your slice, and scale it to match
MAX_SLEEPAVG for easy comparison. A possible use thereof: In order to
be classified interactive, you need the sleep_avg, but that's not
enough... you also have to have a record of sharing the cpu. When your
slice_avg degrades enough as you burn cpu, you no longer get to loop in
the active queue. Being relegated to the expired array though will
improve your slice_avg and let you regain your status. Your priority
remains, so you can still preempt, but you become mortal and have to
share. When there is a large disparity between sleep_avg and slice_avg,
it can be used as a general purpose throttle to trigger
TASK_NONINTERACTIVE flagging in schedule() as negative feedback for the
ill behaved. Thoughts?
Sounds like the kind of thing that's required. I think the deferred shift
from active to expired is safe as long as CPU hogs can't exploit it and
your scheme sounds like it might provide that assurance. One problem this
solution will experience is that when the system gets heavily loaded every
task will have small CPU usage rates (even the CPU hogs) and this makes it
harder to detect the CPU hogs.
True. A gaggle of more or less equally well (or not) behaving tasks will
have their 'hogginess' diluted. I'll have to think more about scaling
with nr_running or maybe starting the clock at first tick of a new slice...
that should still catch most of the guys who are burning hard without being
preempted, or only sleeping for short intervals only to keep coming right
back to beat up poor cc1. I think the real problem children should stick
out enough for a proof of concept even without additional complexity.
One slight variation of your scheme would be to measure the average
length of the CPU runs that the task does (i.e. how long it runs without
voluntarily relinquishing the CPU) and not allowing them to defer the
shift to the expired array if this average run length is greater than
some specified value. The length of this average for each task shouldn't
change with system load. (This is more or less saying that it's ok for a
task to stay on the active array provided it's unlikely to delay the
switch between the active and expired arrays for very long.)
Average burn time would indeed probably be a better metric, but that would
require doing bookkeeping is the fast path. I'd like to stick to tick time
or even better, slice renewal time if possible to keep it down on the 'dead
simple and dirt cheap' shelf. After all, this kind of thing is supposed to
accomplish absolutely nothing meaningful the vast majority of the time :)
Thanks for the feedback,
-Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]