Darren Hart wrote:
My last mail specifically addresses preempt-rt, but I'd like to know people's
thoughts regarding this issue in the mainline kernel. Please see my previous
post "realtime-preempt scheduling - rt_overload behavior" for a testcase that
produces unpredictable scheduling results.
Part of the issue here is to define what we consider "correct behavior" for
SCHED_FIFO realtime tasks. Do we (A) need to strive for "strict realtime
priority scheduling", where the NR_CPUS highest-priority runnable SCHED_FIFO
tasks are _always_ running? Or do we (B) take a best-effort approach with
an upper limit on RT priority imbalances, where an imbalance may occur (say,
at wakeup or exit) but will be remedied within one tick? The smpnice patches
improve load balancing, but don't provide (A).
More details in the previous mail...
I'm currently researching some ideas to improve smpnice that may help in
this situation. The basic idea is that as well as trying to equally
distribute the weighted load among the groups/queues we should also try
to achieve equal "average load per task" for each group/queue. (As well
as helping with problems such as yours, this will help to restore the
"equal distribution of nr_running" amongst groups/queues aim that is
implicit without smpnice due to the fact that load is just a smoothed
version of nr_running.)
In find_busiest_group(), I think that load balancing will tend towards this
result in the case where *imbalance is greater than busiest_load_per_task,
and also in the case where *imbalance is less than busiest_load_per_task AND
busiest_load_per_task is less than this_load_per_task. However, in the
case where *imbalance is less than busiest_load_per_task AND
busiest_load_per_task is greater than this_load_per_task, this will not
hold: the amount of load moved from "busiest" to "this" will be less than
or equal to busiest_load_per_task, and moving it will actually increase
busiest's "average load per task". So, although it will achieve the aim of
equally distributing the weighted load, it won't help the second aim of
achieving equal "average load per task" values for groups/queues.
The obvious way to fix this problem is to alter the code so that more
than busiest_load_per_task is moved from "busiest" to "this" in these
cases while at the same time ensuring that the imbalance between their
loads doesn't get any bigger. I'm working on a patch along these lines.
Changes to find_idlest_group() and try_to_wake_up() taking into account
the "average load per task" on the candidate queues/groups as well as
their weighted loads may also help and I'll be looking at them as well.
It's not immediately obvious to me how this can be done, so any ideas
would be welcome. It will likely involve taking the load weight of the
waking task into account as well.
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce