Steven Rostedt wrote:
On Wed, 2006-02-01 at 14:36 +1100, Peter Williams wrote:
However, a newly woken task that preempts the current task isn't the
only way that needs_resched() can become true just before load balancing
is started. E.g. scheduler_tick() calls set_tsk_need_resched(p) when a
task finishes a time slice and this patch would cause rebalance_tick()
to be aborted after a lot of work has been done in this case.
No real work is lost. This is a loop that individually pulls tasks. So
the bail only stops the work of looking for more tasks to pull and we
don't lose the tasks that have already been pulled.
Once we've gone to the trouble of deciding which tasks to move and how
many (and the estimate should be very conservative), and locked the source
and destination runqueues, it is a very good idea to follow up with our
threat of actually moving the tasks rather than bail out early.
It is quite likely (though perhaps less so in this braindead benchmark)
that a subsequent load balance operation will need to move the remaining
tasks anyway, so decreasing the efficiency of this routine is going to cause
more damage to your RT task even if you "fix" it to not actually show up as
a single latency blip.
In summary, I think that the bail out is badly placed and needs some way
of knowing if the reason need_resched() has become true is because of
preemption of a newly woken task and not some other reason.
I need that bail in the loop, so it can stop if needed. Like I said, it
can be a task that is pulled to cause the bail. Also, having the run
queue locks and interrupts off for over a msec is really a bad idea.
I don't think we need to change it in the mainline kernel just yet. At least
not something that will bail after moving just a single task every time in
the worst case.
Peter
PS I've added Nick Piggin to the CC list as he is interested in load
balancing issues.
Thanks, and thanks for the comments too. I'm up for all suggestions and
ideas. I just feel it is important that we don't have unbounded
latencies of spin locks and interrupts off.
That's a long way off (if ever) if you want to run crazy code that simply
increases some resource loading until something breaks. Take an rwsem for
writing and then queue up thousands of readers on the lock. Release the
writer. (this will be very similar to run queue balancing, in effect).
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]