Steven Rostedt wrote:
On Thu, 2006-02-02 at 12:26 +1100, Peter Williams wrote:
Actually, one of the tasks that was moved might need to resched right
away, since it preempts the current task that is doing the moving.
Good point but I'd say that this was an instance when you didn't want to
bail out of the load balance. E.g. during idle balancing the very first
task moved would trigger it regardless of its priority. Also, if the
task was of sufficiently high priority for it to warrant bailing out of
the load balance why wasn't the current task (i.e. why didn't it preempt
on its current CPU).
Because the task running on the current CPU is higher in priority. That
doesn't mean that the next one down shouldn't get scheduled on another
CPU if it is a higher priority than the currently running one.
Yes, but I don't think that it warrants interrupting the load balancing.
Of
course one needs to be careful not to cause too much cache blasting by
popping RT tasks all over CPUS.
However, a newly woken task that preempts the current task isn't the
only way that needs_resched() can become true just before load balancing
is started. E.g. scheduler_tick() calls set_tsk_need_resched(p) when a
task finishes a time slice and this patch would cause rebalance_tick()
to be aborted after a lot of work has been done in this case.
No real work is lost. This is a loop that individually pulls tasks. So
the bail only stops the work of looking for more tasks to pull and we
don't lose the tasks that have already been pulled.
I disagree. A bunch of work is done to determine which CPU to pull from
and how many tasks to pull and then it will bail out before any of them
are moved (and for no good reason).
Yeah, that was my mistake. There is work lost. So nuke that argument of
mine :)
In summary, I think that the bail out is badly placed and needs some way
of knowing if the reason need_resched() has become true is because of
preemption of a newly woken task and not some other reason.
I need that bail in the loop, so it can stop if needed. Like I said, it
can be a task that is pulled to cause the bail. Also, having the run
queue locks and interrupts off for over a msec is really a bad idea.
Clearly, the way to handle this is to impose some limit on the number of
tasks to be moved or split large moves into a number of smaller moves
(releasing and reacquiring locks in between). This could be done in the
bits of code that set things up before move_tasks() is called.
I think that's something like what Ingo wants to do. Or something other
than my first brain dead patch.
Peter
PS I've added Nick Piggin to the CC list as he is interested in load
balancing issues.
Thanks, and thanks for the comments too. I'm up for all suggestions and
ideas. I just feel it is important that we don't have unbounded
latencies of spin locks and interrupts off.
Well, you seem to have succeeded in starting a discussion :-)
:)
-- Steve
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]