On Wed, 2006-02-01 at 14:36 +1100, Peter Williams wrote:
> Steven Rostedt wrote:
> > I found this in the -rt kernel. While running "hackbench 20" I hit
> > latencies of over 1.5 ms. That is huge! This latency was created by
> > the move_tasks function in sched.c to rebalance the queues over CPUS.
> >
> > There currently isn't any check in this function to see if it should
> > stop, thus a large number of tasks can drive the latency high.
> >
> > With the below patch, (tested on -rt with latency tracing), the latency
> > caused by hackbench disappeared below my notice threshold (100 usecs).
> >
> > I'm not convinced that this bail out is in the right location, but it
> > worked where it is. Comments are welcome.
> >
>
> I presume that the intention here is to allow a newly woken task that
> preempts the current task to stop the load balancing?
>
> As I see it (and I may be wrong), for this to happen, the task must have
> woken before the run queue locks were taken (otherwise it wouldn't have
> got as far as activation) i.e. before move_tasks() is called and
> therefore you may as well just do this check at the start of move_tasks().
Actually, one of the tasks that was moved might need to resched right
away, since it preempts the current task that is doing the moving.
>
> However, a newly woken task that preempts the current task isn't the
> only way that needs_resched() can become true just before load balancing
> is started. E.g. scheduler_tick() calls set_tsk_need_resched(p) when a
> task finishes a time slice and this patch would cause rebalance_tick()
> to be aborted after a lot of work has been done in this case.
No real work is lost. This is a loop that individually pulls tasks. So
the bail only stops the work of looking for more tasks to pull and we
don't lose the tasks that have already been pulled.
>
> In summary, I think that the bail out is badly placed and needs some way
> of knowing if the reason need_resched() has become true is because of
> preemption of a newly woken task and not some other reason.
I need that bail in the loop, so it can stop if needed. Like I said, it
can be a task that is pulled to cause the bail. Also, having the run
queue locks and interrupts off for over a msec is really a bad idea.
>
> Peter
> PS I've added Nick Piggin to the CC list as he is interested in load
> balancing issues.
Thanks, and thanks for the comments too. I'm up for all suggestions and
ideas. I just feel it is important that we don't have unbounded
latencies of spin locks and interrupts off.
-- Steve
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]