Re: [PATCH] Avoid moving tasks when a schedule can be made.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2006-02-01 at 14:36 +1100, Peter Williams wrote:
> Steven Rostedt wrote:
> > I found this in the -rt kernel.  While running "hackbench 20" I hit
> > latencies of over 1.5 ms.  That is huge!  This latency was created by
> > the move_tasks function in sched.c to rebalance the queues over CPUS.  
> > 
> > There currently isn't any check in this function to see if it should
> > stop, thus a large number of tasks can drive the latency high.
> > 
> > With the below patch, (tested on -rt with latency tracing), the latency
> > caused by hackbench disappeared below my notice threshold (100 usecs).
> > 
> > I'm not convinced that this bail out is in the right location, but it
> > worked where it is.  Comments are welcome.
> > 

> 
> I presume that the intention here is to allow a newly woken task that 
> preempts the current task to stop the load balancing?
> 
> As I see it (and I may be wrong), for this to happen, the task must have 
> woken before the run queue locks were taken (otherwise it wouldn't have 
> got as far as activation) i.e. before move_tasks() is called and 
> therefore you may as well just do this check at the start of move_tasks().

Actually, one of the tasks that was moved might need to resched right
away, since it preempts the current task that is doing the moving.

> 
> However, a newly woken task that preempts the current task isn't the 
> only way that needs_resched() can become true just before load balancing 
> is started.  E.g. scheduler_tick() calls set_tsk_need_resched(p) when a 
> task finishes a time slice and this patch would cause rebalance_tick() 
> to be aborted after a lot of work has been done in this case.

No real work is lost.  This is a loop that individually pulls tasks.  So
the bail only stops the work of looking for more tasks to pull and we
don't lose the tasks that have already been pulled.

> 
> In summary, I think that the bail out is badly placed and needs some way 
> of knowing if the reason need_resched() has become true is because of 
> preemption of a newly woken task and not some other reason.

I need that bail in the loop, so it can stop if needed. Like I said, it
can be a task that is pulled to cause the bail. Also, having the run
queue locks and interrupts off for over a msec is really a bad idea.

> 
> Peter
> PS I've added Nick Piggin to the CC list as he is interested in load 
> balancing issues.

Thanks, and thanks for the comments too.  I'm up for all suggestions and
ideas.  I just feel it is important that we don't have unbounded
latencies of spin locks and interrupts off.

-- Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux