Re: [PATCH] sched: prevent high load weight tasks suppressing balancing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This breaks HT and MC optimizations.. Consider a DP system with each
physical processor having two HT logical threads.. if there are two 
runnable processes running on package-0, with this patch scheduler 
will never move one of those processes to package-1..

thanks,
suresh

On Mon, Mar 27, 2006 at 05:33:30PM +1100, Peter Williams wrote:
> Problem:
> 
> On systems with more than 2 CPUs it is possible for a single task with a 
> high smpnice load weight to suppress load balancing on other CPUs (to 
> the one that it's running on) if it is the only runnable task on its 
> CPU. E.g. consider a 4-way system (simple SMP system with no HT and 
> cores) scenario where a high priority task (nice-20) is running on P0 
> and two normal priority tasks running on P1. load balance with smp nice 
> code will never be able to detect an imbalance and hence will never move 
> one of the normal priority tasks on P1 to idle cpus P2 or P3 as P0 will 
> always be identified as the busiest CPU but it has no tasks that can be 
> moved.
> 
> Solution:
> 
> Make sure that only CPUs with tasks that can be moved get selected as 
> the busiest queue.  This involves ensuring that find_busiest_group() 
> only considers groups that have at least one CPU with more than one task 
> running as candidates for the busiest group and that 
> find_busiest_queue() only considers CPUs that have more than one task 
> running as candidates for the busiest run queue.
> 
> One effect of this is that load balancing will be abandoned earlier in 
> the sequence (i.e. before the double run queue locks are taken prior to 
> calling move_tasks() rather than in move_tasks() itself) when there are 
> no tasks that can be moved than would be the case without this patch.
> 
> Signed-off-by: Peter Williams <[email protected]>
> 
> Peter
> PS This doesn't take into account tasks that can't be moved because they 
> are pinned to a particular CPU.  At this stage, I don't think that it's 
> worth the effort to make the changes that would enable this.
> -- 
> Peter Williams                                   [email protected]
> 
> "Learning, n. The kind of ignorance distinguishing the studious."
>   -- Ambrose Bierce

> Index: MM-2.6.X/kernel/sched.c
> ===================================================================
> --- MM-2.6.X.orig/kernel/sched.c	2006-03-27 16:00:12.000000000 +1100
> +++ MM-2.6.X/kernel/sched.c	2006-03-27 17:02:53.000000000 +1100
> @@ -2115,6 +2115,7 @@ find_busiest_group(struct sched_domain *
>  		int local_group;
>  		int i;
>  		unsigned long sum_nr_running, sum_weighted_load;
> +		unsigned int nr_loaded_cpus = 0; /* where nr_running > 1 */
>  
>  		local_group = cpu_isset(this_cpu, group->cpumask);
>  
> @@ -2135,6 +2136,8 @@ find_busiest_group(struct sched_domain *
>  
>  			avg_load += load;
>  			sum_nr_running += rq->nr_running;
> +			if (rq->nr_running > 1)
> +				++nr_loaded_cpus;
>  			sum_weighted_load += rq->raw_weighted_load;
>  		}
>  
> @@ -2149,7 +2152,7 @@ find_busiest_group(struct sched_domain *
>  			this = group;
>  			this_nr_running = sum_nr_running;
>  			this_load_per_task = sum_weighted_load;
> -		} else if (avg_load > max_load) {
> +		} else if (avg_load > max_load && nr_loaded_cpus) {
>  			max_load = avg_load;
>  			busiest = group;
>  			busiest_nr_running = sum_nr_running;
> @@ -2158,7 +2161,7 @@ find_busiest_group(struct sched_domain *
>  		group = group->next;
>  	} while (group != sd->groups);
>  
> -	if (!busiest || this_load >= max_load || busiest_nr_running <= 1)
> +	if (!busiest || this_load >= max_load)
>  		goto out_balanced;
>  
>  	avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;
> @@ -2260,16 +2263,16 @@ out_balanced:
>  static runqueue_t *find_busiest_queue(struct sched_group *group,
>  	enum idle_type idle)
>  {
> -	unsigned long load, max_load = 0;
> -	runqueue_t *busiest = NULL;
> +	unsigned long max_load = 0;
> +	runqueue_t *busiest = NULL, *rqi;
>  	int i;
>  
>  	for_each_cpu_mask(i, group->cpumask) {
> -		load = weighted_cpuload(i);
> +		rqi = cpu_rq(i);
>  
> -		if (load > max_load) {
> -			max_load = load;
> -			busiest = cpu_rq(i);
> +		if (rqi->raw_weighted_load > max_load && rqi->nr_running > 1) {
> +			max_load = rqi->raw_weighted_load;
> +			busiest = rqi;
>  		}
>  	}
>  

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux