Re: [git] CFS-devel, group scheduler, fixes

On Tue, Sep 18, 2007 at 11:03:59PM -0700, Tong Li wrote:
> This patch attempts to improve CFS's SMP global fairness based on the new 
> virtual time design.
> 
> Removed vruntime adjustment in set_task_cpu() as it skews global fairness.
> 
> Modified small_imbalance logic in find_busiest_group(). If there's small 
> imbalance, move tasks from busiest to local sched_group only if the local 
> group contains a CPU whose min_vruntime is the maximum among all CPUs in 
> the same sched_domain. This prevents any CPU from advancing too far ahead 
> in virtual time and avoids tasks thrashing between two CPUs without 
> utilizing other CPUs in the system. For example, for 10 tasks on 8 CPUs, 
> since the load is not evenly divisible by the number of CPUs, we want the 
> extra load to have a fair use of every CPU in the system.
> 
> Tested with a microbenchmark running 10 nice-0 tasks on 8 CPUs. Each task 

Just as an experiment, can you run 82 tasks on 8 CPUs. Current
imbalance_pct logic will not detect and fix the global fairness issue
even with this patch.

>  	if (*imbalance < busiest_load_per_task) {
> -		unsigned long tmp, pwr_now, pwr_move;
> -		unsigned int imbn;
> -
>  small_imbalance:
> -		pwr_move = pwr_now = 0;
> -		imbn = 2;
> -		if (this_nr_running) {
> -			this_load_per_task /= this_nr_running;
> -			if (busiest_load_per_task > this_load_per_task)
> -				imbn = 1;
> -		} else
> -			this_load_per_task = SCHED_LOAD_SCALE;
> -
> -		if (max_load - this_load + SCHED_LOAD_SCALE_FUZZ >=
> -					busiest_load_per_task * imbn) {
> -			*imbalance = busiest_load_per_task;
> -			return busiest;
> -		}

This patch removes quite a few lines and all this is logic is not for
fairness :( Especially the above portion handles some of the HT/MC
optimizations.

> -
> -		/*
> -		 * OK, we don't have enough imbalance to justify moving 
> tasks,
> -		 * however we may be able to increase total CPU power used by
> -		 * moving them.
> +		/* 
> +		 * When there's small imbalance, move tasks only if this
> +		 * sched_group contains a CPU whose min_vruntime is the 
> +		 * maximum among all CPUs in the same domain.
>  		 */
> -
> -		pwr_now += busiest->__cpu_power *
> -				min(busiest_load_per_task, max_load);
> -		pwr_now += this->__cpu_power *
> -				min(this_load_per_task, this_load);
> -		pwr_now /= SCHED_LOAD_SCALE;
> -
> -		/* Amount of load we'd subtract */
> -		tmp = sg_div_cpu_power(busiest,
> -				busiest_load_per_task * SCHED_LOAD_SCALE);
> -		if (max_load > tmp)
> -			pwr_move += busiest->__cpu_power *
> -				min(busiest_load_per_task, max_load - tmp);
> -
> -		/* Amount of load we'd add */
> -		if (max_load * busiest->__cpu_power <
> -				busiest_load_per_task * SCHED_LOAD_SCALE)
> -			tmp = sg_div_cpu_power(this,
> -					max_load * busiest->__cpu_power);
> -		else
> -			tmp = sg_div_cpu_power(this,
> -				busiest_load_per_task * SCHED_LOAD_SCALE);
> -		pwr_move += this->__cpu_power *
> -				min(this_load_per_task, this_load + tmp);
> -		pwr_move /= SCHED_LOAD_SCALE;
> -
> -		/* Move if we gain throughput */
> -		if (pwr_move > pwr_now)
> +		if (max_vruntime_group == this)
>  			*imbalance = busiest_load_per_task;
> +		else
> +			*imbalance = 0;

Not sure how this all interacts when some of the cpu's are idle. I have to
look more closely.

thanks,
suresh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [git] CFS-devel, group scheduler, fixes
  - From: Tong Li <tong.n.li@intel.com>

References:
- Re: [git] CFS-devel, group scheduler, fixes
  - From: dimm <dmitry.adamushko@gmail.com>
- Re: [git] CFS-devel, group scheduler, fixes
  - From: Ingo Molnar <mingo@elte.hu>
- Re: [git] CFS-devel, group scheduler, fixes
  - From: Tong Li <tong.n.li@intel.com>

Prev by Date: Re: iso9660 vs udf
Next by Date: Re: 2.6.23-rc6-mm1 powerpc - kgdb is broken
Previous by thread: Re: [git] CFS-devel, group scheduler, fixes
Next by thread: Re: [git] CFS-devel, group scheduler, fixes
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]