Re: [PATCH] sched: smpnice work around for active_load_balance()

Siddha, Suresh B wrote:

On Thu, Mar 30, 2006 at 10:40:24AM +1100, Peter Williams wrote:
Siddha, Suresh B wrote:
On Wed, Mar 29, 2006 at 02:42:45PM +1100, Peter Williams wrote:
I meant that it doesn't explicitly address your problem. What it does is ASSUME that failure of load balancing to move tasks is because there was exactly one task on the source run queue and that this makes it a suitable candidate to have that single task moved elsewhere in the blind hope that it may fix an HT/MC imbalance that may or may not exist. In my mind this is very close to random.
That so-called assumption happens only when load balancing has
failed for more than the domain-specific cache_nice_tries. The only reason
it can fail so many times is that all tasks are pinned or only a single
task is running on that particular CPU. The load balancing code takes care of
both these scenarios.
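
For readers following the thread, the trigger described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the kernel's code: `struct demo_domain` and `demo_note_balance_result()` are hypothetical names, and the exact threshold the kernel compares against may differ from the plain `cache_nice_tries` comparison used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scheduling domain. */
struct demo_domain {
	unsigned int cache_nice_tries;  /* per-domain tolerance for failed balances */
	unsigned int nr_balance_failed; /* consecutive attempts that moved nothing */
};

/*
 * Record the outcome of one pull attempt.
 * Returns true once the run of failures exceeds the domain's tolerance,
 * i.e. the point at which a push (active balance) would be requested.
 * Assumption: the threshold is exactly cache_nice_tries.
 */
static bool demo_note_balance_result(struct demo_domain *sd,
				     unsigned int nr_moved)
{
	assert(sd != NULL);

	if (nr_moved > 0) {
		/* A successful pull resets the failure count. */
		sd->nr_balance_failed = 0;
		return false;
	}

	sd->nr_balance_failed++;
	return sd->nr_balance_failed > sd->cache_nice_tries;
}
```

With `cache_nice_tries` set to 2, the first two failed pulls return false and the third returns true; any successful pull resets the count.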

sched groups cpu_power controls the mechanism of implementing HT/MC
optimizations in addition to active balance code... There is no randomness
in this.
The above explanation just increases my belief in the randomness of this solution. This code is mostly done without locks and is therefore very racy and any assumptions made based on the number of times load balancing has failed etc. are highly speculative.
Isn't it the same case with regular cpu load calculations during load
balance?

Yes.  Which is why move_tasks() is designed to cope.

But this doesn't affect the argument w.r.t. your code.

And even if there is only one task on the CPU there's no guarantee that
that CPU is in a package that meets the other requirements to make the move desirable. So there's a good probability that you'll be moving tasks unnecessarily.
sched groups cpu_power and domain topology information cleanly
encapsulates the imbalance identification and source/destination groups
to fix the imbalance.

But you don't look at the rest of the queues in the package to see if the need is REALLY required.

It's a poor solution and it's being inflicted on architectures that don't need it. Even if cache_nice_tries is used to suppress this behaviour on architectures that don't need it they have to carry the code in their kernel.
We can clearly throw CONFIG_SCHED_MC/SMT around that code.. Nick/Ingo
do you see any issue?

That just makes it a poor solution and ugly. :-)

Also back to front and inefficient.
HT/MC imbalance is detected in a normal way.. A lightly loaded group
finds an imbalance and tries to pull some load from a busy group (which
is inline with normal load balance)... pull fails because the only task
on that cpu is busy running and needs to go off the cpu (which is triggered
by active load balance)... Scheduler load balance is generally done by a pull mechanism and here (HT/MC) it is still a pull mechanism (triggering a final push only because of the single running task)
If you have any better generic and simple method, please let us know.
I gave an example in a previous e-mail. Basically, at the end of scheduler_tick() if rebalance_tick() doesn't move any tasks (it would be foolish to contemplate moving tasks off the queue just after you've moved some there) and the run queue has exactly one running task and it's time for a HT/MC rebalance check on the package that this run queue belongs to then check that package to see if it meets the rest of the criteria for needing to lose some tasks. If it does, look for a package that is a suitable recipient for the moved task and if you find one then mark this run queue as needing active load balancing and arrange for its migration thread to be started.
Simple, direct and amenable to being only built on architectures that need the functionality.
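
The sequence of checks proposed above might look roughly like the sketch below. It is a self-contained illustration under stated assumptions, not kernel code: `struct demo_package`, `demo_busy_cpus()` and `demo_tick_check()` are hypothetical names. The thread does not spell out "the rest of the criteria", so the sketch assumes a package needs to lose a task when at least two of its CPUs are busy, and that a suitable recipient is a package with no busy CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_CPUS 8

/* Hypothetical model of one physical package and its per-CPU run queues. */
struct demo_package {
	unsigned int nr_cpus;
	unsigned int nr_running[DEMO_MAX_CPUS]; /* runnable tasks per CPU */
};

/* Count the CPUs in a package that have at least one runnable task. */
static unsigned int demo_busy_cpus(const struct demo_package *p)
{
	unsigned int i, busy = 0;

	assert(p != NULL && p->nr_cpus <= DEMO_MAX_CPUS);
	for (i = 0; i < p->nr_cpus; i++)
		if (p->nr_running[i] > 0)
			busy++;
	return busy;
}

/*
 * End-of-tick check. Returns the index of a recipient package when this
 * run queue should be marked for active load balancing, or -1 otherwise.
 */
static int demo_tick_check(const struct demo_package *pkgs, size_t nr_pkgs,
			   size_t this_pkg, unsigned int this_cpu,
			   unsigned int nr_moved_by_rebalance,
			   bool ht_mc_check_due)
{
	const struct demo_package *src;
	size_t i;

	assert(pkgs != NULL && this_pkg < nr_pkgs);
	src = &pkgs[this_pkg];
	assert(this_cpu < src->nr_cpus);

	if (nr_moved_by_rebalance > 0)
		return -1; /* tasks were just moved; don't move more */
	if (src->nr_running[this_cpu] != 1)
		return -1; /* only a queue with exactly one task qualifies */
	if (!ht_mc_check_due)
		return -1; /* not yet time for an HT/MC check */
	if (demo_busy_cpus(src) < 2)
		return -1; /* assumed criterion: package is not overloaded */

	/* Assumed criterion: the recipient is a completely idle package. */
	for (i = 0; i < nr_pkgs; i++)
		if (i != this_pkg && demo_busy_cpus(&pkgs[i]) == 0)
			return (int)i;
	return -1;
}
```

For two dual-CPU packages where the first has one task on each CPU and the second is idle, the check on either CPU of the first package returns index 1. If the second package has any running task, or rebalancing already moved tasks this tick, it returns -1.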
First of all we will be doing unnecessary checks to see if there is
an imbalance.. Current code triggers the checks and movement only when
it is necessary.. And second, finding the correct destination cpu in the presence of SMT and MC is really complicated.. Look at different examples
in the OLS paper.. Domain topology provides all this info with no added
complexity...
Another (more complex) solution that would also allow improvements to other HT related code (e.g. the sleeping dependent code) would be to modify the load balancing code so that all CPUs in a package share a run queue and load balancing is then done between packages. As long as the number of CPUs in a package is small this shouldn't have scalability issues. The big disadvantage of this approach is its complexity which is probably too great to contemplate doing it in 2.6.X kernels.
Presence of SMT and MC, implementation of power-savings scheduler
policy will present more challenges...

And I would recommend a similar approach to what I've suggested above. They could probably be combined into a single neat well encapsulated solution.

Peter
--
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

