Re: [patch] smpnice: don't consider sched groups which are lightly loaded for balancing

Siddha, Suresh B wrote:
> On Thu, Apr 20, 2006 at 03:19:52PM +1000, Peter Williams wrote:
>>> This patch doesn't fix this issue for example:
>>> 4-way simple MP system. P0 containing two high priority tasks, P1 containing
>>> one high priority and two normal priority tasks, one high priority task
>>> each on P2, P3. Current load balance doesn't detect/fix the
>>> imbalance by moving one of the normal priority tasks running on P1 to P2 or P3.
>> Is this always the case or just a possibility? Please describe the hole it slips through (and please do that every time you provide a scenario).

> I thought a scenario is enough to show the hole :) Anyhow, I brought this issue up before too..
> http://www.ussg.iu.edu/hypermail/linux/kernel/0604.0/0517.html
>
> Load balancing on P2 or P3 will always identify P0 as having max_load, but it
> will not be able to move any load from P0: imbalance will always be <
> busiest_load_per_task, max_load - this_load will be < imbn (2 here) *
> busiest_load_per_task, and so pwr_move will be <= pwr_now...

This will depend on how high the priority of the high priority tasks is relative to that of normal tasks. E.g. it's quite possible for two high priority tasks to have a combined load weight that is less than that of two normal tasks plus a high priority task.
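To make that concrete, with purely illustrative weights (not the actual smpnice values): say a normal (nice 0) task has load weight 128, and consider two values h for the high priority weight:

  h = 300:  P0 = 2*300       = 600
            P1 = 300 + 2*128 = 556  -> P0 masks P1 (your hole)
  h = 200:  P0 = 2*200       = 400
            P1 = 200 + 2*128 = 456  -> P1 is the busiest group and one
                                       of its normal tasks can be pulled

So whether the hole exists depends on the weight ratio, not just on the task counts.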


> Basically, sched groups containing the highest priority tasks can mask the imbalance between the other sched groups within the same domain.

Sometimes.

I don't think that this stable state is so bad that anything special needs to be done, especially as high priority tasks tend to use the CPU only in short bursts, which means the state probably won't persist for very long.

To paraphrase Ingo (from another thread), load balancing is a probabilistic exercise. For a start, achieving a deterministically optimal distribution would mean solving an NP-hard problem, and by the time you determined the correct distribution (which could take a very long time) the "state" information on which the determination was based would have changed, possibly a lot. The latter (probably minus the "possibly a lot") is true anyway, as find_busiest_group() and find_busiest_queue() are called without holding any locks, meaning that the state upon which their results are based may change before move_tasks() is called.
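For anyone who doesn't have the code in front of them, the flow is roughly as follows (a hand-simplified paraphrase of load_balance() from kernel/sched.c; argument lists trimmed and error handling omitted, so don't read it as the real function):

static int load_balance_sketch(int this_cpu, runqueue_t *this_rq,
			       struct sched_domain *sd)
{
	struct sched_group *group;
	runqueue_t *busiest;
	unsigned long imbalance;
	int nr_moved = 0;

	/* Both of these read other CPUs' load figures without
	 * holding any runqueue locks... */
	group = find_busiest_group(sd, this_cpu, &imbalance);
	if (!group)
		return 0;
	busiest = find_busiest_queue(group);
	if (!busiest || busiest == this_rq)
		return 0;

	/* ...so by the time the queues are locked here, the state on
	 * which group/busiest/imbalance were chosen may have changed. */
	double_rq_lock(this_rq, busiest);
	nr_moved = move_tasks(this_rq, this_cpu, busiest, imbalance, sd);
	double_rq_unlock(this_rq, busiest);

	return nr_moved;
}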

I think this justifies saying that this scenario probably doesn't matter and, therefore, fixing it isn't urgent.

BTW I agree with your earlier statements that the modification to move_tasks() to circumvent the skip mechanism in some circumstances needs to be refined so that it doesn't move the highest priority task of the busiest queue. I'll be submitting a patch later today.
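The gist of what I have in mind is a test along the following lines in the move_tasks()/can_migrate_task() path (a sketch only: the helper name is mine and the actual patch will differ in detail):

/*
 * Hypothetical helper: true if p is (one of) the highest priority
 * runnable task(s) on rq.  In the O(1) scheduler the best runnable
 * priority is the first set bit in the active array's bitmap.
 */
static inline int top_prio_task(runqueue_t *rq, task_t *p)
{
	return p->prio <= sched_find_first_bit(rq->active->bitmap);
}

and then, in the branch that circumvents the skip mechanism:

	if (top_prio_task(busiest, p))
		continue;	/* never strip the busiest queue of its best task */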

I think that the next thing that needs to be addressed after that is a modification to try_to_wake_up() to improve the distribution of high priority tasks across CPUs. I think that just sticking them on any CPU and waiting for the load balancing code to kick in and move them unnecessarily increases their latency.
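Something like wake_idle() (which 2.6.16 already uses for the SD_WAKE_IDLE case) but applied to high priority wakeups is the sort of thing I mean. An illustrative sketch only (the high_prio() cutoff is hypothetical, and a real patch would need to respect the existing domain flags):

/* Hypothetical cutoff: anything better than nice 0 counts as high priority. */
#define high_prio(p)	((p)->static_prio < NICE_TO_PRIO(0))

static int wake_idle_high_prio(int cpu, task_t *p)
{
	struct sched_domain *sd;
	int i;

	/* Nothing to do if the chosen CPU is already idle
	 * or the task is ordinary. */
	if (idle_cpu(cpu) || !high_prio(p))
		return cpu;

	for_each_domain(cpu, sd) {
		for_each_cpu_mask(i, sd->span) {
			/* Prefer the first idle CPU the task may run on. */
			if (cpu_isset(i, p->cpus_allowed) && idle_cpu(i))
				return i;
		}
	}
	return cpu;	/* no idle CPU found; fall back to the original choice */
}

try_to_wake_up() would call something like this when picking the target CPU, so a woken high priority task lands on an idle CPU immediately instead of waiting for the balancer to notice it.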

Peter
--
Peter Williams                                   [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce