Re: [patch] CFS scheduler, -v12

Ingo Molnar wrote:
> * Peter Williams <[email protected]> wrote:
>
>> I've now done this test on a number of kernels: 2.6.21 and 2.6.22-rc1 with and without CFS; and the problem is always present. It's not "nice" related as all four tasks are run at nice == 0.
>
> could you try -v13 and did this behavior get better in any way?

It's still there, but I've got a theory about what the problem is that is supported by some other tests I've done.

What I'd forgotten is that I had gkrellm running as well as top (to observe which CPU tasks were on) at the same time as the spinners were running. This meant that, between them, top, gkrellm and X were using about 2% of the CPU -- not much but enough to make it possible that at least one of them was running when the load balancer was trying to do its thing.

This raises two possibilities: 1. the system looked balanced and 2. the system didn't look balanced but one of top, gkrellm or X was moved instead of one of the spinners.

If it's 1 then there's not much we can do about it except say that it only happens in these strange circumstances. If it's 2 then we may have to modify the way move_tasks() selects which tasks to move (if we think that the circumstances warrant it -- I'm not sure that this is the case).

To examine these possibilities I tried two variations of the test.

a. run the spinners at nice == -10 instead of nice == 0. When I did this the load balancing was perfect on 10 consecutive runs, which according to my calculations makes it 99.9999997% certain that this didn't happen by chance. This supports theory 2 above.

b. run the tests without gkrellm running but use nice == 0 for the spinners. When I did this the load balancing was mostly perfect, although quite volatile (switching between a 2/2 and a 1/3 allocation of spinners to CPUs). The %CPU allocation was nonetheless quite good, with the spinners all getting approximately 49% of a CPU each. This also supports theory 2 above and gives weak support to theory 1 above.

This leaves the question of what to do about it. Given that most CPU intensive tasks probably only run for a few tens of milliseconds, it probably won't matter much on a real system, except that a malicious user could exploit it to disrupt a system.

So my opinion is that we probably do need to do something about it but that it's not urgent.

One thing that might work is to jitter the load balancing interval a bit. The reason I say this is that one of the characteristics of top and gkrellm is that they run at a more or less constant interval (and, in this case, X would also be following this pattern as it's doing screen updates for top and gkrellm). This means that it's possible for the load balancing interval to synchronize with their intervals, which in turn causes the observed problem. A jittered load balancing interval should break the synchronization. This would certainly be simpler than trying to change the move_tasks() logic for selecting which tasks to move.
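
To make the jitter idea concrete, here is a rough user-space sketch in C. It is not kernel code and not a proposed patch. The names jittered_interval_ms(), count_collisions() and lcg_next() are made up for illustration, and the model (a periodic task that is busy for the first busy_ms of every period_ms) is a simplifying assumption standing in for top/gkrellm/X.

```c
#include <stdint.h>

/* Hypothetical helper: a small linear congruential generator, used only so
 * the sketch is deterministic and needs no external randomness source. */
uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Return base_ms plus a pseudo-random offset in
 * [-max_jitter_ms, +max_jitter_ms]. The jitter is clamped to base_ms so the
 * result cannot underflow. */
unsigned int jittered_interval_ms(unsigned int base_ms,
                                  unsigned int max_jitter_ms,
                                  uint32_t *state)
{
    unsigned int span, offset;

    if (max_jitter_ms > base_ms)
        max_jitter_ms = base_ms;
    span = 2u * max_jitter_ms + 1u;
    /* Use the high bits; the low bits of an LCG are poorly distributed. */
    offset = (lcg_next(state) >> 16) % span;
    return base_ms - max_jitter_ms + offset;
}

/* Toy model: count how many of n_ticks balancer runs happen while a periodic
 * task (busy for the first busy_ms of every period_ms) is on a CPU. */
unsigned int count_collisions(unsigned int base_ms,
                              unsigned int max_jitter_ms,
                              unsigned int period_ms,
                              unsigned int busy_ms,
                              unsigned int n_ticks,
                              uint32_t seed)
{
    uint32_t state = seed;
    unsigned long long now = 0;
    unsigned int hits = 0;
    unsigned int i;

    for (i = 0; i < n_ticks; i++) {
        if (now % period_ms < busy_ms)
            hits++;
        now += jittered_interval_ms(base_ms, max_jitter_ms, &state);
    }
    return hits;
}
```

In this toy model, count_collisions(1000, 0, 1000, 20, 100, 1) has the balancer land on the busy task at every one of the 100 ticks because the two periods are locked together, while passing a non-zero max_jitter_ms lets the balancer drift out of phase with the periodic task. A real implementation would of course have to use the kernel's own timing and randomness facilities rather than anything shown here.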

What do you think?
Peter
--
Peter Williams                                   [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce