Re: Database regression due to scheduler changes?

Nick wrote:

I would also take a look at removing SD_WAKE_IDLE from the flags.
This flag should make balancing more aggressive, but it can have
problems when applied to a NUMA domain due to too much task
movement.

Anton wrote:
I was wondering how ppc64 ended up with different parameters in the NODE
definitions (added SD_BALANCE_NEWIDLE and SD_WAKE_IDLE), and it looks
like it was Andrew :)

http://lkml.org/lkml/2004/11/2/205

FWIW I changed all arches, but most (except ppc) got changed back. At the time we had data showing that the more aggressive wake-idle and newidle balancing was good for things like OLTP.

Brian, do you have cpu util numbers and runqueue lengths for both tests?


It looks like balancing was not aggressive enough on his workload too.
I'm a bit uneasy with only ppc64 having the two flags though.

Brian wrote:

We suspect the regression was introduced in the scheduler changes
that went into 2.6.13-rc1.  However, the regression was hidden
from us by a bug in include/asm-ppc64/topology.h that made ppc64
look non-NUMA from 2.6.13-rc1 through 2.6.13-rc4.  That bug was
fixed in 2.6.13-rc5.  Unfortunately the workload does not run to
completion on 2.6.12 or 2.6.13-rc1.

Brian, I am not sure if you were thinking of a particular set of sched changes, but I suspect it might be one or more in the list below (my guess is the first and last). Would it be possible to back out these change-sets from 2.6.13-rc5 and see if there is any difference? FWIW, even if they do help, I am not suggesting, yet, that they should be reverted. I am hoping there is some compromise that can work better in all situations.

-Andrew

commit cafb20c1f9976a70d633bb1e1c8c24eab00e4e80
Author: Nick Piggin <[email protected]>
Date:   Sat Jun 25 14:57:17 2005 -0700

   [PATCH] sched: no aggressive idle balancing

   Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
   going against the direction we are trying to go.  Hopefully we can regain
   performance through other methods.

   Signed-off-by: Nick Piggin <[email protected]>
   Signed-off-by: Andrew Morton <[email protected]>
   Signed-off-by: Linus Torvalds <[email protected]>

commit a3f21bce1fefdf92a4d1705e888d390b10f3ac6f
Author: Nick Piggin <[email protected]>
Date:   Sat Jun 25 14:57:15 2005 -0700

   [PATCH] sched: tweak affine wakeups

   Do less affine wakeups.  We're trying to reduce dbt2-pgsql idle time
   regressions here...  make sure we don't move tasks the wrong way in an
   imbalance condition.  Also, remove the cache coldness requirement from the
   calculation - this seems to induce sharp cutoff points where behaviour will
   suddenly change on some workloads if the load creeps slightly over or under
   some point.  It is good for periodic balancing because in that case we
   otherwise have no other context to determine what task to move.

   But also make a minor tweak to "wake balancing" - the imbalance tolerance is
   now set at half the domain's imbalance, so we get the opportunity to do wake
   balancing before the more random periodic rebalancing gets performed.

   Signed-off-by: Nick Piggin <[email protected]>
   Signed-off-by: Andrew Morton <[email protected]>
   Signed-off-by: Linus Torvalds <[email protected]>

commit 7897986bad8f6cd50d6149345aca7f6480f49464
Author: Nick Piggin <[email protected]>
Date:   Sat Jun 25 14:57:13 2005 -0700

   [PATCH] sched: balance timers

   Do CPU load averaging over a number of different intervals.  Allow each
   interval to be chosen by sending a parameter to source_load and target_load.
   0 is instantaneous, idx > 0 returns a decaying average with the most recent
   sample weighted at 2^(idx-1).  To a maximum of 3 (could be easily increased).

   So generally a higher number will result in more conservative balancing.

   Signed-off-by: Nick Piggin <[email protected]>
   Signed-off-by: Andrew Morton <[email protected]>
   Signed-off-by: Linus Torvalds <[email protected]>

commit 99b61ccf0bf0e9a85823d39a5db6a1519caeb13d
Author: Nick Piggin <[email protected]>
Date:   Sat Jun 25 14:57:12 2005 -0700

   [PATCH] sched: less aggressive idle balancing

   Remove the special casing for idle CPU balancing.  Things like this are
   hurting, for example on SMT, where a single sibling being idle doesn't
   really warrant an aggressive pull across the NUMA domain.

   Signed-off-by: Nick Piggin <[email protected]>
   Signed-off-by: Andrew Morton <[email protected]>
   Signed-off-by: Linus Torvalds <[email protected]>




