Re: Processes spinning forever, apparently in lock_timer_base()?

On Fri, 21 Sep 2007 16:58:15 +0100 (BST) Hugh Dickins
<hugh@veritas.com> wrote:

> But once I look harder at it, I wonder what would have kept
> 2.6.18 to 2.6.23 safe from the same issue: per-cpu deltas from
> the global vm stats too low to get synched back to global, yet
> adding up to something which misleads balance_dirty_pages into
> an indefinite loop e.g. total nr_writeback actually 0, but
> appearing more than dirty_thresh in the global approximation.

This could only happen when: dirty_thresh < nr_cpus * per_cpu_max_delta

> Looking at the 2.6.18-2.6.23 code, I'm uncertain what to try instead.
> There is a refresh_vm_stats function which we could call (then retest
> the break condition) just before resorting to congestion_wait.  But
> the big NUMA people might get very upset with me calling that too
> often: causing a thundering herd of bouncing cachelines which that
> was all designed to avoid.  And it's not obvious to me what condition
> to test for dirty_thresh "too low".

That could be modeled on the error limit I have. For this particular
case that would end up looking like:

  nr_online_cpus * pcp->stat_threshold.
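
To make that bound concrete, here is a minimal userspace sketch of the idea. It is not the kernel's vmstat code: struct vm_counter, counter_add(), counter_exact(), counter_max_error() and thresh_too_low() are hypothetical names made up for illustration, and NR_CPUS is fixed at 4. It assumes a per-cpu delta is folded into the global value only once its magnitude exceeds stat_threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: 4 CPUs, one counter. */
#define NR_CPUS 4

struct vm_counter {
	long global;          /* approximate value seen by global readers */
	long delta[NR_CPUS];  /* per-cpu changes not yet folded into global */
	long stat_threshold;  /* a delta is folded once it exceeds this */
};

/* Apply a change on one CPU; fold into global only past the threshold. */
static void counter_add(struct vm_counter *c, int cpu, long n)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->delta[cpu] += n;
	if (c->delta[cpu] > c->stat_threshold ||
	    c->delta[cpu] < -c->stat_threshold) {
		c->global += c->delta[cpu];
		c->delta[cpu] = 0;
	}
}

/* Exact value: global plus every outstanding per-cpu delta. */
static long counter_exact(const struct vm_counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->delta[cpu];
	return sum;
}

/* Worst-case distance between the global approximation and the truth. */
static long counter_max_error(const struct vm_counter *c)
{
	return NR_CPUS * c->stat_threshold;
}

/* A threshold below the error bound cannot be tested reliably. */
static bool thresh_too_low(const struct vm_counter *c, long dirty_thresh)
{
	return dirty_thresh < counter_max_error(c);
}
```

With stat_threshold = 8, three additions of 9 on CPU 0 followed by -3 on CPU 0 and -8 on each of the other three CPUs leave global at 27 while the exact sum is 0. A dirty_thresh of 20 would then look exceeded forever, and thresh_too_low() flags it because 20 < 4 * 8.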

> I believe Peter gave all this quite a lot of thought when he was
> making the rc6-mm1 changes, and I'd rather defer to him for a
> suggestion of what best to do in earlier releases.  Or maybe he'll
> just point out how this couldn't have been a problem before.

As outlined above, it needs a dirty_limit below that error bound, and I
don't think we'll ever have one that low. But who knows :-)

> Or there is Richard's patch, which I haven't considered, but
> Andrew was not quite satisfied with it - partly because he'd like
> to understand how the situation could come about first, perhaps
> we have now got an explanation.

I'm with Andrew on this; that is, quite puzzled as to how all this arises.

Testing those writeback-fix-* patches might help rule out (or point to)
a mis-function of pdflush.

The theory that one task will spin in balance_dirty_pages() on a
bdi that does not actually have many dirty pages doesn't sound
plausible, because eventually the total dirty count (well, actually
dirty+unstable+writeback) should subside again. That scenario can
cause crappy latencies, but should not 'hang' the machine.

> (The original bug report was indeed on SMP, but I haven't seen
> anyone say that's a necessary condition for the hang: it would
> be if this is the issue.  And Richard writes at one point of the
> system only responding to AltSysRq: that would be surprising for
> this issue, though it's possible that a task in balance_dirty_pages
> is holding an i_mutex that everybody else comes to need.)

Are we actually holding i_mutex on paths that lead into
balance_dirty_pages? That does (from my admittedly limited knowledge of
the VFS) sound like trouble, since we'd need it to complete writeback.

All quite puzzling.