Re: Processes spinning forever, apparently in lock_timer_base()?

On Fri, 21 Sep 2007, Andy Whitcroft wrote:
> This sounds an awful lot like the same problem I reported with fsck
> hanging.  I believe that Hugh had a candidate patch for that, which was
> related to dirty tracking limits.  It seems that that patch tested, and
> acked by Peter.  All on lkml under:
> 
> 	2.6.23-rc6-mm1 -- mkfs stuck in 'D'

You may well be right.

My initial reaction was to point out that my patch was to a 
2.6.23-rc6-mm1 implementation detail, whereas this thread is
about a problem seen since 2.6.20 or perhaps earlier.

But once I look harder at it, I wonder what would have kept
2.6.18 to 2.6.23 safe from the same issue: per-cpu deltas from
the global vm stats too low to get synched back to global, yet
adding up to something which misleads balance_dirty_pages into
an indefinite loop e.g. total nr_writeback actually 0, but
appearing more than dirty_thresh in the global approximation.

Certainly my rc6-mm1 patch won't work here, and all it was doing
was apply some safety already added by Peter to one further case.

Looking at the 2.6.18-2.6.23 code, I'm uncertain what to try instead.
There is a refresh_vm_stats function which we could call (then retest
the break condition) just before resorting to congestion_wait.  But
the big NUMA people might get very upset with me calling that too
often: causing a thundering herd of bouncing cachelines which that
was all designed to avoid.  And it's not obvious to me what condition
to test for dirty_thresh "too low".

I believe Peter gave all this quite a lot of thought when he was
making the rc6-mm1 changes, and I'd rather defer to him for a
suggestion of what best to do in earlier releases.  Or maybe he'll
just point out how this couldn't have been a problem before.

Or there is is Richard's patch, which I haven't considered, but
Andrew was not quite satisfied with it - partly because he'd like
to understand how the situation could come about first, perhaps
we have now got an explanation.

(The original bug report was indeed on SMP, but I haven't seen
anyone say that's a necessary condition for the hang: it would
be if this is the issue.  And Richard writes at one point of the
system only responding to AltSysRq: that would be surprising for
this issue, though it's possible that a task in balance_dirty_pages
is holding an i_mutex that everybody else comes to need.)

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Peter Zijlstra <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Chuck Ebbert <[email protected]>

References:
- Processes spinning forever, apparently in lock_timer_base()?
  - From: Chuck Ebbert <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Andrew Morton <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Matthias Hensler <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Matthias Hensler <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Andrew Morton <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Chuck Ebbert <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Andrew Morton <[email protected]>
- Re: Processes spinning forever, apparently in lock_timer_base()?
  - From: Andy Whitcroft <[email protected]>

Prev by Date: Re: [PATCH] change inotifyfs magic as the same magic is used for futexfs
Next by Date: Re: [PATCH] Reduce __print_symbol/sprint_symbol stack usage. (v3)
Previous by thread: Re: Processes spinning forever, apparently in lock_timer_base()?
Next by thread: Re: Processes spinning forever, apparently in lock_timer_base()?
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]