Re: Processes spinning forever, apparently in lock_timer_base()?

On Thu, 2007-09-20 at 15:36 -0700, Andrew Morton wrote:
> On Thu, 20 Sep 2007 18:04:38 -0400
> Chuck Ebbert <[email protected]> wrote:
> 
> > > 
> > >> Can we get some kind of band-aid, like making the endless 'for' loop in
> > >> balance_dirty_pages() terminate after some number of iterations? Clearly
> > >> if we haven't written "write_chunk" pages after a few tries, *and* we
> > >> haven't encountered congestion, there's no point in trying forever...
> > > 
> > > Did my above questions get looked at?
> > > 
> > > Is anyone able to reproduce this?
> > > 
> > > Do we have a clue what's happening?
> > 
> > There are a ton of dirty pages for one disk, and zero or close to zero
> > dirty for a different one. The kernel spins forever trying to write some
> > arbitrary minimum amount of data ("write_chunk" pages) to the second disk...
> 
> That should be OK.  The caller will sit in that loop, sleeping in
> congestion_wait(), polling the correct backing-dev occasionally and waiting
> until the dirty counts subside to an acceptable level, at which stage this:
> 
> 		if (nr_reclaimable + global_page_state(NR_WRITEBACK) <=
> 			dirty_thresh)
> 				break;
> 
> will happen and we leave balance_dirty_pages().
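
(For context, the loop being discussed looks roughly like this in 2.6.23 --
a simplified sketch of mm/page-writeback.c with details elided, so treat it
as approximate; the comment marks where the iteration cap suggested above
would presumably go:)

/* Simplified sketch of balance_dirty_pages(), ~2.6.23 */
static void balance_dirty_pages(struct address_space *mapping)
{
	long background_thresh, dirty_thresh, nr_reclaimable;
	unsigned long pages_written = 0;
	unsigned long write_chunk = sync_writeback_pages();

	for (;;) {
		get_dirty_limits(&background_thresh, &dirty_thresh, mapping);
		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
				 global_page_state(NR_UNSTABLE_NFS);

		/* the exit quoted above: totals back under the limit */
		if (nr_reclaimable + global_page_state(NR_WRITEBACK) <=
				dirty_thresh)
			break;

		/* attempt to write write_chunk pages from this backing
		 * device (writeback_inodes() call elided) */

		if (pages_written >= write_chunk)
			break;		/* We've done our duty */

		/* a band-aid cap would go here: bail out after N
		 * uncongested passes that made no progress at all */

		congestion_wait(WRITE, HZ/10);
	}
}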
> 
> That's all a bit crappy if the wrong races happen and some other task is
> somehow exceeding the dirty limits each time this task polls them.  Seems
> unlikely that such a condition would persist forever.
> 
> So the question is, why do we have large amounts of dirty pages for one
> disk which appear to be sitting there not getting written?

The lockup I'm seeing intermittently occurs when I have two or more tasks
copying large files (1GB+) on sda and a small, mostly-read mysql db app
running on sdb. The lockup seems to happen just after the copies finish --
there are lots of dirty pages but no task left to write them until kupdate
gets round to it.
BTW, kupdate can loop for long periods when a disk is under this kind of
load -- I regularly see it take over 20 seconds -- and often it's unable to
start at all because there are no pdflush threads available.
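
(For reference, that matches how the kupdate path is wired up: the
periodic-writeback timer tries to grab a pdflush worker, and if none is
free it simply re-arms itself for a second later. Sketched from memory of
2.6.23's mm/page-writeback.c and mm/pdflush.c -- treat as approximate:)

/* Periodic writeback kick-off, ~2.6.23. wb_kupdate() runs in a
 * pdflush worker thread; pdflush_operation() returns -1 when every
 * pdflush thread is busy, which is exactly the "no pdflush threads
 * available" case above -- kupdate then just waits another second. */
static void wb_timer_fn(unsigned long unused)
{
	if (pdflush_operation(wb_kupdate, 0) < 0)
		mod_timer(&wb_timer, jiffies + HZ);	/* delay 1 second */
}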

> Do we know if there's any writeout at all happening when the system is in
> this state?
> 
No, there doesn't seem to be any activity at all -- my machine is
completely unresponsive; only sysrq works.


> I guess it's possible that the dirty inodes on the "other" disk got
> themselves onto the wrong per-sb inode list, or are on the correct list
> but in the wrong place.  If so, these:
> 
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
> writeback-fix-comment-use-helper-function.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
> writeback-fix-periodic-superblock-dirty-inode-flushing.patch
> 
> from 2.6.23-rc6-mm1 should help.
 

> Did anyone try running /bin/sync when the system is in this state?
I'm not able to run anything in this state, but sysrq-s (emergency sync)
doesn't make any difference.

Richard 
