Re: 2.6.18 ext3 panic.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Mon, Oct 09, 2006 at 02:59:25PM -0700, Badari Pulavarty wrote:
> 
>  > journal_dirty_data() would do submit_bh() ONLY if its part of the older
>  > transaction.
>  > 
>  > I need to take a closer look to understand the race.
>  > 
>  > BTW, is this 1k or 2k filesystem ?
> 
> (18:41:11:davej@gelk:~)$ sudo tune2fs -l /dev/md0  | grep size
> Block size:               1024
> Fragment size:            1024
> Inode size:               128
> (18:41:16:davej@gelk:~)$ 
> 
>  > How easy is to reproduce the problem ?
> 
> I can reproduce it within a few hours of stressing, but only
> on that one box.  I've not figured out exactly what's so
> special about it yet (though the 1k block thing may be it).
> I had been thinking it was a raid0 only thing, as none of
> my other boxes have that.
> 
> I'm not entirely sure how it got set up that way either.
> The Fedora installer being too smart for its own good perhaps.
  I think it's really the 1KB block size that makes it happen.
I've looked at journal_dirty_data() code and I think the following can
happen:
  sync() eventually ends up in journal_dirty_data(bh) as Eric writes.
There is finds dirty buffer attached to the comitting transaction. So it drops
all locks and calls sync_dirty_buffer(bh).
  Now in other process, file is truncated so that 'bh' gets just after EOF.
As we have 1kb buffers, it can happen that bh is in the partially
truncated page. Buffer is marked unmapped and clean. But in a moment the page
is marked dirty and msync() is called. That eventually calls
set_page_dirty() and all buffers in the page are marked dirty.
  The first process now wakes up, locks the buffer, clears the dirty bit
and does submit_bh() - Oops.

  This is essentially the same problem Badari found but in a different
place. There are two places that are arguably wrong...
  1) We mark buffer dirty after EOF. But actually that may be needed -
or what is the expected behaviour when we write into mmapped file after
EOF, then extend the file and do msync()?
  2) We submit a buffer after EOF for IO. This should be clearly avoided
but getting the needed info from bh is really ugly...

  What we could do is: Instead od calling sync_dirty_buffer() we do
something like:

lock_buffer(bh);
jbd_lock_bh_state(bh);
if (!buffer_jbd(bh) || jh != bh2jh(bh) || jh->b_transaction !=
  journal->j_committing_transaction) {
	jbd_unlock_bh_state(bh);
	unlock_buffer(bh);
}
jbd_unlock_bh_state(bh);
if (test_clear_buffer_dirty(bh)) {
	get_bh(bh);
	bh->b_end_io = end_buffer_write_sync;
	submit_bh(WRITE, bh);
	wait_on_buffer(bh);
}
else
	unlock_buffer(bh);

That should deal with the problem... Much more adventurous change would
be to remove the syncing code altogether - the new commit code makes
sure to write out each buffer just once so the livelock should not
happen now. But then we'd have to put running transaction in
j_next_transaction and refile data buffers instead of unfiling them.
That should actually give quite some performance improvement when
intensively rewriting files. But I guess that is JBD2 matter ;).

								Honza
-- 
Jan Kara <[email protected]>
SuSE CR Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux