Re: 2.6.18 ext3 panic. — Linux Kernel

On Tue, 10 Oct 2006 16:11:45 +0200
Jan Kara <[email protected]> wrote:

> > On Mon, Oct 09, 2006 at 02:59:25PM -0700, Badari Pulavarty wrote:
> > 
> >  > journal_dirty_data() would do submit_bh() ONLY if its part of the older
> >  > transaction.
> >  > 
> >  > I need to take a closer look to understand the race.
> >  > 
> >  > BTW, is this 1k or 2k filesystem ?
> > 
> > (18:41:11:davej@gelk:~)$ sudo tune2fs -l /dev/md0  | grep size
> > Block size:               1024
> > Fragment size:            1024
> > Inode size:               128
> > (18:41:16:davej@gelk:~)$ 
> > 
> >  > How easy is to reproduce the problem ?
> > 
> > I can reproduce it within a few hours of stressing, but only
> > on that one box.  I've not figured out exactly what's so
> > special about it yet (though the 1k block thing may be it).
> > I had been thinking it was a raid0 only thing, as none of
> > my other boxes have that.
> > 
> > I'm not entirely sure how it got set up that way either.
> > The Fedora installer being too smart for its own good perhaps.
>   I think it's really the 1KB block size that makes it happen.
> I've looked at journal_dirty_data() code and I think the following can
> happen:
>   sync() eventually ends up in journal_dirty_data(bh) as Eric writes.
> There is finds dirty buffer attached to the comitting transaction. So it drops
> all locks and calls sync_dirty_buffer(bh).
>   Now in other process, file is truncated so that 'bh' gets just after EOF.
> As we have 1kb buffers, it can happen that bh is in the partially
> truncated page. Buffer is marked unmapped and clean. But in a moment the page
> is marked dirty and msync() is called. That eventually calls
> set_page_dirty() and all buffers in the page are marked dirty.
>   The first process now wakes up, locks the buffer, clears the dirty bit
> and does submit_bh() - Oops.
> 
>   This is essentially the same problem Badari found but in a different
> place. There are two places that are arguably wrong...
>   1) We mark buffer dirty after EOF. But actually that may be needed -
> or what is the expected behaviour when we write into mmapped file after
> EOF, then extend the file and do msync()?

yup.

>   2) We submit a buffer after EOF for IO. This should be clearly avoided
> but getting the needed info from bh is really ugly...

Things like __block_write_full_page() avoid this by checking the block's
offset against i_size.  (Not racy against truncate-down because the page is
locked, not racy against truncate-up because the bh is zero and
up-to-date).

But for jbd writeout we don't hold the page lock, so checking against
bh->b_page->host->i_size is a bit racy.

hm.  But we do lock the buffers in journal_invalidatepage(), so checking
i_size after locking the buffer in the writeout path might get us there.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- 2.6.18 ext3 panic.
  - From: Dave Jones <[email protected]>
- Re: 2.6.18 ext3 panic.
  - From: Dave Jones <[email protected]>
- Re: 2.6.18 ext3 panic.
  - From: Eric Sandeen <[email protected]>
- Re: 2.6.18 ext3 panic.
  - From: Andrew Morton <[email protected]>
- Re: 2.6.18 ext3 panic.
  - From: Eric Sandeen <[email protected]>
- Re: 2.6.18 ext3 panic.
  - From: Badari Pulavarty <[email protected]>
- Re: 2.6.18 ext3 panic.
  - From: Dave Jones <[email protected]>
- Re: 2.6.18 ext3 panic.
  - From: Jan Kara <[email protected]>

Prev by Date: Re: [PATCH] ecryptfs: reduce legacy API usage
Next by Date: Re: [PATCH 12/12] i386 boot: Add an ELF header to bzImage
Previous by thread: Re: 2.6.18 ext3 panic.
Next by thread: Re: 2.6.18 ext3 panic.
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]