Re: Boot failure with ext2 and initrds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andrew Morton wrote:
On Wed, 15 Nov 2006 14:17:01 +0000 (GMT)
Hugh Dickins <[email protected]> wrote:


On Tue, 14 Nov 2006, Hugh Dickins wrote:

On Tue, 14 Nov 2006, Andrew Morton wrote:

The below might help.

Indeed it does (with Martin's E2FSBLK warning fix),
seems to be running well on all machines now.

i386 and ppc64 still doing builds, but after an hour on x86_64,
an ld got stuck in a loop under ext2_try_to_allocate_with_rsv,
alternating between ext2_rsv_window_add and rsv_window_remove.
Send me a patch and I'll try it...

ext2_try_to_allocate_with_rsv+0x288
ext2_new_blocks+0x21e
ext2_get_blocks+0x398
ext2_get_block+0x46
__block_prepare_write+0x171
block_prepare_write+0x39
ext2_prepare_write+0x2c
generic_file_buffered_write+0x2b0
__generic_file_aio_write_nolock+0x4bc
generic_file_aio_write+0x6d
do_sync_write+0xf9
vfs_write+0xc8
sys_write+0x51


OK, I have a theory.

This must have been the seventeenth damn time I've stared at
find_next_zero_bit() wondering what the damn return value is and wondering
how any even slightly non-sadistic person could write a damn function like
that and not damn well document it.

int find_next_zero_bit(const unsigned long *addr, int size, int offset)

It returns the offset of the first zero bit relative to addr.
ext3's bitmap_search_next_usable_block() assumed that find_next_zero_bit()
returns the offset of the first zero bit relative to (addr+offset).

The while loop in ext3's bitmap_search_next_usable_block() serendipitously
covered that bug up.

ext2's bitmap_search_next_usable_block() doesn't need that while loop, so
ext3's benign bug became ext2's fatal bug.

So...

--- a/fs/ext2/balloc.c~a
+++ a/fs/ext2/balloc.c
@@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
 	ext2_grpblk_t next;

 	next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start);
-	if (next >= maxblocks)
+	if (next >= start + maxblocks)
 		return -1;
 	return next;
 }
_

Anyway, I think that's the bug.  Or a bug, at least.  If so, the cause of
this bug is inadequate code commenting, pure and simple.  And ext3 and ext4
need fixing.

Hmm, maxblocks, in bitmap_search_next_usable_block(), is the end block number of the range to search, not the lengh of the range. maxblocks get passed to ext2_find_next_zero_bit(), where it expecting to take the _size_ of the range to search instead...

Something like this: (this is not a patch)
 @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
  	ext2_grpblk_t next;

  -  	next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start);
+ next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, start);
	if (next >= maxblocks)
  		return -1;
  	return next;
  }


Mingming



Yes, it's quite confusing and probably we should replace it a better name......

Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux