[PATCH] sector_t overflow in block layer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On May 18, 2006  11:23 +0100, Stephen C. Tweedie wrote:
> On Wed, 2006-05-17 at 17:58 -0600, Andreas Dilger wrote:
> > > the next question then is should we fail mounting of any >2TB fs
> > > w/o CONFIG_LBD or it would be better to mount IFF the fs has
> > > no allocated blocks beyond 2TB?
> > 
> > This could actually be the primary source of the > 2TB filesystem
> > corruption problem that I've seen reported occasionally.  The question
> > is if CONFIG_LBD isn't enabled will the kernel silently truncate
> > block-layer offsets?  
>
> It appears so, yes.  bread() goes through submit_bh(), which converts
> blocknr to sector with:
> 
> 	bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
> 
> without checking for overflow.
> 
> > This would seem to be a critical kernel bug that should be addressed in
> > the block subsystem to prevent use of block devices > 2TB if sector_t
> > is only 32 bits.
> 
> I think the arguments are a little less strong about restricting block
> device access in such cases, but certainly filesystem access needs to be
> properly checked.  We need to check the *filesystem* size, not the
> device size (they are usually the same but they don't strictly have to
> be), both on mount and on attempted resize.

At least the submit_bh() code should return an error in this case
regardless of who the caller is.  If ext[23] and other filesystems
check this themselves at mount time that is of course also prudent.

Patch has not been more than compile tested but it is a pretty serious
problem dating back a long time so I'm posting it for review anyways.

There are several ways to check for the 64-bit overflow, the efficiency
of which depends on the architecture.  We could alternately shift b_blocknr
by >> 41 or change the mask, for example.  The primary concern is really
the 32-bit overflow and resulting silent and massive data corruption
when CONFIG_LBD isn't enabled.

It is coincidental that mke2fs will clobber the primary group's metadata
(bitmaps, inode table) when formatting such a filesystem, but since this
data is identical at format time (group 0 offsets and group 16384 offsets
modulo 2TB are the same) it is very hard to detect before corruption hits.

==================== buffer_overflow.diff ===============================
--- ./fs/buffer.c.orig	2006-05-18 12:15:45.000000000 -0600
+++ ./fs/buffer.c	2006-05-18 12:09:20.000000000 -0600
@@ -2758,12 +2784,32 @@ static int end_bio_bh_io_sync(struct bio
 int submit_bh(int rw, struct buffer_head * bh)
 {
 	struct bio *bio;
+	unsigned long long sector;
 	int ret = 0;
 
 	BUG_ON(!buffer_locked(bh));
 	BUG_ON(!buffer_mapped(bh));
 	BUG_ON(!bh->b_end_io);
 
+	/* Check if we overflow sector_t when computing the sector offset.  */
+	sector = (unsigned long long)bh->b_blocknr * (bh->b_size >> 9);
+#if !defined(CONFIG_LBD) && BITS_PER_LONG == 32
+	if (unlikely(sector != (sector_t)sector))
+#else
+	if (unlikely(((bh->b_blocknr >> 32) * (bh->b_size >> 9)) >=
+		     0xffffffff00000000ULL))
+#endif
+	{
+		printk(KERN_ERR "IO past maximum addressable sector"
+#if !defined(CONFIG_LBD) && BITS_PER_LONG == 32
+		       "- CONFIG_LBD not enabled"
+#endif
+		       "\n");
+		buffer_io_error(bh);
+
+		return -EOVERFLOW;
+	}
+
 	if (buffer_ordered(bh) && (rw == WRITE))
 		rw = WRITE_BARRIER;
 
@@ -2780,7 +2828,7 @@ int submit_bh(int rw, struct buffer_head
 	 */
 	bio = bio_alloc(GFP_NOIO, 1);
 
-	bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+	bio->bi_sector = sector;
 	bio->bi_bdev = bh->b_bdev;
 	bio->bi_io_vec[0].bv_page = bh->b_page;
 	bio->bi_io_vec[0].bv_len = bh->b_size;


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux