Re: [Ext2-devel] [PATCH 1/2] ext2/3: Support 2^32-1 blocks(Kernel)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 16, 2006 at 03:59:13PM -0700, Andreas Dilger wrote:
> While I agree with that in theory, in practise we never end up doing
> this and it just ends up delaying the acceptance of the trivial patches.
> It may also be a burden later on when some of the features that could
> be e.g. ROCOMPAT are bundled with an INCOMPAT change and we then
> make the filesystem gratuitously INCOMPAT.
> 
> In the end, I don't think having a couple of separate flags is any more
> effort than having a single one.  As yet we only have about 6 of each 32
> feature bits used, and if we get close to running out we can make an
> EXT3_FEATURE_{,RO,IN}COMPAT_NEXT_WORD flag to continue it on.

The overhead is not running out of feature bit flags.  After all, it's
easy to add more if we need to; we just define the MSB as meaning
"check the auxiliary features compat/rocompat/incompat mask", and then
define a new 32-bit extension bitmask in the superblock.

What I'm trying to simplify is the overhead of users trying to
understand a tangled mess of features, some compat, some incompat,
etc.

> Note that I'm not against this in practise, but I wouldn't hold up any
> feature for this reason.  How long has large i_blocks been pending,
> and usecond timestamps?  Many years already, even though they are trivial
> to implement, so I'm hesitant to tie them together and delay further.

i_blocks has been pending because the people who could push it haven't
had the time, and usec timestamps because the trivial way (without an
at least an ROCOMPAT flag).

> I think i_blocks can be considered an ROCOMPAT feature, and the large
> inode reservation for usecond timestamps could be COMPAT I think (since
> an unsupporting kernel would still update all the timestamps consistently
> even if the useconds on disk would be some constant instead of 0.

i_blocks can ROCOMPAT only if it is acceptable for stat(2) to return
erroneous i_blocks return values.  I'm not entirely convinced that's a
good thing, and at the very least it would be extremely confusing, but
maybe.

usecond timestamps must be at least ROCOMPAT, because of the
requirement that all newly created inodes must reserve extra space and
guarantee that i_extra_isize must be at least n bytes (where n is the
size of the guaranteed extra inode fields).  If you don't do that,
then when the filesystem is mounted one again on a kernel that does
understand usec timestamps, some inodes will have room for the usec
time fields, and other inodes won't (because they have too much of the
space used for EA's), and that will cause serious problems for make(1).

> I think testing with reiserfs showed that tail packing was a net loss in
> most cases, since basically every benchmark I've ever seen with reiserfs
> disables tail packing or suffers.  For space constrained systems (if
> there ever exists such a thing again ;-) it would probably be better to
> go to compressed files.

I have to wonder if that's because of the way reiserfs implemented
tail-packing more than anything else.  I don't belive fragments hurt
performance on UFS systems quite as much as it does on reiserfs
systems.  I'm not worried about this as much for space constrained
systems, but for cases where we find that increasing the blocksize to
8k or even larger (32k?  64k) really helps, but we don't want to pay
the internal fragmentation penalty for small files.  There are other
ways to solve the problem, yes, such as by assuming that we can use a
different filesystem for database or video streams separate from the
/, /usr, and/or /var filesystems, for example.  

If we are ready to forever forswear wanting to use large block sizes,
then maybe we don't need to worry about fragmentations support (or
maybe the 1.8" pedabyte disk drives will show up and be cheap enough
that we just won't care about wasting space on small files).  But
that's I think a decision which we need to formally make.

					- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux