On Jun 09, 2006 20:26 -0700, Valerie Henson wrote:
> To be honest, continuation inodes and these ext3 patches are
> addressing different problems. ext3 48-bit extents are an advanced
> solution to a complex problem - growing ext3 beyond 8TB while keeping
> as much of the existing on-disk format and associated stable code as
> possible.
The 48-bit support was acutally only a small of the originalreason for
extents, while it seems to be the most popular right now. The other
issues that are being addressed are:
- performance issues like avoiding 0.1%+ indirect block metadata overhead
for each file which is bad for the cache, and also hurts unlinks)
- the extent index blocks are also more robust than indirect blocks (they
have a magic and internally verifiable structure, and the possibility
to easily add metadata checksums and extent->inode backpointers to
allow improved filesystem checking). With large ext3 filesystems the
{d,t,}indirect blocks can have random garbage in them and there is no
way for the kernel to know unless it overlaps with other fixed metadata
- the ability to do things like preallocation of files efficiently (via
uninitialized extents), instead of zero-filling the whole file.
> Continuation inodes/chunkfs are an idea Arjan and I came up with,
> inspired loosely by the ext2 dirty bit code. The problem we were
> trying to solve is how to isolate the effects of file system
> corruption (from crash, bug, or I/O error) so that we didn't have to
> run fsck over the entire file system in order to repair it.
I think this is a great idea, and one that is very similar to what
we are doing with ext3 filesystems in Lustre. There is definitely
a desire to harden the ext3 code in many ways against such failures,
and being able to check independent parts of the filesystem is a
very desirable part of this.
> The solution we came up with is to create a "continuation inode" in
> every file system chunk which contains data for a particular file or
> directory. For example, if file "foo" has its inode in chunk A, and
> some file data in chunk B, we would create a continuation inode in
> chunk B. The continuation inode has a back pointer to the parent
> inode.
This needs some extra data in the directory entry, which I've already
been thinking about for ext3, so if you are looking at implementing
this for ext3 I'd be happy to share some ideas.
> One interesting possibility would be to combine this with the ext2
> dirty bit patches.
Put on your asbestos vest before suggesting any changes to ext2 :-).
> If we implement chunkfs on top of this, we could get away with fsck'ing
> only a few of the file systems each time, getting ext2-style performance
> with ext3-style fast recovery.
While fast recovery is one aspect of ext3 journaling, the other one
is that this allows multiple filesystem changes to be made atomically
and they are rolled back as a set if the system crashes in the middle.
> We'll talk about it more next week, I hope.
I look forward to it.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]