Re: Proposal and plan for ext2/3 future development work

On Fri, 2006-06-30 at 11:24 -0700, Joel Becker wrote:
> On Fri, Jun 30, 2006 at 10:13:06AM -0700, Badari Pulavarty wrote:
> > I tried adding "delayed allocation" for ext3 earlier. Yes. VFS level
> > infrastructure would be nice. But, I haven't found much that we can
> > do at VFS - which is common across all the filesystems (except
> > mpage_writepage(s) handling). Most of the stuff is specific to 
> > filesystem implementation (even though it could be common) - coming
> > out with VFS level interfaces to suit all the different filesystem
> > delalloc would be an *interesting* exercise.
> 
> 	Well, to be fair, I'm just going by what little I know about
> XFS.  They maintain a cache of all pages waiting on delayed allocation
> for writeback.  Why have this entire cache (hash, list, whatever) when
> we could create some state in the pagecache?  We save a large chunk
> of memory and some complex writeback code.  I suspect you were thinking
> of this when you said "mpage_writepage(s) handling".  But this is a
> large complexity win if we can do it.
> 	The same with metadata/data ordering issues.  ie, data=ordered
> or even plain "creat(2); write(2)".  I don't know how generic the
> ordering is for each filesystem, but there is always room for play.
> 	On-disk, of course each filesystem is going to be different.
> I'm not sure we could fit a fully-generic aops->reserve_space() &
> aops->commit_space() API.  But I don't think we need to.

Unfortunately, I haven't looked at the XFS delalloc implementation in detail
to understand exactly what they would need from VFS (or what could be pushed
to VFS). I purely tried to work with the current ext3 code and current VFS
support. What I found is that -

1) Instead of allocating a block at prepare time, we need to be able to
"reserve" a block (so it won't fail as part of writeback). And, as
part of writeback - we need a way to figure out if a given page did
indeed reserve a block (we need to make sure the allocation
succeeds for those). We might need a page flag for this (but I haven't
decided that it's absolutely needed).

2) We need a way to cluster a bunch of (contiguous) pages and allocate disk
blocks for those in a single shot - which is NOT a direct delalloc
requirement, but that is the whole reason for doing delalloc.
(Suprana did a few radix_tree interfaces for this.)
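
To make points 1) and 2) concrete, here is a rough user-space sketch of
the idea. It is not ext3 or VFS code: every name in it (toy_fs, toy_page,
toy_reserve, toy_alloc_extent, toy_writepages) is hypothetical, the
"reserved" field only stands in for the possible page flag, and the
allocator is a trivial bump counter. It assumes one block per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy filesystem state. All names are hypothetical, not kernel APIs. */
struct toy_fs {
    unsigned long free_blocks;     /* neither reserved nor allocated */
    unsigned long reserved_blocks; /* promised to dirty pages, not yet placed */
    unsigned long next_block;      /* bump allocator; start at 1, 0 = unmapped */
    unsigned long alloc_calls;     /* number of allocator invocations */
};

struct toy_page {
    bool dirty;
    bool reserved;       /* stands in for the page flag from point 1) */
    unsigned long block; /* 0 means no disk block assigned yet */
};

/* Prepare time: only reserve space, so "out of space" is reported here. */
static int toy_reserve(struct toy_fs *fs, struct toy_page *pg)
{
    if (pg->block == 0 && !pg->reserved) {
        if (fs->free_blocks == 0)
            return -1; /* would be ENOSPC; the page is left untouched */
        fs->free_blocks--;
        fs->reserved_blocks++;
        pg->reserved = true;
    }
    pg->dirty = true;
    return 0;
}

/* Allocate `count` contiguous blocks in one call. Because the space was
 * reserved earlier, this step is not allowed to fail. */
static unsigned long toy_alloc_extent(struct toy_fs *fs, size_t count)
{
    unsigned long start = fs->next_block;

    assert(fs->reserved_blocks >= count);
    fs->reserved_blocks -= count;
    fs->next_block += count;
    fs->alloc_calls++;
    return start;
}

/* Writeback: cluster each run of contiguous dirty+reserved pages and
 * allocate the whole run in a single shot. Returns pages written. */
static size_t toy_writepages(struct toy_fs *fs, struct toy_page *pages, size_t n)
{
    size_t written = 0;
    size_t i = 0;

    while (i < n) {
        if (!pages[i].dirty) {
            i++;
            continue;
        }
        if (!pages[i].reserved) {
            /* Already mapped to disk: nothing to allocate. */
            pages[i].dirty = false;
            written++;
            i++;
            continue;
        }
        size_t end = i;
        while (end < n && pages[end].dirty && pages[end].reserved)
            end++;
        unsigned long start = toy_alloc_extent(fs, end - i);
        for (size_t j = i; j < end; j++) {
            pages[j].block = start + (j - i);
            pages[j].reserved = false;
            pages[j].dirty = false;
            written++;
        }
        i = end;
    }
    return written;
}
```

With four free blocks and reservations on pages 0, 1, 2 and 5, a fifth
toy_reserve() call fails up front, and toy_writepages() then makes two
allocator calls (one per contiguous run) instead of four.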

Other than these general VFS level ones - I had to play with journal
lock ordering issues (very specific to ext3 stuff). To work around
the journalling issues, I had to do my own mpage_writepages() since 
the changes I need are specific to ext3 journalling - I am not sure
if they are going to be useful for other filesystems or not.

If you can think of general infrastructure you need for OCFS2, please
let me know - we can come up with something common.


Thanks,
Badari

