Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

On Fri, 29 Dec 2006 18:32:07 -0500
Theodore Tso <tytso@mit.edu> wrote:

> On Fri, Dec 29, 2006 at 02:42:51PM -0800, Linus Torvalds wrote:
> > I think ext3 is terminally crap by now. It still uses buffer heads in 
> > places where it really really shouldn't, and as a result, things like 
> > directory accesses are simply slower than they should be. Sadly, I don't 
> > think ext4 is going to fix any of this, either.
> 
> Not just ext3; ocfs2 is using the jbd layer as well.  I think we're
> going to have to put this (a rework of jbd2 to use the page cache) on
> the ext4 todo list, and work with the ocfs2 folks to try to come up
> with something that suits their needs as well.  Fortunately we have
> this filesystem/storage summit thing coming up in the next few months,
> and we can try to get some discussion going on the linux-ext4 mailing
> list in the meantime.  Unfortunately, I don't think this is going to
> be trivial.

I suspect it would be insane to move any part of JBD (apart from the
ordered-data flush) to use pagecache.  The whole thing is fundamentally
block-based.  But only for metadata - there's no strong reason why ext3/4
needs to manipulate file data via buffer_heads if data=journal and chattr
+j aren't in use.

We could possibly move ext3/4 directories out of the blockdev pagecache and
into per-directory pagecache, but that wouldn't change anything - the
journalling would still be block-based.

Adam Richter spent considerable time a few years ago trying to make the
mpage code go direct-to-BIO in all cases and we eventually gave up.  The
conceptual layering of page<->blocks<->bio is pretty clean, and it is hard
and ugly to fully optimise away the "block" bit in the middle.
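
To make the layering concrete, here is a toy userspace sketch (not kernel
code, and not the mpage implementation) of why the "block" step is hard to
drop: each block of a page has its own disk location, and those locations
decide how many I/O requests the page needs.  The names model_extent and
blocks_to_extents are hypothetical, and an "extent" here only stands in for
a bio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bio: a run of physically contiguous blocks. */
struct model_extent {
	uint64_t start_block;	/* first disk block of the run */
	size_t nr_blocks;	/* number of blocks in the run */
};

/*
 * Walk the per-block disk locations of one page and merge physically
 * adjacent blocks into extents.  "out" must have room for nr_blocks
 * entries.  Returns the number of extents produced.
 */
static size_t blocks_to_extents(const uint64_t *disk_block, size_t nr_blocks,
				struct model_extent *out)
{
	size_t n = 0;

	assert(disk_block != NULL && out != NULL);

	for (size_t i = 0; i < nr_blocks; i++) {
		if (n > 0 &&
		    out[n - 1].start_block + out[n - 1].nr_blocks == disk_block[i]) {
			/* Block continues the previous run: extend it. */
			out[n - 1].nr_blocks++;
		} else {
			/* Discontiguous block: start a new run. */
			out[n].start_block = disk_block[i];
			out[n].nr_blocks = 1;
			n++;
		}
	}
	return n;
}
```

A page whose four blocks sit at disk blocks 100..103 collapses into one
extent, while 100, 101, 500, 501 needs two.  The page-to-bio step cannot be
done without consulting the per-block mapping somewhere.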

buffer_heads become more important with large PAGE_CACHE_SIZE.  I'd expect
nobh mode to be quite inefficient with some workloads on 64k pages.  We
need that representation of the state (and location) of the block-sized
hunks which make up the page.
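
A rough illustration of the state half of that argument, as a userspace toy
model rather than anything resembling struct buffer_head: with a 64k page
and an assumed 4k filesystem block size, compare how much a writeback has to
cover when only a page-level dirty flag is kept versus a per-block dirty
bit.  All names (model_page, model_write and so on) are hypothetical, and
real nobh behaviour is more involved than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		(64u * 1024u)	/* assumed 64k page */
#define MODEL_BLOCK_SIZE	(4u * 1024u)	/* assumed 4k fs block */
#define MODEL_BLOCKS_PER_PAGE	(MODEL_PAGE_SIZE / MODEL_BLOCK_SIZE)

struct model_page {
	uint32_t dirty_blocks;	/* bit n set: block n of the page is dirty */
	int page_dirty;		/* the only state a page-level scheme keeps */
};

/* Record a write of len bytes at byte offset "offset" within the page. */
static void model_write(struct model_page *p, size_t offset, size_t len)
{
	assert(p != NULL);
	if (len == 0 || offset >= MODEL_PAGE_SIZE)
		return;
	if (len > MODEL_PAGE_SIZE - offset)
		len = MODEL_PAGE_SIZE - offset;	/* clamp to the page */

	size_t first = offset / MODEL_BLOCK_SIZE;
	size_t last = (offset + len - 1) / MODEL_BLOCK_SIZE;

	for (size_t b = first; b <= last; b++)
		p->dirty_blocks |= (uint32_t)1 << b;
	p->page_dirty = 1;
}

/* Bytes to write back when per-block dirty state is available. */
static size_t writeback_bytes_per_block(const struct model_page *p)
{
	size_t bytes = 0;

	for (size_t b = 0; b < MODEL_BLOCKS_PER_PAGE; b++)
		if (p->dirty_blocks & ((uint32_t)1 << b))
			bytes += MODEL_BLOCK_SIZE;
	return bytes;
}

/* Bytes to write back when only the page-level flag is known. */
static size_t writeback_bytes_page_only(const struct model_page *p)
{
	return p->page_dirty ? MODEL_PAGE_SIZE : 0;
}
```

In this model a 100-byte write dirties one 4k block, so the per-block
scheme writes back 4k where the page-only scheme must write back the full
64k.  The gap widens as the page size grows relative to the block size.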

> If we do get this fixed for ext4, one interesting question is whether
> people would accept a patch to backport the fixes to ext3, given the
> grief this is causing the page I/O and VM routines.  OTOH, reiser3
> probably has the same problems, and I suspect the changes to ext3 to
> cause it to avoid buffer heads, especially in order to support
> filesystem blocksizes < pagesize, are going to be sufficiently risky
> in terms of introducing regressions to ext3 that they would probably
> be rejected on those grounds.  So unfortunately, we probably are going
> to have to support flushes via buffer heads for the foreseeable
> future.

We'll see.
