On Friday May 12, [email protected] wrote:
> NeilBrown <[email protected]> wrote:
> >
> > ./drivers/md/bitmap.c | 115 ++----------------------------------------
>
> hmm. I hope we're not doing any of that filesystem I/O within the context
> of submit_bio() or kblockd or anything like that. Looks OK from a quick
> scan.
No. We do all the I/O from the context of the per-array thread.
However some IO requests cannot complete until the filesystem I/O
completes, so we need to be sure that the filesystem I/O won't block
waiting for memory, or fail with -ENOMEM.
>
> a_ops->commit_write() already ran set_page_dirty(), so you don't need that
> in there.
Is that documented somewhere..... but yes, that seems to be right. Thanks.
>
> I assume this always works in units of a complete page? It's strange to do
> prepare_write() followed immediately by commit_write(). Normally
> prepare_write() will do some prereading, but it's smart enough to not do
> that if the caller is preparing to write the whole page.
>
Yes, it is strange. That was one of the things that made me want to
review this code and figure out how to do it "properly".
As far as I can see, much of 'address_space' is really an internal
interface to support routines used by the filesystem. A filesystem
may choose to use address spaces, and has a fair degree of freedom
when it comes to which bits to make use of and exactly what they
mean.
About the only thing that *has* to be supported is ->writepages --
which has a fair degree of latitude in exactly what it does -- and
->writepage -- which can only be called after locking a page and
rechecking the ->mapping.
bitmap.c is currently trying to do something every different.
It uses ->readpage to get pages in the page cache (even though some
address spaces don't define ->readpage) and then holds onto those
pages without holding the page lock, and then calls ->writepage to
flush them out from time to time.
Before calling writepage it gets the pagelock, but doesn't re-check
that ->mapping is correct (there is nothing much it can do if it isn't
correct..).
I noticed this is particularly a problem with tmpfs. When you call
writepage on a tmpfs page, the page is swizzled into the swap cache,
and ->mapping becomes NULL - not the behaviour that bitmap is
expecting.
Now I agree that tmpfs is an unusual case, and that storing a bitmap
in tmpfs doesn't make a lot of sense (though it can make some...) but
the point is that if a filesystem is allowed to move pages around like
that, then bitmap cannot hold on to pages in the page cache like it
wants to. It simply isn't a well defined thing to do.
> We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE. Same diff.
>
Yeah.... why is that? Why have two names for exactly the same value?
How does a poor develop know when to use one and when the other? More
particularly, how does one remember.
I would argue that the "same diff" should be no difference - not even
in spelling....
> If you have a page and you want to write the whole thing out then there's
> really no need to run prepare_write or commit_write at all. Just
> initialise the whole page, run set_page_dirty() then write_one_page().
>
I see that now. But only after locking the page, and rechecking that
->mapping is correct, and if it isn't .... well, more work is involved
that bitmap is in a position to do.
> Perhaps it should check that the backing filesystem actually implements
> commit_write(), prepare_write(), readpage(), etc. Some might not, and the
> user will get taught not to do that via an oops.
>
Might help, but as I think you've gathered, I really want a whole
different approach to writing to the file. One that I can justify as
being a "correct" use of interfaces, and also that I can be certain
will not block or fail on a kmalloc or similar. Hence the bmap thing
later.
Thanks for your feedback.
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]