On Sat, 1 Apr 2006, Andrew Morton wrote:
> Andrea Arcangeli <[email protected]> wrote:
> >
> > The new PG_truncate bitflag can be used as well to eliminate the
> > truncate_count from all vmas etc... that would save substantial memory
> > and remove some complexity, truncate_count grown a lot since the time we
> > introduced it.
truncate_count is playing a useful role with the vma tree, allowing
us to find our place again after allowing preemption in. Though I'm
sometimes wondering whether just to make the vma _bigger_ and add a
list instead of that truncate_count: construct a list of vmas from
the prio_tree for unmap_mapping_range, so it can keep place in that,
instead of having to restart the tree whenever preempted. But I
think NFS doesn't hold i_sem across unmapping, so it wouldn't work.
> > The PG_truncate is needed as well because we can't know in do_no_page if
> > page->mapping is legitimate null or not (think bttv and other device
> > drivers returning page->mapping null because they're private but not
> > reserved pages etc..)
That part is easily dealt with, as Nick and I have suggested,
just by marking vmas liable to having their pages truncated.
But that's certainly no more than one part of a larger solution.
> The patch will pretty clearly fix that. But it really would be better to
> place all the cost over on the invalidate side, rather than adding overhead
> to the pagefault path.
I see ~2% slowdown in relevant faulting tests e.g. the lmbench ones.
I'd certainly prefer not to be locking pages here: or not without
word from the high-scalability people that Andrea's patch really in
practice doesn't hurt them (but the ~2% I see is on UP as much as MP).
> Could we perhaps check the page_count(page) and/or page_mapped(page) in
> invalidate_complete_page(), when we have tree_lock?
I'm thinking on it, but not finding it easy. And I've just realized
that invalidate_inode_pages2 isn't the only problematic case here
(though agreed the more urgent one): MADV_REMOVE likewise needs to
invalidate pages without adjusting i_size, so also cannot rely on
the truncate_count/i_size method.
> Do you have handy any test code which others can use to recreate this bug?
I'd be glad of that too.
Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]