Re: [rfc][patch] remove racy sync_page?

On Tue, 30 May 2006 10:08:06 +1000
Nick Piggin <[email protected]> wrote:

> > 
> > Try disabling kblockd completely, see what effect that has on performance.
> 
> Which is what I want to know. I don't exactly have an interesting
> disk setup.

You don't need one - just a single disk should show up such problems.  I
forget which workloads though.  Perhaps just a linear read (readahead
queues the I/O but doesn't unplug, subsequent lock_page() sulks).
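
A rough userspace sketch of that linear-read case, under stated assumptions:
the names toy_queue, toy_submit, toy_unplug and toy_wait_on_page are all
hypothetical, and the model ignores whatever else might unplug the queue
later.  It only shows why a waiter with no sync_page-style hook is left
waiting on I/O that was queued but never kicked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy request queue; field names are illustrative, not the block layer's. */
struct toy_queue {
    bool plugged;     /* requests are being held back */
    int  queued;      /* requests waiting behind the plug */
    int  dispatched;  /* requests handed to the driver */
};

/* Queue one request. An idle queue plugs itself so later requests can merge. */
static void toy_submit(struct toy_queue *q)
{
    if (q->queued == 0)
        q->plugged = true;
    q->queued++;
}

/* Release the plug and hand everything queued to the driver. */
static void toy_unplug(struct toy_queue *q)
{
    q->plugged = false;
    q->dispatched += q->queued;
    q->queued = 0;
}

/*
 * A waiter about to sleep on a locked page. With a sync_page-style hook it
 * unplugs first; without one it depends on someone else unplugging.
 * Returns true if the waiter's I/O has been dispatched.
 */
static bool toy_wait_on_page(struct toy_queue *q, bool have_sync_page)
{
    if (have_sync_page && q->plugged)
        toy_unplug(q);
    return q->queued == 0 && q->dispatched > 0;
}
```

In this model, two toy_submit() calls followed by
toy_wait_on_page(&q, false) leave both requests behind the plug, while
toy_wait_on_page(&q, true) dispatches them before the waiter sleeps.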

> >>Can we get rid of the whole thing, confusing memory barriers and all? Nobody
> >>uses anything but the default sync_page, and if block rq plugging is terribly
> >>bad for performance, perhaps it should be reworked anyway? It shouldn't be a
> >>correctness thing, right?
> > 
> > 
> > What this means is that it is not legal to run lock_page() against a
> > pagecache page if you don't have a ref on the inode.
> 
> Yes. So set_page_dirty_lock is broken, right?

yup.

> And the wait_on_page_stuff needs an inode ref.
> Also splice seems to have broken sync_page.

Please describe the splice() problem which you've observed.

> > 
> > iirc the main (only?) offender here is direct-io reads into MAP_SHARED
> > pagecache.  (And similar things, like infiniband and nfs-direct).
> 
> Well yes, writing to a page would be the main reason to set it dirty.
> Is splice broken as well? I'm not sure that it always has a ref on the
> inode when stealing a page.

Whereabouts?

> It sounds like you think fixing the set_page_dirty_lock callers wouldn't
> be too difficult? I wouldn't know (although the ptrace one should be
> able to be turned into a set_page_dirty, because we're holding mmap_sem).

No, I think it's damn impossible ;)

get_user_pages() has gotten us a random pagecache page.  How do we
non-racily get at the address_space prior to locking that page?

I don't think we can.

> You're sure about all other lock_page()rs? I'm not, given that
> set_page_dirty_lock got it so wrong. But you'd have a better idea than
> me.

No, I'm not sure.

However it is rare for the kernel to play with pagecache pages against
which the caller doesn't have an inode ref.  Think: how did the caller look
up that page in the first place, if not via the address_space?

- get_user_pages(): the current problem

- page LRU: OK, uses trylock first.

- pagetable walk??
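
The difference between the trylock case and the get_user_pages() case can
be sketched in userspace C.  This is a toy model, not kernel code: toy_page,
toy_mapping and the toy_* functions are hypothetical, and the "alive" flag
merely stands in for "the inode behind this mapping has not been freed".
The assumption it illustrates is that only the sleeping path of lock_page()
goes through sync_page and so has to dereference page->mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; not the real definitions. */
struct toy_mapping {
    bool alive;         /* false once the owning inode would have been freed */
    int  unplug_calls;  /* how often the sync_page-style hook ran */
};

struct toy_page {
    bool locked;
    struct toy_mapping *mapping;
};

enum toy_lock_result {
    TOY_LOCKED,             /* got the lock without sleeping */
    TOY_WOULD_SLEEP_SAFE,   /* must sleep; mapping was safe to touch */
    TOY_WOULD_SLEEP_UNSAFE  /* must sleep; mapping may already be gone */
};

/* Non-blocking attempt: never looks at page->mapping. */
static bool toy_trylock_page(struct toy_page *page)
{
    if (page->locked)
        return false;
    page->locked = true;
    return true;
}

/*
 * Blocking-style attempt. The fast path matches trylock; the slow path
 * models the sync_page call, which needs a mapping that is still valid.
 * In the toy the mapping is never really freed, so reading "alive" is safe
 * here; in a real kernel that read would itself be the bug.
 */
static enum toy_lock_result toy_lock_page(struct toy_page *page)
{
    if (toy_trylock_page(page))
        return TOY_LOCKED;

    struct toy_mapping *m = page->mapping;
    if (m == NULL || !m->alive)
        return TOY_WOULD_SLEEP_UNSAFE;

    m->unplug_calls++;
    return TOY_WOULD_SLEEP_SAFE;
}
```

A caller holding an inode ref keeps "alive" true for as long as it may
sleep, so it only ever sees the first two results.  A caller that got the
page from something like get_user_pages() has nothing that keeps the flag
true, which is the third result.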