Zach Brown <[email protected]> wrote:
>
> > So where is the lock inversion?
> >
> > Perhaps if you were to cook up one of those little threadA/threadB ascii
> > diagrams we could see where the inversion occurs?
>
> Yeah, let me give that a try. I'll try to trim it down to the relevant
> bits. First let's start with a totally fresh node and have a read get a
> read DLM lock and populate the page cache on this node:
>
> sys_read
>  generic_file_aio_read
>   ocfs2_readpage
>    ocfs2_data_lock
>    block_read_full_page
>    ocfs2_data_unlock
>
> So it was only allowed to proceed past ocfs2_data_lock() towards
> block_read_full_page() once the DLM granted it a read lock. When it
> calls ocfs2_data_unlock() it is only dropping this caller's local
> reference on the lock. The lock still exists on that node and is still
> valid and holding data in the page cache until the node gets a network
> message saying that another node, which is probably going to be
> writing, would like the lock dropped.
>
> DLM kernel threads respond to the network messages and truncate the page
> cache. While the thread is busy with this inode's lock, other paths on
> that node won't be able to get locks. Say one of those messages arrives.
> While a local DLM thread is invalidating the page cache another user
> thread tries to read:
>
> user thread                           dlm thread
>
>                                       kthread
>                                        ...
>                                         ocfs2_data_convert_worker
I assume there's an ocfs2_data_lock
hereabouts?
>                                          truncate_inode_pages
> sys_read
>  generic_file_aio_read
>   * gets page lock
>   ocfs2_readpage
>    ocfs2_data_lock
>     (stuck waiting for dlm)
>                                            lock_page
>                                             (stuck waiting for page)
>
Why does ocfs2_readpage() need to take ocfs2_data_lock? (Is
ocfs2_data_lock a range-based read-lock thing, or what?)
> The user task holds a page lock while waiting for the DLM to allow it to
> proceed. The DLM thread is preventing lock granting progress while
> waiting for the page lock that the user task holds.
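Stripped of the filesystem details this is the classic AB-BA inversion. A toy
userspace model (purely illustrative, with two mutexes standing in for the
page lock and the DLM grant) shows the ordering:

#include <pthread.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;	/* "A" */
static pthread_mutex_t dlm_grant = PTHREAD_MUTEX_INITIALIZER;	/* "B" */

/* read path: takes the page lock, then waits for the DLM */
static void *user_thread(void *unused)
{
	pthread_mutex_lock(&page_lock);
	pthread_mutex_lock(&dlm_grant);
	pthread_mutex_unlock(&dlm_grant);
	pthread_mutex_unlock(&page_lock);
	return NULL;
}

/* downconvert worker: holds up DLM grants, then needs the page lock */
static void *dlm_thread(void *unused)
{
	pthread_mutex_lock(&dlm_grant);
	pthread_mutex_lock(&page_lock);
	pthread_mutex_unlock(&page_lock);
	pthread_mutex_unlock(&dlm_grant);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, user_thread, NULL);
	pthread_create(&b, NULL, dlm_thread, NULL);

	/* with unlucky timing neither join ever returns -- that's the hang */
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}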
>
> I don't know how far to go in explaining what leads up to laying out the
> locking like this. It is typical (and OCFS2 used to do this) to wait
> for the DLM locks up in file->{read,write} and pin them for the duration
> of the IO. This avoids the page lock and DLM lock inversion problem,
> but it suffers from a host of other problems -- most fatally needing
> that vma walking to govern holding multiple DLM locks during an IO.
Oh.
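For reference, the old scheme presumably looked something like this -- a
sketch only, with the entry point and lock arguments simplified:

#include <linux/fs.h>

static ssize_t ocfs2_file_read_sketch(struct file *file, char __user *buf,
				      size_t count, loff_t *ppos)
{
	struct inode *inode = file->f_mapping->host;
	ssize_t ret;

	/* cluster lock taken before any page lock, held for the whole IO */
	ret = ocfs2_data_lock(inode, 0);
	if (ret)
		return ret;

	ret = generic_file_read(file, buf, count, ppos);

	ocfs2_data_unlock(inode, 0);
	return ret;
}

That keeps the ordering consistent (DLM lock outside page locks), at the cost
of the vma-walking and multiple-lock problems you mention.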
Have you considered using invalidate_inode_pages() instead of
truncate_inode_pages()? If that leaves any pages behind, drop the read
lock, sleep a bit, try again - something klunky like that might get you out
of trouble, dunno.
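i.e. something along these lines in the downconvert worker (very rough, and
the helper name here is made up):

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/delay.h>

static void ocfs2_downconvert_data_sketch(struct inode *inode)
{
	struct address_space *mapping = inode->i_mapping;

	for (;;) {
		/* skips locked/busy pages instead of sleeping on them */
		invalidate_inode_pages(mapping);

		if (mapping->nrpages == 0)
			break;	/* everything gone, safe to let the lock go */

		/*
		 * Pages left behind: in the real thing you'd drop the read
		 * lock or requeue the worker here rather than sleep inline,
		 * but the idea is the same -- back off and retry.
		 */
		msleep(10);
	}
}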