Re: [RFC] page lock ordering and OCFS2

Zach Brown <[email protected]> wrote:
>
> > So where is the lock inversion?
> > 
> > Perhaps if you were to cook up one of those little threadA/threadB ascii
> > diagrams we could see where the inversion occurs?
> 
> Yeah, let me give that a try.  I'll try to trim it down to the relevant
> bits.  First let's start with a totally fresh node and have a read get a
> read DLM lock and populate the page cache on this node:
> 
>  sys_read
>    generic_file_aio_read
>      ocfs2_readpage
>        ocfs2_data_lock
>        block_read_full_page
>        ocfs2_data_unlock
> 
> So it was only allowed to proceed past ocfs2_data_lock() towards
> block_read_full_page() once the DLM granted it a read lock.  When it
> calls ocfs2_data_unlock() it is only dropping this caller's local
> reference on the lock.  The lock still exists on that node, and is still
> valid and holding data in the page cache, until a network message
> arrives saying that another node, probably one that wants to write,
> would like the lock dropped.
> 
> DLM kernel threads respond to these network messages and truncate the
> page cache.  While a thread is busy with this inode's lock, other paths
> on that node won't be able to get locks.  Say one of those messages
> arrives.  While a local DLM thread is invalidating the page cache,
> another user thread tries to read:
> 
> user thread				dlm thread
> 
> 
> 					kthread
> 					...
> 					ocfs2_data_convert_worker

                                        I assume there's an ocfs2_data_lock
                                        hereabouts?

> 					  truncate_inode_pages
> sys_read
>   generic_file_aio_read
>     * gets page lock
>     ocfs2_readpage
>       ocfs2_data_lock
>         (stuck waiting for dlm)
> 					    lock_page
> 					      (stuck waiting for page)
> 

Why does ocfs2_readpage() need to take ocfs2_data_lock?  (Is
ocfs2_data_lock a range-based read-lock thing, or what?)

> The user task holds a page lock while waiting for the DLM to allow it to
> proceed.  The DLM thread is preventing lock granting progress while
> waiting for the page lock that the user task holds.
> 
> I don't know how far to go in explaining what leads up to laying out the
> locking like this.  It is typical (and OCFS2 used to do this) to wait
> for the DLM locks up in file->{read,write} and pin them for the duration
> of the IO.  This avoids the page lock and DLM lock inversion problem,
> but it suffers from a host of other problems -- most fatally, needing
> that vma walking to govern holding multiple DLM locks during an IO.

Oh.

Have you considered using invalidate_inode_pages() instead of
truncate_inode_pages()?  If that leaves any pages behind, drop the read
lock, sleep a bit, try again - something klunky like that might get you out
of trouble, dunno.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
