Re: PREEMPT_RCU breaks anon_vma locking ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 24 Feb 2007, Oleg Nesterov wrote:

> If my understanding correct, vmscan can find a page which lives in a already
> anon_vma_unlink'ed vma. This is ok, the page is pinned, and page->mapping is
> not cleared until free_hot_cold_page().

That's about right.  The page_mapped checks, at several levels, make
it very hard to hit this case in practice; but it is possible for the
page to be unmapped and the anon_vma unlinked and kmem_cache_freed
while vmscan is racing through page_lock_anon_vma.

> 
> So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if
> anon_vma_unlink() has already freed anon_vma. In that case we should see
> list_empty(&anon_vma->head), we are safe.

(It doesn't affect your argument, but we won't necessarily see list_empty
there: the anon_vma slot may already have got reused for a different
bundle of vmas completely; but its lock remains a lock and its list
remains a list of vmas, and the worst that happens is that
page_referenced_anon or try_to_unmap_anon wanders through an irrelevant
bundle of vmas, looking for a page that cannot be found there.)

> 
> However, we are doing spin_unlock(anon_vma->lock) after page_lock_anon_vma(),
> and this looks unsafe to me because page_lock_anon_vma() does rcu_read_unlock()
> on return.
> 
> This worked before because spin_lock() implied rcu_read_lock(), so rcu was
> blocked if page_lock_anon_vma() returns !NULL. With CONFIG_PREEMPT_RCU this
> is not true (yes?), so it is possible that the slab returns the memory to
> the system and it is re-used when we write to anon_vma->lock.

I had been meaning to take a fresh look at page_lock_anon_vma, to see
whether recent RCU developments in -mm affected it.  Thanks for saving
me the trouble.

You and Paul know infinitely more about CONFIG_PREEMPT_RCU than I do,
so if you believe that the change below is enough, that's great, it's
much simpler than I'd feared might be needed.  CONFIG_PREEMPT_RCU
rather scares me, but perhaps it's less worrying than I'd imagined.

Have you checked through the SLAB_DESTROY_BY_RCU end in slab.c?
Is what that's doing still valid?

> 
> IOW, don't we need something like this
> 
> 	static struct anon_vma *page_lock_anon_vma(struct page *page)
> 	{
> 		struct anon_vma *anon_vma;
> 		unsigned long anon_mapping;
> 
> 		rcu_read_lock();
> 		anon_mapping = (unsigned long) page->mapping;
> 		if (!(anon_mapping & PAGE_MAPPING_ANON))
> 			goto out;
> 		if (!page_mapped(page))
> 			goto out;
> 
> 		anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> 		spin_lock(&anon_vma->lock);
> 		return anon_vma;
> 
> 	out:
> 		rcu_read_unlock();
> 		return NULL;
> 	}
> 
> 	static inline void page_lock_anon_vma(struct anon_vma *anon_vma)

It might be wiser to call that one page_unlock_anon_vma ;)

(I'm slightly disgruntled that page_lock_anon_vma takes a struct page *,
but page_unlock_anon_vma no struct page *.  But it would be silly to do
it differently, or mess with the naming: besides, it's a static function
and the prototype guards against error anyway.)

> 	{
> 		spin_unlock(&anon_vma->lock);
> 		rcu_read_unlock();
> 	}
> ?

Looks fine to me: please go ahead a make a patch for -mm, Oleg: thanks.

I've CC'ed Christoph for several reasons.

One, I think he was trying to persuade me a year ago to change
page_lock_anon_vma as you're now proposing; but I resisted, preferring
to keep the RCU stuff self-contained within page_lock_anon_vma: let's
credit him for prescience, and admit you've proved me wrong.

Two, in his SLUB thread it sounds like he's toying with mixing up kmem
caches of similar sizes: that seems a really bad idea to me (for reasons
Andi and others have made) and I expect it'll get dropped; but in case
not, let's note that the anon_vma cache is one which would fare very
badly from getting mixed in with others (lock would no longer remain
a lock, list would no longer remain a list, across that race).

Three, he has a recurring interest in this unusual page_lock_anon_vma,
mainly for page migration reasons, so let's keep him informed.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux