Re: [BUG] Lockdep recursive locking in kmem_cache_free

On Fri, Jul 28, 2006 at 07:53:56AM -0700, Christoph Lameter wrote:
> On Fri, 28 Jul 2006, Pekka Enberg wrote:
> 
> > > [   57.976447]  [<ffffffff802542fc>] __lock_acquire+0x8cc/0xcb0
> > > [   57.976562]  [<ffffffff80254a02>] lock_acquire+0x52/0x70
> > > [   57.976675]  [<ffffffff8028f201>] kmem_cache_free+0x141/0x210
> > > [   57.976790]  [<ffffffff804a6b74>] _spin_lock+0x34/0x50
> > > [   57.976903]  [<ffffffff8028f201>] kmem_cache_free+0x141/0x210
> > > [   57.977018]  [<ffffffff8028f388>] slab_destroy+0xb8/0xf0
> 
> Huh? _spin_lock calls kmem_cache_free?
> 
> >  cache_reap
> >  reap_alien	(grabs l3->alien[node]->lock)
> >  __drain_alien_cache
> >  free_block
> >  slab_destroy	(slab management off slab)
> >  kmem_cache_free
> >  __cache_free
> >  cache_free_alien (recursive attempt on l3->alien[node] lock)
> > 
> > Christoph?
> 
> This should not happen. __drain_alien_cache frees node local elements
> thus cache_free_alien should not be called. However, if the slab 
> management was allocated on a different node from the slab data then we 
> may have an issue. However, both slab management and the slab data are 
> allocated on the same node (with alloc_pages_node() and kmalloc_node).

cache_free_alien could get called, but there is no recursion here:

1. reap_alien tries to drop remote objects freed by the local node (A) into the
shared array cache of the remote node (B), choosing the remote node as indicated
by the node rotor.  To do this, it takes the local alien cache lock on (A) and
calls __drain_alien_cache.  Say the remote objects come from a slab cache X.

2. __drain_alien_cache takes the remote node's l3 lock (B), transfers as many
objects as the shared array cache of the remote node can hold, and calls
free_block to free the remaining objects that could not be dropped into the
shared array cache of the remote node (B).  Now free_block is being called from
(A) to free objects on (B).

3. free_block calls slab_destroy for the slab belonging to B, which calls
kmem_cache_free for the slab management, which calls __cache_free, and
hence cache_free_alien().  Now since this is being called from A for a local
object of B, the check in cache_free_alien for whether the object is local to
the freeing node fails, and cache_free_alien *does* get executed.  Since the
slab management of a slab from B, itself local to B, is being freed from A, A
tries to write to its local alien cache corresponding to B, and that alien
cache belongs to some slab cache Y (the one the slab management was allocated
from).  There is a recursion only if X and Y are the same cache.  But that is
not a possibility at all, as the off slab management for a slab cache cannot
come from that same slab cache.  So this looks like a false positive from
lockdep.
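
To make the distinction in steps 1-3 concrete, here is a small userspace
sketch, not kernel code.  It assumes that lockdep compares locks by class
(locks initialised at the same site share a class) rather than by instance,
and that the alien cache locks of all slab caches land in one class.  All
names here (toy_lock, held_stack, replay_chain, the class ids) are made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 8

/* Hypothetical class ids; the assumption is that every alien cache lock
 * shares one class, and the per-node l3 lock has a different one. */
enum { CLASS_ALIEN = 1, CLASS_L3_LIST = 2 };

/* A toy lock: class_id models the lock class, the address is the instance. */
struct toy_lock {
    int class_id;
};

/* Locks currently held by the (single) modelled CPU on node A. */
struct held_stack {
    const struct toy_lock *held[MAX_HELD];
    size_t depth;
};

enum verdict { NO_CONFLICT, SAME_CLASS_ONLY, SAME_INSTANCE };

/* Compare a lock about to be taken against everything already held. */
static enum verdict check_acquire(const struct held_stack *s,
                                  const struct toy_lock *l)
{
    enum verdict v = NO_CONFLICT;

    for (size_t i = 0; i < s->depth; i++) {
        if (s->held[i] == l)
            return SAME_INSTANCE;       /* a real self-deadlock */
        if (s->held[i]->class_id == l->class_id)
            v = SAME_CLASS_ONLY;        /* what a class-based checker flags */
    }
    return v;
}

static void acquire(struct held_stack *s, const struct toy_lock *l)
{
    assert(s->depth < MAX_HELD);
    s->held[s->depth++] = l;
}

/* Replay steps 1-3 as seen from node A.  y_is_x != 0 models the impossible
 * case where the slab management comes from the same cache as the objects. */
static enum verdict replay_chain(int y_is_x)
{
    struct toy_lock x_alien_for_b = { CLASS_ALIEN };   /* cache X, alien[B] */
    struct toy_lock y_alien_for_b = { CLASS_ALIEN };   /* cache Y, alien[B] */
    struct toy_lock x_l3_on_b     = { CLASS_L3_LIST }; /* cache X, l3 on B  */
    struct held_stack held = { { NULL }, 0 };

    acquire(&held, &x_alien_for_b);  /* step 1: reap_alien */
    acquire(&held, &x_l3_on_b);      /* step 2: __drain_alien_cache */

    /* step 3: cache_free_alien for the slab management object */
    return check_acquire(&held, y_is_x ? &x_alien_for_b : &y_alien_for_b);
}
```

In this model, replay_chain(0) (X and Y distinct) yields SAME_CLASS_ONLY:
two different lock instances that merely share a class, which is the false
positive case.  Only replay_chain(1) yields SAME_INSTANCE, i.e. a genuine
recursion.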

tglx, does the machine boot without lockdep?  If yes, then this is a false 
positive IMO.

Thanks,
Kiran