Re: Very rare crash in prune_dcache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We've seen the below on at 2.6.5 kernel (SuSE SLES9) at SGI.
Does it look like your crash?

The panic is by kswapd0:

    <1>Unable to handle kernel NULL pointer dereference (address
0000000000000078)
    <4>kswapd0[122]: Oops 8813272891392 [1]

whose stack shows:
[<a0000001001cecf0>] clear_inode+0x1b0/0x2c0
[<a0000001001d03d0>] generic_drop_inode+0x3b0/0x400
[<a0000001001ccf30>] iput+0x130/0x1c0
[<a00000020b6f0cd0>] nfs_dentry_iput+0x170/0x1c0 [nfs]
[<a0000001001ca050>] prune_dcache+0x510/0x540
[<a0000001001ca0c0>] shrink_dcache_memory+0x40/0x80
[<a00000010014c360>] shrink_slab+0x2e0/0x440

Both generic_shutdown_super()'s calls to shrink_dcache_parent() or
shrink_dcache_anon(), and kswapd0's call to shrink_dcache_memory()
call prune_dcache().
I suspect a race condition inside prune_dcache().

The prune_dcache() function:
        lock dcache_lock
        scan the dentry_unused list of dentry's for a given number ("count") of
        dentry's to free:
                if a dentry to free, call prune_one_dentry()
                        dentry_iput()
                                unlock dcache_lock
                                iput() any associated inode
                        d_free() the dentry
                        lock dcache_lock
        unlock dcache_lock

Two processors entering prune_dcache() near the same time will both scan
the dentry_unused list and could try to iput() the same inode twice.  That is
because the dcache_lock is released while running iput().

I suppose the dcache_lock must be released here because the iput() may take
a long time.  And the dcache_lock is used many places in the system
to protect the dentry cache's lists.

It would seem to me that a straighforward fix would be to add another
lock to protect just the scan of the dentry_unused list only here in
prune_dcache()

-Cliff Wickman

On Mon, Dec 19, 2005 at 04:38:55PM -0500, John F Flynn III wrote:
> Good evening, folks...
> 
> We have been experiencing a very rare (on average once every two to 
> three months) crash on some of our servers.
> 
> uname -a:
> Linux cheetah 2.6.9-22.0.1.ELsmp #1 SMP Thu Oct 27 13:14:25 CDT 2005 
> i686 i686 i386 GNU/Linux
> 
> (This is a CentOS provided kernel)
> 
> Here is a photo of the bottom of the panic. Unfortunately the kernel has 
> no chance to log this anywhere else:
> 
> http://www.cs.fiu.edu/~flynnj/cheetah-crash.jpg
> 
> 
> The crash appears to be in prune_dcache, and has happened on several 
> distinct machines, so we do not believe it is a hardware problem.
> 
> If anyone has pointers on what bug could be causing this crash, or if 
> it's been fixed in newer kernels we could try, it would be greatly 
> appreciated. This only seems to happen on loaded production machines, 
> and it happens so rarely that more detailed debugging is nearly impossible.
> 
> Thanks in advance,
> -John Flynn
> 
> -- 
> John Flynn                              [email protected]
> =========================================================
> Systems and Network Administration             /\_/\
> School of Computer Science                    ( O.O )
> Florida International University               >   <
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Cliff Wickman
Silicon Graphics, Inc.
[email protected]
(651) 683-3824
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux