2.6.15.4: NFS-related BUG in radix_tree_tag_set()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

after a kernel upgrade from 2.6.11 to 2.6.15.4, we were experiencing
crashes on all four web servers.  These web servers obtain their data
from NFSv3 from a NetApp server.  The servers were under heavy load -
mostly reading, but also a lot of writing to NFS.

Hardware: Compaq ProLiant with two (physical) Xeon 2.4 CPUs, 4 GB
memory, Broadcom Tigon3 network interfaces.  Kernel config is appended
to this mail.

After one of the crashes, an administrator made a screenshot
(http://www.duempel.org/~max/linux/nfs_radix_tree_crash.png) and
rebooted.  Unfortunately, part of the stack trace is missing (25 lines
console only), and I had no access to the KDB console.  I am currently
waiting for the next crash to happen so I can provide more
information.

The BUG_ON() failed in lib/radix-tree.c:372 :

    slot = slot->slots[offset];
    BUG_ON(slot == NULL);

I believe the missing stack trace calls are nfs_mark_request_dirty(),
nfs_flush_one(), nfs_flush_list(), nfs_flush_inode().

That would mean that req->wb_index was somehow removed from
nfsi->nfs_page_tree, maybe in another thread on another CPU?  I see
the spinlock nfsi->req_lock is only held for very short timespans - is
it possible that another CPU tries to flush the same NFS write request
which is currently in the middle of being handled by the first CPU?
Any other explanation?

Max

Attachment: .config.gz
Description: Binary data


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux