Hi,

after a kernel upgrade from 2.6.11 to 2.6.15.4, we have been experiencing crashes on all four of our web servers. These web servers obtain their data over NFSv3 from a NetApp server. The servers were under heavy load: mostly reading, but also a lot of writing to NFS.

Hardware: Compaq ProLiant with two (physical) Xeon 2.4 CPUs, 4 GB memory, Broadcom Tigon3 network interfaces. The kernel config is attached to this mail.

After one of the crashes, an administrator took a screenshot (http://www.duempel.org/~max/linux/nfs_radix_tree_crash.png) and rebooted. Unfortunately, part of the stack trace is missing (the console shows only 25 lines), and I had no access to the KDB console. I am currently waiting for the next crash to happen so I can provide more information.

The BUG_ON() that failed is in lib/radix-tree.c:372:

    slot = slot->slots[offset];
    BUG_ON(slot == NULL);

I believe the calls missing from the stack trace are nfs_mark_request_dirty(), nfs_flush_one(), nfs_flush_list(), nfs_flush_inode(). That would mean that req->wb_index was somehow removed from nfsi->nfs_page_tree, maybe by another thread on another CPU?

I see that the spinlock nfsi->req_lock is only held for very short timespans. Is it possible that another CPU tries to flush the same NFS write request while the first CPU is still in the middle of handling it? Is there any other explanation?

Max
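To illustrate the failure mode suspected above, here is a minimal user-space sketch. It is not the kernel's radix tree: the toy_* names, the fixed height of 2 and the 4-slot fan-out are all assumptions made for illustration. It only shows that a tag-set walk runs into an empty slot as soon as the index is no longer present in the tree, which is the condition the quoted BUG_ON() checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy tree: 2 bits per level, fixed height 2, indices 0..15. */
#define TOY_SHIFT  2
#define TOY_SLOTS  (1u << TOY_SHIFT)
#define TOY_MASK   (TOY_SLOTS - 1)
#define TOY_HEIGHT 2

struct toy_node {
    void *slots[TOY_SLOTS]; /* child nodes, or items at the lowest level */
    unsigned tags;          /* bit i set: slot i was tagged */
};

struct toy_root {
    struct toy_node *rnode;
};

/* Insert a non-NULL item at index; returns 0 on success, -1 on error. */
static int toy_insert(struct toy_root *root, unsigned index, void *item)
{
    struct toy_node *top, *leaf;
    unsigned hi = (index >> TOY_SHIFT) & TOY_MASK;

    assert(item != NULL); /* NULL would be indistinguishable from "empty" */
    if (index >> (TOY_SHIFT * TOY_HEIGHT))
        return -1;
    if (root->rnode == NULL) {
        root->rnode = calloc(1, sizeof(*root->rnode));
        if (root->rnode == NULL)
            return -1;
    }
    top = root->rnode;
    if (top->slots[hi] == NULL) {
        top->slots[hi] = calloc(1, sizeof(struct toy_node));
        if (top->slots[hi] == NULL)
            return -1;
    }
    leaf = top->slots[hi];
    leaf->slots[index & TOY_MASK] = item;
    return 0;
}

/* Remove the item at index. Simplification: interior nodes are never
 * freed and tags in the top node are not cleared. */
static void toy_delete(struct toy_root *root, unsigned index)
{
    struct toy_node *leaf;

    if (root->rnode == NULL || (index >> (TOY_SHIFT * TOY_HEIGHT)))
        return;
    leaf = root->rnode->slots[(index >> TOY_SHIFT) & TOY_MASK];
    if (leaf == NULL)
        return;
    leaf->slots[index & TOY_MASK] = NULL;
    leaf->tags &= ~(1u << (index & TOY_MASK));
}

/* Walk down to index and tag every slot on the path. Returns 0 on success
 * and -1 where the kernel code quoted above would hit BUG_ON(slot == NULL). */
static int toy_tag_set(struct toy_root *root, unsigned index)
{
    struct toy_node *node = root->rnode;
    int level;

    if (node == NULL || (index >> (TOY_SHIFT * TOY_HEIGHT)))
        return -1;
    for (level = 0; level < TOY_HEIGHT; level++) {
        unsigned shift = (TOY_HEIGHT - 1 - level) * TOY_SHIFT;
        unsigned offset = (index >> shift) & TOY_MASK;
        void *next = node->slots[offset];

        if (next == NULL)
            return -1; /* index is not (or no longer) in the tree */
        node->tags |= 1u << offset;
        if (level < TOY_HEIGHT - 1)
            node = next;
    }
    return 0;
}
```

In this sketch, tagging an index right after inserting it succeeds, while tagging it after toy_delete() fails. If a second CPU removed the request from the tree between the lookup and the tag-set on the first CPU, the first CPU would see the same situation; whether that is what actually happens in the NFS code is exactly the open question.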
Attachment:
.config.gz
Description: Binary data