Re: 2.6.19 xfs crash spawns tcp reclen errors off nfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 20, 2006 at 02:55:06PM -0600, Mr. Berkley Shands wrote:
> We have had several really strange NFS issues, reported out of
> net/sunrpc/svcsock.c in
> 
> static int
> svc_tcp_recvfrom(struct svc_rqst *rqstp)
> 
>        svsk->sk_reclen &= 0x7fffffff;
>        dprintk("svc: TCP record, %d bytes\n", svsk->sk_reclen);
>        if (svsk->sk_reclen > serv->sv_bufsz) {
>            printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
>                   (unsigned long) svsk->sk_reclen);
>            printk(KERN_NOTICE "sockaddr_in 0x%04x port 0x%04x max 
> 0x%08lx\n",
>               rqstp->rq_addr.sin_addr.s_addr,
>               rqstp->rq_addr.sin_port, (long) serv->sv_bufsz);
>            goto err_delete;
>        }
> 
> The size reported grows, and grows, and...
> 
> the server runs 2.6.19 when XFS has its issues (as follows)
> 2TB partition, NFS exported. All machines are x86_64 AMD 275's with 16GB
> and running redhat AS 4.4 (Centos 4.4)
> 
> a typical crash under 2.6.19 is
> 
> Dec 14 09:13:48 sanandreas kernel: Filesystem "sdb1": XFS internal error 
> xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 
> 0xffffffff881ceaa8

Hmmmm - that is showing up up a lot more frequently in 2.6.18 and 2.6.19
that we've ever seen before. Kind of strange given we haven't
changed anything the create transaction code for quite some time.

However, a machine named "sanandreas" is bound to go kaboom
at some point in time ;)

> this then causes the heavy write clients to write garbage NFS request that
> grow...

XFS will be returning an EUCLEAN or EIO to indicate the filesystem
has been shut down to all these write calls. Sounds like the clients
aren't handling the errors properly....

> Neil Brown indicated that this NFS packet growth should have been fixed in
> 2.6.18-rc5, but it still happens, easily triggered by the XFS shutdown.
> the packet sizes reported can grow to > 1GB.
> Dropping back to 2.6.18.1 stops the XFS crashing, and hence stops the 
> NFS mess.

Do you have any idea of the workload that is triggering the XFS shutdown?

We need more error reporting traps in the create transaction so we can
track this down. As a first pass to narrowing down which code path
is failing, can you run this quick patch - we should get an extra line
in the kernel output just before the shutdown indicating what the error
was and which branch returned it.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/xfs_vnodeops.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c	2006-12-12 12:05:21.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c	2006-12-21 10:40:22.341657119 +1100
@@ -1932,6 +1932,7 @@ xfs_create(
 	error = xfs_trans_reserve(tp, resblks, XFS_CREATE_LOG_RES(mp), 0,
 			XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
 	if (error == ENOSPC) {
+		printk(KERN_WARNING "xfs_create: resv at enospc\n");
 		resblks = 0;
 		error = xfs_trans_reserve(tp, 0, XFS_CREATE_LOG_RES(mp), 0,
 				XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
@@ -1962,6 +1963,7 @@ xfs_create(
 			rdev, credp, prid, resblks > 0,
 			&ip, &committed);
 	if (error) {
+		printk(KERN_WARNING "xfs_create: dir_ialloc error %d\n", error);
 		if (error == ENOSPC)
 			goto error_return;
 		goto abort_return;
@@ -1991,6 +1993,7 @@ xfs_create(
 					resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
 	if (error) {
 		ASSERT(error != ENOSPC);
+		printk(KERN_WARNING "xfs_create: dir_createname error %d\n", error);
 		goto abort_return;
 	}
 	xfs_ichgtime(dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
@@ -2025,6 +2028,7 @@ xfs_create(
 	error = xfs_bmap_finish(&tp, &free_list, first_block, &committed);
 	if (error) {
 		xfs_bmap_cancel(&free_list);
+		printk(KERN_WARNING "xfs_create: xfs_bmap_finish error %d\n", error);
 		goto abort_rele;
 	}
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux