We have had several really strange NFS issues, reported out of net/sunrpc/svcsock.c in static int svc_tcp_recvfrom(struct svc_rqst *rqstp) svsk->sk_reclen &= 0x7fffffff; dprintk("svc: TCP record, %d bytes\n", svsk->sk_reclen); if (svsk->sk_reclen > serv->sv_bufsz) { printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n", (unsigned long) svsk->sk_reclen);printk(KERN_NOTICE "sockaddr_in 0x%04x port 0x%04x max 0x%08lx\n",
rqstp->rq_addr.sin_addr.s_addr, rqstp->rq_addr.sin_port, (long) serv->sv_bufsz); goto err_delete; } The size reported grows, and grows, and... the server runs 2.6.19 when XFS has its issues (as follows) 2TB partition, NFS exported. All machines are x86_64 AMD 275's with 16GB and running redhat AS 4.4 (Centos 4.4) a typical crash under 2.6.19 isDec 14 09:13:48 sanandreas kernel: Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xffffffff881ceaa8
Dec 14 09:13:48 sanandreas kernel: Dec 14 09:13:48 sanandreas kernel: Call Trace:Dec 14 09:13:48 sanandreas kernel: [<ffffffff881c79e9>] :xfs:xfs_trans_cancel+0x5f/0xfa Dec 14 09:13:48 sanandreas kernel: [<ffffffff881ceaa8>] :xfs:xfs_create+0x62e/0x686
Dec 14 09:13:48 sanandreas kernel: [<ffffffff802f62be>] __up_read+0x10/0x8aDec 14 09:13:48 sanandreas kernel: [<ffffffff881d8730>] :xfs:xfs_vn_mknod+0x1a2/0x3c5 Dec 14 09:13:48 sanandreas kernel: [<ffffffff804440a6>] _spin_lock_irqsave+0x9/0xe
Dec 14 09:13:48 sanandreas kernel: [<ffffffff802f62be>] __up_read+0x10/0x8aDec 14 09:13:48 sanandreas kernel: [<ffffffff881c8413>] :xfs:xfs_trans_unlocked_item+0x22/0x3c Dec 14 09:13:48 sanandreas kernel: [<ffffffff881cd4c3>] :xfs:xfs_access+0x3d/0x46 Dec 14 09:13:48 sanandreas kernel: [<ffffffff881d8dce>] :xfs:xfs_vn_permission+0x11/0x15 Dec 14 09:13:48 sanandreas kernel: [<ffffffff8027fe9f>] permission+0xcd/0xd6 Dec 14 09:13:48 sanandreas kernel: [<ffffffff80280615>] __link_path_walk+0x16c/0xdb7 Dec 14 09:13:48 sanandreas kernel: [<ffffffff8028d8f1>] mntput_no_expire+0x17/0x8f Dec 14 09:13:48 sanandreas kernel: [<ffffffff80281322>] link_path_walk+0xc2/0xd0 Dec 14 09:13:48 sanandreas kernel: [<ffffffff80281bde>] vfs_create+0xdf/0x14b Dec 14 09:13:48 sanandreas kernel: [<ffffffff8028200c>] open_namei+0x18e/0x662 Dec 14 09:13:48 sanandreas kernel: [<ffffffff802789a1>] do_filp_open+0x1c/0x3d Dec 14 09:13:48 sanandreas kernel: [<ffffffff80278b92>] get_unused_fd+0xea/0xf9 Dec 14 09:13:48 sanandreas kernel: [<ffffffff80278cba>] do_sys_open+0x44/0xbe Dec 14 09:13:48 sanandreas kernel: [<ffffffff8020979e>] system_call+0x7e/0x83
Dec 14 09:13:48 sanandreas kernel:Dec 14 09:13:48 sanandreas kernel: xfs_force_shutdown(sdb1,0x8) called from line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff881c7a07 Dec 14 09:13:48 sanandreas kernel: Filesystem "sdb1": Corruption of in-memory data detected. Shutting down filesystem: sdb1 Dec 14 09:13:48 sanandreas kernel: Please umount the filesystem, and rectify the problem(s)
this then causes the heavy write clients to write garbage NFS request that grow...Dec 20 14:34:15 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 0x00008400
Dec 20 14:34:18 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)Dec 20 14:34:18 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 0x00008400
Dec 20 14:34:21 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)Dec 20 14:34:21 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 0x00008400
Dec 20 14:34:24 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)Dec 20 14:34:24 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 0x00008400
Dec 20 14:34:27 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)Dec 20 14:34:27 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 0x00008400
Dec 20 14:34:30 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)Dec 20 14:34:30 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 0x00008400
Dec 20 14:34:33 newmadrid kernel: RPC: bad TCP reclen 0x0000fda8 (large)Dec 20 14:34:33 newmadrid kernel: sockaddr_in 0x7004130a port 0x5c03 max 0x00008400
The clients run 2.6.19. the servers crash once a day when the heavy simulations start up. eth0 and eth1 are bonded into BOND0, tyan 2895 or SuperMicro H8DC8 motherboards.
Attached is a tarball, bzip'ed of the tcpdump traces and the /var/log/messages file
[email protected] home/bshands/Documents> ls -l reclen.tar.bz2 -rw-r--r-- 1 bshands dssi 24379 Dec 20 14:47 reclen.tar.bz2 [email protected] home/bshands/Documents> md5sum reclen.tar.bz2 f449a3c83ca79cad04d8a63e6e253604 reclen.tar.bz2 [email protected] /bulk/tmp> ls -l messages reclen.tcp* -rw-r--r-- 1 bshands dssi 37005 Dec 20 14:35 messages -rw-r--r-- 1 root dssi 57946 Dec 20 14:35 reclen.tcp0 -rw-r--r-- 1 root dssi 40678 Dec 20 14:35 reclen.tcp1 Neil Brown indicated that this NFS packet growth should have been fixed in 2.6.18-rc5, but it still happens, easily triggered by the XFS shutdown. the packet sizes reported can grow to > 1GB.Dropping back to 2.6.18.1 stops the XFS crashing, and hence stops the NFS mess.
berkley -- //E. F. Berkley Shands, MSc// **Exegy Inc.** 3668 S. Geyer Road, Suite 300 St. Louis, MO 63127 Direct: (314) 450-5348 Cell: (314) 303-2546 Office: (314) 450-5353 Fax: (314) 450-5354This e-mail and any documents accompanying it may contain legally privileged and/or confidential information belonging to Exegy, Inc. Such information may be protected from disclosure by law. The information is intended for use by only the addressee. If you are not the intended recipient, you are hereby notified that any disclosure or use of the information is strictly prohibited. If you have received this e-mail in error, please immediately contact the sender by e-mail or phone regarding instructions for return or destruction and do not use or disclose the content to others.
Attachment:
reclen.tar.bz2
Description: BZip2 compressed data
- Follow-Ups:
- Re: 2.6.19 xfs crash spawns tcp reclen errors off nfs
- From: David Chinner <[email protected]>
- Re: 2.6.19 xfs crash spawns tcp reclen errors off nfs
- Prev by Date: Re: How to interpret PM_TRACE output
- Next by Date: Re: [git patches] libata fixes
- Previous by thread: What's in git.git (stable), and Announcing GIT 1.4.4.3
- Next by thread: Re: 2.6.19 xfs crash spawns tcp reclen errors off nfs
- Index(es):