Hung NFS mount hangs all disk writes on the system

I have a repeatable problem: all disk writes on the system hang if an
NFS server disappears from the network during heavy writes.

This affects at least 2.6.11.10, 2.6.12-rc4, and Red Hat's 2.6.9-5.ELsmp.
It happens with NFS version 2, 3, and 4 mounts alike.

To reproduce:

mount somewhere:/v1 /v1
dd if=/dev/zero of=/v1/foo bs=1024k count=4096 &
route add -host somewhere bogushost
strace -tT dd if=/dev/zero of=/tmp/foo bs=4k count=1024

The second dd, which writes to local disk, will usually get stuck in a
write() call after a few blocks.
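
To see which processes are blocked while the hang is in progress, one
option is to filter ps output for tasks in uninterruptible sleep (state D).
The helper name d_state_tasks below is made up for illustration and is not
part of the original report; this is only a rough sketch.

```shell
# List tasks in uninterruptible sleep (state D).
# Reads lines of the form "PID STAT COMMAND" on stdin and prints
# "PID COMMAND" for every task whose state starts with D.
# d_state_tasks is a hypothetical helper name, not an existing tool.
d_state_tasks() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Typical use on a live system (header-less ps columns):
#   ps -eo pid=,stat=,comm= | d_state_tasks
```

Reading from stdin keeps the filter usable on saved ps output as well as on
a live system.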

The system recovers after I remove the blackhole route and run umount -f
on the mount of the unreachable NFS server.

Here's the sysrq-t output for the tasks stuck in the D state. dd 5232 is
writing over NFS and is the only one legitimately in the D state. The rest
are innocent victims, writing to local disk.

  task             PC      pid father child younger older
syslogd       D 00000004  5548  2662      1          2666  2618 (NOTLB)
f72d5bcc 00000082 f7ff0b14 00000004 00000000 f72d5be0 f72d5bcc 00000246 
       c01a377b 0000000a 00000286 c0125c82 00000046 f7d12c10 00000000 32bb4700 
       000f45f8 f7db6708 003e67a0 f72d5be0 f72d5c58 f72d5c28 c035c606 f7f2c538 
Call Trace:
 [<c035c606>] schedule_timeout+0x85/0xd2
 [<c035c565>] io_schedule_timeout+0xe/0x16
 [<c02813e9>] blk_congestion_wait+0x6d/0x82
 [<c014f475>] balance_dirty_pages+0x83/0x116
 [<c014ade3>] generic_file_buffered_write+0x251/0x56e
 [<c014b3e9>] __generic_file_aio_write_nolock+0x2e9/0x540
 [<c014b74c>] __generic_file_write_nolock+0x7e/0xa0
 [<c014ba8d>] generic_file_writev+0x4c/0xba
 [<c0173c99>] do_readv_writev+0x1f7/0x21f
 [<c0173d4f>] vfs_writev+0x40/0x51
 [<c0173e34>] sys_writev+0x3d/0x97
 [<c01036bd>] sysenter_past_esp+0x52/0x75
sshd          D F7FFF080  5928  3811   2965          3858       (NOTLB)
f5fd1f00 00000086 f5d62888 f7fff080 f5fd1ec8 c0151857 f7fff080 f5fd1ef0 
       c0153ce9 f7ff8580 f5fd1ee0 c0151857 c01742f0 00000020 00000000 a2280500 
       000f45c7 f7d12808 f72902b4 f72902bc 00000246 f5fd1f54 c035a2e5 f5fd1f20 
Call Trace:
 [<c035a2e5>] __down+0xb9/0x2db
 [<c035a822>] __down_failed+0xa/0x10
 [<c01742cc>] .text.lock.read_write+0xb/0x2f
 [<c017335a>] vfs_llseek+0x33/0x3a
 [<c0173430>] sys_llseek+0x44/0x8a
 [<c01036bd>] sysenter_past_esp+0x52/0x75
sshd          D F7FFF080  5772 10805   2965         14747  4973 (NOTLB)
f3f1ff00 00000086 f5329eb8 f7fff080 f3f1fec8 c0151857 f7fff080 f3f1fef0 
       c0153ce9 f7ff8580 f3f1fee0 c0151857 c01742f0 00000020 00000000 35803ec0 
       000f45c9 f7f1cd78 f72902b4 f72902bc 00000246 f3f1ff54 c035a2e5 f3f1ff20 
Call Trace:
 [<c035a2e5>] __down+0xb9/0x2db
 [<c035a822>] __down_failed+0xa/0x10
 [<c01742cc>] .text.lock.read_write+0xb/0x2f
 [<c017335a>] vfs_llseek+0x33/0x3a
 [<c0173430>] sys_llseek+0x44/0x8a
 [<c01036bd>] sysenter_past_esp+0x52/0x75
dd            D C844FC2C  5232 14672   3860                     (NOTLB)
c844fbf8 00000086 00000000 c844fc2c 00000000 c844fc0c c844fbf8 00000246 
       c02f3e75 c21ca114 00000286 003a5ded c844fbec f7d12c10 00000000 32bb4700 
       000f45f8 f7d2e888 003e67a0 c844fc0c c844fc84 c844fc54 c035c606 c844fc0c 
Call Trace:
 [<c035c606>] schedule_timeout+0x85/0xd2
 [<c035c565>] io_schedule_timeout+0xe/0x16
 [<c02813e9>] blk_congestion_wait+0x6d/0x82
 [<c014f475>] balance_dirty_pages+0x83/0x116
 [<c014ade3>] generic_file_buffered_write+0x251/0x56e
 [<c014b3e9>] __generic_file_aio_write_nolock+0x2e9/0x540
 [<c014b873>] generic_file_aio_write+0x65/0xd3
 [<f8e54724>] nfs_file_write+0x80/0xe2 [nfs]
 [<c017377a>] do_sync_write+0x91/0xdd
 [<c01738d1>] vfs_write+0x10b/0x10d
 [<c0173974>] sys_write+0x3d/0x64
 [<c01036bd>] sysenter_past_esp+0x52/0x75
sshd          D 00000004  5476 14747   2965         14906 10805 (NOTLB)
cb177c14 00000086 f7ff0b14 00000004 00000000 cb177c28 cb177c14 00000246 
       c01a377b 00000001 00000286 86fdc12b 000f45e4 f7db65a0 00000000 32bb4700 
       000f45f8 f7d9a598 003e67a0 cb177c28 cb177ca0 cb177c70 c035c606 f7c0b1dc 
Call Trace:
 [<c035c606>] schedule_timeout+0x85/0xd2
 [<c035c565>] io_schedule_timeout+0xe/0x16
 [<c02813e9>] blk_congestion_wait+0x6d/0x82
 [<c014f475>] balance_dirty_pages+0x83/0x116
 [<c014ade3>] generic_file_buffered_write+0x251/0x56e
 [<c014b3e9>] __generic_file_aio_write_nolock+0x2e9/0x540
 [<c014b873>] generic_file_aio_write+0x65/0xd3
 [<f88c66ce>] ext3_file_write+0x27/0xad [ext3]
 [<c017377a>] do_sync_write+0x91/0xdd
 [<c01738d1>] vfs_write+0x10b/0x10d
 [<c0173974>] sys_write+0x3d/0x64
 [<c01036bd>] sysenter_past_esp+0x52/0x75
sshd          D F7FFF080  6932 14908  14906                     (NOTLB)
f3847f00 00000082 f5329ee4 f7fff080 f3847ec8 c0151857 f7fff080 f3847ef0 
       c0153ce9 f7ff8580 f3847ee0 c0151857 c01742f0 f7d13180 00000000 eb5b4f40 
       000f45ca f7f1d858 f72902b4 f72902bc 00000246 f3847f54 c035a2e5 f3847f20 
Call Trace:
 [<c035a2e5>] __down+0xb9/0x2db
 [<c035a822>] __down_failed+0xa/0x10
 [<c01742cc>] .text.lock.read_write+0xb/0x2f
 [<c017335a>] vfs_llseek+0x33/0x3a
 [<c0173430>] sys_llseek+0x44/0x8a
 [<c01036bd>] sysenter_past_esp+0x52/0x75
dd            D 00000001  6020 14948      1               14669 (NOTLB)
c948bc14 00000082 f5debd44 00000001 00000000 c948bc28 c948bc14 00000246 
       c01a377b f76cd3f8 00000286 f88ca956 00001000 f7d2e720 00000000 32bb4700 
       000f45f8 f7d12d78 003e67a0 c948bc28 c948bca0 c948bc70 c035c606 c948bc54 
Call Trace:
 [<c035c606>] schedule_timeout+0x85/0xd2
 [<c035c565>] io_schedule_timeout+0xe/0x16
 [<c02813e9>] blk_congestion_wait+0x6d/0x82
 [<c014f475>] balance_dirty_pages+0x83/0x116
 [<c014ade3>] generic_file_buffered_write+0x251/0x56e
 [<c014b3e9>] __generic_file_aio_write_nolock+0x2e9/0x540
 [<c014b873>] generic_file_aio_write+0x65/0xd3
 [<f88c66ce>] ext3_file_write+0x27/0xad [ext3]
 [<c017377a>] do_sync_write+0x91/0xdd
 [<c01738d1>] vfs_write+0x10b/0x10d
 [<c0173974>] sys_write+0x3d/0x64
 [<c0103713>] syscall_call+0x7/0xb
sshd          D F7FFF080  6932 14953  14951                     (NOTLB)
c9507f00 00000082 f54e3ddc f7fff080 c9507ec8 c0151857 f7fff080 c9507ef0 
       c0153ce9 f7ff8580 c9507ee0 c0151857 c01742f0 c23bac10 00000000 79a70780 
       000f45d5 c23bb858 f72902b4 f72902bc 00000246 c9507f54 c035a2e5 c9507f20 
Call Trace:
 [<c035a2e5>] __down+0xb9/0x2db
 [<c035a822>] __down_failed+0xa/0x10
 [<c01742cc>] .text.lock.read_write+0xb/0x2f
 [<c017335a>] vfs_llseek+0x33/0x3a
 [<c0173430>] sys_llseek+0x44/0x8a
 [<c01036bd>] sysenter_past_esp+0x52/0x75
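
Several of the stuck tasks above, including the local ext3 writers, are
waiting in balance_dirty_pages. One way to explore whether dirty pages bound
for the unreachable NFS server are involved (an assumption on my part, not
something the traces prove) is to watch the Dirty and Writeback counters in
/proc/meminfo during the hang. The helper name dirty_counters is
hypothetical.

```shell
# Print the Dirty and Writeback lines from /proc/meminfo-style input.
# Reads stdin so it also works on a saved copy of /proc/meminfo.
# dirty_counters is a hypothetical helper name, not an existing tool.
dirty_counters() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }'
}

# Typical use, sampled once per second while reproducing the hang:
#   while sleep 1; do dirty_counters < /proc/meminfo; done
```

If the counters stay high and stop draining while the route is blackholed,
that would be consistent with the assumption, though it would not confirm it.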
