Re: Process D-stated in generic_unplug_device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Ian E. Morgan" <[email protected]> wrote:
>
> After a recent kernel upgrade (essentially 2.6.15.4 + a selection of other
> patches from -git and -mm), we've had problems with proftpd getting stuck in
> D-state when being shut down. There is no hot-pluging involved here (so I'm
> confused about the call to generic_unplug_device).
> 
> # /cat/proc/6012/wchan
> sync_page
> 
> A task dump shows:
> 
> Feb 14 13:45:12 guru kernel: proftpd       D 00000005     0  6012      1 29519        9328 (NOTLB)
> Feb 14 13:45:12 guru kernel: da507c50 dd83b520 c03db380 00000005 c152a580 dfc74284 c15a8fc c01e84c2 
> Feb 14 13:45:12 guru kernel:        c15af8fc da506000 c01e84f4 c15af8fc 00004a32 9ae11a4 0000cd50 dd83b520 
> Feb 14 13:45:12 guru kernel:        dd83b648 da507ca8 da507cb0 c1400e20 da507c58 c03041b 00000000 c013bd28 
> Feb 14 13:45:12 guru kernel: Call Trace:
> Feb 14 13:45:12 guru kernel:  [<c01e84c2>] __generic_unplug_device+0x1a/0x30
> Feb 14 13:45:13 guru kernel:  [<c01e84f4>] generic_unplug_device+0x1c/0x36
> Feb 14 13:45:13 guru kernel:  [<c030419b>] io_schedule+0xe/0x16
> Feb 14 13:45:13 guru kernel:  [<c013bd28>] sync_page+0x3e/0x4b
> Feb 14 13:45:13 guru kernel:  [<c0304450>] __wait_on_bit_lock+0x41/0x61
> Feb 14 13:45:13 guru kernel:  [<c013bcea>] sync_page+0x0/0x4b
> Feb 14 13:45:13 guru kernel:  [<c013c530>] __lock_page+0x91/0x99
> Feb 14 13:45:13 guru kernel:  [<c012ee49>] wake_bit_function+0x0/0x34
> Feb 14 13:45:13 guru kernel:  [<c012ee49>] wake_bit_function+0x0/0x34
> Feb 14 13:45:13 guru kernel:  [<c013d765>] filemap_nopage+0x29a/0x377
> Feb 14 13:45:13 guru kernel:  [<c014c467>] do_no_page+0x65/0x243
> Feb 14 13:45:13 guru kernel:  [<c014c8ee>] __handle_mm_fault+0x229/0x24d
> Feb 14 13:45:13 guru kernel:  [<c01128ee>] do_page_fault+0x1c2/0x5a5
> Feb 14 13:45:13 guru kernel:  [<c01f8d1e>] __copy_from_user_ll+0x40/0x6c
> Feb 14 13:45:13 guru kernel:  [<c010371c>] apic_timer_interrupt+0x1c/0x24
> Feb 14 13:45:13 guru kernel:  [<c011272c>] do_page_fault+0x0/0x5a5
> Feb 14 13:45:13 guru kernel:  [<c01037e7>] error_code+0x4f/0x54
> Feb 14 13:45:13 guru kernel:  [<c01a83bb>] reiserfs_copy_from_user_to_file_region+0x4d/xd8
> Feb 14 13:45:13 guru kernel:  [<c01a975e>] reiserfs_file_write+0x419/0x6ac
> Feb 14 13:45:13 guru kernel:  [<c01d0efa>] inode_has_perm+0x49/0x6b
> Feb 14 13:45:13 guru kernel:  [<c01f595a>] prio_tree_insert+0xf3/0x187
> Feb 14 13:45:13 guru kernel:  [<c01d3131>] selinux_file_permission+0x113/0x159
> Feb 14 13:45:13 guru kernel:  [<c015abbb>] vfs_write+0xcc/0x18f
> Feb 14 13:45:13 guru kernel:  [<c015ad3d>] sys_write+0x4b/0x74
> Feb 14 13:45:13 guru kernel:  [<c0102cb5>] syscall_call+0x7/0xb
> 
> Obviously this process is non-killable and requires a reboot to get us back
> into working order.
> 

I assume that we're seeing some stack gunk there and it's really stuck in
sync_page()->io_schedule().

This is almost certainly caused by a block layer or (more likely) device
driver bug: we submitted a read I/O, we're waiting for the completion
interrupt and it never came.

Please test vanilla 2.6.16-rc3 and let us know what block device driver is
involved, which I/O scheduler, etc.

Also, try mounting the filesystems with `-o barrier=none'.  That barrier
code seems to have been a source of many problems.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux