I've noticed that my Fedora systems have recently changed how they deal with dead or dying disks. It used to be the case that if a disk went off-line for any reason, the processes attached to it would die due to I/O errors. This is unfortunate, but it doesn't otherwise hobble the rest of the system.

What happens now is that the processes stick around, and the kernel (I am guessing the journalling layer) is stuck waiting for the disk to return. I get kernel messages of the form:

INFO: task rdiff-backup:19311 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rdiff-backup  D de403c5c     0 19311  19239
       dd583da8 00200086 000000a2 de403c5c 00000008 c087c67c c087fc00 c087fc00
       c087fc00 c1894010 c1894284 c13ebc00 00000000 c13ebc00 c189403c 0000141a
       dd583d98 c041fbc8 00000000 c1894284 bcba483b 00200246 dd583ddc dd583da8
Call Trace:
 [<c041fbc8>] ? update_curr+0x8d/0xf0
 [<c043f1a0>] ? prepare_to_wait+0x4d/0x54
 [<df83a900>] start_this_handle+0x2cc/0x3dd [jbd2]
 [<c0422b90>] ? dequeue_task_fair+0x3d/0x42
 [<c043efa2>] ? autoremove_wake_function+0x0/0x33
 [<df83ab7d>] jbd2_journal_start+0x8c/0xb9 [jbd2]
 [<df895c2f>] ext4_journal_start_sb+0x40/0x42 [ext4]
 [<df88b7cb>] ext4_da_writepages+0x107/0x2ee [ext4]
 [<c047684b>] ? pagevec_lookup_tag+0x1c/0x25
 [<c04755f5>] ? write_cache_pages+0xfc/0x2ad
 [<c046f813>] ? find_get_pages_tag+0x2f/0xda
 [<df88b6c4>] ? ext4_da_writepages+0x0/0x2ee [ext4]
 [<c04757f0>] do_writepages+0x23/0x34
 [<c04ab7e5>] __writeback_single_inode+0x16c/0x2b7
 [<c04a37bd>] ? generic_drop_inode+0x67/0x188
 [<c04abcab>] generic_sync_sb_inodes+0x202/0x31b
 [<c04abe32>] sync_inodes_sb+0x6e/0x76
 [<c04abe7b>] __sync_inodes+0x41/0x88
 [<c04abecf>] sync_inodes+0xd/0x1e
 [<c04ae547>] do_sync+0x14/0x5a
 [<c04ae59a>] sys_sync+0xd/0x13
 [<c0404c8a>] syscall_call+0x7/0xb
 =======================

The processes never die, they cannot be killed, and they keep adding to the load average of the system, which effectively amounts to a denial of service.

Is there any way to "gracefully" (I know this is a relative term) have the system disconnect from a dead disk? Is there a way to have the kernel kill these hung processes?

Thanks!
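For anyone trying to see which tasks are stuck this way, here is a minimal sketch that lists processes in the uninterruptible "D" state shown in the kernel message. It assumes a Linux /proc filesystem in which each /proc/<pid>/stat begins with the pid, the command name in parentheses, and a one-letter state. The function names parse_stat and blocked_tasks are made up for this illustration. It only reports the stuck tasks and does not answer the question of how to get rid of them.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state 'D'."""
    tasks = []
    if not os.path.isdir(proc_root):
        return tasks
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the file was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Run as an ordinary user, it prints one "pid name" line per blocked process and nothing if no task is currently in that state.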