From: Ingo Molnar <[email protected]>
fix potential swap-prefetch deadlock: swapped.lock must be taken
in an irq-safe manner, because it can be taken while holding an
irq-safe lock in the following code sequence:
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c4fc1>] _spin_lock+0x21/0x2f
[<c0175e7d>] add_to_swapped_list+0x1f/0x13b
[<c016780d>] remove_mapping+0x84/0xc3
[<c0167c47>] shrink_inactive_list+0x3fb/0x705
[<c016800a>] shrink_zone+0xb9/0xd8
[<c01682bc>] kswapd+0x293/0x38a
[<c013f271>] kthread+0xa6/0xd3
found by the locking correctness validator. The full validator
output is:
======================================================
[ BUG: hard-safe -> hard-unsafe lock order detected! ]
------------------------------------------------------
kswapd0/1173 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(swapped.lock){--..}, at: [<c0175e7d>] add_to_swapped_list+0x1f/0x13b
and this task is already holding:
(swapper_space.tree_lock){+...}, at: [<c01677a5>] remove_mapping+0x1c/0xc3
which would create a new lock dependency:
(swapper_space.tree_lock){+...} -> (swapped.lock){--..}
but this new dependency connects a hard-irq-safe lock:
(swapper_space.tree_lock){+...}
... which became hard-irq-safe at:
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c52a2>] _write_lock_irqsave+0x2a/0x3a
[<c0164c6f>] test_clear_page_writeback+0x48/0xbc
[<c0165a57>] rotate_reclaimable_page+0x8b/0xb4
[<c015e2c3>] end_page_writeback+0x18/0x45
[<c0173153>] end_swap_bio_write+0x28/0x36
[<c018549a>] bio_endio+0x66/0x6e
[<c0532613>] __end_that_request_first+0x23c/0x559
[<c0532945>] end_that_request_first+0xb/0xd
[<c09cae20>] ide_end_request+0xa8/0xf1
[<c09d1c44>] ide_dma_intr+0x54/0x8f
[<c09cc011>] ide_intr+0x147/0x1a7
[<c015adc7>] handle_IRQ_event+0x1f/0x4f
[<c015ae9d>] __do_IRQ+0xa6/0x111
[<c0106287>] do_IRQ+0x8c/0xad
to a hard-irq-unsafe lock:
(swapped.lock){--..}
... which became hard-irq-unsafe at:
... [<c014646e>] lockdep_acquire+0x69/0x82
[<c10c4fc1>] _spin_lock+0x21/0x2f
[<c01763fc>] kprefetchd+0x3eb/0x40b
[<c013f271>] kthread+0xa6/0xd3
[<c0102005>] kernel_thread_helper+0x5/0xb
which could potentially lead to deadlocks!
other info that might help us debug this:
1 locks held by kswapd0/1173:
#0: (swapper_space.tree_lock){+...}, at: [<c01677a5>] remove_mapping+0x1c/0xc3
the hard-irq-safe lock's dependencies:
-> (swapper_space.tree_lock){+...} ops: 121 {
initial-use at:
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c505e>] _write_lock_irq+0x27/0x35
[<c017341a>] __add_to_swap_cache+0x65/0x149
[<c0173512>] move_to_swap_cache+0x14/0x62
[<c0178af8>] shmem_writepage+0x10c/0x1b5
[<c01672e9>] pageout+0x119/0x1cc
[<c0167b73>] shrink_inactive_list+0x327/0x705
[<c016800a>] shrink_zone+0xb9/0xd8
[<c01684f5>] try_to_free_pages+0x142/0x221
[<c0163e36>] __alloc_pages+0x1d9/0x325
[<c015f919>] generic_file_buffered_write+0x189/0x5cd
[<c0160f8d>] __generic_file_aio_write_nolock+0x450/0x48d
[<c01611fa>] generic_file_aio_write+0x6e/0xc1
[<c02ab0c2>] ext3_file_write+0x1a/0x8b
[<c017fe22>] do_sync_write+0xb1/0xe6
[<c01807da>] vfs_write+0xcd/0x176
[<c0180e7d>] sys_write+0x3b/0x71
[<c10c5abb>] syscall_call+0x7/0xb
in-hardirq-W at:
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c52a2>] _write_lock_irqsave+0x2a/0x3a
[<c0164c6f>] test_clear_page_writeback+0x48/0xbc
[<c0165a57>] rotate_reclaimable_page+0x8b/0xb4
[<c015e2c3>] end_page_writeback+0x18/0x45
[<c0173153>] end_swap_bio_write+0x28/0x36
[<c018549a>] bio_endio+0x66/0x6e
[<c0532613>] __end_that_request_first+0x23c/0x559
[<c0532945>] end_that_request_first+0xb/0xd
[<c09cae20>] ide_end_request+0xa8/0xf1
[<c09d1c44>] ide_dma_intr+0x54/0x8f
[<c09cc011>] ide_intr+0x147/0x1a7
[<c015adc7>] handle_IRQ_event+0x1f/0x4f
[<c015ae9d>] __do_IRQ+0xa6/0x111
[<c0106287>] do_IRQ+0x8c/0xad
}
... key at: [<c14d4e64>] swapper_space+0x24/0xfc
the hard-irq-unsafe lock's dependencies:
-> (swapped.lock){--..} ops: 2 {
initial-use at:
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c4fc1>] _spin_lock+0x21/0x2f
[<c01763fc>] kprefetchd+0x3eb/0x40b
[<c013f271>] kthread+0xa6/0xd3
[<c0102005>] kernel_thread_helper+0x5/0xb
softirq-on-W at:
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c4fc1>] _spin_lock+0x21/0x2f
[<c01763fc>] kprefetchd+0x3eb/0x40b
[<c013f271>] kthread+0xa6/0xd3
[<c0102005>] kernel_thread_helper+0x5/0xb
hardirq-on-W at:
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c4fc1>] _spin_lock+0x21/0x2f
[<c01763fc>] kprefetchd+0x3eb/0x40b
[<c013f271>] kthread+0xa6/0xd3
[<c0102005>] kernel_thread_helper+0x5/0xb
}
... key at: [<c14d5658>] swapped+0x18/0x60
stack backtrace:
[<c0104e7f>] show_trace+0xd/0xf
[<c0104e96>] dump_stack+0x15/0x17
[<c0144cd3>] check_usage+0x1f4/0x201
[<c0146238>] __lockdep_acquire+0x873/0xa40
[<c014646e>] lockdep_acquire+0x69/0x82
[<c10c4fc1>] _spin_lock+0x21/0x2f
[<c0175e7d>] add_to_swapped_list+0x1f/0x13b
[<c016780d>] remove_mapping+0x84/0xc3
[<c0167c47>] shrink_inactive_list+0x3fb/0x705
[<c016800a>] shrink_zone+0xb9/0xd8
[<c01682bc>] kswapd+0x293/0x38a
[<c013f271>] kthread+0xa6/0xd3
[<c0102005>] kernel_thread_helper+0x5/0xb
Signed-off-by: Ingo Molnar <[email protected]>
----
---
mm/swap_prefetch.c | 17 +++++++++--------
1 files changed, 9 insertions(+), 8 deletions(-)
Index: linux/mm/swap_prefetch.c
===================================================================
--- linux.orig/mm/swap_prefetch.c
+++ linux/mm/swap_prefetch.c
@@ -65,7 +65,7 @@ inline void delay_swap_prefetch(void)
void add_to_swapped_list(struct page *page)
{
struct swapped_entry *entry;
- unsigned long index;
+ unsigned long index, flags;
int wakeup;
if (!swap_prefetch)
@@ -73,7 +73,7 @@ void add_to_swapped_list(struct page *pa
wakeup = 0;
- spin_lock(&swapped.lock);
+ spin_lock_irqsave(&swapped.lock, flags);
if (swapped.count >= swapped.maxcount) {
/*
* We limit the number of entries to 2/3 of physical ram.
@@ -112,7 +112,7 @@ void add_to_swapped_list(struct page *pa
}
out_locked:
- spin_unlock(&swapped.lock);
+ spin_unlock_irqrestore(&swapped.lock, flags);
/* Do the wakeup outside the lock to shorten lock hold time. */
if (wakeup)
@@ -433,6 +433,7 @@ static enum trickle_return trickle_swap(
{
enum trickle_return ret = TRICKLE_DELAY;
struct swapped_entry *entry;
+ unsigned long flags;
/*
* If laptop_mode is enabled don't prefetch to avoid hard drives
@@ -451,10 +452,10 @@ static enum trickle_return trickle_swap(
if (!prefetch_suitable())
break;
- spin_lock(&swapped.lock);
+ spin_lock_irqsave(&swapped.lock, flags);
if (list_empty(&swapped.list)) {
ret = TRICKLE_FAILED;
- spin_unlock(&swapped.lock);
+ spin_unlock_irqrestore(&swapped.lock, flags);
break;
}
@@ -473,7 +474,7 @@ static enum trickle_return trickle_swap(
* be a reason we could not swap them back in so
* delay attempting further prefetching.
*/
- spin_unlock(&swapped.lock);
+ spin_unlock_irqrestore(&swapped.lock, flags);
break;
}
@@ -484,12 +485,12 @@ static enum trickle_return trickle_swap(
* not suitable for prefetching so skip it.
*/
entry = prev_swapped_entry(entry);
- spin_unlock(&swapped.lock);
+ spin_unlock_irqrestore(&swapped.lock, flags);
continue;
}
swp_entry = entry->swp_entry;
entry = prev_swapped_entry(entry);
- spin_unlock(&swapped.lock);
+ spin_unlock_irqrestore(&swapped.lock, flags);
if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
break;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]