>>>>> On Sun, 10 Apr 2005 08:43:24 +0200, Ingo Molnar <[email protected]> said:
Ingo> * David S. Miller <[email protected]> wrote:
>> > Yes, of course. The deadlock was due to context-switching, not
>> > switch_mm() per se. Hopefully someone else beats me to
>> > remembering the details before Monday.
>> Sparc64 has a deadlock because we hold mm->page_table_lock during
>> switch_mm(). I bet IA64 did something similar, as I remember it
>> had a very similar locking issue in this area.
>> So the deadlock was, we held the runqueue locks over switch_mm(),
>> switch_mm() spins on mm->page_table_lock, the cpu which does have
>> mm->page_table_lock tries to do a wakeup on the first cpu's
>> runqueue.
>> Classic AB-BA deadlock.
Ingo> yeah, i can see that happening - holding the runqueue lock and
Ingo> enabling interrupts. (it's basically never safe to enable irqs
Ingo> with the runqueue lock held.)
Ingo> the patch drops both the runqueue lock and enables interrupts,
Ingo> so this particular issue should not trigger.
I had to refresh my memory with a quick Google search that netted [1]
(look for "Disable interrupts during context switch"). Actually, it
wasn't really a deadlock, but rather a livelock, since a CPU got stuck
on an infinite page-not-present loop.
Fundamentally, the issue was that doing the switch_mm() and
switch_to() with interrupts enabled opened a window during which you
could get a call to flush_tlb_mm() (as a result of an IPI). This, in
turn, could end up activating the wrong MMU-context, since the action
of flush_tlb_mm() depends on the value of current->active_mm. The
problematic sequence was:
1) schedule() calls switch_mm() which calls activate_context() to
switch to the new address-space
2) IPI comes in and flush_tlb_mm(mm) gets called
3) "current" still points to old task and if "current->active_mm == mm",
activate_mm() is called for the old address-space, resetting the
address-space back to that of the old task
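To make the window concrete, here is a toy user-space model of the three
steps above. It is only a sketch, not the ia64 code: struct mm, struct task,
current_task, hw_context and old_order_race() are hypothetical stand-ins, and
the "IPI" is simply a function call placed at the point where the interrupt
would have arrived.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is a hypothetical stand-in. */
struct mm { int id; };
struct task { struct mm *active_mm; };

static struct task *current_task;  /* models "current" */
static struct mm *hw_context;      /* models the address space the MMU uses */

static void activate_context(struct mm *mm)
{
    hw_context = mm;
}

/* Models the IPI handler: re-activate mm if it is current's active_mm. */
static void flush_tlb_mm(struct mm *mm)
{
    assert(current_task != NULL);
    if (current_task->active_mm == mm)
        activate_context(mm);
}

/* Returns 1 if the MMU ends up on the wrong address space. */
int old_order_race(void)
{
    struct mm old_mm = { 1 }, new_mm = { 2 };
    struct task prev = { &old_mm }, next = { &new_mm };

    current_task = &prev;
    hw_context = &old_mm;

    activate_context(next.active_mm); /* 1) switch_mm() activates the new mm */
    flush_tlb_mm(&old_mm);            /* 2) IPI arrives inside the window */
                                      /* 3) "current" is still prev, so the
                                            old mm has been re-activated */
    current_task = &next;             /* switch_to() completes */

    return hw_context != current_task->active_mm;
}
```

In this model old_order_race() returns 1: the new task runs with the old
task's address space active, which is the state that led to the
page-not-present loop.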
Now, Ingo says that the order is reversed with his patch, i.e.,
switch_mm() happens after switch_to(). That means flush_tlb_mm() may
now see a current->active_mm which hasn't really been activated yet.
That should be OK since it would just mean that we'd do an early (and
duplicate) activate_context(). While it does not give me a warm and
fuzzy feeling to have this inconsistent state be observable by
interrupt-handlers (and, in particular, IPI-handlers), I don't see any
problem with it offhand.
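A toy user-space model of the reversed order, under the same caveats: this is
a sketch, not the patch itself, and struct mm, struct task, current_task,
hw_context, activations and new_order_race() are all hypothetical names. It
assumes that "current" already points to the new task when the IPI arrives,
and that switch_mm() runs afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is a hypothetical stand-in. */
struct mm { int id; };
struct task { struct mm *active_mm; };

static struct task *current_task;  /* models "current" */
static struct mm *hw_context;      /* models the address space the MMU uses */
static int activations;            /* counts calls to activate_context() */

static void activate_context(struct mm *mm)
{
    hw_context = mm;
    activations++;
}

/* Models the IPI handler: re-activate mm if it is current's active_mm. */
static void flush_tlb_mm(struct mm *mm)
{
    assert(current_task != NULL);
    if (current_task->active_mm == mm)
        activate_context(mm);
}

/* Returns 1 if the MMU ends up on the new task's address space. */
int new_order_race(void)
{
    struct mm old_mm = { 1 }, new_mm = { 2 };
    struct task prev = { &old_mm }, next = { &new_mm };

    current_task = &prev;
    hw_context = &old_mm;
    activations = 0;

    current_task = &next;             /* switch_to() happens first */
    flush_tlb_mm(&new_mm);            /* IPI sees a not-yet-activated mm */
    activate_context(next.active_mm); /* switch_mm() runs afterwards */

    return hw_context == current_task->active_mm;
}
```

Here new_order_race() returns 1 and activations ends up at 2: the IPI only
causes an early, duplicate activation of the new address space, and the final
state is consistent.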
--david
[1] http://www.gelato.unsw.edu.au/linux-ia64/0307/6109.html