The story starts with...
We've been seeing some occasional reports of -ERESTARTSYS and friends
being returned to userland from syscalls like accept(). After some
digging, we beleive that the signal code is inherently racy vs. the
way one task (or interrupt) can cause another task TIG_SIGPENDING
to be cleared.
(And while looking at the code, found more potential issues with
signalfd, see later for that one)
The circumstances are a bit hard to trigger (and thus we haven't yet
made a proper repro-case, the field problem happens every now and then),
but basically, the idea is something around the lines of:
- thread A does accept() or select() or some other syscall that can
restart upon reception of a signal. It also has signal S1 unblocked
and signal S2 blocked.
- Something sends S1 to the process (shared signal), thread A is the
selected victim from the signal code and gets TIF_SIGPENDING set.
- thread A decides to return -ERESTARTSYS. On the way to userland, to
make the race more likely, we also end up rescheduling, thus we
call schedule() before we actually call do_signal().
- thread B has TIF_SIGPENDING set (for some other signal) and ends up
handling and dequeuing both that other signal and S1, thus there is
no more shared signal pending
- something calls recalc_sigpending_tsk() on thread A (for example,
something try to sends it S2 which is blocked). There is no longer
an active signal and thus TIF_SIGPENDING is cleared on thread A
- thread A gets back out of schedule(), there is no more TIF_SIGPENDING,
thus it goes back to userland returning -ERESTARTSYS and never restarts
the syscall
There might be variations on the above scenario. We beleive that the
base problem is that the only one who should ever be able to clear
TIF_SIGPENDING is the task who has it set, nobody else, since it might
have been "seen" by that task long enough to make decisions such as
returning -ERESTARTSYS, at which point it must not be cleared until
do_signal() is called.
In fact, it makes a lot of sense to only ever clear that flag on yourself
anyway.
The patch below does that by making recalc_sigpending() be the only one to
clear it, not recalc_sigpending_tsk(). I also added a WARN_ON in case
we ever call recalc_sigpending() without the siglock since that would mean
a race causing a potential loss of TIF_SIGPENDING.
Note that the code in dequeue_signal() is a bit uglier than I would have
liked. I would have much preferred making dequeue_signal() not take a task
argument anymore and just always operate on "current" but that's being
defeated by signalfd, which seems to be only one to potentially dequeue
signals off another task. So I leave it alone for now.
However, I find that very dodgy since dequeue_signal() calls
__dequeue_signal() which does things vs. a notifier which I don't quite
know what it's for, but it's definitely not going to fly if dequeue_signal()
is called for somebody else than current... and potentially cause another
spurrious clearing of TIF_SIGPENDING.
Thus, I think that signalfd probably needs fixing to only ever allow
read() to be called from the task that created the fd in the first place
thus only allowing a thread to dequeue itself.
Any better idea ? Comments ?
(Patch built & booted on a pSeries box)
Cheers,
Ben.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Index: linux-work/kernel/signal.c
===================================================================
--- linux-work.orig/kernel/signal.c 2007-06-05 10:56:53.000000000 +1000
+++ linux-work/kernel/signal.c 2007-06-05 11:15:14.000000000 +1000
@@ -105,7 +105,6 @@ static int recalc_sigpending_tsk(struct
set_tsk_thread_flag(t, TIF_SIGPENDING);
return 1;
}
- clear_tsk_thread_flag(t, TIF_SIGPENDING);
return 0;
}
@@ -119,9 +118,22 @@ void recalc_sigpending_and_wake(struct t
signal_wake_up(t, 0);
}
+/*
+ * Only when recalculating on yourself are you allowed to clear TIF_SIGPENDING
+ * never when recalculating somebody else. If you were to allow it, then
+ * a syscall could see signal_pending(), decide to return -ERESTARTSYS, and
+ * lose TIF_SIGPENDING before actually hitting do_signal(), thus effectively
+ * returning -ERESTARTSYS to userland and failing to restart the syscall.
+ */
void recalc_sigpending(void)
{
- recalc_sigpending_tsk(current);
+ /* This must be called with the siglock or we might lose
+ * TIF_SIGPENDING
+ */
+ WARN_ON(!spin_is_locked(¤t->sighand->siglock));
+
+ if (!recalc_sigpending_tsk(current))
+ clear_thread_flag(TIF_SIGPENDING);
}
/* Given the mask, find the first available signal that should be serviced. */
@@ -385,7 +397,18 @@ int dequeue_signal(struct task_struct *t
}
}
}
- recalc_sigpending_tsk(tsk);
+ /* We need to call recalc_sigpending() when recalculating ourselves
+ * so that TIF_SIGPENDING is cleated when appropriate. The only reason
+ * we actually have to compare against current rather than
+ * unconditionally doing recalc_sigpending() is the same that prevents
+ * me from removing the task argument from dequeue_signal() and is
+ * signalfd which can dequeue signals of another task (yuck)
+ */
+ if (tsk == current)
+ recalc_sigpending();
+ else
+ recalc_sigpending_tsk(tsk);
+
if (signr && unlikely(sig_kernel_stop(signr))) {
/*
* Set a marker that we have dequeued a stop signal. Our
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]