Paul E. McKenney wrote:
>
> On Mon, Feb 06, 2006 at 07:45:48PM +0300, Oleg Nesterov wrote:
> > The first race is simple, copy_process:
> >
> > /*
> > * Check for pending SIGKILL! The new thread should not be allowed
> > * to slip out of an OOM kill. (or normal SIGKILL.)
> > */
> > if (sigismember(¤t->pending.signal, SIGKILL))
> > return EINTR;
> >
> > This relies on tasklist_lock and is racy now.
>
> Agreed, but is the race any worse than it was before? Since SIGKILL is
> fatal, the bit can be set but never cleared. My belief, quite possibly
> mistaken, is that this is a performance issue rather than a correctness
> issue -- we would like to avoid the overhead of a fork() for a "walking
> dead" process.
My apologies, I was very unclear. I talked about this check because I tried
to unify it with 'if (SIGNAL_GROUP_EXIT)' below. Let me try again.
copy_process(CLONE_THREAD) __group_complete_signal(SIGKILL)
lock(->sighand);
if (->signal->flags & SIGNAL_GROUP_EXIT) // NO
...abort forking...
unlock(->sighand)
->signal->flags = SIGNAL_GROUP_EXIT;
// does not see the new thread yet
for_each_thread_in_thread_group(t) {
sigaddset(t->pending, SIGKILL);
signal_wake_up(t);
}
... finish clone ...
The new thread starts without TIF_SIGPENDING. When any of other threads calls
get_signal_to_deliver() it will notice SIGKILL and call do_group_exit(), which
does:
if (SIGNAL_GROUP_EXIT) // Yes, was set in group_complete_signal()
// don't call zap_other_threads()
do_exit();
So, thread group missed SIGKILL. The new thread runs with SIGNAL_GROUP_EXIT set
and has SIGKILL in ->shared_pending, so it can't be killed via sys_kill(SIGKILL),
and it can't be stopped.
This is not fatal, we can kill this thread via tkill(), even if it blocked other
signals, but still this is a bug (if I am right).
> > The second race is more tricky, copy_process:
> >
> > attach_pid(p, PIDTYPE_PID, p->pid);
> > attach_pid(p, PIDTYPE_TGID, p->tgid);
> >
> > This means that we can find a task in kill_proc_info()->find_task_by_pid()
> > which is not registered as part of thread group yet. Various bad things can
> > happen, note that handle_stop_signal(SIGCONT) and __group_complete_signal()
> > iterate over threads list. But p->pids[PIDTYPE_TGID] is a copy of current's
> > 'struct pid' from dup_task_struct(), and if we don't have CLONE_THREAD here
> > we will use completely unreleated (parent's) thread list.
>
> But I could easily be missing something, still a bit jetlagged. Could
> you please lay out the exact sequence of events in the scenario that you
> are thinking of?
Let's suppose that process with pid == 1000 does fork (no CLONE_THREAD bit),
and a bad boy does sys_kill(1001, SIGXXX)
copy_process:
// it is possible that p->pid == 1001
attach_pid(p, PIDTYPE_PID, p->pid);
kill_proc_info:
p = find_task_by_pid(1001); // Found!
__group_complete_signal(p):
// iterate over thread group
do {
...
} while (next_thread(t) != p)
The (one of) problem is that this loop never stops: next_thread() will iterate
over parent's threads list, because p have a copy of the parent's pids[PIDTYPE_TGID],
and p is not a member of this thread group. Unless I missed something, we have
an endless loop with interrupts disabled.
> And if there is a real problem, is it possible to fix it by changing
> the order of the attach_pid() calls?
I think yes, and I did exactly that in my next attempt to fix this problem.
Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]