Re: [PATCH] fix kill_proc_info() vs copy_process() race

Paul E. McKenney wrote:
> 
> On Mon, Feb 06, 2006 at 07:45:48PM +0300, Oleg Nesterov wrote:
> > The first race is simple, copy_process:
> > 
> > 	/*
> > 	 * Check for pending SIGKILL! The new thread should not be allowed
> > 	 * to slip out of an OOM kill. (or normal SIGKILL.)
> > 	 */
> > 	if (sigismember(&current->pending.signal, SIGKILL))
> > 		return EINTR;
> > 
> > This relies on tasklist_lock and is racy now.
> 
> Agreed, but is the race any worse than it was before?  Since SIGKILL is
> fatal, the bit can be set but never cleared.  My belief, quite possibly
> mistaken, is that this is a performance issue rather than a correctness
> issue -- we would like to avoid the overhead of a fork() for a "walking
> dead" process.

My apologies, I was very unclear. I talked about this check because I tried
to unify it with 'if (SIGNAL_GROUP_EXIT)' below. Let me try again.

copy_process(CLONE_THREAD)					__group_complete_signal(SIGKILL)

	lock(->sighand);
	if (->signal->flags & SIGNAL_GROUP_EXIT) // NO
		...abort forking...
	unlock(->sighand)

								->signal->flags = SIGNAL_GROUP_EXIT;
								// does not see the new thread yet
								for_each_thread_in_thread_group(t) {
									sigaddset(t->pending, SIGKILL);
									signal_wake_up(t);
								}

	... finish clone ...

The new thread starts without TIF_SIGPENDING. When any of other threads calls
get_signal_to_deliver() it will notice SIGKILL and call do_group_exit(), which
does:

	if (SIGNAL_GROUP_EXIT) // Yes, was set in group_complete_signal()
		// don't call zap_other_threads()
	do_exit();

So, thread group missed SIGKILL. The new thread runs with SIGNAL_GROUP_EXIT set
and has SIGKILL in ->shared_pending, so it can't be killed via sys_kill(SIGKILL),
and it can't be stopped.

This is not fatal, we can kill this thread via tkill(), even if it blocked other
signals, but still this is a bug (if I am right).

> > The second race is more tricky, copy_process:
> > 
> > 	attach_pid(p, PIDTYPE_PID, p->pid);
> > 	attach_pid(p, PIDTYPE_TGID, p->tgid);
> > 
> > This means that we can find a task in kill_proc_info()->find_task_by_pid()
> > which is not registered as part of thread group yet. Various bad things can
> > happen, note that handle_stop_signal(SIGCONT) and __group_complete_signal()
> > iterate over threads list. But p->pids[PIDTYPE_TGID] is a copy of current's
> > 'struct pid' from dup_task_struct(), and if we don't have CLONE_THREAD here
> > we will use completely unreleated (parent's) thread list.
> 
> But I could easily be missing something, still a bit jetlagged.  Could
> you please lay out the exact sequence of events in the scenario that you
> are thinking of?

Let's suppose that process with pid == 1000 does fork (no CLONE_THREAD bit),
and a bad boy does sys_kill(1001, SIGXXX)

copy_process:

	// it is possible that p->pid == 1001
	attach_pid(p, PIDTYPE_PID, p->pid);

						kill_proc_info:

						    p = find_task_by_pid(1001); // Found!

						    __group_complete_signal(p):

						        // iterate over thread group
						        do {
						        	...
						        } while (next_thread(t) != p)

The (one of) problem is that this loop never stops: next_thread() will iterate
over parent's threads list, because p have a copy of the parent's pids[PIDTYPE_TGID],
and p is not a member of this thread group. Unless I missed something, we have
an endless loop with interrupts disabled.

> And if there is a real problem, is it possible to fix it by changing
> the order of the attach_pid() calls?

I think yes, and I did exactly that in my next attempt to fix this problem.

Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- [PATCH 1/2] fix kill_proc_info() vs CLONE_THREAD race
  - From: Oleg Nesterov <[email protected]>
- [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race
  - From: Oleg Nesterov <[email protected]>

References:
- [PATCH] fix kill_proc_info() vs copy_process() race
  - From: Oleg Nesterov <[email protected]>
- Re: [PATCH] fix kill_proc_info() vs copy_process() race
  - From: "Paul E. McKenney" <[email protected]>

Prev by Date: Re: [RFT/PATCH] 3c509: use proper suspend/resume API
Next by Date: Re: HELP: Problem with radeonfb setting wrong resolution
Previous by thread: Re: [PATCH] fix kill_proc_info() vs copy_process() race
Next by thread: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]