Re: 2.6.14-rc1 wait()/SIG_CHILD behaviour

On Mon, 2005-09-19 at 16:34 -0700, Roland McGrath wrote:
> The test program is buggy.  Here is one clue:
> 
> 	elm3b29:~ # strace -p 30023
> 	Process 30023 attached - interrupt to quit
> 	futex(0x2aaaaaddf118, FUTEX_WAIT, 2, NULL
> 
> It's not anywhere near wait4.  It's deadlocked in the rand() call inside
> rand_delay, called from sigchld_handler.  You cannot safely call rand
> inside a signal handler, for exactly this reason.  The signal came during
> another rand call and attempted to reenter.  If this sort of deadlock is
> the failure mode of your real-world case, then it is probably an
> application bug.  If this deadlock is just a mistake in your test program
> here, then you'll need to give us a corrected test program to pursue
> whatever real kernel issue you may have.

Thank you for pointing out the problem with the testcase.

While running a web-serving benchmark (Zeus), we noticed that a process
is not cleaning up its zombie children. We end up with thousands of them
and eventually fork fails. The interesting thing to note in that case was
that the moment we run strace -p <parent>, it suddenly seems to clean up
all its zombie children. I don't have access to the webserver source code,
so we tried to simulate the behaviour with a simple testcase.

I will let you know if I can come up with a *better* testcase to
reproduce the problem.

Thanks,
Badari
