Jean Delvare <[email protected]> writes:
> On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
>> Eric, Kame, thanks a lot for working on this. I'll be giving some good
>> testing to this patch today, and will return back to you when I'm done.
>
> The original issue is indeed fixed, but there's a problem with the patch.
> When stressing /proc (to verify the bug was fixed), my test machine ended
> up crashing. Here are the 2 traces I found in the logs:
Ugh.
So the death in __put_task_struct() is from:
WARN_ON(!(tsk->exit_state & (EXIT_DEAD | EXIT_ZOMBIE)));
So it appears we have something that is decrementing but not
incrementing the count on the task struct.
Now what is interesting is that there are a couple of other failure modes
present here.
free_uid called from __put_task_struct is failing
And you seem to have a recursive page fault going on somewhere.
I suspect the triggering of this bug is the result of an earlier oops,
that left some process half cleaned up.
Have you tested 2.6.18-rc6 without my patch?
If not can you please test the same 2.6.18-rc6 configuration with my patch?
> Sometimes the machine just hung, with nothing in the logs. The machine is
> a Sony laptop (i686).
>
> I have been testing the patch on another machine (x86_64) and had no
> problem at all, so the reproduceability of the bug might depend on the
> arch or some config option. I'll help nailing down this issue if I can,
> just tell me what to do.
So I don't know what is going on with your laptop. It feels nasty.
I think my patch is just tripping on the problem, rather than causing
it. The previous version of fs/proc/base.c should have tripped over
this problem as well if it happened to have hit the same process.
I'm staring at the patch and I can not think of anything that would
explain your problem. The reference counting is simple and the only
bug I had in a posted version was a failure to decrement the count
on the task_struct.
I guess the practical question is what was your test methodology to
reproduce this problem? A couple of more people running the same
test on a few more machines might at least give us confidence in what
is going on.
Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]