Re: [PATCH] proc: readdir race fix (take 3)

Jean Delvare <[email protected]> writes:

> On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
>> Eric, Kame, thanks a lot for working on this. I'll be giving some good
>> testing to this patch today, and will return back to you when I'm done.
>
> The original issue is indeed fixed, but there's a problem with the patch. 
> When stressing /proc (to verify the bug was fixed), my test machine ended 
> up crashing. Here are the 2 traces I found in the logs:

Ugh.  

So the death in __put_task_struct() is from:
WARN_ON(!(tsk->exit_state & (EXIT_DEAD | EXIT_ZOMBIE)));
So it appears we have something that is decrementing but not
incrementing the count on the task struct.
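
To make that failure mode concrete, here is a minimal user-space sketch
of an unbalanced put.  It is not the kernel code: the sketch_* names,
the struct layout and the flag values are all hypothetical stand-ins,
and the real count is atomic.  Only the shape of the check is borrowed
from the WARN_ON quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's exit states; values are illustrative. */
enum { SKETCH_RUNNING = 0, SKETCH_ZOMBIE = 1, SKETCH_DEAD = 2 };

/* Hypothetical, heavily simplified stand-in for a task_struct. */
struct sketch_task {
    int usage;       /* reference count (a plain int here, atomic in the kernel) */
    int exit_state;  /* SKETCH_RUNNING, SKETCH_ZOMBIE or SKETCH_DEAD */
    bool freed;      /* set when the last reference has been dropped */
    bool warned;     /* set when the state check below fires */
};

/* Take a reference. */
static void sketch_get(struct sketch_task *t)
{
    t->usage++;
}

/* Drop a reference; on the last drop, run the state check and "free". */
static void sketch_put(struct sketch_task *t)
{
    assert(t->usage > 0);   /* dropping below zero is a bug in the sketch itself */
    if (--t->usage > 0)
        return;

    /* Same shape as the quoted WARN_ON: the task should already have exited. */
    if (!(t->exit_state & (SKETCH_DEAD | SKETCH_ZOMBIE))) {
        t->warned = true;
        fprintf(stderr, "warning: freeing a task that has not exited\n");
    }
    t->freed = true;
}
```

A running task starts with one reference of its own.  A reader that
pairs sketch_get() with sketch_put() leaves the count at 1.  A reader
that calls sketch_put() without a matching sketch_get() drops the last
reference while exit_state is still SKETCH_RUNNING, and the check fires.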

Now what is interesting is that there are a couple of other failure modes
present here.  free_uid(), called from __put_task_struct(), is failing.

And you seem to have a recursive page fault going on somewhere.

I suspect the triggering of this bug is the result of an earlier oops,
that left some process half cleaned up.

Have you tested 2.6.18-rc6 without my patch?
If not, can you please test the same 2.6.18-rc6 configuration without my patch?

> Sometimes the machine just hung, with nothing in the logs. The machine is 
> a Sony laptop (i686).
>
> I have been testing the patch on another machine (x86_64) and had no 
> problem at all, so the reproducibility of the bug might depend on the 
> arch or some config option. I'll help nailing down this issue if I can, 
> just tell me what to do.

So I don't know what is going on with your laptop.  It feels nasty.

I think my patch is just tripping on the problem, rather than causing
it.  The previous version of fs/proc/base.c should have tripped over
this problem as well if it happened to have hit the same process.

I'm staring at the patch and I cannot think of anything that would
explain your problem.  The reference counting is simple and the only
bug I had in a posted version was a failure to decrement the count
on the task_struct.

I guess the practical question is: what was your test methodology for
reproducing this problem?  A couple more people running the same
test on a few more machines might at least give us confidence in what
is going on.

Eric