On Wed, 2006-06-28 at 15:40, Andrew Morton wrote:
> On 28 Jun 2006 13:46:22 +0800
> Zou Nan hai <[email protected]> wrote:
>
> > On Wed, 2006-06-28 at 14:55, Andrew Morton wrote:
> > > On Wed, 28 Jun 2006 08:38:59 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Andrew Morton <[email protected]> wrote:
> > > >
> > > > > > We see system hang in ext3 jbd code
> > > > > > when Linux install program anaconda copying
> > > > > > packages.
> > > > > >
> > > > > > That is because anaconda is invoked from linuxrc
> > > > > > in initrd when system_state is still SYSTEM_BOOTING.
> > > >
> > > > [ argh ...! ]
> > >
> > > That's what I thought ;)
> > >
> > > > > > Thus the cond_resched checks in journal_commit_transaction
> > > > > > will always return 1 without actually schedule,
> > > > > > then the system fall into deadloop.
> > > > >
> > > > > That's a bug in cond_resched().
> > > > >
> > > > > Something like this..
> > > >
> > > > Acked-by: Ingo Molnar <[email protected]>
> > > >
> > >
> > > Thanks. Zou, it'd be great if you could test this in your setup, please.
> > > I've tagged it as 2.6.17.x material.
> >
> > Andrew,
> > I am building the env to test.
> > The patch was my original idea, but I was afraid of breaking any code
> > that rely on the OLD wrong cond_sched semantic.
>
> We prefer the "right" fix, however painful or risky that might be.
>
> > However later I did a
> > grep found that there is very few code that checks the return value of
> > cond_resched. So the patch should be safe.
>
> Hope so.
>
> > However I think cond_resched_lock and cond_resched_softirq also need fix
> > to make the semantic consistent.
> >
> > Please check the following patch.
> >
>
> Ah. I think the return value from these functions should mean "something
> disruptive happened", if you like.
>
> See, the callers of cond_resched_lock() aren't interested in whether
> cond_resched_lock() actually called schedule(). They want to know whether
> cond_resched_lock() dropped the lock. Because if the lock was dropped, the
> caller needs to take some special action, regardless of whether schedule()
> was finally called.
>
> So I think the patch I queued is OK, agree?
I am afraid the code like cond_resched_lock check in
fs/jbd/checkpoint.c log_do_checkpoint may fall into endless retry in
some condition, will it?
Though I have not encountered that.
Zou Nan hai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]